#90: Email. Slack. Kubernetes. These things have at least one thing in common: they are all event-driven. Even if you don't agree with that statement, we can agree that asynchronous development has taken off in the past few years, and event-driven concepts have exploded along with it. We are now starting to see these concepts arrive in the tooling space. We speak with Andi Grabner about Keptn, an event-based control plane for continuous delivery and automated operations for cloud-native applications.
Keptn on Twitter: https://twitter.com/keptnProject
If you like our podcast, please consider rating and reviewing our show! Click here, scroll to the bottom, tap to rate with five stars, and select “Write a Review.” Then be sure to let us know what you liked most about the episode!
Also, if you haven’t done so already, subscribe to the podcast. We're adding a bunch of bonus episodes to the feed and, if you’re not subscribed, there’s a good chance you’ll miss out. Subscribe now!
Andreas Grabner has 20+ years of experience as a software developer, tester, and architect and is an advocate for high-performing cloud scale applications. He is a regular contributor to the DevOps community, a frequent speaker at technology conferences, and regularly publishes articles on https://www.dynatrace.com/news/blog/. He is also a DevRel for the CNCF Sandbox project Keptn ( https://keptn.sh/ ). In his spare time, you can most likely find him on one of the salsa dance floors of the world!
Viktor Farcic is a Principal DevOps Architect at Codefresh, a member of the Google Developer Experts and Docker Captains groups, and a published author.
His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD) and Test-Driven Development (TDD).
He often speaks at community gatherings and conferences (latest can be found here).
His random thoughts and tutorials can be found in his blog TechnologyConversations.com.
I think for how many years and decades have we built traditional function-based or monolithic applications, and you cannot expect that we can all switch our minds over from one day to the other and all of a sudden become the perfect architects of event-driven systems.
This is DevOps Paradox episode number 90. Event Driven Continuous Delivery With Keptn
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.
Viktor, I've been hearing rumblings of this thing called event-driven delivery. Is this just yet another retelling of the story of continuous delivery?
I mean, it's event-driven everything, I think, where we're moving generally speaking, though at different speeds depending on the segment of the industry. Especially with the increase in the usage of Kubernetes, everything is becoming event-driven something. That's almost a given. Probably the bigger question is how much users notice and see that things are event-driven. Many people, when they use Kubernetes, have no idea that they're actually sending events. People execute kubectl apply and they think that that's exactly the same, from their perspective, as if they did SSH and then executed some imperative command.
But it's not.
It's absolutely not. I often need to explain to people, and I'm now going into a completely different subject, that when you do kubectl apply or helm install and you get the response code from that CLI, that does not mean that anything is running, that anything already happened. Many things might have happened, or nothing might have happened, by the time you get the exit code. People ask me, yeah, so my application is up and running now that the command has executed? I don't know. Maybe. Maybe not. All I know is that you got the response from the Kubernetes API that it accepted an event, and what will happen, when it will happen, and how you will know that something happened, that's a completely different story.
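The gap Viktor describes between "accepted" and "running" can be sketched as a toy reconciliation loop. This is purely illustrative, not the real Kubernetes API: "apply" only records the desired state and returns immediately, while a separate control loop makes reality catch up later.

```python
# A toy model of Kubernetes' declarative flow. All names are illustrative.

class ToyCluster:
    def __init__(self):
        self.desired = {}   # what you asked for (think: kubectl apply)
        self.actual = {}    # what is really running

    def apply(self, name, replicas):
        """Accept the request and return at once -- nothing is running yet."""
        self.desired[name] = replicas
        return "accepted"   # analogous to the CLI returning exit code 0

    def reconcile(self):
        """The control loop that eventually makes actual match desired."""
        for name, replicas in self.desired.items():
            self.actual[name] = replicas

cluster = ToyCluster()
status = cluster.apply("my-app", replicas=3)
# The "accepted" response does not mean the app is running:
running_before = cluster.actual.get("my-app", 0)   # still 0
cluster.reconcile()                                # ...time passes...
running_after = cluster.actual.get("my-app", 0)    # now 3
```

In the real system, this is why commands like `kubectl rollout status` exist: the apply itself only proves your desire was received.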
An analog in this world is when you ship a package. You drop the package at the post office and you get a shipping code, but that box hasn't arrived yet. You just have a shipping code.
Exactly. That shipping code is a basic acknowledgement that the person on the other end received your desire, let's say, that something should be shipped somewhere.
Now we're getting a little deep in this. I think we need some help.
Thankfully, we have somebody to help us today. We have Andreas with us from Dynatrace. Andreas, thanks for joining us.
Well, thanks for having me and thanks for inviting me to this deep dive conversation or whatever you want to call it.
We didn't start deep yet.
Andreas, why don't you go ahead and introduce yourself a little bit more and explain what your life looks like on a day-to-day basis.
All right. My name is Andi Grabner, Andreas, but please call me Andi, and I've been working in the performance engineering field for the last 20-plus years. So my background is performance engineering. I started out as a performance tester on a performance testing product at my first company and stayed there for eight years. We did a lot of performance engineering, and then 12 and a half years ago I switched over to my current employer, Dynatrace, where performance is obviously a big topic. We are in the monitoring and observability space, and my roles have always been kind of evangelist roles, meaning I try to not only tell people that Dynatrace can monitor metrics and put them on a nice dashboard, but also show what else we can do with this data when we strategically integrate it in your delivery pipeline. For the past two years, I've also been a devrel, a developer relations guy, for the open source project keptn, which I think is one of the reasons why I'm here today, because keptn is also claiming that we are bringing an event-driven approach to continuous delivery. Now you ask what my life looks like. Well, pre-Corona, or pre-COVID-19, I was traveling about 70 to 80% of my time, and my only travel right now is between the counter in my kitchen, where I'm sitting, and my fridge. Sometimes I take a walk, but I now do a lot of the things virtually that I used to do at physical events.
Sounds vaguely familiar.
No, no, it's just that we all have to come up with new ways, with digital ways, to get our word out and connect with a global community. So there are pros and cons to the whole situation, at least that's the way I see it. The pro is that we can still connect with people, even more than before. I can do five events in a day on five different continents if I'm up for it. On the other side, though, I think we all miss the physical interactions a little bit, the socializing, because even though technology is amazingly great now with all the video conferencing, I think there's still a little bit of the human touch missing.
What would be your definition of events? Did the beginning of our conversation explain it well, or do you see it somehow in a different light?
No, I agree with you, but I think you missed one aspect of it. You made it seem like whether it is event-driven or not event-driven may not matter for the end user, but I think it should matter, because it's not just that we're replacing the way individual components communicate with each other, whether synchronously or asynchronously through events. When we talk about event-driven continuous delivery, we do not just want to replace the communication or the architecture; we want to change it, because we also want to get a benefit out of the whole thing. Otherwise, event-driven architectures are most often more complex than non-event-driven systems. As you explained, kubectl is sending an event and you have no clue if it actually finished or not, so you need to build additional mechanisms in to validate, to track an event's status. You have to track the status of a transaction, of a workflow, and if something fails, troubleshooting becomes extremely complex, because you need to go off to all these different systems that are potentially participating in that workflow. I have to come back to your analogy with shipping a package. There are so many people involved, so many individual carriers, all different ways of transportation, different organizations, where maybe something goes wrong. So just switching to event-driven because event-driven is the cool new thing doesn't make sense to me. The reason why we want to promote an event-driven approach to continuous delivery is that what we really want to promote is a separation of concerns. We want to separate the process definition of continuous delivery from the actual actions that are executed.
In, let's say, the first generations of continuous delivery, maybe we look at a classical Jenkins pipeline, a Jenkinsfile. What you have described there is a process, and in that process definition you basically say exactly which tools should execute actions in a certain stage. What happens if it fails? What type of target environment are you deploying into? So it was all very kind of monolithic. It was a monolithic end-to-end process definition, and you called the tools, and these tools were hard-coded in that process. What we are promoting is that we want to rip these two things apart. Just as we are breaking monolithic applications into smaller services for different reasons, in the end, what we want to do with our business applications is execute business processes. So we need a component that manages that process, and then you have the individual services that can do things along it. For continuous delivery, we want to promote the same thing. We want an easy way to define delivery processes, and an entity that orchestrates them. Then we have different capabilities that can act when they are asked to do something. So, for instance, I need to deliver a container into an environment. It could be Helm. It could be Argo. It could be Jenkins. It could be Codefresh. It could be anything, and I can easily replace it. What we want to promote is a decoupling of process and tooling, and we believe that in order to let them talk to each other, we need an event mechanism, because the event mechanism allows us to really connect the two things that are now decoupled. You can easily change the process definition, and you can change the tooling, without having to think about, oh, where is my tool currently referenced?
So if I change my tooling from tool A to tool B, which pipelines need to change? Or if you want to change the process, you don't need to think about, oh, what does this mean for the tooling? With the decoupling, we solve this problem. We solve the problem of hard-coded pipelines that are hard to maintain and that produce a lot of permutations. At least that's what we've seen internally when we moved from our monolithic architecture to more of a microservice architecture. We started obviously with Jenkins, because that's what we knew, and we ended up having hundreds of copies of an initial Jenkins pipeline that worked well for one microservice. We copied it over and slightly adapted it for each service, and so we ended up with many different permutations that were very hard to maintain. So we wanted to provide a new approach, and we want people to think about a new approach to continuous delivery.
When I speak with people on both subjects, there is that mental issue of comprehending how all that works, and I'm now talking not necessarily even about delivery, but in general. I have a feeling that creating any flow of any type based on events is very demanding; it's hard for people to understand what's going on and when it's going on, as if somehow we are wired to an imperative way of thinking. Again, generally speaking, not necessarily for delivery.
No, I disagree here. I'm not sure which tools you're working with every day in your business life, but I have Slack, and that's all event-driven. It's channel-based and conversation-based. I start a conversation and expect that conversation to go somewhere with a final output. I have everything in a Slack conversation, and, let's say, I get notifications when something changes. So that's one. Email, same thing. I start an email with the hope that I achieve something in the end, and then that email thread may go on for an hour and then everything is good, or it may go on for weeks and months, and over the course of the email chain, more and more people get added to it because they are needed. I think it's all event-driven. I think we, as humans, are used to event-driven systems that are asynchronous in nature.
I would agree with that from a user perspective. When you use something, that's kind of natural. But when you see people designing their systems, it's enough to look at those systems that are claiming to be microservices, and then it ends up that you need to deploy 55 of them all together because they're really not. For some reason, I still don't see people comprehending events when designing systems, only when using them. All your examples are fully valid. I really agree, and I'm in no way saying that we shouldn't be designing based on events. The clear answer is yes, we should. But still, when I look at the systems and how people work, it's: yes, I like to use event-based mechanisms, but when it comes to me designing them, then, oh.
I agree with you. For how many years and decades have we built traditional function-based or monolithic applications? You cannot expect that we can all switch our minds over from one day to the other and all of a sudden become the perfect architects of event-driven systems. Maybe the next generation that comes out of universities or colleges, who learn event-driven software engineering and architectures from, I don't know, the beginning of their education, maybe they'll get better at it. I agree with you that we're not yet at the point where the majority of architects build perfect event-driven systems. But it also means there's a big chance for the open source community, and also for commercial vendors, to say, hey, we see a need to support you in building your next-generation event-driven system, and part of that building and deployment process are tools that actually help you ship your software into production and keep it running there in a healthy way. That's why I think if the whole industry is moving to event-driven architectures, then the tools that deliver and operate should probably also figure out the benefits of event-driven architecture for delivery and operations, and this is what we've been trying to promote over the last year and a half, two years.
So when we talk about event-driven anything, we always need some kind of a message bus, and I'm using an archaic term for that. What are those today? The first things that come to my mind would be the Kubernetes API and Git, maybe?
I think everyone has their own implementation. I can just tell you from my own personal experience what we have built with our keptn project. We are using NATS as a pub/sub system. Every time you trigger a process, just like in your analogy with the package system, where you get a transaction ID when you ship a package, we create what we call the keptn context. All events are automatically stored in our database. We use a Mongo database where we store every single message along that workflow, all connected through the keptn context. Then the individual systems that can participate in the workflow basically just subscribe to new events that are coming in, and keptn orchestrates everything. So if you send an event to keptn, keptn takes the event, first of all persists it, and then enforces logic and says, okay, we now need to start this process. It looks up the process definition, and then it says, the first thing I need to do is send out an event, let's say for deploying that artifact, because somebody triggered the process for deploying a new artifact. So keptn sends out the event, and what that actually means is the tools that have subscribed to a deployment will then receive the event, and this might be multiple tools. We have integrations with Helm, with Jenkins, with Argo, and every tool can then say, is this an environment where I am allowed or configured to deploy? And if so, it sends back an event saying, I'm taking the job, because obviously the orchestrator also needs to know who is getting the job, who gets the ticket, in case you have multiple tools that can potentially do the same thing.
I think the analogy you brought up in the beginning with packaging is perfect, because if you ship a package, like I'm in Austria and I ship it to Viktor in Barcelona, I'm sure there are different ways, different carriers, and different means of transportation. Somebody will ask, how do we get it from Linz to Barcelona? Well, it could be by train. It could be by plane. It could be by truck. But as an end user, I don't care. I'm just saying, here's the package, and you deliver it. Along the way, maybe there's a bidding process for the cheapest and the fastest option, and then that service takes over. It's the same with delivery. I can say, hey, I need somebody to deliver this container into a particular environment, let's say with the additional metadata that it should be deployed blue-green, and then every tool that registered for that event can say, ah, I cannot do blue-green, I'm sorry, maybe next time. Or, I can do blue-green, but this is a production environment and I'm not certified in the organization. This is why I think this event-driven model is so great: you can add multiple workers, or multiple individual tools that provide a certain capability, and as a process executes, they jump in and say, yes, I can do it, and I can do it for that cost, or I can do it that fast. There might even be a bidding process in the future. We haven't done this yet, but it would be even cooler, right? You could then really pick the best tool that has the resources available right now and can do something in that target environment. That's all event-driven.
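The "tools bid on work" idea Andi describes can be sketched as a tiny in-memory pub/sub where each subscribed tool decides whether it can take the job. All class, event, and tool names here are hypothetical, not keptn's actual implementation:

```python
# Minimal sketch: tools subscribe to an event type; when an event arrives,
# each one checks its own capabilities and "bids" if it can handle it.

class EventBus:
    def __init__(self):
        self.subscribers = {}  # event type -> list of tools

    def subscribe(self, event_type, tool):
        self.subscribers.setdefault(event_type, []).append(tool)

    def publish(self, event_type, event):
        # Every subscriber sees the event; return the names of those that bid.
        return [t.name for t in self.subscribers.get(event_type, [])
                if t.can_handle(event)]

class Tool:
    def __init__(self, name, strategies, environments):
        self.name = name
        self.strategies = strategies      # deployment strategies it supports
        self.environments = environments  # environments it may deploy to

    def can_handle(self, event):
        return (event["strategy"] in self.strategies
                and event["environment"] in self.environments)

bus = EventBus()
bus.subscribe("deployment.triggered",
              Tool("helm", {"blue-green", "canary"}, {"staging", "production"}))
bus.subscribe("deployment.triggered",
              Tool("jenkins", {"recreate"}, {"staging"}))

# Only the tool that can do blue-green in production takes the job.
bidders = bus.publish("deployment.triggered",
                      {"strategy": "blue-green", "environment": "production"})
```

The orchestrator then knows which tool claimed the ticket, which is exactly why the acknowledgement event matters when several tools could do the same thing.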
So in a way, if I understood it, from the end user perspective, not the person who configures things, you could theoretically say, hey, here's my application, I want it deployed. You choose whether it should be blue-green with X or Y or Z, or maybe it shouldn't be; maybe within this context it should actually be canary. I'm inventing a use case now. If I understood it right, it makes it almost dumbed down for the end user, just like shipping. You don't know whether it's going by train or plane. You don't really care. You care that it just arrives fast.
Exactly. Yeah, because in an organization, somebody defines the general delivery process. You say you have dev, staging, and prod, and in dev these types of tests should happen, in staging these types of tests should happen, and in production this type of deployment should happen. So somebody needs to define the process, but as the end user, as a developer, I say, here's my new container, you do your stuff, the goal is production. You know what to do in each stage. For every stage, maybe you need some unit tests, some functional tests, some performance tests. They then get triggered accordingly, and the individual supporting configuration elements, like a JMeter test file or Helm charts, will also be stored in Git. That's at least what we do. That means as the process executes, not only do the events get stored in a database, but there's a Git repository that is aligned with the process, and this is where the individual tools can pick out things like the test files that will be executed.
The events mechanism you mentioned earlier, what was the name you mentioned?
The protocol that we use is CloudEvents, the CNCF standard. But the library that we use internally for the actual event subscription model, where you can send events and subscribe, is NATS, N-A-T-S. It's another CNCF open source project. When we started with keptn, we initially looked at Knative. So we started with Knative first, but for us it was, I don't want to use the word overkill, but it was overkill for what we tried to achieve. Knative could do too many things for us and was therefore also too resource-heavy, so after a while we went with something that better fit our requirements.
You also said you had Mongo in there. Why are you mixing Mongo and Git? Why two systems there?
So Mongo is used to persist all the events. That means if you're executing 50 deployments per day, then every deployment will kick off a process, and the process may have five or 10 steps. So let's take 10 steps times 50 deployments: 500 events. They get stored in Mongo because we use this to visualize. We have a little UI where we visualize the event flows, how far along a particular process is, and where things failed. So Mongo is where we persist the events. Git, on the other side, is where we allow the user of the system, the developer for instance, to upload configuration files like the Helm chart, the test files, your SLIs, your SLO definitions, your remediation workflows. So the one holds the configuration entities for your delivery and operations tools, and the other simply persists the event stream. We don't want to mix the event stream and the configuration and put them in the Git repository, or at least that's what we've chosen.
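The persistence side Andi describes can be sketched as a store where every event is keyed by its keptn context (the workflow's "transaction ID"), so a whole flow can be reconstructed and visualized later. This is a purely illustrative in-memory stand-in for the Mongo-backed store, with made-up event-type names:

```python
# Sketch: persist every event with its workflow context so the full
# event trail of one process can be queried back out.

class EventStore:
    def __init__(self):
        self.events = []

    def persist(self, context, event_type, payload=None):
        self.events.append({"context": context,
                            "type": event_type,
                            "payload": payload or {}})

    def workflow(self, context):
        """Reconstruct one workflow's event trail by its context."""
        return [e["type"] for e in self.events if e["context"] == context]

store = EventStore()
store.persist("abc-123", "deployment.triggered")
store.persist("abc-123", "deployment.finished")
store.persist("xyz-789", "test.triggered")   # a different workflow

trail = store.workflow("abc-123")
# trail == ["deployment.triggered", "deployment.finished"]
```

A UI like the one Andi mentions would render exactly this kind of per-context trail to show how far a process got and where it failed.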
Do you have any other big moving pieces besides NATS and Mongo?
The heart, the core of keptn, is really the part that takes care of the orchestration of these processes, and this thing is keptn core, as we call it. It's the control plane, and it uses NATS, Mongo, and Git for persistence and obviously for sending out all these events. Another core component within our system is what we call the lighthouse service. The lighthouse service is there to pull metrics out of your monitoring solutions and then compare these metrics, the SLIs, against your service level objectives. We made a strategic decision that in every process we execute, whether it's a delivery or an operations process for remediation, after every action we validate that your SLIs are still meeting your SLOs. This is why this is also a very central component. So when you deploy and you run some tests, the lighthouse makes sure that the key objectives you define, based on metrics from whatever monitoring tool you choose, are still green. If not, it gives you feedback saying, hey, this deployment is completely bad, we don't promote it into the next stage, or, your canary is really bad, so we don't roll it out further, we hold it back. Or, if you're doing an auto-remediation workflow in production, meaning some monitoring tool detected a problem and keptn triggered a remediation workflow, keptn first triggers the first action, but before it triggers the second action, it validates whether the first action did something good or something bad, meaning did it improve your SLIs or not, and based on that it decides whether the process should go on. So this is a very key component as well.
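The quality-gate idea behind the lighthouse service, comparing measured SLIs against SLO criteria and producing a score, can be roughly sketched as follows. The scoring scheme, metric names, and thresholds here are simplifications of my own, not keptn's actual algorithm:

```python
# Sketch: compare measured SLIs against SLO criteria and return a 0-100 score.

def evaluate(slis, slos):
    """slis: metric name -> measured value.
    slos: metric name -> (comparison, threshold), e.g. ("<=", 500)."""
    passed = 0
    for metric, (op, threshold) in slos.items():
        value = slis.get(metric)
        if value is None:
            continue  # missing metric counts as a failed objective
        ok = value <= threshold if op == "<=" else value >= threshold
        if ok:
            passed += 1
    return round(100 * passed / len(slos)) if slos else 100

# Hypothetical measurements pulled from a monitoring tool:
slis = {"response_time_p95_ms": 420, "error_rate_pct": 2.5}
slos = {"response_time_p95_ms": ("<=", 500),   # latency budget: met
        "error_rate_pct": ("<=", 1.0)}         # error budget: exceeded
score = evaluate(slis, slos)  # one of two objectives met -> 50
```

A score like this is what the orchestrator can act on, promoting, holding back, or rolling back, without the pipeline itself knowing anything about the monitoring tool behind it.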
So there are three big pieces: NATS, Mongo, lighthouse. And you've got integrations with Jenkins and Argo and, you name it.
The great thing about the integrations, and I have to give a lot of credit to our team for this decision, is that they are just basically containers that subscribe to an event, and when they receive an event, they call an external tool. For instance, the Jenkins integration is very easy. We call these keptn services. If you want to write your own keptn extension, you basically write a so-called keptn service, and a keptn service is listening, subscribing, to events. When these events come in, the keptn service, that's the integration layer, forwards the event or does something in the external tool. So for Jenkins, we allow you to say, I'm calling a Jenkins pipeline if keptn wants a certain thing to be done, and once the Jenkins pipeline is done, it sends the results back to keptn, saying this job executed successfully or not. We have the Jenkins integration, as I mentioned, and then we have other integrations: with testing tools, with notification tools, with deployment tools, with incident management tools. We are at a stage now where we get external contributions as well. That's great; that's what open source is really about, having external people contribute. We recently had somebody contribute a Splunk service, for instance, where they forward the results of the SLI and SLO evaluation that I explained earlier to Splunk, because they wanted to have the results of every evaluation in Splunk for whatever reason. They wrote that Splunk service in a matter of probably hours. Everybody can build this, and we hope that the ecosystem with this approach will grow and grow and grow.
So, as a user, do I need to publish, let's say, keptn-specific events, or are you capable of listening to a broader set of events? Like, for example, if I just go crazy and do what I say nobody should ever do and deploy something using kubectl, are you capable of listening to such events, or does it always need to be something keptn-specific?
No, it needs to be events where keptn understands that the event is sent in the context of a so-called keptn project. keptn is organized in projects, then in stages and services, because our initial thinking was obviously around orchestrating processes for delivery and operations. So you have a project; under the project, you have one or multiple services; and if you have multi-stage delivery, you also have every stage in there. If you send an event to keptn, as I told you in the beginning, we are using CloudEvents as a standard protocol, but we extend CloudEvents with additional metadata, and the metadata we have in there is, for instance, a keptn project, a keptn service, and a keptn stage. So we then know, hey, this event belongs to this particular keptn project, service, and stage. We also have some predefined event types, like what I mentioned earlier: a deployment finished event, a test finished event, an evaluation finished event, a remediation action event. This is true for the current version as of this recording. I know this is going to be published around the January timeframe, so as of late 2020, we are on version 0.7.3, where we have, I think, about 10 different types of events that you can listen to or send in. With the next version, 0.8, which should go GA in January of 2021, so hopefully by the time you hear this, you are free to also define your own event types, and you can then specify how your events should fit into the overall workflow that you want keptn to orchestrate.
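The shape Andi describes, a standard CloudEvent carrying extra keptn metadata, can be sketched like this. The top-level attributes (`specversion`, `type`, `source`, `id`) follow the CNCF CloudEvents spec; the keptn-specific field names, the event type string, and the project/service/stage values are illustrative, not the exact keptn schema:

```python
# Sketch: a CloudEvent envelope extended with keptn context metadata.
import json
import uuid

def make_keptn_event(project, service, stage, event_type, context=None):
    return {
        # Standard CloudEvents attributes (CNCF spec):
        "specversion": "1.0",
        "type": event_type,               # e.g. a "deployment finished" type
        "source": "example/demo-sender",  # hypothetical event source
        "id": str(uuid.uuid4()),
        # keptn extension: the workflow's "transaction ID" (name illustrative):
        "shkeptncontext": context or str(uuid.uuid4()),
        # keptn routing metadata: which project/service/stage this belongs to:
        "data": {"project": project, "service": service, "stage": stage},
    }

event = make_keptn_event("sockshop", "carts", "staging",
                         "sh.keptn.event.deployment.finished")
print(json.dumps(event, indent=2))
```

Because project, service, and stage ride along in every event, the orchestrator can route any incoming event to the right process definition without the sender knowing anything about the workflow.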
So everything with keptn today is Kubernetes-based.
Yes and no. That's a very good question. keptn itself runs on Kubernetes. keptn itself is a container-based solution that runs on Kubernetes and also requires some Kubernetes-specific things: we require config maps, we require secrets, we obviously rely on things that are only available on Kubernetes. But, and here's the cool part, keptn orchestrates processes, and as I told you, we can trigger any type of tool along the process. We can trigger a Jenkins pipeline, and that Jenkins pipeline can do whatever it wants in any type of system. This is why we believe this is also interesting, because we see a lot of new CD offerings out there that all focus on Kubernetes, because, obviously, as you said in the beginning, Viktor, this is where we see more and more event-driven architectures, so we need a new approach to delivery. But we are not constrained to Kubernetes, because we can trigger any type of tool along the process, which means we can also trigger processes that span deployments from Kubernetes to the mainframe if you want. We can do all these things in coordination in one process, and keptn orchestrates that end-to-end process, and that's the beauty of it.
Jenkins doesn't have to be running in that same Kubernetes cluster. Jenkins could be living anywhere. It just needs to be able to subscribe.
Exactly. In the end, it is just about making an API call. Maybe I need to re-explain or reiterate. When keptn sends an event and it should be handled by an external tool like Jenkins, you need a little layer in the middle that we call a keptn service. We already have this available, but if you were to write your own keptn-to-Jenkins integration, where you want keptn to trigger a Jenkins pipeline, you would write a very small container that runs in the context of keptn in Kubernetes, subscribes to these events, and, when an event comes in, makes an API call to Jenkins to trigger that pipeline. We've already done this for you. That's one option. The other option is that you can have your existing Jenkins pipeline and, through an API call, trigger a keptn workflow. The most common use case today is that our users integrate Jenkins with the keptn quality gate feature, the SLIs and SLOs I told you about earlier, which is also a central piece, because this part of the process can also be used on its own. That means you can take an existing pipeline, and if you say, man, I wish I had a tool that could reach out to different monitoring or testing tools, pull back metrics, and then give me a score to automate the validation of my deployment, then you can do this from your existing CI/CD by making a call to keptn, therefore triggering a process. That process can, for instance, just be this one use case of bringing the metrics in, comparing them against my SLOs, and giving me a score between zero and 100, so that you can make an automated decision about whether you want the pipeline to go on or stop.
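On the pipeline side, the decision made from that zero-to-100 score can be sketched as a tiny gate function. The pass/warn thresholds below are made up for illustration; in keptn they would come from your SLO configuration:

```python
# Sketch: an existing CI/CD pipeline turning a 0-100 quality-gate score
# into a go/no-go decision. Thresholds are illustrative.

def gate_decision(score, pass_threshold=90, warn_threshold=75):
    if score >= pass_threshold:
        return "pass"   # promote the deployment
    if score >= warn_threshold:
        return "warn"   # promote, but flag for review
    return "fail"       # stop the pipeline

# A pipeline step would call the evaluation, then branch on the result:
for s in (95, 80, 50):
    print(s, "->", gate_decision(s))
```

The point of the pattern is that the pipeline stays dumb: it only sees a score and a decision, while all the monitoring-tool-specific work happens behind the evaluation call.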
I think what's interesting about this is that you don't have to have a massive Kubernetes cluster to run it. I could run this on K3s.
It's funny that you mention that, because if you Google it, or if you go to our GitHub repo, you'll find a keptn-on-K3s set of tutorials. So yes, we can run all this on K3s. That's the beauty of it, and that's what a lot of our users are doing right now, because keptn is an orchestrator of processes. Most of them are centered around delivery, and most of the rest around automated operations and auto-remediation. But because with 0.8 we're opening up your workflows to any type of events, you can orchestrate whatever type of process you want. Maybe we can even sell this to UPS or whoever else in the future, and then they can orchestrate their packaging systems, coming back to the analogy that we had earlier, right? Who knows.
What is closest to keptn among other projects?
That's a very tough question, because in the beginning, when we started entering the open source community, we were heavily focused on continuous delivery, we called it event-driven delivery, and we obviously targeted the Kubernetes marketplace. Then everybody says, well, how are you different from Argo? How are you different from Spinnaker or Weaveworks or whatever there is? But then we actually realized that it's hard to put keptn in a category and therefore it's hard to find the competition, because we don't want to compete. We want to be the end-to-end orchestrator of these processes. We want to manage the life cycle of artifacts from inception until they're in production and beyond, because this is why we also focus on auto-remediation in production. So that's why it's a little hard to see what other tools are available. There are individual tools in each stage. We also know that we don't want to compete with Jenkins. We don't want to compete with Argo, with Spinnaker. We want to integrate, because we believe that every organization needs to pick the tools that are best for them, for their stack, wherever they deploy to. But I think what a lot of people are missing is an end-to-end orchestrator across the whole thing that offers these use cases in a very easy way, so that people don't have to think about what actually happens behind the scenes. It just works, and whether their artifact today gets deployed using helm on this Kubernetes cluster over here, or tomorrow it gets deployed using Jenkins on another environment, this is something that the administrators of keptn specify in their workflows or whatever tools they're integrating keptn with. But the end user, the developer, whoever that might be, should just say go and do it.
So that end-user just knows that, Hey, go deploy my app and then it's an implementation detail of how that is actually done.
Exactly. It depends on the process that is defined and, disconnected from that, again, this is the separation of concerns: ripping apart these two things, the process and the individual tooling, where the tooling really provides capabilities for individual actions when the process gets executed. If you think about it, you can easily swap one tool for the other as long as this tool provides the same capabilities, and you don't even have to think about which of my pipelines it is used in anymore, because that hard-coded integration between pipelines and tools is no longer there. We separated it out and we use eventing to connect the whole thing, coming back to event-driven continuous delivery and event-driven operations automation.
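That tool-swap idea can be sketched in a few lines. This is an illustrative toy, not keptn's implementation: the process only emits an event type, and whichever tool is subscribed handles it, so replacing one deployer with another changes the subscription, not the process.

```python
# Toy sketch of separating process from tooling via event subscriptions.
# The subscription registry and handler names are invented for illustration.

subscriptions = {}

def subscribe(event_type, handler):
    """Register (or replace) the tool handling a given event type."""
    subscriptions[event_type] = handler

def emit(event_type, payload):
    """The process just emits events; it never names a tool directly."""
    handler = subscriptions.get(event_type)
    return handler(payload) if handler else None

def helm_deploy(payload):
    return f"helm deployed {payload['service']}"

def jenkins_deploy(payload):
    return f"jenkins pipeline deployed {payload['service']}"

# Today, helm handles deployment events...
subscribe("deployment.triggered", helm_deploy)
print(emit("deployment.triggered", {"service": "carts"}))

# ...tomorrow Jenkins does, and the process emitting the event is untouched.
subscribe("deployment.triggered", jenkins_deploy)
print(emit("deployment.triggered", {"service": "carts"}))
```

The pipeline-to-tool coupling lives entirely in the subscription table, which is the point Andi is making about swapping tools without touching the process.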
So if I understood right, you're moving towards more general orchestration of events. Something like, I always try to find the box where to put something. Something like Argo Workflows, but with tighter out-of-the-box integrations with other stuff, right?
Exactly. Yeah. And we're working very closely with the Argo folks. We have integrations with Argo Rollouts, with Argo CD. Again, I think the message is keep the tools, or pick the tools, that work best for you, and there are many tools that are doing great jobs around delivery and around testing. We want to be the orchestrator, because what you need to know is, end to end, when was which artifact deployed, in which environment, tested by which test, and what was the quality gate result, the SLIs and the SLOs. Because eventually, if everything ends up in production and something happens, you want to know again, okay, who deployed that change? Why did it make it through that quality gate? Which tests were executed that didn't find this problem in production? Because if you have all this information, you can focus and build much better auto-remediation for problems in production, because you have all the context. We have the complete traceability from the inception of the artifact until it ends up in the hands of the consumer.
One of the issues, and this is not keptn but generally speaking, that I find is that tools like that are either very specialized, like we can integrate with ABC only, or more generic. But when they're more generic, I always find that visual layer missing that shows me what is really happening. Because I always end up with, yeah, okay, this is successful, but to see any additional information, click here and then go there, and this one failed, click here, go there, and it's very hard to create a visualization that is not hyperlink based. I'm getting annoyed lately. There's a lot of things going on and I keep clicking links and links and links to find out what's going on. It's close to impossible to get anything but the summary in one place.
Yeah, that's a very good observation. So I would say we probably suffer, well, we have the same challenge, obviously, because we provide a UI visualization of all these processes. We show exactly which events were sent, which tools picked up the events, and then the tool integration itself can decide what level of information it sends back, which can then also be visualized in our UI, which we call the keptn bridge. Now, we have a lot of link integrations right now, because they obviously get you to the more specialized tool, because we don't want to rebuild all the greatness of a particular tool, or it doesn't make sense. But I think we are still in the phase where we are also extending our default events with more information that can then also be displayed, filtered on, and searched on in our UI, right? The use case of needing to go to the other tools should then really only be necessary in case of troubleshooting, when you don't really have all the details in keptn. That's the idea.
I guess, for things like that to truly work, we probably need some form of a standard. Let's say, if you go back to Jenkins: Jenkins does the job and at the end of the job, whatever the job is, it should publish an event in some standard format, so that it doesn't really matter anymore whether it's keptn or this or that listening to it; you know what you're going to receive. So almost like a standard response format, depending on the industry, right?
Yeah. That's also why we are part of the CDF, the Continuous Delivery Foundation, where we are a part of the interop SIG. We are part of different open source groups, and one of the goals is to come up with a standard. Now, we made the first steps initially to lead us down that path, because we based everything on CloudEvents, and then we came up with our event types on top of CloudEvents. We are working with the individual special interest groups to really say, what are these types of events? What type of metadata, what type of fields do they need to have, so that it's not just keptn that can use it, but everybody can make sense out of it. Yeah, that's a direction we're working in.
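To see what "our event types on top of CloudEvents" roughly looks like, here is a sketch of an event wrapped in a CloudEvents 1.0 envelope. The envelope field names (`specversion`, `type`, `source`, `id`, `time`, `data`) come from the CloudEvents specification; the concrete type string and payload below are illustrative, not a guarantee of keptn's exact schema.

```python
import json
import uuid
from datetime import datetime, timezone

# Sketch of a keptn-style event as a CloudEvents 1.0 envelope. The domain
# payload in "data" is illustrative; only the envelope fields follow the spec.

def make_event(event_type, source, data):
    return {
        "specversion": "1.0",                  # CloudEvents spec version
        "type": event_type,                    # domain-specific event type
        "source": source,                      # who emitted the event
        "id": str(uuid.uuid4()),               # unique per event
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,                          # tool-specific payload
    }

event = make_event(
    "sh.keptn.event.evaluation.triggered",     # keptn-style type string (illustrative)
    "jenkins-pipeline",
    {"project": "sockshop", "service": "carts", "stage": "staging"},
)
print(json.dumps(event, indent=2))
```

Because every tool agrees on the envelope, any subscriber can route and trace events without knowing the emitting tool, which is exactly the interoperability goal being described.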
Because, and correct me if I'm wrong, I'm thinking out loud now, almost the ideal state would be that we create all the events that you can now create with keptn without really knowing whether it's even keptn behind the scenes, right? That would be... probably we'll never get there, right, but that would be awesome.
I would say let's phrase it with a positive attitude: we may get there, especially because people like you and people like me, and a lot of others in this space who are working for vendors, need to sit together in these special interest groups and define a standard.
So if people wanted to follow you, Andi, where's the best place? Twitter, LinkedIn, all the things.
All the things, yeah. Twitter, it's grabnerandi. Andi with an I at the end, so G R A B N E R A N D I. You can also find me with my full name, Andreas Grabner, on LinkedIn. You can also find the keptn project on Twitter at @keptnProject, and there's also keptn.sh, that's the website. But yeah, I think that's about it, the most important ways to get in touch with me.
Cool. Thanks for joining us today.
Well, thank you for having me.
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.