DOP 74: Using GitOps in Your DevOps Workflow

Posted on Wednesday, Sep 23, 2020

Show Notes

#74: Many people today are still clicking around in consoles and copying and pasting instructions from Word documents. Today, we make the case for changing your workflows to incorporate GitOps to minimize risk in managing your environments.

Hosts

Darin Pope

Darin Pope is a services consultant for CloudBees.

His passions are DevOps, IoT, and Alexa development.

Viktor Farcic

Viktor Farcic is a Principal DevOps Architect at Codefresh, a member of the Google Developer Experts and Docker Captains groups, and a published author.

His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD) and Test-Driven Development (TDD).

He often speaks at community gatherings and conferences.

He has published The DevOps Toolkit Series, DevOps Paradox and Test-Driven Java Development.

His random thoughts and tutorials can be found in his blog TechnologyConversations.com.

Transcript

Viktor: [00:00:00]
If everything is defined as code and stored in Git that means that Git is your only source of truth. Logically speaking, imagine that we are now figuring this out from scratch, right? If Git is your only source of truth or the golden source of truth, then it is logical that all the actions are being performed as a reaction to you pushing something to Git.

Darin:
This is DevOps Paradox episode number 74. Using GitOps in Your DevOps Workflow.

Darin:
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.

Darin: [00:01:27]
So I've recently been working with a client. And without using the word, I've been trying to get them to think in the terms of GitOps. And this has been a huge shift for this client, and I'm not sure that they're grokking it yet and that's okay. But at least I'm making them think in that direction. The argument for GitOps is... That's your turn, Viktor? Yeah.

Viktor: [00:02:04]
Oh, that's my turn. Okay.

Darin: [00:02:05]
The argument for GitOps is...

Viktor: [00:02:09]
Uh, it's complicated.

Darin: [00:02:12]
Oh my gosh, come on. You're supposed to make that simple. It's not complicated. It's simple.

Viktor: [00:02:17]
No, it is. So I have very mixed feelings about GitOps. I hate it to begin with because it's a new term that somehow makes people believe that it's something new. I say "this is not new" about many different things, but this one is definitely not new. GitOps is something that we were supposed to be doing for 15 years. For a long, long time. All of us, and we were not. So bright people came up with a brilliant idea: let's wrap the things that we know you're supposed to be doing, and that you say you're doing but you're not, into a phrase, so that we can have an expression when we talk about a set of practices and so that we can make a great marketing campaign for us, because we invented yet another term. In general, I don't like that now everybody's coming up with something-ops. You know, DevOps, okay, that was the first, and then everything is now something-ops. But GitOps is a good one, right? I like the silly ones. The reason why I'm saying it's not new is that there are two fundamental things that sound like a no-brainer to everyone, and yet the majority is not applying them. The first of those two principles is that everything is defined as code. Right? Can we agree on that one as not being new?

Darin: [00:03:47]
Hang on. It's not a new concept, but it's a new concept to many, many people. Because their idea of operations is ClickOps.

Viktor: [00:04:00]
Yes, exactly. So it's the antithesis of ClickOps. I actually haven't heard that one. I'm going to steal it from now on, Darin. I'm going to use ClickOps.

Darin: [00:04:11]
You haven't heard ClickOps before?

Viktor: [00:04:13]
No,

Darin: [00:04:14]
Oh my gosh. I thought I heard it from you.

Viktor: [00:04:17]
No, it's brilliant. I'm going to use ClickOps. Everything is defined as code, right? And even in teams that are not ClickOps types of teams, you know, the operational part in the company I worked at like 20 years ago was also defined as code. It's just that it was in a Word document from which you copy and paste segments with your commands. It's code. Now, the second important thing, if you follow that thread, is that if everything is code, code lives, with no exceptions, in a version control system, right? That's the logical consequence of the conclusion that everything is code. Now, version control systems. There is only one that matters, with many variations. That's Git. Right. We're not talking about SVN or CVS. I know that there is one of you over there listening that has Perforce, but almost everybody uses Git, hence GitOps. It's just the idea that everything is defined as code and code lives in version control. And the only version control that makes sense today is Git. If everything is defined as code and stored in Git, that means that Git is your only source of truth. Logically speaking, imagine that we are now figuring this out from scratch, right? If Git is your only source of truth, or the golden source of truth, then it is logical that all the actions are being performed as a reaction to you pushing something to Git. Right. It's almost like children's logic, right? You start with one statement and then all the rest just fall into place by themselves. It all starts with everything defined as code. From there on, I think that everybody would come to the same conclusion without being told to, if that makes sense. Not every time I speak does it make sense.

Darin: [00:06:25]
Well, so with the concept, and this is what I'm calling it now: you don't do GitOps, just like you don't do DevOps. Terraform, done correctly, is GitOps. Ansible, done correctly, is GitOps. Right?

Viktor: [00:06:43]
Yes. And I must clarify what "correctly" means: it's code, and when you make changes to the code, you push it to Git as you do. And then that triggers a process that converges the actual state into the desired state. The desired state is what you push to Git. The actual state is the state of your system. Now, where many differ from that model is that they would be running that Ansible script from their laptop or from a server, long before or after it is pushed to Git. And this is the most difficult change. Because if you do it before, then Git is not your source of truth anymore. Actually, I could argue that even if you do it as a result of pushing something, it's not your source of truth. I'm kind of thinking aloud now, but let's rename it. It's not your source of truth. Git is the source of your desired state. This is what I want. It's the same thing with code, right? You develop a new feature. You put the code of that feature in Git, and that code is not yet the source of truth of what your customers are running. Right. It is the desired state. From now on, my desire is for this feature that I just stored in Git to be running on my production server or on a desktop, wherever you're running that something. So it is not the source of truth. It is the source of the desired state.
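The push-then-converge idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Ansible, Terraform, or any GitOps tool is implemented: the names `drift`, `converge`, and `on_push` are hypothetical, and plain dictionaries stand in for both the Git repository (desired state) and the running system (actual state).

```python
def drift(desired: dict, actual: dict) -> dict:
    """Return {key: (actual_value, desired_value)} for every key that differs."""
    keys = desired.keys() | actual.keys()
    return {
        key: (actual.get(key), desired.get(key))
        for key in sorted(keys)
        if actual.get(key) != desired.get(key)
    }


def converge(desired: dict, actual: dict) -> dict:
    """Make `actual` match `desired` in place and return what changed."""
    changes = drift(desired, actual)
    target = dict(desired)  # copy first so the two dicts never alias
    actual.clear()
    actual.update(target)
    return changes


def on_push(desired_from_git: dict, actual: dict) -> dict:
    """Hypothetical push hook: the only path by which the system is changed."""
    return converge(desired_from_git, actual)


if __name__ == "__main__":
    system = {"image": "app:1.1", "replicas": 2}   # actual state
    repo = {"image": "app:1.2", "replicas": 3}     # desired state in Git
    print(on_push(repo, system))   # reports both differences
    print(on_push(repo, system))   # second run changes nothing
    system["replicas"] = 5         # a manual, out-of-band change
    print(drift(repo, system))     # the drift is visible and can be reverted
```

Running the hook twice with the same desired state is a no-op, and a change made by hand shows up as drift rather than silently becoming the new truth, which is the distinction between desired and actual state drawn above.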

Darin: [00:08:33]
Let me restate what you just said. In the first case, where you make a change to an Ansible file and you apply that, that is not GitOps. Basically, you've written a Word document and you copied and pasted out of the Word document. The next step is: okay, I pushed it to Git, but then I still ran it manually from my machine or some other machine. No, that's still the same problem. I made the change and I committed it, so at least that's a little bit better, but it's still no different than copying and pasting out of that Word document, because I am still running the script.

Viktor: [00:09:19]
Yes. I mean, I actually differ there greatly. I think that would work if you did that, but nobody will be so diligent as to always do it. That's the real problem, right? It's because we are dealing with humans doing things that they're not good at doing. But if you promise, kind of Scout's honor, that you will always push to Git and then run it manually, we are in a way complying with GitOps, right? Git is the source of truth and something just converged the actual into the desired state. I think that's perfectly valid. The thing is that I don't trust you to be so diligent that you always flawlessly execute the processes that converge the actual into the desired state. I don't trust you. That's my problem. Not that the process is not valid. It's valid.

Darin: [00:10:15]
Okay. A valid process. And I'll turn the phrase around on myself. I don't trust myself to do the right thing. Just because all it takes is for me to fat finger one key and bad things could happen, because I may have just committed a.sh but when I actually sit down to type it, I type s.sh, which is actually another valid script that then fires off, which was not the desired state because s.sh actually destroys everything.

Viktor: [00:10:48]
Yeah, or you will commit changes to three different repos that represent three different environments and you're going to forget to apply one of those three. It's going to happen. It's inevitable.

Darin: [00:10:59]
Yes. So, one way to say it is we need to eliminate the humans. The Doctor Who way of saying it, from the Daleks, is "exterminate," right? We need to exterminate humans. At this point, we want the humans getting things into Git, and then we want things to flow from the desired state into the actual state. That's the perfect world. Right. But in today's world, many companies are still at ClickOps, and I'm not talking about just going into the AWS console or the GCP console or the Azure console. I'm talking about going into self-managed applications that you have on prem that do have APIs or CLIs available, so you could automate the management of those things. I'm not opposed to you going into these systems, but you should only be going into these systems as read-only, and only ever escalate to being able to mutate the state in extreme and dire circumstances, like break-the-glass-level circumstances.

Viktor: [00:12:21]
That's just a subconsciously selfish way to do stuff. It kind of assumes that you are the only person in that company who will ever manage it. You will never go on vacation or anything like that. You know, when you go in and click buttons, when you SSH into servers and build random things, that means that [a] you have perfect memory, you're a person who never forgets what you did, and [b] nobody else from your team, company, or whatnot will ever work on that thing, right? Those are two subconscious assumptions, and I'm repeating "subconscious" because I doubt that many are thinking that way consciously. But hey, that's basically the result, even if not the intention, of what you're doing.

Darin: [00:13:15]
I'm not opposed to ClickOps but only in a let's call it a lower environment or a way upstream environment, however you want to think about it. Sometimes you've got to get in and point and click just to figure out what you have. You've got to figure it out at some point. That's okay. This is where I differ from some of the hard liners of everything has to be code from day zero. That's a little too extreme for me.

Viktor: [00:13:45]
That's silly. That's the same thing as saying you should never, ever, ever test anything manually. Heck, I test manually all the time and I create resources by clicking things all the time. It's just that that doesn't reach production. Let's say that there is a new service right now in Azure or AWS or wherever, and I'm curious, and sounds interesting, and I want to try it, right. What am I going to do? I'm going to go to their console and I'm going to click the hell out of that thing. That's what I'm going to do. It will not even occur to me to write a script, to write Terraform something or any of those things. It will not even cross my mind. But then if the decision is made that yes, this is a good thing. We wanna move to let's say staging environment or something like that with that thing, then that's where that behavior stops.

Darin: [00:14:40]
It has to stop because all it takes is one person reading an out-of-date Word document and going in and clicking the wrong thing, and everything comes falling over. That's why you want GitOps. You have these precious environments where, if this goes sideways at all, we start losing four and five digits of money per second. You've seen the outages from the AWSes, the Linodes, the Cloudflares, the you-name-its, because one thing happened. I'm not saying anything bad against them. Things happen, and fortunately these companies put out good postmortems and you can sort of read between the lines. Oh yeah, we made a change to a router, or we had a router go sideways because of a patch that was applied. Okay, well, that means you at least tested it and you rolled it out, and one of these pieces of hardware went south with a patch. Those things can and obviously do happen. And if you think it's not going to happen to you, dear friend, you are sadly mistaken. Because it's not necessarily that it will happen to you. You're going to do it to yourself. And then after you've scraped the pieces up off the floor, you're going to be saying: why did I do that? Why didn't I listen to Viktor, and why didn't I put everything in Git?

Viktor: [00:16:27]
This time, it's why didn't I listen to Darin. You just had a monologue.

Darin: [00:16:32]
I did?

Viktor: [00:16:33]
Yeah. So why didn't you listen to Darin?

Darin: [00:16:37]
Okay. Why didn't you listen to myself? Um, obviously I wasn't listening to myself enough to figure out it was a monologue.

Viktor: [00:16:43]
It's good thing. It's a good thing.

Darin: [00:16:45]
I'm going to pull one of your phrases back up. If you don't believe you're ever going on vacation, or you're not going to be around 24/7

Viktor: [00:16:56]
and perfect memory. You know, there are those people, a brother of my ex girlfriend. He had one of those. You can tell him the page in a Lord of the Rings book and, and the, the kind of the

Darin: [00:17:10]
And he can read it

Viktor: [00:17:11]
or plot. And he can really talk from memory. Yes. So you must be one of those people. I congratulate you. This is amazing, how many people... you know, that's actually impressive. I had no idea. I just realized that the percentage of people having photographic memory is much higher than I thought, judging by the number of people who are applying that skill in their daily work.

Darin: [00:17:41]
Anyway. That was you being sarcastic there, very nicely. Um, but the one state you didn't bring up is that you don't think you could be fired because you're too important to the company. Guess what? You're not that important. Now, you're not going to care if you get canned, but the company is going to be kicking their butts that they fired you, which is the state you want to be in. But they're going to go on and they'll figure it out. Can you get to this on day one? Heck no. I mean, we were already saying that at day zero we're going to be clicking when we're first figuring things out. That's okay. There's no problem with that. You've got to figure out what you've got first. But then start moving towards GitOps. Now let me ask this question, because this is one that's been rumbling around in my head. Desired state. My desired state can't be just a single system.

Viktor: [00:18:52]
It could be, why not?

Darin: [00:18:54]
Well, it could be, but what I'm saying is I'm trying to figure out how I do that. Let's say I've got my infrastructure in GCP, so that's going to be easily managed via Terraform. And then let's say I have my custom application that I'm deploying out that's not on Kubernetes. Right? It's something that's still whatever it is. Right. There's no good API, but the application developers wrote an API around that, but not enough to where I could use Terraform for it, ignoring that I could actually run shell scripts from Terraform. But ignore that. Let's say I can't do that for the moment. How would I then orchestrate these numerous forms of desired stateness so it goes as a whole?

Viktor: [00:19:46]
I mean, you just need a tool that understands the dependencies, implicitly or explicitly, right? Probably Ansible, Chef, or Puppet would be better suited for your scenario than Terraform to begin with. You would need to define the state in a way that captures the concept that the application runs on infrastructure. Therefore, infrastructure needs to be converged into the desired state before the application. Right? So it's kind of building a dependency tree. Terraform does that as well. Right? If you create a GKE cluster, it will know that it needs to create a control plane before it creates worker nodes, before it applies some, let's say, kubectl apply type of commands. There are tools that do that, and you might even roll your own. If you know how to write a shell script and a huge amount of if/else statements, you can do that. Terraform, in a way, is an if/else machine. I mean, it's much more than that, though. And this maybe is a continuation of the previous thoughts. If we continue following the logic from the very beginning, then we can easily make some other conclusions. The next conclusion, which we could easily come to ourselves without really reading anything about it, is that yes, if you want to have that process that converges the actual into the desired state, then everything needs to be declarative. It cannot be imperative. You cannot have an install-the-app.sh script. You cannot have that, or that alone, simply because it works only when the app is not already installed. You can theoretically say, going back to if/else statements, install if it's not running, upgrade if it is, and stuff like that, but that's a waste of time. It can be done. But you shouldn't.
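The dependency-tree point made in this turn (infrastructure converges before the application that runs on it) can be illustrated with a topological sort. This is a rough sketch using Python's standard `graphlib` module (Python 3.9+), not a description of how Terraform or Ansible resolve dependencies internally; the resource names and the `convergence_order` helper are made up for the example.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
# The names are illustrative, loosely following the GKE example.
depends_on = {
    "control-plane": set(),
    "worker-nodes": {"control-plane"},
    "app-manifests": {"worker-nodes"},
}


def convergence_order(graph: dict) -> list:
    """Return resources ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, resource in enumerate(convergence_order(depends_on), start=1):
        print(step, resource)
```

Declaring only "what depends on what" and letting a sorter derive the order is the same trade made throughout this discussion: humans state relationships, the machine works out the sequence.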
It's so much easier to define it in a declarative way so that whichever tool you use, and there's a plethora of tools, can take the declarative statement, like I want this, this, this, and that, and convert it into imperative commands. So what I'm really trying to say is that we as humans are better off defining things declaratively, and I'm not even proposing any specific format, right? It can be many things, but declaratively, and then letting machines convert declarative statements into imperative, executable commands. Declarative would be: I need my app to run in my cluster and to have version 1.2. That's a declarative statement. And then imperative would be: hey, if the app is not running at all, then install it. If it's running, then check what the version is. If it's not that version, then upgrade to that version. If this and that. Imperative would be a lot of specific commands that are executed depending on the outcomes of comparing the desired and the actual state. Which traditionally we've been doing for a long time. That's the reason why Word documents with instructions on how to operate the system persisted for such a long time, instead of everything becoming a script: doing those types of operations in a script is actually complex. I could even argue that it's easier for me to use a Word document with instructions than just a bash shell script, because I can go to the system and do those queries. Okay, what is running currently? This. Okay, so therefore I need to apply, uh, option B from the Word document, right? The reason for the persistence of Word documents instead of shell scripts is that we didn't, at that time, come to the realization that things should be declarative and machines should do the imperative part of the stuff. I forgot the question. I don't know why I'm talking about this at this moment. I have no bloody idea.
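The "app at version 1.2" example maps onto a small function that turns a declarative statement into imperative steps. This is a minimal sketch under assumptions: `plan` is a hypothetical helper, the running version would come from querying a real system, and the returned strings are placeholders rather than real CLI commands.

```python
from typing import Optional


def plan(app: str, desired_version: str, running_version: Optional[str]) -> list:
    """Translate "I want `app` at `desired_version`" into imperative steps.

    `running_version` is None when the app is not running at all.
    The returned strings are placeholders, not commands of any real tool.
    """
    if running_version is None:
        return [f"install {app} {desired_version}"]
    if running_version != desired_version:
        return [f"upgrade {app} {running_version} -> {desired_version}"]
    return []  # already in the desired state, nothing to do


if __name__ == "__main__":
    print(plan("my-app", "1.2", None))    # not running: install
    print(plan("my-app", "1.2", "1.1"))   # wrong version: upgrade
    print(plan("my-app", "1.2", "1.2"))   # converged: no steps
```

The human only ever writes the first two arguments. The branching that used to live in a Word document ("if this is running, apply option B") is exactly the part handed to the machine.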

Darin: [00:24:05]
That's okay. Because I don't remember either. So we're both in the same place.

Viktor: [00:24:11]
What's the subject of today's session? GitOps? Was it GitOps?

Darin: [00:24:15]
It was GitOps. So I was going to say Christmas trees, but no, it's GitOps. You're bringing us into the correct position here: we should have a YAML file, a TOML file, uh, some sort of declarative definition of our desired state. Something slurps in that file and turns it into imperative, big word. My word is going to be "make it so." Break out some Picard this morning. Go make it into this state and go from there. Take your case of here's my declarative definition for my application that needs to run on this Kubernetes cluster. The thing slurps it in. Okay, well, we need to get the app in the cluster. Is the cluster there? Nope. Go create the cluster. There are no extra people. This is the good thing about a true GitOps fashion, and, we're using lots of analogies today, it does take a village, because your infrastructure team, a shared services team, will have already had to give you the correct way to create a cluster. There should have already been a defined way to do that. And then it would follow through. This still gets us back to the long-lived question of why can we not have the Kube API to rule everything?

Viktor: [00:25:43]
We just as well might. I think that's quite a likely scenario that I know quite a few projects, quite a few startups and bigger companies are working on, but at the moment it's more of an idea than reality.

Darin: [00:26:00]
Yeah. Okay. So if you're only doing ClickOps today, hopefully you're only doing it in a sandbox environment. We're not anti ClickOps but we're anti ClickOps in environments that matter. Because what happens when you go on vacation and you're the only one and you have perfect memory. Yeah. You're the unicorn. I know I'm not that person.

Viktor: [00:26:25]
I would say that ClickOps is an excellent way to provide learning experience, not to operate a system. It's a way to learn.

Darin: [00:26:36]
It's your information gathering processes to lead you into operational excellence. So GitOps. If you're not doing it today, you need to do it. If you don't think you need to do it, you need to get a new job.

Viktor: [00:26:58]
Exactly. Let me be a bit evil and put it on a higher level. If you're in software industry and you don't write code, and I think that we should have a separate session of what is code, but I'm going to leave it as a cliffhanger. If you don't write code, go and become a lawyer. It's going to be easier. There's still time. There is always time to change a profession.

Darin: [00:27:28]
Yes, there is. But a lawyer?

Viktor: [00:27:32]
I'm just trying to figure out what is very hard, but still easier for you than to change after 20 years of clicking and not writing a single line of code. It might be easier to become a lawyer than to work in the software industry.

Darin: [00:27:48]
But you have to write briefs when you're a lawyer.

Viktor: [00:27:50]
Yeah, but you don't need to unlearn things, which is the most difficult part.

Darin: [00:27:55]
That's true. Just start fresh. That's what we are saying.

Viktor: [00:27:58]
Exactly. Exactly. It doesn't have to be a lawyer. It can be, I don't know. Do something else. A nurse. You can become a nurse. I don't know why I said a nurse.

Darin: [00:28:13]
That would be hard. That would be harder than being a lawyer, I think. And how a nurse ties back to GitOps, I don't know. So that's sort of where we'll end this one today. GitOps. Do it. Stop procrastinating. Just do it.

Darin:
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.