DOP 79: Are You Doing CI, CD or None of the Above?

Posted on Wednesday, Oct 28, 2020

Show Notes

#79: In this episode, we speak with Ant Weiss from Otomato about the differences between continuous integration and build automation. We also dig into what it takes to culturally change an organization to succeed at continuous delivery.

Guests

Ant Weiss

Ant(on) Weiss is the founder and CEO at Otomato.io - the effective software delivery consultancy. He’s been building and delivering software for the last 20 years. Frequent speaker at technology events, cloud native tech advocate and the host of the DevOps Shorts podcast - Ant believes he knows how to make our industry better and has loads of thought-provoking stories to prove his point.

Hosts

Darin Pope

Darin Pope is a services consultant for CloudBees.

His passions are DevOps, IoT, and Alexa development.

Viktor Farcic

Viktor Farcic is a Principal DevOps Architect at Codefresh, a member of the Google Developer Experts and Docker Captains groups, and published author.

His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD) and Test-Driven Development (TDD).

He often speaks at community gatherings and conferences.

He has published The DevOps Toolkit Series, DevOps Paradox and Test-Driven Java Development.

His random thoughts and tutorials can be found in his blog TechnologyConversations.com.

Transcript

Ant: [00:00:00]
I like pull requests when iterating on code. They're definitely great, especially if you have the discipline to make the changes small and the discipline to review the pull requests because, as was recently discussed, in many places you see pull requests stuck for weeks waiting for a review, or waiting for changes on the review. And that's definitely an anti-pattern.

Darin:
This is DevOps Paradox episode number 79. Are You Doing CI, CD or None of the Above?

Darin:
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.

Darin: [00:01:26]
Now, recently Viktor has been talking about GitOps incessantly to the point of I'm tired of hearing about GitOps. Do you want to take a guess why I'm tired of hearing about GitOps?

Viktor: [00:01:42]
Because Viktor is talking excessively?

Darin: [00:01:44]
Yeah, pretty much. I'm tired of hearing about GitOps because then when I talk to other people about GitOps they say like well, there's no way I could ever do that. There's no way my company would ever do that. That's just like a pipe dream. It doesn't work.

Viktor: [00:01:56]
It's, it's silly kind of, I think that there are important discussions on how to take it to the right level of how to make it work really, really well. But saying I cannot do GitOps is saying I cannot store my source code in a Git repository. If that's the level that somebody is right now then I think you should just give up, go and grow potatoes, potato, tomato, something in the garden, become something else. The difficulties of GitOps, there are many. It's ridiculously complicated. But in a nutshell, doing the base GitOps, meaning code stored in Git, and then something converges that code, does something in a cluster to make that code run at the base level. If somebody says I cannot do that and I know that many people do, then my advice is open a coffee shop. That's going to be a better business for you.

Darin: [00:03:00]
Now for every astute listener right now, you may have heard a third voice a little bit in the conversation. Today, we have Anton Weiss.

Ant: [00:03:11]
It's German. Yeah. Austrian originally.

Darin: [00:03:17]
He interacted with Viktor over one of these GitOps tweets. I unfortunately missed that interaction, but we decided to have him on today to talk about his experience, how he sees GitOps, how he sees consulting in the world as it stands today and whatever else we land on. So Anton, we'll shorten it up to Ant, welcome to the show today.

Ant: [00:03:47]
Yeah. Hi Darin. Hi Viktor. Great to be here. So the full name is Anton or Anton because it's a Russian name. I was born in Russia and my family name is Austrian in its origin because my grandfather is from Vienna. So that clears the name up. Okay. And, uh,

Darin: [00:04:13]
Okay. See, I would have never, again, being the dumb American, which most of the world thinks we are. Uh, I would not have ever figured that out. I really needed to study European history.

Ant: [00:04:24]
Yeah.

Darin: [00:04:26]
but anyway, that's

Viktor: [00:04:27]
It's impossible to, it's impossible to study European history because at least 57 important events were happening every week for thousands of years.

Darin: [00:04:38]
How's that any different than what's happening in the CNCF right now.

Viktor: [00:04:42]
Okay. That's a different level of crazy.

Ant: [00:04:44]
Yeah.

Darin: [00:04:45]
Okay. But anyway, let's

Ant: [00:04:47]
you know,

Darin: [00:04:48]
let's

Ant: [00:04:49]
talking about CNCF,

Darin: [00:04:50]
yeah.

Ant: [00:04:51]
world, the whole world is crazy.

Darin: [00:04:54]
So let's, let's get back to the GitOps interaction that you guys had.

Ant: [00:04:59]
Yeah. Right. So, uh, Viktor you want to say something about the tweet?

Viktor: [00:05:07]
Yeah, it actually happened last night, I think at least my night. I've been bumping into some issues and also thinking how to translate those issues into code, you know, product something. Then I reached out on Twitter. Hey, what are the problems others are experiencing with GitOps. To be honest, I haven't seen the answers yet. I posted a tweet and fell asleep. That was like three o'clock in the morning, I think, or something like that. And I just woke up.

Ant: [00:05:42]
Oh, that makes our lives sound a little bit sad if we, what we think about at night is GitOps, right?

Viktor: [00:05:52]
Hey, I could be thinking at night about COBOL.

Ant: [00:05:57]
Okay.

Darin: [00:05:58]
you'd make more money with COBOL.

Ant: [00:06:00]
Yeah. Yeah. That would be like more historical I suppose. Well, Nevertheless, the tweet you're mentioning, all I said in that tweet, knowing that we had to talk today, I said, I hope we get to discuss it today. So, that was basically my answer, but we had a little bit of interaction about not specifically GitOps, but continuous delivery in general before that, because I had another tweet that kind of blew up in which I stated that what, in most places we today call CICD is only CI with folks chipping in and saying they don't think it's even CI. It's just build automation. There is no real integration happening there. Viktor I think you had something to say about that saying that in your point of view, CI and CD, the mechanics of CICD are the same. So it basically takes a change of culture to get from CI to CD.

Viktor: [00:07:07]
Yes. I mean, from my perspective, at least, CICD, or, I mean, continuous integration delivery deployment, no matter what you do, it's all about orchestrating a set of steps that do something. I'm going to now simplify greatly and I know it's not like that, but basically it's a way to execute shell scripts in certain order, sometimes parallel or, you know, depending on the events that happen. It's not rocket science. Now of course, there are better tools. There are worse tools. We are not going to go there now, but every CI tool in my head is capable of being CD tool and vice versa. That does not make it good. So some are bad. Some are good. Right? The real issue is that companies are not there. Teams are not there. When I see implementation of what people call CI or CD or whatever, I want to cry. Okay. It's simply not it.

Ant: [00:08:03]
and you can sleep at night.

Viktor: [00:08:06]
Uh, at that time, I was sleeping very well at night because I was in consulting for a while, but very short term, like a few days, maybe a week, and come Friday, I could say to myself, I'm going home and I'm not going to see you anytime soon. You're on your own.

Ant: [00:08:25]
Yeah. Yeah, I know. I know how it is. I'm also in consulting and we agreed we'll talk about this at the end if we have time, but I definitely know what you're talking about. But going back to GitOps. So you're saying, okay, there is no real difference between CI and CD. It's all just workflow automation. You could say that it's domain specific workflow automation, right? So, um, yeah. Mechanically, yes. I remember when the term GitOps originated at Weaveworks. I think it was Alexis Richardson who coined it. I remember that originally it annoyed me a great deal. Because I was saying, how long will we stay in this pattern of taking any word that we like and joining it with ops and saying we invented something new? So we have this DevOps, DevSecOps, BizDevOps, ProdOps, whatever, GitOps. So, what the hell does it mean? That Git and Ops can now collaborate effectively? That's just my linguistic education talking. I have a BA in German literature, so I like digging into the etymology of words.

Viktor: [00:09:44]
I must interrupt in saying, don't forget about using continuous as well. Continuous testing. I yet don't understand what the heck that means. I really honestly don't.

Ant: [00:09:55]
Continuous budgeting. That's better. Leaving the wordplay aside, GitOps. Okay. I understand the principle. You store config in Git and then you change the real world to fit the desired state. So what's new there? We've had desired state configuration management since the nineties. CFEngine pioneered that, and then you had Puppet and Chef and Ansible and whatnot. Okay, maybe not Ansible, but Puppet and Chef had this pattern. Right? It was there. So what's new? I think my biggest beef with GitOps is that it's a very loosely defined term. Whereas originally it meant storing your Kubernetes cluster configuration in Git, now for some reason we have all those GitOps operators like Argo and Flux, which basically deal with deploying applications. There is no real differentiation there and I think there's a great deal of difference. Now, going back to what you said about CI and CD being the same thing, more or less, I'm saying the mechanics are the same, right? So yeah, we basically take some artifact and we move it around and we maybe change the running system to be in sync with whatever the desired state is. But the semantics are different, and that's why many organizations that we've seen in the past who tried to use Puppet and Chef as their deployment tools usually failed. Yeah. That was a nightmare. Because Puppet and Chef were meant for managing configuration of servers. They weren't meant for deploying applications, because the application's lifecycle and the configuration's lifecycle are totally different. The impact of these changes is totally different and we're now hitting the same thing with GitOps, I think, because there is no differentiation. We're just focusing on the mechanics. Okay. You have a desired state. We'll sync the actual state to the desired state and of course, there's the whole thing of how to deal with configuration drift. That's never been resolved.
I think we're not focusing enough on the semantics. We don't have tools that know how to deal with the semantics and that's why I beg to differ. I think that CD is totally different from CI and that's one of the reasons why most organizations today can do CI but they also say that CD will happen next year. Right. So it's on the roadmap. We have canary on the roadmap. Yeah, we'll get there eventually. Probably. Maybe. So they're building those CI processes and they have that little POC of getting to CD. But while they're building that POC of having CD on Kubernetes, we know that two years from now they'll probably move to another platform. It will be serverless or you name it and the whole POC will have to be rebuilt. And then that will happen the next year after that probably. I am very optimistic about that, as you can see. Okay, that's basically my rant. Now, I think it can be better. Actually we're trying to build something better around that today. There's this new enterprise we're starting called Canarian that's focused on making progressive delivery easier. Now Viktor, I'd like to hear what you think, of course, because you have invested a great deal into looking at progressive delivery, into teaching progressive delivery, into playing with Istio and Kubernetes and seeing how all these things work, and I'd love to hear what you think.
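
To illustrate the "mechanics" of desired state that Ant describes, here is a minimal sketch in Python. It assumes state can be modeled as a flat dictionary; the keys (`image`, `replicas`, `debug`) and the function names are hypothetical and do not come from Argo, Flux, or any other tool mentioned in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One difference between the desired and the actual state."""
    key: str
    desired: object
    actual: object


def diff_state(desired: dict, actual: dict) -> list[Change]:
    """Return every key whose actual value differs from the desired one.

    A missing key is reported as None, so this sketch cannot tell
    "absent" apart from an explicit None value.
    """
    changes = []
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            changes.append(Change(key, want, have))
    return changes


def reconcile(desired: dict, actual: dict) -> dict:
    """Converge a copy of `actual` onto `desired` and return it."""
    converged = dict(actual)
    for change in diff_state(desired, actual):
        if change.desired is None:
            # Not declared in the desired state: remove the drifted key.
            converged.pop(change.key, None)
        else:
            converged[change.key] = change.desired
    return converged


# Hypothetical example: someone scaled up by hand and switched on a debug flag.
desired = {"image": "shop:1.4.0", "replicas": 3}
actual = {"image": "shop:1.4.0", "replicas": 5, "debug": True}

drift = diff_state(desired, actual)
converged = reconcile(desired, actual)
```

The loop treats a replica count and a debug flag exactly alike. Nothing in it knows whether a change is an application release or a configuration tweak, which is the gap between mechanics and semantics raised above.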

Viktor: [00:14:01]
So I do agree. I mean, to be honest, I've been doing what we call CD today while we were calling it CI. In my head at that time that was CI because we simply started with having only builds and then we started creating releases and then we started deploying stuff and then we started running some functional tests and so on and so forth, and it was progressing, progressing until we were deploying to production through those same pipelines. At that time we called it CI because that was the term used. Now I think the most important thing about CD as a term and GitOps as a term is not that they invented something new. No. We've known the importance of storing code in Git, or version control, for like 50 years, right? That's not new. The important part about those things is that they reset false expectations or misunderstood practices. CI ended up being misunderstood by many companies. What they call continuous integration ended up being some random mumbo jumbo. So we got continuous delivery and what I like about continuous delivery is that it defines not only the beginning, the beginning being a commit to a code repository, it also defines the end: a release deployable to production. So the importance of continuous delivery, from a linguistic perspective, from a definition perspective, is that it defines where it ends, not only where it begins. CI never defines that. Yeah, you're continuously integrated. It can be interpreted as anything you want. Same thing with GitOps. It's not new. The term should never have existed if it wasn't for the fact that the majority, I would say, of companies are still clicking buttons in their favorite CI/CD tool to run a build, are starting their pipelines in those tools instead of from a Git repository, are basically doing UI operations and storing things in random places. So it's not new, but I do think it's very beneficial still, even if it's not new.
It's like you have a person that goes somewhere and you realize that that person is going on a wrong track, and then you need to nudge him a bit. In this industry, we redirect through new terms for the same thing. Like microservices. That's also not new. For me, SOA and microservices are the same thing. But SOA, in practical terms, not how it was a vision, but how it was implemented, ended up being something completely different. So in my head, microservices is just going back and improving, but essentially going back to the initial premise, resetting the initial premise.

Ant: [00:16:56]
Yeah. Yeah. Again, especially with GitOps, there's a pitfall there because with misinterpretation, many folks start thinking that Git becomes an interface to managing production and Git can be a lot of things, but it's definitely not an interface. There's no, no UI and no UX there.

Viktor: [00:17:22]
That's true. Now, first we can argue for a long time whether there should be UI or not, but let's say there should. That's, from my perspective, more a lack of tooling right now is that I could easily envision a tool that actually allows you to do something, but actually results in pushing that to Git, instead of triggering. I think of Git as being a database of my desired state.

Ant: [00:17:52]
Okay. So why not use a database?

Viktor: [00:17:56]
Oh, yeah, we can use a database. That's perfectly fine. It's more like that Git is probably the only tool that nobody disputes today. We can have a long argument whether to use this Kubernetes or that Kubernetes. Whether to use Kubernetes or not to use Kubernetes. Let's say serverless. We can have a discussion about any single thing in the industry, but the only thing that everybody groups around today probably is Git. Nobody disputes it in a way. So it is a common ground.

Ant: [00:18:27]
This wasn't the state of things 10 years ago. A lot of people hated Git.

Viktor: [00:18:32]
Oh yeah. Yeah. But not today, right?

Ant: [00:18:35]
So it has something to do with Linux probably. Nobody disputes that Linux is the operating system, right?

Viktor: [00:18:44]
Yeah. But, let's say that I'm saying Git just because it's there, but we can replace it with version control system or we can replace it with a database. It doesn't matter. Some place where I have a state of my stuff through which I can collaborate and then see the history. It doesn't have to be Git.

Ant: [00:19:05]
Yeah, but many people imply that GitOps means automating the configuration management over pull requests and things that are not Git per se, but that are part of Git's accepted modus operandi. Do you see that?

Viktor: [00:19:26]
I mean it really depends on what you want to do. I am in favor of pull requests, when I think that somebody should review the change, whatever the change is, it doesn't matter what we're talking about. What somebody should review, posts a comment.

Ant: [00:19:45]
I like pull requests when iterating on code. They're definitely great, especially if you have the discipline to make the changes small and the discipline to review the pull requests because, as was recently discussed, in many places you see pull requests stuck for weeks waiting for a review, or waiting for changes on the review. And that's definitely an anti-pattern.

Viktor: [00:20:10]
That's the anti-pattern. I would say actually, if I would have that power, I would change GitHub right now and say that if there is no activity in a pull request after a day or something like that, just delete it or merge it automatically. Hey, nobody saw any problem. You had your time. Merge it. Because if you have a pull request and this is now, again, this is a problem of misinterpretation. Like if you go back to continuous something, whatever integration, delivery, it doesn't matter. You cannot call it continuous if it gets stuck in a pull request for four weeks. It's not continuous. You're faking it. You're absolutely faking it. Then you're back to weeks, months, whatever cycle. Then you say, Oh yeah, when I merge pull requests, then it's continuous. No, absolutely not. So it's misuse of stuff. Now I'm also in favor. I think that people who commit directly to the master, I think that they're superheroes. I think that that's amazing assuming that really works. That

Ant: [00:21:14]
We don't use master anymore.

Viktor: [00:21:17]
Yes, Exactly. So if you get to the point that you can push directly to the main line, that means that you are so secure in your processes, in whatever is happening after that push without human interaction that you don't even need the pull request. That's amazing. Very few got there. Many are faking it, and then wonder why things go wrong. I don't care whether you do a pull request or no. Actually I would like you not to do it because that means that you're so, so awesome that you can, you don't even need that part.

Ant: [00:21:52]
Yeah. That's basically what we're striving for with CI and CD. If you have good enough CI, the tests verify your master, and if you have good enough CD, and that takes me back to my original notion that CI and CD are not the same, especially in the cloud native world. Back in the day, we used to envision CI ending in CD. We have CI. All the tests passed and everything is green, so we can just push to production, or we push to staging and we run the integration tests on staging, and if that goes okay, then we can push to production. We don't actually push to production anymore. The whole premise of GitOps is decoupling the release from the delivery. You have a tested artifact, you have tested it as well as you could, and again, especially in the microservices world, you can only test your microservice. We've given up on integration tests. So you've tested your microservice and now you want to deliver it to production. You have no idea if it will fail or not. So you need totally different mechanisms of safety in the production environment and you need to decouple it. You've tested. The artifact is okay. You don't know if it will work in production and that's a different story. That's a different pipeline and the tool that does that has to expose much more data. It has to work in much tighter integration with your observability. The tool has to allow for much more experimentation and debugging in production. The tool, again, has to differentiate between the changes that change the configuration and the prod. One of the anti-patterns that I see with GitOps or with any kind of deployment to Kubernetes today is that in the same Helm chart we store config and the application definition, and that basically makes us unable to tweak the configuration separately from the deployment. And that's the basics of being able to analyze problems in production.
When we talk about observability today, it's very important and we strive to expose as much data as we can, but that has its cost. Nobody wants to see a huge New Relic, Datadog, you name it, bill at the end of the month. We only want the data to be there when we need it, so you want that ability to switch and toggle on prod when something goes wrong and get the data. How do we do that with GitOps?
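
One way to picture the separation Ant is asking for is to keep the application definition and the runtime configuration as two distinct objects, so that a configuration toggle can be told apart from a release. The following is only a sketch under that assumption; every class, field, and value in it is hypothetical and is not how Helm or any GitOps operator actually models a deployment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AppDefinition:
    """What gets released: changing this means a new rollout."""
    image: str
    replicas: int


@dataclass(frozen=True)
class RuntimeConfig:
    """What gets tweaked on prod: cheap defaults, verbose on demand."""
    log_level: str = "info"
    trace_sampling: float = 0.01


@dataclass(frozen=True)
class Deployment:
    app: AppDefinition
    config: RuntimeConfig


def needs_rollout(before: Deployment, after: Deployment) -> bool:
    """Only a change to the application definition counts as a release."""
    return before.app != after.app


baseline = Deployment(AppDefinition("shop:1.4.0", 3), RuntimeConfig())

# Something went wrong on prod: turn the data on without redeploying.
debugging = replace(
    baseline, config=RuntimeConfig(log_level="debug", trace_sampling=1.0)
)

# A real release: the image tag changes.
upgraded = replace(baseline, app=replace(baseline.app, image="shop:1.5.0"))

config_only_change = needs_rollout(baseline, debugging)   # False
release_change = needs_rollout(baseline, upgraded)        # True
```

When both parts live in one undifferentiated chart, a tool has no equivalent of `needs_rollout` to consult, and every toggle looks like a deployment.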

Viktor: [00:24:40]
We can't. Uh, I think that tooling is not there. There are bits and pieces scattered everywhere, but it's not a mature thing. It's not a mature thing, just to clarify, mostly because our expectations today are very different than they were last year. And last year's were very different than a couple of years before. What I'm trying to say is that once we get to wherever we think we should be today, then we will say again that we are not there, because now we need more and better and stuff like that, which is good. It's evolution. But going back to what you said, CI and CD being different, I agree, but that's more from maybe a tooling perspective. In my head, I still have that term application lifecycle. To me there is one lifecycle of an application and that ends with a new release. I mean, that ends with the existing release being replaced with the new one. In my head, I still see it as, hey, there is one lifecycle of an application. I have an idea. I develop some code. I write some tests. I push to Git or whatever. I test. I build. I release. I end up observing it in production. So from that perspective, to me, there is only one. Now from a tooling perspective, definitely, yes. The way we deploy is not the same thing as how we manage infrastructure and it's not the same thing as how we, I don't know, run tests.

Ant: [00:26:15]
I have this feeling that I'm trying to get you to argue with me. And you, you keep agreeing. Yeah.

Viktor: [00:26:21]
That happens a lot. Yeah.

Ant: [00:26:24]
Okay. Yeah. Well, in general, actually, anybody I talked to about this, everybody agrees. Yeah. There's a pain. There's a pain. There's a hole there. There's a gap there that's not closing. Well, we're hoping to change this now with the new startup that we started building.

Viktor: [00:26:43]
so what is it in a few sentences?

Ant: [00:26:46]
It's basically a new approach to continuous delivery. We're building on open source. We think there are a lot of great initiatives. We love Argo and Argo Rollouts and we love Flux. There are some interesting ideas in Kapitan. There are a lot of interesting open source projects out there trying to tackle all of these issues, trying to understand how to make GitOps work with progressive delivery, because we have a conflict there. GitOps talks about having a desired state. Progressive delivery talks about having multiple states. Progressive delivery is basically about having multiple versions of truth, where you can compare those versions of truth and decide if their quality is sufficient to become the absolute truth at some point, for maybe a very short period of time until another truth replaces it. It's dynamic. It's much more dynamic than GitOps. How do these two things work together?

Viktor: [00:27:57]
Here's an example of things that I don't think that we can reconcile today. One day we will. I'm saying it just because you mentioned progressive delivery. You might be using Flagger or Argo Rollouts or whatever. It progressively changes the weight of the new release, and that is not reflected in Git. I'm going to ignore that for now. Now, it rolls back when things go wrong and I want that. It's good. It goes bad. It rolls back. I'm a happy person. How do I reflect that in Git? We can argue whether that would be a good idea, but if Git is the desired state, then Git stops being the single point of truth the moment it rolls back, not to mention the complications I will have later on when I push additional changes to that same Git. It will go bazooka.
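
The conflict Viktor describes can be shown with a toy canary. This is a sketch only: the weights, version numbers, and health check are made up, and the function does not reflect how Flagger or Argo Rollouts are implemented. It assumes a non-empty list of traffic weights.

```python
def run_canary(steps, is_healthy):
    """Shift traffic to the new release step by step.

    Returns (final_weight, history). A failed health check resets the
    weight to 0, which stands in for an automated rollback.
    """
    history = []
    for weight in steps:
        history.append(weight)
        if not is_healthy(weight):
            history.append(0)  # rollback: all traffic back to the old release
            return 0, history
    return steps[-1], history


# What Git says should be running (hypothetical values).
git_desired = {"version": "2.0.0"}
stable_version = "1.9.0"

# Hypothetical health check: the new release breaks above 25% of traffic.
final_weight, history = run_canary([10, 25, 50, 100], lambda w: w <= 25)

# After the rollback the cluster serves the old version again,
# while Git still declares the new one as the desired state.
live_version = git_desired["version"] if final_weight == 100 else stable_version
out_of_sync = live_version != git_desired["version"]
```

Neither the intermediate weights in `history` nor the rollback ever reach Git in this sketch, so after a failed canary the repository and the cluster disagree about what is running.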

Ant: [00:28:49]
Exactly. Exactly. Yeah. You might think about reverting a Git commit, but everybody who's tried reverting Git commits knows it's not a very good idea.

Viktor: [00:28:59]
Yeah, but even if I revert the Git commit, then what am I doing? I'm actually then converting the desired state into the actual state, not the other way around.

Ant: [00:29:11]
Yeah, I think there is this issue with our systems, still being very much human centric. Because when we talk about desired state it's the state desired by who? Desired by humans. Right? We don't give machines enough autonomy there to define what they desire their state to be. There's a good example of all this also. I, as a human can define certain amount of resources for my application, but the cluster may be not able to satisfy my desires. So how do I make my cluster smart enough to define the resources appropriately itself?

Viktor: [00:29:56]
should you even?

Ant: [00:29:58]
That's another question. I believe we should. I believe we should provide our machines with better autonomy.

Viktor: [00:30:06]
Yeah, exactly. I mean, maybe I shouldn't even tell the machines. Not today, but maybe the state should be, Hey, why would I tell you how much memory and CPU

Ant: [00:30:17]
Exactly. There are projects in that line of thought, like the Vertical Pod Autoscaler, and I think there are a number of controllers that provide some kind of automated resource allocation. Again, what we want to do with the new project is try to reconcile all these great ideas to make things like canary easy, because, you know, you look at Argo Rollouts, you look at Flagger. It's a pain to set up, right? You need to get all your metrics right, and the integration with the observability systems is all very hand woven. It really requires extraordinary engineering expertise to set these things up and to make them work. We think it should be easier. Again, it is hard today, but at the end of the day, it's not rocket science. It should be easier. I suggest we go back to talking about consulting now, because as you correctly stated in the beginning, we're talking a lot about tooling and we're talking about making our machines smarter and that's a great goal to strive for. But I also agree that continuous delivery often goes back to culture. Even if you have all the machinery in place, if you have folks in the organization whose sole responsibility is to stamp the release before it goes to production, it's very hard to get to continuous delivery, because what do those folks with the stamp in their hand do after you start delivering 20 times a day and nobody has to stamp the approval?

Viktor: [00:32:08]
I would go even as far as saying that any organization that requires handover from one team to another during lifecycle of application. Doesn't matter whether you're talking about testing, deployment. If there is a handover from my team to your team, continuous cannot be used in any formal way as a term, as a word. It should be removed from dictionary.

Ant: [00:32:33]
That's why we strive to have those cross functional product teams. In most modern organizations, you start seeing this today, right? So they have microservices and there's one team that manages all of the services, but then even if developers and testers work together, usually you then hit the SRE wall, which wasn't built to be a wall, but it becomes a wall. That's the question that I ask folks on my podcast. Is SRE just another silo? What do you say?

Viktor: [00:33:10]
Yes. It's not envisioned to be a silo, but it turned out to be a silo. Actually, SRE is still kind of less adopted, less known. I would say the same thing for DevOps. When I hear DevOps engineer, my head computes silo. In practical terms, in implementation, there is a DevOps team, a DevOps department, right? To me, that's sysadmin in disguise. Not that there is no value in DevOps. I'm not saying that, just to clarify. But it's a silo. Companies invent creative ways to convert something into a department.

Ant: [00:33:54]
Yeah, because that's the way managers work. That's divide and conquer. It's easier to manage things when we put them in boxes. That's how you organize. In the past year, I've been helping a number of organizations as a consultant. Orgs that are trying to build their SRE function. They find it hard because those SRE engineers, first of all, are probably younger than us. They didn't go through the DevOps revolution. Most of them don't know the pain that we've experienced, but they're starting to experience it now and some of them think it's normal. The pain is normal. Some of them think that when they are told that they hold the sole responsibility for the reliability of the production environment, that's okay. But we both know that's exactly what gives birth to silos. If you have folks responsible for reliability, then those folks become the guardians of production and they won't let any changes go near production. That's what I'm trying to do with those folks when I'm coaching them. I'm saying you need to become transparent. Okay. So you need to put the guard rails in place, but you need to become transparent, and if people want to break production, they'll find a way.

Viktor: [00:35:21]
SRE's job ultimately is to help those teams

Ant: [00:35:27]
Not shoot them in the head, and not shoot themselves in the foot.

Viktor: [00:35:30]
Exactly. Yeah, exactly. Help them do their job better. It's not really about doing the job for them.

Ant: [00:35:38]
And maybe making them aware of the chance of shooting themselves in the foot.

Viktor: [00:35:45]
There is another extreme where some are going, thinking that I have a team of six people in charge of this application and they're going to be truly self-sufficient. That does not exist. Ah, let me clarify. There is no way on earth that you will be really, really good at programming, let's say in Java or Go or whatever, and writing tests and build scripts and pipelines and mastering Kubernetes and understanding Istio and networking and storage. There is no way on earth that there will ever be a single team or small team that will master all that. That's where SREs and many other roles are coming in, not as a way to take control of those things, but rather to provide it as a service, to simplify it, to educate. Like, okay, you don't need to learn everything about Kubernetes, but you do need to know enough for your application to run successfully, which is a huge difference. If you have somebody managing your cluster as a whole, a person who is developing an application doesn't need to know the ins and outs of networking, but they need to know how to define it, which is a tricky balance to establish.

Ant: [00:37:06]
Not even that. As you described it, this is like super challenging. But you've counted so many things and that's an SRE. You have a number of SRE engineers that know all of these things. They become hugely expensive because those are folks who have at least 15 years of experience and they'll probably be bought out by Google. Then you find yourself with SRE engineers who have three to five years of experience, and even with their best intentions, they can't know all of it and be experts and consultants and coaches and mentors and all those things. So how do you build that?

Viktor: [00:37:52]
That's the challenge. I mean, it's hard. It's a moving target.

Ant: [00:37:57]
That's why you need a consultant. That's where I sell myself as a consultant. You can hire a consultant. You can hire a coach. He'll take you by the hand and he'll lead you through the stormy waters of SRE culture.

Darin: [00:38:14]
That's a good line there. Let's rephrase it just a bit. Hire a good consultant, a consultant that has done it before. Don't hire a consultant that just graduated from college last week.

Ant: [00:38:27]
Yeah. Yeah. Well, yeah, that's true for any hire, I suppose. No, no, no, no. Actually, you should hire folks who just graduated from college, but not as someone you can consult with.

Darin: [00:38:40]
Not as a consultant, correct. There has to be some gray hair underlying, whether you've colored it or not. There needs to be some underlying gray hairs there to have a reasonable consultant. Well, I started turning gray when I was 30, and it hasn't changed much since then, but it is what it is.

Viktor: [00:39:07]
The real pitfall that I think many younger people fall into is thinking that somehow if you understand the latest and greatest, you're good, which is horribly wrong. If you learn Kubernetes, that's pointless if you don't understand how VMs work. Somebody ultimately needs to understand all those layers before the latest layer that we are using today, because the latest is still stacked on top of all those below.

Ant: [00:39:36]
I actually see this happening to myself in the last five years. Sometimes I find myself sitting next to somebody who's working with a technology I've hardly ever seen. Maybe TypeScript. I hate JavaScript. I almost never write JavaScript. But then they're having some issue and I look at that and I suddenly understand what the problem is before they do, even though that's the technology that they've been working with, just because I've seen something similar in another stack. These things, they ultimately all work the same.

Darin: [00:40:11]
So if anyone wanted to get in touch with you, Anton... Is that more correct? Or does it need to be more guttural than that? Anton...

Ant: [00:40:21]
No, it's Anton. Yeah, yeah, yeah. Okay. Yeah. Folks usually call me Ant. Again, we discussed it. This goes far back to the days when folks used to compile Java with an automation tool called Ant. Over the years I've become a bit of a Maven, and there's another Java joke there, but in the end you'll find me in the... no, never mind. Okay. Yeah. So where folks can find me. I'm on Twitter. My Twitter handle is antweiss. My company is Otomato. It's at otomato.io. Our new enterprise, which is going to revolutionize continuous delivery, is called Canarian. So it's at canarian.io. Not a lot to see there yet, but watch for updates.

Darin: [00:41:21]
There you go. Viktor, any final... You guys never really fought any good. That's the problem with this show, is we... Oh, wait, you said you also have a podcast.

Ant: [00:41:31]
Yeah. Yeah. It's called DevOps Shorts. I'm the host of DevOps Shorts. It's a short show. Only 15 minutes and three defined questions. We talk about DevOps and love, and I'll be happy to have both of you folks, of course each separately, because it's a one-on-one show. I'll be happy to host you.

Viktor: [00:41:53]
That sounds good. That's

Darin: [00:41:56]
Is that the first time we've been invited anywhere else after somebody has been on with us? I think it is. Most people run away screaming after they've been on with us.

Ant: [00:42:06]
No, I actually enjoyed it. I really wish you'd fought with me a little bit more. I was expecting this. Somehow, from the invitation, I was expecting...

Viktor: [00:42:17]
Yeah, exactly. I was expecting to argue as well. I was expecting you to be unreasonable.

Ant: [00:42:26]
Yeah. Well, you know, I'm usually unreasonable when there's something I don't want to do. As I said, like, somebody asks me, can this kind of traffic go through this load balancer, and I'm not in the mood to configure the thing, so I'm saying no, that's not a good solution. Just use something else.

Darin: [00:42:48]
And usually that would be the correct answer.

Ant: [00:42:50]
Uh, not so sure. Not so sure. We all make our mistakes. Even the best consultants out there. Yeah, that's the important thing to realize. Consultants are human too.

Darin: [00:43:05]
Well, unless you're Viktor, because Viktor doesn't make mistakes.

Ant: [00:43:08]
He's not a consultant anymore.

Darin: [00:43:10]
Oh, that's true. Okay. Ant, thanks for hanging out with us today.

Ant: [00:43:16]
Thank you, folks. I enjoyed this a lot.

Darin:
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.