DOP 85: The Hidden Costs of DevOps

Posted on Wednesday, Dec 9, 2020

Show Notes

#85: It’s simple to quantify the hard costs of DevOps. Labor. Hardware. Software. However, it’s a lot harder to calculate the hidden costs of DevOps. Today we talk with Yuval Oren about what he is seeing in the industry and how technical debt may be the answer to your problems.

Guests

Yuval Oren

Yuval is a DevOps and DevSecOps consultant at PineWise, a boutique consultancy he started to help tech companies improve their developer productivity.

Hosts

Darin Pope

Darin Pope is a developer advocate for CloudBees.

Viktor Farcic

Viktor Farcic is a member of the Google Developer Experts and Docker Captains groups, and a published author.

His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD) and Test-Driven Development (TDD).

He often speaks at community gatherings and conferences (latest can be found here).

He has published The DevOps Toolkit Series, DevOps Paradox and Test-Driven Java Development.

His random thoughts and tutorials can be found in his blog TechnologyConversations.com.

Transcript

Yuval: [00:00:00]
You're making trade-offs. Yes, I think technical debt is something that you can actually think about and forecast in a lot of cases where it's going to affect you later, but it's a business decision and a decision to accumulate it.

Darin:
This is DevOps Paradox episode number 85: The Hidden Costs of DevOps.

Darin:
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.

Darin: [00:01:15]
Last week, we talked to PJ about Mattermost, which is a Slack competitor or a Microsoft Teams competitor, but most people would not think of it as a DevOps tool. I think in most places today, once you're using real-time communication tools instead of email, they quickly become DevOps tools. Whether it's person to person or machine to human, whatever it is, it's usually a lot faster. However, a lot of people don't think about the cost of actually running those types of things. Viktor, today we've got a guest with us. Another guest.

Viktor: [00:02:00]
You mentioned that.

Darin: [00:02:02]
I did mention that. It's still early for me when we're recording this. So I may not remember what I said. I may not remember what I said. Wait, did I say that? But anyway, we have Yuval with us today. How are you doing?

Yuval: [00:02:16]
Hi, thank you for having me today.

Darin: [00:02:18]
Yeah. Why don't you go ahead and introduce yourself and set up how you ended up with us today.

Yuval: [00:02:25]
Okay. So, I'm Yuval. I'm a DevOps consultant. I own a small DevOps consultancy in Israel. We help engineering teams become more productive, hopefully. I started as an information security engineer, then moved to be a full stack developer and then back to the more opsy side and became a DevOps and DevSecOps consultant. This is what I've been doing for the past eight years now. I became really interested in the whole idea of value in DevOps and decision-making. Everyone talks about value when it comes to DevOps. It's something that comes up a lot. I don't think people actually know what it means and they don't implement it. For example, writing some soggy title on your JIRA story doesn't make for value. Changing the color of the header to blue is not necessarily business value. It could be, but not necessarily. I think some companies are doing it. These are the DevOps unicorns. They're doing everything right, but I think for most, it's not really defined. So I became really interested in the whole concept of value and also the decision-making of how do you make the right decisions when it comes to new projects. What to implement. How to implement. What are the trade-offs? That's when I started thinking about hidden costs. It's something to be aware of. When you're making decisions, I think that most managers and DevOps engineers are not seeing the full picture and sometimes are making the wrong decisions by taking the wrong trade-offs. So I wrote this little post about the hidden costs in DevOps and the value of DevOps. I had a few ideas about how you may improve your decision-making by actually taking into account this mental model of hidden costs.

Viktor: [00:04:28]
So what is the gist of it, or not the gist, depending on how far back you want to start?

Yuval: [00:04:34]
Okay. So the gist of it is that sometimes when you're making decisions, you don't have all the information. I'll give you an example. A recent, very good example is Pinterest just had this blog post, I'm not sure if you've seen it, about how they reduced their build time by 99% by changing just one line of code with how they fetch the code from Git. You're reading it and you're thinking, okay, that's a very neat improvement. This is something that we should be looking at. But then you sit back and think, wait, so they were spending 40 minutes just pulling things from Git. They were doing other things. They were working on features. They were working on a lot of things and yet no one actually took the time to, I'm guessing, I'm assuming, don't blame me for that, but just from the blog post, so why didn't they address this sooner? You can look at it as, and again, I'm making these numbers up. Suppose that they have 1000 engineers working with that code. Maybe 500 of them are pushing their code and waiting 40 minutes for the build to complete. So you have a lot of wasted time for developers. You can fill a few positions with just the amount of money spent on developers sitting idle. When they were making other decisions about new features, maybe they were trying to save costs somewhere. Maybe they were looking at their cloud bill and saying, Whoa, wait, we're paying so much for this cloud bill. Maybe we should address it and put two people on that or three people on reducing costs. But is this the right thing? Is this the actual problem that we have? So this brought me to think more about that because I'm seeing a lot of companies and sometimes their decision-making is questionable. As an outsider, it's easier for me to criticize it. I'm not there. So that's the idea here.
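
A rough sketch of the arithmetic behind that example may help. The 500 engineers and the 40-minute wait are the made-up numbers from the conversation; the hourly cost and the number of builds per day are further assumptions added here purely for illustration, as is the simplification that a waiting engineer does nothing else in the meantime.

```python
def idle_cost_per_day(engineers_waiting, wait_minutes, builds_per_day, hourly_cost):
    """Estimate the daily cost of engineers waiting on a slow pipeline step.

    Assumes (simplistically) that waiting time is fully idle time.
    """
    idle_minutes_cost = engineers_waiting * builds_per_day * wait_minutes * hourly_cost
    return idle_minutes_cost / 60


# 500 engineers and 40 minutes come from the example in the conversation.
# One build per day and 75.0 per hour are hypothetical placeholders.
daily = idle_cost_per_day(
    engineers_waiting=500,
    wait_minutes=40,
    builds_per_day=1,
    hourly_cost=75.0,
)
print(f"Estimated idle cost per day: {daily:,.0f}")
```

Swapping in your own headcount, wait time, and loaded hourly cost gives a number you can put next to the cloud bill when deciding what to work on first.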

Viktor: [00:06:40]
Oh, you were wrong. It's not sometimes questionable. It's questionable most of the time. Okay.

Yuval: [00:06:46]
Well, I'm a consultant. I earn my living...

Viktor: [00:06:47]
But you have to be nice.

Yuval: [00:06:49]
Yeah. I need to be diplomatic here. Right? I have clients. And I think that it's more than that. This is something that is simple enough to understand. I was talking about direct time of the developer, but you have other things like opportunity costs and other more businessy things that people don't take into account too much, I think.

Viktor: [00:07:15]
You know, that kind of news is very often false and exaggerated or comes from companies that are completely lost, at least in my experience. I hear that kind of news all the time, coming from company X that has a product Y, and then some company adopts it and two weeks later, they have a seven times increase in productivity and all those things, which is, Hey, you're trying to justify the cost you just incurred by inventing the numbers, or you were so bad that a slight investment in something suddenly increased the productivity. I'm very skeptical about that news, you know, the seven times increased productivity type of stuff. In reality, it doesn't happen often really. There are companies who really do turn one knob somewhere, and that ends up being, I don't know, five million in cost savings. But those are the types of companies that most of us cannot take as an example, I believe, because those are highly optimized companies where a five millisecond saving here and there turns out to matter at Google size. Five milliseconds multiplied by five million, that's this much. Then a bank or some other enterprise company takes that news and says, Oh, I can do the same. No, you cannot.

Yuval: [00:08:40]
Yeah, you cannot. I can give you more. Let me give you a little story, a little example from a past client just to illustrate more my thought process. I had this client, this was a very small startup. It was funded. I was working with the head of engineering and the CEO. The CEO was involved in most of the decisions. They were talking about implementing a firewall for this endpoint for one of their vendors. At one point, he called me to brag how he managed to save $600 by spending the day, and this is the CEO, spending the day implementing and creating his own firewall from open source. I told him, wait, stop. You spent a whole day as a CEO to save $600. Is this really the trade-off? This is how you quantify your time spent? This is someone who was constantly on the line, or visiting customers, investors. Yet, he wanted to save $600 by doing the work himself. That is the mindset I think. I have more examples that are more concrete. A lot of people are trying to save on their cloud costs. Usually there's this process where you first hear about the cloud, then you migrate to the cloud, and then when you're done, you receive your AWS bill or Azure or whatever. Panic sets in and you're suddenly in a project that is trying to reduce costs. But are you reducing the right things? Are you removing build servers, which makes your developers wait or your pipeline slower? Maybe, for the sake of reducing cost, you are switching to spot instances where it doesn't make sense for production, and then you have more issues in production. It's not always just the sum. I think some of it is just part of inertia. Some people are just making decisions because that's what they have to do and that's life. Sometimes it's because people are not exposed in a lot of companies because of hierarchy to the actual business needs. Is this something we should be working on? Really? Is this the most important thing? They don't question it. 
They are being handed this feature, this task, and they never question it. It's just, okay, let's do this. In some of these cases, it just doesn't make sense. I think you have to have the whole picture in order to make the right decision. Is this gonna affect churn somehow? Churn is something that is really important for businesses. They're working very hard to reduce churn for SaaS companies, for example. That affects things like customer lifetime value and then when you compare it with customer acquisition costs. There are a lot of things that you may not be exposed to when you are making the decisions. I think we could do better when making decisions. I think it's not just the thing where you do this magical optimization and then everything is better. I think it's something that you do every day. This is how you decide on what goes into technical debt or not. This is how you decide if this project is the thing you should be focusing on. I think there are a lot of examples here that you can look for. That is, I think why it's really important to talk about hidden costs.
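
The firewall story above comes down to a break-even question: how little would an hour of the CEO's time have to be worth for a day of work to be a good trade for $600? A minimal sketch follows; the $600 is from the story, while the 8-hour day and the function names are assumptions made here for illustration.

```python
def break_even_hourly_value(savings, hours_spent):
    """Hourly value of someone's time below which doing the work yourself pays off."""
    return savings / hours_spent


def worth_doing_yourself(savings, hours_spent, value_of_hour):
    """True when the money saved exceeds the value of the time spent saving it."""
    return savings > hours_spent * value_of_hour


# $600 comes from the story; "the day" is assumed to be 8 working hours.
threshold = break_even_hourly_value(savings=600, hours_spent=8)
print(f"Break-even value of one hour: {threshold:.2f}")
```

If an hour spent with customers or investors is worth more to the company than that threshold, the day spent building the firewall cost more than it saved.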

Viktor: [00:12:19]
So what would be an example of a cost reduction that made a lot of sense?

Yuval: [00:12:25]
Most of them will make sense. Some resources are never used. If you're paying for capacity that you're never going to reach, I think that's cost that is wasted. If you have some abandoned projects that are still running, you can kill them. You may make the decision and discover that they're not worth it. I think that most direct costs and reducing them are valid. They are the things to do, but sometimes it's not as simple as that. So I do think that yes, most direct costs are important to take care of.

Darin: [00:13:05]
So in taking care of those direct costs, we went at it from one angle. What are the indirect costs that most people don't even think about? I sort of leaned on it when I started today talking about Mattermost and these other things that are tools that are costs, but a lot of people just think, well, they're free. Just because something is open-source and we didn't actually pay physical money for it, it is running on a server. We're having to pay people to maintain it. What are other types of things that are those true hidden or indirects that people just don't think about?

Yuval: [00:13:44]
So things like overcomplicated workflows. These are the things that you're spending time on. Let me start with, when we're looking at, specifically for tech companies, I think that is more relevant, engineering teams are force multipliers. They're allowing your company to grow and to create a product. For every dime you spend on development, you can then grow your revenue without too much scaling. You're not a company that needs to have inventory necessarily. You're not a company that needs to hire maybe too many people at point of sale or things like that. So when you're making developers more productive, you're basically making a huge investment. I think just to clarify, this is where most of it is coming from. This is where I think and why it's relevant for DevOps because we're not necessarily talking about how they could develop faster or create more complicated features or the level of engineers, but laying the groundwork for engineers to work faster, to work better and like laying out the pipes for them, the infrastructure. So this is where it comes from in DevOps. Then when you look at that, you think, okay, so workflows. It takes a lot of time because you have workflows that are not working. You have too much bureaucracy. That is one thing that is causing developers and development to slow down. You're not putting out the features you set out to create because it gets complicated. So that is one kind. I think the other kind is technical debt. It's something that accumulates and usually you don't account for it unless you really suffered from it. You may be making a decision to do something quicker and not take a little time, but then later on, this could be a huge problem that will slow you down. So I think technical debt is another thing that you're not necessarily thinking of and the effects of it.

Viktor: [00:15:58]
The problem with technical debt is usually that you often don't really know whether it's debt or not, in terms of time. Like if I say, okay, on average, it takes three days to develop a feature here. If you don't have a point of reference that you can use to measure that, you don't know whether three days is too much or too little. Too fast or too slow, because that's how much it takes to develop here. Maybe in some other company, the same thing would be developed in a day, maybe in a week, but you don't know that.

Yuval: [00:16:29]
Yes, but you do know the decisions that come into that because I've been in so many standup meetings where the developers were saying, yeah, it will take me this much time. Then they started to discuss why. I'll need to implement that certificate and create a certificate authority. Okay, you know what, let's just use a self-signed certificate. So these are the things that you come into when you're aware that you're accumulating technical debt. You're making a decision to build things that are not as scalable, which sometimes is the right decision. I'm not saying that it's not, but you're piling on basically incompetence sometimes. I mean, having things that are not production ready maybe, for the sake of speed. At one point, that will be a problem. For developers, that might be not developing using the right patterns or taking other trade-offs for maybe performance. For us, from the DevOps perspective, this could be maybe not having a production ready cluster for that. Maybe you're making trade-offs. Instead of using a vault, for example, for security, you're using hard-coded secrets. You're making trade-offs. Yes, I think technical debt is something that you can actually think about and forecast in a lot of cases where it's going to affect you later, but it's a business decision and a decision to accumulate it.

Viktor: [00:18:07]
I agree on that completely. It's just that I have that feeling that very often companies, teams, people are not aware that something is going to increase the technical debt. You used the self-signed certificates as an example. I know many cases where actually companies are not aware of the existence of self-signed certificates. This is normal. This is how we do certificates. This is how we were doing it always. That's the only way to do it. So therefore, it's not a technical debt.

Yuval: [00:18:40]
The fact that you're not aware of it, it's still technical debt. I totally agree that they're not necessarily aware of it. This is exactly my point. I'm not saying that you can solve everything by just saying, okay, there are hidden costs, like technical debt. I think that you need to be aware of the technical costs. As you're thinking, as you're making decisions, you have another tool to look at. So I'm aware that there's this thing called technical debt and I now know that when I'm making a decision, I should take it into account. Maybe I'll think of it and maybe I'll prevent this technical debt from happening and maybe I'll be able to say, you know what, it's still something that I'm willing to take on, the debt, because releasing this feature is super, super important and it needs to go out now. That's okay. But you've been aware and you had this other angle to look at your decision when making it. So this is exactly the thing for technical debt. Just having it in your mind to think about.

Viktor: [00:19:47]
Absolutely. It's about, mostly about, the awareness. In my head, it's like taking a loan from a bank. That can be a horrible thing to do, or it can be a great thing to do. A loan is not necessarily a bad thing as long as I'm aware that actually taking a loan will help my business grow faster or whatever that is. I agree that people would max out their credit cards without really having a plan behind it or a clear benefit behind taking a loan and then we are running into problems as human beings. To me, that's very similar to what is happening in our industry. It's just accumulating debt without the benefit, because debt is not a bad thing, as long as it's lower than the benefits it creates. I could even argue every line of code you write is a debt in one way or another.

Yuval: [00:20:40]
Exactly, yes. As long as you're paying it off and you're handling the interest, that's good. But I've actually seen cases where companies basically cease to exist because of technical debt. They were slowed down so much at critical times that not being nimble enough with their product actually brought them down and I've seen multiple examples of that. I say, it's like backups. You don't back up. It means that probably you've never had a big enough crash or a big enough disaster to warrant a backup. Once you've experienced that, it's something that you take more seriously. So I think that's with the technical debt. If we're thinking about other examples of hidden costs, I can give you another example. This is tied into opportunity cost. Let's take this company. This is a small or not that small startup and they're creating a product. Now they have this opportunity to get this new client, and that means setting up a new cloud account, maybe spinning up a Kubernetes cluster and doing some logging and preparing it for the customer. Now we as DevOps people, we think, okay, we need to spin this up with Terraform and we need to have infrastructure as code and we need to have all the bells and whistles because that's how we do things. That's the paradigm. We know that we have our guidelines as best practices in DevOps. Now this is something that is one-off. You're going to do it only once for this specific customer and this is a very important task for the company. You could just go into the user interface. You can spin it up in a few minutes. You can have something running within a day, but it's not the nicest thing. You're doing things that you may usually frown upon. But is this the right thing to do or not? What you may not know is that this company is running out of runway. 
They had investments and now they have this deal with the investor, this is something called milestone investing, which means that if you reach this arbitrary milestone set by the investor, you're going to get another cash infusion. A huge chunk of cash the company will get. Usually that ties into the number of clients you have, the number of projects, not necessarily revenue or how nicely your Kubernetes cluster is set up. You're at the brink. You're running out of runway. You need to do this and if you complete this, you will get this client and by doing so, you're going to get a lot of money. The company now has three more months, six more months of runway. Which decision do you make? My point is that I think that a lot of the engineers under the DevOps hat, they're not exposed enough to these kinds of decisions. Had they known, yeah, I would just go ahead and click like a madman on the console and rig something up. Yes. This is technical debt that we're taking on, but this is something that is one-off. So we have a few types of hidden costs. I'll be spending three days maybe to fully create it with all the infrastructure in code and create a pipeline and have logging and have everything that is super duper nice or maybe it will take a week. So that is cost for me, the cost of work. I'm not going to do it again. I'm not going to replicate it again. Ever. This is just a one-off demo. I don't know, maybe. The other cost is the opportunity cost of not receiving that milestone, because you were insistent on taking these three days and when asked, how soon can you deliver this, you said three days. In your mind, that's how long it should take at minimum, right? I can't go below because these are my standards. I think that with more information and having a clearer picture and less rivalry between units in the business, you can make better decisions. So that is, I think, another good example of the hidden costs of DevOps. 
You're scaling things that you don't need to scale. You're limiting your opportunities for the sake of doing DevOps.
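
One way to make that trade-off explicit is to put both options side by side, including the opportunity cost of missing the milestone. In the sketch below, the one-day and three-day delivery times come from the example; the deadline, the milestone payout, the daily cost, and the later rework for the quick option are hypothetical numbers chosen only to show the shape of the comparison.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    days_of_work: float   # time until the environment is delivered
    rework_days: float    # technical debt paid back later (0 if none)


def net_value(option, deadline_days, milestone_value, daily_cost):
    """Milestone payout (only if delivered in time) minus the cost of all work."""
    delivered_in_time = option.days_of_work <= deadline_days
    payout = milestone_value if delivered_in_time else 0.0
    work_cost = (option.days_of_work + option.rework_days) * daily_cost
    return payout - work_cost


# Delivery times follow the example; every other number is a made-up assumption.
console = Option("click through the console", days_of_work=1, rework_days=2)
iac = Option("full infrastructure as code", days_of_work=3, rework_days=0)

for opt in (console, iac):
    value = net_value(opt, deadline_days=2, milestone_value=500_000, daily_cost=800)
    print(f"{opt.name}: {value:,.0f}")
```

With these assumed inputs the quick option wins even after paying back its debt, because the slower option misses the deadline. Relax the deadline to three days or more and both options earn the payout, so the comparison comes down to work cost alone, which is the point about needing the full picture before choosing.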

Darin: [00:25:24]
Let's park it right there. Where you ended up, in that example at the end, is a small company running out of runway, but the person doing the work has no idea. That's the problem. You need that transparency with everybody at key milestones; waiting until the day before you run out of runway doesn't help. Especially if it's a small company, you need to have that transparency so that everybody understands. Hey, am I going to get a paycheck next week or not? I think that's important to everybody and usually that will help mitigate that problem.

Viktor: [00:26:03]
Those are difficult, really difficult, decisions to be honest, because it's easy to fall into either extreme. It is very easy to say this is how we do it, so therefore it needs to be a week, even though it could be a day, but it's also easy to continue with "it could be a day" and then actually you end up never getting out of the mess that you're accumulating. It never becomes manageable, let's say, right?

Darin: [00:26:34]
Right. Yuval, if people want to follow you right now, where can they find you?

Yuval: [00:26:38]
Oh, they can find me on Twitter at yuvalo, Y U V A L O, or on my blog, that's pushbuildtestdeploy.com.

Darin: [00:26:48]
Cool. And we'll have links for both of those down in the show notes. Thanks for joining us today.

Yuval: [00:26:53]
Thank you for having me.

Darin:
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.