DOP 106: The Difference Between SRE and DevOps

Posted on Wednesday, May 5, 2021

Show Notes

#106: There seems to be a great debate about what a DevOps engineer is and what an SRE is. Today, we throw our hat in the ring and attempt to dispel the myths that we see running rampant throughout the industry.

Hosts

Darin Pope

Darin Pope is a developer advocate for CloudBees.

Viktor Farcic

Viktor Farcic is a member of the Google Developer Experts and Docker Captains groups, and a published author.

His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD) and Test-Driven Development (TDD).

He often speaks at community gatherings and conferences (latest can be found here).

He has published The DevOps Toolkit Series, DevOps Paradox and Test-Driven Java Development.

His random thoughts and tutorials can be found on his blog, TechnologyConversations.com.

Transcript

Viktor: [00:00:00]
DevOps is not a role and that's why I don't like the term DevOps engineer. It's not a role. It's an idea. It's just a description of an idea that we are yet to implement, and I could even argue that SRE, just like containers and microservices and continuous delivery, those are all potential implementations of that idea.

Darin:
This is DevOps Paradox episode number 106: The Difference Between SRE and DevOps.

Darin:
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.

Darin: [00:01:23]
Viktor recently did a video titled "What Is the Difference Between SRE and DevOps?" My short answer is one is a real thing and one is not. SRE is a real thing because Google wrote books on it.

Viktor: [00:01:40]
Exactly. Both are real concepts, some better defined than others, but yes, SRE is something you can touch. With SRE you can say, "I'm an SRE." With DevOps you cannot say, "I'm a DevOps engineer."

Darin: [00:01:54]
Those are fighting words. In case you don't know what an SRE is, it's a site reliability engineer. So it's defined. The acronym has the word engineer in it, and it defines what that engineer is doing. That engineer is providing site reliability. A DevOps engineer is a made-up term that means nothing.

Viktor: [00:02:21]
And yet everybody has it in his or her title, right? So here's the question, and I agree completely with you: if an SRE is a thing and a DevOps engineer is not a thing, how come we have infinitely more DevOps engineers than SREs? For every time I hear that somebody is an SRE, I hear at least 10 people being DevOps engineers.

Darin: [00:02:46]
Because HR doesn't know how to actually create a role for people to do. Neither do the hiring managers, because a DevOps engineer, if you were to try to say what it is, is basically the person that takes out the trash.

Viktor: [00:03:00]
Exactly. It's a person who makes sure that Jira is up and running or who creates pipelines for everybody else in a team. What would be the closest to the majority of DevOps engineers in terms of previous titles? Would that be... Oh, my mind stopped. What was the name?

Darin: [00:03:21]
It would be a... well, I don't know either. The way I see a DevOps engineer in my head is the person that worked in the data center four levels underground and was just running around re-powering servers and doing that kind of work. It's not a real thing. I mean, those people were a real thing because we needed them back in the eighties, but in 2021, it's a useless term.

Viktor: [00:03:47]
It came back to me, the word I was looking for. Shared services.

Darin: [00:03:52]
Shared services. Good one. Yup. Yeah, what is shared services?

Viktor: [00:03:56]
It would typically be a team that has different members, and each member of the team would be in charge of one or more applications. There would be a person who makes sure that Jenkins is up and running and creates templates or shared libraries that everybody else will use, or the one who manages Jira and creates workflows for Jira tickets that everybody else would use. At least my experience with shared services is that it has two primary roles: to make sure that some application provided as a service to the rest of the company is up and running, and to simplify the usage of that application.

Darin: [00:04:38]
Well, in a highly functioning shared services team that would be true. It was rare that I saw a highly functioning shared services team.

Viktor: [00:04:45]
Yeah. There is always a discrepancy between "this is the idea" and "this is how it really works." But at least that's my experience with shared services, and many shared services teams are now DevOps, right?

Darin: [00:04:55]
Because it's the new hot title and shared services was the new hot title. I don't remember what it was before shared services. Maybe it was the data center operator because it goes along with time. We've talked about this. The consumer expectation has increased not only in the amount but also in the velocity.

Viktor: [00:05:19]
You know why shared services was a hot name? Because before it was IT.

Darin: [00:05:24]
Oh, right, and you can't just use IT because that's too general. So we replaced one general term with yet another general term, and then another general term that actually means less than IT. I'm just shaking my head right now.

Viktor: [00:05:38]
But generally, at least from my perspective, and everybody has a different opinion, and this is Patrick's fault because he never really defined what DevOps is, which is cool at the same time, but anyways, the way how...

Darin: [00:05:52]
Patrick is Patrick Debois, the godfather of DevOps, in case people don't know who Patrick is. Okay, go ahead.

Viktor: [00:06:00]
My perception or idea of what it really is, is that it's a way to join operations and development into one team, or together, so that we can have self-sufficient teams that are in charge of an application from requirements until it is running in production. It's about building self-sufficient teams, and you cannot build self-sufficient teams without operational knowledge. That means that either everybody in a team knows how to write Java and deploy something, or the team has expertise for deployment in one person and for writing Java in another. That would be a separate subject, but it's all about removing the separation between operations and development so that a team is able to deliver an application from the beginning all the way to production.

Darin: [00:07:00]
When I watched that video, you actually had two graphics up. You had a stack of items that called out what SRE is and what DevOps is. But what you just said helps reinforce that. So if you haven't watched the video, the link to that video will be in the show notes. But you were talking about a team, a self-sufficient team, being operationally sufficient.

Viktor: [00:07:29]
Yes, being sufficient, and that includes operations.

Darin: [00:07:33]
Okay. There are some highly functioning teams that do that, but I don't see that very often. As much as we like to say that it's happening, it really isn't. Just like we like to say everybody is using Kubernetes in production. That is just not true.

Viktor: [00:07:50]
Yes. When I say everybody, that usually means everybody who is within five years of the current state of technology. If they are more than five years behind, I don't include them in the word everybody. I don't know what to call it.

Darin: [00:08:08]
This is going to be one of those scenarios. Using your five years, there are companies that still aren't up to 2016 standards, and a lot of them are big-money companies. Large revenue type companies.

Viktor: [00:08:24]
Oh yeah. That's the majority. From the revenue perspective that's literally a majority, and from a workforce perspective as well. Going back to that operational experience being a necessity in a team: let's forget about production. Does it make any sense for me not to be able to deploy my application on my laptop?

Darin: [00:08:47]
Hmm. That depends.

Viktor: [00:08:50]
My application, not other dependencies. I'm not going... Just what I'm working on.

Darin: [00:08:58]
Well, you should be able to deploy it on your laptop. Yes.

Viktor: [00:09:01]
Exactly. Okay, cool. Then, if I know how to deploy my application to my laptop, shouldn't I be able to deploy that same application to a test environment where I'm going to run functional tests? It's a very small difference, right? It's the same application. Same method of deployment, potentially.

Darin: [00:09:21]
Technically, the answer should be yes.

Viktor: [00:09:24]
Logically speaking, not...

Darin: [00:09:28]
Okay. Logically speaking, yes. I'll just stop there.

Viktor: [00:09:31]
And then from the test environment to staging, it's also the same thing, right? And staging is supposed to be the same as production, so I should be able to deploy to production as well.

Darin: [00:09:40]
Logically, yes.

Viktor: [00:09:42]
What I'm trying to say is that I know that in practice it's not like that, but those should be very small differences between what I do on my laptop, what I do in the test environment, what I do in staging or preproduction, and what I do in production. Here's the problem. If those differences are huge, and they often are because how I deploy to production is completely different from how I deploy to the test environment or locally, then we have a serious problem: we do not know whether something will work in production until it gets to production. So we have to make those processes very similar. I cannot say the same. It's never going to be exactly the same in the test environment as in production, but it needs to be very similar. If it's very similar and I'm capable of deploying it in one of those environments, then I should be capable of deploying it to the other environments, including production.
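
To make the point about similar deployment processes concrete, here is a minimal sketch in Python. It is only an illustration and not something shown in the episode: the environment names, namespaces, replica counts, and the `deploy_steps` function are all hypothetical. The idea is that one routine produces the same ordered steps everywhere, and only the parameters differ between laptop, test, staging, and production.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of a deployment target."""
    name: str
    namespace: str
    replicas: int


# Assumed values for illustration only; real environments will differ.
ENVIRONMENTS: Dict[str, Environment] = {
    "laptop": Environment("laptop", "dev", 1),
    "test": Environment("test", "test", 1),
    "staging": Environment("staging", "staging", 3),
    "production": Environment("production", "prod", 3),
}


def deploy_steps(app: str, version: str, env: Environment) -> List[str]:
    """Return the ordered deployment steps.

    The sequence is identical for every environment; only the
    namespace, replica count, and environment name are parameters.
    """
    return [
        f"render manifests for {app}:{version}",
        f"apply to namespace {env.namespace} with {env.replicas} replica(s)",
        f"wait for {app} rollout in {env.namespace}",
        f"run smoke tests against {env.name}",
    ]


if __name__ == "__main__":
    for env in ENVIRONMENTS.values():
        print(env.name)
        for step in deploy_steps("my-app", "1.2.3", env):
            print("  " + step)
```

If the production path needed a different function rather than different parameters, that would be the kind of large difference described above, where nobody knows whether a release works until it reaches production.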

Darin: [00:10:41]
I think the key word that you're using there is capable.

Viktor: [00:10:45]
Yes. Now the problem is that many teams I worked with, when they deploy on their laptop, never even opened the manifests or the build scripts. People deploy on their laptops without knowing anything about it, and that's a problem. I still see people that expect other people to create the Maven build script for their application because that's somehow magic that is beyond the comprehension of a Java developer. That difficulty is not conceptually different from the difficulty of defining how to deploy something. Conceptually, it's the same thing. You should be able to understand how something you need works at a sufficient level to be able to tweak it and change it to suit your needs. Now, I'm not saying that all those teams that now have operational knowledge, and we call them DevOps something, necessarily need to be able to write everything from scratch. "Hey, I'm going to create a cluster from scratch. I'm going to create my manifests from scratch." I'm not saying that. There are other people who can help, but you need to have a sufficient level of understanding and knowledge to be able to at least use those things. That might be a better description. Let's say that you are using AWS, and let's forget for a moment some special rules and norms and policies that companies have when they use AWS. Just a simple example, right? Shouldn't everybody know how to create a VM in AWS? I'm not asking anybody to understand how VMs are created behind the scenes. I'm not asking anybody to manage their hypervisors. No. AWS created a service, but you cannot expect somebody else to have the knowledge of how to click the "create VM" button as if you could not learn that yourself.

Darin: [00:12:56]
Well, I disagree. I don't believe some people are actually capable of knowing how to push that button to create a VM.

Viktor: [00:13:04]
Some people are capable or incapable?

Darin: [00:13:06]
Incapable.

Viktor: [00:13:09]
I would change that sentence. I would say that some people are taught not to know how to push that button. Some people never tried to learn that, but I cannot believe that anybody is not capable of learning how to push the button. There are reasons why they don't know how to push the button, but capable? Everybody's capable. It's not rocket science.

Darin: [00:13:38]
Well, again, I would disagree. It is rocket science for people that are unwilling to learn.

Viktor: [00:13:44]
Yes. Yes. Yes. Exactly, but it's rather unwilling than incapable.

Darin: [00:13:49]
I was trying to back you into a corner and it didn't work. Yes. The incapable shows up not as them being incapable, but rather as them being unwilling. Whether that's unwilling because... or unwilling because of pressure from outside forces, meaning management saying, "No, don't do that. That's some other team's job."

Viktor: [00:14:16]
Since the emergence of agile, and before that as well, but let's say that's when it became mainstream, we realized that the biggest inefficiencies that we have in companies are due to handovers. That's the killer of all productivity: handing over something to somebody else to do something. If that's the biggest overhead, then the only way to avoid it is not to hand it to anybody else, and the only way to be able not to hand it to anybody else is to do it yourself, and the only way to do it yourself is to use services provided by others. I'm not against a team that will create some base Helm charts for you, but you need to be able to tweak what's on top of the base Helm chart in order to deploy your own application. You can expect others to create the service that you consume, whether those others are AWS or a different department, but you cannot expect others to do the work for you. Actually, you can expect that of others, but then we are dealing with handovers, and that's where we spend more time waiting for something to happen than actually doing our jobs.

Darin: [00:15:30]
Let's go back to one of our favorite subjects on this. If you are stuck in a handover scenario in your day job, that means that your company is acting inefficiently, because that handover should be a service that is provided by whoever it is that you're handing it over to. It should be self-service. We were talking about shared services earlier. I think where shared services went wrong is that it was receiving work, like somebody catching a ball. Let's get rid of the sports analogy completely: your shared services teams should be self-service teams. They should just be AWS. You've talked about this before. We don't get on the phone and call up AWS to create a VM for us. We don't open up a ticket for AWS to create a VM for us. There's an API there that AWS provides and that we call. And is this where one of the breakdowns of SRE versus DevOps, at least as a title, goes? A good SRE is going to provide a service. A DevOps role is going to accept work.
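
The contrast between accepting work and providing a service can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real API: `TicketQueue`, `SelfService`, and `create_vm` are hypothetical names, and nothing here calls AWS. In the handover model a request sits in a queue until a person picks it up; in the service model the providing team publishes an operation once and consumers call it directly.

```python
from typing import Callable, Dict, List


class TicketQueue:
    """Handover model: a request waits until someone on another team works it."""

    def __init__(self) -> None:
        self.pending: List[str] = []

    def submit(self, request: str) -> None:
        # Nothing is provisioned here; the requester now waits.
        self.pending.append(request)


class SelfService:
    """Service model: operations are published once and called on demand."""

    def __init__(self) -> None:
        self._operations: Dict[str, Callable[[str], str]] = {}

    def publish(self, name: str, operation: Callable[[str], str]) -> None:
        # Done by the providing team (the "shared services" side).
        self._operations[name] = operation

    def call(self, name: str, argument: str) -> str:
        # Done by the consuming team, with no ticket and no waiting.
        return self._operations[name](argument)


def create_vm(owner: str) -> str:
    """Hypothetical stand-in for a provisioning call; returns a fake VM id."""
    return f"vm-for-{owner}"


if __name__ == "__main__":
    queue = TicketQueue()
    queue.submit("please create a VM for team-a")
    print("waiting on tickets:", queue.pending)

    platform = SelfService()
    platform.publish("create_vm", create_vm)
    print("created:", platform.call("create_vm", "team-a"))
```

The design choice worth noticing is where the waiting happens: the queue stores intent and returns nothing, while the published operation returns a result to the caller immediately.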

Viktor: [00:16:48]
Yes. You can maybe describe an SRE as a type of person that works in a team whose product is a system. Your product is something like a front-end application, and the SRE's product is a system. My job is to make sure that the system as a whole is running. I'm not really into individual applications unless they are actually breaking down my system. A cluster, or a group or fleet of clusters: that's my product. I'm making sure that they are up and running and behaving as they should. And this is almost revolutionary for some companies, even though it's so obvious and silly: in order for me to make sure that the system is healthy, I might actually go back for a moment and help you, as a person or as a team that is developing an application, get that application to a state that is runnable in my system. So instead of waiting for you to deploy your application yourself in my system and then discover that some terrible things are happening, I'm going to come and help you design that application to work well in my system. But the keyword is help. I am not going to do whatever you need to do for your application. I'm going to help you.

Darin: [00:18:17]
And in that helping, as an SRE, you are discovering how you need to make your systems more robust, more self-serve, whereas if you were a DevOps engineer, you're just waiting for the next Jira ticket to land in your queue. That's what we see a lot of. I've also seen companies use the SRE term and they're also just waiting to accept Jira tickets.

Viktor: [00:18:45]
It happens with every single profile that some roles in some cases work well and some others don't. The reason why I believe that SREs tend to work better than DevOps engineers, which I don't believe should exist in the first place, is that SRE is better defined as a term. Even if you look at the simplest possible definition, which I'm paraphrasing right now, SRE is when you give a software developer a task to do operations. That's already much more descriptive and clearer and better defined than any DevOps definition that I have heard. It's known what SRE is. It's not known what DevOps is.

Darin: [00:19:28]
And isn't that sad? You would think... and no disrespect to Patrick, because the term gets stuck on him. Patrick, you're our friend, but DevOps is just a lazy term. It's a catch-all term.

Viktor: [00:19:45]
Depends on how something starts. If you look at DevOps, DevOps started, at least that's my interpretation, as an idea that development and operations should be one thing, or joined. So DevOps describes an idea. SRE describes a role in Google. This is the role, just like you have roles in any company. If you take any role in any company, we can very clearly define what that role is in most cases. Now, we can agree or disagree whether that role is good or bad, whether it's silly or not. We can have a lot of discussion, but describing a role, an existing role, is straightforward. DevOps is not a role and that's why I don't like the term DevOps engineer. It's not a role. It's an idea. It's just a description of an idea that we are yet to implement, and I could even argue that SRE, just like containers and microservices and continuous delivery, those are all potential implementations of that idea, or helper tools of that idea, or different roles that can help us get to the idea that there are no barriers between development and operations, because that's what it is. On the other hand, yeah, we know what SRE is. If Google is hiring SREs, you just look at the description of the role and that's what it is.

Darin: [00:21:17]
I'll go back to the earlier analogy, even though it was poor. The quote unquote DevOps engineer is basically a catch-all term. I used people that take out the trash, but it's a catch-all term because the hiring company doesn't really know what they need or want. They just have a role and this is the new buzzword, so we must be hiring the new buzzwords.

Viktor: [00:21:42]
Like full stack engineers, right? Unlike DevOps, I understand what a full stack engineer is, so that's an improvement for full stack, but I never saw it really working because it's impossible. It's an unrealistic expectation.

Darin: [00:22:00]
To expect someone to know JavaScript, 15 different front ends, know middleware, know how to create APIs, know how to create Kubernetes clusters, know how to manage databases, know how to...ok, I'm running out of breath. You have got to be kidding me. I think full stack engineer is even worse than DevOps engineer. At least we know full stack engineer, we know what you're supposed to be doing. DevOps engineer, nobody knows what it is but a full-stack engineer? Starting salaries if you're a true full stack engineer, your starting salary should be a million to 2 million a year if you're doing all of those things.

Viktor: [00:22:42]
I would consider myself, from a certain point of view, as close as it can get to full stack, but that's because I'm more in a high-level advisory role than doing development. I think we do need people who understand the big picture, but maybe a better description for a full stack engineer would be an architect? You know a bit of everything so that you can generate some high-level picture, but I would never trust you to manage my database, which is just a fraction of what you described as your role.

Darin: [00:23:21]
But that's not how that role is typically hired and should step away from this because they're expected to do everything.

Viktor: [00:23:27]
Exactly, and that's what's wrong about it. Understanding full stack is important. You need to have those people. Working on full stack? That's just stupid.

Darin: [00:23:40]
And the full-stack engineer I was talking about should be highly paid. Usually it's one of the lower-paid roles that I've seen in most companies, yet you're on the hook for every stinking thing.

Viktor: [00:23:52]
It's a way to save money. Why would we have 10 people just because we have 10 different areas we work on? We can do it with one person. He can know everything.

Darin: [00:24:02]
How hard can it be? It's running on AWS. It's in Kubernetes. Kubernetes solves all the problems, right? Geez. Okay, SRE and DevOps. SRE is a real role that can be mapped to a person doing work. A DevOps engineer doesn't exist. It's like a unicorn, right?

Viktor: [00:24:22]
Exactly. Rainbows exist, but unicorns don't. My daughter happens to say rainbows and unicorns all the time as if those are the same thing.

Darin: [00:24:36]
So what is a rainbow for a DevOps engineer? There is none, right?

Viktor: [00:24:41]
There is no DevOps engineer, so there is no unicorn, so unicorns lacking rainbows is completely irrelevant, right? There is the need not to have separation between development and operations. There is a strong need for operations to develop, and that's SREs, and there is a strong need for teams working on an application to have operational knowledge, and that's the idea behind DevOps.

Darin: [00:25:11]
Listener, do you agree or disagree? Go over to the Slack workspace and make a comment on the post about this episode.

Darin:
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.