DOP 72: Mastering Kubernetes with Gigi Sayfan

Transcript

Gigi: [00:00:00]
This is something that I really appreciate about Kubernetes, both the community and the project and all the leads. It's organized in a way that's really very methodical, and right from the beginning it was designed for large scale and enterprises and all those issues. Just this process of deprecation, having API groups, even though it's sometimes complicated, this is the right way

Darin:
This is DevOps Paradox episode number 72. Mastering Kubernetes with Gigi Sayfan

Darin:
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.

Darin: [00:01:25]
So Viktor, we are recording this today, August 24th, and a new version of Kubernetes is coming out. Kubernetes has finally been stabilizing a little bit. It will be interesting to see what 1.19 gives us. One of the things that we've talked about and seen in 1.19 is a longer support period, which is going to make our guest's life much easier in the future. His name is Gigi. Gigi, welcome to the show.

Gigi: [00:01:57]
Great to be here

Darin: [00:01:59]
Gigi is an author. The title he has is, okay, Viktor, you gotta help me here. Don't walk away. Don't walk away.

Viktor: [00:02:07]
It's Kubernetes something. There is the word Kubernetes in it. What else do you need to know? It's Kubernetes.

Darin: [00:02:13]
It's Mastering Kubernetes, Third Edition, from, help me with the P word, I can never say that, Packt.

Viktor: [00:02:21]
No. Is it Packt?

Darin: [00:02:24]
Gigi, is it Packt or is it Packet? Which one is it?

Gigi: [00:02:28]
I think it's Packt but it's hard to tell. Different people have different opinions.

Darin: [00:02:32]
Okay. It's those guys. Gigi wrote the book on Mastering Kubernetes, and we're sort of poking fun at the Kubernetes community, obviously, because with a full release coming out every three months, keeping a book up to date is near impossible. Correct, Gigi, I would assume? Because you don't want to update the book every three months.

Gigi: [00:02:53]
Yes, absolutely. And actually it's gotten better now, but when I wrote the first edition, and I'm not talking about updating the book, just writing the book, everything was completely out of date by the time I went back to review previous chapters. Kubernetes itself is much more stable now, the core that most people use and even some of the more advanced features. But it's still a struggle. Kubernetes has this very admirable concept of API groups. So you have versions, and within different groups of APIs you have different types of versioning. Over time, things eventually get to the desirable v1, and then you know that they are stable. But before that you have a long period, maybe years, where APIs sit in v1beta1 or v1beta2, or under extensions/v1beta1. So you're always running at full speed just to keep up with Kubernetes. Most of the time it doesn't matter that much, but just recently, and this is not the latest Kubernetes, several critical APIs were removed between Kubernetes 1.15 and 1.16. Specifically, Deployments in extensions/v1beta1, and then also apps/v1beta1. So apps/v1 became required. Those APIs were deprecated for a while, but in 1.16 they were actually removed. So that was very interesting, and right now I'm actually in the middle of upgrading a set of GKE clusters, not just one, multiple GKE clusters, from 1.15 to 1.16.
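To make the API group idea concrete, here is a minimal Python sketch that splits an `apiVersion` string into its group and version, and ranks versions by stability. The function names are hypothetical. The ordering follows the convention Kubernetes documents for version priority (GA before beta before alpha, then by number), but this is an illustration, not Kubernetes' own code.

```python
import re

# Kubernetes-style versions: v1, v2, v1beta1, v2alpha1, ...
VERSION_RE = re.compile(r"^v(\d+)(?:(alpha|beta)(\d+))?$")
LEVEL_RANK = {"alpha": 0, "beta": 1, None: 2}  # None means GA (no suffix)


def split_api_version(api_version: str) -> tuple[str, str]:
    """Split 'apps/v1beta1' into ('apps', 'v1beta1'); the core group is ''."""
    group, _, version = api_version.rpartition("/")
    return group, version


def version_priority(version: str) -> tuple[int, int, int]:
    """Sort key: GA beats beta beats alpha, then higher major, then higher minor."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a Kubernetes-style version: {version!r}")
    major, level, minor = match.groups()
    return LEVEL_RANK[level], int(major), int(minor or 0)


if __name__ == "__main__":
    print(split_api_version("extensions/v1beta1"))  # ('extensions', 'v1beta1')
    print(split_api_version("v1"))                   # ('', 'v1')
    versions = ["v1beta1", "v1", "v1alpha1", "v1beta2"]
    print(sorted(versions, key=version_priority, reverse=True))
```

Sorting by this key puts `v1` first and `v1alpha1` last. That ordering is why a stable `v1` is the point where, as Gigi says, you know an API is settled.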

Viktor: [00:04:41]
In the past, you could not avoid using beta versions of APIs. You could not skip Deployments just because they were not v1. But now people shouldn't have that excuse anymore. Now it's so stable that people should probably be waiting for v1 for serious usage. Right?

Gigi: [00:04:59]
Yes. So Deployments, actually, people have used them almost from the beginning, because this is a very core piece of Kubernetes. You have a Deployment, and the Deployment manages your ReplicaSet and your pods for you. You don't want to manage that yourself or deal with the ReplicaSet on your own; that's kind of an internal implementation detail. So people have used Deployments right from the beginning. What's different now is that through all those versions until 1.16, Kubernetes was always backwards compatible. If your YAML manifests used a previous API group for the Deployment, everything worked just fine. With 1.16 that's not the case anymore. Kubernetes just will not accept them. One of the things that makes automatic upgrading a problem is that apps/v1 Deployments now require the selector. So you have the selector there with a set of labels, and before it was optional. So this is really a breaking change. And then there were a few other things, like StatefulSet and a few other API groups, that are now not supported.
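A hedged Python sketch of the selector change Gigi describes. It works on manifests that have already been loaded as dicts. Earlier API versions such as `extensions/v1beta1` defaulted `spec.selector` to the pod template's labels, but `apps/v1` does not, so a migration has to write the selector explicitly. Both function names are hypothetical. Only `matchLabels` is compared against the template (`matchExpressions` is merely accepted as present), and the other default changes in `apps/v1` are ignored.

```python
import copy


def migrate_deployment_to_apps_v1(manifest: dict) -> dict:
    """Return a copy of a Deployment manifest rewritten for apps/v1.

    Older API versions defaulted spec.selector to the template labels;
    apps/v1 requires it, so we set it explicitly when it is missing.
    """
    migrated = copy.deepcopy(manifest)  # never mutate the caller's manifest
    migrated["apiVersion"] = "apps/v1"
    spec = migrated.setdefault("spec", {})
    if "selector" not in spec:
        labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
        spec["selector"] = {"matchLabels": dict(labels)}
    return migrated


def selector_problems(manifest: dict) -> list[str]:
    """List selector issues that would make apps/v1 reject a Deployment."""
    spec = manifest.get("spec", {})
    selector = spec.get("selector")
    if not selector or not (selector.get("matchLabels") or selector.get("matchExpressions")):
        return ["spec.selector is required in apps/v1"]
    labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
    # Every matchLabels entry must also appear on the pod template.
    return [
        f"selector {key}={value} does not match the pod template labels"
        for key, value in selector.get("matchLabels", {}).items()
        if labels.get(key) != value
    ]


if __name__ == "__main__":
    old = {
        "apiVersion": "extensions/v1beta1",
        "kind": "Deployment",
        "spec": {"template": {"metadata": {"labels": {"app": "web"}}}},
    }
    new = migrate_deployment_to_apps_v1(old)
    print(new["spec"]["selector"], selector_problems(new))
```

Copying the template labels reproduces what the old default did. This keeps the migrated Deployment selecting the same pods it selected before the upgrade.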

Viktor: [00:06:06]
You probably saw in 1.19 that they introduced more warnings about deprecation. Do you think that that will help?

Gigi: [00:06:14]
Yeah. Yeah. Definitely. I think this is a great step in the right direction because Kubernetes, I mean, different organizations used it in different capacities, but it became really mainstream, not just now, even a while ago. Once all the large cloud providers started offering managed Kubernetes, a lot of people from more traditional industries felt, okay, this is justified. Now we can justify it to our IT managers, and we can actually put serious production workloads on Kubernetes. So now backwards compatibility and upgrade paths become a much, much bigger deal for those companies and organizations with critical workloads. They can't just move fast and break things. They have to do it properly.

Viktor: [00:07:02]
Here's the thing that I'm curious about. We have even more warnings about deprecations, so the surface of warnings is a bit bigger. But what happens with really advanced users who have everything automated? Let's say that everything is going through some CI/CD pipelines or some other form of automation. You're still not going to find out that something is deprecated unless you run it manually or read the release notes for every single release.

Gigi: [00:07:30]
Actually, there are even bigger problems that I just learned about firsthand. One thing is that a lot of people use Helm for templating their deployments, and it turns out that, because of its release history, Helm is unable to do a normal upgrade. So if you upgrade your cluster but you haven't updated all the resources to supported versions, Kubernetes will do some upgrades for you automatically, but then you won't be able to use Helm anymore, because your most recently deployed Helm release contains those unsupported resources. The approach I tried to follow in this current upgrade is to create a little tool that scans all our clusters. For each cluster, it gets all the resources and compares them against all the resources that are being removed right now, and then it does the same for Helm. Again, this is something I discovered after the fact, but luckily it was in staging. So now, for production, we're actually going and scanning the Helm releases, and the history of the Helm releases as well. There have been all kinds of interesting stories there.
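The scanning idea can be sketched in Python. This assumes you have already captured manifest text, for example from source control, or with `helm get manifest RELEASE --revision N` for each revision listed by `helm history RELEASE`. The parser is deliberately naive so that it needs no dependencies: it reads only unindented `apiVersion` and `kind` lines. A real tool would use a YAML library. The table lists the API versions that stopped being served in Kubernetes 1.16, and the function names are hypothetical.

```python
import re

# (apiVersion, kind) pairs no longer served as of Kubernetes 1.16,
# mapped to the replacement apiVersion.
REMOVED_IN_1_16 = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("apps/v1beta1", "StatefulSet"): "apps/v1",
    ("apps/v1beta2", "StatefulSet"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("apps/v1beta2", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "ReplicaSet"): "apps/v1",
    ("apps/v1beta1", "ReplicaSet"): "apps/v1",
    ("apps/v1beta2", "ReplicaSet"): "apps/v1",
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "PodSecurityPolicy"): "policy/v1beta1",
}

# YAML document separator: a line containing only '---'.
DOC_SEPARATOR = re.compile(r"(?m)^---[ \t]*$")


def top_level_fields(doc: str) -> dict[str, str]:
    """Naively read unindented 'key: value' lines from one YAML document."""
    fields = {}
    for line in doc.splitlines():
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("'\"")
    return fields


def find_removed_apis(manifest_text: str) -> list[tuple[str, str, str]]:
    """Return (kind, old apiVersion, replacement) for each removed resource."""
    findings = []
    for doc in DOC_SEPARATOR.split(manifest_text):
        fields = top_level_fields(doc)
        key = (fields.get("apiVersion"), fields.get("kind"))
        if key in REMOVED_IN_1_16:
            findings.append((key[1], key[0], REMOVED_IN_1_16[key]))
    return findings


if __name__ == "__main__":
    sample = "---\napiVersion: extensions/v1beta1\nkind: Deployment\nmetadata:\n  name: web\n"
    print(find_removed_apis(sample))
```

Running the same check over every stored Helm revision matters because the upgrade problem comes from the release history, not only from what is live in the cluster.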

Darin: [00:08:37]
Too many stories, I think.

Gigi: [00:08:39]
Yeah, yeah. Too many stories. This is the real world. Kubernetes is awesome, but when you start to deal with those more brittle operations, like upgrades and backwards compatibility, and trying to deal with the evolution of everything, you run into those issues.

Viktor: [00:08:58]
But it's getting better, isn't it? At least from my perspective, the latest releases are not exciting anymore, and that means it's actually more stable, right? If I'm not excited about a release, it must be focused on stability rather than new features.

Gigi: [00:09:17]
Yeah, absolutely. Yeah. I totally agree. When you're in infrastructure, you want everything to be boring, right? You don't want the excitement of waking up at 3:00 AM and dealing with some exciting issue. What you really want is stability, and then you want to push the envelope on the edge without breaking everything that's already there. So I think this is exactly what's happening. This is something that I really appreciate about Kubernetes, both the community and the project and all the leads. It's organized in a way that's really very methodical, and right from the beginning it was designed for large scale and enterprises and all those issues. Just this process of deprecation, having API groups, even though it's sometimes complicated, this is the right way, because you have the right granularity, you can do fine-grained upgrades, and you have multiple versions at the same time. So most of the time you're not forced to upgrade. Think about the scale and the breadth of where Kubernetes can be deployed. It can run on your laptop. It can run in the cloud, obviously, and in any data center. Now we're starting to see Kubernetes running on the edge, on Raspberry Pis. Keeping all of that together is pretty much amazing, and it's still getting those releases out every three months while making sure it doesn't break everything that came before. So I'm really impressed by that.

Viktor: [00:10:46]
So what's your favorite distribution platform? Do you have a preference or everything's the same to you?

Gigi: [00:10:53]
For me, it's the same. I don't have a favorite. I really try to roll forward and enjoy the goodies of each and every distribution.

Viktor: [00:11:04]
That was the most politically correct answer I ever heard in my life.

Darin: [00:11:08]
Okay. So maybe you don't have a favorite, but are there some that you prefer over others? Maybe not a favorite, because there are always trade-offs, right? Come on. I'm asking it in a different way now.

Gigi: [00:11:20]
Sure. When we talk about distributions, I assume you mean managed distributions, right? GKE is obviously still ahead, and you have Microsoft with AKS. They started pretty early too, and they invested a lot in Kubernetes, tooling, and the whole ecosystem. And then EKS, Amazon, AWS, they kind of grudgingly, eventually relented, but now I think they are fully embracing it. I like to be on the cutting edge as much as I can, right? Still keeping the boring parts so we're stable, but I'm always following what's coming next to see if I can fit it into my work and take advantage of it. So in this respect, I prefer GKE, but as far as popularity and where most Kubernetes workloads are deployed, I think AWS is catching up pretty quickly.

Darin: [00:12:14]
That was polite. And that's actually in alignment with what, well, it's actually not in alignment with what I see in the real world, but it's in alignment with how we think, at least Viktor and myself.

Gigi: [00:12:26]
Yep. So what do you see in the real world?

Darin: [00:12:30]
Uh, in the real world, I see EKS more than anything else. It's rare that I ever see Google.

Gigi: [00:12:35]
So what I saw is that a lot of people were just deploying Kubernetes directly on AWS, using Kops and deploying it on EC2. But now EKS is really picking up steam, because that was the whole point. AWS noticed that a lot of people were running Kubernetes on their platform and said, okay, we'll provide our own managed service and give you the benefits, and then obviously take a little cut from there.

Viktor: [00:13:05]
What's next for Kubernetes or not Kubernetes. What's coming? Because Kubernetes itself has been relatively silent for a while, right? Are we now already talking about layers on top of Kubernetes?

Gigi: [00:13:19]
Yes. So I think there are a few trends that are going to be very prominent, and already are, and I cover them in the third edition of the book. I added two chapters just for those particular topics: serverless and service mesh. Those two are very big trends, and there's actually something behind them. It's not just fluff where people are excited about the idea or the concept. They're actually very valuable, and they are being used. With serverless, we have two types. With the first type, obviously there are always servers behind it, right? The thing is that we, as developers or administrators, don't have to manage them. So that's one type of serverless. Kubernetes takes care of scheduling all your pods. You don't need to know exactly where your workloads are, unless of course you want to, and then you have all kinds of affinities, et cetera. By and large, you can let the Kubernetes scheduler schedule the pods. But what happens when you want to scale, when your volume increases? Now you need to add nodes to those clusters. This is where you try to walk the fine line between having enough capacity and not over-provisioning, because you don't want your nodes to be underutilized, since that costs money. This is where it becomes interesting. Kubernetes has the cluster autoscaler, which can add nodes, but there are still a lot of rough edges when you try to make it work at scale in more sophisticated scenarios, where you have different types of node pools with different capabilities for different workloads. You still need to do a lot of management yourself. So what we see now is, for example, Fargate on AWS being integrated, Microsoft has their own, I believe it's called ACI, Azure Container Instances, and Google now has Anthos, et cetera.
So we see a lot of movement towards taking this part, managing the nodes of the cluster and the physical resources behind it, and making it part of the Kubernetes experience too. I think that's huge. If it's really successful, then for the vast majority of people it will completely eliminate any need for capacity management. You can just kind of trust your cloud provider, ideally. Obviously it will be a long road, but this is the direction. The other type of serverless is more about function as a service. Here we're talking about not even having long-running services for some of your workloads, but having ad hoc functions that are invoked only when they're needed. There are a lot of benefits, because you don't need to deal with all the groundwork of running long-running services: upgrading them, security, patching, et cetera. You have your sandbox, and your code executes when it needs to execute, triggered by pretty much anything. I see that as a major trend as well. It's not appropriate for every workload. Sometimes you do need a long-running service: you need caching, or you need to keep connections to various databases or other services. But then again, there are a lot of workloads that work really well in the function as a service paradigm.
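To make the capacity trade-off concrete, here is a rough Python sketch. It estimates how many identical new nodes the CPU requests of pending pods would need, using first-fit-decreasing bin packing. This is a simplification built on assumptions, not how the Cluster Autoscaler actually works: the real autoscaler simulates the scheduler against node-group templates and also accounts for memory, affinities, taints, and more. The function name and the millicore inputs are hypothetical.

```python
def estimate_new_nodes(pending_cpu_millis: list[int], node_cpu_millis: int) -> int:
    """First-fit-decreasing estimate of identical nodes needed for pending pods."""
    if any(request > node_cpu_millis for request in pending_cpu_millis):
        raise ValueError("a pod requests more CPU than a single node offers")
    free_per_node: list[int] = []  # remaining CPU on each hypothetical new node
    for request in sorted(pending_cpu_millis, reverse=True):
        for i, free in enumerate(free_per_node):
            if request <= free:
                free_per_node[i] -= request
                break
        else:  # no new node has room, so open another one
            free_per_node.append(node_cpu_millis - request)
    return len(free_per_node)


if __name__ == "__main__":
    pending = [1500, 1500, 1000, 500, 500]  # millicores requested by pending pods
    nodes = estimate_new_nodes(pending, node_cpu_millis=2000)
    utilization = sum(pending) / (nodes * 2000)
    print(nodes, f"{utilization:.0%}")  # 3 nodes, 83% of the added CPU requested
```

The unused remainder on the added nodes is the over-provisioning Gigi mentions. Node pools with differently sized machines shift where that line falls.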

Viktor: [00:16:41]
Does that mean that we are moving towards a future where people will not even know that they're running Kubernetes? It will be so abstracted like, you know, like memory management. Nobody manages memory anymore. Right. It just happens somehow. Is that where we're going?

Gigi: [00:16:58]
I think some groups of people want to be able to completely ignore it, just like for a long time people talked about the network, right? I'm coding to an interface. I don't care if it's running on my machine, in the same process, in another process on my machine, or in a remote process. Those are kind of leaky abstractions. Someone will have to know about it and know how to deal with it, especially when there's a problem and they need to debug and troubleshoot. So you can't ignore it completely. But ideally, most developers don't really need to know that they run on Kubernetes, or the whole architecture, or the way things are provisioned. For them, we can provide a higher-level abstraction: okay, I need those services, I need those data stores, give them to me somehow. I don't care what's underneath. Is it really Kubernetes? Is it ECS? Is it something else? I just want my view at the service level. So I think we'll get there, depending on how much you invest in it in your own organization. Kubernetes itself is not really opinionated at this level.

Darin: [00:18:06]
You said you're going through and upgrading a number of GKE clusters from 1.15 to 1.16 right now, correct?

Gigi: [00:18:13]
Yep. Yep.

Darin: [00:18:15]
Is GKE your only production provider at this point?

Gigi: [00:18:19]
It's the main one. We also use AWS, because we try to be really cloud agnostic, and that has many benefits. One of the reasons to pick Kubernetes in the first place is that you can run really large-scale distributed applications without being locked in to a vendor, and with redundancy. If you run really critical systems, you don't want an outage or partial outage of your cloud provider to leave you completely incapacitated. This is something that's very important for us, but most of our software is running on GKE right now.

Darin: [00:18:49]
Okay. So if you are running in both GKE and AWS, are you on EKS, or are you rolling a Kops flavor on AWS?

Gigi: [00:19:00]
On AWS we're just starting now, so we're still evaluating the terrain.

Darin: [00:19:04]
So is your plan to be able to run hot-hot in both providers and then use somebody like Akamai or Cloudflare out front to give you some Anycast to get you to the closest one?

Gigi: [00:19:16]
We actually have our own networking layer, based on Istio. We do our own networking. There is some front end in front of us, but all the internal communication between clusters, even on GKE, and it's going to be the same with AWS, goes through our own networking layer. This is something that's pretty unusual, but we need a lot of control. And the trick there, I mean, not the only trick, but one of the main things, is that you need to establish VPN connections between AWS and GKE, or the Google networking layers. That's another piece of complexity that needs to be resolved before you can work at that level. Because if you just use public endpoints, then it doesn't really matter where you run, right? If your services on different clusters communicate just through public endpoints, load balancers, et cetera, then you can run anything anywhere. But when you try to do things that are a little more involved, either because of security or because of performance, you need to go the extra step.

Viktor: [00:20:24]
That's one of the things I'm curious about: the whole story about running multi-cloud, multi-provider, and all those things. There are benefits, like not being locked in and leveraging the advantages of one provider for one thing and another provider for another thing. I understand all that. What I'm curious about is how much of a tangible reality that is for the majority of companies. Does that make sense today?

Gigi: [00:20:49]
So there are a couple of reasons to use multiple clouds. One of them, like I mentioned, is not having vendor lock-in. Another one is sometimes just regulatory reasons. If you operate in different geographical locations, there's GDPR, or if you operate in China, then you may be forced to use another cloud provider. Another good reason is if you want to run some workloads in your own data centers. There are all kinds of interesting scenarios here. Maybe you already have established data centers, but you want the elasticity of the cloud, so if you suddenly get a big spike, you can spill over to the cloud. And then there is the opposite scenario: most of your workloads are running in the cloud, but you have something that needs to be very secure. You can't even trust the cloud provider, or your customers have special requirements that those specific workloads run on-prem or in a private data center. In those situations you're forced to work with multiple clusters, and you can't just choose one. And then the final reason is that if you're operating at a really large scale, Kubernetes clusters are limited, right? The limits are not that small, something like 5,000 nodes. I think people have tried a little more, but it also depends on your pod density, right? How many pods do you want on each node? So in the end, it's not infinite. A cluster can manage a certain number of workloads, and if you go beyond that, you must use multiple clusters.
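The pod-density point can be made concrete with a back-of-the-envelope Python sketch. It assumes the large-cluster thresholds given in the Kubernetes documentation: no more than 5,000 nodes, 150,000 total pods, and 110 pods per node. These thresholds are tested guidance rather than hard limits, and the helper name is hypothetical.

```python
import math

# Assumed thresholds from the Kubernetes large-cluster guidance.
MAX_NODES = 5_000
MAX_PODS = 150_000
MAX_PODS_PER_NODE = 110


def clusters_needed(total_pods: int, pods_per_node: int) -> int:
    """Minimum clusters so that no cluster exceeds the node or pod thresholds."""
    if not 0 < pods_per_node <= MAX_PODS_PER_NODE:
        raise ValueError(f"pods_per_node must be between 1 and {MAX_PODS_PER_NODE}")
    nodes = math.ceil(total_pods / pods_per_node)
    by_nodes = math.ceil(nodes / MAX_NODES)       # low density hits the node limit
    by_pods = math.ceil(total_pods / MAX_PODS)    # high density hits the pod limit
    return max(by_nodes, by_pods, 1)


if __name__ == "__main__":
    for pods, density in [(50_000, 20), (100_000, 10), (300_000, 100)]:
        print(pods, density, clusters_needed(pods, density))
```

At 10 pods per node, 100,000 pods already need 10,000 nodes and therefore two clusters. At 100 pods per node, 300,000 pods fit on 3,000 nodes but still exceed the total-pod threshold. Either way, as Gigi says, it's not infinite.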

Darin: [00:22:24]
So you said you're just now getting to multiple clusters, right?

Gigi: [00:22:28]
So we have multiple clusters, but they're all on GKE. Now we're starting to look into expanding to multiple cloud providers.

Darin: [00:22:35]
Are you planning on putting the same workload in both?

Gigi: [00:22:38]
So that's an ongoing discussion, because we want the redundancy, but the question is: do we need the full cross-cloud-provider redundancy for every workload, or is multi-region redundancy good enough, or maybe even more? Some regions are more critical than others, and for some, you can just bring them up if you need them. It's definitely not going to be a simple yes or no. It's going to be some combination, and we have to do it workload by workload.

Darin: [00:23:09]
Yeah. So if you've got your most critical workloads in a specific region, you're probably going to have multiple clusters from the same provider in that region, along with a secondary provider as well, just in case.

Gigi: [00:23:20]
Yeah. And the regions don't match exactly. They're at least not in the exact same geographical location, but I think for the most part that's going to matter only for a very small number of workloads, if any. The latency difference between running something in Oregon or LA, it's still West Coast, so that's typically not that important. But obviously we'll run our metrics, evaluate it, and see if there's something that's really important for us.

Darin: [00:23:53]
So again, the book is titled Mastering Kubernetes. The latest edition, the third edition, is out now. Go buy it at any of your favorite places to buy books. Buy it from an indie bookseller, if you can. I'm not saying don't buy it from the big people, but buy it from an indie bookseller. They need help too. Any other final thoughts or comments about Kubernetes? You're all in on multi-cluster if necessary, and you're moving towards multi-provider if necessary. What is the next "if necessary" that you think is going to happen?

Gigi: [00:24:31]
So the next "if necessary", and again it could be different for different companies, is I think Kubernetes on the edge. You have your big processing center somewhere in the cloud, but if you collect a lot of data from sensors, et cetera, across lots of locations, you may want to do a lot more local processing. It really depends on the situation, but I can see this as a very interesting and important use case for Kubernetes: running local processing, and then, after filtering and pre-processing, sending a digest, a much more compressed version of the data, back to your cloud provider, where you do the rest of the processing.

Darin: [00:25:15]
The link for the book will be down in the show notes and/or in the description, wherever you may be listening to this. Gigi, thanks for hanging out with us today.

Gigi: [00:25:26]
Yeah. Thank you. It was a pleasure.

Darin:
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.


Show Notes

Links from the episode

Guests

Gigi Sayfan

Hosts

Darin Pope

Viktor Farcic
