#102: Are you a rule-maker or a rule-breaker? Hopefully we all agree that having guardrails up helps us live a better life, whether personally or professionally. However, sometimes those rules get in the way of getting things done. Today, we take an introductory look at Open Policy Agent and Gatekeeper and try to figure out how using Gatekeeper can make not only our lives but also the lives of our end users much easier when managing our Kubernetes clusters.
If you like our podcast, please consider rating and reviewing our show! Click here, scroll to the bottom, tap to rate with five stars, and select “Write a Review.” Then be sure to let us know what you liked most about the episode!
Also, if you haven’t done so already, subscribe to the podcast. We're adding a bunch of bonus episodes to the feed and, if you’re not subscribed, there’s a good chance you’ll miss out. Subscribe now!
Viktor Farcic is the Open-Source Program Manager & Developer Relations (Developer Advocate) at Shipa, a member of the Google Developer Experts and Docker Captains groups, and published author.
His big passions are DevOps, Containers, Kubernetes, Microservices, Continuous Integration, Delivery and Deployment (CI/CD) and Test-Driven Development (TDD).
He often speaks at community gatherings and conferences (latest can be found here).
His random thoughts and tutorials can be found in his blog TechnologyConversations.com.
You remember in the old days we had Excel sheets with tens or hundreds or thousands of rules, and then we would have one person at the end of a six-month iteration going through all those rows in a spreadsheet, and you would be trembling over how many things he would find that would actually prevent you from going live. At least that was my story in the past. Now if those rules are executable, then we do not need that person. We need that person to create those rules but not anymore to enforce those rules.
This is DevOps Paradox episode number 102. Getting Started With Open Policy Agent
Welcome to DevOps Paradox. This is a podcast about random stuff in which we, Darin and Viktor, pretend we know what we're talking about. Most of the time, we mask our ignorance by putting the word DevOps everywhere we can, and mix it with random buzzwords like Kubernetes, serverless, CI/CD, team productivity, islands of happiness, and other fancy expressions that make it sound like we know what we're doing. Occasionally, we invite guests who do know something, but we do not do that often, since they might make us look incompetent. The truth is out there, and there is no way we are going to find it. PS: it's Darin reading this text and feeling embarrassed that Viktor made me do it. Here are your hosts, Darin Pope and Viktor Farcic.
Who enjoys rules? I don't enjoy rules.
Only those who make them usually.
Why is that?
Because I guess nobody likes being told what to do. Unless somebody else is told what to do.
Now you may be questioning why are you talking about rules on a DevOps podcast? That's a good question. We're wondering the same thing.
DevOps is about anarchy. Everybody does whatever they want, right?
As long as you do it in the way that the anarchist wants you to do it. Viktor has been playing with OPA, Open Policy Agent, and has found a number of interesting use cases, could we say?
Yes. I think that I see it slightly differently than many. For many, OPA is a way to define what cannot be done for their own usage, while from my perspective, OPA is a great tool to shift left. We have that constant problem, I believe, that there are some people on the right that know how to do certain things and they're afraid to enable people on the left to do those things because people on the left might have less experience in that area. Let's say if you're a sysadmin, you know Kubernetes in and out. If you want to enable developers to use Kubernetes, then you're afraid, Hey, if I let them deploy this and that, then they might mess it up because people on the left might actually just choose to run a hundred replicas of something and break my cluster or they might open a port to the outside world and that is not a good idea. If you look simply at Kubernetes RBAC, then what we can do is say, Hey, you can create this type of resource. You cannot create that type of resource, but that's more or less where Kubernetes itself ends. You cannot say you can create a service, but the service must be of this type. If you want to go deeper than you can or you cannot create this, and say you can create that but only if certain conditions are met, then you need something beyond what Kubernetes offers out of the box. In this case, that's OPA. Actually, I don't think that OPA is Kubernetes specific but let's say that for the purpose of this conversation it is.
Well, that's where I was going to ask you to take the rest of us, because I haven't worked with OPA at all, or at least not other than doing some reading. What is OPA? What's the purpose of it? You just said RBAC can only do so much, but then you threw the twist on the end that this really isn't Kubernetes specific.
I'm not sure, to be honest, whether OPA is Kubernetes specific or not, for a simple reason: I've been using it only within the context of Kubernetes. Theoretically, I don't see a reason why it wouldn't be applicable outside, but I'm not really sure. I guess that tells you how much I've been focused on Kubernetes for a while now.
Well that's fine. So, within the context of Kubernetes, let's keep the guardrails easy for you. What is OPA?
Open Policy Agent. It's a standard for defining policies and those policies can be almost anything. It's based on a, let's say, semi-language called Rego. Don't ask me what it stands for. I'm not really sure. Rego would be the language to describe the rules and the agent is the one that applies those rules. Then we have different implementations of that whole agent. The one I've been using the most is OPA Gatekeeper. Implementations can be many. That's the big advantage of making something very, very open, I guess.
So what's an example use case? What is the Hello World for OPA?
So let's say, just to stick with the example I already mentioned, you can say if the resource is a Service and the type is NodePort, deny creation of that resource. Then that type of rule would be created as an admission controller in Kubernetes, which means that it would be executed after Kubernetes receives a request to create something, but before that something is created. So admission controllers are like hooks that can be executed under certain conditions before Kubernetes creates the actual resource.
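The Hello World rule described here can be sketched in a few lines. This is only an illustration of the logic in Python, not Rego and not Gatekeeper's actual API; the function names and the shape of the returned decision are assumptions made for this sketch.

```python
from typing import Optional


def deny_nodeport(resource: dict) -> Optional[str]:
    """Return a denial message if the resource is a NodePort Service, else None."""
    if resource.get("kind") != "Service":
        return None  # the rule only looks at Services
    if resource.get("spec", {}).get("type") == "NodePort":
        name = resource.get("metadata", {}).get("name", "<unnamed>")
        return f"Service {name!r} is denied: type NodePort is not allowed"
    return None


def admit(resource: dict) -> dict:
    """Hypothetical admission decision: runs after the request arrives, before anything is created."""
    message = deny_nodeport(resource)
    return {"allowed": message is None, "message": message or ""}


service = {
    "kind": "Service",
    "metadata": {"name": "my-app"},
    "spec": {"type": "NodePort", "ports": [{"port": 80}]},
}
print(admit(service))
```

The person running kubectl apply would see the message and know why the request was rejected, without anyone reviewing the manifest by hand.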
If you were to compare it to Git, it's sort of like a pre-commit hook, I guess.
Yeah. It's like a pre-commit hook for commits. It's precisely that.
So your example there of not allowing NodePorts to be created is a good one, but you were talking about shifting left. What does that have to do with it? Because I don't care so much about the NodePort. That's just keeping people from doing stupid things.
But it's precisely that. It's keeping people from doing stupid things. If you now go back to the right side of the equation, sysadmins, operators. If I'm an operator, let's say, I probably do not need to create rules that will apply to me, because if I'm going to spend 15 minutes, half an hour, whatever, creating a rule that NodePort cannot be used, then I probably already know that rule myself. I'm experienced enough to know that I need that rule, and if I know that I need that rule, then I can just apply it without having the rule enforced in Kubernetes. I don't see how it could help the person who creates the rule. Now what I can see is saying, Hey, you, all of you to the left, do whatever you want, and if it happens to be something that is not allowed, you will receive a message when you execute kubectl apply or whatever you're executing. You look at the message: I couldn't create this object because of this and that. So it's like pre-commit hooks that help you avoid committing things that you shouldn't in the case of Git. Same thing in Kubernetes. To me as a rule creator or enforcer, it would save me from reviewing your code, because without something like that, I would need to go behind every pull request and merge it myself to double check that all those one, five, 10,000 rules are followed. Then I'm a bottleneck. We're going back to the problems that we tend to have continuously. How do we enable people to do stuff without them coming back to some central authority to review and confirm everything? From that perspective it is, I believe, an enabler. Hey, you can try to do whatever you want and the system will guide you towards doing the right thing. We do not know what is the right thing for you but we know what is the wrong thing.
In that case, if all we're doing is enforcing the negatives, we're stopping people from doing stupid things, why do I need OPA? Because I could RBAC most of that, right? Or is the phrase that I'm missing most of it and not all of it?
No. You can RBAC Kubernetes resources. You can say whether I can or cannot create a service, to stick to the same example. It's binary. You cannot create a service, or if there is no rule like that, then you can create a service. But you probably do not want to say, Hey, you cannot create a service because there is a chance that you will do it in a wrong way. That would be bad. Actually, that's what many organizations are doing. I cannot be sure that the service you will create is correct, therefore you cannot create a service, which is very limited. It's going back to shifting to the right and silos, because most of the time it's not enough. Actually, when I think about it, if I exclude service accounts and maybe only a few other things, I cannot see why everybody shouldn't be able to create almost everything, because that's not really the question. The question is whether what you're creating is not going to harm the system, whether it's not going to harm others, and a service itself is not going to harm anything. A service with a NodePort will. So service accounts work only on the level of create, delete, read, update, and maybe a few more operations on a certain resource. That's where it ends.
Not the type of service. And that's the reason why OPA is there: for the type of service.
It doesn't enforce which properties something needs to have. With OPA, you can basically configure it to do anything. You can say, Hey, the name needs to start with Viktor. Basically you can control any, well, I don't know how to say it, maybe parameter or argument or any part of a definition of a resource. So service account allows you to control resources and OPA allows you to control definitions of those resources, let's say.
Is this a use case? I have a service account and I'm ready to do my kubectl apply. Did you, as the OPA administrator, set up a rule set so that with this service account, number one, I can only do work in this namespace, and number two, there's basically a deny list of things that I am not allowed to create? I don't know if there's something else. Is that how it would work?
Both examples you gave would be service account, not OPA. With service account, you can say you cannot create a namespace. You cannot create this and you can only delete that but you cannot create this. You can read this. So read and write operations on the level of resources. With OPA, if you have permissions to create a namespace, let's say, and that would be service account, I can have OPA that says yes, but for you to create that namespace you need to specify memory and CPU of that namespace, and it cannot exceed 10 gigs. I'm inventing the rules on the fly.
That's okay. But I'm thinking this would be the other one, where you can do it but you also have to create an annotation that is your billing code, because we're doing chargebacks, right? It's the business things, not necessarily the technical things, and it helps enforce that. Okay.
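A business rule like the billing code follows the same pattern as a technical one: look inside the definition of the resource, not just at its kind. A minimal sketch in Python, assuming a made-up annotation key (example.com/billing-code) and hypothetical function names; it shows the shape of the check, not how Gatekeeper itself is configured.

```python
from typing import List

# Hypothetical annotation key, invented for this sketch.
BILLING_ANNOTATION = "example.com/billing-code"


def missing_annotations(resource: dict, required: List[str]) -> List[str]:
    """Return the required annotation keys that are absent or empty."""
    annotations = resource.get("metadata", {}).get("annotations") or {}
    return [key for key in required if not annotations.get(key)]


def check_namespace(resource: dict) -> List[str]:
    """Only namespaces are checked; every other kind passes untouched."""
    if resource.get("kind") != "Namespace":
        return []
    return [
        f"Namespace is missing required annotation {key}"
        for key in missing_annotations(resource, [BILLING_ANNOTATION])
    ]


unlabeled = {"kind": "Namespace", "metadata": {"name": "team-a"}}
labeled = {
    "kind": "Namespace",
    "metadata": {"name": "team-b", "annotations": {BILLING_ANNOTATION: "CC-1234"}},
}
print(check_namespace(unlabeled))  # one violation
print(check_namespace(labeled))    # no violations
```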
I think that we already spoke about Crossplane. That part is what fascinates me in terms of OPA, because we knew for a while that we can use those things to create Kubernetes resources. But how about if I can extend those rules to apply not only to what is running in my Kubernetes cluster but to basically anything? What if I could say, Hey, actually anybody can create a database as a service, Postgres or whatever your vendor is offering. Anybody can do this. You don't need to call me to do that for you. That's absolutely awesome. But then we get into a similar situation to the one we already mentioned. Hey, but who says that you're not going to create that database in a way that will cost us $100k a month? That's where OPA can jump in again, because I can control it. Hey, it cannot be bigger than this. It cannot be that. It cannot be this. I can fine-tune the rules beyond you can create this or you cannot create that. At least in my case, I'm using it exclusively on the Kubernetes level, but if I'm creating infrastructure as Kubernetes resources, then me using OPA on a Kubernetes layer can effectively control everything, not only what is running in that Kubernetes cluster.
And that was going to be my question. Crossplane I could just run on minikube, Docker Desktop, whatever, it doesn't matter. It doesn't need to be, in fact it's not meant to be, inside of the cluster, correct?
Minikube is a cluster, right?
Well, exactly, but what I'm saying is, for the places where my application would be running, Crossplane doesn't need to be in that same cluster.
No, it doesn't have to be in the same cluster.
Yeah, it could, but it doesn't have to be. What about OPA? Does it have to be in the cluster as well, or is it sort of like Crossplane, where it's got fingers into those other clusters?
It needs to be in the cluster where the rules are applying, so it cannot be cross-cluster. You cannot have OPA rules in one cluster that are affecting another cluster, at least not as far as I know. You could, for example, have a control cluster or a bridge cluster, whatever, that runs OPA, that runs Crossplane, maybe Argo CD and Flux, and OPA controls what is happening in that cluster. But if the tools in that cluster are affecting other clusters, then you're effectively controlling everything.
I was just trying to see if the fit of OPA and Crossplane was sort of command and control versus not, or some other angle. I don't know.
Yeah, so you would theoretically need to have OPA in every Kubernetes cluster where you're creating some resources.
Right and that makes sense. Now does it work like Crossplane? You feed it something and then it's applying it or how does it work?
It has templates and instances of those templates. You create a template that says which service types are allowed and then you have an instance of that template and say... sorry, that was actually wrong. You can have a template that says, Hey, certain service types are not allowed, and then you have an instance of that template that says the LoadBalancer type is not allowed, and maybe somewhere else another that says LoadBalancer and NodePort types are not allowed. So it's a template and then you can have as many instances of that template as you need. That kind of makes sense because Rego, the language that is used to create those templates, is a bit painful. Not really nice, but since it's split like that, it doesn't really matter, because using those templates is very, very easy and straightforward. For the majority of people, if you use OPA Gatekeeper, you would start by installing some 50 templates that they already have, and then you create rules based on those templates, and later on you might start creating your own templates.
Is OPA Gatekeeper the only implementation right now, or are there others out there?
There were a couple of others that I tried a long time ago and discarded. I don't remember anymore why. OPA Gatekeeper was a very quick win for me.
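The split between a template and its instances can be pictured as a function with open parameters and several bindings of that function to concrete values. The sketch below is plain Python written for illustration; the names Constraint, forbidden_service_types and forbidden_types are assumptions, not Gatekeeper's own schema.

```python
from dataclasses import dataclass
from typing import Callable, List


def forbidden_service_types(resource: dict, params: dict) -> List[str]:
    """The "template": reusable logic, written once, with the forbidden types left open."""
    if resource.get("kind") != "Service":
        return []
    # Kubernetes treats a Service without an explicit type as ClusterIP.
    service_type = resource.get("spec", {}).get("type", "ClusterIP")
    if service_type in params["forbidden_types"]:
        return [f"Service type {service_type} is not allowed"]
    return []


@dataclass
class Constraint:
    """An "instance" of a template: the same logic bound to concrete values."""
    name: str
    template: Callable[[dict, dict], List[str]]
    params: dict

    def check(self, resource: dict) -> List[str]:
        return self.template(resource, self.params)


# Two instances of one template, each with its own parameters.
no_load_balancers = Constraint(
    "no-load-balancers", forbidden_service_types, {"forbidden_types": ["LoadBalancer"]}
)
internal_only = Constraint(
    "internal-only", forbidden_service_types, {"forbidden_types": ["LoadBalancer", "NodePort"]}
)

node_port = {"kind": "Service", "spec": {"type": "NodePort"}}
print(no_load_balancers.check(node_port))  # passes: NodePort is not on this instance's list
print(internal_only.check(node_port))      # one violation
```

The hard part (the template logic) is written once by whoever knows the policy language; creating another instance only means supplying different values.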
I'm looking right now, and of course nobody else can see over my shoulder, including Viktor. There are a lot of integrations and use cases. Here's one that looks interesting to me. It's around Spring Security. It says authorization for Java Spring Security. That makes sense. There was another one that I saw, like a Gradle build plugin, but then at the same time there's one for Cloudflare Worker enforcement of OPA policies using WASM. It's all over the place. So this is a rules engine that can be made to do anything.
You remember in the old days we had Excel sheets with tens or hundreds or thousands of rules, and then we would have one person at the end of a six-month iteration going through all those rows in a spreadsheet, and you would be trembling over how many things he would find that would actually prevent you from going live. At least that was my story in the past. Now if those rules are executable, then we do not need that person. We need that person to create those rules but not anymore to enforce those rules. It's similar to how QA in the past would check your code, and now QA is more concerned about whether your coverage is not below 80%. If it's below 80%, your build fails. If it's above 80%, then you can continue. That's the simplified version, but it's the same logic. How can we apply certain rules so that people can find out immediately whether what they're doing breaks those rules, without involving the rule creator in the process?
Interesting. Who would have thought that rule management would be so fun or am I just weird?
That's going back to one of our old subjects. That's one of the big benefits of having a well-defined API and potentially declarative syntax to do stuff. Once we all agree on a certain API, it's relatively easy to create things like that, because if we all use the same API, then we can all use the same set of rules, whatever those rules are, and find out immediately whether we're doing something wrong.
And this is one of the cases. You were just talking about the Excel spreadsheet, but with OPA you can't do a UI as far as I can tell. I mean, I'm sure there is a way to do it, but okay, everything from an OPA perspective is code.
Yeah everything is. Now somebody could create a UI that translates fields and drop down lists into code. There can always be a UI that translates something to code which is basically what UIs do in general.
Well looking at what I'm seeing right now, it would be nice to have a UI to help write the policies
Yes, policies, yes.
To generate the policy and then apply the policy through our normal workflow of whatever that may be.
Yes, not for creating rules necessarily. Rules are very complex and really require coding, I guess, or writing them. But policies, yes, policies are straightforward once you have what in Gatekeeper they call templates.
Okay I guess I just got confused. What's the difference between a policy and a rule?
Actually policies and rules are the same thing. My bad. Template would be a template of a rule let's say without specifying specific values. Template would allow you to specify that certain types of services are not allowed and then a rule is this type of service is not allowed.
I have a feeling this may be revisited over time because a lot of people I know, especially just in Kubernetes, don't always think this way. They're thinking service account but they're not thinking beyond that. People that are new to Kubernetes are not thinking about OPA. They're just thinking service account because that's all that they read about.
Exactly. It's kind of the easiest thing to do. Hey, you are not allowed to do anything except to use this namespace. That's easy. Within that namespace you're allowed to create Deployments but you cannot create StatefulSets. That's easy, but that results in people not being productive saying what the heck. That's the reason why people want to use local clusters instead of real clusters or remote clusters because they're too constrained. But if you're more specific about what cannot be done then the scope of what can be done is much bigger.
And that's paradoxical, isn't it? The tighter you tighten the screws, the more freedom you have.
The more precise you are about what cannot be done, the more freedom you have. The easy thing would be: Darin, you cannot cross the street. That's easy, but then you would be miserable because you wouldn't be able to go anywhere except around your block. The right thing to do is: you cannot cross the street if the light is red. Then you have much more freedom to go where you want.
There are just certain times when you can't. There are prescribed paths that you can take.
Or you can go further and say you cannot cross on a red light unless it's not a very busy street and there are no cars on the left or on the right. The deeper you go, the more freedom those who need to follow the rules have.
Using that red light rule, if I'm standing there and there's a red light but there are no cars anywhere to be seen then I am taking responsibility for myself stepping out across the street. It's a risk because that doesn't mean that that rule may be taken away at some other point because I did break the rule.
Let's say that there are rules that must be followed and there are rules that are beneficial for you to follow but you're not forced to follow.
That's what I was trying to say. Now how we bring that back to Gatekeeper and OPA, I don't know.
No we can. In Gatekeeper, we can issue warnings and we can also deny certain things. We can say Hey this is a bad thing to do but go ahead if you want to and we can say you're not going to pass. This cannot be done period.
So the warning model gives you the ability to add in rules for things that you found out were bad over time. Let's say you didn't know that NodePorts were a bad thing. I'm not saying they're always a bad thing, but the vast majority of the time they are, and you hadn't put that rule in place to begin with. Well, you could start by adding a warning with a message saying, as of this date, we are not going to allow creation of that anymore, and add that as of this date, we are going to be removing all NodePorts.
Think of it as an engine where you say, Hey, when this thing is created, updated, deleted, when a certain action is requested on a certain type of resource, check whether certain properties of that something are like this and that, and if they are, then do this. Now the most common do this is deny. That's the most common action of OPA, but at least in theory it could be anything. It could be just printing out the message for whoever did it, which would end up being the equivalent of a warning, or you can say, Hey, I might actually create this other thing as a result of that. It can potentially be any action you want as a result of you failing the rules. The most common action is to deny creation of that something, but it can be anything else.
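The engine described here, where each rule carries its own action, can be sketched as a loop over (action, check) pairs. This is an illustrative Python model only; the action names, the replica limit of 10 and the function names are assumptions for the sketch rather than Gatekeeper settings.

```python
from typing import Callable, List, Optional, Tuple

Check = Callable[[dict], Optional[str]]


def nodeport_service(resource: dict) -> Optional[str]:
    """Flag NodePort Services; used below with the softer "warn" action."""
    if resource.get("kind") == "Service" and resource.get("spec", {}).get("type") == "NodePort":
        return "NodePort Services are discouraged and will be denied in the future"
    return None


def too_many_replicas(resource: dict) -> Optional[str]:
    """Flag Deployments above a limit; 10 is an arbitrary value chosen for this sketch."""
    replicas = resource.get("spec", {}).get("replicas", 1)
    if resource.get("kind") == "Deployment" and replicas > 10:
        return f"{replicas} replicas requested, at most 10 are allowed"
    return None


# Each rule is paired with what should happen when it is violated.
RULES: List[Tuple[str, Check]] = [("warn", nodeport_service), ("deny", too_many_replicas)]


def evaluate(resource: dict, rules: List[Tuple[str, Check]] = RULES) -> dict:
    """Run every rule on every request; only "deny" violations block it."""
    warnings, denials = [], []
    for action, check in rules:
        message = check(resource)
        if message is None:
            continue
        (denials if action == "deny" else warnings).append(message)
    return {"allowed": not denials, "warnings": warnings, "denials": denials}


print(evaluate({"kind": "Service", "spec": {"type": "NodePort"}}))  # allowed, with a warning
print(evaluate({"kind": "Deployment", "spec": {"replicas": 100}}))  # denied
```

Moving a rule from "warn" to "deny" once the announced date arrives is then a one-word change, which matches the gradual rollout Darin describes.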
Let me ask one more question, and I'll let this be the last question, and if the answer takes 30 minutes, so be it. Is it like firewall rules, where it's based on order? So if X, Y, and Z, allow, but if only X, deny.
Exactly. Something like that. So service account would be a firewall rule that just says port closed, and OPA would be a firewall rule that says the port is closed if it's TCP or if it's HTTP without TLS and if it comes from this source. So service account is like binary. You can create this. You cannot create that. Like a firewall. Port is open. Port is closed. OPA would be additional rules. When are those higher-level rules applied? Under which conditions can you pass through this port?
Going back to firewall rules, it's fairly typical that you deny all at the bottom, and above that, at the top, you have specified certain allows that apply if the request matched. So you're saying with OPA the typical rule to write is deny, and I'm just questioning, does it work like a firewall rule chain?
It works slightly differently. If I were to make an analogy with firewalls, I could say that the firewall would actually allow all the ports, and I would have something between those ports and the destinations, whatever the destinations are, that would validate whether that something is okay to pass or not, and then open or close the port depending on those rules.
Right. So always accept, but then decide what to do with it.
To define it slightly better: when a request comes to the port, do not let it pass and do not deny it. Wait for a microsecond, let me validate the rules, and then let it pass or not. So it would be validating rules for each request.
There we go. Okay. So this is our first step into OPA. I have a feeling it will be back at some point in the future. Are you using OPA? If you are, join us over in the Slack workspace and, on this episode, go ahead and put in your comments about how you are using OPA.
We hope this episode was helpful to you. If you want to discuss it or ask a question, please reach out to us. Our contact information and the link to the Slack workspace are at https://www.devopsparadox.com/contact. If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover this podcast. Go sign up right now at https://www.devopsparadox.com/ to receive an email whenever we drop the latest episode. Thank you for listening to DevOps Paradox.