Patrick 00:00:00.000 The parallel that I'm trying to form in my head, and that seems to stick with people, is that if your prompt is basically the new code, because you're prompting, maybe even talking to your coding agent, we should treat it as code. So how do we test it? How do we improve it? How do we keep it up to date? The funny part is, we said our job was going to get easier because it could instantly understand what we wanted. Guess what? We need to do better specifications and write more docs. And that's helping the quality.
Darin 00:01:38.429 Here we are at episode 3-5-0. That would be three fifty if you said it all as one word. And always on the three-fifties, or at least the episodes divisible by fifty, we have Patrick Debois on for us now. Patrick, hang on just a second. Before we get into anything else, Viktor has something he wants to say to you.
Viktor 00:01:56.409 Last time we spoke, it was a year, no more than a year ago,
Darin 00:02:00.519 more than a year ago. Yep.
Viktor 00:02:01.809 Yeah. You were giving us some story about AI and I was very negative. Kind of like, no, it cannot do anything. It's still a toy man. so I need to publicly apologize for everything I said, man.
Patrick 00:02:14.394 Well, you know, I know that was really hard for you,
Viktor 00:02:17.259 It was very hard.
Patrick 00:02:18.714 give you an applause.
Viktor 00:02:20.799 No, no, it's very hard because now I'm, I'm a person who doesn't have any application open except cloud code on my laptop. Kind of, that's, I, I, I don't use a browser if for anything but Gmail kind of. That's, That's, the only thing.
Patrick 00:02:35.019 It is scary, right? How our life has changed.
Viktor 00:02:39.189 Oh, yeah.
Darin 00:02:40.284 So let me set it up and then I'm gonna sort of step back and let Viktor and Patrick go at it. Patrick, if you don't know who Patrick is, he coined the term DevOps now decades ago. Let's call it that way. 'cause it feels like decades because now he's saying that the bottleneck isn't code anymore. It's context. Models don't matter. The tools don't matter. What matters is whether or not your team can capture and distribute the knowledge, not to humans, but AI agents, Patrick. Does that seem realistic of what's happening?
Patrick 00:03:14.649 It seems like every time I get on an episode three, you guys, I have a new narrative that you're picking up. It's like the test, whether the narrative sticks like 50 episodes or not.
Darin 00:03:26.735 Viktor will come back in 50 episodes again and say that, oh yeah, you
Viktor 00:03:30.545 no, no, no. If, if Patrick now agrees with you, we will have a fight.
Patrick 00:03:35.017 well, I know, I, I don't always agree with myself, so that's maybe all, okay. It is a narrative. Uh, I would say that. as users, daily users now, we cannot do a lot about the model. Yeah. You know, when there's a new model we say like, Hey, cheers, there's something improved or didn't improve and we can't do anything but like flip a switch. I think the only thing we can do is. Provide context to the agent. I think our last episode, we talked a little bit maybe on specs and spec driven and putting kind of like requirements in there, but I think that like whole area of providing, using context for verification, using, for different tasks for different agents, that kind of whole field exploded. And I think my, the parallel that I'm trying to. Form in my head, and that seems to stick with people is that, well, if your prompt is basically the new code, because you're prompting, maybe you're even talking to your coding agent. We should treat it as code. So how do we test it? How do we improve it? How do we keep it up to date? the funny part is like we, we, we say, our job was gonna get easier because it like instantly could understand what we wanted. Guess what? Like we need to do better specifications and kind of write more docs. And that's kind of helping the quality. Whether that's the complete solution, I have no clue, but it's definitely seems to be a part of making it better. as such, I dunno if you kind of had similar experiences, with that Viktor.
Viktor 00:05:13.752 Oh yeah. The, the major difference, at least in my case, is. I was trying to write specifications, detail specifications before now I'm not doing that anymore. I'm talking to my agent until I'm satisfied and then kind of, okay, write it down. Write it out. Right? I, I'm not writing them anymore myself. Right. I mean, when I, I, I mean, specifications of a specific feature that I wanna work on, that which is outside of my skills or prompts and all the other shenanigans, um, that I have, right?
Patrick 00:05:49.842 Yeah.
Viktor 00:05:50.172 think that that conversational beginning of first step is a difference, at least in my case, compared to what I was doing before,
Patrick 00:05:59.865 I personally am still a person of the so many years and having gray hair, typing with two fingers, and then even kind of crossing the two fingers. when I speak to my agent, I tend to be very elaborate so it picks up more context. It seems like, uh, maybe I'm doing a presentation or something, but it gets richer. in a certain form. Now, maybe that's just me, because if you're a good typer, you could probably type that like way faster. But that's kind of how it helps me. But I, I, I wanna maybe like, you know, start a conversation. You know, we, we ended maybe the previous ones on specs and rider requirements. but it's only one piece of context that we provide, a ticket on how to implement things or kind of what the issue was is also kind of context how we do things within our team. Our guidelines are ways of working that is context and an agent cannot like. Be trained on that because it's specific to our environment. And so that's why, you know, if we're not happy with the result, that's because and tropical or all the rest can't know how we want things. And it's one of those items where we're developers. If they're like the naysayers and the negatives, like, AI can't do this, can't do that. I try to flip the table and say, Hey, can you help me teach AI to kinda like describe it and then externalize that kind of, knowledge they have inside of context. Now that's all about creating context and that's definitely, you know, the, the predominant stage on. Giving instructions to the agent. once you have those instructions. You can use the same instructions to verify what they did. you typically do that in, different phases, but you can use that with your query requirements and maybe have another agent, a different, backstory. You're the QA agent, your testing things. So context not just comes in the create, but also kind of in the verification, layer as well. so those are like typically where most coders see the context happening. and the value of, writing context, as well. I dunno if you kind of use maybe some hooks or anything that you can verify things or after it did something, can you please check it again, whether it's still in sync with the specifications, things like that is, uh, definitely useful.
Viktor 00:08:25.832 Yeah, I'm having slight problems with that, to be honest, because I've, I end up after my conversation with it, uh, you know, with the detailed spec and those specs might change, not, not, might, will change throughout the, the development, right? Because I dunno everything I need to know, right? and I dunno what I want to make it worse, right? but the problem I'm having is that. Very often if I, if I put it to a separate agent to let's say, validate or QA or whatever, right? it is making the same mistakes as, as we, as humans are making, and that it is taking the, context given to the coder too seriously
Patrick 00:09:09.469 Mm-hmm.
Viktor 00:09:09.714 in a way, including the, the mistakes that are there in a way, and I, I, I'm looking for the. Value of discovering what's not there, rather than validating that what's there is, is done correctly. Right? Because I, I'm not sure that I get much value of separate agents, that are going to do some amazing work that the first one didn't do based on that spec.
Patrick 00:09:35.532 I think it's happening once you put it more in a loop. There's kind of an emergence. Now if you do this manually. You have a little bit of that back and forth, which is not like, you know, very pleasing to see because it feels like you're waiting and kinda like, you know, somebody's saying something wrong and, and going there, I think where the industry's landing on and, you know, new flesh, uh, for those who've heard, there's now the, the term called harness engineering. I dunno if you kind of heard that term come up, where you start complimenting. actually your verifiers with some context, with some actual tooling that is more deterministic. You're linter, you're putting things on there as well. So all of that is part of your verification, sequence as well, right? So it's a combination of things and sometimes. You feel that they're duplicating. I think what people have been trying is almost like they give the solution and then they give the critic, but they don't tell, like the only thing they can know in both, census is the criteria on what it kind of converges on. So the more clear your criteria are. Maybe with a test that is deterministic or something like that, it could be better verified, but I agree with you. If you let them lose, they're like consultants that make everything more complex. Security will add like 17 different rules. Ops will kind of say it needs to work on all machines. Queue will say, you know, on this browser and when you, I do this like un impossible quiz sequence, but in general it is not about being perfect and I think that. The balancing is about, getting feedback. but as we learned with security, if there's too much feedback, it's noisy and we, we zone out
Viktor 00:11:17.917 Oh, actually I'm, I'm not having a problem with Noisy, to be honest, because kind of, Hey, if you find 5,000 security issues, that, that's one of the things my fights with Claude, right? That uh, when I create a PR then Code Rabbit finds, I dunno, like 50 different issues. And Claude always tells me, Hey, let's implement the five important ones. No, no. Why not all, it's gonna take five minutes anyways. Right? Kind of like, I don't mind noise. What I'm missing, and I know that I'm asking too much, is more like kind of, of a mechanism that will kind of, does this make sense? You know, the, the things that you would start questioning as a product manager or something like that. Kind of, does this make sense? Is this a good, you know, some, something that is not in specs in a way, right?
Patrick 00:12:07.569 those hunches and that domain knowledge on how you verify is also a piece of context. Now, it might not be in the specs, but it might be about how you validate things and that's another piece. But you could start writing that down like, Hey, first you need to spin up this, or first this is the sequence, or kind of those things. And, uh, people, that's why I would say spec was probably. One thing and then people added examples that became a little bit of executable. But then the process itself is how you describe the process or the debugging or the testing is also actually also context. So you could do that like first manual and then as you say, save this as context the way I work. There are great ways to bootstrap and create, some of that context as well.
Viktor 00:12:52.742 No, I've, I've been loading information, let's say in a vector database. this is how we do this. This is how we do that. This is what happened in the past, blah, blah, blah. Go and find information relevant to what we are doing right now. Right. But is there some mechanism of. It actually effectively really well done, learn from what we are doing.
Patrick 00:13:16.642 Okay.
Viktor 00:13:16.982 Right. Because I, I feel that that's similar. Like, you know, you join a company and the company tells you, okay, there are 5,000 Wiki pages. Go read it and then start doing something and, and you just ignore it and you learn as your goal. Right. And interacting with your colleagues and what not. Right. And I feel that many of the. base how we operate are somehow applicable to agents as well.
Patrick 00:13:39.275 correct. you are what I describe in, you know, the, my context development lifecycle. In the inner loop, there's an observe, much like we look at like, you know, what's going on in production and can we learn from production, failures and can we learn from our tests specifically on coding, you've seen maybe, I don't know since maybe it was since we last spoke, Claude is starting to have more of a memory. Worry, it save things and like when things go wrong, it, starts like, putting them as learnings. I think, uh, it was Devin who championed it by saying like, Hey, this looks like an important decision. do you want me to save this as, a spec or something that I learned? Okay. So that's one way. that in itself becomes context, Because what you learn is context. people do this inside of their. Agent. That's one way. Now there's another way that like starts becoming popular is you actually, have a system in your hooks that specifically looks for learnings, which is another way like that is not really a memory, but kind of, it's a way of capturing or almost like interjecting yourself in the inner mechanics. Of finding things. And then the third place that I see more and more traction is people actually look at the conversations either by the logs or maybe through some traces or stuff like that, and they look where the user. Says this is wrong, or I want you to do it differently. and they're looking for signals, much like signals in observability logs. and then they say, well, you know. In this case, for example, the agents did the wrong thing, but they couldn't know because it was missing context. They prompt the user to add context or they get asked something conflicting about their context or their memory and they say, what should we do? so you start seeing this, within the loop. Now, I'm not saying this is. Within the loop. you can install a skill specifically that triggers, hey, if something goes weird or is, unclear to the agent, trigger a workflow to save this into, a context file. That's one way of doing it. And then the other way is kind of looking more at logs and luckily now much like, you are. Claude MD and Agent md, they're being standardized. Agent logs are being standardized as well, so you'll see more tools kind of mining that piece for K like insights of conversations even Claude Code. Does that as well. So it looks at the memory, but I've seen it occasionally actually grip past conversations to see that, like maybe it learned as well. Right. So that's kind of sources of conversations. Now, that source could also be in your. PR review from your code Rabbit, could be in your slack, could be in your ticket. They're all sources. If you hook them up to your MCP or various like ways of doing this, that could collect those learnings as well.
Viktor 00:16:51.192 Yeah, I, I saw the thing with cloud code is starting to use, uh, you know, start, it was a few months ago, we started writing things in random locations, to MD files. Just that I feel that MD files are not the thing for that type of context, right? Because it's potentially a context not necessarily tied only to that. Project and so on and so forth. So I'm guessing that that should go rather into some kind of database from where it can match it with the current context in a way.
Patrick 00:17:20.412 there's a couple of different things happening there. a lot of the coding agent tools want you to move to a synchronous and in the cloud. Where they can actually access more of these files directly, which kind of helps them kind of build up that internal database in the future. But right now it's marm files. Now you see maybe more in the generic learning, AI products that they're using a knowledge store, a memory store, externally, to your CLO code. Those solutions exist. You can hook them up now. it is very fascinating to see what it learns and what it doesn't learn. I'll give you an example we had internally is, you know, we have our front end guidelines and then we also have our React guidelines. but all of a sudden, like, which one is the most important? Like, it was both learning it, but like all of a sudden we needed to have a hierarchy and then. Is it because Viktor said it was important or because Patrick said it's important, what's the weight of the memory and who recorded it? So there's a lot of nuances that they still need to work out, in that perspective, and I think that's the challenge from going from. Solo coder to the enterprise or the multiplayer coder. and that opens up a bunch of problems. Most teams, for example, right now, they check in their context and maybe their cloud MD in their GitHub repo, which is good. How does the next team and the next project can use this? and that becomes a challenge. And then they say, okay, I'll copy and paste it. I'll make a link. And it's clunky. Now there's a couple of solutions for that. You can put this into a package because basically you're building a shared component and we learned you can just put that into a package. And a package has a registry. A skills registry, a context registry, you name it, as a way of reusing those components, which we can then version upload, install, manage tests, and go on. So that's the first step in the multiplayer, is kind like making the distribution kind of work, better than a copy and paste or a check-in in one project. That itself brings a problem of, it worked for you, it works on your machine, but does it work for me, My coding agent, my context, and so on. And that then leads to more testing and evals, as such. But that usually becomes a problem if you're, let's say. Either, if you're dedicated and you're doing a lot of projects, you definitely, as, even as a solo person, wanna reuse kind of context across, but maybe you're sim linking or you have a directory and that's on. But if you have like two, three teams, different people, you need to have a, a kind of way of better distributing, those context pieces as well. And then keeping them up to date.
Viktor 00:20:14.731 So is it about better distribution to my laptop or is it kind of moving more things remote? Right, because if, if, if I disconnect from AI and think about, forget about ai, right? We solve many of those problems by not running Jira on my laptop, right? But running Jira remotely so that somebody can manage it and make sure that it's running and up to date and all that stuff, instead of trying to figure out how to synchronize everybody's laptops in a way.
Patrick 00:20:45.751 yeah. Now there is a, just a part of the. Distribution protocol, whether I'm using an install or a copy or, I dunno, but, you know, think of this as, you know, for now, using a registry or kind of a marketplace to install things from now. The problem is also discoverability. How do you know which pieces are relevant for your project? So it needs to be searchable. It needs to be a way that maybe the agent itself can discover those pieces, and it's doing this dynamically based on the problem at hand. so you don't have to do the installation, but then you get into. Okay. Is it like verified content? if you run open claw, what skills is it downloading? We don't know. It's doing random things, So there's a, a validation registry ownership, and I think that's the point that you put, like ownership, who is the owner of this piece. Now, in a lot of cases, when you are very tightly cod. To a coding project. You are the owner of the context. Quite often, none of all context. Let's say your platform team has the, the guidelines on how to do deployment and your security team has some kind of. PI requirements and stuff like that. They could be the owner, but if you pull them into through the registry, they can publish, they can test, but you kind of get the benefits much like a traditional library. what do you do that on your own laptop or in the cloud? I don't think that's the differentiator, but because we made the distribution process so easy, it does ease the process of going to the cloud. Because otherwise we use different protocols just locally that are less friction, which is just a copy or a copy and paste. but it's one step towards that kind of background, coding agents, uh, in the cloud. And that you can do like wherever you are in Majorca, on your phone talking to your cloud agents. And that's kinda where, where they hope that you'd be going for. And then you can pick that up and some or somebody else can pick that up.
Viktor 00:22:51.512 Well, I like what you just said, but does that lead to a direction where actually we will not be running cloud code or you know, cursor, whatever we are using locally at one point. At least for coding, tasks and within companies that will be remote as well. In a way, you know, I connect to company agent and company agent knows, and I need to give it context that is specific to me or my project and kind of, and I'm done in a way.
Patrick 00:23:24.961 There was a lot of discussion about the IDE is dead. Let's run everything CLI, headless somewhere in the cloud, and then it comes back with results. I maybe, you know, one of the persons, and maybe I'll change my belief like you, is that you do need to review some of the code that is generating what's the best way of reviewing that code, even if it was buried by, by agents that is still an IDE because I can look at the code. whether that is done in the cloud or not, that's been kind of, I dunno how many years the debate was. every developer will just use a remote desktop, and a standardized kind of coding environments. There is something about developers that wants to make them own plugins, that want to have all rights for all testing, and I get that. And they wanna make this their own right, because they have different styles. could you do that in the cloud? Certainly that's not a problem. but maybe then you need to automate it because then every environment kind of gets it. I've seen another solution where the coding agents work all the time, kind of like in a factory, and they, they kind of have the whole harness working out, but when they're stuck, they're instructed to ask for a human. the solution is I think like Helix and Mel. what was brilliant I think is it's almost like the agents are coding headlessly in an IDE and when they need your help, you are connecting almost like the pair programmer. And you can immediately jump in. So it's not, I have my own language and they don't help me, but when they, when I want to, I can just jump in and be a pair programmer and make the decision and then go out and, and I, I, I really like that concept of being able to still jump in. with context awareness with my tools that I'm used to, instead of just having to rely through, like, you know, kind of chatting with my agent, as the only way, sometimes that's enough. Sometimes I want more precision. but the handoff and kind of the being able to jump in I, think, is still useful.
Viktor 00:25:28.951 Oh yeah, I mean, uh, I personally, I never reached a point of, one shot anything. so I, I, I need to jump in. I have no doubt. I'm just wondering kind of whether. Whether my terminal or my ID should be local together with the local agent, or they should be local and just kind of like work, remotely. Think of it like, uh, you know, SS aging into a server,
Patrick 00:25:57.190 Can I ask you whether you are actually running your clot code in the sandbox?
Viktor 00:26:02.281 Uh, no, not, I mean, I have very strict, permissions, what they allow it and not, but cloud code itself is not in a sandbox, whatever is allowed there, you are allowed to do, you know, you can commit, you can never push type of stuff.
Patrick 00:26:17.359 sure. But the fact that like once you do a planning and it gets to the execution. Do you say like, just implement this, but it can read other directories. It's not restricted to other directories. Right. So the whole sandboxing,
Viktor 00:26:30.059 it cannot, oh, at least in my case, it cannot read other directories without, uh, asking me.
Patrick 00:26:35.338 Sure, sure. If you say yes all the time. I guess that is true. But it's getting more and more tricky. They now implemented this auto accept mode where they assess the risk, as such. and then the other thing that is happening as well, so imagine you download any skills or your Cloud md, those are executed before you actually can say yes, no.
Viktor 00:26:57.232 Yeah.
Patrick 00:26:57.627 So there is still a risk of this doing things. Of course, you know, you have to have the scrutiny of not starting things, but once you install it, it just gets auto loaded. So there's various v factors of kind, like attack, I guess. the differences that, Claude comes with a sandbox, but then it restricts it to a directory. But, um, it needs to deal with where do I save my learnings from my sessions because it's restricted. So you get like little hoops you have to go through. Or how do I connect, to my network if it's a different IP address. We all know like running a dev container. It works. But yeah, it's kinda like that, like, you know, a little bit of annoying. I had like once a, and I'm really proud of the trick, so I'll tell it here on the podcast, is that I asked like, run in the sandbox and I actually give it to the port mapping and I say like, whatever you tell me, say the right ports that you've mapped. So it was a very seamless thing when it gave like URLs. Aware of how it was running outside that for me as a user, I could click it anyway. I find that brilliant of myself. So, but sandboxes is, I guess the link of kind of getting the headless and thinking about longer running sessions, closing your laptops, working in the evening, doing kind of maybe some chores. but yeah, it's not the same as CICD, but it is kinda like helpful as well for some tasks, so,
Viktor 00:28:23.867 depends, I guess, of the task, right? And more importantly, at least in my case, it depends on a task, but more on me than on. the problem is that I don't trust me rather than the agent at this point, in terms that very, there are tasks that I know exactly how I want this button to be green. Kind of like, yeah, just go do it. I mean, I trust you that you'll do it fine. I have zero problems. I go out to mode immediately. Right. But then for most of the features, it simply, I think I know, uh, from experience, I know that whatever I'm telling you right now is not correct.
Patrick 00:28:59.422 Mm-hmm.
Viktor 00:29:01.382 That's the problem. Right. And I'm going to discover it in the middle of the development because I will see. To have baked solution and realize that, you know,
Patrick 00:29:11.732 Yeah.
Viktor 00:29:12.092 not it. for me personally, it's harder to redirect it at the end to the correct solution than when I discover it
Patrick 00:29:21.944 Mm-hmm.
Viktor 00:29:22.499 in a way.
Patrick 00:29:23.639 Yeah. Yeah. I think you hit, a really good point on. Part of the coding is not just cranking out the feature and translating that beautifully, but kind of the journey or like working the problem and understanding the problem and the architecture, to do that as well. And that kind of progressive, almost like, you know, human progressive discovery of what is it and how do I want it to build, is hard if you do this asynchronously as well. but some tasks, you know, exactly. Like, treat it, come back to me. I'll just review it. and that's fine. The other part is maybe, and that's not just on. Executing it and giving a result. Is the, discussion about auto merge, yes or no? I think GitHub recently changed that you can set a policy about auto merging, even though humans have to do peer review. And they got like backlash from a lot of companies because, we have like, you know, that PR approval process, but agents, they can't do that. Like they're stuck. So they had to come up with, you know, we're working around for the policy, uh, on there. and for that I would say it's an interesting thing, uh, much like that auto, except from Claude. You can put heuristics, but it all depends on your risk appetite. How much are you at risk if something changes and is not correct? Do you limit it maybe to certain directories like this is the front end, this is the back end. There's like certain things that can happen, you run the tests afterwards and stuff like that, but it's all about judging the impact. And I, I hear a lot of narrative around harness engineering in the dark factories. But that's nice for toys. If you're impact and you have zero users, just do it. Right? You know, kind of nobody cares. But if you're a bank, you, you want more scrutiny. And the way that I think about this is that depending on the problem you're trying to tackle. Project manager or product manager, you use a different management style. If you really care, you want everybody to be in the war room and you're like directing everybody very closely. Micromanagement, If you're relaxed, you have your test harness. Life is good. You know, kind of like everything's in place Impact is minimal. You know, you can be in a beach and be relaxed and, you know, lead by exception, that's fine. So, but, but that's kind of that balance. And I find it fascinating on the one hand that like, I love how we get better at the automation part, but I never believed this could be the perfect thing given the unpredictability, of Jenny, I I don't see that like match and marry itself together. unless maybe we have some formal verification mechanisms or something like that, but that's, then it's not deterministic anymore.
Viktor 00:32:23.469 you know that that's the part that I'm not sure anymore, to be honest. In the past, I would completely agree with you. If by unpredictability of AI you mean unpredictability, that it'll do what I told it to do.
Patrick 00:32:37.245 Mm-hmm.
Viktor 00:32:37.950 I feel that we are now very close, if not to the point where actually it's doing exactly what I told it to do.
Patrick 00:32:45.765 Mm-hmm.
Viktor 00:32:46.550 It is just that I don't really know It ju just like, man, you know, remember back in the day decades ago, like when managers playing half a year project and at the end of that half a year. Developers implement exactly what the spec says, and that's nothing to do with what, it should be. Right. personally, I have bigger problem with me explaining what I really, really want than, complaining how AI hallucinates, oh, I ask for a green button and I got the yellow button. I, I don't remember last time that happened.
Patrick 00:33:19.221 That's true. But it, it, you know that describing everything at almost at nauseam. is very tedious, right? So, and when there are certain things you don't know because you're still discovering certain pieces, right? So how do you know that you need to express that? So there's always a window of opportunity. Where it could do something that you kind of don't understand and it didn't do. you have that same thing with humans. But then it is a different way, in that like there is a risk analysis. Maybe there is certain other pieces, and you have to rely on the human. But, uh, personally, I still feel conflicting and I can see there's certain pieces of my. Deployment that could go automated where I know exactly, hey, CSS maybe like, you know, two color changes, like within the range of colors or something. That's fine. I can do a parer or I can do some validation around that. It's not gonna like blow up my data and I think data is kind of the risky part, and security and so on. But, but that's kind of like, I dunno, I'm still. Not confident enough in the way I would do it. and I think some of the auditors have actually kind of proving that, you know, at AWS, uh, versal and so on. So, you know, kind of, uh, I think that's still on the fence, whether we'll reach that yes or no.
Viktor 00:34:41.607 one of the things I feel I'm missing, or maybe I'm not informed how to do it outside of skills for which I don't think work well in, in, in the scenario. I'm going to say that's somehow to define which type of code matters in which doesn't.
Patrick 00:34:58.791 Mm-hmm.
Viktor 00:34:59.551 What I would really like is, let's say I'm defining some feature for a backend that has API and it has some backend code that will, you know, implement that API, whatever what I would really like is, hey, this type of code, like API. I need to review it. Stop when you do it. Kind of like, stop there. Let me check it and get the setters in Java. You can just write on your own. Kind of, don't, don't even ask me. Right. Kind of. I, I know that that works if this part is done right. If I could somehow define what matters and what doesn't. I feel that I could go much faster, I cannot define it with permissions because it's code, right? It either can write to a file or it cannot write to a file. had very bad experience with skills defining kind of way. I stop here if it's this and stop there. Maybe I'm not doing it right. But that's, that's my wishlist in a way.
Patrick 00:35:59.580 one way that I mentioned was. You cannot limit it writing to a file s and o. But, people have been using, different directories and allowing agents to change certain pieces in certain directories, with a different kind of acceptance, way, So you could say, you know, pause in the hook, like pause perfectly, these files without like, uh, the permissions. But those things, always push back for like a human verification. it reminds me actually of. I once had that startup, for interactivity with TV shows, you only had one hour for the show. our test suite run long, longer than the show, but we got the requests coming in to do changes. So what we actually did, we labeled. Uh, what with annotations basically are tests and we're able to filter and skip some of those parts to actually complete part of our suite as well. And then kind of more dependency. So I can see maybe some of that like being used by almost like annotations in your test suite. Like these are important, these are less important. You could do the same thing with, with, with maybe like tagging your files, your code files or your functions. But What you're basically doing is, assigning a risk score. And the AI can help that maybe with some tooling and say, Hey, does the impact, everybody uses that. And what about we bring in production data, like a million users are touching this every day, or something like that. Anyway, maybe there's, uh, opportunity there, to do that. I found one tool I was really happy about it, that instrumented your code. did a wrapper around all your functions and then reported back almost when it was running in production. Now most of the tools just kind of say there's a failure and they report the output, but the fact that you were able to trace that back to the actual code, you know, like a, a sent or something in the front end, but then kind like four other pieces. Allowed kind of the agent also to feedback that same, context of it being failed now if you add them, the usage and all those species as well as context for the agent to kind of say, Hey, do I need a review? Yes, no, it's not gonna be perfect, but it could be probably already be useful for you.
Viktor 00:38:17.677 Have you seen usage, heavy usage of feature flags in terms like kind of, Hey, do this, make it all behind feature flag. and let's go we can disable it or we can enable and make it disable by default, right? Or something like that.
Patrick 00:38:33.511 the way that I've seen it is more of a, you know, still the developer choice. Uh, whether it's like a feature flag that you wanna do in a certain release, yes or no. I haven't seen this done. Let's say by the agent, but I can imagine a gradual rollout being part of like, almost like a, I forgot what the term is. Uh, not the red green deploy, but something similar to that, as well. so maybe that's what you're looking for. I think it's an interesting way of almost parallel deploys as well, to do it like, you know. Depends a little bit on your setup, how the feature flags works, but it, you know, as we all know, feature flags get messy quite fast in the code base.
Viktor 00:39:15.928 Oh Yeah.
Darin 00:39:16.975 If somebody's just getting started with coding agents today, what's the one or two things they should do? First I wanna call out a couple of things that I think are probably undersold You've mentioned hooks on and on and on, and I wonder how many people actually use hooks. forget about using 'em hooks using hooks, correct way. And then the other thing I think has to be on your list is sandbox. I have to admit, I don't sandbox anything and my body now tells me you should be sandboxing this because you were talking about the auto accept mode. That's how I run on 90% of my projects.
Patrick 00:39:55.338 I think if I were to give the advice is that whenever you, your agent is not doing what you want, write it down into your Agents.MD and into your context file. That's the number one thing. Don't just prompt, ephemeral and then throw it away and do the same thing again. Start building that context. File that Agents.md where whenever you see something wrong, you write it in there. That's the reflex. that I want people to have instead of saying AI is doing and can't do the things I do like teach it. that's your new job. It's kinda like, write that down. hooks is more of an advanced way of interjecting things. It's a little bit more, I would say, in the engine, but you can do amazing things. It's like the Swiss Army knife where you can hook into like a lot of other things and the sandboxing. I think a lot and lots and it's, it will be definitely the minority who runs it in the sandbox in their day-to-day coding environment. But if you get bolder and you are doing things in specs and you're not babysitting every turn. That is really a must, you cannot do without it, in that way because if you're not there, you can't even pull the brakes, right? and it's more for that, that I believe than, but occasionally it does delete anything on your file system if you want, that you didn't anticipate. But you're right that it did get better. And so people just say, it didn't happen today. It will not happen tomorrow. Ah, okay. You know, we'll see about that.
Darin 00:41:35.149 Until it happens tomorrow. What are the mechanics behind getting something running in a sandbox? I mean, it's not a new concept, but if the mechanics feel new to me.
Patrick 00:41:45.124 Hmm. So, because a lot of people shifted to the CLI, putting this in a terminal. Sandbox is not that hard. Now there's a little bit of a hoop because all the agents want you to now have an odd sign in. So all your separate sessions and kind of you need some persistent storage and go from there. I think the challenge of making it safe for an agent relies on all kind of the usual things. What about your secrets that it can read? Do you want to expose it? On the one hand, it needs it, but on the other hand, all the scripts that it executes don't need it. So you need a way of. Correctly passing scripts, or kind of, uh, secrets and clearing context from maybe sub, shells and so on. for example, OpenAI Codex has ways of saying, do you want to copy the environment? Yes or no? So there's quite some more of those things. So the signup and kind of the environment variables and security. Now for the rest, disc security is pretty standard. if you let it like really go, on my trials it started to exploitate data over. DNS, you know, kind of if it, you know, it, it's very resourceful if it has to. that's probably, because I was really probing it, but making it really safe is the usual, you know, don't let it blow up my memory. Don't let it blow up my CPU and so on. What is missing is the context security. Like, oh, I load my file, somebody already installed a file, how do I know? And. It almost feels like you need, the equivalent of a web application firewall for context. you could say part of this is prompter injection validation, but there's a little bit more of that now. I don't think the industry is there yet, so, you know, tough luck. You, you can't do it. The fact that you're already constraining your file system, is probably the most, effective right now and probably the easiest to do, uh, when you run it in the container as well.
Darin 00:43:50.289 What are we gonna be talking about in 50 episodes? You're a sheep farmer now. Is that what it is?
Patrick 00:43:58.029 Oh,
Darin 00:43:58.239 50 episodes.
Patrick 00:43:59.469 in 50 episodes, um, I have a strong interest, but I don't know whether that the interest is going there from the industry. I think in my LinkedIn it says Code is moving to context and then I eventually hope this becomes knowledge in some form. if we start managing knowledge and just instead of just files and kind of like a lot of context, but kind of can treat this as more learnings within the whole organization, I'll be really happy. What are we going there? I dunno that I can't predict the industry so.
Viktor 00:44:36.394 here's a question on that. Assuming that everything we do ends up being code, you know, you don't execute random commands. You, I dunno, you use GitHub or, you know, things like that isn't code knowledge if everything is, ends up being code, isn't that the knowledge in a way that it needs? Or is it that, is that too much of a knowledge to process in a way?
Patrick 00:45:03.035 the knowledge I'm hinting at is more organizational knowledge and collecting kind of how we do this. What's our differentiator, what's our market? Uh, how do we beat the competitors, how we do our UX better, uh, stuff like that. So it's not per se, on the coding. It's like everybody has an iPhone and everybody can take pictures, but who can take the best pictures? I can take, you know, pictures automatically, but will I compete with artists and people getting really paid for pictures? Probably not. Right? So, and I feel that's a little bit of the equivalent.
Viktor 00:45:38.803 True. But you know, let's say that this, you mentioned design, let's say you Apple, right? And yeah, there is a lot of knowledge design related. But on the other hand, if I really analyze every single product that Apple made, I have their formula in a way, right? Because it is in, it is ingrained, in this case it's not code, it's physical hardware, right? But kind of that knowledge somehow is translated to the end result as well.
Patrick 00:46:09.268 Yeah, I think I already mentioned a little bit when you were talking about memory, how messy this is, how transient this is. You know, product A, product B, you get conflicts. So your job is to become like, you know, that kind of surfing, that knowledge. What is relevant? How do I wanna steer this, uh, what's important and how do I do that? the technical bits of making the software. that's one piece. But maybe, you know, those will be like on equal arms where everybody who uses the same kinda like libraries and so on, but then it becomes like, how do you make a twist now? Shocker to all your technology and me included. The business is not about the technology differentiator. It is about your network, your distribution, the trust that you build, and so you're like one piece in the whole chain to kind of make that happen. Now it's still important that you kind of maintain and you implement the ideas and you are responsible almost for that pipeline, uh, on the execution and, and how it translates on these things and, and maybe do some maintenance. but it's one of the pieces, and maybe the moat is reduced by being the differentiator on. You know, the best technology solution. But for those remembering, video 2000 and VHS, it's not always the best thing that wins.
Darin 00:47:35.559 So again, Patrick will be back again in 50 episodes. Uh, he'll either be a sheep farmer, or he will be the very first person to own a $1 billion one person company. I,
Patrick 00:47:46.569 Yay.
Darin 00:47:47.904 I, I, I think either one is, is very possible. It can be both. You can be a sheep farmer that has a $1 billion company. Uh, that's great Patrick. Thanks for coming on with us and we'll see you again in 50 weeks.
Patrick 00:47:59.734 Always a pleasure. Thanks for having me.