DOP 309: Using AI Agents in Daily Development Tasks

Episode: 309

Published: July 30, 2025

Viktor 00:00:00.000 So what made me flip is when I reached the point where I said, okay, this is 5% faster. Let's go small, right? Or 10%. I'm not talking about, hey, I'm a 10x engineer now, but I am faster. I finish something faster with it than without it.

Darin 00:01:22.520 So here we are. It's July of 2025. If you've paid any attention, you know my life has been turned upside down this year. We're starting to record some new episodes, and I figured, you know what, when I left, all this stuff was going on and Viktor was still, what is AI about? And I return and all of a sudden this is the only thing he's doing.

Viktor 00:01:48.107 Yes. I wasn't, what is AI about? I was, why the heck would anybody use this?

Darin 00:01:54.527 Right, and that's what we're going to sort of delve into over these next 4, 5, 6 episodes that are just me and Viktor. Now, typically, me and Viktor are every other episode from a guest, so it won't be back to back to back. So you'll get to see those or hear those as they come out. But for this first one, I want to rewind back to January, February, again, before my world turned upside down, and go back to that point. We had GitHub Copilot. We had a handful of other things. We had Windsurf, we had, I've lost track of everything that was outta Cursor, but it just seemed like a bunch of toys. Am I wrong in that thinking?

Viktor 00:02:38.957 That was either before or just when agents started being a thing. Back in that time, if I remember, and I might be off by a month or two, right? But if you take Copilot, you could chat and you could edit. You couldn't run it in agent mode where it just goes crazy. Oh, let me check the files on your file system. Let me execute this command. Let me go and check what's happening in your GitHub. Oh, blah, blah, da da da da. And then after the 5,000 things it did, it says, I should change this file. The period you're talking about is before agents or when they were just starting.

Darin 00:03:21.022 Right, and again, basically I've been out since the end of February. Here it is, we're recording this on July 1st, so that's not that long. March, April, May, June: four months.

Viktor 00:03:36.957 It's nothing in calendar days. It's everything in terms of how fast things are moving right now.

Darin 00:03:45.795 But are they moving in the right direction? Let me set this up here a little bit too. I just saw yesterday, I believe it was June 30th or June 29th, 2025, Gartner put out a report saying that 40% of agentic AI projects are gonna be canceled by 2027.

Viktor 00:04:08.812 Yeah, and they should be. I think that more should be canceled. Yes.

Darin 00:04:11.962 Okay. It said at least 40%, if I remember correctly. If I remember, I'll put the link to that news release in the show notes. But again, that's agentic AI, so let's scope it. What we're gonna be talking about is just what we do, development or that kind of thing.

Viktor 00:04:28.802 Let me just quickly comment on what you just said about Gartner or whomever it was. What is happening right now, actually, what happened a while ago, is that companies were investing heavily in models and then they realized that that's a fool's errand, right? That most of them are never going to reach even a part of the capability of existing models. And then they said, okay, so I'm not going to build my own model. I'm going to use Anthropic, I'm going to use OpenAI, I'm going to use Grok, or whatever. And then if you go back in time, I'm pretty sure that there were similar news like a year ago about how the investment in models is dropping within enterprises. Now what is happening with that report is that companies are realizing the same thing for agents. Oh, agents appeared, I should build my own agent. Excellent. And then you face reality. The reality is that, let's say this is an agent for coding, you're likely not going to build as capable an agent as, uh, Claude Code, and then the investment in agents drops. But that does not mean that the investment within enterprises in AI in general is dropping. Does that make sense?

Darin 00:05:43.591 That does make sense. I agree with that. But for our conversation, I'm just trying to step back into the world. Again, my world was upside down and now I'm stepping back into it and it's like, who am I here with? It's like everything has changed in four months, four calendar months.

Viktor 00:06:09.936 Yes.

Darin 00:06:10.466 And it feels like I am completely lost. I mean, I knew some of the things before. It's sort of like when you step away from something. Like, let's do this: you and I today in 2025 are not DBAs, correct? Unless you've got something going on on the side I don't

Viktor 00:06:29.636 I'm not a DBA. I know about databases, but I cannot call myself an expert. DBA? No.

Darin 00:06:35.383 Right? So when we think about running SQL, it's like, oh, we can do the basic SQL stuff. You'll never forget your first select count star, right? You'll just always remember that.

Viktor 00:06:47.068 Yes.

Darin 00:06:47.698 And you'll remember your GROUP BYs, but now if we step back in, it's like, I don't even have to write that anymore. It's like when ORMs showed up, like Hibernate or some of the other ones. I didn't have to write anything anymore. It just automatically created it all for me. That's where I feel like I'm at right now. It's like, okay, I was used to writing SQL, putting it in my code, and then all of a sudden an ORM showed up and I don't have to write that stuff anymore. That's where I feel like we're at in the AI world. Yes?

Viktor 00:07:16.105 You don't have to write anything, but depending on what you're doing, the level of commitment you need to put into checking what is written varies.

Darin 00:07:26.410 Okay. Stop. That's where I want to go, because I'm trying to learn this. I'm trying to get back up to speed. This is as much for everybody else that's listening. Viktor and I are basically having a conversation right now because I'm really trying to get my head around what's happening in the world. What does that level of commitment look like? Because I've seen some of the prompts that you've written, and I'm going, how did he have so much time to write that? I don't understand. I mean, what is your thought process? Actually, we'll start with this question first. What made you flip, while I was out, from this is a toy to I can't imagine my job without doing this?

Viktor 00:08:08.646 What made me flip is when I reached the point where I actually save more time than I lose with AI. Now, many people will tell you, hey, you know, you have those stories in blog posts and videos from a year ago or two years ago. Oh, this is now a 100x engineer, right? I can build a Tetris-like app in 10 minutes. That was never the real world. That's why I was negative. In the past, I had to spend more time with AI than without AI when working on an existing app that has a certain complexity, where you cannot just decide to change the framework for testing. You cannot make random decisions, you need to follow the rules of that application, and you need to figure out what's in hundreds of thousands or millions of lines of code, right? Within that context, until, let's say, the beginning of this year, I would spend more time with AI than without, and then things changed. It improved. So what made me flip is when I reached the point where I said, okay, this is 5% faster. Let's go small, right? Or 10%. I'm not talking about, hey, I'm a 10x engineer now, but I am faster. I finish something faster with it than without it. And that's the same rule that applies to anything else, right? Give me an IDE, a new IDE. If I'm faster with that IDE, I'm going to switch from whatever I'm using, right? That's how it works.

Darin 00:09:46.064 But you're not using an IDE, except when you are.

Viktor 00:09:49.684 Oh, that's now a special story.

Darin 00:09:52.399 Okay. Let's hear the story.

Viktor 00:09:54.617 First, I used VS Code with Copilot. That was the mistake that kept me away from AI. Let's start with that. It's not good. It's getting better, but it's not good. It's not the choice, right? Cursor: amazing. You're more productive with it, but it still makes a bunch of mistakes. And then I discovered Claude Code, which is terminal based. And it's not a pleasant experience, right? Because, you know, there are things in an IDE that simply make your life easier. How you see the code, how you see the diffs for changes. There's a bunch of things that most of us like about an IDE as opposed to writing in, let's say, Vim, right? But then I went back and forth between Cursor and Claude Code and, coming to today, I'm completely in Claude Code. It's so much better as an agent, and that just proves the importance of agents, just to be clear, right? That I'm willing to give up all the benefits of using an IDE because a terminal-based agent happens to be so much better in results. So the better results outweigh the benefits of an IDE. So I'm now fully in Claude Code. I'm not in Cursor, I'm not in VS Code or any of those things. I'm terminal based, and not because I'm a terminal guy. I mean, I'm a terminal guy for operations, but not for code. I never liked the idea of coding in a terminal, but it's just so much better now. And now comes the important part. In my opinion, Claude Code, a terminal-based agent, is significantly better than Cursor, an agent inside of an IDE, which is a fork of VS Code, right? Both of them are using exactly the same model. Both of them, in my case, are using Sonnet 4, and that means that the agent itself matters a lot, right? If the models are the same and the results are different, and the only difference is a different agent, then that's almost like proof that there are differences between agents. It's not just, oh, which model is better?

Darin 00:12:12.037 Right. But even if you've got two different front ends, for lack of a better term, going to the same backend, we already know that models are non-deterministic. It doesn't matter if you're using Cursor or Claude Code or VS Code, whatever. It doesn't matter, because I would expect the answers to come back differently. But what you're telling me is I should expect the answers to come back the same. Is that true or

Viktor 00:12:36.988 No, no, no. The answers are going to differ, right? That depends also, yes. But the quality of the answers differs from one agent to another: how it collects code, how it keeps context, and so on and so forth. And one of the big differences is the price. Uh, first the design of the agent, but also the price, because Claude Code just eats your tokens like crazy, right? Cursor works on a subscription. I think it's 20 bucks a month or something like that, right? Uh, eventually it'll give you slow requests, but it's a one-time subscription. Now, Cursor has every interest in compacting that context all the time, right? To ensure that as little is sent to the model as possible. So imagine that it went through a thousand lines of code, and there are a few thousand lines of text from you talking with it, and so on and so forth. Claude Code will send all of it to Sonnet, the Anthropic model, while Cursor is going to compact it as much as possible so that the amount sent there, which translates to tokens, is smaller. That's not the only difference. I mean, simply, the Claude Code agent itself is better, but that's one of the differences as well.
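
As a rough illustration of the compaction idea Viktor describes here, the following Python sketch keeps the newest messages that fit in a token budget and collapses everything older into a one-line placeholder. It is not how Cursor or Claude Code actually manage context. The names Message, estimate_tokens, and compact are hypothetical, the 4-characters-per-token estimate is a crude assumption, and a real agent would ask the model to write the summary rather than insert a stub.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def total_tokens(history: list[Message]) -> int:
    return sum(estimate_tokens(m.text) for m in history)


def compact(history: list[Message], budget: int) -> list[Message]:
    """Keep the newest messages that fit in `budget`; collapse the rest."""
    kept: list[Message] = []
    used = 0
    # Walk from newest to oldest so recent context survives.
    for message in reversed(history):
        cost = estimate_tokens(message.text)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    dropped = len(history) - len(kept)
    if dropped:
        # Stub: a real agent would ask the model for an actual summary.
        note = Message("system", f"[summary of {dropped} earlier messages]")
        return [note] + kept
    return kept


# Ten large messages, then compaction down to a much smaller budget.
history = [Message("user", "x" * 4000) for _ in range(10)]
compacted = compact(history, budget=3000)
print(total_tokens(history), "->", total_tokens(compacted))
```

The trade-off matches the one in the conversation: sending everything costs more tokens but loses nothing, while compacting is cheaper but discards detail the model might have needed.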

Darin 00:13:57.727 So let's rewind for just a second. You're throwing around the word agent like I really completely understand it. Now, in other contexts, agent makes total sense, but the way you're using it right now is confusing me, and I don't know if it's confusing anybody else that's listening. So what do you mean by agent in the context of Claude Code and Cursor? I mean, I'm sort of getting lost now.

Viktor 00:14:21.369 So if you take us as an analogy, LLMs, models, would be the brain, right? They contain information. You ask it and you get the answer. Agents are doing stuff. They're like arms or legs, right? Hey, um, you wanna fix a failed test? And I give it information about the failure, right? Or an issue in GitHub, right? I give it information. So what does it do? It combines that with communication with the LLM: so what do I do with this? And the LLM tells it, oh, well, check the code, right? Find the instances of that method. Excellent. And then the agent goes through the code, finds all the information it needs, maybe lists all the files in a directory, maybe goes to GitHub to get that issue in the first place, maybe checks the commits, right? It performs operations, primarily, and a significant part of those operations is generating context that is sent to the LLM. And then it sends to the LLM: hey, this is the issue. This is the code that matters for that issue. This is the file system, this is this, this is that. Tell me what to do next. Think of it as, uh, something that performs operations. And it's synchronized with models. Basically, models provide the information it might need.
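
The loop Viktor describes (ask the model what to do, perform the operation, feed the result back as context, repeat) can be sketched in a few lines of Python. Everything here is a stand-in: fake_model is a hard-coded function instead of a real LLM call, FILES is a made-up in-memory project, and the tool names are hypothetical. It only shows the shape of the loop, not how any particular agent is built.

```python
from typing import Callable

# Hypothetical in-memory "project" so the sketch runs without touching disk.
FILES = {"calc.py": "def add(a, b):\n    return a - b\n"}


def list_files(_: str) -> str:
    return ", ".join(sorted(FILES))


def read_file(name: str) -> str:
    return FILES.get(name, f"no such file: {name}")


# The operations the agent is able to perform (its "arms and legs").
TOOLS: dict[str, Callable[[str], str]] = {
    "list_files": list_files,
    "read_file": read_file,
}


def fake_model(context: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: picks the next action from the context so far."""
    seen = "\n".join(context)
    if "list_files ->" not in seen:
        return ("list_files", "")
    if "read_file ->" not in seen:
        return ("read_file", "calc.py")
    return ("done", "add() subtracts; change `a - b` to `a + b` in calc.py")


def run_agent(task: str, max_steps: int = 10) -> tuple[str, list[str]]:
    context = [f"task: {task}"]
    for _ in range(max_steps):
        action, argument = fake_model(context)   # ask the "brain" what to do next
        if action == "done":
            return argument, context
        result = TOOLS[action](argument)         # perform the operation
        context.append(f"{action} -> {result}")  # feed the result back as context
    return "gave up", context


answer, trace = run_agent("fix the failing test for add()")
print(answer)
```

Note that the context list grows with every step, which is why the amount of context sent to the model, and therefore the token cost, depends so much on how the agent is designed.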

Darin 00:15:45.644 Okay, that makes sense. So it's just a different tool that I'm using. And now I'm not using tool in the AI phrasing or this whole agentic thing. I'm just saying a hammer or a screwdriver or something.

Viktor 00:15:59.011 Essentially, you have three important parts. The model: that's where the information of the whole planet and the whole history of the planet is located. You have agents that can perform operations, right? Whatever those operations are: gathering data, doing something, whatever that is. Go to my Gmail, right? And there is a third component, which is called MCP, or Model Context Protocol, which can provide an interface to something. Think of it like APIs designed for agents. Let's say that I have two options. Let's say that I want to ask, tell me what are all the issues I have in this project, right? Option number one is that the agent, by consulting, uh, the model, can find out, oh, okay, so I should execute some git commands and gh commands. And, uh, I do that. I execute gh, get the issues, whatever, and I give it to you, right? Another option is you have an MCP, that is, a protocol designed for agents, sitting between you and GitHub. Then the agent knows exactly, because each of those MCPs exposes what they call tools. Like, there is a tool get issue, right? There is a tool this, there is a tool that, and, uh, it uses that as an interface to talk to, usually, an API. Now, in the case of GitHub, that's a silly example because agents are more than capable of using the CLI. But now let's say that's Slack. How do I post a message on Slack? I mean, the API, nobody knows what the API of Slack is. Uh, a CLI, I don't have a Slack CLI on my laptop and probably you don't have it either. I dunno even whether it exists. So how does an agent, like, fed with intelligence from models, talk to Slack? Well, MCP: a protocol designed precisely for that. An API for agents.

Darin 00:18:02.999 For Slack you could use curl, but you'd have to know how to send it and all the

Viktor 00:18:08.109 Yeah. Eventually it would, uh, go through the whole internet to find the curl examples. Eventually, it would do it, right? With this, it's: no, no, no, actually there are those five tools. Like read a message, uh, post a message, uh, get the whole thread. Whatever the tools are. They call it tools. Think of them as API endpoints, and it does it immediately.
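
To make the "tools as API endpoints" idea concrete, here is a toy Python sketch of a server that advertises named tools with descriptions and lets a caller invoke them by name. It is deliberately not the MCP wire protocol, not an MCP SDK, and not a real Slack client. ToyToolServer and its methods are hypothetical, the tool names are borrowed from Viktor's half-invented example, and the "channel" is just a list in memory.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., object]


class ToyToolServer:
    """Illustrative only: not the MCP wire protocol and not a real Slack client."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}
        self._channel: list[str] = []  # stands in for a chat channel
        self.register("post_message", "Append a message to the channel.", self._post)
        self.register("read_message", "Return the message at an index.", self._read)
        self.register("get_thread", "Return every message in the channel.", self._thread)

    def register(self, name: str, description: str, handler: Callable[..., object]) -> None:
        self._tools[name] = Tool(name, description, handler)

    def list_tools(self) -> list[dict[str, str]]:
        # This listing is what lets an agent know up front what it can do,
        # instead of searching for curl examples.
        return [{"name": t.name, "description": t.description} for t in self._tools.values()]

    def call_tool(self, name: str, **arguments: object) -> object:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**arguments)

    def _post(self, text: str) -> int:
        self._channel.append(text)
        return len(self._channel) - 1

    def _read(self, index: int) -> str:
        return self._channel[index]

    def _thread(self) -> list[str]:
        return list(self._channel)


server = ToyToolServer()
server.call_tool("post_message", text="deploy finished")
print([tool["name"] for tool in server.list_tools()])
print(server.call_tool("get_thread"))
```

The point of the design is discovery: the agent reads the tool list once and then calls a tool by name, which is the "it does it immediately" part of the conversation.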

Darin 00:18:29.684 So an MCP server basically speeds up our accessibility for things that cannot be quickly found in the model, or maybe they don't exist today. Like you were saying, the GH CLI exists today, so there's a high probability that the GH CLI might be able to do everything. You don't need an MCP server to do stuff with

Viktor 00:18:49.019 Yeah, I rarely use the GitHub MCP, but I do use other MCPs, I mean, for destinations that are not so easily available.

Darin 00:19:01.409 Right. So there are use cases potentially where you would use the GitHub MCP, but it's only because the CLI doesn't cover that use case.

Viktor 00:19:08.024 Oh, that's a special story now. Agents tend to have a limitation on how many MCP tools they can load, anything between 40 and a hundred. Above a hundred, they start struggling seriously, right? And then one MCP, let's say for Slack, and I don't use the Slack MCP, so, I mean, I'm half inventing here, let's say it has 20 tools, like post something, read something, right? So you can easily reach that limit with the MCPs you have connected to one agent. Then I prefer to, if I need to kick something out, oh, let me kick the GitHub MCP out, right? So I would probably use the GitHub MCP if I have no other MCPs in that project or for that agent. But as soon as I start reaching those limits, that's the first one to go out, simply because the agent can do stuff with git, with, uh, the CLI.

Darin 00:20:03.118 That seems okay. We can go down the rabbit hole here and we'll probably go a little bit longer. Why do we have problems when we're crossing that 40 to a hundred, or whatever the magical thing is? It seems like I'm just defining a list of servers to integrate with. Why is it capping out?

Viktor 00:20:24.605 It's for performance reasons, uh, because it needs to keep all that in memory, right? When it starts, it loads all the MCPs. There are descriptions there, there is a bunch of information provided by each of those MCPs, right? That's why it's not kind of a hard limit, oh, a hundred exactly, or 40, right? It really depends on how much information should be loaded and kept in memory for all those MCPs, right? And also, memory is context, and context tends to go out and get compacted or removed. And context management is actually the real challenge over there.
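
A small Python sketch of why the limit is soft rather than an exact count: what matters is how much descriptive text the loaded tools add to the context, not the number of servers. The sketch also applies the rule of thumb from earlier in the conversation, dropping first the servers whose job a CLI could do. All names and numbers are invented for illustration, and the 4-characters-per-token estimate is a crude assumption.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    tool_descriptions: list[str]
    cli_fallback: bool  # True if the agent could use a CLI instead


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def cost(server: Server) -> int:
    # Context spent just on describing this server's tools.
    return sum(estimate_tokens(d) for d in server.tool_descriptions)


def fit_to_budget(servers: list[Server], budget: int) -> list[Server]:
    """Drop servers with a CLI fallback, largest first, until under budget."""
    kept = list(servers)
    droppable = sorted((s for s in kept if s.cli_fallback), key=cost, reverse=True)
    while sum(cost(s) for s in kept) > budget and droppable:
        kept.remove(droppable.pop(0))
    return kept


# Made-up servers: tool counts and description sizes are illustrative only.
servers = [
    Server("github", ["d" * 200] * 30, cli_fallback=True),
    Server("chat", ["d" * 200] * 20, cli_fallback=False),
    Server("db", ["d" * 200] * 10, cli_fallback=False),
]
kept = fit_to_budget(servers, budget=2000)
print([s.name for s in kept])
```

With these invented numbers the three servers together exceed the budget, and removing the one with a CLI fallback brings the total back under it.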

Darin 00:21:03.514 That being said, if I was running on a low power, call it eight gig machine, 'cause that's low power today, I might be able to load in X number of MCP servers, just using X. But then if I had a 128 gig machine with a higher number of cores, a more beefy machine, I could get to X plus Y number of MCP servers loaded in, just because I've got more capacity, more RAM, more CPU, more whatever is necessary for that MCP server to actually

Viktor 00:21:40.114 Whatever that something is, yes. But in any case, if you start doing AI, I guarantee you will reach the limit of MCPs in no time.

Darin 00:21:52.839 But then, can I have multiple instances of Claude Code running? Because it's just an agent, right? So I could have one instance of Claude Code running that is doing GitHub, whatever else. I could have another agent that's dealing with, I'm just thinking about this, the SDLC, right? You know, I'd have an agent that sort of covers it. Am I

Viktor 00:22:18.679 Now we're coming to a special conversation. And that's, uh, are we talking about agents that are disconnected? Let's say you have two terminal tabs. Is that what we're talking about? Or are we

Darin 00:22:31.259 Yeah.

Viktor 00:22:31.969 connected agents?

Darin 00:22:33.289 Okay. At the moment I'm talking about disconnected agents.

Viktor 00:22:36.769 Yeah. Then you will have a problem with context, right? Because oh, you do something in one agent and then you want to do something in some other, but that other has no idea what you just did there.

Darin 00:22:45.454 Right. And that's fair. Okay. That, that makes sense. But,

Viktor 00:22:47.854 That would be like two different people doing different types of tasks without talking to each other, but all within the same project. Right.

Darin 00:22:56.554 Right. Same project. From a meta level, not Meta Facebook, but a meta level, it's all one project, but I've got two standalone people. I guess, putting it in the physical world, I've got a guy working in an office in Seattle and a guy working in an office in Miami.

Viktor 00:23:14.759 And one of them is reviewing your PRs, and another one is pushing those PRs. But the one reviewing PRs never finds out when you push a PR.

Darin 00:23:26.049 Right. Okay. So that, that is a fair analogy. In that case I thought computers were supposed to make our lives easier, not harder.

Viktor 00:23:35.814 Oh, you want a story that happened just today that illustrates that? Okay, so let's say you start a new project and you work on it for, I dunno, four weeks, a month, heavily, every day, eight hours a day. You go like crazy and then you discover that you are on the wrong path. It's the wrong design. It happens sometimes, right? What do you do? Do you throw it all in the trash and start over, or do you try to fix that design, knowing that it'll likely cause even more suffering in the future? Right? Because fixing a wrong design is a challenging thing, right? What do you do?

Darin 00:24:27.354 More than likely I'll end up with the latter. I'll just try to fix it, 'cause it's the sunk cost fallacy. I've already spent all this time, I can't

Viktor 00:24:34.509 Yeah. Everybody does. I mean, I'm not giving up a month of heavy work, every single day, for the whole day, but I'm going to end up in a worse place, probably, right? Now, that happened to me. Today I'm in a third iteration of something I'm working on, right? Third iteration, meaning that for a third time I started from scratch because I was not going in the direction that I really think I should be going. I discovered that over time. Now the difference is that when I said earlier, one month of work, that's what I'm estimating it would have taken me. And in reality, it took me probably two days of work, three days of work before throwing it in the trash, right? If that's not help, I dunno what is. And on top of that, you throw something in the trash, you wanna start over, but then you need to struggle with your internal memory. Kind of like, okay, what did I learn from this? Right? You're not documenting the first month of development of a new project. That's not your primary focus. Nobody is, right? But with this, it's: okay, after two or three days, this is not going where I want it. Document everything we did, all the conversations we had, everything that happened. Create a lessons learned, uh, document. Throw it in the trash. Let's talk about it again.

Darin 00:26:04.131 With the new context being what we learned.

Viktor 00:26:06.943 Yeah, yeah. Kind of everything. Current design, current implementation, everything about the project as it is right now. All the conversations that led us to the conclusion, the mutual conclusion, that this is not it, right? Document that as well, and so on and so forth. And then literally, I'm not joking, delete all the code. We are starting over. I could never do that after a month of work. I mean, I could do it, but I would never make myself do it.

Darin 00:26:43.895 This is where we're gonna stop this episode, because we'll pick up from this point. I want to know a little bit more about, not to say all the details, but I wanna understand that and understand the tooling. I have to, er, we can't use the word tools anymore as a generic term because a tool means something now. I wanna understand what are those things, how do we think about that? Because if we started out with what we thought was a good requirements document and we went down the path and we found out it wasn't good, then, you know, what is our step by step that we gotta do? So I think that's what we'll do in the next episode. Do you have anything else to say to sort of wrap this one up? So I agree with you. There's a 0% chance, if I worked on something for a month, that I'm gonna completely throw it away and start all over again.

Viktor 00:27:34.648 But you should.

Darin 00:27:35.758 But you should, right?

Viktor 00:27:37.048 had it at least once in your life.

Darin 00:27:39.088 Oh yeah.

Viktor 00:27:40.528 Oh,

Darin 00:27:40.708 More than once. More than once. But a lot of times that choice was not mine to make, and that's the hard part. It's like we really should ditch this and other things come in and say, Nope, we have to keep going because fill in the blank.

Viktor 00:27:59.261 Yeah, we don't have another month for you to

Darin 00:28:01.346 No.

Viktor 00:28:01.721 that it might not work again.

Darin 00:28:03.946 That's where we'll pick up the next episode. So if you're listening to this and you're just starting to get into the AI thing, maybe you're like me and you're just saying, you know what? I'm gonna let this just keep going. I've seen this story play out any number of times in my career. I'm just gonna let it play out. I think I have to get on the train now. I don't know that I needed to be on the train much before now. What do you think about that, Viktor?

Viktor 00:28:24.667 Let's put it this way. Let's say that there are no future advancements. We will never be better than we are right now, which is a ridiculous thought, but let's say that. Everybody should still adopt it.

Darin 00:28:39.114 With that in mind, head over to the Slack workspace. Look for episode number 309 and leave your comments there.