
Beyond the hype: real-world AI development workflows - practical insights by Michał Czmiel

In this episode of Agile Product Builders Tech Edition, Piotr Majchrzak sits down with Michał Czmiel, senior software developer and tech lead at Boldare, to explore the evolving landscape of AI in real-world software development.

Michał shares insights from his experience building AI-first products, experimenting with the latest AI tools, and shaping best practices for developers. From one-shot coding tasks to complex multi-step workflows, he dives deep into how AI agents are integrated into modern development pipelines, transforming productivity, collaboration, and code quality. Check out the full transcript and watch the episode.


Piotr (host): Hi! I’m Piotr, co-CEO of Boldare, and you are watching Agile Product Builders Tech Edition, a 25-minute series for product builders. In this episode, we go deeper into real-world AI development workflows, because we are interested in how developers actually use these tools beyond the hype. Today’s guest is Michał Czmiel, senior software developer and tech lead here at Boldare.

Michał is joining us, as you guessed, for the second time on the show, so we have a chance to talk about what’s changed in the state of AI since we last spoke. And that was already four months ago. Michał has been with Boldare for over seven years now, and he has shaped some of the most successful products we’ve built.

He’s part of our AI early adopters guild, where we experiment with the newest tools, and at the same time he’s building an AI-first product from the ground up. I think that’s a fitting introduction for you, Michał. It’s great to have you; welcome to the show.

Michał (guest): Yes, very nice. I hope I can live up to it, but yeah, nice to be here.

The AI revolution: Hype, progress, and the pace of change

Piotr: It feels like the AI revolution is moving at lightning speed, maybe even faster than the JavaScript frameworks did over the last decade. What do you think about it?

Michał: I think you can approach it from different angles. In terms of tooling, it’s definitely a very fruitful period. We had a stretch when new AI tools for building applications were appearing constantly: tools for LLM calls, cost analysis, proxies, and gateways. Now I feel like we are in the hype cycle for coding agents.

Maybe this is connected to the large investments and valuations, with many believing you can build your own agent software, sell it, and succeed. But yes, I do feel we are progressing.

Some people expected bigger leaps with the release of GPT-5, as if it would open a completely new frontier, but I still think each new model is an improvement over the last one. Overall, I think it’s a very interesting time right now.

AI in action: From one-shot tasks to complex workflows

Piotr: It’s really interesting. So tell us — walk us through your typical coding workflow with AI. Let’s say, from small one-shot changes to more complex tasks where you use plan documents or step-by-step scaffolding. How does it look?

Michał: All right, so I think what changed from the last webinar to this one is that I try to work AI-first more often. And in terms of adding new code and developing new features, I would highlight three main approaches, three workflows as you said.

First is just one-shot, where you specify a prompt, usually for small changes like adding a new test case to an existing suite, adding a new field to the database and returning it from the REST API, or adding a new page in an existing onboarding flow. So you have to develop a feel, in your project, for what the right scope for a one-shot is. But I would say most tasks can be one-shotted by agents right now. And when I say agent, I primarily use the Cursor agent, but we’ll talk about other tools later.

And if the output is wrong, if it couldn’t be one-shotted, often it’s because the task was too big or your prompt wasn’t specific enough.

Then the second approach I like to employ is the plan document. Either I write the plan myself and AI just refines it, pokes holes in it, suggests improvements, or AI prepares the plan and I read through it, edit it, and then the agent executes it. This plan is almost always written in a Markdown file. It’s just better to edit: you can open it in full screen rather than working in a small prompt window, and you can use your keyboard shortcuts and tab completion, which speeds things up a lot.

I use the plan approach for larger features. For example, we already have a Gmail sign-up flow, let’s add sign-in with Microsoft. Or for something I call intelligent refactors: not simple search-and-replace, but search-and-replace that only applies when certain conditions are met.

And just a final tip on the plan approach: often the plans generated by AI have lots of fluff that is only useful for humans, like estimations, observations, and risks. In my opinion, the best move is to prune it once before you feed it to the AI agent, so it doesn’t pollute the context.

And my usual workflow for complex features or a big user story: I first research the feature. In the previous webinar I mentioned that I was using AI Studio by Google. Now I’ve switched to the Claude UI because I like that it gives more compact responses, you can specify whether it should be “thinking” or not, and I also like the conversational style.

So once I have the idea in my mind of what I want to build, I manually scaffold most of the structure in the code, because I like to keep control and guide the AI into specific approaches. Then the agent does the unit of work: I prompt the agent to do something, then it runs the automatic validation—linting, type checking, tests, building the app—and this is triggered by the LLM.
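As an illustration of the automatic validation step described here, below is a minimal sketch of a script an agent could be instructed to run after every change. It assumes a Python project with ruff, mypy, and pytest; these tool names are placeholders for whatever linter, type checker, and test runner a given stack uses.

```python
# validate.py: run the project's quality gates in order, stopping at the first failure.
# Assumes ruff, mypy, and pytest are installed; swap in your own tools as needed.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # linting
    ["mypy", "."],           # type checking
    ["pytest", "-q"],        # tests
]

def validate() -> int:
    for command in CHECKS:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"Check failed: {' '.join(command)}", file=sys.stderr)
            return result.returncode
    print("All checks passed.")
    return 0

if __name__ == "__main__":
    sys.exit(validate())
```

The agent’s rules can then simply say “run python validate.py after every change and fix any failures,” which produces the fix-and-rerun loop Michał describes.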

Then I validate this piece of code myself. I review the code. If something is terribly wrong, I run it again with an improved prompt. If it’s good enough, I make manual changes. Then I commit this piece of code and repeat this flow.

So instead of creating a crazy prompt saying “create this whole onboarding flow with X pages,” I still try to use the AI agent but work in these unit-of-work approaches.

Piotr: Yeah. And I still think we are at the early stage of discovering these workflows, especially within a team. For more complex tasks you have a lot of artifacts you can share, not only with other engineers, but also with quality-assurance engineers, like Milena, who joined us last time. Documentation is becoming something that is needed not only by users but also by the creators. Whether the author is an LLM or yourself, it doesn’t matter: you still own the code. You’re not handing ownership over to an agent.

Keeping AI coding secure: Human oversight and best practices

Piotr: So do you already have security measures that you apply in this setup? I’m even thinking of marking AI-generated lines with comments, or obviously code reviews, which should be mandatory. But things like API keys, or personal data being pasted into LLMs: what do you do in those cases?

Michał: Yeah, so you can spot AI-generated code. Often it has a lot of comments, but it’s not necessarily bad. In terms of security measures in a team, you said a very good thing — it’s a team effort right now, right? Everybody has a coding agent.

So one of the security measures is to make sure that all of your developers know what the limitations are and have agreed on certain rules or standards. This way, code that could be malicious, isn’t performant, or could cause issues won’t enter the codebase.

I think the crucial thing is still a mandatory code review culture. You should be the first reviewer. Even if you only have a fragment of AI-generated code, even just tab completion, you should look at it as if you didn’t write the whole thing. So I think code reviews should spot a lot of issues.

Of course, you can also add GitHub Copilot or another tool that can pre-review the code. Lots of tools like static analysis could be included as an automatic step after the LLM finishes. There are also a bunch of tools that search for secrets or other hardcoded things.

For example, there was a well-known attack in recent weeks where a compromised library used the locally installed Claude Code to search your codebase and directories for secrets, environment variables, and API keys. So this isn’t directly related, but those agents, like Cursor or Claude Code, can often access all the files you have.

Personally, in terms of secrets, I prefer fetching them dynamically. For example, when I need to run some script, I can fetch them at runtime from something like AWS Parameter Store or a password manager. These also have CLI tools that can inject secrets as environment variables into your running process.
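As a concrete illustration of fetching secrets at runtime rather than hardcoding them, here is a minimal sketch using boto3 and AWS Parameter Store; the parameter name is hypothetical, and other secret managers offer similar APIs.

```python
# Fetch a secret from AWS SSM Parameter Store at runtime instead of keeping it
# in a file the coding agent can read. Assumes boto3 and configured AWS credentials.
import boto3

def get_secret(name: str) -> str:
    ssm = boto3.client("ssm")
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

# Hypothetical parameter name:
# api_key = get_secret("/my-app/dev/third-party-api-key")
```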

So yeah, I wish I had some magic solution to the security problem in AI, but I think it still comes down to human oversight.

Piotr: Yeah, but we have to remember that we still own the code. And the second thing is dialogue within the team, so a team contract or a policy is a good way to go. Okay.

AI beyond coding: Automating chores, documentation, and cross-disciplinary tasks

Piotr: But outside of pure coding, which AI use case has been the biggest change, the biggest game changer for you? I’m thinking about diagrams, log analyzers, or refactoring documentation.

Michał: Yeah, I would say all the chores that some people — myself included — in the team didn’t want to do or were lower priority.

For example, some stakeholders or external teams want up-to-date documentation of all the functionalities that one of our services provides. Keeping this documentation current can be tough, and it easily goes out of sync. So you can use an AI agent: “Hey, please analyze these files and generate a markdown document.” Then you can run a tool called Pandoc, which converts markdown to PDF, and voilà, you have a very nice report.
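For the conversion step, here is a minimal sketch of calling Pandoc from Python; it assumes pandoc (and a PDF engine such as a LaTeX distribution) is installed, and the file names are illustrative.

```python
# Convert an AI-generated markdown report to PDF with pandoc.
# Assumes pandoc and a PDF engine (e.g. a LaTeX distribution) are installed.
import subprocess

def markdown_to_pdf(source: str, output: str) -> None:
    subprocess.run(["pandoc", source, "-o", output], check=True)

markdown_to_pdf("service-documentation.md", "service-documentation.pdf")
```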

Also, I did one project where we had to migrate one database to another, and there was a bunch of wrong data, invalid formatting, and inconsistent casing. What I found interesting is that I made the agent work in a loop, and I think that’s a very good solution for these kinds of problems. Keep the agent in a closed loop: in this case, “Here are the CSV files, here are the problems, and here are the tools.” We were using Pandas in Python for cleanup and CSV processing. You keep iterating until every issue is resolved.
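Here is a minimal sketch of that closed-loop idea with Pandas; the column names and cleanup rules are hypothetical, the point is the check-fix-recheck loop.

```python
# Iterate on a CSV until no known data problems remain.
# Column names ("email", "country") and rules are illustrative only.
import pandas as pd

def find_issues(df: pd.DataFrame) -> list[str]:
    issues = []
    if df["email"].isna().any():
        issues.append("missing emails")
    countries = df["country"].fillna("")
    if (countries != countries.str.title()).any():
        issues.append("inconsistent casing in country")
    return issues

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["email"]).copy()
    df["country"] = df["country"].fillna("").str.title()
    return df

df = pd.read_csv("legacy_export.csv")
while (problems := find_issues(df)):
    print("Fixing:", ", ".join(problems))
    df = clean(df)
df.to_csv("cleaned.csv", index=False)
```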

There are a bunch of other use cases I found in documentation. I’m a big fan of diagramming and sketching concepts in Excalidraw, which are often very low fidelity and maybe not the best to showcase. I can paste one into an AI agent and say, “Please generate a diagram as code in PlantUML” from this low-fidelity diagram. Or I can even add it to a prompt to implement something.

I also see use cases for self-review, like: “Please debug this code, please improve it, or analyze potential refactoring issues.” One team at Boldare used AI to migrate the Symfony framework from one version to another. They ran an AI agent with Claude Opus, which created a migration plan, and then another agent executed it.

Finally, having this AI “buddy” is very empowering for crossing into different areas. For example, if you’re mainly a frontend developer, you can use AI to learn more about databases and endpoints and contribute to that part of the system. If you’re a backend engineer, you might explore DevOps with AI, for example using Terraform or CDK.

I also think these agents are perfect for prototypes or landing pages. Those used to take a lot of time to develop, but now you can spin them up quickly with an agent.

Piotr: Yes, I think you brought a very interesting point — these LLMs can make Agile teams more multidisciplinary. And also, when I hear developers talking about documentation, I see some kind of change. So yes, the boring stuff can be fun again. That’s great.

Choosing the right AI tools: Cursor vs. Claude Code

Piotr: Okay, when we last talked, about four months ago, the big shift was AI-first IDEs like Cursor, and now Claude Code has also become mainstream. I think it’s maybe even 50/50 among developers at Boldare. How does it look on your side today?

Michał: So I tested both, just to have some opinions and get a feel for them. But my primary tool is still Cursor, and recently I’ve been testing it with GPT-5.

It had some problems at the beginning with the intelligent router that decided which models to run, and with performance, but they have slowly improved it. So yeah, I’m very happy with this setup. Of course, I also use Cursor with Claude Sonnet models interchangeably, as sometimes GPT-5 just takes very long to process. But still, I think the full package you get with Cursor is very nice, and I also still use its tab completion a lot.

I still write some of the code manually, for the record. The other approach I tested is Claude Code, which I used with the Zed editor; Zed has a very nice tab-completion model. Actually, they’ve been adding lots of features, like a UI for Claude Code, and they’re even working on a new Git-like system for AI agent collaboration.

I feel that with Claude Code, you mostly take a delegate approach, where you don’t need to read every diff. It’s a simpler tool because it doesn’t have a UI or all the extra layers that Cursor adds, like indexing your codebase, and that simplicity is very nice.

I found it very suitable for smaller projects, where I don’t need to dive deep into every error. You can just say, “Here’s the screenshot of the problem, please fix it.” So it’s more AI-driven than manually code-driven.

I’ve also seen interesting use cases. Some people installed Claude Code on a server, so you can SSH into the server, say, “Hey Claude, please write a Python script that serves something,” and then run it. Or it could configure the whole server.

To close this topic, we have this split: some people use Cursor, some Claude Code, some Copilot. As long as you are productive, even if there’s disagreement about which tool to use, it’s fine. Most tools have caught up with each other by now. It’s mostly a matter of pricing, because these tools have very different pricing models (pay-as-you-go with an API key, or subscription-based) and also different access to models.

For example, Claude Code is currently limited to Anthropic models, which isn’t an issue since they are top models, but who knows what will happen in four months.

Piotr: Yeah, exactly. Maybe I can oversimplify it: for you, Cursor is still the more advanced and better option. On the other hand, Claude Code is simpler; you don’t have to leave your favorite editor, and you still benefit from many AI-enhancing functions. The ideal combination would of course be…?

Michał: Yeah, I feel the ideal combination would be Claude Code with Cursor’s tab completion and diff view, though I don’t know if that’s financially viable because you’d pay for both subscriptions. But I think that’s the best approach.

Piotr: Oh, we have to try it. Okay. And I think you’ve tried it, but maybe that’s for another time.

Best practices for AI agents: Prompts, validation, and trust

Piotr: Let’s move on to best practices for working with AI agents: prompting, validating results, and deciding when to trust the agent versus doing it manually.

Michał: So recently I’ve shifted more toward AI-first. I’m sometimes even curious how AI would solve a problem rather than solving it myself. And then, being inspired, I’m like, “Okay, maybe it’s a good approach, let’s try it,” or I throw it out and start with something new. There are a bunch of best practices I could talk about.

One I think is really important is that you should be driving the AI agents with your rules files, whether that’s CLAUDE.md, AGENTS.md, or Cursor rules; I don’t think we mentioned them in the previous webinar. There should be an elevator pitch about the product: what tools the AI coding agent can use, what the guidelines are, what it should and shouldn’t do, and the project structure, so it doesn’t have to search everything every single time.

As for automatic validation, I highly recommend attaching a sentence saying: after making a change, please run the checks and make sure the tests pass, the types are valid, and the linting is clean. Then you have this loop: the agent can finish the implementation, and if something is broken, it can automatically fix it and run the checks again.

Another tip: some tools are more advanced. They can run automatically only on the changed files, so they don’t have to run the whole test suite for the entire project, only the changed parts.
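A minimal sketch of that changed-files-only idea, using git to list modified Python files; the tool names are the same assumptions as before, and mapping changed source files to their tests is project-specific, so this sketch only re-runs test files that themselves changed.

```python
# Run lint/type checks on changed Python files only, and re-run only changed tests.
# Assumes git, ruff, mypy, and pytest; test selection strategy is project-specific.
import subprocess

def changed_python_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [f for f in out.splitlines() if f.endswith(".py")]

files = changed_python_files()
if files:
    subprocess.run(["ruff", "check", *files], check=True)
    subprocess.run(["mypy", *files], check=True)
    tests = [f for f in files if f.rsplit("/", 1)[-1].startswith("test_")]
    if tests:
        subprocess.run(["pytest", "-q", *tests], check=True)
```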

Other improvements or best practices I’ve found: you should drive the agent with your code. I’ve seen lots of generated solutions full of mocks or very untestable code. So you can start using dependency injection and patterns that guide the AI toward a good solution. You mentioned MCP in another webinar; one big use case I’ve seen some developers at Boldare adopt is the Context7 MCP, to make sure the AI has up-to-date documentation and all the context it needs.
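As an illustration of the dependency-injection point, here is a minimal Python sketch; the service and class names are hypothetical, but the pattern (pass collaborators in explicitly) is what gives both you and the agent an obvious seam for tests instead of deeply patched mocks.

```python
# A small service with an injected dependency, plus a test double for it.
# Names (SignupService, EmailSender) are illustrative.
from dataclasses import dataclass
from typing import Protocol

class EmailSender(Protocol):
    def send(self, to: str, subject: str, body: str) -> None: ...

@dataclass
class SignupService:
    emails: EmailSender  # injected, so tests can swap in a fake

    def register(self, address: str) -> None:
        # ...persist the user here, then notify...
        self.emails.send(address, "Welcome!", "Thanks for signing up.")

class FakeEmailSender:
    def __init__(self) -> None:
        self.sent: list[tuple[str, str, str]] = []

    def send(self, to: str, subject: str, body: str) -> None:
        self.sent.append((to, subject, body))

def test_register_sends_welcome_email() -> None:
    fake = FakeEmailSender()
    SignupService(emails=fake).register("user@example.com")
    assert fake.sent and fake.sent[0][0] == "user@example.com"
```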

To wrap up my rant: I think most of these tips boil down to your prompts. From what I’ve learned, I try to structure each prompt in three parts.

First is the context: “Analyze these files, understand this mechanism, look here.” Then there’s the action: what exactly we want to do. Be very specific; this is my second biggest tip. Talk to the AI as you would to another advanced developer. For example: use dependency injection, use server actions, use indexes. Very specific wording.

Finally, the third part is limitations: for example, “Think before you execute, don’t worry about backwards compatibility, simplify this code without changing any functionality.”
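A minimal sketch of that three-part structure assembled as a plain prompt string; the example content is hypothetical.

```python
# Build a prompt from the three parts Michał describes: context, action, limitations.
def build_prompt(context: str, action: str, limitations: str) -> str:
    return (
        f"Context:\n{context}\n\n"
        f"Action:\n{action}\n\n"
        f"Limitations:\n{limitations}\n"
    )

prompt = build_prompt(
    context="Analyze src/auth/ and the existing Google sign-in flow.",
    action="Add Microsoft sign-in using the same provider abstraction; "
           "use dependency injection for the OAuth client.",
    limitations="Think before you execute. Do not change existing behaviour; "
                "don't worry about backwards compatibility.",
)
print(prompt)
```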

Piotr: Thank you, Michał. That was very valuable for me, and thanks for all these practical insights. I hope the viewers liked it as much as I did.

From manual to AI-first: Are workflows ready?

Piotr: Do you think AI workflows are ready to become the default way of working?

Michał: So, speaking with other colleagues, I feel like, as I mentioned earlier in the webinar, there’s this lever, right? On one side, you have more manual coding, and on the other, more agentic coding. I feel like I’m still closer to the manual side, especially when you read the blog posts from major LLM labs like Anthropic, which say most of the code for Claude Code is written through agentic coding.

And I think that’s the goal. Implementing those practices—and the practices your team develops—aims to have the agent handle as much of the boring work as possible. You do the thinking, but the agent implements it, so it can increase your velocity and help you deliver more.

And, of course, you still need to make sure the quality is there. But yeah, I think agentic workflows are the way to go.

Summary

Michał shows that AI is more than just hype—it’s actively reshaping how developers work. By combining AI agents with human oversight, teams can focus on creative, high-impact tasks while letting AI handle repetitive work. Here are the key takeaways from the interview:

Key Takeaways:

  1. AI Tools Are Rapidly Evolving: More options, faster innovation, but hype can be misleading.
  2. Workflows Matter: One-shot for small tasks, structured plans for bigger features.
  3. Human Oversight Is Essential: Code reviews, security checks, and team policies are a must.
  4. AI Handles Repetitive Work: Documentation, refactoring, and data cleanup done faster.
  5. Cross-Disciplinary Learning: AI helps developers explore new areas beyond their core expertise.

Piotr: Very interesting. Thank you very much, Michał.

That’s it for today’s episode of Agile Product Builders Tech Edition. Thanks for joining us, and stay tuned for more practical insights on how AI is reshaping the way we build software. Thank you.
