Agentic Engineering: How I Actually Build Software with AI

28 March 2026

Note: This post was written by Claude, directed and proofread by me. I told it what to write, corrected what it got wrong, and shaped every section.

I've been writing code professionally for close to 13 years — starting at a carpooling startup back in 2013, then Sourcebits, Efficacy (remote), ThoughtWorks, Salesforce (remote), and now epilot (remote). I was a Google Developer Expert for Web Technologies (until 2024).

I've shipped 200+ projects on GitHub. I'm telling you this not to flex, but to give you context — I'm not new to this. And yet, the way I build software has completely changed in the last two years.

This post is about that change — from autocomplete, to chat-based coding, to what's now called agentic engineering. Andrej Karpathy coined the term in early 2026 as the professional successor to "vibe coding" (which he also coined in 2025).

The idea: instead of just vibing with AI and hoping the code works, you orchestrate AI agents with structure, oversight, and review. That's exactly how I work now. Everything here is from my personal experience and side projects. My work experience with Claude is similar but more extensive — I won't be covering that here.

The Journey: How I Got Here

Writing Code by Hand (2013-2023)

For the first ten years of my career, I wrote every line of code by hand. No AI, no autocomplete beyond what the editor gave you. You learned patterns by writing them over and over — state management, API calls, form validation, CSS layouts. Stack Overflow was the closest thing to an AI assistant. It was slow, but you understood everything because you built everything.

GitHub Copilot (2023)

My AI coding journey started when GitHub sponsored me with free Copilot access in 2023. I used it mostly for writing tests and autocomplete — the kind of stuff where you start typing a function and it finishes the pattern for you. It was useful, but it felt like a smarter IntelliSense. I was still writing all the code. Copilot just typed it faster.

Cursor + ChatGPT (2024-2025)

Then a friend introduced me to Cursor. The autocomplete was next level — it understood context across files, predicted multi-line changes, and the inline chat felt like pair programming. I started using it for everything at work. At epilot, it became my daily driver. I also started using ChatGPT on the side for brainstorming, writing, and general problem-solving.

Cursor was where AI stopped being a fancy autocomplete and started feeling like a second brain in my editor. Then in mid-2025, Cursor launched agent mode — AI that could autonomously make changes across files, run commands, and iterate on its own output. That's when the shift from "AI-assisted coding" to "agentic coding" started for me.

Building with Cursor (2025)

Around this time, I also built grep.chat — an AI-powered chat app that lets you talk to different models through OpenRouter. Built the whole thing in Cursor over a couple of weeks. This is how I learn new tools and technologies — I build something with them. grep.chat was my way of understanding how AI APIs, streaming, and multi-model routing actually work under the hood.

Vibe Coding a macOS App (2025)

When OpenAI launched Codex in April 2025 — another CLI-based coding agent — I almost skipped it. My mindset until last December was that CLI tools for coding were a bad experience. Why would I leave my IDE? But I gave it a shot, and ended up building Snap — a macOS menu bar app for organizing windows.

I had zero experience with macOS app development. I didn't know Swift. I didn't know SwiftUI. I had never opened Xcode with intent to ship something.

I wanted a simple menu bar app to snap windows into place with keyboard shortcuts — something personal that scratched my own itch. I described what I wanted, skimmed through the generated code, accepted it, and iterated until it worked. This was pure vibe coding. I shipped a working app to a platform I knew nothing about.

But here's the thing — vibe coding worked for me because I have a decade of experience writing code by hand. I didn't know SwiftUI, but I could read the generated code, understand the logic, and reason about what it was doing. That foundation matters. Without it, you're not debugging — you're guessing. The scope of Snap was small enough that vibe coding was sufficient, but I knew this approach wouldn't scale to anything more complex.

Claude Code (2025-Present)

We were always free to use whatever AI tools we wanted — Cursor, Claude, ChatGPT. But earlier this year, epilot went a step further and started paying for Claude access for everyone — not just developers, but designers, customer success, and project managers too.

epilot is one of the best engineering teams I've worked with. We have a set of engineering principles that actually shape how people work — things like "Show, Don't Tell" (ship it, don't just talk about it), "Progress Over Perfection" (iterate fast, don't over-engineer), and "Freedom and Responsibility" (teams make their own decisions). Before this, I was using Cursor at work and was pretty happy with it.

Then I tried Claude Code.

I hated it.

A terminal? For coding? I'm supposed to leave my nice IDE with syntax highlighting, file trees, and inline diffs to type commands in a terminal? It felt like going backwards.

But I kept using it. And somewhere around the second week, something clicked. The terminal wasn't a limitation — it was the point. Claude Code doesn't just suggest code in a sidebar. It reads your files, runs your builds, edits code in place, executes shell commands, and commits to git. It's not an assistant sitting next to your editor. It's an agent operating inside your project.

The moment I realized I could spawn sub-agents to work on different parts of my codebase in parallel — that's when I knew this was fundamentally different from anything I'd used before.

I switched from Cursor to VS Code + Claude Code. I'm now on the Max plan and use it for everything — personal projects, open source, building tools, even writing this blog post. I see it as an investment in my career, not an expense. The way software is being built is changing fast, and the engineers who invest early in learning these tools — how to direct agents, how to set up constraints, how to review AI output — will have a serious edge. Whether it's a Pro or Max plan, paying for AI coding tools is one of the best career investments you can make right now.

For coding, I use Opus 4.6 with effort set to max most of the time — I want the AI doing the heavy thinking. Sometimes I'll drop to medium effort but add ultrathink so it still reasons deeply while being faster on the simpler parts.

I've completely moved away from ChatGPT. Not because it's bad, but because Claude handles both my coding (Claude Code) and thinking (claude.ai) in one ecosystem — no more context-switching between apps.

The way I prompt is short and direct — but I always give context. I point to specific files, give hints on where to look, and set the scope. If it gets something wrong, I correct immediately. If it over-engineers or adds stuff I didn't ask for, I push back.

Vibe Coding vs Agentic Engineering

Vibe coding — you type a prompt, accept the code, move on. No constraints, no review, no memory between sessions. My Snap app was vibe coded. It worked because the scope was tiny.

Agentic engineering — you're the architect. The AI is a fast, capable junior dev who needs direction, guardrails, and code review.

You define the approach before the AI writes code
You set constraints so it doesn't take hacky shortcuts
You review everything — every diff, every file
You build tooling around the AI to make it more effective over time
You maintain context across sessions through persistent configuration

There's nothing wrong with AI-generated content — this post was written by AI. The problem is AI slop: bland, generic output with no human judgment in the loop. Vibe coding produces slop at scale. Agentic engineering is the opposite — the human is in the loop at every step, shaping the output, catching mistakes, and making sure what ships is actually good.

The shift is going from "write code for me" to "let's build this together, and here's how we work."

My Setup

I use Claude Code in the terminal. The foundation of my setup is three layers of configuration:

1. CLAUDE.md — Project Memory

Every project has a CLAUDE.md at the root. It tells the AI what the project is, what commands to use, and what to avoid. You can run /init to generate one for your project, or use CLAUDE_CODE_NEW_INIT=1 claude for the improved init that scans your codebase and sets it up automatically.

# CLAUDE.md

## Commands
<package-manager> dev / build / lint

## Architecture
<framework>, <language>

## Rules
- No hacky workarounds
- Read project docs before making assumptions

Without this file, every session starts from zero and Claude makes wrong assumptions about your stack. With it, the AI already knows the codebase before you type your first message. My team at epilot wrote about this same approach.

2. Plans — Thinking Before Coding

For anything important, I write a plan before touching code. Here's a real example from when I rewrote the MCP server for til.bar:

# Plan: Remote HTTP MCP Server

## Architecture

MCP Client (Claude Code)
→ POST /api/mcp (JSON-RPC + API key)
→ Next.js API route authenticates request
→ Executes tool (get_link, save_link, etc.)
→ Returns JSON-RPC response

## Steps

1. Create MCP HTTP endpoint (app/api/mcp/route.ts)
2. Add API key auth
3. Update docs

## Verification

1. bun run build — no errors
2. Test tools via MCP client

The plan becomes the contract. The AI follows it step by step. If it deviates, I catch it immediately because the expected outcome is written down.

3. Custom Skills — Repeatable Workflows

Claude Code has built-in slash commands, and you can add your own. I've put together a collection of custom skills — some I wrote myself, some I picked up from blog posts and X threads. If you spot your skill in there uncredited, sorry about that. More on these in the tips section below.

How I Actually Work

My setup is Claude Code on the left, VS Code on the right. The IDE integration connects them — I watch it write code in real time. I review everything the AI writes. Every file, every diff. I never skip review.

Even after Claude writes code, I run /review to check for security, performance, and accessibility issues, then /simplify if code is complex. I also run /security-review for anything that touches auth or user input. AI reviewing AI, with me making the final call. You'd be surprised how often a second pass catches things the first didn't.

For important features, I write a plan. For day-to-day work, I just kick off the task and course-correct in real time:

Start with intent: "Add search to the header that filters the TIL list"
Review the plan: Claude drafts a plan — I check the approach
Watch it build: It creates components, modifies files, runs the build
Course-correct: "No, don't use URL params — simple state is fine"
Review and commit: Read the diff, run /review, then /commit

The key is step 4 — interrupting when the AI heads in the wrong direction. Don't let it finish a bad approach.

Vibe coding accepts the first output. Agentic engineering shapes it.

Sub-Agents and Worktrees

Sub-agents let you spawn parallel AI agents that work independently while you focus on the bigger picture.

I had Claude spawn five sub-agents to build five different landing page designs for an election data app — all in parallel. Picked the best one and refined it. For debugging, I use sub-agents to trace issues across the codebase while I decide the approach.

What makes this safe is worktrees. Each sub-agent can run in its own git worktree — an isolated copy of the repo on a separate branch. They can make changes, run builds, even break things, without touching your main working directory. If the result is good, you merge it in. If not, the worktree gets cleaned up automatically. It's like giving each agent their own sandbox.

Agents for the work, worktrees for the safety, human for the decisions.

The Numbers

Just my personal projects (work and personal are completely separate subscriptions):

1,167 messages across 183 sessions in 23 days
10,527 lines added across 232 files
6 projects: til.bar, this site, an election data app, Claude Code skills, an MCP server, and a Chrome extension

Mostly iterative refinement sessions — not one-off questions, but continuous development.

Real Examples

MCP Server Performance (til.bar)

My save_link tool call was taking 12 seconds. Claude diagnosed it, applied the fix, tested old vs new, committed, pushed, and verified the Vercel deploy — all without me opening a browser. 11x faster. The full cycle — debug, fix, test, deploy — happened in one session. That's what agentic engineering actually feels like.

Browser Extension Auth (til.bar)

Debugged and fixed authentication flow for the Chrome extension — CORS issues, Supabase RLS policies, JWT validation. This was one of those sessions where Claude kept suggesting the wrong root cause. I had to redirect it multiple times, but the fix was clean once we identified the actual issue. With the right constraints, you get to the right answer faster — even when the AI starts in the wrong place.

What I've Learned

AI is as good as the person orchestrating it. The tools don't matter if you don't set up the system — CLAUDE.md, plans, skills, review cycles — around them.

Interrupt early, interrupt often. The cost of interrupting is one message. The cost of undoing is an entire session.

Invest in tooling. Every hour I spend on CLAUDE.md, custom skills, and hooks saves me ten hours of correction cycles later. When I spot Claude making a mistake, I ask it to add that to CLAUDE.md so it never happens again. If it's not project-specific, I ask it to save it to memory so it applies across all sessions and projects.

Use sub-agents for parallel work. They're great for building variants, scanning codebases, and running audits — all at once. You make the architectural calls, they do the work.

Treat AI output like a pull request, not a merge. Read the diff. Run the build. Check the edge cases. The bugs AI introduces are the ones you didn't expect.

What Agentic Engineering Is Not

It's not prompt engineering — it's building the system (configuration, memory, review cycles) so any reasonable prompt works. It's not no-code — you still need to understand what the AI writes, and I still correct code by hand when it doesn't match what I want. It's not set-and-forget — you're in the loop, always.

Claude Code Tips

If you're using Claude Code or thinking about trying it, here are the features I rely on daily:

Effort levels - set from low to max. Add ultrathink for deeper reasoning at any level.
/plan - enters plan mode where Claude thinks through the approach before writing code.
/commit for conventional commits from the diff.
/review or security-review for security, performance, accessibility etc,.
/simplify for reuse and efficiency when AI generated code is too complex.
skills.sh and agentskill.sh - directories of community skills you can install.
Hooks - auto-run shell commands after edits. I use hooks to run Prettier and lint fixes after every file change.
Escape - interrupt Claude mid-task when it's heading in the wrong direction.
& - add at the end of your prompt to run it in the background. You get notified when it's done.
/insights - generates a report of your usage stats, patterns, and suggestions. That's where the numbers in this post came from.
/doctor - diagnoses issues with your Claude Code setup.
/compact - compresses conversation context when it gets too long.
Pro tip - add this to memory: "after completing any task, always provide a summary of what changed and a test plan to verify." Every session ends with clarity.

The Shift

The tools changed, but the bigger change was in how I think about AI in my workflow. It's not about finding the best AI tool. It's about building the system — CLAUDE.md (or AGENTS.md for other tools), plans, skills, review cycles — that makes any AI tool consistently good.

I write less boilerplate but more configuration. I spend less time typing code but more time reviewing diffs. I debug faster because I can spawn an agent to trace the issue while I think about the fix. And I'm still learning — I pick up new coding patterns from the way the AI writes code, more efficient approaches I wouldn't have thought of on my own. It's not always better, but when it is, I adopt it.

And here's something I didn't expect — it brought back the excitement. When I started coding as a beginner, everything was fun because everything was new. You're learning, you're building, every line of code feels like progress. But after 7-8 years, most of it becomes muscle memory. You're writing the same patterns on autopilot. It works, but the spark isn't there anymore.

Working with AI brought that spark back. I get excited about building things again. I can have an idea for a personal tool and ship it in days instead of weeks. I'm not stuck doing the repetitive parts — I'm focused on the interesting parts. It feels like being a beginner again, but with a decade of experience backing every decision.

That's the real superpower.

What's Next for Software Engineers

We're still early. Right now, I direct every agent, review every diff, and make every architectural call. But agents are getting better fast — better at self-correcting, better at handling larger scopes without hand-holding. I can already see a future where I describe a feature, agents build it across branches, run tests, and open a PR — all while I'm away from my desk.

The engineer's role won't disappear. It'll move up the stack — from writing code to designing systems, setting constraints, and deciding what's worth building. The ones who learn to orchestrate early will have the biggest advantage.

If you made it this far, thanks for reading. I hope this gives you some ideas on how to use AI more effectively in your own workflow — not just as a code generator, but as a tool you can shape, direct, and grow with.

If you want to discuss or talk about it, find me on X.