
Building a Repo-Aware Coding Agent

November 2025

You see a lot of hype about "autonomous coding agents" these days. Devin, Claude Code, you name it. Everyone says they are the future of software engineering. I got curious. Is it really that complicated? Or is it just a glorified loop that calls an LLM?

So I decided to build one from scratch. I used Amp to help me code it (ironic, I know—using an agent to build an agent), but the architecture is straightforward. I call it "Mini Claude Code".

The full source code is available here: stangeorge/coding-agent-cli. Feel free to fork it and break things.

The Stack

I didn't want to overcomplicate things, so I stuck to what I know: plain Node.js.

The whole thing is a CLI tool. You run npm start -- "Fix the bug in the user service" and it goes off to work.

How it actually works

It turns out, the "magic" is pretty simple. It's a loop.

1. Plan: The agent looks at the history of what it has done so far and the user's task. It decides which tool to use next. "I need to read the file to understand the bug."

const systemPrompt = `
You are an autonomous coding agent working on a local code repository.
You have a fixed set of tools available:

- readFile(path: string): read a text file from the repo.
- writeFile(path: string, content: string): overwrite or create a file.
- searchInRepo(query: string): search for files containing the text.
- runTests(command?: string): run tests in the repo (e.g. "npm test").

Constraints:
- Paths must be relative to the repo root.
- Prefer small, incremental edits and frequent test runs.
- Stop when the task is complete or you are stuck.
`;

2. Act: It executes the tool. I gave it a few basic ones: readFile, writeFile, searchInRepo, and runTests.
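The tools themselves fit in a few dozen lines. Here's a sketch of how I'd write them; the return formats and error handling below are my assumptions, not necessarily what's in the repo:

```typescript
import * as fs from "fs/promises";
import * as path from "path";
import { execSync } from "child_process";

// Sketch of the four tools. The names match the system prompt.
const REPO_ROOT = process.cwd();

const tools = {
  async readFile({ path: p }: { path: string }): Promise<string> {
    return fs.readFile(path.join(REPO_ROOT, p), "utf8");
  },

  async writeFile({ path: p, content }: { path: string; content: string }): Promise<string> {
    const full = path.join(REPO_ROOT, p);
    await fs.mkdir(path.dirname(full), { recursive: true });
    await fs.writeFile(full, content, "utf8");
    return `Wrote ${content.length} bytes to ${p}`;
  },

  async searchInRepo({ query }: { query: string }): Promise<string> {
    const hits: string[] = [];
    // Node 18.17+ supports recursive readdir.
    for (const entry of await fs.readdir(REPO_ROOT, { recursive: true })) {
      const rel = String(entry);
      if (rel.includes("node_modules") || rel.includes(".git")) continue;
      const full = path.join(REPO_ROOT, rel);
      if (!(await fs.stat(full)).isFile()) continue;
      const text = await fs.readFile(full, "utf8").catch(() => "");
      if (text.includes(query)) hits.push(rel);
    }
    return hits.length ? hits.join("\n") : "No matches.";
  },

  async runTests({ command }: { command?: string }): Promise<string> {
    try {
      return execSync(command ?? "npm test", { encoding: "utf8" });
    } catch (err: any) {
      // Failing tests are useful feedback for the agent, not a crash.
      return `Tests failed:\n${err.stdout ?? ""}${err.stderr ?? ""}`;
    }
  },
};
```

Every tool returns a string, because everything ultimately gets pasted back into the next prompt.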

3. Observe: It reads the output of the tool. "Oh, the file says X."

Then it goes back to step 1. It keeps doing this until it thinks it's done.

// The main agent loop
for (let i = 0; i < config.maxSteps; i++) {
  // 1. Ask the planner what to do next, given the task and step history
  const decision = await callPlanner(task, steps);

  // 2. Handle "done"
  if (decision.type === 'done') {
    console.log("Agent finished:", decision.message);
    break;
  }

  // 3. Execute the requested tool, feeding the result back into history
  if (decision.type === 'tool') {
    const handler = tools[decision.tool];
    if (!handler) {
      steps.push({ tool: decision.tool, result: "Error: unknown tool" });
      continue;
    }
    const result = await handler(decision.input);
    steps.push({ tool: decision.tool, result });
  }
}
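The only piece missing from that loop is callPlanner, which is just an LLM call plus some tolerant parsing. Here's a sketch, assuming a hypothetical chatCompletion client and a JSON-reply protocol; the real repo may structure this differently:

```typescript
// Minimal context for the sketch; the real repo's types likely differ.
type Step = { tool: string; result: string };
type Decision =
  | { type: "tool"; tool: string; input: unknown }
  | { type: "done"; message: string };

const systemPrompt = "You are an autonomous coding agent..."; // abbreviated

// Hypothetical LLM client -- swap in your provider's SDK call here.
async function chatCompletion(prompt: string): Promise<string> {
  throw new Error("wire up an LLM provider");
}

async function callPlanner(task: string, steps: Step[]): Promise<Decision> {
  const history = steps
    .map((s, i) => `Step ${i + 1}: ${s.tool} -> ${s.result}`)
    .join("\n");
  const prompt =
    `${systemPrompt}\n\nTask: ${task}\n\nHistory:\n${history}\n\n` +
    `Reply with exactly one JSON decision, e.g.\n` +
    `{"type":"tool","tool":"readFile","input":{"path":"src/user.js"}}\n` +
    `or {"type":"done","message":"..."}`;
  return parseDecision(await chatCompletion(prompt));
}

// Models often wrap JSON in prose or code fences, so parse tolerantly.
function parseDecision(raw: string): Decision {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return { type: "done", message: "Planner returned no JSON; stopping." };
  return JSON.parse(match[0]) as Decision;
}
```

The tolerant parsing matters more than you'd think: models love to say "Sure, here's my decision:" before the JSON.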

Safety First

I'm not crazy enough to let an AI run wild on my machine. I built a sandbox. The agent can only read and write files inside a specific sandbox/ directory. It can't access my system files or run dangerous shell commands. True story: I once saw an agent try to rm -rf a directory because it was "cleaning up". Always sandbox your agents, folks.
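The sandboxing boils down to one path check before any file tool runs. A minimal sketch, assuming a sandbox/ directory at the project root:

```typescript
import * as path from "path";

const SANDBOX = path.resolve("sandbox");

// Resolve an agent-supplied path and refuse anything that escapes the
// sandbox/ directory (including "../" tricks and absolute paths).
function resolveInSandbox(relative: string): string {
  const full = path.resolve(SANDBOX, relative);
  if (full !== SANDBOX && !full.startsWith(SANDBOX + path.sep)) {
    throw new Error(`Path escapes sandbox: ${relative}`);
  }
  return full;
}
```

The startsWith check must include the trailing separator, otherwise a sibling directory like sandbox-evil/ would slip through.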

The "Whoa" Moment

I was testing it with a simple task: "Write a function that adds two numbers and a test for it."

It wrote the function. Then it wrote the test. Then it ran the test. The test passed. It felt a bit like watching a junior engineer's first successful commit. Except it happened in about 30 seconds.

Then I tried something harder. I gave it a file with a subtle bug and asked it to fix it. It used searchInRepo to find the file, read it, analyzed the logic, and wrote a patch. It even ran the tests afterwards to make sure it didn't break anything.

It's not all sunshine and rainbows

Sometimes it gets stuck. It reads a file, thinks "I need to read this file again," and enters an infinite loop. I had to add a maxSteps config to kill it if it goes rogue.
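A cheap way to catch these loops before maxSteps runs out is to look for repeated identical tool calls. A sketch; the window size of 3 is arbitrary:

```typescript
type Action = { tool: string; input: unknown };

// Flag when the agent makes the exact same tool call several times in a
// row -- a cheap heuristic for "I need to read this file again" loops.
function isStuck(history: Action[], windowSize = 3): boolean {
  if (history.length < windowSize) return false;
  const recent = history.slice(-windowSize).map((a) => JSON.stringify(a));
  return new Set(recent).size === 1;
}
```

When isStuck fires, you can bail out early or inject a "you seem to be repeating yourself" message into the next prompt.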

And context windows are real. If you feed it a massive repo, it forgets what it was doing 5 steps ago. But for small, focused tasks? It's surprisingly competent.
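One blunt mitigation is to trim the history you feed back to the planner: keep only the last N steps and clip huge tool outputs. A sketch with made-up limits:

```typescript
type HistStep = { tool: string; result: string };

// Keep the prompt bounded: only the last `keep` steps, each result clipped.
function trimHistory(steps: HistStep[], keep = 10, maxResultChars = 2000): HistStep[] {
  return steps.slice(-keep).map((s) => ({
    tool: s.tool,
    result:
      s.result.length > maxResultChars
        ? s.result.slice(0, maxResultChars) + "\n[truncated]"
        : s.result,
  }));
}
```

It's lossy, but for small, focused tasks the agent rarely needs more than the last handful of steps anyway.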

I should clarify: this is just a toy project. Building a robust, production-grade agent is incredibly hard. Armin Ronacher wrote a great piece on this called "Agents are Hard". Real agents need to handle ambiguity, undo mistakes, and navigate complex dependency graphs. My little CLI tool is just scratching the surface.

Conclusion

Building this stripped away a lot of the mystery for me. It's not magic. It's just clever prompting and a feedback loop.

If you're a software engineer, I highly recommend trying to build one. You'll learn a ton about how these LLMs actually "think" (or don't think). Plus, it's pretty cool to have a CLI tool that can write its own tests.

If you want to chat about agents or see the code, drop me a line.
