I Built an AI Code Commenter Before Copilot Existed

In February 2026 I gave a five-minute talk at a Porch Software brownbag on how I use AI to write code comments and documentation before pushing. Five developers, five minutes each, sharing how AI had changed the way they worked. My portion was about the specific workflow: highlight a method or class, ask Claude to generate XML doc comments and inline annotations, review them, commit. Fast, consistent, genuinely useful.

What I didn't have time to mention in five minutes was that I'd been chasing exactly this workflow for about three years before it became that easy. Back then I had to build the plumbing myself.

The problem I was trying to solve

Writing documentation comments for C# code is one of those tasks that's straightforward in principle and tedious in practice. You know what the method does. Turning that knowledge into a clear, consistent XML doc comment for every parameter, return value, and edge case takes time that feels disproportionate to its value, especially when you're working in a codebase where the commenting standard is inconsistent or nonexistent.

In 2022 I was at Hamilton Manufacturing, working in a legacy .NET codebase that had exactly that problem. Lots of business logic, not much documentation. I'd been following the early OpenAI API with interest, and it occurred to me that generating comments was a much simpler prompt problem than generating code. You give the model the code and ask it to explain what it does. The input and the expected output are both well-defined. It seemed worth trying.

Building AICommenter

The result was a Visual Studio extension called AICommenter. The core idea was simple: select a block of code in the editor, trigger the command, and get back the same code with comments added. From the user's perspective it was one keyboard shortcut and a few seconds of waiting.

Under the hood it was a VSIX project built with the Visual Studio Community Toolkit. The toolkit handles the extension scaffolding, including the BaseCommand<T> base class for commands and BaseOptionModel<T> for surfacing settings in Tools > Options. That part was straightforward. The interesting part was the AI integration.
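To make the shape of that concrete, here's a minimal sketch of what a toolkit-based command looks like. The command ID, the `CommentGenerator` helper, and the message-box display are hypothetical stand-ins, not the actual AICommenter source; the toolkit calls (`BaseCommand<T>`, `VS.GetActiveDocumentViewAsync`) are real Community Toolkit APIs.

```csharp
using System.Threading.Tasks;
using Community.VisualStudio.Toolkit;
using Microsoft.VisualStudio.Shell;

[Command(PackageIds.AddCommentsCommand)] // hypothetical command ID
internal sealed class AddCommentsCommand : BaseCommand<AddCommentsCommand>
{
    protected override async Task ExecuteAsync(OleMenuCmdEventArgs e)
    {
        // Grab the active editor view and the user's current selection.
        DocumentView docView = await VS.GetActiveDocumentViewAsync();
        if (docView?.TextView == null) return;

        string selectedCode = docView.TextView.Selection
            .StreamSelectionSpan.SnapshotSpan.GetText();
        if (string.IsNullOrWhiteSpace(selectedCode)) return;

        // Hand the selection to whatever talks to the API; shown in a
        // message box here purely for the sake of a simple sketch.
        string commented =
            await CommentGenerator.GetCommentedCodeAsync(selectedCode);
        await VS.MessageBox.ShowAsync("AICommenter", commented);
    }
}
```

The toolkit's value is visible even in a sketch this small: selection access and command registration are one line each, so nearly all of the remaining work is the AI round trip.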

I was hitting OpenAI's /v1/completions endpoint with text-davinci-003, which was the best available model at the time for this kind of task. The chat completions API didn't exist yet. You constructed a prompt string, sent it, and got a completion back. No conversation, no system prompt, no context beyond what you packed into the request body. The prompt I settled on was minimal: take the selected code and append "Re-write the above code adding in-line comments explaining what it does." That was it. Adjust the temperature, set a max token limit, and see what came back.
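The request itself was just JSON over HTTP. A sketch of that call, assuming a helper class and a max-token value I'm inventing for illustration (the model name, endpoint, and appended prompt sentence are the ones described above):

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

internal static class CompletionClient
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<string> GetCommentedCodeAsync(
        string code, string apiKey)
    {
        // The whole "conversation" is one string: the code, then the ask.
        var body = new
        {
            model = "text-davinci-003",
            prompt = code + "\n\nRe-write the above code adding in-line " +
                     "comments explaining what it does.",
            temperature = 0.5,
            max_tokens = 1024 // illustrative; tuned via the options page
        };

        var request = new HttpRequestMessage(HttpMethod.Post,
            "https://api.openai.com/v1/completions")
        {
            Content = new StringContent(JsonSerializer.Serialize(body),
                Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);

        HttpResponseMessage response = await Http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // The legacy completions API returns the text in choices[0].text.
        using JsonDocument doc = JsonDocument.Parse(
            await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("choices")[0]
            .GetProperty("text").GetString();
    }
}
```

Everything the model knew had to fit inside that one `prompt` field, which is the root of most of the limitations described below.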

The options page in Tools > Options let you configure the API key, model, temperature, max tokens, and frequency penalty. I exposed all of those because I was actively tuning them to get better output, and I wanted to be able to adjust without recompiling. A lower temperature (around 0.5) produced more literal, conservative comments. Higher values introduced more interpretive language, which was sometimes useful and sometimes confidently wrong. There was no good default that worked across every type of code, which was itself an interesting lesson.
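With `BaseOptionModel<T>`, exposing those settings is mostly a matter of declaring properties. A hypothetical options model mirroring the settings listed above (property names and defaults are illustrative, not the original source):

```csharp
using System.ComponentModel;
using Community.VisualStudio.Toolkit;

internal class AICommenterOptions : BaseOptionModel<AICommenterOptions>
{
    [Category("OpenAI")]
    [DisplayName("API Key")]
    [Description("Secret key used to authenticate against the OpenAI API.")]
    public string ApiKey { get; set; } = "";

    [Category("OpenAI")]
    [DisplayName("Model")]
    public string Model { get; set; } = "text-davinci-003";

    [Category("Tuning")]
    [DisplayName("Temperature")]
    [Description("Lower (~0.5) gives literal comments; higher is more " +
                 "interpretive but more often confidently wrong.")]
    public double Temperature { get; set; } = 0.5;

    [Category("Tuning")]
    [DisplayName("Max Tokens")]
    public int MaxTokens { get; set; } = 1024;

    [Category("Tuning")]
    [DisplayName("Frequency Penalty")]
    public double FrequencyPenalty { get; set; } = 0.0;
}
```

The toolkit persists these automatically, and code elsewhere in the extension can read the live values with `await AICommenterOptions.GetLiveInstanceAsync()`, which is what made tuning without recompiling possible.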

What it was actually like to use

Honest answer: inconsistent, but often impressive enough to be useful.

On a well-structured method with clear variable names, text-davinci-003 would produce comments that were accurate and often better-phrased than what I'd have written myself. It understood C# idioms, it could identify what a LINQ chain was doing, and it handled XML doc comment syntax correctly most of the time. For straightforward cases it genuinely saved time.

On anything more complex, the limitations became obvious. The model had no awareness of anything outside the selected text. If you highlighted a method that called into a service defined elsewhere, the comments might describe what the code literally did line by line without understanding what the service was for or why the method existed. The comments were syntactically correct but semantically shallow. You couldn't give it context you hadn't explicitly included in the selection, and selections large enough to include that context frequently hit the token limit.

There was also a latency problem. The round trip to the API and back was several seconds on a good connection, which sounds minor but feels significant when your muscle memory expects IDE responses to be instant. You'd trigger the command and then just wait, staring at the status bar progress indicator.

And the output wasn't inserted back into the editor automatically. Getting that right required manipulating the VS text buffer in ways I hadn't fully worked out, so the result still needed a manual step to apply. Not ideal. In practice I used it more as a reference to write from than as a direct replacement for my own typing.

What it was useful for anyway

Despite those limitations, I kept using a version of the workflow throughout my time at Hamilton. Not always through the extension directly, but the habit it established stuck: before pushing code, spend a few minutes on documentation. Use the AI to get a first draft of the comment, review it for accuracy, adjust it for context the model couldn't have known, and commit. The output was a starting point rather than a finished product, and that turned out to be a reasonable division of labor.

The extension also ended up in the same solution as a separate VS extension I built at Hamilton for internal tooling (a URL command launcher for frequently used internal pages). Both of those projects gave me something more durable than the tools themselves: familiarity with how VS extensions work, how to interact with the editor API, and how to wire up configurable options in a way that felt like a proper VS citizen rather than a hack.

Three years later

The five-minute brownbag talk in February 2026 described a workflow that was almost unrecognizable compared to what I'd been doing in 2022, even though the goal was identical.

The shift from completion models to chat models was the biggest single change. With a chat interface and a well-constructed system prompt, you can give the model the context it needs: what the class is for, what the broader module does, what conventions the codebase follows. The output improves dramatically when the model understands the why, not just the what.
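The difference is easiest to see in the request shape. Where the 2022 version packed everything into a single prompt string, a chat-style request separates standing context from the immediate ask. A sketch against an OpenAI-style chat endpoint; the model name, system prompt text, and sample method are all illustrative:

```csharp
// What the single-string prompt becomes once a chat endpoint exists.
string selectedCode =
    "public decimal ApplyDiscount(Order order) { /* ... */ }";

var chatBody = new
{
    model = "gpt-4o", // any chat-capable model; name is illustrative
    messages = new object[]
    {
        // Standing context the completion API had no slot for.
        new { role = "system", content =
            "You document C# code. This class lives in the billing " +
            "module; follow the project's XML doc comment conventions." },
        // The immediate request.
        new { role = "user", content =
            "Add XML doc comments to this method:\n" + selectedCode }
    }
};
// POSTed to /v1/chat/completions with the same Bearer header as the
// legacy endpoint; the reply arrives in choices[0].message.content
// rather than choices[0].text.
```

The system message is the structural home for exactly the context that had to be crammed into (or left out of) the old prompt string.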

IDE integration removed the latency problem and the manual application step. Copilot and Claude Code both operate inline, with awareness of the surrounding file and (in Claude Code's case) the broader project. You're not copying text out of a selection and pasting a result back. You ask for what you want and the change appears where it should.

Context window sizes went from a hard ceiling of a few thousand tokens to something that can comfortably hold an entire file or set of related files. The shallow comments that resulted from isolated selections are largely a thing of the past. The model can see enough of the codebase to write something meaningful about where a method fits in the larger picture.

And the models are simply better. Not incrementally better. The difference between text-davinci-003 at 0.5 temperature and a modern Claude model on a code documentation task is not a matter of degree. It's a different category of output.

What building it taught me that using the polished tools doesn't

Having built the plumbing before using it as a service gives you a mental model that turns out to be practically useful.

When a modern AI tool produces a confidently wrong comment, I recognize the failure mode because I saw it constantly in 2022. It's not a bug in the tool; it's what happens when the model doesn't have the context it needs to be accurate. The fix is the same now as it was then: give it more relevant context and ask again.

When I tune prompts in a custom Claude API pipeline (I've built a few for automating Jira and Bitbucket workflows), adjusting temperature and thinking about how I'm framing the request feels natural because I spent time doing it manually in 2022. The concepts weren't new to me by the time they became mainstream.

And knowing what was hard three years ago makes it easier to appreciate what's changed. The things that felt like fundamental limitations at the time (no context, no inline integration, completion-style prompting) turned out to be solvable problems. That history makes me fairly optimistic about the things that still feel like limitations now.