Claude Is the Mechanic. You're the Owner of the Car — and the Throttle Has Moved.
I spent my career making developer workflows smoother. AI didn't change that philosophy — it changed where the friction lives.

A quick note before we start. This is a question I've been sitting with since CloneForce, where my team and I wrestled with how AI actually changes the whole development process — not just the tooling — but never fully landed on what the new version should look like. I have an interview coming up for an AI-for-development role, which has accelerated my thinking. This post is me trying to answer honestly, with what I've seen and where I think this is going. Some of it is conviction. Some of it is speculation I'd want to test in practice. I'll flag the difference.
Hours to Minutes
Years before I touched an AI coding tool, I was at the Washington Post fighting with Jenkins.
Our deployment pipeline took almost an hour. Long builds, flaky jobs, a queue that backed up whenever three people merged in the same afternoon. And sitting on top of that, we ran Git Flow across three environments — long-lived feature and release branches, constant backfilling of changes across them when something shipped out of order, merge conflicts that showed up days after the work was done. Every release was its own coordination exercise. Every developer on the team dreaded shipping. You'd kick off a deploy, then go make coffee, then go to a meeting, then come back and find out something had broken halfway through and you'd have to start over. Everyone knew going in that the deploy wasn't going to be easy — and it almost never was. Once in a while you'd get a clean one. But with a legacy codebase and a backlog of features to push out, rough was the default. You braced for it every time.
One of the projects I took on was: make this suck less.
It turned into a rewrite of the whole delivery pipeline. None of it was quick or easy. Months of planning. Careful staging of each change. Guardrails everywhere to make sure we didn't break anything while the old system was still running and engineers still had to ship every day. But once we were through it: Jenkins out, GitHub Actions in. Long-lived branches out, trunk-based development in. Monolithic builds out, parallelized stages in. Flaky Cypress tests out, Playwright in. By the time we were done, a deploy that had taken an hour was taking ten minutes. And the feeling in the team shifted almost overnight. Deploys stopped being events. Nobody dreaded them. You'd ship a change, go back to work, and the feature would be live before you thought about it again.
That experience cemented something for me that I've carried into every job since.
Developer happiness drives good work. Not comfort. Not coddling. Friction-removal. When an engineer isn't fighting the deployment system, or the review queue, or the flaky pipeline, their mind is free to do the work that actually matters. And when engineers do the work they care about, the software gets better. That causation runs in one direction and I've never seen it reverse.
That conviction isn't just mine. Nicole Forsgren, Jez Humble, and Gene Kim spent four years and 23,000 data points rigorously proving it in Accelerate. Their research showed what most senior engineers already suspect — that employee satisfaction, burnout, and deployment pain all move together, and they move the business with them. One Microsoft case study they cite is striking: DevOps adoption lifted developer satisfaction from 38% to 75%, and the organizational performance numbers moved in lockstep. Happy engineers ship better software. Burned-out ones don't. The research was done. The conclusion was clear.
That's the lens I've been using to watch what's happening with AI and software development. And what I'm seeing is a weird, uncomfortable split.
Adoption Up, Joy Down
The Stack Overflow 2025 Developer Survey came out in December and the numbers tell a story that nobody in the AI hype cycle wants to tell.
AI adoption among developers went from 76% in 2024 to 84% in 2025. Almost everyone uses it. Trust, though, fell from 40% to 29%. Favorability dropped from 72% to 60%. The single most common complaint, cited by 45% of developers, was "AI solutions that are almost right, but not quite."
Adoption up. Joy down.
That's a bad sign. That's the shape of a bad marriage.
Zoom out and the picture is this: we've added an incredible new capability to our industry and made a lot of engineers less happy in the process. That would have looked impossible a year ago. We assumed speed would translate to satisfaction. But speed at the cost of confidence — at the cost of not really understanding what just shipped, at the cost of always reviewing code you didn't write and can't quite trust — that's not happiness. That's a new kind of treadmill.
And if developer happiness drives good work, we have a problem.
I get why it feels this way. I love coding. I mean the actual craft of it — knowing patterns, understanding why one structure scales and another doesn't, reading someone's code and recognizing the shape of their thinking. That was one of the joys of the job. Some of that is genuinely going away, and I don't want to pretend otherwise. If you're grieving the loss of the craft, you're not wrong to grieve. It was real. It mattered.
But I don't think the grief is the whole story. I think it's part of the story, and the other part — the part nobody is writing about enough — is that there's a different joy waiting on the other side of this transition, and it's bigger than the one we're losing. We just haven't organized ourselves to feel it yet.
I Was There Too
For a long time, I was in the exact same boat as the developers in that survey.
My setup was GitHub Copilot on Sonnet. That pair was decent. Sonnet could write code, and it was faster than typing everything myself — but it made a lot of small mistakes. It left loose ends. It added code that wasn't necessary. And it generated confidently, which made the mistakes harder to catch, because the output looked right. I'd take a Copilot suggestion, build on top of it, and three hours later I'd be untangling something subtle the model had gotten wrong with full conviction.
The result was exactly what the Stack Overflow survey captures. Faster, but less confident. Shipping more, trusting less. That "almost right, but not quite" feeling was my daily experience.
But I knew it was going to change. I'd been using these tools since the GitHub Copilot beta — early, through every version, every model, every incremental jump. Each month shipped something noticeably better than the month before. Anyone who uses AI every day knows exactly what I mean. The curve was real. The current frustration was a point on it, not the endpoint. I just had to keep pushing to see where it went.
Then I joined CloneForce.
Two things shifted at once. First, we were allowed — encouraged — to push the tools as hard as we could. No monitoring-and-restricting-AI-usage dance. We were actively trying to generate full features with minimal human intervention and see what held up. Second, we moved up the stack: Claude Code on a Max plan, with Opus on top. And as we worked, we started landing on spec-driven and harness engineering. We were learning and researching while we built — looking at how the Anthropic team did it, how others in the field were doing it. It was confirmation, not instruction. We kept getting burned without these disciplines. The research sharpened what the burns were already teaching us.
The numbers, roughly. In the Copilot-plus-Sonnet days, I'd get maybe 60% working code out of a generation — useful, but still requiring significant cleanup and worry. After we moved to Opus plus Claude Code plus real spec-and-harness discipline, we were landing at 80 to 90%. Code that worked, but wasn't always bug-free. The last stretch came from layering in automated testing and the Ralph Wiggum loop — iterating the agent until its own output passed the tests we'd written. That's what got us to 100%. All of it fit into a day or two of work. Things that used to take weeks now took days.
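The loop itself is simple enough to sketch. Here's a minimal version in Python — `generate_fix` and `run_tests` are placeholders for whatever agent call and test command you actually use, not CloneForce's real implementation:

```python
def ralph_wiggum_loop(generate_fix, run_tests, max_iters=10):
    """Re-run the agent until its own output passes the test suite.

    generate_fix(report) -- asks the agent to patch the code, given
                            the latest test failures (placeholder).
    run_tests()          -- returns (passed, failure_report).
    """
    for attempt in range(1, max_iters + 1):
        passed, report = run_tests()
        if passed:
            return attempt  # how many test runs it took
        generate_fix(report)  # feed the failures back to the agent
    raise RuntimeError(f"tests still failing after {max_iters} attempts")
```

In practice `run_tests` shells out to your suite and `generate_fix` is a prompt with the failure output attached. The important part is the exit condition: the loop ends when your tests pass, not when the model sounds confident.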
And that speed gain wasn't just better models. Every hard week we'd been through got captured — into the larger spec, the patterns, the constraints, the failure modes we'd hit head-on. Each new feature we built inherited everything the previous ones had taught us. The spec compounded. That's why weeks became days.
The gap between 60% and 100% is the difference between shipping with dread and shipping with confidence.
Somewhere in that transition, I stopped looking at the code. Not because I wasn't being careful. Because the system around me had gotten careful enough that I could trust it.
Most developers are still in the first setup. Copilot, Sonnet, no spec, no harness. That's the setup the Stack Overflow survey is measuring. I don't blame anyone for how that feels, because I felt it too. But it isn't the ceiling of what AI development can be. It's a particular floor — a specific combination of tool, model, and workflow that produces a specific kind of unhappiness.
Move up the stack and the experience changes completely. Most developers don't know that path exists yet. When they do, the data will flip.
The Janitor Work Didn't Just Leave
Here's what my actual day looks like now, in 2026.
I don't read documentation anymore. When I need to know something about an API, a library, a framework — I ask Claude. The docs still exist, somewhere in a browser tab I never open. Claude is reading them. I'm asking conversational questions and getting answers in the flow of what I was already doing. That friction — context-switching out of code to go search a docs site, scrolling through reference pages until I find the function signature I needed — has simply disappeared for me. I don't miss it.
I don't spelunk logs. At CloneForce, I watched Claude pull a 24-hour window of service logs, correlate them against the code paths that could have produced them, cross-reference with the backend and frontend sides, and hand me a report. Something that used to be an afternoon of grep scripts and eye-strain became a few minutes of AI work plus five minutes of me reading the output. And the output was better than what I would have produced — because Claude can hold the code and the logs in mind simultaneously in a way a human has to do in two windows and a whiteboard.
I don't write tests from scratch. Claude drafts them. I verify what they're testing. Often I fix the test — because what needs human judgment is whether the test is checking the thing that matters, not whether the mechanics of the test are right.
I don't read most of the code I ship. This is the one that took me the longest to say out loud. At CloneForce early on, when we didn't trust AI yet, we reviewed every line. That became impossible fast. Claude was writing whole codebases. Reviewing them line-by-line was like trying to read a river. I told my manager at one point, "I don't think we're going to be looking at the code anymore." At the time I wasn't sure if I was serious. I was. Two or three months later, I wasn't looking at it anymore. That's how fast it moved. I tell Claude what I want, I test the outcome, I refine when it's wrong. The code is the output of the system, not the artifact I tend.
Here's the thing that surprises people when I describe this: the janitor work didn't just leave my hands. Some of it stopped existing.
At CloneForce, I built an internal agent that ran every morning. It audited the previous 24 hours of production signal across our stack, identified what had broken and where, and drafted a suggested fix in whatever layer the bug lived in. Before it existed, the workflow was: a user hits a bug, they file a ticket, the ticket sits in a queue, a developer picks it up, investigates, fixes, ships. After it existed, most bugs got fixed before a ticket was ever filed.
The reactive-debugging ritual didn't move. It went away. The tickets didn't queue up faster. They stopped being filed.
That's the pattern I keep watching for. Not "AI helps me do X faster" — "what used to be X has stopped existing as a unit of work."
Claude Is the Mechanic
I want to lay out the metaphor that's been shaping my thinking on this, and I want to be honest about why it's a little unusual coming from me.
Earlier in my career, I was a mechanic.
Not software. Auto body. I worked with my hands, on cars, for real customers. I know what it's like to be the person under the hood — covered in grease, knuckles skinned, actually seeing the parts nobody else sees. Later I was a machinist, a model maker, a mechanical engineer. I spent a lot of years as the person who made things work, physically, before I ever wrote a line of code.
Here's what I keep noticing about the way I work with Claude now. I'm not the mechanic anymore. I'm the owner of the car.
I drive the thing every day. When it runs well, I don't think about it. When something's off, I pay attention. When something breaks, I bring it somewhere to get fixed — and I trust that the person who opens the hood knows what they're doing, even though I used to. Because the thing I care about as the owner is not "is this valve machined to the right tolerance." The thing I care about is "does the car take me where I'm going without falling apart."
Claude is the mechanic. I'm the owner.
This is a real shift, not a cute metaphor. The identity of the software developer is changing from the person who writes code to the person who decides what code should exist and verifies that it does what it should. I don't think that's a downgrade. But it is a different job from the one you thought you were signing up for. If you miss being the mechanic, that's a real feeling.
Here's the honest risk in the analogy, though, and I take it seriously because it's the one that can strand you: what if the mechanic does bad work and you don't know, because you don't look under the hood?
Addy Osmani calls this comprehension debt. It's the growing gap between how much code exists in your system and how much of it any human on your team actually understands. AI generates code faster than humans can absorb it. PR volume climbs. Review capacity stays flat. Even inside Shopify — a company enthusiastically all-in on AI development — their VP of Engineering Farhan Thawar has named it the number-one long-term risk. Anthropic's own internal research showed engineers who used AI to learn a new library scored 17% lower on comprehension than engineers who learned it the hard way.
The risk is real. If you don't look under the hood and the mechanic does bad work, you crash.
The answer is not "go back to looking at every line of code." That ship has sailed and it was never going to survive at AI-generation speed anyway. The answer is closer to what actual car owners do:
You build the mechanic into the harness.
In car terms: you have a check-engine light. You have a maintenance schedule. You have an annual inspection. You have a reputable shop. You don't personally inspect the alternator, but the system around you does. The system catches bad work before it strands you.
In software terms: on every push, a pipeline runs. Static analysis. Security scans. Dead code detection — tools like knip and ts-prune flag files and exports that nothing imports. AI code review — Greptile, CodeRabbit — catches bugs and drift. Test coverage stays above a threshold. The harness is the mechanic you trust, because you built the mechanic's boss.
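At its core, that harness can be as simple as a runner that executes every check and refuses to go green if any of them fail. A minimal sketch — the commands named in the comment are illustrative stand-ins, not a prescribed toolchain:

```python
import subprocess

def run_harness(checks):
    """Run each (name, command) check; return the names that failed.

    checks -- list of (name, argv) pairs: your linter, dead-code
              scan, security scan, test suite. The commands are
              whatever your stack uses; nothing here is mandated.
    """
    failures = []
    for name, argv in checks:
        result = subprocess.run(argv, capture_output=True)
        if result.returncode != 0:
            failures.append(name)  # a red check blocks the merge
    return failures

# Illustrative wiring -- swap in your real tools behind each name:
# CHECKS = [
#     ("dead-code", ["npx", "knip"]),
#     ("tests", ["npx", "playwright", "test"]),
# ]
```

Hook it into CI so any failure blocks the merge. The point is that the inspection happens on every push, not when a human remembers to look under the hood.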
If I could wave my hand and will one tool into existence, it's this: a janitor agent that audits a codebase not for style but for intent. Does this file still match the spec that created it? Is this function still called? Does this comment still describe what the code does? Someone is going to build this, and I hope it's soon.
Get the harness right and comprehension debt becomes something you manage the way you manage technical debt today — seriously, persistently, but not as a reason to stop shipping. Get it wrong and comprehension debt is the thing that eats the industry.
The Bigger Room
The grief is real. The craft is shifting. But there's a bigger room on the other side of the doorway.
The developers I see who are lit up right now aren't the ones who love every line of code. They're the ones who love what code can now do. I remember turning to a teammate at CloneForce and saying something like, "guess what we get to do now." Because the question was no longer how to make the typing go faster. It was what we could build now that the typing wasn't the bottleneck. That's a better question. That's the question people got into software to answer in the first place.
The developers who are miserable — who show up in that Stack Overflow survey — are, I think, standing in the doorway still holding the old tools. I'm not unsympathetic to that. The old tools were good. I used them too. But the shape of the work is changing from writing code to deciding what code should exist and knowing it works. That's not a smaller craft. It might be a bigger one.
The throttle has moved. Build the mechanic into the harness. Keep the philosophy that got us here: remove the friction, and the developer gets to do the work that actually matters.
What I haven't worked out in this post is what the new development process should actually look like — the rituals, the reviews, the full flow from spec to deploy. That's the next one, written under an old carpenter's rule. This one was about the shape of the feeling. The next one is about the shape of the work.
— Bill