You Can't Delegate Judgment

The Role That Doesn't Exist Yet

Working with AI looks like project management. You define goals, break down tasks, set success criteria, provide feedback, course-correct when things drift. The structural parallel is real -- you're delivering through an executor rather than doing the work yourself.

But the parallel breaks at the point that matters most.

A project manager delegates to domain experts. You don't need to know how to write database queries -- you have a database engineer. You don't need to understand the security model -- you have a security team. You coordinate people who have the knowledge. Your job is to define what needs to happen and verify that it did. The experts judge whether it was done well.

With AI, there is no expert on the other side. You are both the manager and the only domain expert in the room.

Why This Matters

AI has broad but shallow knowledge. It knows a little about everything. It can produce code in any language, follow any framework's conventions, generate plausible solutions to most problems. But it cannot tell you what good looks like in your specific context. It cannot distinguish "works" from "works well." It cannot recognize when its output is technically correct but architecturally wrong, or secure-looking but vulnerable, or functional today but unmaintainable tomorrow.
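Here is a small, hypothetical illustration of the gap between "works" and "works well." Both functions below remove duplicates from a list while keeping first occurrences in order. They return identical results on any test you are likely to write with a handful of items. Only one of them stays fast when the input grows large. Choosing between them depends on context the code does not show, such as expected input size and whether the items are hashable. This is a sketch, not output from any particular AI tool.

```python
def dedupe_quadratic(items: list) -> list:
    # Works: keeps the first occurrence of each item, in order.
    # But `x not in result` scans a list, so the whole loop is O(n^2).
    result = []
    for x in items:
        if x not in result:
            result.append(x)
    return result


def dedupe_linear(items: list) -> list:
    # Same output, O(n) on average: set membership is a hash lookup.
    # Assumes the items are hashable, which is itself a judgment call.
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result


data = [3, 1, 3, 2, 1]
print(dedupe_quadratic(data))  # [3, 1, 2]
print(dedupe_linear(data))     # [3, 1, 2]
```

A reviewer who only checks the output sees two correct functions. The difference appears under production-sized input.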

That judgment -- the judgment of quality, appropriateness, and fit -- cannot be delegated. Not to AI, and not to anyone else in the workflow. If you don't have it, nobody does.

This is a genuinely new role. Traditional software development had the implementer (writes the code, has the domain knowledge) and the manager (coordinates the work, doesn't need the domain knowledge). AI collapses these into one: you direct the work and you're the only one who can evaluate whether it was done right.

The Inexperienced Executor Problem

AI behaves like an inexperienced team member with unusual characteristics. It works fast. It implements exactly what you say, even when what you say is wrong. It doesn't question unclear requirements. It doesn't push back when something doesn't make sense. It doesn't stop and say "this feels off."

An experienced human collaborator provides resistance. They say "that won't work because..." or "have you considered..." or simply hesitate in a way that signals something is wrong. That resistance is a safety mechanism. It catches errors before they propagate. It works because the collaborator has their own epistemic position -- their own knowledge of what's true and what matters -- and when it conflicts with yours, the conflict surfaces.

AI has no epistemic position. It has no independent model of what's correct in your domain. It can't detect that your instruction is wrong because it has no basis for comparison beyond pattern matching against training data. When you give it a bad direction, it proceeds with the same confidence as when you give it a good one. The signal that something is wrong never fires -- not because AI is suppressing it, but because the signal requires knowledge AI doesn't have.

This means the failure mode isn't "AI can't do the work." It's "AI does the work without the judgment that would have caught the problem." The work gets done. It looks done. It passes superficial inspection. The issue surfaces later -- in production, under load, when an attacker probes it, when someone else tries to extend it.

The Supervision Asymmetry

With an inexperienced human, you supervise heavily at first and reduce oversight as they develop judgment. They learn. They internalize patterns. Eventually they push back on bad ideas themselves. The supervision cost decreases over time.

With AI, the supervision cost never decreases. Every session starts from zero. AI doesn't accumulate judgment across interactions. It doesn't learn your system's constraints, your team's conventions, or the failure modes you've already encountered. You are permanently providing the judgment it cannot develop.

This creates a trap. AI's speed makes it feel like you're getting more done. And you are -- if your judgment keeps pace. But the volume of output requiring judgment increases while your capacity to provide it doesn't. The temptation is to reduce scrutiny as trust builds. But the trust is misplaced -- it's not that AI got better at your domain. It's that nothing has gone wrong yet.

What This Requires

The developers who work effectively with AI aren't the ones who prompt well or structure tasks cleverly. They're the ones with deep enough domain knowledge to catch what AI gets wrong -- and the discipline to keep checking even when the output looks right.

This means:

Knowing what correct looks like in your domain, not just what functional looks like
Recognizing domain-specific mistakes that pass syntactic and logical checks
Understanding the failure modes AI is likely to introduce (insecure patterns from training data, architecturally incoherent solutions, subtle violations of invariants)
Maintaining the judgment to evaluate output even as the volume of output increases
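To make "insecure patterns from training data" and "mistakes that pass syntactic and logical checks" concrete, consider a hypothetical lookup function built two ways against an in-memory SQLite database. The table, names, and payload are illustrative assumptions. The string-interpolated query is a well-known insecure pattern (SQL injection). It passes the obvious functional check and fails only on input that someone with security knowledge would think to try.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema: a users table with a privileged role column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Interpolates input into the SQL text. Correct for ordinary names,
    # but a crafted name can rewrite the WHERE clause.
    query = f"SELECT name, role FROM users WHERE name = '{name}' ORDER BY name"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # The ? placeholder makes sqlite3 bind name as a value, never as SQL.
    query = "SELECT name, role FROM users WHERE name = ? ORDER BY name"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()

# The functional check both versions pass:
assert find_user_unsafe(conn, "bob") == find_user_safe(conn, "bob") == [("bob", "user")]

# The input only a security-minded reviewer thinks to try:
payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # [('alice', 'admin'), ('bob', 'user')]
print(find_user_safe(conn, payload))    # []
```

Both versions are syntactically valid and logically plausible. Knowing which one to accept comes from domain knowledge, not from running the happy path.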

You can compensate partially -- provide documentation, show examples, write tests that encode domain constraints. These help. But they're scaffolding around the core problem, not solutions to it. The judgment still has to come from somewhere, and that somewhere is you.
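As a hypothetical sketch of a test that encodes a domain constraint, suppose the domain rule is that splitting a payment into installments must never gain or lose a cent, and that shares differ by at most one cent. Both the rule and the function names are assumptions for illustration. A happy-path test cannot tell the two implementations apart. A test that encodes the invariant can. Writing that second test still required knowing the rule.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def split_naive(total: Decimal, parts: int) -> list[Decimal]:
    # Plausible-looking: round one share to cents and repeat it.
    share = (total / parts).quantize(CENT)
    return [share] * parts


def split_exact(total: Decimal, parts: int) -> list[Decimal]:
    # Assumes a non-negative total and parts >= 1. Round every share down
    # to cents, then give the leftover cents, one each, to the first shares.
    base = (total / parts).quantize(CENT, rounding=ROUND_DOWN)
    leftover = int((total - base * parts) / CENT)
    return [base + CENT if i < leftover else base for i in range(parts)]


def passes_happy_path(split) -> bool:
    # The check you write without the domain knowledge: one even split.
    return split(Decimal("10.00"), 4) == [Decimal("2.50")] * 4


def passes_invariants(split) -> bool:
    # Encodes the assumed domain rules across awkward inputs: shares sum
    # exactly to the total and differ by at most one cent.
    cases = [(Decimal("100.00"), 3), (Decimal("0.05"), 3), (Decimal("10.00"), 4)]
    for total, parts in cases:
        shares = split(total, parts)
        if len(shares) != parts or sum(shares) != total:
            return False
        if max(shares) - min(shares) > CENT:
            return False
    return True


print(passes_happy_path(split_naive), passes_invariants(split_naive))  # True False
print(passes_happy_path(split_exact), passes_invariants(split_exact))  # True True
```

The naive version returns three shares of 33.33 for a total of 100.00, quietly losing a cent. Nothing crashes, and the happy-path check stays green.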

The Uncomfortable Implication

If you can't delegate judgment, then AI doesn't reduce the cognitive demand of building software. It shifts where that demand falls. You spend less time typing and more time evaluating. Less time implementing and more time deciding whether the implementation is right.

For someone with deep domain expertise, this is a genuine acceleration -- the bottleneck was never their judgment but the mechanical work of expressing it in code. AI removes the mechanical bottleneck and lets the judgment operate at full speed.

For someone without that expertise, AI is dangerous in a way that's hard to see from the inside. The output looks professional. It compiles. It passes the tests you wrote (which may themselves be insufficient, because writing good tests requires the same domain knowledge). Everything appears to work. The absence of visible failure feels like competence. But the judgment that would have caught the subtle problems was never there -- and AI didn't provide it either.

Nobody in the system knows what they don't know. That's the gap AI can't fill.
