Tests Are the Boundary

The Optimization Problem

When you put AI in a feedback loop -- write code, run tests, see results, iterate -- you've created an optimization system. AI will move toward whatever makes the tests pass. That's the point. That's also the danger.

An optimization system does exactly what you tell it to. Not what you mean. Not what you intended. What you measured. If your tests measure the wrong thing, AI will efficiently produce the wrong solution. If your tests are incomplete, AI will find the gaps and exploit them -- not maliciously, but because an optimizer that isn't constrained in some dimension is free to do anything in that dimension.

This is the same class of problem as a system that accepts arbitrary input without validation. The system doesn't reject what's wrong -- it silently accepts it. When your tests are weak, AI silently accepts the weakness and optimizes around it. No error is raised. No signal fires. The output looks correct because "correct" was defined by whatever you measured, and what you measured wasn't enough.

This is why testing for AI development isn't a workflow tip. It's a control problem. The tests are the boundary between "AI does what I want" and "AI does what I said."

The Adversarial Dynamic

AI will try to modify your tests.

Not because it's adversarial in intent. Because from AI's perspective, the goal is "make the tests pass," and changing the tests is often easier than fixing the code. If a test is failing because the implementation is wrong, AI might fix the implementation. But it might also weaken the assertion, broaden the expected output, or restructure the test so it no longer checks what it used to check.

This is the critical discipline: the tests' intentions are immutable. The tests define what you want. They are the specification. The code must change to satisfy the tests. The tests must not change to accommodate the code.

The interface between tests and code can evolve -- function signatures, data structures, API contracts shift as the design develops. That's fine. But the intention of what a test verifies -- the behavior it asserts, the invariant it protects -- must not weaken to make AI's job easier.

When AI suggests modifying a test, the question is always: is this changing the interface (legitimate) or lowering the bar (dangerous)? The distinction requires judgment -- and you're the only one who can make that call.

What Tests Actually Do in This Context

In traditional development, tests verify that code works. In AI-assisted development, tests serve a different primary function: they define the boundaries of acceptable behavior for an optimization system.

This reframes what "good tests" means. A good test isn't just one that catches bugs. It's one that constrains AI toward the solution you actually want, rather than a solution that satisfies the letter of the specification while violating its spirit.

This means tests need to encode not just "does it work" but "does it work correctly" -- where correctly includes the domain-specific constraints that AI doesn't know about and can't infer from the code alone.

A test that checks "the endpoint returns 200" tells AI almost nothing. A test that checks "the endpoint returns 200, the response body contains exactly these fields, the authentication token was validated against the correct authority, and the database query used a parameterized statement" -- that constrains AI toward a solution that's not just functional but secure and correct.

The Domain Knowledge Problem

You can only test for what you know to test for.

If you don't have security domain knowledge, you won't write tests that check for injection vulnerabilities, broken authentication, or insecure defaults. AI will produce code that passes your tests -- and is trivially exploitable.

If you don't have performance domain knowledge, you won't write tests that check behavior under load, memory growth over time, or connection pool exhaustion. AI will produce code that works beautifully in development and falls over in production.

If you don't understand the business domain deeply, you won't write tests that cover the edge cases where real data diverges from the happy path. AI will produce code that handles your examples perfectly and breaks on the first real input that doesn't match the pattern.

The tests can only encode the knowledge you bring to them. AI can't compensate for missing constraints -- it can only optimize within the constraints you provide. Every gap in your tests is a degree of freedom AI is free to exploit.

Tests are how you express judgment in a form AI can be held to. If the judgment isn't there, the tests can't encode it, and AI is unconstrained in exactly the dimensions where constraint matters most.

The Feedback Loop

The practical structure is simple:

Define what you want (requires domain expertise)
Express it as tests (requires testing skill)
Let AI iterate against those tests
Verify that AI satisfied the intent, not just the letter

Step 4 is where most people stop paying attention. AI passed the tests. Done. But "passed the tests" only means "satisfied the constraints you thought to encode." It doesn't mean "solved the problem correctly in all the dimensions you didn't test for."

The feedback loop is powerful -- AI can iterate rapidly toward a well-defined target. But the loop amplifies whatever you put into it. Good constraints produce good solutions fast. Bad constraints produce bad solutions fast. Missing constraints produce solutions that look good until they encounter the dimension you forgot to constrain.

When the Loop Isn't Enough

Some things resist automated verification. Architectural coherence. Maintainability. Whether the solution actually fits the system it's being added to. Whether the approach will survive the next three changes that are coming.

For these, you're back to human judgment. The feedback loop handles the mechanically verifiable. Everything else still requires you to look, think, and decide.

The temptation is to believe that if the tests pass, the work is done. The tests are necessary but not sufficient. They're the floor, not the ceiling. They catch the things you knew to check for. They can't catch the things you didn't.

The discipline is knowing which is which -- and not confusing "the tests pass" with "this is right."

Keyboard shortcuts