Software engineering depends on judgment.
That includes where a boundary should sit, what a component should refuse to know about, what the tests are really proving, and whether a change is making the system clearer or more fragile.
Agents change the shape of that work.
They can generate useful material faster than a person can properly absorb it. Code, explanations, decompositions, refactors, alternative approaches, test structures, edge cases, and follow-up improvements can arrive in one continuous stream. Much of it can be locally plausible. That is what makes the tool powerful, but also what makes it risky.
The main risk is cognitive saturation.
When the session fills with more structure than the engineer can synthesize, it becomes harder to hold the whole picture in mind. Individual components may still look sensible. The problem is that understanding gets replaced by navigation. We move through the material, but we do not fully integrate it. Critical thinking becomes more difficult precisely when the amount of plausible output is increasing.
That state is easy to misread as productivity. The session feels rich. There are many options. The work appears to be moving quickly. We can quietly shift from author of decisions to manager of generated motion.
When output outruns understanding
Many agent workflows optimize for continuity of output. They do far less to protect continuity of understanding.
An agent can keep expanding the session almost indefinitely. There is always another abstraction to propose, another helper to extract, another test to add, another review comment to address, another security improvement to consider. Each step can be reasonable on its own. The session can still outrun our ability to compress the problem into a coherent mental model.
Once that happens, quality gets harder to judge. Local correctness is still visible. Individual parts are still reviewable. The harder questions start to slip: is the decomposition right, are the interfaces stable, is the implementation making future changes easier or harder, and did complexity arrive without a clear decision to accept it?
This is why agent-heavy sessions can create a kind of over-excitement. There is always more useful-looking material available. Stopping to think feels like interrupting progress. If the pause disappears, judgment weakens. The tool starts to fill the space that should belong to synthesis.
What the workflow has to protect
Agents should increase leverage without exceeding the engineer’s ability to understand the system.
The engineer needs to be able to:
- Hold the whole change in mind.
- Understand why each component exists.
- Challenge decomposition before it becomes load-bearing.
- Review the work from different angles instead of accepting a single continuous stream.
If a workflow produces more material but weakens those capabilities, it is shifting effort from thinking to supervision.
Mob programming as a model
Some problems benefit from concentrated attention from several angles at once. Mob programming does that by focusing the team on one area of the system, with implementation, challenge, and review happening around the same piece of work. In doing so it prioritizes quality over quantity, limiting the team to a single task at any given moment. The lack of parallelism becomes the feature that enables deeper interaction within the team.
This model transfers well to agent work. It separates modes of thinking that would otherwise collapse into one fast stream.
A useful starting point is an engineer-writer-reviewer loop, in which the engineer sets the direction and works with two agents in distinct roles:
- Writer: produces the change inside agreed boundaries.
- Reviewer: checks clarity, boundary integrity, and test coverage. Looks for what the writer normalized.
The engineer goes as deep as useful with the writer. Together they co-create the data types, the algorithms, and the boundaries between modules or services. At some point the route is exhausted, the return on more prompting drops, or the engineer starts losing sharpness. That is the point to bring in the reviewer.
The reviewer’s objective is to challenge the output generated so far, giving the engineer further information to keep driving the discussion with the writer. Normally the engineer doesn’t take the feedback at face value. They add nuance and missing context, and collect ammunition to improve the work with the writer.
If the same agent is asked to implement, explain, review, and defend the result in one continuous interaction, the process becomes self-confirming and accumulates too much context too quickly. An engineer-writer-reviewer loop already creates useful scrutiny while helping the engineer to remain focused.
The engineer-writer-reviewer loop can be applied to different scenarios: architecture design, implementation work, security analysis, or compliance evaluation. Sequence matters. It is the constraint that allows the engineer to remain in control, setting the direction.
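One way to picture the sequencing is the minimal sketch below. It is an illustration under assumptions, not a prescribed implementation: the names run_loop, Round, and the stub agents are hypothetical, and real writer and reviewer agents would be model calls rather than plain functions. What it tries to show is that the reviewer receives only the draft, never the writer's conversation, and that feedback reaches the writer only after the engineer has filtered it.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: a writer maps an instruction to a draft, a reviewer
# maps a draft to a list of objections, and the engineer filters objections.
Writer = Callable[[str], str]
Reviewer = Callable[[str], List[str]]
EngineerFilter = Callable[[List[str]], List[str]]


@dataclass
class Round:
    number: int
    draft: str
    feedback: List[str]  # everything the reviewer raised
    accepted: List[str]  # what the engineer chose to act on


def run_loop(
    task: str,
    writer: Writer,
    reviewer: Reviewer,
    engineer_filter: EngineerFilter,
    max_rounds: int = 3,
) -> List[Round]:
    """Run writer -> reviewer -> engineer in strict sequence."""
    history: List[Round] = []
    instruction = task
    for number in range(1, max_rounds + 1):
        draft = writer(instruction)
        # The reviewer sees the draft alone, so it does not inherit
        # whatever the writer's session has normalized.
        feedback = reviewer(draft)
        # Feedback is input to the engineer, not a verdict.
        accepted = engineer_filter(feedback)
        history.append(Round(number, draft, feedback, accepted))
        if not accepted:
            break  # nothing the engineer wants to pursue: stop and think
        instruction = task + "\nAddress: " + "; ".join(accepted)
    return history


# Stub agents, for illustration only.
def stub_writer(instruction: str) -> str:
    return "draft for: " + instruction


def stub_reviewer(draft: str) -> List[str]:
    if "Address" in draft:
        return []
    return ["missing test for empty input", "rename helper"]


def stub_engineer(feedback: List[str]) -> List[str]:
    # The engineer keeps only the points judged worth acting on.
    return [point for point in feedback if "test" in point]


history = run_loop("add parser", stub_writer, stub_reviewer, stub_engineer)
```

The max_rounds bound is a deliberate choice in this sketch: the loop ends after a fixed number of passes even if the agents could keep producing material, which leaves the decision to continue with the engineer.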
What changes
The process becomes more legible.
The first implementation pass can still be fast. Review is easier because intent was made explicit earlier. Architectural problems appear sooner, before they become load-bearing. Security questions are raised while the change is still easy to reshape. The work stays inside a scale we can understand.
The work also feels different. It feels more like engineering and less like supervision. When a session leaves the feeling of having merely navigated generated material, the balance is probably wrong. When it leaves us with a clearer model of the system, cleaner boundaries, and fewer surprises in review, the workflow is doing its job.
The question to keep coming back to is simple: did this improve our understanding and judgment, or did it mainly increase the amount of output to manage?