The world is changing rapidly. Today we live in a world where processes drive headcount and productivity output. As we figure out more and more how to apply AI to daily workflows, I’ve come to the conclusion that (at least right now) AI acts as an amplifier: it elevates strong performers and quickly reveals gaps in discipline or capability.
Personally, I am an extremely driven individual contributor, probably too much so at times: if I see I have 15 minutes left in the day, I might try to min/max and squeeze in one final enhancement. For someone like me, AI makes this sustainably possible. Workplaces are essentially begging their employees to start using AI in their day-to-day work. It’s incredible to me how much inertia is involved in simply trying to make any sort of behavioral change at large scale. Getting ICs to change their work approach comes with extreme friction, but for those who can find comfort in discomfort, the door is wide open to increase your output by 100–300%.
I wanted to put together a unified post sharing what I’ve found helpful for rooting myself in the right mindset and applying the right techniques to yield consistent success:
- An absolute must for me was creating spec.md and implementation_plan.md files.
- Further, both files must be referenced in your claude.md or agents.md file. Add basic instructions to read spec.md and implementation_plan.md.
- Why? This maximizes context availability. Without it, every time you start a new chat your harness has to work to rebuild context. It’s about getting the right information to the model with the fewest tokens.
- spec.md – A description of your project: primary use cases, features, architecture stack, etc.
- implementation_plan.md – A phase-based checklist of roadmap items to deliver, broken down to the level where you can simply tell the model “jump” and it already knows how high.
- When set up properly, this allows me to simply say “Implement phase 3” and the model knows exactly what I want to happen.
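As a rough sketch, the two files and the pointer to them might look like this. The file contents and phase items here are purely illustrative placeholders, not a prescription:

```markdown
<!-- claude.md -->
Before starting any task, read spec.md and implementation_plan.md.
Work one phase at a time; do not begin a phase until asked.

<!-- implementation_plan.md (hypothetical project) -->
## Phase 3: Export pipeline
- [ ] Add CSV export endpoint
- [ ] Stream results instead of buffering in memory
- [ ] Unit tests for empty, single-row, and large exports
```

With this in place, “Implement phase 3” carries all the context the model needs without re-explaining the project in every chat.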
- Harness / model variation – As of 2/28/26, I frequently use Opus 4.6 within Claude Code for two primary purposes: planning and UI design. Outside of those, GPT 5.3 Codex is, in my opinion, the best model available today. It can make mistakes like any other model, but with this workflow I can more often get consistent results within one or two prompts.
- Stop telling your model to “make no mistakes” – This is sensationalist phrasing, but a very important point. I believe the sweet spot for how much you should try to achieve in one prompt is a rapidly moving goalpost, due to both model advancements and improved prompting techniques.
- It’s clear that if our goal is pure agentic engineering, we don’t have to work line by line or even necessarily function by function. It remains to be seen what counts as too big a task at once; personally, I’ve found I can typically address one phase at a time safely.
- Sandbox your risk – The bigger the project, the more risk it is likely to contain. You should be able to explain the structure of the code (or at least know where to guide troubleshooting within it). Additionally, just because code can be generated doesn’t mean it uses your preferred APIs or dependencies.
- The obvious answer here is to include your guidelines in your plan, but in practice I’ve found that declarative prompting, even during planning, often skips this step. If you can keep it front of mind, it will be easier to address the downstream risk naturally.
- Further, depending on your project you may need to apply extra guardrails. For example, if you are building something that must integrate with a third-party product or API, you need to establish a realistic test plan for containing risk. Most often this means using a dev or QA environment, but if that isn’t possible you must be especially cautious with POST/PATCH/DELETE calls.
- I’ve found that if the project is generally self-contained that I am much more willing to take bigger risks.
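One way to make that caution mechanical rather than a matter of discipline is a thin guardrail around the HTTP client. This is a minimal sketch, not a real library: `GuardedClient` and its `send` callable are hypothetical names, and the idea is simply that mutating verbs fail loudly unless you explicitly opt in (e.g. when pointed at a dev/QA environment):

```python
# Hypothetical guardrail: block mutating HTTP verbs by default so an
# agent (or a distracted human) can't fire writes at a production API.

MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

class GuardedClient:
    def __init__(self, send, allow_mutations=False):
        # `send` is whatever actually performs the request
        # (e.g. requests.Session().request); injected for testability.
        self._send = send
        self._allow_mutations = allow_mutations

    def request(self, method, url, **kwargs):
        method = method.upper()
        if method in MUTATING_METHODS and not self._allow_mutations:
            raise PermissionError(
                f"{method} {url} blocked: mutations are disabled "
                "outside the dev/QA environment"
            )
        return self._send(method, url, **kwargs)
```

Flipping `allow_mutations=True` only in the dev/QA configuration keeps the risky path an explicit, reviewable choice.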
- As part of each phase in your plan, build unit tests along the way.
- This is a big one, which is why I’m calling it out on its own. It may seem like, “OK Nate, and water is wet,” but it really is an important practice because it serves two purposes:
- First, it verifies that each phase actually succeeds rather than leaving you guessing.
- Second, building these unit tests during initial development is SO MUCH easier for the model than jumping into a brownfield project and being asked to “look over the entire project and create unit tests for everything.” The model simply does not seem able to handle a task that large zero-shot.
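Concretely, “tests along the way” just means each phase ships its checks next to its code. A tiny illustration, where `slugify` stands in for whatever a hypothetical plan phase actually delivers:

```python
# Illustrative only: a small "phase deliverable" and the tests written
# alongside it, rather than retrofitted onto the whole repo later.

import re

def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Tests live next to the phase's code from day one:
def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_slugify_collapses_runs():
    assert slugify("  a -- b  ") == "a-b"
```

When the model writes the test in the same prompt that writes the function, it has the full intent in context; asking for the same test six months later forces it to reverse-engineer that intent first.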
- Understand all the systems involved – I mean everything: the code, the architecture, the infrastructure, servers, networks, firewalls, clients, operating systems, etc. Become a polymath.
- The more you understand about every system involved in project delivery, the more you’re going to be able to successfully communicate intent.
- I’ve worked as a lead IT Systems Engineer, now entering my 12th year, and one thing I’ve consistently observed is that SWEs may understand their code or their immediate stack, but rarely do I see client-side OS and network firewall competency.
- I don’t mean for this to sound critical, but I vividly remember watching SWEs get a machine hot swap and be completely lost trying to understand the system environment variables, proxy settings, and other basic configuration dependencies they need to be successful ICs. By being able to think this way, you move beyond asking why the code itself isn’t doing what you want and take a more macro approach: “Is the output not as desired because of something in the code, or is there a system-level blocker?”
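That “code or system?” question can even be made routine. As a small sketch (the function name is mine, and the proxy variable names are just the common conventions for Unix-like environments), a first-pass triage might dump the environment settings that most often masquerade as code bugs:

```python
# Sketch of a quick "is it the code or the environment?" triage check.
# Adapt the variable list to your own stack; these are the usual suspects.

import os

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY",
              "http_proxy", "https_proxy", "no_proxy")

def environment_report(env=None):
    """Summarize system-level settings worth ruling out before blaming code."""
    env = os.environ if env is None else env
    path = env.get("PATH", "")
    return {
        "proxies": {k: env[k] for k in PROXY_VARS if k in env},
        "path_entries": len(path.split(os.pathsep)) if path else 0,
    }
```

Running something like this before diving into the code turns the macro question above into a two-second check instead of an afterthought.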
While these things have been yielding success for me thus far, it remains to be seen what the point of diminishing returns is. When does the repo get too large to be able to manage as a single person utilizing this toolset and technique style? I’m not sure, but it appears that continuing to break things down without giving the model too much agency may be a sustainable approach.
Is this the most optimal approach anyone is taking? I doubt it, but at the same time there are still so many people just beginning even light use of AI. I fully expect that, with the innovation velocity of UI/UX guardrails and harness advancements, not only will we continue to find more optimal ways of working, but existing techniques will eventually be integrated more natively rather than requiring specialized knowledge to apply.