Whose Operating System Are You Actually Running?: contAIn™

At DevCon 2026 it was widely agreed that AI tools are already way ahead of the humans using them. The 650-odd engineers and AI specialists spent two days circling a problem they did not quite name. They were clear that agent use is widespread and increasing, and they agreed that agent reliability is a documented, named, and active problem being worked on in the dev space. What they circled, however, was the methodology layer that should sit above those agents: the human-authored layer that governs what AI does, on whose authority, across sessions, platforms, and tools. That layer had not yet been codified in any form that anyone could point to and say: this is what it looks like in practice. [1]

There has been plenty of discussion around it. Oracle changed the vocabulary in January 2026, framing this year as the shift from “human in the loop” to “human above the loop,” with Katrina Gosek making the case for a new operating model where people sit above the process providing judgment and direction while AI handles execution. [2] Deloitte framed the same territory differently in their 2026 Global Technology Leadership Study, positioning CIOs as orchestrators rather than operators, with judgment and trade-offs as the new differentiator, all in an era where the enterprise mandates have changed but the enterprise structure has not. [3] Auditoria.AI launched an operating framework called “Governed Autonomy” in May 2026, designed to let autonomous agents execute within enterprise-defined guardrails rather than requiring human approval at every step. [4] The phrases are accumulating. The AI methodology layer underpinning them all, not so much.

Historically, every discipline that eventually scaled had a moment where someone stopped and codified what rigorous practice actually looked like, not before the tools ran ahead, but after they already had. Walter Shewhart did not develop Statistical Process Control in the 1920s and 1930s ahead of the problem. He developed it because the tools and processes at Bell Labs were already producing variance that nobody had yet measured or named. [5] Harrington applied the same logic to business processes in 1991, again not because organisations were thriving, but because the gap between what processes were supposed to do and what they actually did had been widening for years without anyone closing it. [6] Likewise, Agile did not emerge from better software; it emerged because the software had already run so far ahead of the humans managing it that a framework was needed just to stay in the game. None of this is new territory. Even the name Artificial Intelligence is a septuagenarian, coined by John McCarthy in 1955. [7]

The methodology gap the market is currently naming is not new either. What has always followed the tools running ahead is someone writing down what disciplined practice actually looks like, measuring it, and closing the loop.

contAIn™ is that methodology for AI. Not a framework, or a governance policy. A named, codified, replicable system for directing AI toward defined outcomes, built from practice, measured against a live system, and documented before the industry agreed the gap existed.

An increasingly visible gap, and one that consultants charging gazillions will no doubt rush to fill. The thing is, the methodology already exists, it’s proven, and it wasn’t created in a lab or funded by the Big Four. It was created over a few weeks of hard graft on a sofa, pulling my hair out and wanting to launch my laptop.

It appeared organically in December 2025. I didn’t set out to create a methodology; I just had to do something to make an AI system follow my rules long enough to be useful. The directing layer that the market is currently describing as an absence is something I have been running, stress-testing, and refining across four live real-world projects for the better part of six months. The session handovers, thread directories, failure logs, and stress test results all exist, and so does the preprint documenting one of the mechanisms that came out of that work. This is not a casual claim. It is the record.

What the market is actually saying

The conversations at DevCon were instructive not because of what they announced but because of what they collectively circled around without actually naming.

Guy Podjarny’s keynote argued that agent instruction files are now a real unit of software, requiring intent, review, testing, versioning, and maintenance, and that the instruction layer cannot remain the chaotic group chat of the engineering process. [1] Liran Tal from Snyk opened a more uncomfortable version of the same point: agents are executing against instruction files that can be externally compromised, which means the instruction layer is simultaneously the control mechanism and the attack surface. [1] I have a view on that too, unsurprisingly.

Patrick Debois [1] came at the same problem from a different angle, and the one that made me laugh out loud was DRY. Don’t repeat yourself. It is a software engineering principle so obvious it has its own acronym, and I had been living the human version of it for months without naming it. Every time the AI lost the thread, forgot the rules, or started from scratch on something we had already resolved, the question was the same: why am I repeating myself again? That question, asked with increasing frustration across hundreds of sessions, was what drove me to build the handover documents, the session protocols, the version control, and the structured memory layer. DRY was not the answer I found. It was the question that made me solve it.

Each of those speakers was describing the same structural gap from a different angle. Context engineering, the discipline of managing what models see at inference and a term coined by Shopify CEO Tobi Lütke in June 2025, has been claimed by the developer community as a technical infrastructure problem. [8] But the deeper question is not what the model sees. The question is who decides what goes in, on whose authority, and what happens when it drifts. That question points to the AI methodology layer, and the market currently has the language for the problem without the methodology that addresses it.

One disambiguation is worth making here, and it matters more now than it did six months ago. “Human above the loop” has arrived in mainstream enterprise media. The phrase is correct as far as it goes, but the market has settled on a version of it that means oversight after the fact: essentially supervision, a human watching from above while AI executes below. That is governance, but in the AI era, and in my humble opinion, it is wrong. What AI needs is direction from the start. I state this as a personal observation because I found out the hard way. I’m not claiming to be the oracle here; it’s the nature of the beast that it’s changing every day. But I have proven it holds, and that is what contAIn™ is: direction and structure from the start. If this is not done, agents go rogue, file things in weird and wonderful places, give them off-the-wall names, and generally go off-piste. So above the loop, watching it run, is the wrong place for the human to be. The human has to decide beforehand what the loop does, whether it runs at all, and what happens when it drifts. Above watches; direction controls. That distinction is the one the market does not yet appear to have made.

So, what do I know exactly? Good question.

In August 2025, I started feeding documents into AI. Not a project, certainly not a system. Just a growing pile of correspondence, evidence, and analysis, and an AI that I was using like a turbocharged search engine. I was pattern-matching manually, building a log, trying to reconcile what the documents said against what I was being told. The AI could help with that, sort of. What it couldn’t do was hold the thread between sessions, stay consistent with its own instructions, or stop quietly fabricating when it ran out of verified material to work with.

The AI treated the operating procedures I had implemented as suggestions rather than constraints. It had no mechanism to enforce its own rules. Direction without enforcement is, as I found out, as much use as an ashtray on a motorbike, and that just wouldn’t do. So here’s the bonus of being a PM for 30-odd years: I like records. They have saved my arse many a time. Funnily enough, an old boss of mine said I had no system, but that’s a skit for another day, and I digress. The point is, if I had been using AI the way most people use AI, that history would be siloed in chats, or subject to platform changes and “improvements” that erode your work, which means erased. No evidence of what worked, what failed, or why.

At some point, when I have the time, I will get AI to grep the lot and publish the actual number, with a bar chart, a PowerPoint, and an exec summary, just like a proper consultant. (Sorry, not sorry!)

Why it was built from scratch

Because the searches I did only returned the rainbows-and-unicorns crowd, and that was not my lane. Devs were talking dev stuff, which was great but beyond my skillset. I just needed structure, matrices, SOPs, change control, and a team. Back in the day I had people to do all of this, so as AI was billed as the all-singing, all-dancing solution to everything, I thought I’d better test it. Why not get it to do all the team stuff? Not in the “oh my god AI can do everything” way, but in the collaborative way. The way that gets things done, gets them documented, and makes sure they stay done.

So essentially I built a set of mandatory controls governing every interaction and every output I asked it to produce: session protocols, structured handover documents so nothing was lost between conversations, version control rules borrowed from my previous life in SaaS, sequential review gates that no output could bypass, and stop-the-line escalation triggers that halted the machine the moment it went off script. The intellectual foundation came from outside the AI space entirely: it was based on 30 years of project management in the tech space, where if you couldn’t solve a problem you found a workaround while the tech caught up, or more accurately while the vendor amended the system to do what it needed to do in production.
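
For anyone who thinks in code, the idea of sequential review gates with a stop-the-line trigger can be sketched roughly as below. This is an illustration under assumptions, not the contAIn™ implementation: the names Gate, run_gates, and StopTheLine, and the two example checks, are invented for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class StopTheLine(Exception):
    """Raised when a gate fails. Nothing downstream runs until a human decides."""


@dataclass
class Gate:
    name: str
    # A check returns a failure reason, or None if the output passes.
    check: Callable[[str], Optional[str]]


def run_gates(output: str, gates: List[Gate]) -> List[str]:
    """Run every gate in order and halt at the first failure.

    There is deliberately no 'skip' or 'warn' mode: an output either
    clears every gate or the line stops.
    """
    passed = []
    for gate in gates:
        reason = gate.check(output)
        if reason is not None:
            raise StopTheLine(f"{gate.name}: {reason}")
        passed.append(gate.name)
    return passed


# Two hypothetical checks, standing in for whatever the operator's SOPs require.
def not_empty(output: str) -> Optional[str]:
    return None if output.strip() else "empty output"


def has_version_line(output: str) -> Optional[str]:
    first_line = output.splitlines()[0] if output else ""
    if first_line.startswith("Version: "):
        return None
    return "missing 'Version: ' line at top"


GATES = [Gate("not-empty", not_empty), Gate("version-line", has_version_line)]
```

Calling run_gates(draft, GATES) either returns the list of gates cleared or raises StopTheLine, which is the point: the halt is a hard stop for the human to resolve, not a note in a log.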

The problem was not an AI problem. It was a production reliability problem, and the solutions that work in business process improvement and project management also work here: the PM has to decide priority, every output has to be traceable to its source, and if a fact is unknown, you say so rather than filling the gap with inference.
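
The traceability rule can be made concrete with a small sketch. It assumes each statement in an output is stored alongside the document it came from; Claim, render, and untraceable are hypothetical names, and the document ID format is made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

UNKNOWN = "UNKNOWN"


@dataclass
class Claim:
    text: str
    source: Optional[str] = None  # hypothetical ID from an evidence log


def render(claim: Claim) -> str:
    """Print a claim with its source, or flag it as unknown. Never guess."""
    if claim.source:
        return f"{claim.text} [source: {claim.source}]"
    return f"{UNKNOWN}: no verified source for '{claim.text}'"


def untraceable(claims: List[Claim]) -> List[Claim]:
    """Return every claim that cannot be traced back to a source."""
    return [c for c in claims if not c.source]
```

The design choice is that an unsourced statement is never silently dropped or smoothed over. It stays visible, marked as unknown, until someone supplies the evidence.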

As H. James Harrington put it: “Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” [6]

In October 2025, the Claude Skills feature launched, but I didn’t use it. [9] In November, the project escalated significantly, and a stress test on the processing pipeline returned double-digit defects and a no-go verdict. I fixed the critical ones and kept going.

What the project record says about the period that followed comes directly from the sessions themselves:

“What made it harder, and more interesting, is that the tools kept changing underneath us. When we started in August 2025, the context window was 200,000 tokens. By February 2026, the context window had expanded fivefold to a million tokens. The problem we’d spent weeks engineering around largely disappeared overnight.”

The discipline layer did not disappear with it, and this is the point the DevCon conversations kept approaching without naming. When a capability constraint resolves, the methodology that was built around it does not become unnecessary: it compounds. The context window expanded. The failure modes, catalogued and attributed, did not.

Four projects, one methodology, four replications

By December 2025, the methodology had been named internally and a second project had begun in parallel: a complex planning application challenge with a completely different context, different opponent, and different subject matter. The same directing layer was required, the same failure modes appeared, and the same fixes resolved them, with the system replicating for the first time outside its original context.

By January 2026, standard operating procedures, version control, and evidence custody had been formalised across both projects. By February, the failures had been catalogued with a confirmed single root cause, and by March the first project alone had generated more than three hundred working sessions. Persistent memory launched as a Claude feature in March 2026, [10] four months after I had already built a manual structured-document system to carry context between conversations. The fix for a problem that had taken me months of engineering to work around arrived as a product feature after my workaround was already done.
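
A structured handover of the kind described above can be sketched in a few lines. This is a minimal illustration, assuming handovers are saved as JSON files whose session IDs sort in date order; the field names and the functions save_handover, load_latest, and opening_brief are hypothetical, not the format used in the projects.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Handover:
    session_id: str  # assumed sortable, e.g. "2026-01-15"
    decisions: List[str] = field(default_factory=list)
    open_items: List[str] = field(default_factory=list)
    rules_in_force: List[str] = field(default_factory=list)


def save_handover(h: Handover, folder: Path) -> Path:
    """Write the handover to disk at the end of a session."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"handover_{h.session_id}.json"
    path.write_text(json.dumps(asdict(h), indent=2), encoding="utf-8")
    return path


def load_latest(folder: Path) -> Handover:
    """Load the most recent handover, relying on sortable session IDs."""
    files = sorted(folder.glob("handover_*.json"))
    if not files:
        raise FileNotFoundError(f"no handover files in {folder}")
    return Handover(**json.loads(files[-1].read_text(encoding="utf-8")))


def opening_brief(h: Handover) -> str:
    """Build the plain-text block pasted in at the start of the next session."""
    lines = [f"Previous session: {h.session_id}", "Rules in force:"]
    lines += [f"- {r}" for r in h.rules_in_force]
    lines.append("Decisions already made (do not reopen):")
    lines += [f"- {d}" for d in h.decisions]
    lines.append("Open items:")
    lines += [f"- {o}" for o in h.open_items] or ["- none"]
    return "\n".join(lines)
```

Because the handover lives in a file the operator owns, it carries across sessions whether or not the platform offers a memory feature.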

A third project followed, an AI-powered startup: not a dispute, but a client-facing B2B app with its own team and goals. Same results, over three completely different operating environments. A fourth is in flight, in a different field again: high-end interior design.

So here is where we are in the real world, on a sofa, still occasionally wanting to launch my laptop but far more settled.

The contAIn™ trademark application was filed in the UK in March 2026. In May 2026, I published a preprint on Zenodo documenting one specific failure mechanism that had emerged from the project work: how AI-generated governance documents amplify the very patterns they were designed to suppress, because the model pattern-matches on what it reads at volume more reliably than it follows explicit prohibitions, making the governance layer simultaneously the suppression rule and the contamination source. That paper, “The Feed Loop: How AI-Generated Governance Documents Amplify the Patterns They Were Designed to Suppress,” is available at doi.org/10.5281/zenodo.20474271. [11] It is a preprint and has not been peer reviewed. If you want to peer review it, you are welcome; I have no idea how to make that possible, so please reach out and let me know.

What I do know is this: I have documented a mechanism, named it, observed and stress tested it in a real production system, kept fixing the errors daily and published it before the market found the vocabulary to describe what it was looking for. That is not me being a bighead, and if you think I am you are very welcome to drop me a PM on LinkedIn and I will connect you to people who have known me for a very long time. I am not after the glory, I just like things to work. Simply, easily and for everyone.

The infrastructure question the market is not asking

The way I see it, the “human above the loop” conversation, as enterprise consulting has packaged it, treats the directing layer as a governance policy document: something an organisation writes, approves, and issues to its teams, then points to when an auditor asks. That is oversight architecture, and it is wrong. contAIn™ is not governance after the fact. It is direction from the start, the methodology that runs before the agents do, not the policy that reviews them after.

The AI operating system I have been building runs primarily in Claude, but the methodology itself does not live there. The canonical system (the SOPs, the skills, the session standards, the documents) lives in a contAIn OS folder on my own hard drive, backed up in Git. That folder is transportable and can be moved to any LLM, on any platform, without rebuilding from scratch. This matters more than it sounds. Claude Desktop on Windows stores Cowork session history in a hidden folder that Windows does not show you by default, and that Claude itself cannot access from inside a session. [12] The directing layer cannot live inside a vendor’s product. If it does, you do not have a directing layer, you have a dependency. Hence mine does not.
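
To show why a plain folder is portable, here is a rough sketch that flattens a local folder of text documents into one briefing that could be pasted or uploaded into any LLM. The function name build_bundle, the file suffixes, and the header format are assumptions for the illustration; the real contAIn OS folder layout is not documented here.

```python
from pathlib import Path
from typing import Tuple


def build_bundle(root: Path, suffixes: Tuple[str, ...] = (".md", ".txt")) -> str:
    """Concatenate every text document under root into one labelled block.

    Files are taken in sorted path order so the bundle is reproducible.
    Anything inside a .git directory is skipped, as are non-text suffixes.
    """
    parts = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue
        if path.is_file() and path.suffix.lower() in suffixes:
            body = path.read_text(encoding="utf-8").strip()
            parts.append(f"=== {rel.as_posix()} ===\n{body}")
    return "\n\n".join(parts)
```

Nothing in the sketch depends on a vendor API. The same bundle works wherever text can be supplied, which is the practical meaning of the operator owning the directing layer.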

Auditoria.AI’s “Governed Autonomy,” Atlan’s context layer, and the human-AI governance frameworks that enterprise consulting is beginning to package all occupy legitimate positions in the architecture and serve real purposes. What they are not is a methodology the human operator owns, builds, and carries across any platform, tool, agent, or application they choose to use. That ownership question, specifically who holds the directing layer and whether it belongs to the operator or the vendor, is what contAIn™ is built around.

So what is an AI methodology layer, specifically? It is the structured, human-authored layer that sits above agents and tools, governing what they receive, what they output, and what happens when they drift. It is not the agents themselves, and it is not the prompts. It is the directing authority above them, owned and maintained by the human operator, that makes the whole system work toward the outcome the human has defined, rather than toward the platform’s default, the model’s assumption, or yesterday’s instruction that has since been quietly discarded.

The market spent the first half of 2026 naming the problem. The methodology that addresses it was already documented before they found the words. The project record is direct on what remained true throughout:

“The machine still forgets its own rules. It still shortcuts when you’re not looking. It still needs a human standing over it saying: follow the procedure.”

That was written in late 2025. The room at DevCon in June 2026 said the same thing from a stage, to 650 engineers who recognised it. The directing layer above the agents is not a product announcement or a conference theme. It is a methodology built under operational pressure, tested across live projects with real stakes, and replicated four times before the market began looking for it. When everything around you is evolving at speed, discipline is the only thing that holds, and discipline is human.

References

[1] AI Native DevCon London, 1-2 June 2026, The Brewery. Official conference write-up: https://tessl.io/blog/ai-native-devcon-day-1-making-ai-agents-ready-for-enterprise/

[2] Katrina Gosek, Oracle. “2026 is the year we move from ‘human-in-the-loop’ to ‘humans-above-the-loop’.” Diginomica, 7 January 2026. https://diginomica.com/2026/01/07/2026-year-move-human-in-loop-to-humans-above-loop

[3] Deloitte. “From Operators to Orchestrators: Deloitte’s 2026 Global Technology Leadership Study.” 30 April 2026. https://www.deloitte.com/us/en/about/press-room/2026-global-technology-leadership-study-release.html

[4] Auditoria.AI. “Auditoria.AI introduces Governed Autonomy for Enterprise Office of the CFO at 2026 Gartner CFO Symposium.” GlobeNewswire, 26 May 2026. https://www.globenewswire.com/news-release/2026/05/26/3300995/0/en/Auditoria-AI-introduces-Governed-Autonomy-for-Enterprise-Office-of-the-CFO-at-2026-Gartner-CFO-Syncposium.html

[5] Walter A. Shewhart. Economic Control of Quality of Manufactured Product. Van Nostrand, 1931.

[6] H. James Harrington. Business Process Improvement: The Breakthrough Strategy for Total Quality, Productivity, and Competitiveness. McGraw-Hill, 1991.

[7] John McCarthy, Marvin Minsky, Nathaniel Rochester, Claude Shannon. “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence.” 31 August 1955. http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html

[8] Tobi Lütke (Shopify CEO). “Context engineering” coined June 2025. Reference: Simon Willison, 27 June 2025. https://simonwillison.net/2025/Jun/27/context-engineering/

[9] Anthropic. “Agent Skills.” Release notes, 16 October 2025. https://platform.claude.com/docs/en/release-notes/overview

[10] Anthropic. “Claude Memory.” Released to all users 2 March 2026. Referenced: Adam Holter, “Every New Claude Launch Since January 2026.” https://adam.holter.com/every-new-claude-launch-since-january-2026-full-timeline/

[11] Samantha Maeer. “The Feed Loop: How AI-Generated Governance Documents Amplify the Patterns They Were Designed to Suppress.” Preprint. Zenodo, 28 May 2026. doi:10.5281/zenodo.20474271. https://zenodo.org/records/20474271. Not peer reviewed.

[12] Samantha Maeer. “Did you know that on Windows, Claude Desktop has three tabs and each one stores your work differently?” Substack, 2 June 2026. https://open.substack.com/pub/samanthamaeer/p/did-you-know-that-on-windows-claude
