Why I Believe in Agentic Recipes

I built my AI tooling first to help me work on Drupal projects and sites, and on the Claude Code plugins I develop. My wife uses the same tooling in her job now, which is the part that turned a personal hack into something I had to actually maintain.

I built it because two things were unreliable. The AI was not always following instructions in CLAUDE.md, and I was cutting steps. Anthropic's own docs are honest about this. They call CLAUDE.md instructions advisory, and hooks deterministic. That is the precise version of what I was feeling. Reading a rule and following a rule are not the same thing, for an AI or for a tired human. So I built tooling around the places where the gap mattered most, one piece at a time, because each piece came from something that had just broken.

Scope, eventually

Scope is the first step in my workflow now, but it was not the first step I added. It became the first step after I watched the AI solve a different goal than the one I had asked for, over-engineer changes that did not need engineering, and reach for the wrong abstraction often enough that I needed a step that just stopped the work and named what we were actually doing.

Scope is not necessarily specs. Specs are how. Scope is what we are even doing here, and what we are not doing, and what counts as done. Now it runs before anything else, and short alignment checks ride along through the later stages so the agent does not drift back into its own version of the goal.

I was surprised by how often the first sentence I wrote into scope made me realize I had been about to build the wrong thing.

Research, and the guides

Research came next. I started with local frameworks for the Drupal work I do a lot of. ECA module workflows. AI module pipelines. Each framework grew the way these things grow, full of decisions I had made and forgotten and made again differently. Eventually the frameworks got too big to share with my wife. She would not have read them, and I could not have pointed her at one piece without dragging in twenty others.

So I broke them apart. One decision per file. One question, one answer. Published online so we could both reach them from any project, and so the agent could fetch only what the current task needed. That is the atomic guides. Around 1,600 of them now, across 88 topics, with routing metadata that lets the agent decide which one to read for which question.

That worked, until I noticed the AI was missing pieces. Not in the small guides. In the structured ones, the ones with sequences and decisions. When you ask the AI to read a guide via WebFetch, the content gets summarized on the way in. Some steps survive. Some get described in vaguer language. Some get dropped. The agent does not know which is which.

So I forced curl. Bytes in, bytes read. No summarization layer between the agent and the content. The skill that fetches my guides is built around this rule, and three different places in its prose remind the agent why curl is mandatory. The rule is enforced by the skill, not by a hook, but the skill is invoked on every fetch, so in practice it holds.

Design, implementation, and a lot of review

After scope and research came design and implementation. The patterns here are well-trodden. Phase the work. Set acceptance criteria before writing. Test first when the test is cheaper than the code it is checking. Ask the human at each step that is hard to reverse. Nothing original. The win is consistency. The AI does not do this on its own unless I make the workflow visible, so I made it visible.

Then I added review. Seven validators, each running against the diff and writing an audit. TDD checks tests came first. SOLID and DRY catch the structural drift. Security scans for the mistake classes the AI is good at making and humans are good at missing. The guides validator checks the work against the atomic guides cited in design. Visual regression and visual parity catch the rendered-output drift the other gates would miss. I added them because passing tests and good code are not the same thing, and I have shipped enough clean-looking diffs with hidden problems to know the difference.

Opinions, and the playbook

Then I realized I needed opinions in a form the framework could carry. Some opinions are rules. Use drush commands, not raw PHP, when you need to do something to Drupal from the command line. Use Composer in the right order when you remove a module. Mobile-first breakpoints with sizes-based responsive images for content body, breakpoint-based for art-directed hero. Rules. One rule, applied across many contexts.

I called this layer the playbook, at the project level. Local rules always win on conflict. Published sets you can subscribe to for the patterns you keep using across projects. That worked. The agent stopped reaching for raw PHP when drush was the right tool, and stopped picking breakpoint-based responsive images for body content where sizes-based was the cleaner answer.

The wall

Working on the biggest tool I am still building, the one that takes a brand definition and walks all the way to an implemented app, I hit a wall I had to name.

Use drush is a playbook instruction. It is a rule about how to do a class of things in this project when I am using Drupal. That shape is fine. The playbook holds it well.

But how to wire responsive images is not a rule. It is directing the orchestration of a specific goal in Drupal with AI. The first is a single instruction, applied widely. The second is a plan for a named capability, applied once, end to end, with project-specific choices baked into the middle of it. Both are opinions. They are not the same shape.

I had no artifact for the second one. The playbook covers pieces. There is even an entry called responsive-image-sizing-per-context, which is a real rule that applies in a real way. But the end-to-end orchestration of wire-responsive-images-for-this-project lives in chat threads, where I walk the agent through the project's image strategy by hand. 16:9 at full width for a hero. 16:9 in a card, same ratio but a different image style because the rendered size is different. 4:3 portrait somewhere else. Every project has its own strategy, and the agent has to translate it into image styles, breakpoints, and field display assignments per view mode per content type. I keep saying roughly the same sequence with slightly different wording, and the agent keeps doing roughly the same thing with slightly different quality.

The same gap shows up in other places. Drupal has a declarative SEO recipe that installs the modules and sets reasonable defaults, which works well on projects where the defaults match what you need. But on a project where the team wants separate fields for metatag, Open Graph, and Twitter cards per content type, the declarative recipe cannot make that decision for you. Something has to direct the orchestration for that project's specific shape. A Next.js front-end wired to a Drupal back-end is the same story. Atomic guides cover JSON:API, decoupled menus, content routing, and the end-to-end procedure with this project's choices has no artifact home. The wall is not specific to images. It is the orchestration shape itself.

Recipes

Hence recipes. Not a guide, because guides are atomic and a recipe carries orchestration over multiple atoms. Not a playbook entry, because that is a single rule, and an orchestration is by definition more than one decision. Not a skill, because a skill is something the AI installs, sitting in the agent's front-matter for every session, whether this project needs responsive image wiring or not. A recipe is something the agent reads when this project needs this capability, and skips otherwise.

The mental model is straight. The first ten lines describe the feature, like a skill's frontmatter, so the agent can read just that and decide whether to keep going. Roughly:

---

recipe: responsive_image_wiring

describes: Wire Drupal responsive image styles, breakpoints, and field display modes from declared image use-cases per content type.

applies_when: project has image fields rendered across multiple breakpoints, art direction or resolution switching, or both.

requires: image, responsive_image, breakpoint

produces: image styles, responsive image styles, field display config per view mode per content type

---

Below that header, the body. The opinion (how we do it, what we refuse to do). The sequence of steps, each citing its atomic source so the agent fetches the decision rather than guessing it. The data flow from the declared inputs to the applied Drupal state. The repetition pattern (once, per content type, per view mode). The contract for what state the recipe reads before it acts. References, with the curl rule honored, so the agent reads bytes and not summaries.

Load on demand. Not installed. Updated when the modules and core the recipe sits on move, because I have been refining my atomic guides by hand long enough to know the same maintenance applies one layer up. I am wiring the update automation into my own setup right now, but the format is what matters. The format has to allow that automation for anyone who wants to build it, not depend on any one person owning it.

I do this in Drupal because Drupal is my community. The pattern is the same anywhere there is a stable capability layer above atomic APIs. Next.js. Rails. Design systems. Anywhere the orchestration of a named feature is repeatable but the choices inside it are project-specific.

I am sharing this to see if it resonates. I built each layer of my tooling because something was breaking when I did not have it, and recipes are the latest piece in that sequence. Orchestration plans for named capabilities. Loaded on demand. Not installed. I would like to know whether other people running AI on real projects are hitting the same wall, the one where you can feel the difference between a rule and an orchestration but you have no artifact to hold the second one. If you are hitting it, I would like to hear from you.

A Drupal Couple

Categorized: