We rebuilt our whole site, brand and all, across about a week in June. It came to roughly four days of actual work, with rest days in between, not a solid week straight. And even on those days, much of the work was the tools running while we watched and corrected, not us at the keyboard. That experience, the watching and correcting, is what this post is about.
There were two parts to this build. First we built the brand and the guidelines for the site. Then we created the design and built the site in Drupal, using our own pipeline of dev-guides and agentic recipes. The site is adrupalcouple.us. We build these tools at Palcera, where we have written more about how we use AI on client projects, and we turned them on our own site, so this is not a neutral review. It is an account of what the tools did well, where they were confidently wrong, and where a human had to step in. Plenty of people write about building with AI and skip that part. That part is where we spent most of our time.
We bring two views to it, and from the first conversation to the last we spent a lot of time telling the AI what we wanted and how to say it. What we kept running into is that these tools will hand you something that looks finished. Whether it is actually good, or just good-looking, is a different question, and on this build the two came apart more than once. Holding them together is the part a human still has to do.
It started with what we wanted to say
The build did not start in code. It started with what we wanted to express as a couple. We are part of the Drupal and tech community, we like to mentor, and we are not short on opinions. The two of us come at the work from different backgrounds, design and computer engineering. From there came the brand, then the design system, then the voice, then a reference template, and only then a full Drupal site.
The methodology mattered more than the model. We settled who we are and what we stand for before anything went near a layout, in that order, because those calls are expensive to get wrong later. Everything downstream had to answer to that alignment. That is where the brand identity actually starts, and it is why most of what looks like a design decision began as a brand one.
Then came the design system, and the first sign that you cannot just let the tools run loose. We chose the palette early, deep teal and dusty plum on a warm grey ground, with one hard rule for ourselves: never red, yellow, or brown. The tool honored that veto on the brand colors, then let a yellow slip into the neutral ground, the one place it had decided the rule did not apply. The grey came out beige, the exact family we had banned. We saw it. Carlos traced why, re-derived the ground from the plum instead, checked the contrast by hand, and logged the gap against our own tool.
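Checking contrast by hand is mostly arithmetic, and the veto is easy to forget on neutrals. A minimal Python sketch of both, under stated assumptions: the contrast ratio follows the WCAG 2.x relative-luminance formula, while `PLUM`, the banned hue band, and the chroma threshold are hypothetical values for illustration, not our real tokens and not the exact check we ran.

```python
import colorsys


def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Parse '#rrggbb' into 0-255 integer channels."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB colour (0 = black, 1 = white)."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(ch) for ch in hex_to_rgb(hex_color))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def tint(hex_color: str, amount: float) -> str:
    """Mix a colour toward white by `amount` (0..1); hue is kept, chroma shrinks."""
    r, g, b = (round(ch + amount * (255 - ch)) for ch in hex_to_rgb(hex_color))
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical thresholds for "never red, yellow, or brown", applied to EVERY colour.
BANNED_HUE_MAX = 70.0   # degrees: oranges, browns, and yellows sit below this
BANNED_HUE_MIN = 345.0  # degrees: reds wrap around past 345
MIN_CHROMA = 0.02       # below this a colour is treated as a true neutral


def violates_veto(hex_color: str) -> bool:
    """True if the colour, neutrals included, carries a banned hue."""
    r, g, b = (ch / 255 for ch in hex_to_rgb(hex_color))
    if max(r, g, b) - min(r, g, b) < MIN_CHROMA:
        return False  # no meaningful hue to judge
    hue = colorsys.rgb_to_hsv(r, g, b)[0] * 360
    return hue <= BANNED_HUE_MAX or hue >= BANNED_HUE_MIN


PLUM = "#6d4c6a"           # hypothetical plum, not our real token
ground = tint(PLUM, 0.92)  # a light neutral re-derived from the plum
print(violates_veto("#f5f5dc"))             # True: CSS "beige" sits at a 60-degree hue
print(violates_veto(ground))                # False
print(contrast_ratio(PLUM, ground) >= 4.5)  # True: clears WCAG AA for body text
```

The part worth copying is that the veto runs on every colour, neutrals included, which is exactly where our tool decided the rule did not apply.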
The typography we use is a values choice, not a styling one. Fraunces for display, Atkinson Hyperlegible for the body, IBM Plex Mono for code. Atkinson Hyperlegible exists to be readable for low-vision readers, and accessibility is a real commitment for us, so the body typeface had to earn it. That part came out right on the first pass, once the AI had the right inputs.
What AI is genuinely good at, and what it is not
Here is the part we want to be precise about, because it is easy to misread. A lot of branding and design, and honestly a lot of application architecture too, is theory. It is strategy, established technique, conventions that someone has already written down. AI is genuinely good at that part. It can read it and apply it, and you can trust it to. That part is maybe seventy or eighty percent of the job, and on our build the AI did it well. What came out was close to what a good designer or brand person would produce. Close, not better, and only that close because the two of us were driving it. Hand the same tools to someone without the design and brand judgment and you do not get a near miss, you get something that looks right and is not.
The missing part, the other twenty or thirty percent, is the human factor. It is the creativity and the experience that are not written down in any guide, and it is the part that decides whether the work is actually good. The AI cannot reach it. That is why the budget you used to spend on a designer or a brand expert does not disappear. You still need that person for the part that matters most, the cherry on top.
The real disagreement
A few weeks ago Carlos watched this come up in a conversation. Someone had given an AI the Figma connector and the skills to turn a design into a Drupal theme, and the result, in their words, was poor. He offered some advice: which Drupal theme to start from, and a few rules to give the agent. Another developer pushed back with a simpler take: just ask for visual parity and it will get there. We both disagree with that, at least if you are after really good results.
The agent will reach visual parity, and that part genuinely works. Our footer matched the reference and passed every visual check we ran on the first try. It was also built completely wrong. The converter had generated an entire custom module, with a block plugin in PHP, just to hardcode the footer menus and the tagline copy. In Drupal, placing a menu is about as native as the platform gets; core already provides a block for every menu. There was no reason for custom code at all, and because the editor copy sat in a $this->t() call in PHP, no one could change it without a developer. The pixels were right. The build underneath would have been a problem for every content editor who touched it later. A thing can pass every visual check and still be the wrong way to build it.
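One way to keep this from slipping past a parity check is a crude review lint. The Python sketch below is hypothetical, not part of our pipeline: it assumes the standard Drupal location for block plugins (`src/Plugin/Block/*.php` inside a module) and flags every literal passed to `$this->t()` there. It will over-flag, since short UI labels legitimately belong in `t()`, so treat each hit as a question for a human, not a verdict.

```python
import re
from pathlib import Path

# Heuristic only: matches $this->t('literal') or $this->t("literal").
# Escaped quotes inside the literal are not handled.
T_LITERAL = re.compile(r"""\$this->t\(\s*(['"])(.+?)\1""")


def find_hardcoded_copy(module_dir: Path) -> list[tuple[str, str]]:
    """List (file, string) pairs for literals passed to t() in a module's block plugins."""
    hits: list[tuple[str, str]] = []
    for php_file in sorted(module_dir.glob("src/Plugin/Block/*.php")):
        source = php_file.read_text(encoding="utf-8")
        for match in T_LITERAL.finditer(source):
            hits.append((php_file.name, match.group(2)))
    return hits
```

Pointed at a custom module directory, a hit like a tagline or a list of menu link titles is the signal to ask whether the copy belongs in a menu or an editable block instead of in code.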
We ran into a similar problem with Composer. Left alone, the AI took control of it and removed a module with Composer before uninstalling it from Drupal. That is the wrong order, because Drupal still treats the module as installed after its code is gone, and it is a mess to fix. Carlos caught it, but catching it is not the real answer. The real answer is to set the tools up so they reach for the correct, current way to do a thing instead of acting on what they already remember, which is sometimes out of date and carries no opinions of its own. That is what the dev-guides and agentic recipes Carlos has written are for. They are our opinions about how to build well, written down where the tools can reach them, and we have written before about why we lean on them. On this build it meant grounding the tools in current, checked guidance instead of the starter kit or the model's memory, down to driving the responsive images from a published recipe that verifies itself at the end rather than letting the AI guess. When the tools are built to find the right information first, there is far less for a human to catch. There is always something, but far less.
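The order itself fits in a few lines. A minimal Python sketch, assuming a Composer-managed site with Drush available and a contrib project whose Composer package is `drupal/<machine name>` (true for many modules, not all): `removal_plan` writes the safe order, and `check_plan` refuses an agent's plan that deletes the code first. Both function names and the `pathauto` example are hypothetical, and the check only recognizes the long `pm:uninstall` form of the command.

```python
from __future__ import annotations


def removal_plan(module: str) -> list[str]:
    """Safe order: let Drupal uninstall the module, then delete its code."""
    return [
        f"drush pm:uninstall {module} -y",   # Drupal runs the module's uninstall steps
        f"composer remove drupal/{module}",  # only then remove the package from the codebase
    ]


def _first_index(commands: list[str], prefix: str, token: str) -> int | None:
    """Index of the first command that starts with `prefix` and names `token`."""
    for i, command in enumerate(commands):
        if command.startswith(prefix) and token in command.split():
            return i
    return None


def check_plan(commands: list[str], module: str) -> None:
    """Raise if a plan removes the package before (or without) uninstalling it."""
    uninstall = _first_index(commands, "drush pm:uninstall", module)
    remove = _first_index(commands, "composer remove", f"drupal/{module}")
    if remove is not None and (uninstall is None or remove < uninstall):
        raise ValueError(
            f"plan removes drupal/{module} with Composer before Drupal has uninstalled it"
        )


check_plan(removal_plan("pathauto"), "pathauto")  # passes silently
```

The point is not the script; it is that the correct order lives somewhere the agent has to pass through, instead of in whatever it happens to remember.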
What human-in-the-loop looked like
Human-in-the-loop is not a phrase we came up with. The Drupal world has used it for a while, and plenty of people in and outside our community use it for working with AI. It is exactly what we did here, and the more these tools can do, the more it matters, not less. In practice it meant we owned the judgment, designed the checks, and caught the confident mistakes while the machine did the volume. Three moments show the shape of it.
Even with a clear pipeline, the AI invented things that were not real, and it did it well enough to be dangerous. It manufactured a signature "failure report" component with a tidy five-stage structure. It built a newsletter sign-up claiming twelve thousand subscribers, for a newsletter that has never existed. It tried to stamp a two-author byline on every article and to put a badge reading "the technical provocateur" into the public interface. None of that was requested. The test that killed each one was simple: is this a real recurring thing with a real authoring path in Drupal, or is it a voice stance someone turned into a widget? We had to apply that test about six times, hardest on the byline, because the tool optimizes for plausible distinctiveness and the truth is less tidy. Our byline is single-author by default. Only a handful of pieces are genuinely co-authored, this one among them. The couple is who we are at the level of the site, not a stamp on every post.
The next catch came from a machine, not from us. One of our token builds passed the automated gates, passed code review, and passed a correctness critic whose job was to confirm the code did what it claimed. Three checks, all green. Then a second critic, an adversarial one, looked at the same work with a single question. Does this actually meet the acceptance criteria? It followed the attachment chain through the theme files and found a leftover stock theme still shipping on every page. It halted the merge. We amended the work order and rebuilt it clean. The lesson is not "use a smarter reviewer." It is that we had designed disagreement into the loop on purpose, and then Carlos made the call to stop and fix. The dissent was structured, and a human owned the verdict.
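The shape of that loop can be sketched as a gate with no majority vote. This Python sketch is hypothetical, not our actual tooling: each reviewer answers one question, a single HALT outweighs any number of passes, and even an all-green run still waits for a person. The reviewer names mirror the ones in the story; the types and messages are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    HALT = "halt"


@dataclass(frozen=True)
class Review:
    reviewer: str
    question: str  # the single question this reviewer is asked
    verdict: Verdict
    finding: str = ""


def merge_decision(reviews: list[Review], human_signed_off: bool) -> str:
    """One dissent blocks the merge; a clean run still needs a human."""
    halts = [r for r in reviews if r.verdict is Verdict.HALT]
    if halts:
        # Surface the dissent to a person instead of auto-retrying or outvoting it.
        return "blocked: " + "; ".join(f"{r.reviewer}: {r.finding}" for r in halts)
    if not human_signed_off:
        return "waiting for human sign-off"
    return "merge"


reviews = [
    Review("automated gates", "Do the checks run clean?", Verdict.PASS),
    Review("code review", "Is the code sound?", Verdict.PASS),
    Review("correctness critic", "Does the code do what it claims?", Verdict.PASS),
    Review(
        "adversarial critic",
        "Does this actually meet the acceptance criteria?",
        Verdict.HALT,
        "stock theme still attached on every page",
    ),
]
print(merge_decision(reviews, human_signed_off=True))
# blocked: adversarial critic: stock theme still attached on every page
```

The design choice is the asymmetry: passes are cheap and can be wrong together, so one well-aimed dissent is enough to stop the line, and what happens next is a human decision.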
And the visual comparison agents, the ones that are supposed to check parity, overstated constantly. They reported a footer as dark that was light in both places. They reported three columns where there were two. They flagged one page as three or four times too long when it was actually shorter than the reference, 2458 pixels against 3139. So we stopped trusting them as anything but finders. Every load-bearing claim got re-checked by a person at crop level before it drove a single fix. Fast and confidently wrong is the failure mode we keep writing about in AI coding, and we caught our own tools doing it.
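Re-checking a claim before it drives a fix can be as simple as measuring and comparing. A minimal Python sketch, with a hypothetical `LengthClaim` type and an arbitrary 10% tolerance: with our real numbers, the page measured 2458 / 3139 ≈ 0.78 of the reference, about 22% shorter, so a "three times too long" claim fails immediately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthClaim:
    page: str
    claimed_ratio: float  # what the agent reported: page height / reference height


def measured_ratio(page_px: int, reference_px: int) -> float:
    return page_px / reference_px


def claim_holds(claim: LengthClaim, page_px: int, reference_px: int,
                tolerance: float = 0.10) -> bool:
    """Accept the claim only if the measurement lands within tolerance of it."""
    actual = measured_ratio(page_px, reference_px)
    return abs(actual - claim.claimed_ratio) <= tolerance * claim.claimed_ratio


claim = LengthClaim(page="the page in question", claimed_ratio=3.0)
print(round(measured_ratio(2458, 3139), 2))  # 0.78
print(claim_holds(claim, 2458, 3139))        # False
```

The agent's report becomes a lead to measure, and only a claim that survives the measurement gets to drive a change.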
The quieter parts
The rest of the build was quieter than the failures, mostly a matter of choosing the native Drupal way over custom code, the same rule the footer had broken. The part we will point to is accessibility, because it started badly. The reference template scored a D on its first automated audit, and we did not ship it like that. We fixed the low-contrast text, added support for the reader's own contrast and motion settings, and wrote an accessibility statement that claims no more than we have verified.
The article images are AI-generated, which we have no problem with. The photos of us are real, with no AI faces anywhere, because using AI to make an illustration is one thing and faking your own people is another.
Why this matters past our own site
The economics are the reason this matters past our own corner. A real brand and a well-built site used to take a budget only larger companies have: a design team, a brand expert, and the months that come with them. Doing the same work in about a week of supervised time lowers that bar. That is what we are working on at Palcera, bringing the same approach to companies that could never afford a branding team. The supervision is the part that makes it work, not an afterthought.
So that is how it went. What we made is good, and getting there took about a week of part-time attention instead of months of a team's. That is worth being glad about, and we are. But it is good because a person stayed in the loop, not in spite of one. AI can help with this, and it will help more as it gets better. It carries the standard layer and frees you to spend your judgment on the hard parts. It is not a designer and it is not a brand expert, and it cannot add the last piece, the one that actually matters. So keep a human in the loop, and put them where it counts.
Go and look at the new adrupalcouple.us. And if you are building with these tools too, we would rather hear how you set them up to get things right, and where you still have to step in, than whether the output looked good.