I pointed a fleet of Claude agents at every SEO and website audit on BlitzMetrics.com and told them to bring each one up to our Ship It Styled standard. This meta-article documents exactly how they did it — 83 audits read, restructured, restyled, SEO-tuned, and published live in a single working session. The deliverable is the live audits themselves.
Start with the assignment
The task was simple to state and large to execute: take every published audit on BlitzMetrics.com (dozens of them, written over two years in a plain-text 2024 style) and bring each one up to the current standard. That standard has three parts: the BlitzMetrics article guidelines (SEO title under 60 characters, meta description under 160, primary keyword in the opener, verb-first H2s, short paragraphs, active voice, no AI fluff); the June 12 Ship It Styled visual format; and an explicit dual track for our two audiences, the home-service business owner and the young adult learning to run audits.
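Three rules in that list can be checked mechanically before anything is published: title length, meta length, and keyword placement. The Python sketch below assumes the title, meta description, and opening paragraph are already plain strings. `check_seo_rules` is a hypothetical helper, not the agents' actual code, and the case-insensitive substring match is a simplifying choice.

```python
def check_seo_rules(title: str, meta: str, opener: str, keyword: str) -> dict[str, bool]:
    """Return pass/fail for the length and keyword rules (hypothetical helper)."""
    return {
        "title_under_60": len(title) < 60,    # SEO title under 60 characters
        "meta_under_160": len(meta) < 160,    # meta description under 160 characters
        # primary keyword appears in the opening paragraph, ignoring case
        "keyword_in_opener": keyword.lower() in opener.lower(),
    }


# Made-up sample strings for illustration
results = check_seo_rules(
    title="Plumbing Website Audit: 5 Fixes That Win Local Calls",
    meta="A quick website audit of a local plumber's site, with metrics, analysis, and actions.",
    opener="This website audit looks at how a local plumber shows up on Google.",
    keyword="website audit",
)
print(all(results.values()))  # True
```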
The source material was a URL list and a logged-in WordPress session. Everything else — reading, restructuring, styling, SEO, and publishing — was the agents’ job.
Build the standard, then fan out
The first move was not to touch a single audit. It was to write the standard down. The orchestrator read the article guidelines and the meta-article prompt, then produced a one-file spec — the rubric, the exact inline-styled HTML components, the brand palette, the dual-track rule, and a 20-point QA scorecard — that every worker agent would follow line for line.
Next came the corpus. The public master list links to only a fraction of the audits, so the real inventory came from the WordPress REST API: the Website Audit and Quick Audits categories hold 94 posts. The orchestrator built a deduplicated, prioritized work queue, tuned one gold-standard exemplar for sign-off, then fanned the rest out to worker agents in waves.
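The enumeration step maps onto the standard WordPress REST API. There, `GET /wp-json/wp/v2/posts` accepts a `categories` filter, pages through results with `page` and `per_page`, and reports the page count in the `X-WP-TotalPages` header. The sketch below assumes the two category IDs are already known. The site URL and IDs in the usage line are placeholders. Sorting newest first is an assumed priority rule, because the article does not say how the queue was ordered.

```python
import json
import urllib.parse
import urllib.request


def fetch_category(site: str, category_id: int) -> list[dict]:
    """Page through GET /wp-json/wp/v2/posts for one category (published posts only)."""
    posts: list[dict] = []
    page, total_pages = 1, 1
    while page <= total_pages:
        query = urllib.parse.urlencode({
            "categories": category_id,
            "per_page": 100,  # the REST API's maximum page size
            "page": page,
            "_fields": "id,date,link",  # trim the response to what the queue needs
        })
        with urllib.request.urlopen(f"{site}/wp-json/wp/v2/posts?{query}") as resp:
            total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
            posts.extend(json.load(resp))
        page += 1
    return posts


def build_queue(batches: list[list[dict]]) -> list[dict]:
    """Merge per-category results, drop duplicate post IDs, order newest first."""
    unique: dict[int, dict] = {}
    for batch in batches:
        for post in batch:
            unique.setdefault(post["id"], post)  # a post in both categories is kept once
    # Assumed priority rule: most recent publish date first (ISO dates sort as strings)
    return sorted(unique.values(), key=lambda p: p["date"], reverse=True)


# Usage (placeholder site and category IDs):
# queue = build_queue([fetch_category("https://example.com", cid) for cid in (11, 12)])
```

Querying each category separately is what makes the dedup step necessary. A post filed under both Website Audit and Quick Audits comes back twice.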
Each worker agent owned a handful of posts end to end: read the real content, restructure to Metrics → Analysis → Action, write the single-line styled HTML, set the SEO metadata through Rank Math’s endpoint, and publish live — preserving each post’s original video and publish date. Then it QA’d its own work and wrote a record file.
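The simplest way to keep the HTML on a single line is to generate it from strings that never contain a newline. The sketch below shows a stat-card component and assumes a flexbox layout and placeholder colors. The actual components and brand palette live in the orchestrator's spec file, which is not reproduced here. The sample values are figures from this run.

```python
from html import escape

# Placeholder colors: the real palette is defined in the spec file, not here.
BRAND_COLOR = "#1a4d8f"
CARD_BACKGROUND = "#f4f7fb"


def stat_card(value: str, label: str) -> str:
    """One inline-styled stat card, returned as a single line of HTML."""
    return (
        f'<div style="flex:1;background:{CARD_BACKGROUND};border-radius:8px;padding:16px;text-align:center;">'
        f'<div style="font-size:32px;font-weight:700;color:{BRAND_COLOR};">{escape(value)}</div>'
        f'<div style="font-size:14px;color:#333;">{escape(label)}</div>'
        "</div>"
    )


def stat_row(cards: list[tuple[str, str]]) -> str:
    """Place cards side by side with flexbox; the output still has no newlines."""
    inner = "".join(stat_card(value, label) for value, label in cards)
    return f'<div style="display:flex;gap:12px;">{inner}</div>'


row = stat_row([("94", "posts enumerated"), ("83", "audits published"), ("2", "max concurrent agents")])
print("\n" in row)  # False
```

Escaping each value and label keeps a stray `<` or `&` in an audit's figures from breaking the markup.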
Make the calls a checklist would miss
The judgment calls are where the system earned its keep. Five worth naming:
1. The exemplar did not match the live post. The gold-standard exemplar was built from an audit’s source data, but the live post under that URL was a different, longer article. Rather than overwrite real content with the exemplar’s framing, the agents tuned each post’s actual published content. No invented findings, ever.
2. Four agents at once broke the API. The first wide fan-out ran four worker agents in parallel, and the connection collapsed. The orchestrator diagnosed the overload, dropped to two concurrent agents, and never hit it again. Reliability beat speed.
3. Three posts had nothing to tune. Three “audits” turned out to be boilerplate with zero findings or numbers. The agents refused to publish them rather than fabricate stat cards, and flagged them for a human to supply the source. The no-fabrication rule held under pressure.
4. A name conflict got surfaced, not papered over. Two posts about “Nicole Cowley” disagreed on whether she is a chiropractor or an agency owner. The agent tuned each to its own text and flagged the conflict for a human to settle instead of guessing.
5. Sensitive content waited for a human yes. Three posts name parties in active disputes. The orchestrator held them out of the automatic run and only restyled them — claims and hedging preserved verbatim, nothing amplified — after explicit approval.
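The no-fabrication rule behind calls 1 and 3 can be enforced as a gate that runs before any styling. The sketch below assumes that "has findings" can be approximated as "contains at least one number." That heuristic is crude, since a phone number would pass it. `triage_post` and its return values are hypothetical names.

```python
import re

PUBLISH = "publish"
FLAG_FOR_HUMAN = "flag_for_human"

# Any number, optionally with commas, decimals, or a percent sign
FIGURE = re.compile(r"\d[\d,.]*%?")


def triage_post(source_text: str) -> str:
    """Decide whether a post has real figures to build stat cards from."""
    if FIGURE.search(source_text) is None:
        # No findings to cite: flag for a human instead of inventing numbers.
        return FLAG_FOR_HUMAN
    return PUBLISH
```

A real gate would check for the audit's actual metrics. The point of the sketch is the shape of the rule: the default outcome for thin content is a flag, never a filled-in card.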
Proof ledger: Verified — the “83 live and on-standard” figure was checked directly against the live site (every post re-fetched and tested for the styling, deliverable block, and stat cards). Self-reported — the token counts below come from each worker agent’s own usage report; the dollar figures are estimates at public model pricing, not metered invoices.
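The verified half of the ledger comes down to re-fetching each live URL and testing its HTML for the required components. The sketch below assumes each component carries an identifying marker. The `data-block` attributes are hypothetical, because the article does not say how the checks recognized stat cards or the deliverable block.

```python
import urllib.request

# Hypothetical markers; the real spec defines how each component is identified.
CHECKS = {
    "inline_styling": lambda html: 'style="' in html,
    "deliverable_block": lambda html: 'data-block="deliverable"' in html,
    "three_stat_cards": lambda html: html.count('data-block="stat-card"') >= 3,
}


def verify_html(html: str) -> dict[str, bool]:
    """Run every check against one page's HTML."""
    return {name: check(html) for name, check in CHECKS.items()}


def verify_live(url: str) -> dict[str, bool]:
    """Re-fetch the published page so the checks see what is actually live."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return verify_html(html)
```

Checking the live page rather than the draft an agent submitted is what separates "verified" from "self-reported."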
Read the token receipt
Sixteen worker-agent runs did the tuning and publishing (a four-agent wave failed early and was re-dispatched at lower concurrency). Token counts are each agent’s self-reported usage.
| Worker runs | Posts | Tokens |
|---|---|---|
| Pilot + validation | 8 | 230,664 |
| Waves 1–2 (home services) | 21 | 522,874 |
| Waves 3–4 (professional) | 21 | 577,022 |
| Waves 5–6 (pillar + recaps) | 19 | 752,389 |
| Wave 7 + litigation | 6 | 209,237 |
| Total (16 runs) | ~75 | ~2,292,000 |
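The totals row is plain addition over the five wave rows, so it is easy to recheck. The figures below are copied from the table. The per-post average is derived here and was not part of the agents' reports.

```python
# (posts, tokens) per row of the table above
rows = {
    "Pilot + validation": (8, 230_664),
    "Waves 1-2 (home services)": (21, 522_874),
    "Waves 3-4 (professional)": (21, 577_022),
    "Waves 5-6 (pillar + recaps)": (19, 752_389),
    "Wave 7 + litigation": (6, 209_237),
}

total_posts = sum(posts for posts, _ in rows.values())
total_tokens = sum(tokens for _, tokens in rows.values())
print(total_posts, total_tokens)          # 75 2292186
print(round(total_tokens / total_posts))  # 30562 tokens per post, on average
```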
The orchestrator published the remaining eight or so posts directly. If every step ran at Opus list rates (the worst case), roughly 2.3M worker tokens plus orchestration would cost about $25–$45 at public model pricing. Caching and smaller models on the lighter steps bring that down.
Compare agent versus human
| Phase | Agent | Human | Human cost @ $60/hr |
|---|---|---|---|
| Write the standard + build the inventory | ~30 min | 8–12 hrs | $480–$720 |
| Read, tune, and publish 83 audits | one session | ~166 hrs | ~$9,960 |
| QA sweep + consistency fixes | ~15 min | 6–10 hrs | $360–$600 |
| Total | One session · ~$25–$45 in tokens | ~190 hrs (~5 weeks) | ~$11,000 |
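The human column is hours multiplied by the $60/hr rate. Recomputing it shows where the rounded totals come from. The 166-hour row works out to two hours per audit, and the phases sum to 180–188 hours, or $10,800–$11,280, which the table rounds to ~190 hrs and ~$11,000. The five-week figure assumes 40-hour weeks. That is an assumption, since the table does not state it.

```python
RATE = 60  # dollars per hour, per the table header

# (low, high) human hours per phase, from the table
phases = {
    "Write the standard + build the inventory": (8, 12),
    "Read, tune, and publish 83 audits": (166, 166),
    "QA sweep + consistency fixes": (6, 10),
}

low = sum(lo for lo, _ in phases.values())
high = sum(hi for _, hi in phases.values())
print(low, high)                # 180 188
print(low * RATE, high * RATE)  # 10800 11280
print(166 / 83)                 # 2.0 hours per audit
print(high / 40)                # 4.7 weeks, assuming 40-hour weeks
```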
Be honest about what needed a human
The agents handled research, restructuring, copywriting, styling, SEO metadata, internal linking, publishing, and QA on their own. They needed a human for the things agents should need a human for: supplying the original audit for the three empty posts, settling the Nicole Cowley name conflict, approving the sensitive litigation posts, and selecting featured images from real photos. That division of labor is the point — the machine does the volume, the human makes the calls that carry judgment or risk.
Want to reproduce this on your own content? Write the standard down as one file first, prove it on a single exemplar, then fan out workers that each own a few items end to end and QA their own output. Cap concurrency low enough that your tooling stays stable — and make “do not fabricate, flag instead” a hard rule, not a preference.
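The fan-out pattern above has two parts: a chunking step that gives each worker a few items, and a pool capped at two concurrent workers. The Python sketch below assumes a batch size of five; this run averaged 75 posts over 16 runs, just under five per run. Threads stand in for agents here. `assign_batches` and `run_worker` are hypothetical names, and the sleep replaces the real read, restyle, SEO, publish, and self-QA work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # the cap the orchestrator settled on after four at once failed

_lock = threading.Lock()
_active = 0      # workers running right now
peak_active = 0  # highest concurrency observed


def assign_batches(queue: list[int], batch_size: int = 5) -> list[list[int]]:
    """Split the work queue so each worker owns a few posts end to end."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [queue[i:i + batch_size] for i in range(0, len(queue), batch_size)]


def run_worker(batch: list[int]) -> list[int]:
    """Hypothetical worker: processes its batch and returns the post IDs it finished."""
    global _active, peak_active
    with _lock:
        _active += 1
        peak_active = max(peak_active, _active)
    try:
        time.sleep(0.01)  # stand-in for read, restyle, set SEO, publish, self-QA
        return batch
    finally:
        with _lock:
            _active -= 1


batches = assign_batches(list(range(1, 13)))  # 12 placeholder post IDs
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    finished = [post for batch in pool.map(run_worker, batches) for post in batch]

print(len(batches), finished == list(range(1, 13)))  # 3 True
```

Capping the pool rather than the batch size keeps the two knobs separate. You can give each worker more items without raising the load on the API.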
Count what got ingested
Across the run, the agents enumerated 94 posts, read roughly 83 in full plus several BlitzMetrics standards documents (the article submission guidelines, the meta-article prompt, the digital audit master list), and queried the WordPress REST API hundreds of times to enumerate, read, publish, and re-verify. Every published stat card and table figure was pulled verbatim from the audit it describes.
Score it against the guidelines
| Guideline | Status |
|---|---|
| Hook opens with a specific situation | PASS |
| Title under 60 chars; meta under 160; keyword in first paragraph | PASS |
| Verb-first H2s, short paragraphs, active voice, no AI fluff | PASS |
| Ship It Styled: lede, 3 stat cards, branded tables, callouts, deliverable | PASS |
| Token receipt + agent-vs-human cost tables (both required) | PASS |
| 2–3 internal links incl. a money/pillar page | PASS |
| Embedded real visual from the work | PARTIAL — links to live deliverables; featured image needs human |
| Featured image from a real photo | NEEDS HUMAN |
| Final publish approval | NEEDS HUMAN |
The work itself is the proof. Browse the live audits — every one restyled, dual-tracked, and SEO-tuned.
This meta-article was produced by the same Content Factory pipeline it documents: an agent did the work, then wrote up exactly how — the proof, not a pitch. For the method behind every audit, see the Quick Audit and the Metrics, Analysis, Action framework.
