SEO + AI SEO + LLM Readiness Audit
A consolidated SEO + AEO + LLM-readiness deep-dive for a DTC brand. Merges traditional SEO (technical, on-page, content, backlinks, Core Web Vitals, schema, keyword gaps, GSC data) with AEO / AI-SEO (LLM citation presence, Google AI Overview visibility, schema for AI extraction, conversational-query targeting) and LLM readiness (per-template metadata + schema matrix across ten page archetypes, llms.txt root checks, JSON-LD URL validation). One diagnostic. One report. Covers all three.
Purpose
Compress what a traditional SEO agency sells as a three-to-six-week retainer into a single Claude Code session that produces a consolidated SEO + AEO report. The audit is the deliverable for the SEO strand of Step 1 of the Operating System (Map the Growth Equation), the first-cut of Step 3 (Find the Binding Constraint) as applied to organic traffic, and the seed of Section 10 action items feeding {client_folder}/actions.md.
Traditional SEO audits and AEO audits are published as two separate products by most of the market. That split costs the founder time and leaves cross-cutting findings on the table. Keyword gaps in SEO often map to AEO citation gaps on the same query. Schema coverage is the same asset in both disciplines. One audit, one report, one action list.
When to run it
- Week 1 of a new engagement. The SEO + AEO audit deepens the SEO section of the broader GTM audit. Run it when organic traffic, content, or AI visibility is the suspected constraint.
- Standalone diagnostic. A self-contained run for a brand that wants the full SEO + AEO + LLM-readiness pass without an ongoing engagement.
- Strategic refresh every 6 months. Re-run on the brand to check whether the biggest organic lever has moved and whether AEO presence has shifted (LLM answers and AI Overviews re-roll the SERP every quarter).
- Before a content sprint. Re-run right before a 90-day content push so the brief lands on the right gap and the AEO-extraction targets are clear.
Who this is for (ICP)
Premium DTC brands, AOV $500+ or 2x category median, founder-led, US / western English-speaking market focus. Tuned for Shopify, WooCommerce, and Hydrogen catalog stores. The playbook works outside that ICP but the benchmarks and competitor-gap logic are sharpest inside it.
Service businesses and B2B SaaS brands can run the same playbook for the technical SEO, AEO, and on-page sections. The commerce-specific layers (product schema, Merchant Center, attribute-tuple gaps) will be marked N/A.
Two paths, five modes
The audit flexes on two dimensions. Path classifies whether the brand has historical organic data to analyse. Mode further refines coverage based on access granted.
Path A. Established brand
3+ months of trading history, GSC data, measurable organic traffic. Data-heavy. Every section populates fully.
Path B. New or pre-launch
Minimal or zero GSC history, no meaningful organic traffic yet. Focus shifts to foundations, keyword strategy, competitor positioning, and AEO readiness. SEO performance analysis converts into a readiness checklist.
The five modes
| Mode | Path | Access | Centre of gravity |
|---|---|---|---|
| A1 (Full) | Established | GSC + GA4 + Ahrefs or DataForSEO configured | Complete diagnostic across every section. Biggest-lever call grounded in historical traffic |
| A2 (Quick) | Established | Same as A1 but the operator is running a 30-minute pass before a sprint | Sections 1 through 3 depth, Sections 4 through 6 as a scorecard, Section 9 action list. Caps at 30 minutes end-to-end |
| A2-thin | Established | GSC only, no other external credentials | GSC-driven: indexation, queries, top-CTR opportunities. External competitor layer runs from public SERPs. Schema and technical checks run from the live site. AEO checks run in full |
| B1 (New) | New / pre-launch | Any external tool set | Foundations + keyword strategy + AEO readiness. Competitor positioning carries weight. Performance analysis replaced by pre-launch checklist |
| B2 (Comparison) | Either | Any external tool set | Three-competitor comparison. Benchmarks the brand against 3 named competitors across every section. Surfaces a ranked list of the biggest organic gaps |
Mode detection
At kick-off, detect the mode:
- Path classification. Check the `CLAUDE.md` stage field and the presence of GSC or GA4 credentials in `.env`. 3+ months of history defaults to Path A. Pre-launch or first-quarter trading defaults to Path B. Confirm with the user.
- Access classification. Inventory the External and Internal tools. If GSC + GA4 + Ahrefs or DataForSEO are all configured, default to A1. If only GSC, default to A2-thin. If neither GSC nor GA4 are configured and the brand is established, default to A1 with the External-only shape called out in Section 4 coverage.
- Mode override. Always ask the user. If they want a 30-minute scan before a sprint, override to A2 (Quick). If they want a focused comparison against named competitors, override to B2.
- Mode sticker. Record the mode in the report header.
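The detection rules above can be sketched in Python as follows. This is a minimal illustration, not part of `seo_toolkit.py`: the `Access` dataclass and `detect_mode` function are hypothetical names, and the case of GSC plus GA4 without a rank tool is not specified above, so defaulting it to A1 is an assumption. The function only proposes a default; the operator still confirms with the user.

```python
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"A1", "A2", "A2-thin", "B1", "B2"}


@dataclass
class Access:
    """Which credentials were found in .env (hypothetical structure)."""
    gsc: bool
    ga4: bool
    rank_tool: bool  # Ahrefs or DataForSEO configured


def detect_mode(months_trading: int, access: Access,
                override: Optional[str] = None) -> str:
    """Return the default mode; an explicit user override always wins."""
    if override in VALID_MODES:
        return override
    if months_trading < 3:
        return "B1"  # pre-launch or first-quarter trading defaults to Path B
    if access.gsc and not (access.ga4 or access.rank_tool):
        return "A2-thin"  # GSC only
    # Full access, or an established brand running the External-only shape.
    # Assumption: partial access beyond GSC-only also falls back to A1.
    return "A1"
```

For example, `detect_mode(12, Access(gsc=True, ga4=False, rank_tool=False))` proposes `A2-thin`, and passing `override="B2"` returns `B2` regardless of access.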
Tool-availability gate (strict, two layers)
Runs in two layers. External tools run on any brand. Internal tools run only when credentials live in .env. Honest gaps beat invented numbers every time. No fabricated data, no placeholders, no guessed numbers. Mark missing checks as "Requires access" and file them in the Access-grant checklist.
External layer (always runs)
| Tool | Used for | Variable |
|---|---|---|
| DataForSEO | Ranked keywords, SERP data, backlinks, content-gap set difference | DATAFORSEO_LOGIN / DATAFORSEO_PASSWORD |
| Ahrefs | DR, referring domains, backlink profile, historical rank changes | AHREFS_API_KEY (optional, DataForSEO covers if absent) |
| Serper | Live SERP snapshots, brand-mention scan, paid-SERP copy capture | SERPER_API_KEY |
| Keywords Everywhere | Search volume, trend, related queries | KEYWORDS_EVERYWHERE_API_KEY |
| Screaming Frog CLI | Full-site crawl, technical SEO, schema validation, broken links | Local install |
| PageSpeed Insights API | Core Web Vitals per template | GOOGLE_AI_KEY |
| WebFetch + WebSearch | Live-site on-page checks, competitor positioning, AEO citation testing | Built in |
| AEO citation checks (ChatGPT, Claude, Perplexity, Google AI Overview) | AEO presence on commercial queries | WebSearch + seo_toolkit.py aeo-serp |
Internal layer (runs when credentials are present)
| Tool | Used for | Variable |
|---|---|---|
| Google Search Console | Query-level data, indexation, coverage errors, CTR patterns | GSC_SERVICE_ACCOUNT_JSON + property access |
| Google Analytics 4 | Organic sessions, conversions, landing-page performance | GA4_SERVICE_ACCOUNT_JSON + property access |
| Shopify Admin API | Product schema quality, collection hygiene, redirect hygiene | SHOPIFY_ADMIN_TOKEN_{STORE} |
| Microsoft Clarity | Session recordings, rage clicks on SEO landing pages | Project token |
| Google Merchant Center | Feed health, product-schema cross-check | Content API access |
Gate behaviour
For every check the audit plans to run:
- If the check uses External tools only, run it.
- If the check uses Internal tools, verify the credential exists in `.env`. If present, run it. If absent, stub the check with "Requires access. {tool} not yet granted. Unlocks when the client adds {service account} as {role}: {one-line value}."
- Add every missing Internal section to the Access-grant checklist at the end of the report.
- Never block the audit on a missing Internal tool. Ship the External layer. Flag the Internal gaps.
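A minimal Python sketch of the gate, assuming credentials are exposed as environment variables named as in the Internal-layer table. The `gate` function, the `INTERNAL_TOOLS` mapping, and the returned dictionary shape are hypothetical; only two tools are mapped here for brevity.

```python
import os

# Variable names taken from the Internal-layer table; the mapping itself is illustrative.
INTERNAL_TOOLS = {
    "Google Search Console": "GSC_SERVICE_ACCOUNT_JSON",
    "Google Analytics 4": "GA4_SERVICE_ACCOUNT_JSON",
}


def gate(check_name, required_internal, env=None):
    """Decide whether a check runs or is stubbed as 'Requires access'.

    A check that needs no Internal tools always runs. A missing credential
    never raises: the check is stubbed and the gap is reported instead.
    """
    env = os.environ if env is None else env
    missing = [tool for tool in required_internal
               if not env.get(INTERNAL_TOOLS[tool])]
    if not missing:
        return {"check": check_name, "status": "run", "missing": []}
    notes = [f"Requires access. {tool} not yet granted." for tool in missing]
    return {"check": check_name, "status": "Requires access",
            "missing": missing, "notes": notes}


def access_grant_checklist(results):
    """Collect every missing Internal tool, deduplicated, in first-seen order."""
    seen = []
    for result in results:
        for tool in result["missing"]:
            if tool not in seen:
                seen.append(tool)
    return seen
```

Returning a stub rather than raising keeps the External layer shipping while the Internal gaps accumulate for the Access-grant checklist.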
Inputs the playbook accepts
Prompt the user for these at the start of the run. Accept sensible defaults where possible. Pull from CLAUDE.md when populated.
| Input | Required | Example |
|---|---|---|
| Client name | Yes | Incred Golf |
| Client project folder path | Yes | Claude-Projects/IncredGolf |
| Primary site URL | Yes | https://incred.golf |
| Competitor URLs | Yes | 3 to 5 URLs |
| Path | Yes | A or B (auto-detected, confirm) |
| Mode | Yes | A1, A2, A2-thin, B1, B2 |
| Target geography | Yes | US, UK, EU, global |
| Target audience (1 line) | Yes | Club golfers in the US aged 35 to 55 improving short game |
| Primary commercial keyword hypothesis | No | wedge distance trainer |
| Ecommerce platform | Yes | Shopify, WooCommerce, Hydrogen, custom |
If any required input is missing, ask the user. Never proceed with invented inputs.
Output: one consolidated report, two file formats
{client_folder}/docs/seo-audit/{YYYY-MM-DD}/seo-audit-report.md (canonical source)
{client_folder}/docs/seo-audit/{YYYY-MM-DD}/seo-audit-report.docx (Word deliverable)
The .md is the source of truth, version-controlled in Git. The .docx is the polished Word deliverable, generated from the markdown via pandoc at the end of the audit run.
Supplementary evidence lives alongside the report in:
{client_folder}/data/seo-audit/{YYYY-MM-DD}/
Length target: 3,500 to 7,000 words.
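As an illustration of the output layout, the sketch below builds the three paths and the pandoc command for the `.docx` conversion. The function names are hypothetical; `pandoc <input> -o <output>` is the standard pandoc invocation, and the sketch only constructs the command rather than running it.

```python
from datetime import date
from pathlib import Path


def audit_paths(client_folder: str, run_date: date) -> dict:
    """Return report and evidence locations for one dated audit run."""
    stamp = run_date.isoformat()  # YYYY-MM-DD
    report_dir = Path(client_folder) / "docs" / "seo-audit" / stamp
    return {
        "markdown": report_dir / "seo-audit-report.md",
        "docx": report_dir / "seo-audit-report.docx",
        "evidence": Path(client_folder) / "data" / "seo-audit" / stamp,
    }


def pandoc_command(paths: dict) -> list:
    """Argument list for converting the canonical markdown to Word."""
    return ["pandoc", str(paths["markdown"]), "-o", str(paths["docx"])]
```

The command list can be handed to `subprocess.run(..., check=True)` at the end of the run, once the synthesis pass has written the final markdown.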
Report structure (fixed order)
# SEO + AEO Audit: {Client Name} ({YYYY-MM-DD})
Mode: {A1 / A2 / A2-thin / B1 / B2}
Path: {A Established / B New}
Access level: {External only / External + Internal}
## Executive summary
3 to 4 sentences leading with the narrative the Cross-cutting themes reveal. Names the single biggest organic lever, the top two quick wins, and the biggest access gap.
## Cross-cutting themes
3 to 5 bullets. Each insight spans 2 or more sections. Section numbers called out in brackets. Produced by the Synthesis pass.
## Key findings
5 to 7 one-sentence bullets, one per section. Skim-readable in 60 seconds.
## Key action items (top 5 this week)
The top 5 items from Section 9 pulled forward with a one-line "why this first" tied to Cross-cutting themes.
## 1. Crawlability and indexation
## 2. Technical foundations
## 3. On-page SEO
## 4. Content quality and topical authority
## 5. Keyword footprint and content gaps
## 6. AEO and AI-SEO
## 7. Backlinks and domain authority
## 8. Local and ecommerce-specific (or Prioritised Action Items for non-commerce)
## 9. Prioritised Action Items
## Access-grant checklist
(Only appears if any Internal-layer section is stubbed.)
## Conclusion
2 to 3 short paragraphs. What the founder does next this week. Where to learn more about ongoing support if the founder wants this run for them. Book-a-call CTA link.
Synthesis pass (mandatory before writing the report to disk)
After drafting all eight area sections plus Section 9 in first-pass form, do not write to disk yet. Assemble the complete draft as a single document in your working context and read it end-to-end in one pass. Ask:
- What pattern emerges across three or more sections that no single section captures?
- What does a finding in Section X imply for the recommendations in Section Y? (A content gap in Section 5 that overlaps an AEO citation gap in Section 6. A schema-coverage weakness in Section 2 that caps AEO extraction in Section 6. An on-page issue in Section 3 that compounds a crawl-depth issue in Section 1.)
- Is there a strategic narrative connecting the data that the Executive Summary placeholder is missing?
- Does any Section 9 action address multiple section findings at once? Promote it.
Produce 3 to 5 cross-cutting insight bullets from this synthesis. Revise:
- The Executive Summary to lead with the narrative these insights reveal.
- Section 9 to re-rank actions that hit multiple leverage points.
- Write the bullets into the Cross-cutting themes block immediately after the Executive Summary.
Only after the synthesis pass completes do you write the final consolidated file to disk.
Section 1. Crawlability and indexation
Runs primarily on the External layer. GSC is the key Internal add-on when access exists.
- [E] Site-level crawl via Screaming Frog SEO Spider CLI (mandatory). EntireCommerce holds a paid licence; cost is zero at the margin. Standard headless invocation:

  ```bash
  "/Applications/Screaming Frog SEO Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher" \
    --crawl https://{brand-domain}/ \
    --headless \
    --output-folder {cwd}/data/seo-audit/{YYYY-MM-DD}/screaming-frog \
    --export-format csv \
    --overwrite \
    --export-tabs "Internal:All,Internal:HTML,Response Codes:All,Response Codes:Client Error (4xx),Response Codes:Redirection (3xx),Page Titles:All,Page Titles:Missing,Page Titles:Duplicate,Meta Description:All,Meta Description:Missing,H1:All,H1:Missing,H2:All,Images:All,Images:Missing Alt Text,Canonicals:All,Directives:All,Structured Data:All" \
    --bulk-export "Links:All Inlinks,Links:All Outlinks,Web:All Page Source" \
    --save-report "Crawl Overview,Issues Overview,Orphan Pages,Redirects:All Redirects,Structured Data:Validation Errors & Warnings Summary"
  ```

  Default rendering (HTML, no JS) is sufficient for SSR-hydrated stacks. For SPA frontends (React/Vue/Next.js client-rendered), enable JavaScript rendering via a config file. Save a `screaming-frog-findings.md` summary alongside the raw CSVs covering: full page inventory, H1 issues (missing / duplicate / multiple), structured-data status at scale, image hygiene (alt text plus width/height attributes), broken internal links, redirect chains, and any active paid-offer / checkout endpoint set surfaced as auth-gated 4xx responses. Common discoveries that earn this step its keep: pages the founder forgot existed, missing H1s on indexable pages, zero structured data confirmed at scale, image-dimension attributes missing site-wide (a frequent CLS root cause), and active paid offers visible only as gated checkout URLs.
- [E] Optional one-competitor peer-tier benchmark. Worth doing when a same-band comparison on H1 / schema / image-hygiene / page-count would sharpen Section 5 cross-competitor analysis. Pick the closest peer-by-style match. Skip when the scale gap is too wide for the comparison to be actionable. Output to `data/seo-audit/{YYYY-MM-DD}/screaming-frog-competitors/{competitor-slug}/`.
- [E] DataForSEO OnPage API as an automated alternative when Screaming Frog is not installed locally (`seo_toolkit.py site-audit --max-pages 100`). Report crawled-URL count, crawl depth, broken links, redirect chains, orphan pages.
- [E] `robots.txt` check: presence, syntax, blocked paths, sitemap reference, AI-bot allow-list (must allow GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, anthropic-ai, Google-Extended, Bingbot).
- [E] Agent-readable site summaries: HEAD-check `https://{domain}/llms.txt` AND `https://{domain}/llms-full.txt`. Both must return 200. If either 404s, file as a fix in Section 9 (Action Items). Spec: https://llmstxt.org. `llms.txt` = brand summary + links to canonical pages. `llms-full.txt` = full markdown of FAQ + technology + about content. Single static file each. ~1 hour to generate when the content already exists.
- [E] XML sitemap check: presence, index sitemap format, URL count, last-modified dates, sitemap referenced in `robots.txt` and submitted to GSC.
- [E] `noindex` / `nofollow` tag review: verify no commercial pages are accidentally blocked from indexing.
- [E] Canonical tag audit: self-canonicals on unique pages, cross-canonicals on duplicates, no canonical chains.
- [I] GSC Index Coverage report: Valid count, Excluded breakdown (Discovered but not indexed, Crawled but not indexed, Duplicate without user-selected canonical, Soft 404, 404, redirect errors).
- [I] GSC sitemap submission status and last-processed date.
- [E] Shopify redirect chain hygiene (Shopify only) via `shopify-admin-url-redirect-audit` skill.
- [E] Shopify collection hygiene (Shopify only): orphan products in zero collections via `shopify-admin-collection-membership-audit` skill.
Output: a coverage scorecard (indexed, blocked, excluded, unknown), a ranked list of indexation blockers, and a short list of orphan or near-orphan pages worth reconnecting to the internal-link graph.
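The AI-bot allow-list check from this section can be sketched with the standard library's `urllib.robotparser`. This is an illustration under assumptions: the function name `blocked_ai_bots` is hypothetical, the `robots.txt` body is passed in as text (fetching it is left to the caller), and only access to the site root is tested.

```python
from urllib.robotparser import RobotFileParser

# Allow-list taken from the robots.txt check in this section.
AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot",
           "anthropic-ai", "Google-Extended", "Bingbot"]


def blocked_ai_bots(robots_txt: str,
                    site_root: str = "https://example.com/") -> list:
    """Return the bots from AI_BOTS that may not fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, site_root)]
```

Any bot returned by this function becomes a Section 9 action item. A fuller check would also test a sample of commercial URLs, since a site can allow `/` while disallowing `/products/`.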
Section 2. Technical foundations
- [E] Core Web Vitals per template via PageSpeed Insights API. Pull mobile and desktop. Templates: homepage, top category, top product, top blog, checkout. Thresholds: LCP < 2.5s, INP < 200ms, CLS < 0.1, mobile score > 70, desktop score > 80. Flag every template failing any threshold.
- [E] Mobile UX audit: viewport meta, tap-target sizing, legible font sizes, no horizontal scroll. Use PageSpeed Insights `mobile-friendliness` signals.
- [E] Schema markup validation: Organization, Product, Breadcrumb, FAQ, Review, HowTo, Article. Run each priority URL through Google Rich Results Test (via WebFetch) or Schema.org validator. Record which schemas are present, which are missing, which have warnings.
- [E] HTTPS and certificate check: confirm every URL redirects HTTP to HTTPS, certificate is valid for the full domain chain, no mixed-content warnings.
- [E] Internal linking graph: anchor-text distribution, orphan pages, deep-link pages more than 3 clicks from home, hub-and-spoke cluster shape. Flag commercial pages receiving fewer than 5 internal links.
- [E] URL structure: lowercase, hyphenated, no session IDs, no trailing `.html`, consistent trailing-slash behaviour.
- [E] International SEO (if the brand targets multiple markets): `hreflang` tag presence, bidirectional matching, default `x-default` for the global version.
Output: a Core Web Vitals template table, a schema coverage matrix, and a ranked list of template-level fixes.
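The Core Web Vitals thresholds in this section translate into a small pass/fail check. The sketch below assumes metrics have already been pulled per template; the dictionary keys (`lcp_s`, `inp_ms`, `cls`, `score`) and the function name are hypothetical, not fields of the PageSpeed Insights response.

```python
# Upper bounds from this section: a metric passes only when strictly below the limit.
THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}
# A performance score passes only when strictly above the minimum.
MIN_SCORE = {"mobile": 70, "desktop": 80}


def failing_metrics(metrics: dict, form_factor: str) -> list:
    """List every threshold a template fails for the given form factor."""
    failures = [name for name, limit in THRESHOLDS.items()
                if metrics[name] >= limit]
    if metrics["score"] <= MIN_SCORE[form_factor]:
        failures.append("score")
    return failures
```

A template with `lcp_s=3.1`, `inp_ms=150`, `cls=0.05`, and a mobile score of 65 fails on `lcp_s` and `score`, so it is flagged in the template table.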
Section 3. On-page SEO
Runs entirely on the External layer, with GSC adding query-level precision when available.
- [E] Title tag audit: length (30 to 60 chars), primary keyword position, uniqueness across pages, brand suffix consistency. Flag duplicates and stuffed titles.
- [E] Meta description audit: length (120 to 160 chars), uniqueness, click-earning phrasing (benefit + differentiation + CTA). Flag blank, duplicate, and truncated metas.
- [E] H1 audit: one H1 per page, primary-keyword match, no keyword-stuffing. Flag missing H1s and multi-H1 pages.
- [E] Heading hierarchy: logical H1 → H2 → H3 nesting, no jumps (H2 directly to H4).
- [E] Keyword cannibalisation: multiple pages targeting the same query. First pass via Ahrefs or DataForSEO.
- [E+I] Cannibalisation confirmation via GSC query-to-page mapping when available.
- [E] Duplicate content: near-duplicate pages, boilerplate paragraphs repeated across PDPs, faceted-navigation crawl traps.
- [E] Image alt text: presence, descriptive phrasing, free of keyword stuffing. Sampled across priority templates.
Output: a prioritised on-page fixes list, sorted by impressions at risk (when GSC is available) or estimated traffic value (when it is not).
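The title-tag length and uniqueness checks can be sketched as below, assuming titles have already been collected per URL (for example from the Screaming Frog `Page Titles` export). The function name and the input shape are hypothetical; keyword position and brand-suffix checks are left out.

```python
from collections import Counter


def audit_titles(pages: dict) -> dict:
    """Flag missing, out-of-range (30 to 60 chars), and duplicate titles.

    `pages` maps URL -> title text, with '' for a missing title.
    Returns URL -> list of flags, only for URLs with at least one issue.
    """
    counts = Counter(t.strip().lower() for t in pages.values() if t.strip())
    issues = {}
    for url, title in pages.items():
        text = title.strip()
        flags = []
        if not text:
            flags.append("missing")
        elif not 30 <= len(text) <= 60:
            flags.append("length")
        if text and counts[text.lower()] > 1:
            flags.append("duplicate")
        if flags:
            issues[url] = flags
    return issues
```

The same shape works for meta descriptions by swapping the range to 120 to 160 characters.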
Section 4. Content quality and topical authority
- [E] E-E-A-T signals: author byline with bio, author credentials, About page completeness, contact info, terms and privacy, original research, external citations in the niche. Score as a binary checklist.
- [E] Content depth per priority page: word count, embedded media, original analysis, first-hand experience markers.
- [E] Content freshness: last-updated timestamp visible on the page, date in URL where applicable, patterns of annual refresh for evergreen content.
- [E] Topical authority map: cluster the existing content by topic, identify which topics are mature (pillar + 10+ supporting articles), which are shallow (pillar + <3 supporting), which are missing. Map each commercial intent to a cluster.
- [E] Duplicate-topic audit: multiple posts targeting the same topic with marginal differentiation. Merge candidates.
- [E] Thin content: pages under 300 words on commercial intents, PDPs with only manufacturer-supplied descriptions, category pages with no above-the-fold copy.
Output: a topical-authority heat map (mature / shallow / missing by cluster) and a merge-or-expand list for shallow content.
Section 5. Keyword footprint and content gaps
Step 0 — competitor-set + benchmarking opt-in. Run the competitor-set decision tree from PLAYBOOK-TEMPLATE.md. If {client}/CLAUDE.md has a populated ## Competitors block, use it. Otherwise auto-propose a set from the head-term SERP data already pulled in Section 4 and ask the user to confirm, swap, or skip. If user confirms, write the set back to {client}/CLAUDE.md.
Conditional BT deliverables (when a competitor set exists):
- `BT-SEO-organic-visibility.md`: for own brand and each competitor, monthly organic visits, organic keywords, average position, blog post count, domain authority.
- `BT-CONTENT-publishing-cadence.md`: posts per month, total post count, content depth signals, internal-linking density.
If the user has skipped benchmarking, this section MUST include a single line: "Benchmarking skipped — no competitor set provided." Never ship a silently incomplete report.
- [E] Ahrefs (or DataForSEO) top-100 ranking keywords for the brand domain. Include position, URL, search volume, keyword difficulty, estimated traffic share.
- [E] 90-day position-trend analysis: keywords gaining 10+ positions, keywords losing 10+ positions. Flag losing keywords for urgent review.
- [E+I] GSC query-level data (when available): top 100 queries by impressions, top 50 by clicks, top 50 by CTR. Cross-reference with Ahrefs to spot GSC-only queries (branded and long-tail).
- [E] Competitor keyword gap vs the 3 to 5 competitors from the positioning brief. Pull `labs_ranked_keywords` on the client and each competitor separately. Compute `competitor_keywords - client_keywords` in Python. Do not use DataForSEO's `labs_keyword_intersection` for gap analysis. It returns the intersection set (the keywords both domains rank for). A content gap needs the set difference.
- [E+I] Striking-distance queries (position 4 to 15 + impressions 50+): one content push or on-page fix away from page 1. Source from GSC when available, Ahrefs otherwise.
- [E+I] High-impression, low-CTR queries (position 1 to 10 + CTR under 1%): title and meta rewrite candidates.
- [E] Branded vs non-branded split: Ahrefs estimate, confirmed by GSC when available. Flag brands where non-branded is below 30% of organic clicks.
- [E] Industry keyword universe (consolidation step). After per-competitor gap CSVs are produced, union them into one deduped master dataset. For each keyword compute `num_competitors_ranking`, `aggregate_volume` (max observed), `max_difficulty`, `best_competitor_position`, `brand_ranks`. Score `opportunity = aggregate_volume × num_competitors_ranking × max(1, 101 - max_difficulty) × (1 if not brand_ranks else 0.3)`. The `(101 - max_difficulty)` factor keeps the score positive across the full 0-100 DataForSEO KD scale. Classify intent (informational / commercial / transactional / navigational) via pattern-matching. Cluster by head-term semantic similarity into 6 to 12 clusters. Output `industry-keyword-universe.csv` with columns: keyword, aggregate_volume, max_difficulty, num_competitors_ranking, best_competitor_position, brand_ranks, brand_position, intent, cluster, opportunity_score, priority_rank. This replaces per-competitor CSVs as the primary keyword artefact.
- [E] Clusters page-map. For each cluster: does the brand already rank top-30 on the cluster's head term? If yes, action is `expand`. If a page exists but ranks below position 30, action is `rewrite`. If no URL exists, action is `build` (specify page type: pillar 2000+ words, comparison hub, FAQ, buying guide, or category landing). Output `industry-clusters-page-map.md`: cluster_name, intent, size, existing_page_url_or_missing, recommended_action, top_5_keywords, opportunity_score, priority_rank.
- [E] SERP-feature mining. For the top 20 to 30 head-terms by opportunity score (skipping any the brand already owns position 1): pull PAA, related searches, and autocomplete via Serper. Compile `serp-features-mined.csv` with head_term, feature_type (paa / related / autocomplete), text, implied_intent, semantic_cluster. Cluster the mined phrases. Flag any not addressed by existing content as FAQ-block or H3-section candidates. When Serper is unavailable, use `WebSearch` for related searches and manual incognito-Chrome capture for autocomplete.
- [E] On-page-signal fallback. When DataForSEO, Ahrefs, and Serper are all unavailable: WebFetch the brand's and competitors' homepages, top 10 nav links, top 5 collection pages, top 5 PDPs, blog index. Extract titles, meta descriptions, H1/H2, URL slug tokens, schema, anchor text. Produce `fallback-onpage-signals.csv` per domain. Run the universe consolidation on this reduced dataset, labelled "on-page signals only, no ranking or volume data". JS-rendered fallback: when WebFetch returns empty on a JS-heavy competitor (single-page apps, client-rendered React sites), use Apify `apify/website-content-crawler` (if `APIFY_API_TOKEN` is in `.env` and account billing is configured). Caps: `maxItems=100` per domain, `maxTotalChargeUsd=1.00`. Returns markdown-ready content the on-page-signal extractor can parse.
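The competitor keyword gap and the opportunity score from this section can be sketched in a few lines of Python. The function names and the input shapes (plain sets of keyword strings per domain) are assumptions for illustration; the formula is the one stated above.

```python
from collections import Counter


def keyword_gap(client_keywords: set, competitor_keywords: set) -> set:
    """Set difference: keywords the competitor ranks for and the client does not."""
    return competitor_keywords - client_keywords


def competitors_ranking(gaps_by_competitor: dict) -> dict:
    """Union the per-competitor gaps and count how many competitors rank for each keyword."""
    counts = Counter()
    for gap in gaps_by_competitor.values():
        counts.update(gap)
    return dict(counts)


def opportunity(aggregate_volume, num_competitors_ranking,
                max_difficulty, brand_ranks):
    """Opportunity score as defined in this section."""
    difficulty_factor = max(1, 101 - max_difficulty)  # stays >= 1 across KD 0 to 100
    brand_factor = 0.3 if brand_ranks else 1
    return (aggregate_volume * num_competitors_ranking
            * difficulty_factor * brand_factor)
```

With a volume of 1,000, three competitors ranking, and a difficulty of 41, the difficulty factor is 60 and the score is 180,000 when the brand does not rank yet. The same keyword is discounted to 30% of that when the brand already ranks.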
Output: a ranked opportunity list, industry-keyword-universe.csv, industry-clusters-page-map.md, serp-features-mined.csv. The page-map is the operational artefact; the universe CSV is its evidence base.
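The striking-distance and low-CTR filters described in this section reduce to two list filters. The sketch assumes each query row is a dictionary with `position`, `impressions`, and `ctr` (as a fraction, so 1% is `0.01`); those key names are hypothetical and should be mapped to whatever the GSC or Ahrefs export provides.

```python
def striking_distance(rows: list) -> list:
    """Position 4 to 15 with at least 50 impressions."""
    return [r for r in rows
            if 4 <= r["position"] <= 15 and r["impressions"] >= 50]


def low_ctr(rows: list) -> list:
    """Position 1 to 10 with CTR under 1%: title and meta rewrite candidates."""
    return [r for r in rows
            if 1 <= r["position"] <= 10 and r["ctr"] < 0.01]
```

A query can appear in both lists (for example position 6 with a 0.5% CTR), which is usually a sign that the title and meta rewrite should come before any content push.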
Content-scoring rubric
Use when converting gap findings into ranked actions.
| Factor | Weight | Question |
|---|---|---|
| Customer Impact | 40% | Does this address a buyer question or commercial intent? |
| Content-Market Fit | 30% | Can we write this better than what already ranks? |
| Search Potential | 20% | Volume plus achievable difficulty for the domain authority |
| Resources | 10% | Can we produce this in under a week with existing assets? |
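A short sketch of how the rubric weights could be applied. The weights come from the table above; the 0 to 5 rating scale, the key names, and the function name are assumptions made for the example.

```python
# Weights from the content-scoring rubric table.
WEIGHTS = {
    "customer_impact": 0.40,
    "content_market_fit": 0.30,
    "search_potential": 0.20,
    "resources": 0.10,
}


def rubric_score(ratings: dict) -> float:
    """Weighted sum of per-factor ratings (assumed 0 to 5 scale)."""
    return sum(WEIGHTS[factor] * ratings[factor] for factor in WEIGHTS)


def rank_actions(candidates: dict) -> list:
    """Sort candidate actions by rubric score, highest first."""
    return sorted(candidates,
                  key=lambda name: rubric_score(candidates[name]),
                  reverse=True)
```

Because Customer Impact carries 40% of the weight, an item rated high on buyer relevance outranks one that is merely easy to produce.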
Section 6. AEO and AI-SEO
Answer-engine presence across ChatGPT, Claude, Perplexity, and Google AI Overview. External-layer only. This is the merged AEO layer that most SEO audits omit.
Tool fallback for JS-rendered SERP scraping. If Serper rate-limits or the AI Overview scan via seo_toolkit.py aeo-serp fails on a specific query, fall back to Apify apify/google-search-scraper (when APIFY_API_TOKEN is in .env and account billing is configured). Caps: maxItems=20 per query, maxTotalChargeUsd=0.20 per query. Returns full SERP HTML including AI Overview block and related-searches grid. Default to Serper for cost reasons — Apify SERP scraping is more expensive past ~500 queries — but it's the right tool when Serper has degraded for a specific query type.
6A. Presence audit
- [E] Run 10 to 20 commercial queries from the brand's category across ChatGPT, Claude, Perplexity, and Google AI Overview. Use `seo_toolkit.py aeo-serp` for the Google AI Overview scan. Manual prompts for the three LLMs. Record for each query:
  - Is the brand cited or mentioned?
  - Which competitors are cited?
  - Does an AI Overview exist on Google for this query?
  - What source URLs does each engine cite?
- [E] Cross-engine presence matrix: query × engine grid. 4 green squares = dominant presence. 0 green squares = AEO blind spot.
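The query × engine grid can be summarised with a small helper once the per-query observations are recorded. The input shape (query mapped to a dictionary of engine name to cited flag) and the `partial` label for 1 to 3 green squares are assumptions; the text above only names the two extremes.

```python
ENGINES = ["ChatGPT", "Claude", "Perplexity", "Google AI Overview"]


def presence_summary(matrix: dict) -> dict:
    """Count green squares per query and label the two extremes.

    `matrix` maps query -> {engine: True if the brand is cited}.
    Engines missing from a row count as not cited.
    """
    summary = {}
    for query, cited_by in matrix.items():
        green = sum(1 for engine in ENGINES if cited_by.get(engine, False))
        if green == len(ENGINES):
            label = "dominant"
        elif green == 0:
            label = "blind spot"
        else:
            label = "partial"  # assumption: not named in the playbook
        summary[query] = (green, label)
    return summary
```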
6B. Extractability checklist per priority page
Run against the brand's top 10 commercial pages.
| Check | Pass criterion |
|---|---|
| Clear definition in first paragraph | Yes / No |
| Self-contained answer blocks (work without surrounding context) | Yes / No |
| Statistics with named sources | Yes / No |
| Comparison tables for X vs Y queries | Yes / No |
| FAQ section with natural-language questions | Yes / No |
| Schema markup (FAQ, HowTo, Article, Product) present | Yes / No |
| Expert attribution (author name and credentials) | Yes / No |
| Updated within last 6 months | Yes / No |
| Heading structure matches query patterns | Yes / No |
| AI bots allowed in `robots.txt` | Confirmed in Section 1 |
6C. Princeton GEO visibility-boost plays
Use when prioritising AEO content edits.
| Method | Visibility boost | How to apply |
|---|---|---|
| Cite sources | +40% | Add authoritative references with links |
| Add statistics | +37% | Include specific numbers with named sources |
| Add quotations | +30% | Expert quotes with name and title |
| Authoritative tone | +25% | Write with demonstrated expertise |
| Improve clarity | +20% | Simplify complex concepts |
| Technical terms | +18% | Use domain-specific terminology correctly |
6D. AEO whitespace map
Queries where the brand has zero presence but 2+ competitors are cited. Each row becomes an AEO content brief: target query, missing citation-worthy element (definition, stat, comparison table, FAQ), target schema, priority.
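The whitespace rule (zero brand presence, two or more competitors cited) can be sketched as a filter over the presence-audit observations. The observation shape, with `query` and `cited_brands` pooled across all engines, and the function name are hypothetical.

```python
def whitespace_queries(observations: list, brand: str) -> list:
    """Return queries where the brand is never cited but 2+ competitors are.

    Each observation is {"query": str, "cited_brands": iterable of brand names}
    with citations pooled across every engine tested.
    """
    rows = []
    for obs in observations:
        cited = set(obs["cited_brands"])
        competitors = cited - {brand}
        if brand not in cited and len(competitors) >= 2:
            rows.append({"query": obs["query"],
                         "competitors_cited": sorted(competitors)})
    return rows
```

Each returned row is the starting point for one AEO content brief; the missing citation-worthy element, target schema, and priority are filled in by hand.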
6E. Entity and topic clarity
- Does the homepage declare what the brand is in the first 100 words?
- Are the primary product categories explicitly named with schema `Category/CollectionPage` typing?
- Does `Organization` schema include `sameAs` links to Wikipedia, LinkedIn, Crunchbase, niche authoritative sources?
- Are author pages present with `Person` schema including `sameAs` to LinkedIn, publications?
6G. Per-template metadata + schema matrix (mandatory)
Every audit must produce this matrix. It is the single highest-leverage check in the entire playbook because it is the only one that catches per-page-template execution misses — bugs that are invisible when only the homepage is inspected, and bugs that are invisible when only one product page is inspected.
Why it matters: most "AI/LLM readiness" failures are not strategic. They are bugs that a single page template carries until someone explicitly inspects it. The homepage is the most common offender (Shopify defaults <title> to shop.name when no SEO title is set, blog index has no native SEO panel, refund/policy pages inherit nothing). Product detail pages (PDPs) and tech/about pages each have their own quirks. Without a per-template grid, audits collapse to "homepage looks fine" or "PDP has Product schema" and miss the gaps in between.
Page archetypes to inspect (every audit, every brand):
- Homepage (`/`)
- Collection / category index (top-traffic example)
- Product detail page (top-traffic example)
- Blog index (`/blogs/news` or equivalent)
- Blog article (top-traffic example)
- About / team page
- FAQ page (if exists)
- Technology / methodology / how-it-works page (if exists)
- Refund / shipping / policy pages (single representative)
- Comparison / "vs" pages (if exist)
Fields to verify per archetype (build as a wide table):
| Field | Pass criterion |
|---|---|
| `<title>` | Unique per template, not equal to `shop.name`, contains primary keyword |
| `meta name="description"` | Present, 120–160 chars, claim-rich, page-specific |
| `og:title` | Present, page-specific (not equal to `shop.name`) |
| `og:description` | Present, page-specific (not equal to `shop.name`) |
| `og:image` | Present, absolute https URL, resolves to 200 |
| `twitter:card` | `summary_large_image` |
| `twitter:title` / `twitter:description` | Present, page-specific |
| `<link rel="canonical">` | Present, https, self-canonical |
| `Organization` JSON-LD | Present (sitewide) |
| `WebSite` JSON-LD with `SearchAction` | Present (sitewide) |
| `BreadcrumbList` JSON-LD | Present on all non-homepage |
| `Product` JSON-LD with `offers.price`, `priceCurrency`, `availability`, `sku`, `brand` | Required on every PDP |
| `FAQPage` JSON-LD | Required on dedicated FAQ page AND on PDPs/landing pages with Q&A blocks |
| `Article` / `TechArticle` JSON-LD with `author` (`Person`) | Required on blog articles AND tech/methodology pages |
| `Person` JSON-LD with `jobTitle`, `sameAs` (LinkedIn) | Required on about/team page for each named team member |
| `Review` / `AggregateRating` JSON-LD | Required wherever named testimonials exist |
| `Dataset` / methodology citation URL | Required wherever quantified performance claims are made (e.g. "500% more forgiving") |
Execution method (do this exactly, do not improvise):
- Pull the live HTML of one representative URL for each archetype via `curl -sL`.
- For each URL, extract: `<title>`, every `<meta>` tag, every `<script type="application/ld+json">` block, the canonical link.
- Resolve every URL embedded inside JSON-LD (logo, image, sameAs, author.url, citation). HEAD-check each. Flag any 4xx, 5xx, or malformed pattern (`https://domain//domain/...` double-domain, `http://` mixed-content, missing scheme).
- Output `per-template-matrix.md` in the audit folder: rows = archetypes, columns = fields, cells = ✅/❌/⚠️ with a one-line note on the failure.
- For every ❌, file an action item in Section 9 with the exact template file or Shopify SEO panel field that needs editing.
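The extraction and malformed-URL steps above can be sketched with the Python standard library once the HTML has been pulled. This is an illustration, not the playbook's tooling: `HeadExtractor`, `jsonld_urls`, `classify_url`, and the `URL_KEYS` set are hypothetical names, and the HEAD-check for 4xx/5xx responses is left out because it needs network access.

```python
import json
import re
from html.parser import HTMLParser

# JSON-LD keys treated as URL-bearing (assumption based on the list above).
URL_KEYS = {"logo", "image", "sameAs", "url", "citation"}


class HeadExtractor(HTMLParser):
    """Collect <title>, <meta> attributes, the canonical link, and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.title, self.metas, self.canonical, self.jsonld = "", [], None, []
        self._in_title = self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            self.metas.append(a)
        elif tag == "link" and a.get("rel") == "canonical":
            self.canonical = a.get("href")
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_jsonld, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.jsonld.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                self.jsonld.append(None)  # unparseable block: flag in the matrix

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_jsonld:
            self._buf.append(data)


def jsonld_urls(node, key=None):
    """Yield string values stored under URL-bearing keys, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from jsonld_urls(v, k)
    elif isinstance(node, list):
        for item in node:
            yield from jsonld_urls(item, key)
    elif isinstance(node, str) and key in URL_KEYS:
        yield node


def classify_url(url: str) -> str:
    """Label the malformed patterns named in the execution method."""
    if re.match(r"^https?://[^/]+//", url):
        return "double-domain"
    if url.startswith("http://"):
        return "mixed-content"
    if not url.startswith("https://"):
        return "missing scheme"
    return "ok"
```

Feeding one page through `HeadExtractor().feed(html)` and then classifying every URL from `jsonld_urls` gives the raw material for one row of `per-template-matrix.md`.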
Common failures this matrix catches that nothing else does:
- Homepage `<title>` and `meta description` falling back to `shop.name` because no SEO title is set in Shopify Admin (Shopify's Online Store > Preferences won't surface this in any other audit).
- Logo URL bug `https://domain//domain/cdn/...` from concatenating `shop.url` with `asset_url` (the `asset_url` filter already returns the full domain).
- og:title and twitter:title set to `shop.name` because the Liquid fallback chain is `og_title = page_title | default: shop.name`.
- FAQPage schema present on PDP accordions but absent on the dedicated `/pages/faq` page.
- Product schema with `offers` but missing `availability`, `sku`, or `brand`.
- Article schema on blog posts with no `author` field.
- Person schema on about pages reduced to `{name: "X"}` only, with no `jobTitle`, `sameAs`, `url`.
- Review schema completely absent on pages with named testimonials.
This grid is what separates a 5.5/10 audit from an 8.5/10 audit. It is mandatory on every run, including A2-Quick mode.
6F. Conversational-query and passage-ranking targeting
- Identify the 10 most commercially valuable conversational queries (natural-language phrasing, often starting with `how`, `what`, `which`, `can`, `should`, `is it worth`).
- For each query, score whether the target page answers it in a self-contained paragraph within the first 300 words.
- Score whether the page carries a direct-answer snippet formatted as a definition, list, or comparison suited to passage ranking.
Output: an AEO presence scorecard, a whitespace map, and a ranked list of content edits that improve citation likelihood across all four engines simultaneously.
Section 7. Backlinks and domain authority
- `[E]` DataForSEO Backlinks API or Ahrefs snapshot for the client domain: referring-domains count, DR / DA, new and lost links over the last 90 days, spam score (flag if above 5).
- `[E]` Link velocity trend: referring-domain gain or loss over 12 months.
- `[E]` Competitor backlink comparison vs the 3 to 5 competitors from the positioning brief. Surface domains linking to competitors but not the client (`ref_domains_competitor - ref_domains_client`).
- `[E]` Unlinked brand-mention scan via Serper (`"{client-brand}" -site:{client-domain}`). Each unlinked mention is a reclaim opportunity.
- `[E]` Toxic-link flags: spam score thresholds, anchor-text distribution anomalies (commercial anchors above 40%, brand anchors below 30%).
- `[E]` 10-tier backlink target list for the first 90 days: directories, niche publications, podcast placements, HARO or Featured responses, data-driven PR, digital PR, resource-page outreach, guest-post outlets, broken-link reclaim, competitor-link gap reclaim.
- `[I]` GSC Top linking sites report to cross-validate.
Output: a referring-domain scorecard, a ranked reclaim list, and the 10-tier target list.
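The competitor link gap and the anchor-text thresholds above reduce to a set difference and two ratio checks. The following is a minimal sketch, assuming referring domains arrive as plain lists of hostnames and each backlink anchor has already been labelled `commercial`, `brand`, or `other`. The function names, flag labels, and sample data are hypothetical.

```python
from collections import Counter


def link_gap(ref_domains_competitor, ref_domains_client):
    """Domains linking to the competitor but not the client, sorted for stable output."""
    return sorted(set(ref_domains_competitor) - set(ref_domains_client))


def anchor_flags(anchor_labels, commercial_max=0.40, brand_min=0.30):
    """Flag anchor-text anomalies using the thresholds stated in the playbook."""
    total = len(anchor_labels)
    if total == 0:
        return []
    counts = Counter(anchor_labels)
    flags = []
    if counts["commercial"] / total > commercial_max:
        flags.append("commercial-heavy")
    if counts["brand"] / total < brand_min:
        flags.append("brand-light")
    return flags


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    competitor = ["a.example", "b.example", "c.example"]
    client = ["b.example"]
    print(link_gap(competitor, client))

    labels = ["commercial"] * 5 + ["brand"] * 2 + ["other"] * 3
    print(anchor_flags(labels))
```

Running the gap once per competitor and counting how often each domain appears gives a natural ranking for the reclaim list: domains that link to several competitors rise to the top.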
Section 8. Local and ecommerce-specific
Runs for ecommerce brands (Shopify, WooCommerce, Hydrogen). For service or B2B SaaS brands, replace with Local SEO (NAP consistency, Google Business Profile completeness, local citation audit) or skip entirely.
Ecommerce
- `[E]` Product schema coverage and quality: `Product` schema on every PDP, required fields (`name`, `image`, `description`, `sku`, `brand`, `offers.price`, `offers.priceCurrency`, `offers.availability`, `aggregateRating` where reviews exist).
- `[E]` Collection / category-page structure: `CollectionPage` schema, unique H1, category description above the product grid, faceted-navigation crawl control (parameters excluded in `robots.txt` or handled via canonical).
- `[E]` Category breadcrumb schema on every PDP and collection.
- `[I]` Google Merchant Center feed health (if access granted): approved / pending / disapproved counts, `item_missing_required_attribute`, `low_image_quality`, `landing_page_error`, `price_mismatch`. Cross-reference with Shopify product fields.
- `[I]` Shopify-specific: sales-channel activation (Google and YouTube ticked on every product), metafield consistency (product type, brand, gender where relevant).
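The required-field check on `Product` schema can be sketched as a dotted-path lookup over the parsed JSON-LD. This is an illustrative sketch under stated assumptions: the JSON-LD block is already parsed into a dict, `offers` may be a single object or a list, and `aggregateRating` is excluded because it is only required where reviews exist. `REQUIRED_PATHS`, `get_path`, and `missing_product_fields` are hypothetical names.

```python
# Required fields as listed in the playbook, written as dotted paths.
REQUIRED_PATHS = [
    "name",
    "image",
    "description",
    "sku",
    "brand",
    "offers.price",
    "offers.priceCurrency",
    "offers.availability",
]


def get_path(node, dotted):
    """Walk a dotted path through nested dicts. Lists are resolved to their first item."""
    for part in dotted.split("."):
        if isinstance(node, list):
            node = node[0] if node else None
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_product_fields(product):
    """Return the required paths that are absent or empty on one Product object."""
    return [path for path in REQUIRED_PATHS if get_path(product, path) in (None, "", [])]


if __name__ == "__main__":
    # Hypothetical PDP schema with the common gap: offers present, sku/brand/availability absent.
    product = {
        "@type": "Product",
        "name": "Example Driver",
        "image": "https://example.com/driver.jpg",
        "description": "Example description.",
        "offers": {"@type": "Offer", "price": "549.00", "priceCurrency": "USD"},
    }
    print(missing_product_fields(product))
```

Each missing path maps to one ⚠️ or ❌ cell in the product-schema scorecard, with the path name as the one-line note.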
Local (alternative, when applicable)
- `[E]` Google Business Profile completeness: primary category, business hours, photos, Q&A, attributes, service areas.
- `[E]` NAP consistency scan across major directories.
- `[E]` Local citation audit against top 20 niche and geographic directories.
Output: a product-schema scorecard, a feed-health scorecard (where access exists), and the top 5 catalog-structure fixes.
Section 9. Prioritised Action Items
Every finding in Sections 1 through 8 converts into a candidate initiative. Each initiative is scored on ICE. The ranked list becomes the client's operating plan for the next 30 to 60 days.
9A. The biggest organic lever
Open this section with one to two short paragraphs naming the single biggest lever holding organic revenue back right now. Follow Theory of Constraints logic. Fix this one. The next reveals itself.
Structure:
- The lever. One sentence.
- Evidence. Three bullets pulling the strongest data points from the prior sections. Label each as `[E]` or `[I]`.
- The counter-move. One or two sentences on what relieves the constraint.
- Expected organic impact. Quantified in monthly sessions or monthly revenue where attribution allows, with assumptions stated.
- Confidence. High, medium, or low with a reason.
On Path B (new brand, no ranking history), replace with the "Critical path hypothesis":
- The hypothesis. "If we rank for [cluster] targeting [audience], they will [measurable response] because [search intent]."
- Evidence. Pull from Sections 4, 5, 6.
- The experiment. One sentence describing the smallest test that moves the hypothesis from belief to data.
- Effort and third-party costs. Effort in days or weeks. Third-party costs only: tooling subscriptions, paid amplification, software licences. Always close with an explicit clarifier: "EntireCommerce engagement fees are separate and quoted at the engagement scoping stage. None of the dollar figures above are EntireCommerce charges." The audit must not read as if EntireCommerce is offering the work at the listed dollar amounts.
9B. Ranked list
Below the biggest lever, present every other initiative as a flat numbered list ranked by ICE score descending. Group into three buckets:
- This week (top 5 items). Work begins immediately.
- Next 30 days (next 10 items).
- Backlog (everything else).
No P0/P1/P2 labels in the output. The bucket names are the priority signal.
9C. ICE scoring (rubric)
| Axis | Scale | Definition |
|---|---|---|
| Impact | 1 to 10 | Expected organic revenue or traffic lift. 1 = cosmetic. 10 = unlocks or protects more than 20% of organic revenue |
| Confidence | 1 to 10 | How sure the initiative works. 1 = pure guess. 10 = near-certain |
| Effort | 1 to 10 | Inverse scale. 1 = months of work. 10 = ships in under a day |
Score = Impact × Confidence × Effort. Maximum 1000.
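The rubric and the three buckets from 9B can be expressed in a few lines of Python. This is a minimal sketch, assuming each initiative carries three integer scores in the 1 to 10 range. The `Initiative` class, the `bucket` helper, and the sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    title: str
    impact: int
    confidence: int
    effort: int  # inverse scale: 1 = months of work, 10 = ships in under a day

    def __post_init__(self):
        # Reject scores outside the rubric's 1 to 10 range.
        for axis in ("impact", "confidence", "effort"):
            value = getattr(self, axis)
            if not 1 <= value <= 10:
                raise ValueError(f"{axis} must be between 1 and 10, got {value}")

    @property
    def ice(self):
        """Impact x Confidence x Effort, maximum 1000."""
        return self.impact * self.confidence * self.effort


def bucket(initiatives):
    """Rank by ICE descending, then split into the three buckets from 9B."""
    ranked = sorted(initiatives, key=lambda item: item.ice, reverse=True)
    return {
        "This week": ranked[:5],
        "Next 30 days": ranked[5:15],
        "Backlog": ranked[15:],
    }


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    example = Initiative("Close the title-and-meta CTR gap", impact=8, confidence=7, effort=9)
    print(example.title, example.ice)  # 8 * 7 * 9 = 504
```

Because Effort is inverted, a quick fix with moderate impact can outrank a slow, high-impact project, which is the intended behaviour for a 30 to 60 day operating plan.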
9D. Workstream granularity
Each item is a workstream. Sub-tasks live underneath. Example:
### 1. Close the title-and-meta CTR gap on the top 20 commercial pages
- [ ] Pull current titles and metas from GSC
- [ ] Generate CTR-optimised rewrites via `seo_toolkit.py ctr-optimize`
- [ ] Ship rewrites to the CMS
- [ ] Re-score CTR 14 days later
9E. Handoff to actions.md
After writing Section 9, merge the "This week" items into {client_folder}/actions.md as new Next Actions. Respect the caps: P0 max 5, P1 max 10, P2 max 15. Rescore via ICE if any tier is over cap.
9F. 30-60-90 plan (optional summary)
Close Section 9 with an optional 30-60-90 summary for the Word deliverable:
- 30 days. Quick wins from the "This week" and "Next 30 days" buckets.
- 60 days. Content-cluster builds and AEO-extraction rewrites.
- 90 days. Backlink campaign results, authority-signal compounding.
Access-grant checklist
Appears only when one or more Internal sections are stubbed. Supply exact steps.
Template:
## Access-grant checklist
To populate the stubbed sections above, grant EntireCommerce the following access:
- **Google Search Console**: add `nish-claude@gen-lang-client-0913692809.iam.gserviceaccount.com` as Restricted. Unlocks: Section 1 indexation depth, Section 3 cannibalisation confirmation, Section 5 query-level data.
- **Google Analytics 4**: add the same service account as Viewer. Unlocks: Section 5 revenue-weighted keyword prioritisation.
- **Shopify Admin** (Shopify stores only): share a read-scoped custom-app token. Unlocks: Section 1 redirect hygiene, Section 8 product schema and collection hygiene.
- **Microsoft Clarity**: share the project token. Unlocks: CRO overlay on SEO landing pages where rage clicks correlate with ranking risk.
- **Google Merchant Center**: add as user. Unlocks: Section 8 feed health.
Book a call to wire these up: https://entirecommerce.ai/book-a-call
Voice conventions (strict, applied throughout)
- No em-dashes. Use period, comma, or colon.
- No negative parallelisms. Never "X, not Y". Never "rather than X". Never "instead of X". Never "opposite of X". Never "not only X". State the positive.
- Four-word sentence floor. No one- or two-word sentences.
- Spelling: "ecommerce" one word. Never "e-commerce".
- No AI clichés.
- ICP language discipline: "premium DTC", "AOV $500+", "founder-led", "US-market focus".
- Every finding traces to real data. No fabricated numbers. No invented sources.
Voice enforcement gate (mandatory before writing to disk)
Grep the full draft for each banned pattern and rewrite every match:
| Banned pattern | Grep for | Fix |
|---|---|---|
| Em-dash | — (U+2014), -- used as em-dash | Replace with period, comma, or colon |
| Negative parallelism | , not , rather than, instead of, opposite of , not only | Rewrite to state the positive claim alone |
| Negative parallelism (contraction-led, sneaky form) | \bisn't\b, \baren't\b, \bwasn't\b, \bdoesn't\b, \bdon't\b, \bdidn't\b, \bwon't\b, \bcan't\b, \bhasn't\b, \bhaven't\b, plus spelled-out is not, are not, was not, does not, do not, has not, have not | For each match: load-bearing fact (e.g. "AI engines do not run JavaScript") → leave; setup-and-contrast pattern (e.g. "This isn't urgent because X. It's urgent because Y") → drop the negation half, lead with the positive |
| Banned spelling | e-commerce, E-commerce, E-Commerce | Replace with ecommerce |
| AI clichés | Here's the, Here's what, Here's why, Most people, The uncomfortable truth, The brutal truth, The breakthrough | Rewrite |
| Sub-four-word sentence | Standalone 1 to 3 word sentences | Merge or expand |
The word `not` is permitted only when genuinely load-bearing. Default to rewriting.
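The gate above can be approximated with a short regex pass. This is a rough sketch under stated assumptions: the patterns are simplified readings of the table, contraction-led negations are reported under a separate `negation-review` label because load-bearing facts are allowed to stay, and the sentence splitter is naive about abbreviations. `BANNED`, `short_sentences`, and `voice_gate` are hypothetical names.

```python
import re

# Simplified patterns derived from the banned-pattern table.
BANNED = {
    "em-dash": re.compile(r"\u2014|\w--\w| -- "),
    "negative-parallelism": re.compile(
        r", not |\brather than\b|\binstead of\b|\bopposite of\b|\bnot only\b", re.I
    ),
    "negation-review": re.compile(
        r"\b(?:isn't|aren't|wasn't|doesn't|don't|didn't|won't|can't|hasn't|haven't)\b"
        r"|\b(?:is|are|was|does|do|has|have) not\b",
        re.I,
    ),
    "spelling": re.compile(r"\be-commerce\b", re.I),
    "cliche": re.compile(
        r"\bHere's (?:the|what|why)\b|\bMost people\b"
        r"|\bThe (?:uncomfortable|brutal) truth\b|\bThe breakthrough\b",
        re.I,
    ),
}


def short_sentences(line):
    """Return sentences on this line that have fewer than four words."""
    text = line.strip().lstrip("-* ")
    pieces = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in pieces if p and p[-1] in ".!?" and len(p.split()) < 4]


def voice_gate(text):
    """Return (line number, rule, matched text) for every banned-pattern hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in BANNED.items():
            for match in pattern.finditer(line):
                hits.append((lineno, rule, match.group(0)))
        for sentence in short_sentences(line):
            hits.append((lineno, "short-sentence", sentence))
    return hits


if __name__ == "__main__":
    draft = (
        "Our e-commerce store grew \u2014 fast.\n"
        "Focus on speed, not polish. It works.\n"
        "AI engines do not run JavaScript on most crawls.\n"
    )
    for hit in voice_gate(draft):
        print(hit)
```

A draft passes when the only remaining hits carry the `negation-review` label and each one has been confirmed as a load-bearing fact.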
Output formatting: bullets over paragraphs (mandatory for every report and summary)
When this playbook generates a report, executive summary, brief, or any deliverable with multiple findings or actions, the default output structure is bullet lists, not prose paragraphs.
- Any time the text enumerates three or more items (findings, fixes, gaps, recommendations, criteria, things-they-do-better) those items become separate bullets. Do not bury enumerated points inside a paragraph.
- Reserve paragraphs for genuinely connected reasoning where prose flow carries the argument. Cap those paragraphs at four sentences.
- Multi-clause sentences with semi-colons or commas separating discrete points are a signal to break into bullets.
- The canonical pattern is: heading + one lead-in sentence + bullet list. The lead-in frames the list; the bullets carry the load.
- Reports ship scannable. The reader skims on a phone between meetings.
Run a paragraph-density check before commit: any paragraph longer than four sentences, or containing three or more enumerated items inside prose, is a candidate for bullet conversion. Convert it.
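The sentence-count half of the paragraph-density check can be sketched as follows. This is a minimal sketch, assuming paragraphs are separated by blank lines and that headings, bullets, numbered items, and table rows are exempt. Detecting three or more enumerated items inside prose is left to manual review. `dense_paragraphs` is a hypothetical name.

```python
import re


def dense_paragraphs(markdown, max_sentences=4):
    """Return (sentence count, preview) for each prose paragraph over the cap."""
    flagged = []
    for block in re.split(r"\n\s*\n", markdown):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        # Skip blocks made only of headings, bullets, numbered items, or table rows.
        if all(re.match(r"(#|[-*] |\d+\. |\|)", line) for line in lines):
            continue
        text = " ".join(lines)
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
        if len(sentences) > max_sentences:
            flagged.append((len(sentences), text[:60]))
    return flagged


if __name__ == "__main__":
    report = (
        "One is here. Two is here. Three is here. Four is here. Five is here.\n\n"
        "- A bullet. Another one. More text. Again here. Last one.\n\n"
        "Short paragraph here. Only two sentences."
    )
    for count, preview in dense_paragraphs(report):
        print(count, preview)
```

Every flagged paragraph is a candidate for the heading, lead-in sentence, and bullet-list pattern described above.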
Applies to: full reports, executive summaries, email overviews, weekly reviews, dashboards, action-item lists, and any narrative deliverable. Does not apply to: opening framing paragraphs (one paragraph allowed), conclusion paragraphs (one or two allowed), or quoted founder/buyer voice.
Word-doc export (pandoc)
After the voice gate passes on the markdown, run pandoc to generate the .docx:
pandoc {client_folder}/docs/seo-audit/{YYYY-MM-DD}/seo-audit-report.md \
-o {client_folder}/docs/seo-audit/{YYYY-MM-DD}/seo-audit-report.docx \
--toc \
--number-sections
If a branded reference template exists at EntireCommerce/playbooks/seo-audit-template.docx, add --reference-doc=... for brand styling. Template is optional.
Pandoc install check: which pandoc. If missing: brew install pandoc.
Execution cadence
- Week 1. The SEO + AEO audit is a week-1 deliverable when organic is in scope. "This week" items begin immediately.
- Ongoing. The weekly review playbook at `playbooks/weekly-marketing-review.md` reads the audit's biggest lever and re-ranks Section 9 every Monday.
- Quarterly refresh. Re-run the audit quarterly (or after any major Google or LLM-provider algorithm shift).
Reference material
- `gtm-audit.md`: parent playbook this audit deepens.
- `scripts/seo/seo_toolkit.py`: runnable SEO toolkit (`ctr-optimize`, `aeo-serp`, `site-audit`).
- `scripts/seo/dataforseo_client.py`, `serper_client.py`, `keywords_everywhere_client.py`, `gsc_client.py`: individual API clients.
Acceptance criteria
- Runs end-to-end from a single Claude Code prompt. No manual copy-paste between steps.
- Detects the correct mode (A1, A2, A2-thin, B1, B2) and adapts the report.
- Honours the two-layer tool gate.
- Skips missing Internal tools cleanly, stubs the affected sections with the access-grant CTA, and files the unlocks in the Access-grant checklist with exact step-by-step instructions for the brand owner.
- Writes a single consolidated report to `{client_folder}/docs/seo-audit/{YYYY-MM-DD}/seo-audit-report.md` with all eight area sections plus the Prioritised Action Items section plus the Access-grant checklist.
- Stores supplementary evidence under `{client_folder}/data/seo-audit/{YYYY-MM-DD}/` and cross-links from the report.
- Every finding is ICE-scored. "This week" items flow directly into `{client_folder}/actions.md`.
- No P0/P1/P2 labels in the report.
- Voice conventions honoured end-to-end. Voice gate run before write. Zero banned-pattern matches in final output.