Practical Agent Readiness Audit Priority
The highest-leverage agent-readiness work is raw fetchability, crawler policy, extractable passages, sitemaps, curated LLM files, Markdown alternatives, parity, and real capabilities.
Use this page when you need an 80/20 audit sequence instead of a scanner-score chase.
The weights are a planning heuristic; brochure sites, docs sites, API products, SaaS apps, and commerce flows need different capability expectations.
This is the practical 80/20 checklist for making a site easier for agents and AI search systems to find, fetch, parse, cite, and use.
The weights split the high-leverage 80% into 100 points. A 10% item below means roughly 8 points of total value in this model. Treat the numbers as a planning heuristic, not a universal score. Site type still matters: brochure sites, documentation sites, API products, SaaS apps, and commerce flows should not be audited against the same capability expectations.
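The conversion from a table weight to total value is simple arithmetic, sketched below. The weights are copied from the priority table; the function name `total_value_points` is illustrative and not part of any tool.

```python
# Weights from the priority table, in table order (percent of the high-leverage 80%).
WEIGHTS = [18, 14, 14, 10, 10, 10, 8, 7, 5, 4]


def total_value_points(weight_percent):
    """Convert a weight within the 80% into points of total value."""
    return weight_percent * 80 / 100


# The ten weights should account for the full 100 points of the 80%.
assert sum(WEIGHTS) == 100

print(total_value_points(10))  # 8.0
print(total_value_points(18))  # 14.4
```

Under this model the top item, raw fetchability, is worth about 14 points of total value on its own.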
Priority table
| # | Item | Weight of the 80% | Audit tool or command | Interactive guidance | Best resource |
|---|---|---|---|---|---|
| 1 | Raw fetchability of key pages | 18% | `curl -sSIL https://example.com/page` and `curl -sSL https://example.com/page \| head -80`; also run Cloudflare URL Scanner. | Pass when key pages return clean 200 HTML, sane redirects, no WAF or bot block, and useful main content without login or app interaction. | Cloudflare Agent Readiness |
| 2 | `robots.txt` with crawler lanes | 14% | `curl -sS https://example.com/robots.txt` | Separate search/index, user-triggered fetch, and training crawler policy. Do not block search or user-fetch bots when AI-search visibility is the goal. | RFC 9309, OpenAI crawlers, Anthropic crawler controls, Perplexity crawlers |
| 3 | Passage extractability | 14% | Manual source review, browser inspection, and `npx lighthouse@latest https://example.com --only-categories=seo,accessibility` | Pass when important pages have direct-answer sections, clean heading hierarchy, short focused paragraphs, concrete facts, dates, sources, and low ambiguity. | Vercel Agent Readability |
| 4 | Sitemap quality | 10% | `curl -sS https://example.com/sitemap.xml`; `xmllint --noout sitemap.xml` for local files. | Pass when the sitemap lists canonical high-value URLs, removes redirects and junk pages, includes accurate lastmod, and is linked from `robots.txt`. | Sitemaps protocol |
| 5 | `/llms.txt` curated map | 10% | `curl -sS https://example.com/llms.txt`; for docs sites, also run `npx afdocs check https://docs.example.com`. | Pass when it is short, Markdown, root-hosted, maintained, and points to canonical useful pages with descriptions. Fail giant dumps and stale link lists. | llms.txt proposal, Agent-Friendly Documentation Spec |
| 6 | Markdown alternatives for high-value pages | 10% | `curl -sSI -H 'Accept: text/markdown' https://example.com/page`; also test `/page/index.md` or `/page.md`. | Pass when important docs, service, product, policy, or reference pages have clean Markdown preserving headings, links, dates, examples, and source references. | Cloudflare Markdown for Agents, Cloudflare AI consumability |
| 7 | Schema, canonical, and visible-content parity | 8% | Google Rich Results Test, Schema Markup Validator, and source checks with `curl`. | Pass when schema matches visible content, canonical URLs are stable, and dates, authors, methods, source links, and business facts are visible to humans too. | Google AI features and your website |
| 8 | Real capability surfaces only | 7% | isitagentready.com, Cloudflare URL Scanner, OpenAPI validators, `curl https://example.com/.well-known/api-catalog`, `curl https://example.com/.well-known/mcp/server-card.json`. | Mark as not applicable unless there is a real API, tool, auth, agent, or commerce flow. Never publish fake MCP, API, OAuth, WebMCP, or commerce metadata for points. | RFC 9727 API Catalog, RFC 9728 OAuth Protected Resource Metadata, Lighthouse agentic browsing scoring |
| 9 | `/llms-full.txt` or segmented context files | 5% | `curl -sS https://example.com/llms-full.txt \| wc -c`; check generation source and freshness. | Use for docs, product knowledge bases, API docs, or evergreen reference corpora. Prefer generated and segmented files over one huge hand-maintained file. | Cloudflare Agent Readiness, llms.txt proposal |
| 10 | HTTP Link headers for machine discovery | 4% | `curl -sSI https://example.com/ \| rg -i '^link:'` | Useful when real resources exist: sitemap, `llms.txt`, API catalog, MCP card, agent skills, or docs. Nice-to-have for simple sites, not a substitute for clean pages. | Cloudflare Agent Readiness |
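The crawler-lane check in row 2 can be scripted once the `robots.txt` body has been fetched. The sketch below uses Python's standard `urllib.robotparser` on a hypothetical sample file. The lane assignment for the three OpenAI user agents is an illustrative assumption, not an official registry, and the sample policy (training blocked, search and user fetch allowed) is only one possible choice. The standard-library parser may also resolve rules differently from a given crawler, so treat the output as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; in a real audit this would come from
# `curl -sS https://example.com/robots.txt`.
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# Illustrative lane assignment (an assumption for this sketch).
LANES = {
    "search/index": ["OAI-SearchBot"],
    "user-triggered fetch": ["ChatGPT-User"],
    "training": ["GPTBot"],
}


def audit_robots(robots_text, url):
    """Report per-lane fetch permission and any Sitemap lines."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    lanes = {
        lane: {agent: parser.can_fetch(agent, url) for agent in agents}
        for lane, agents in LANES.items()
    }
    # site_maps() returns None when no Sitemap line is present.
    return {"lanes": lanes, "sitemaps": parser.site_maps() or []}


report = audit_robots(SAMPLE_ROBOTS, "https://example.com/page")
for lane, agents in report["lanes"].items():
    print(lane, agents)
print("sitemaps:", report["sitemaps"])
```

An empty `sitemaps` list flags the row 4 requirement that the sitemap be linked from `robots.txt`.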
Audit flow
Use the same interaction for every line item:
- Does this apply? `yes`, `no`, or `unsure`.
- What evidence exists? URL, status, headers, body excerpt, scanner output, or validation result.
- What should we do? `fix now`, `defer`, `not applicable`, or `do not implement`.
- How is it maintained? manual edit, generated artifact, CI check, scanner, or log monitor.
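One way to keep the four questions consistent across line items is to record each answer in a small structure. The sketch below is a hypothetical shape, not a format required by any scanner; the class and field names are assumptions, and the completeness rule (anything not ruled out needs evidence) is one reasonable reading of the flow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Applies(Enum):
    YES = "yes"
    NO = "no"
    UNSURE = "unsure"


class Action(Enum):
    FIX_NOW = "fix now"
    DEFER = "defer"
    NOT_APPLICABLE = "not applicable"
    DO_NOT_IMPLEMENT = "do not implement"


@dataclass
class AuditRecord:
    item: str                                           # table row being audited
    applies: Applies                                    # Does this apply?
    evidence: List[str] = field(default_factory=list)   # What evidence exists?
    action: Action = Action.DEFER                       # What should we do?
    maintenance: str = "manual edit"                    # How is it maintained?

    def is_complete(self):
        """A record is complete if the item is ruled out or has evidence."""
        return self.applies is Applies.NO or bool(self.evidence)


record = AuditRecord(
    item="Sitemap quality",
    applies=Applies.YES,
    evidence=["https://example.com/sitemap.xml returned 200"],  # hypothetical evidence
    action=Action.FIX_NOW,
    maintenance="generated artifact",
)
print(record.is_complete())  # True
```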
Cut line
If time is short, do only these first:
- Raw fetchable HTML for key pages.
- `robots.txt` with sitemap and crawler-lane policy.
- Clean sitemap.
- Extractable, citable page content.
- Curated `llms.txt`.
- Markdown alternatives for the pages agents are most likely to need.
That bundle captures most of the practical value without creating fake protocol surfaces or overfitting to one scanner.
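As a rough check on that claim, the cut-line bullets can be summed against the table weights. The mapping of bullets to table rows 1, 2, 4, 3, 5, and 6 is an assumption made for this sketch, and the result is only as meaningful as the planning heuristic behind the weights.

```python
# Item number -> weight (percent of the high-leverage 80%), from the priority table.
WEIGHTS = {1: 18, 2: 14, 3: 14, 4: 10, 5: 10, 6: 10, 7: 8, 8: 7, 9: 5, 10: 4}

# Assumed mapping of the six cut-line bullets to table rows, in bullet order.
CUT_LINE_ITEMS = [1, 2, 4, 3, 5, 6]

# Points captured out of the 100 assigned to the high-leverage 80%.
cut_line_points = sum(WEIGHTS[i] for i in CUT_LINE_ITEMS)

# Same bundle expressed as a share of total value in this model.
total_value_share = cut_line_points * 80 / 100

print(cut_line_points)    # 76
print(total_value_share)  # 60.8
```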