Practical Agent Readiness Audit Priority

Bottom line

The highest-leverage agent-readiness work covers raw fetchability, crawler policy, extractable passages, clean sitemaps, a curated `llms.txt`, Markdown alternatives, schema and visible-content parity, and real (never fake) capability surfaces.

Is this relevant?

Use this page when you need an 80/20 audit sequence instead of a scanner-score chase.

Caveat

The weights are a planning heuristic; brochure sites, docs sites, API products, SaaS apps, and commerce flows need different capability expectations.

This is the practical 80/20 checklist for making a site easier for agents and AI search systems to find, fetch, parse, cite, and use.

The weights split the high-leverage 80% into 100 points. A 10% item below is therefore worth roughly 8% of the total value in this model (10% of 80%). Treat the numbers as a planning heuristic, not a universal score. Site type still matters: brochure sites, documentation sites, API products, SaaS apps, and commerce flows should not be audited against the same capability expectations.
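The conversion from a table weight to a share of total value can be sketched as follows. This is a minimal illustration of the arithmetic only; the names `WEIGHTS`, `HIGH_LEVERAGE_SHARE`, and `total_value_points` are made up for this example, and the item labels are shortened from the priority table.

```python
# Weights from the priority table, as a percentage of the high-leverage 80%.
# Labels are shortened; the dictionary name is illustrative only.
WEIGHTS = {
    "Raw fetchability of key pages": 18,
    "robots.txt with crawler lanes": 14,
    "Passage extractability": 14,
    "Sitemap quality": 10,
    "/llms.txt curated map": 10,
    "Markdown alternatives": 10,
    "Schema, canonical, and visible-content parity": 8,
    "Real capability surfaces only": 7,
    "/llms-full.txt or segmented context files": 5,
    "HTTP Link headers": 4,
}

HIGH_LEVERAGE_SHARE = 0.80  # the "80" in 80/20


def total_value_points(weight_percent: float) -> float:
    """Convert a weight within the 80% into points of total value (out of 100)."""
    return weight_percent * HIGH_LEVERAGE_SHARE


# The table weights should account for the whole high-leverage bucket.
assert sum(WEIGHTS.values()) == 100

for item, weight in WEIGHTS.items():
    points = total_value_points(weight)
    print(f"{item}: {weight}% of the 80% = {points:.1f} points of total value")
```

Because the numbers are a planning heuristic, the output is best used to order work, not to report a score.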

Priority chart

[Figure: pie chart showing the relative weight of each practical agent readiness audit item]

Priority table

| # | Item | Weight of the 80% | Audit tool or command | Interactive guidance | Best resource |
| --- | --- | --- | --- | --- | --- |
| 1 | Raw fetchability of key pages | 18% | `curl -sSIL https://example.com/page` and `curl -sSL https://example.com/page \| head -80`; also run Cloudflare URL Scanner. | Pass when key pages return clean `200` HTML, sane redirects, no WAF or bot block, and useful main content without login or app interaction. | Cloudflare Agent Readiness |
| 2 | `robots.txt` with crawler lanes | 14% | `curl -sS https://example.com/robots.txt` | Separate search/index, user-triggered fetch, and training crawler policy. Do not block search or user-fetch bots when AI-search visibility is the goal. | RFC 9309, OpenAI crawlers, Anthropic crawler controls, Perplexity crawlers |
| 3 | Passage extractability | 14% | Manual source review, browser inspection, and `npx lighthouse@latest https://example.com --only-categories=seo,accessibility` | Pass when important pages have direct-answer sections, clean heading hierarchy, short focused paragraphs, concrete facts, dates, sources, and low ambiguity. | Vercel Agent Readability |
| 4 | Sitemap quality | 10% | `curl -sS https://example.com/sitemap.xml`; `xmllint --noout sitemap.xml` for local files. | Pass when the sitemap lists canonical high-value URLs, removes redirects and junk pages, includes accurate `lastmod`, and is linked from `robots.txt`. | Sitemaps protocol |
| 5 | `/llms.txt` curated map | 10% | `curl -sS https://example.com/llms.txt`; for docs sites, also run `npx afdocs check https://docs.example.com`. | Pass when it is short, Markdown, root-hosted, maintained, and points to canonical useful pages with descriptions. Fail giant dumps and stale link lists. | `llms.txt` proposal, Agent-Friendly Documentation Spec |
| 6 | Markdown alternatives for high-value pages | 10% | `curl -sSI -H 'Accept: text/markdown' https://example.com/page`; also test `/page/index.md` or `/page.md`. | Pass when important docs, service, product, policy, or reference pages have clean Markdown preserving headings, links, dates, examples, and source references. | Cloudflare Markdown for Agents, Cloudflare AI consumability |
| 7 | Schema, canonical, and visible-content parity | 8% | Google Rich Results Test, Schema Markup Validator, and source checks with `curl`. | Pass when schema matches visible content, canonical URLs are stable, and dates, authors, methods, source links, and business facts are visible to humans too. | Google AI features and your website |
| 8 | Real capability surfaces only | 7% | `isitagentready.com`, Cloudflare URL Scanner, OpenAPI validators, `curl https://example.com/.well-known/api-catalog`, `curl https://example.com/.well-known/mcp/server-card.json`. | Mark as `not applicable` unless there is a real API, tool, auth, agent, or commerce flow. Never publish fake MCP, API, OAuth, WebMCP, or commerce metadata for points. | RFC 9727 API Catalog, RFC 9728 OAuth Protected Resource Metadata, Lighthouse agentic browsing scoring |
| 9 | `/llms-full.txt` or segmented context files | 5% | `curl -sS https://example.com/llms-full.txt \| wc -c`; check generation source and freshness. | Use for docs, product knowledge bases, API docs, or evergreen reference corpora. Prefer generated and segmented files over one huge hand-maintained file. | Cloudflare Agent Readiness, `llms.txt` proposal |
| 10 | HTTP `Link` headers for machine discovery | 4% | `curl -sSI https://example.com/ \| rg -i '^link:'` | Useful when real resources exist: sitemap, `llms.txt`, API catalog, MCP card, agent skills, or docs. Nice-to-have for simple sites, not a substitute for clean pages. | Cloudflare Agent Readiness |
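The crawler-lane check in row 2 can be approximated offline once the `robots.txt` body has been fetched. The sketch below uses Python's standard `urllib.robotparser`; the sample file, the `LANES` grouping, and the `lane_report` helper are hypothetical. The user-agent tokens are ones the vendors have published, but treat the lane assignments as an assumption and verify them against each vendor's current crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; in a real audit this would be the output of
# `curl -sS https://example.com/robots.txt`.
SAMPLE_ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

# Assumed lane grouping; check vendor docs before relying on it.
LANES = {
    "search/index": ["OAI-SearchBot", "PerplexityBot"],
    "user-triggered fetch": ["ChatGPT-User", "Perplexity-User"],
    "training": ["GPTBot", "ClaudeBot"],
}


def lane_report(robots_txt: str, url: str) -> dict[str, dict[str, bool]]:
    """Report, per lane, whether each listed crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        lane: {agent: parser.can_fetch(agent, url) for agent in agents}
        for lane, agents in LANES.items()
    }


report = lane_report(SAMPLE_ROBOTS_TXT, "https://example.com/page")
for lane, agents in report.items():
    print(lane, agents)
```

With this sample policy, the search and user-triggered lanes are allowed while the training lane is blocked, which matches the guidance for sites whose goal is AI-search visibility. Note that `urllib.robotparser` is one interpretation of the rules; individual crawlers may match user agents and paths slightly differently.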

Audit flow

Use the same interaction for every line item:

Does this apply? yes, no, or unsure.
What evidence exists? URL, status, headers, body excerpt, scanner output, or validation result.
What should we do? fix now, defer, not applicable, or do not implement.
How is it maintained? manual edit, generated artifact, CI check, scanner, or log monitor.
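One way to keep the four questions consistent across line items is to record each answer in a small structure. The following is a sketch under stated assumptions: the class names, field names, and the `is_complete` rule are invented for illustration and are not part of any scanner or standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Applies(Enum):
    YES = "yes"
    NO = "no"
    UNSURE = "unsure"


class Action(Enum):
    FIX_NOW = "fix now"
    DEFER = "defer"
    NOT_APPLICABLE = "not applicable"
    DO_NOT_IMPLEMENT = "do not implement"


class Maintenance(Enum):
    MANUAL_EDIT = "manual edit"
    GENERATED_ARTIFACT = "generated artifact"
    CI_CHECK = "CI check"
    SCANNER = "scanner"
    LOG_MONITOR = "log monitor"


@dataclass
class AuditFinding:
    """One answered line item: applicability, evidence, action, maintenance."""
    item: str
    applies: Applies
    evidence: list[str] = field(default_factory=list)
    action: Action = Action.DEFER
    maintenance: Optional[Maintenance] = None

    def is_complete(self) -> bool:
        # Assumed rule: "unsure" always needs follow-up; a "no" only needs an
        # explicit skip decision; a "yes" needs evidence and a maintenance owner.
        if self.applies is Applies.UNSURE:
            return False
        if self.applies is Applies.NO:
            return self.action in (Action.NOT_APPLICABLE, Action.DO_NOT_IMPLEMENT)
        return bool(self.evidence) and self.maintenance is not None


finding = AuditFinding(
    item="robots.txt with crawler lanes",
    applies=Applies.YES,
    evidence=["https://example.com/robots.txt returned 200"],
    action=Action.FIX_NOW,
    maintenance=Maintenance.CI_CHECK,
)
print(finding.item, "complete:", finding.is_complete())
```

Recording "not applicable" as an explicit decision keeps capability items such as MCP or OAuth metadata from being added later just to satisfy a scanner.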

Cut line

If time is short, do only these first:

Raw fetchable HTML for key pages.
robots.txt with a sitemap reference and crawler-lane policy.
Clean sitemap.
Extractable, citable page content.
Curated llms.txt.
Markdown alternatives for the pages agents are most likely to need.

That bundle captures most of the practical value without creating fake protocol surfaces or overfitting to one scanner.
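For the "clean sitemap" item, part of the review can be done offline on a downloaded file. The sketch below is a heuristic only: the sample XML and the `sitemap_issues` helper are hypothetical, and treating a query string as a sign of a junk page is an assumption. It does not detect redirects or non-canonical URLs, which still need a live fetch.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemaps protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap body with three deliberate problems.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2025-01-15</lastmod></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/search?q=test</loc><lastmod>2025-01-10</lastmod></url>
  <url><loc>https://example.com/</loc><lastmod>2025-01-15</lastmod></url>
</urlset>
"""


def sitemap_issues(xml_text: str) -> list[str]:
    """Return human-readable findings for a urlset sitemap."""
    root = ET.fromstring(xml_text)
    issues: list[str] = []
    seen: set[str] = set()
    for url in root.findall("sm:url", NS):
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        lastmod = (url.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        if loc in seen:
            issues.append(f"duplicate: {loc}")
        seen.add(loc)
        if "?" in loc:
            # Heuristic assumption: parameterized URLs are rarely canonical.
            issues.append(f"query string (possible junk page): {loc}")
        if not lastmod:
            issues.append(f"missing lastmod: {loc}")
        else:
            try:
                # Only the date prefix is validated; time zones are ignored here.
                date.fromisoformat(lastmod[:10])
            except ValueError:
                issues.append(f"unparseable lastmod: {loc}")
    return issues


for issue in sitemap_issues(SAMPLE_SITEMAP):
    print(issue)
```

A check like this fits a CI step, while the accuracy of `lastmod` values still has to be compared against real content changes.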