
Practical Agent Readiness Audit Priority

The highest-leverage agent-readiness work covers raw fetchability, crawler policy, extractable passages, sitemaps, curated llms.txt files, Markdown alternatives, schema and visible-content parity, and real capability surfaces.

Use this page when you need an 80/20 audit sequence instead of a scanner-score chase.


This is the practical 80/20 checklist for making a site easier for agents and AI search systems to find, fetch, parse, cite, and use.

The weights split the high-leverage 80% into 100 points, so a 10% item below is 10% of that 80%, or roughly 8 points of total value in this model. Treat the numbers as a planning heuristic, not a universal score. Site type still matters: brochure sites, documentation sites, API products, SaaS apps, and commerce flows should not be audited against the same capability expectations.
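The arithmetic behind the weights is small enough to check by hand. The following is a minimal Python sketch, assuming the ten weights listed in the table below; the helper name `total_value_points` is hypothetical and not part of any audit tool.

```python
# Weights copied from the audit table; each is a share of the high-leverage 80%.
WEIGHTS_OF_80 = {
    "Raw fetchability of key pages": 18,
    "robots.txt with crawler lanes": 14,
    "Passage extractability": 14,
    "Sitemap quality": 10,
    "/llms.txt curated map": 10,
    "Markdown alternatives for high-value pages": 10,
    "Schema, canonical, and visible-content parity": 8,
    "Real capability surfaces only": 7,
    "/llms-full.txt or segmented context files": 5,
    "HTTP Link headers for machine discovery": 4,
}

HIGH_LEVERAGE_SHARE = 0.80  # the "80" in the 80/20 framing


def total_value_points(weight_of_80: float) -> float:
    """Convert a weight within the 80% into points of total value."""
    return round(weight_of_80 * HIGH_LEVERAGE_SHARE, 2)


if __name__ == "__main__":
    # The weights are meant to sum to 100 points of the high-leverage 80%.
    assert sum(WEIGHTS_OF_80.values()) == 100
    for item, weight in WEIGHTS_OF_80.items():
        print(f"{weight:>3}% of the 80% = {total_value_points(weight):>5.1f} total points  {item}")
```

Under this model the first three items alone account for 46% of the high-leverage work, which is one way to justify auditing them first.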

Pie chart showing the relative weight of each practical agent readiness audit item

| # | Item | Weight of the 80% | Audit tool or command | Interactive guidance | Best resource |
|---|------|-------------------|-----------------------|----------------------|---------------|
| 1 | Raw fetchability of key pages | 18% | `curl -sSIL https://example.com/page` and `curl -sSL https://example.com/page \| head -80`; also run Cloudflare URL Scanner. | Pass when key pages return clean 200 HTML, sane redirects, no WAF or bot block, and useful main content without login or app interaction. | Cloudflare Agent Readiness |
| 2 | robots.txt with crawler lanes | 14% | `curl -sS https://example.com/robots.txt` | Separate search/index, user-triggered fetch, and training crawler policy. Do not block search or user-fetch bots when AI-search visibility is the goal. | RFC 9309, OpenAI crawlers, Anthropic crawler controls, Perplexity crawlers |
| 3 | Passage extractability | 14% | Manual source review, browser inspection, and `npx lighthouse@latest https://example.com --only-categories=seo,accessibility` | Pass when important pages have direct-answer sections, clean heading hierarchy, short focused paragraphs, concrete facts, dates, sources, and low ambiguity. | Vercel Agent Readability |
| 4 | Sitemap quality | 10% | `curl -sS https://example.com/sitemap.xml`; `xmllint --noout sitemap.xml` for local files. | Pass when the sitemap lists canonical high-value URLs, removes redirects and junk pages, includes accurate lastmod, and is linked from robots.txt. | Sitemaps protocol |
| 5 | /llms.txt curated map | 10% | `curl -sS https://example.com/llms.txt`; for docs sites, also run `npx afdocs check https://docs.example.com`. | Pass when it is short, Markdown, root-hosted, maintained, and points to canonical useful pages with descriptions. Fail giant dumps and stale link lists. | llms.txt proposal, Agent-Friendly Documentation Spec |
| 6 | Markdown alternatives for high-value pages | 10% | `curl -sSI -H 'Accept: text/markdown' https://example.com/page`; also test `/page/index.md` or `/page.md`. | Pass when important docs, service, product, policy, or reference pages have clean Markdown preserving headings, links, dates, examples, and source references. | Cloudflare Markdown for Agents, Cloudflare AI consumability |
| 7 | Schema, canonical, and visible-content parity | 8% | Google Rich Results Test, Schema Markup Validator, and source checks with `curl`. | Pass when schema matches visible content, canonical URLs are stable, and dates, authors, methods, source links, and business facts are visible to humans too. | Google AI features and your website |
| 8 | Real capability surfaces only | 7% | isitagentready.com, Cloudflare URL Scanner, OpenAPI validators, `curl https://example.com/.well-known/api-catalog`, `curl https://example.com/.well-known/mcp/server-card.json`. | Mark as not applicable unless there is a real API, tool, auth, agent, or commerce flow. Never publish fake MCP, API, OAuth, WebMCP, or commerce metadata for points. | RFC 9727 API Catalog, RFC 9728 OAuth Protected Resource Metadata, Lighthouse agentic browsing scoring |
| 9 | /llms-full.txt or segmented context files | 5% | `curl -sS https://example.com/llms-full.txt \| wc -c`; check generation source and freshness. | Use for docs, product knowledge bases, API docs, or evergreen reference corpora. Prefer generated and segmented files over one huge hand-maintained file. | Cloudflare Agent Readiness, llms.txt proposal |
| 10 | HTTP Link headers for machine discovery | 4% | `curl -sSI https://example.com/ \| rg -i '^link:'` | Useful when real resources exist: sitemap, llms.txt, API catalog, MCP card, agent skills, or docs. Nice-to-have for simple sites, not a substitute for clean pages. | Cloudflare Agent Readiness |
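The crawler-lane idea in item 2 can be checked offline once robots.txt has been fetched. The sketch below uses Python's standard `urllib.robotparser`. The sample robots.txt and the `lane_policy` helper are hypothetical. The three user agents are the ones OpenAI documents for its crawlers and stand in for one vendor only, so substitute the agents from each vendor's crawler documentation. The standard-library parser applies simpler matching rules than RFC 9309 (first matching rule, no wildcards), so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# One example agent per lane (assumption: OpenAI's documented crawler names).
LANES = {
    "search/index": "OAI-SearchBot",
    "user-triggered fetch": "ChatGPT-User",
    "training": "GPTBot",
}

# Hypothetical policy: stay visible to search and user fetches, opt out of training.
SAMPLE_ROBOTS = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def lane_policy(robots_txt: str, url: str) -> dict:
    """Return {lane: allowed} for one URL, plus the sitemaps robots.txt links to."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {lane: parser.can_fetch(agent, url) for lane, agent in LANES.items()}
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return {"allowed": allowed, "sitemaps": parser.site_maps() or []}


if __name__ == "__main__":
    report = lane_policy(SAMPLE_ROBOTS, "https://example.com/page")
    for lane, ok in report["allowed"].items():
        print(f"{lane:<22} {'allowed' if ok else 'blocked'}")
    print("sitemaps:", report["sitemaps"])
```

A policy that blocks the search or user-triggered lane while AI-search visibility is the goal is the failure this check is meant to surface.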

Ask the same four questions for every line item:

  1. Does this apply? yes, no, or unsure.
  2. What evidence exists? URL, status, headers, body excerpt, scanner output, or validation result.
  3. What should we do? fix now, defer, not applicable, or do not implement.
  4. How is it maintained? manual edit, generated artifact, CI check, scanner, or log monitor.
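The four questions above map naturally onto a small record type, which keeps findings comparable across line items. This is a minimal sketch: the class names, field names, and the completeness rule in `is_complete` are illustrative assumptions, not part of any scanner or specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Applies(Enum):
    YES = "yes"
    NO = "no"
    UNSURE = "unsure"


class Action(Enum):
    FIX_NOW = "fix now"
    DEFER = "defer"
    NOT_APPLICABLE = "not applicable"
    DO_NOT_IMPLEMENT = "do not implement"


class Maintenance(Enum):
    MANUAL_EDIT = "manual edit"
    GENERATED_ARTIFACT = "generated artifact"
    CI_CHECK = "CI check"
    SCANNER = "scanner"
    LOG_MONITOR = "log monitor"


@dataclass
class Finding:
    item: str                                           # table row, e.g. "Sitemap quality"
    applies: Applies                                    # question 1
    evidence: List[str] = field(default_factory=list)   # question 2: URLs, headers, excerpts
    action: Optional[Action] = None                     # question 3
    maintenance: Optional[Maintenance] = None           # question 4

    def is_complete(self) -> bool:
        """Assumed rule: every question that applies has an answer."""
        if self.applies is Applies.UNSURE:
            return False  # applicability must be resolved first
        if self.applies is Applies.NO:
            return self.action in (Action.NOT_APPLICABLE, Action.DO_NOT_IMPLEMENT)
        return bool(self.evidence) and self.action is not None and self.maintenance is not None


if __name__ == "__main__":
    finding = Finding(
        item="Sitemap quality",
        applies=Applies.YES,
        evidence=["https://example.com/sitemap.xml returned 200"],
        action=Action.FIX_NOW,
        maintenance=Maintenance.CI_CHECK,
    )
    print(finding.item, "complete:", finding.is_complete())
```

Recording "not applicable" and "do not implement" as explicit actions keeps absent capability surfaces from being mistaken for unfinished work.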

If time is short, do only these first:

  1. Raw fetchable HTML for key pages.
  2. robots.txt with sitemap and crawler-lane policy.
  3. Clean sitemap.
  4. Extractable, citable page content.
  5. Curated llms.txt.
  6. Markdown alternatives for the pages agents are most likely to need.

That bundle captures most of the practical value without creating fake protocol surfaces or overfitting to one scanner.
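As one example of turning a short-list item into a repeatable check, the "clean sitemap" step could be sketched as an offline lint over a downloaded sitemap. The sample XML and the `lint_sitemap` helper are hypothetical, and the checks are deliberately narrow: duplicate `loc` entries, missing `lastmod`, and `lastmod` values whose date portion does not parse. Redirects, junk pages, and whether `lastmod` is accurate still need a live fetch and human judgment.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the Sitemaps protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap with one clean entry and three problems.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-01</lastmod></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/</loc><lastmod>not-a-date</lastmod></url>
</urlset>
"""


def lint_sitemap(xml_text: str) -> list:
    """Return a list of (problem, loc) tuples found in a urlset sitemap."""
    problems = []
    seen = set()
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if loc in seen:
            problems.append(("duplicate loc", loc))
        seen.add(loc)
        if not lastmod:
            problems.append(("missing lastmod", loc))
            continue
        try:
            # Simplification: validate only the leading YYYY-MM-DD portion.
            date.fromisoformat(lastmod[:10])
        except ValueError:
            problems.append(("bad lastmod", loc))
    return problems


if __name__ == "__main__":
    for problem, loc in lint_sitemap(SAMPLE_SITEMAP):
        print(f"{problem}: {loc}")
```

A check like this fits the "CI check" maintenance option, since it needs no network access once the sitemap is generated.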