Agent-Friendly Documentation Spec
| | |
|---|---|
| Status | Draft |
| Version | 0.1.0 |
| Date | 2026-02-22 |
| Author | Dachary Carey + community contributors |
| URL | https://agentdocsspec.com |
| Repository | https://github.com/agent-ecosystem/agent-docs-spec |
Abstract #
Documentation sites are increasingly consumed by coding agents rather than
human readers, but most sites are not built for this access pattern. Agents
hit truncation limits, get walls of CSS instead of content, can’t follow
cross-host redirects, and don’t know about emerging discovery mechanisms like
llms.txt. This spec defines 21 checks across 8 categories that evaluate how
well a documentation site serves agent consumers. It is grounded in empirical
observation of real agent workflows and is intended as a shared standard for
documentation teams, tool builders, and platform providers.
Scope #
This spec targets coding agents that fetch documentation during real-time development workflows. These are tools like Claude Code, Cursor, GitHub Copilot, and similar IDE-integrated or CLI-based agents that a developer uses while writing code. The agent fetches a docs page, extracts information, and uses it to complete a task, all in a single session.
This spec does not target:
- Training crawlers (GPTBot, ClaudeBot, etc.) that scrape content for model training. These have different access patterns, different user-agents, and different concerns. See Appendix B.
- Answer engines (Perplexity, Google AI Overviews, ChatGPT search) that retrieve content to generate responses to user queries. These systems have their own retrieval pipelines that may or may not resemble the web fetch pipelines described here.
- RAG pipelines that pre-index documentation into vector stores. These ingest content at build time, not at query time, so truncation limits and real-time fetch behavior are less relevant.
The findings and checks in this spec are grounded in empirical observation of
coding agents. Some recommendations (like providing llms.txt and serving
markdown) will benefit other consumers too, but the pass/warn/fail criteria
are calibrated for the coding agent use case.
Background #
Agents don’t use docs like humans. They retrieve URLs from training data rather
than navigating table-of-contents structures. They struggle with HTML-heavy
pages, silently lose content to truncation, and don’t know about emerging
standards like llms.txt unless explicitly told. These checks codify the
patterns that empirically help or hinder agent access to documentation content.
Terminology #
- Agent: An LLM operating in an agentic coding workflow (e.g., Claude Code, Cursor, Copilot) that fetches and consumes documentation as part of a development task. See Scope for what this spec does and does not cover.
- Web fetch pipeline: The chain of processing between “agent requests a URL” and “model sees content.” Typically involves HTTP fetch, HTML-to-markdown conversion, truncation, and sometimes a summarization model.
- Trusted site: A domain hardcoded into an agent platform’s web fetch implementation that receives more favorable processing (e.g., bypassing summarization).
- Truncation: The silent removal of content that exceeds a platform’s size limit. The agent receives partial content with no indication that anything was cut. See Appendix A for known limits by platform.
Conventions #
This spec uses the following language to distinguish between requirements and recommendations:
- Must / Required: The item is an absolute requirement of the spec. Used sparingly; most checks in this spec are recommendations rather than hard requirements, because agent-friendliness is a spectrum.
- Should / Recommended: The item is a strong recommendation. There may be valid reasons to deviate, but the implications should be understood.
- May / Optional: The item is genuinely optional. Implementing it provides additional benefit but omitting it is not a deficiency.
Sections of this spec are either normative (defining checks and their pass/warn/fail criteria) or informational (providing context, evidence, and recommendations). The distinction is noted where it matters:
- Normative sections: Category 1-8 check definitions, Checks Summary table.
- Informational sections: Background, Scope, Start Here, “How Agents Get Content”, “Who Actually Uses llms.txt?”, Progressive Disclosure recommendation, “Making Private Docs Agent-Accessible”, Appendices.
The progressive disclosure pattern for llms.txt is a recommendation from
this spec, not a normative requirement. Sites that keep their llms.txt under
50,000 characters don’t need it.
Start Here: Top Recommendations #
If you’re a documentarian and can only do a few things, start with these. They are ordered by impact based on observed agent behavior:
1. Create an `llms.txt` that fits in a single agent fetch (under 50K characters). This is the single highest-impact action. Agents that find an `llms.txt` navigate documentation dramatically better. If your docs set is large, use the nested pattern to keep each file under the limit. Checks: `llms-txt-exists`, `llms-txt-size`
2. Serve markdown versions of your pages, either via `.md` URL variants or content negotiation. Markdown is what agents actually want; HTML conversion is lossy and unpredictable. Checks: `markdown-url-support`, `content-negotiation`
3. Keep pages under 50,000 characters of content. If a page has tabbed or dropdown content that serializes into a massive blob, break it into separate pages or ensure the markdown version stays under the limit. Checks: `page-size-markdown`, `page-size-html`, `tabbed-content-serialization`
4. Put a pointer to your `llms.txt` at the top of every docs page: a simple blockquote directive that tells agents where to find the documentation index. Anthropic does this; it works. Check: `llms-txt-directive`
5. Don’t break your URLs. If you must move content, use same-host HTTP redirects. Avoid cross-host redirects, JavaScript redirects, and soft 404s. Checks: `http-status-codes`, `redirect-behavior`
6. Monitor your agent-facing resources. Treat `llms.txt` and markdown endpoints like any other production surface: check freshness, verify content parity with HTML, and ensure cache headers allow timely updates. Checks: `llms-txt-freshness`, `markdown-content-parity`, `cache-header-hygiene`
Spec Structure #
Each check has:
- ID: A short identifier (e.g., `llms-txt-exists`).
- Category: The area of agent-friendliness it evaluates.
- What it checks: A description of what the check evaluates.
- Why it matters: The observed agent behavior that motivates the check.
- Result levels: What constitutes a pass, warn, or fail.
- Automation: Whether the check can be fully automated, partially automated (heuristic), or is advisory only.
Check Dependencies #
Some checks depend on the results of others:
- `llms-txt-valid`, `llms-txt-size`, `llms-txt-links-resolve`, and `llms-txt-links-markdown` only run if `llms-txt-exists` passes.
- `page-size-markdown` only runs if `markdown-url-support` or `content-negotiation` passes (the site must serve markdown for this check to apply).
- `section-header-quality` is most relevant when `tabbed-content-serialization` detects tabbed content.
- `markdown-code-fence-validity` only runs if `markdown-url-support` or `content-negotiation` passes (the site must serve markdown for this check to apply). It also runs against any discovered `llms.txt` files.
- `llms-txt-freshness` only runs if `llms-txt-exists` passes.
- `auth-alternative-access` only runs if `auth-gate-detection` returns warn or fail (the site must have auth-gated content for alternative access paths to be relevant).
- `markdown-content-parity` only runs if `markdown-url-support` or `content-negotiation` passes (the site must serve markdown for this check to apply).
Implementations should run checks in category order (1 through 8) and skip dependent checks when their prerequisites fail.
A Note on Responsible Use #
This spec describes checks that involve making HTTP requests to documentation
sites. Implementations should be respectful of the sites being evaluated:
introduce delays between requests, cap concurrent connections, honor
Retry-After headers, and avoid overwhelming sites with traffic. The goal is
to help documentation teams improve agent accessibility, not to load-test
their infrastructure.
Category 1: llms.txt #
These checks evaluate whether the site provides an llms.txt file and whether
that file is useful to agents.
Location Discovery #
The llmstxt.org proposal specifies that llms.txt
should be at the root path (/llms.txt), mirroring robots.txt and
sitemap.xml. In practice, the location varies significantly across sites:
| Site | Root `/llms.txt` | `/docs/llms.txt` | Notes |
|---|---|---|---|
| MongoDB | 200 | 200 | Both locations, different content |
| Neon | 200 | 200 | Both locations |
| Stripe | 200 | 301 -> docs.stripe.com | Root + docs subdomain |
| Vercel | 200 | 308 -> root | Root only, /docs redirects |
| React | 200 | – | Root only |
| GitHub Docs | 200 | – | Root only |
| Claude Code | 302 -> product page | 200 | /docs only; root is not docs |
| Anthropic (old) | 301 -> 404 | – | Moved domain, redirect breaks |
The proposal does not address whether sites should serve llms.txt at subpaths,
or whether a site with docs at /docs/ should place it at /docs/llms.txt vs
/llms.txt. In practice, both patterns exist. Implementations should check
multiple candidate locations.
Discovery algorithm: Given a base URL, check for llms.txt at:
- `{base_url}/llms.txt` (the exact URL the user provided, plus llms.txt)
- `{origin}/llms.txt` (site root, per the proposal)
- `{origin}/docs/llms.txt` (common docs subpath)
Where `{origin}` is the scheme + host of the base URL, and `{base_url}` is
the full URL the user provided (which might be `https://example.com/docs`,
`https://example.com`, or `https://docs.example.com`). Duplicate URLs are
deduplicated before checking.
For each location, record whether llms.txt exists and whether the response
involved a redirect (and if so, what kind). All subsequent llms.txt checks run
against every discovered llms.txt file.
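As an illustration, the candidate-generation step of this algorithm can be sketched in a few lines of Python. The function name and return shape are this example's own, not part of the spec:

```python
from urllib.parse import urlsplit

def llms_txt_candidates(base_url: str) -> list[str]:
    """Generate deduplicated candidate llms.txt locations for a base URL:
    the provided path, the site root, and the common /docs/ subpath."""
    parts = urlsplit(base_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    base_path = parts.path.rstrip("/")
    candidates = [
        f"{origin}{base_path}/llms.txt",  # {base_url}/llms.txt
        f"{origin}/llms.txt",             # {origin}/llms.txt (per the proposal)
        f"{origin}/docs/llms.txt",        # {origin}/docs/llms.txt
    ]
    deduped: list[str] = []
    for url in candidates:
        if url not in deduped:
            deduped.append(url)
    return deduped
```

A real implementation would then fetch each candidate and record status codes and redirect behavior, which this sketch deliberately omits.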
llms-txt-exists #
- What it checks: Whether `llms.txt` is discoverable at any of the candidate locations described above.
- Why it matters: `llms.txt` was the single most effective discovery mechanism observed. When agents found one, it fundamentally changed their ability to navigate a documentation site. Agents don’t know to look for `llms.txt` by default, but when pointed at one, they treat it as a primary navigation resource.
- Result levels:
- Pass: `llms.txt` exists at one or more candidate locations, returning 200 with text content (direct or after a same-host redirect).
- Warn: `llms.txt` exists but is only reachable via a cross-host redirect (agents may not follow it).
- Fail: `llms.txt` not found at any candidate location.
- Automation: Full.
- Report details: List all candidate URLs checked and their status
(200, 404, redirect chain). When multiple locations return `llms.txt`,
note whether they serve the same or different content.
llms-txt-valid #
- What it checks: Whether the `llms.txt` follows the structure described in the llmstxt.org proposal. The proposal specifies:
- An H1 with the project/site name.
- A blockquote with a short summary.
- H2-delimited sections containing markdown link lists.
- Each link entry: `[name](url)`, optionally followed by `: description`.
- An optional H2 “Optional” section for secondary content.
- An optional companion file `llms-full.txt` with complete content.
- Why it matters: A well-structured `llms.txt` gives agents a reliable map of the documentation. Inconsistent implementations reduce its value. That said, even a non-standard `llms.txt` that contains useful links is better than nothing.
- Result levels:
- Pass: Follows the proposed structure with H1, summary blockquote, and heading-delimited link sections.
- Warn: Contains parseable markdown links but doesn’t follow the proposed structure (still useful, just non-standard).
- Fail: Exists but contains no parseable links, or is empty.
- Automation: Full.
- Checks in detail:
- H1 present (first line starts with `#`).
- Blockquote summary present (line starting with `>`).
- At least one heading-delimited section with markdown links.
- Links follow `[name](url)` format.
- Optional: check for an `llms-full.txt` companion file.
- Notes on heading levels: The llmstxt.org proposal specifies H2 (`##`) for section delimiters. In practice, some implementations (notably MongoDB) use H1 (`#`) for sections instead. Implementations should accept any heading level for section delimiters when evaluating structure. The important thing is that sections exist and contain parseable links, not that they use a specific heading level.
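A minimal pass/warn/fail classifier for this check might look like the following sketch. It is heuristic, accepts any heading level for sections per the note above, and all names are illustrative:

```python
import re

# Matches [name](url) entries; URL may not contain whitespace or ')'.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

def classify_llms_txt_structure(text: str) -> str:
    """Return 'pass', 'warn', or 'fail' per the llms-txt-valid levels."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not LINK_RE.search(text):
        return "fail"  # no parseable links at all (or empty file)
    has_h1 = bool(lines) and lines[0].startswith("# ")
    has_summary = any(ln.lstrip().startswith(">") for ln in lines)
    # Accept any heading level (#..######) as a section delimiter.
    has_section = any(re.match(r"#{1,6}\s", ln) for ln in lines[1:])
    if has_h1 and has_summary and has_section:
        return "pass"
    return "warn"  # links are parseable but structure is non-standard
```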
llms-txt-links-resolve #
- What it checks: Whether the URLs listed in `llms.txt` actually resolve (return 200).
- Why it matters: A stale `llms.txt` with broken links is worse than no `llms.txt` at all. It sends agents down dead ends with high confidence.
- Result levels:
- Pass: All links resolve (200, following same-host redirects).
- Warn: More than 90% of links resolve (but not all).
- Fail: 90% or fewer of links resolve.
- Automation: Full.
- Notes: Requires making HTTP requests to each URL. For large files, implementations may choose to test a random subset rather than every link.
llms-txt-size #
- What it checks: The character count of the `llms.txt` file, and whether it exceeds the truncation limits of known agent web fetch pipelines.
- Why it matters: An `llms.txt` that exceeds an agent’s truncation limit defeats its own purpose. The agent sees only a fraction of the index and may miss the section it needs entirely. This is the same truncation problem that affects documentation pages, but arguably worse because `llms.txt` is supposed to be the solution to discovery. Real-world sizes vary enormously:

| Site | Size | Links | Notes |
|---|---|---|---|
| MongoDB `/docs/llms.txt` | 4.56 MB | 21,891 | Every version of every product |
| Vercel | 287 KB | ~3,000 | Single file |
| Stripe | 89 KB | ~1,000 | Single file |
| Neon | 75 KB | ~600 | Points to `.md` URLs |
| React | 14 KB | ~150 | Single file |
| Claude Code | 11 KB | ~60 | Small, focused |
| GitHub Docs | 2 KB | ~30 | Small index |
| MongoDB `/llms.txt` (root) | 1.5 KB | 6 | Top-level index only |

Claude Code’s web fetch pipeline truncates at ~100KB. A 4.56MB file means the agent sees roughly 2% of it. Even Vercel’s 287KB file would be heavily truncated. Only the files under ~100KB are reliably consumable in their entirety by current agent implementations.
- Result levels:
- Pass: Under 50,000 characters (fits comfortably within all known truncation limits, even accounting for overhead).
- Warn: Between 50,000 and 100,000 characters (fits within Claude Code’s limit but may not fit others; consider splitting).
- Fail: Over 100,000 characters (will be truncated by Claude Code and likely all other agent platforms).
- Automation: Full.
- Recommendation: See Progressive Disclosure for Large Documentation Sets below.
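The size thresholds above reduce to a few comparisons. The sketch below also estimates the fraction of the file an agent would see under a ~100K-character truncation limit; the function name and return shape are this example's own:

```python
def classify_llms_txt_size(char_count: int, truncation_limit: int = 100_000):
    """Map a character count to the llms-txt-size result levels and
    estimate the fraction visible to an agent that truncates at
    `truncation_limit` characters (~100K for Claude Code)."""
    if char_count < 50_000:
        level = "pass"
    elif char_count <= 100_000:
        level = "warn"
    else:
        level = "fail"
    visible = min(1.0, truncation_limit / max(char_count, 1))
    return level, visible
```

Running this on MongoDB's 4.56MB file yields a visible fraction of roughly 2%, matching the observation above.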
llms-txt-links-markdown #
- What it checks: Whether the URLs in `llms.txt` point to markdown content (`.md` extension in the URL, or a response with `Content-Type: text/markdown`).
- Why it matters: Markdown content is dramatically more useful to agents than HTML. An `llms.txt` that points agents to HTML pages misses an opportunity to deliver content in the most agent-friendly format. The best implementations (like Neon’s) point to `.md` URLs that serve clean markdown directly.
- Result levels:
- Pass: All or most links point to markdown content.
- Warn: Links point to HTML, but markdown versions are available (detected by trying `.md` variants of the URLs).
- Fail: Links point to HTML and no markdown alternatives are detected.
- Automation: Full.
Progressive Disclosure for Large Documentation Sets #
The llmstxt.org proposal does not address what to do when a documentation site
is too large for a single llms.txt file to fit within agent truncation limits.
In practice, large documentation sets (like MongoDB’s, with 185 products/versions
and 21,891 links) produce llms.txt files that are orders of magnitude beyond
what any current agent can consume in a single fetch.
Who Actually Uses llms.txt? #
The original framing of llms.txt drew analogies to robots.txt and
sitemap.xml, suggesting it would serve AI crawlers gathering training data.
The evidence shows this hasn’t happened:
- An audit of 1,000 domains over 30 days found zero visits to `llms.txt` from GPTBot, ClaudeBot, or PerplexityBot (Longato, August 2025).
- A 90-day study tracking 62,100+ AI bot visits found only 84 requests (0.1%) to `/llms.txt`, roughly 3x fewer visits than an average content page (OtterlyAI GEO Study).
- John Mueller from Google stated directly: “no AI system currently uses llms.txt.”
Training crawlers don’t use llms.txt because they have their own
discovery mechanisms (sitemaps, link following, pre-built datasets) and
probing /llms.txt on every domain would waste crawl budget for an
unestablished standard.
The real consumers of llms.txt are agents in real-time workflows:
a developer’s coding assistant fetching documentation to verify an API
pattern, an agent following a directive on a docs page that points it to
llms.txt, or a user explicitly handing their agent an llms.txt URL as
a discovery starting point. These are fetch-once, use-now interactions
subject to the truncation limits of web fetch pipelines.
This distinction matters for our recommendation. A progressive disclosure
pattern that splits llms.txt into nested files has no practical impact on
crawler consumption (since crawlers aren’t consuming it). It directly
benefits the agent use case, which is where llms.txt actually provides
value today.
Recommendation #
We recommend a nested llms.txt pattern for progressive disclosure:
Structure #
A root llms.txt serves as a table of contents, listing the major sections
of the documentation with links to section-level llms.txt files. Each
section-level file contains the actual page links for that section.
# MongoDB Documentation
> MongoDB is the leading document database. This index covers all MongoDB
> products, drivers, and tools documentation.
## Products
- [Atlas](https://www.mongodb.com/docs/atlas/llms.txt): MongoDB Atlas cloud database
- [Atlas CLI](https://www.mongodb.com/docs/atlas-cli/llms.txt): Command-line interface for Atlas
- [Compass](https://www.mongodb.com/docs/compass/llms.txt): GUI for MongoDB
- [MongoDB Server](https://www.mongodb.com/docs/manual/llms.txt): Server documentation
## Drivers
- [Python Driver](https://www.mongodb.com/docs/drivers/pymongo/llms.txt): PyMongo driver
- [Node.js Driver](https://www.mongodb.com/docs/drivers/node/llms.txt): Node.js driver
- [Java Driver](https://www.mongodb.com/docs/drivers/java/llms.txt): Java sync and reactive drivers
Each linked llms.txt then contains the actual page listings for that product
or driver, scoped to the current version (or with a small number of version
variants).
Design Principles #
- The root `llms.txt` should fit in a single agent fetch. Target under 50,000 characters. This is the entry point that agents will discover first, and it must be fully consumable. It should contain enough descriptive context for an agent to identify which section-level file to fetch next.
- Section-level files should also fit in a single agent fetch. If a section is still too large (e.g., a product with hundreds of pages across many versions), consider further nesting or limiting the index to the current version only.
- Version sprawl is the primary size driver. The MongoDB `/docs/llms.txt` lists every version of every product. Linking to every historical version in the index provides diminishing returns for agents, who almost always want the current version. Historical versions could be listed in a separate `llms-versions.txt` or under the “Optional” H2 section that the proposal already defines for secondary content.
- Links between levels should use absolute URLs. An agent following a link from the root `llms.txt` to a section `llms.txt` needs to resolve it without ambiguity.
- Each `llms.txt` should be self-describing. Include the H1 and blockquote summary at every level so an agent landing on a section-level file (via direct URL from training data, for example) has enough context to understand what it’s looking at.
Compatibility Note #
This nested pattern is a recommendation from this spec, not part of the
llmstxt.org proposal as of February 2026. It is fully compatible with the
existing proposal (which doesn’t prohibit linking to other llms.txt files)
but would benefit from formal standardization. The proposal’s existing
“Optional” H2 section could be leveraged for secondary/versioned content, but
the nesting pattern goes further by distributing content across multiple files.
Category 2: Markdown Availability #
These checks evaluate whether the site serves documentation in markdown format, which agents consume far more effectively than HTML.
markdown-url-support #
- What it checks: Whether appending `.md` to documentation page URLs returns valid markdown content.
- Why it matters: Agents work dramatically better with markdown than HTML. The HTML-to-markdown conversion in web fetch pipelines is lossy and unpredictable. Sites that serve markdown directly bypass conversion issues entirely. However, agents don’t discover this pattern on their own; it needs to be signaled.
- Result levels:
- Pass: `.md` URLs return valid markdown with 200 status.
- Warn: Some pages support `.md` but not consistently.
- Fail: `.md` URLs return errors or HTML.
- Automation: Full. Test against a sample of page URLs (from `llms.txt`, the sitemap, or a user-provided list).
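A probe for this check needs a way to build the `.md` variant of a page URL. The helper below is a sketch; the `/index` fallback for root paths is an assumption of this example, not something the spec defines:

```python
from urllib.parse import urlsplit, urlunsplit

def md_variant(url: str) -> str:
    """Return the .md variant of a docs page URL by appending '.md'
    to the path (trailing slash removed first)."""
    parts = urlsplit(url)
    # Assumption: a bare root path probes /index.md; sites vary here.
    path = parts.path.rstrip("/") or "/index"
    if path.endswith(".md"):
        return url  # already a markdown URL
    return urlunsplit((parts.scheme, parts.netloc, path + ".md",
                       parts.query, parts.fragment))
```

An implementation would fetch each variant and verify it returns 200 with markdown rather than HTML or an error.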
content-negotiation #
- What it checks: Whether the server responds to `Accept: text/markdown` with markdown content and an appropriate `Content-Type` header.
- Why it matters: Some agents (Claude Code, Cursor, OpenCode) send `Accept: text/markdown` as their preferred content type. If the server honors this, the agent gets clean markdown without needing to know about `.md` URL patterns. Most agents don’t request markdown, but the ones that do should get it.
- Result levels:
- Pass: Server returns markdown content with `Content-Type: text/markdown` when requested.
- Warn: Server returns markdown content but with an incorrect `Content-Type`.
- Fail: Server ignores the `Accept` header and returns HTML regardless.
- Automation: Full.
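Classifying a server's response to `Accept: text/markdown` reduces to inspecting the `Content-Type` header and the body. The sketch below uses a deliberately crude HTML detector; a real implementation would look at more signals, and the function name is illustrative:

```python
def classify_markdown_negotiation(content_type: str, body: str) -> str:
    """Map a response to 'Accept: text/markdown' onto the check levels:
    fail if the server sent HTML anyway, pass if markdown arrived with
    Content-Type: text/markdown, warn if the body looks like markdown
    but the Content-Type is wrong."""
    mime = content_type.split(";")[0].strip().lower()
    # Crude heuristic: an <html> tag near the top means the Accept
    # header was ignored.
    if "<html" in body[:2048].lower():
        return "fail"
    if mime == "text/markdown":
        return "pass"
    return "warn"
```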
Category 3: Page Size and Truncation Risk #
These checks evaluate whether page content fits within the processing limits of agent web fetch pipelines. Truncation is silent: the agent doesn’t know it’s working with partial data.
How Agents Get Content #
Not all agents see the same thing. The format an agent receives depends on the request it makes and the server’s response:
- Agents that request markdown (Claude Code, Cursor, and OpenCode send `Accept: text/markdown`). If the server honors this and returns markdown, the agent gets clean content. If the server also returns `Content-Type: text/markdown` and the content is under 100K characters, Claude Code bypasses its summarization model entirely, delivering the content directly to the agent. This is the best-case path.
- Agents that request HTML (most agents, including Gemini, Copilot, and others, send `Accept: text/html` or `*/*`). These agents receive the full HTML response. Some pipelines convert HTML to markdown before truncation (Claude Code uses Turndown); others may truncate raw HTML or use their own processing. The HTML path is where boilerplate CSS/JS causes the most damage.
- Agents that use `.md` URL variants. If an agent knows to append `.md` to a URL (because `llms.txt` told it, or a directive on the page, or persistent context), it gets markdown directly regardless of `Accept` headers.
Because different agents hit different paths, this spec defines size checks for both the markdown response (if available) and the HTML response. A site that’s only optimized for the markdown path is leaving most agents behind.
page-size-markdown #
- What it checks: The character count of the page when served as markdown, via either the `.md` URL variant or content negotiation with `Accept: text/markdown`. Only runs if the site serves markdown (as detected by Category 2 checks).
- Why it matters: This is the best-case scenario for agent consumption. Markdown is what agents actually want, and it’s the format where page size most directly corresponds to what the model sees. If the markdown version fits within truncation limits, agents that can request it will get the full content.
- Result levels:
- Pass: Under 50,000 characters (fits comfortably within all known limits, including Claude Code’s direct-delivery threshold for trusted sites).
- Warn: Between 50,000 and 100,000 characters (fits within Claude Code’s truncation limit but may exceed others; also exceeds the direct-delivery threshold, meaning a summarization model may process it).
- Fail: Over 100,000 characters (will be truncated by Claude Code and likely all other platforms).
- Automation: Full.
- Notes: If the site doesn’t serve markdown at all, this check is skipped and `page-size-html` becomes the primary size check. The report should note that the site relies entirely on the HTML path.
page-size-html #
- What it checks: The character count of the HTML response, and the character count after simulating an HTML-to-markdown conversion (using a Turndown-equivalent pipeline). Reports both numbers.
- Why it matters: Most agents receive HTML, not markdown. The raw HTML size determines whether the page even fits in the fetch buffer (Claude Code caps at ~10MB). The post-conversion size is closer to what the agent’s summarization model actually sees, but conversion is lossy and unpredictable. A 500KB HTML page might convert to 50KB of useful markdown (safe) or 400KB of markdown including raw CSS text that survived conversion (not safe). Both numbers matter.
- Result levels (based on post-conversion size, since that’s what the
model receives):
- Pass: Converted content under 50,000 characters.
- Warn: Converted content between 50,000 and 100,000 characters.
- Fail: Converted content over 100,000 characters.
- Automation: Full. Use a Turndown-equivalent library with default configuration (no explicit `<style>`/`<script>` stripping) to match observed agent behavior.
- Report details: Show both the raw HTML size and the post-conversion size. A large gap between the two indicates heavy boilerplate. Report the conversion ratio (e.g., “505KB HTML -> 12KB markdown (98% boilerplate)”) as a useful signal for site owners.
content-start-position #
- What it checks: How far into the post-conversion content (by character count and as a percentage) the actual documentation content begins.
- Why it matters: Even after HTML-to-markdown conversion, boilerplate can survive. Turndown’s default configuration doesn’t strip `<style>` tag contents; it dumps CSS rules as raw text into the markdown output. If inline CSS and JavaScript consume most of the truncation budget, the summarization model never sees the documentation content. In one observed case, actual content didn’t start until 87% of the way through the HTML response (441K characters of CSS before the first paragraph), and the post-conversion output was still dominated by CSS text.
- Result levels:
- Pass: Content starts within the first 10% of the post-conversion output.
- Warn: Content starts between 10% and 50%.
- Fail: Content starts after 50%.
- Automation: Heuristic. Detect first meaningful content element (heading, paragraph with prose) after stripping obvious boilerplate patterns (CSS rules, JavaScript, navigation text).
- Notes: This check only applies to the HTML path. Markdown served directly by the site should not have boilerplate preamble; if it does, that’s a separate issue worth flagging but not something this check targets.
Category 4: Content Structure #
These checks evaluate whether page content is structured in ways that agents can effectively consume. These are harder to fully automate and rely more on heuristics.
tabbed-content-serialization #
- What it checks: Whether pages use tabbed, accordion, or dropdown UI patterns that serialize into long sequential content in the source, and if so, how large the serialized output is.
- Why it matters: Tabbed content is great for humans but can be catastrophic for agents. A tutorial with 11 language variants serializes into a single massive document where an agent might see only the first 1-3 variants. Source order determines what the agent sees; everything past the truncation point is invisible. Asking for a specific variant (e.g., Python) does not help if that variant is beyond the truncation point.
- Result levels:
- Pass: No tabbed content, or tabbed content that serializes to under 50,000 characters total.
- Warn: Tabbed content serializes to 50,000-100,000 characters.
- Fail: Tabbed content serializes to over 100,000 characters.
- Automation: Heuristic. Detect common tab/accordion component patterns (e.g., `<Tab>`, `<Tabs>`, `role="tabpanel"`, common CSS class patterns) and estimate serialized size.
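The pattern detection can start from a small set of regexes. This is a rough heuristic sketch; component names and class patterns vary widely across documentation frameworks, and a real implementation would also estimate the serialized size of the detected panels:

```python
import re

# Illustrative markers only, not an exhaustive list.
TAB_PATTERNS = [
    re.compile(r"<Tabs?\b", re.I),          # MDX-style <Tab>/<Tabs> components
    re.compile(r'role="tabpanel"', re.I),   # ARIA tab panels
    re.compile(r'class="[^"]*\btab', re.I), # common CSS class patterns
]

def count_tab_markers(html: str) -> int:
    """Count occurrences of tab/accordion component markers in page
    source as a crude signal that content serializes into variants."""
    return sum(len(p.findall(html)) for p in TAB_PATTERNS)
```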
section-header-quality #
- What it checks: Whether section headers contain enough context to be meaningful without the surrounding UI. Specifically, when tabbed content is serialized, do headers distinguish which variant (language, platform, deployment type) a section belongs to?
- Why it matters: When an agent sees serialized tabbed content, descriptive headers are the only way it can tell which section applies to which context. Generic headers like “Step 1” repeated across all variants are indistinguishable. Headers like “Step 1 (Python/PyMongo)” preserve the filtering context that the UI provided to human readers.
- Result levels:
- Pass: Headers within serialized tabbed sections include variant context.
- Warn: Headers are present but generic/repeated across variants.
- Fail: No distinguishing headers in serialized tabbed content.
- Automation: Heuristic. Requires detecting tabbed sections and analyzing header patterns within them.
markdown-code-fence-validity #
- What it checks: Whether markdown content contains unclosed or improperly nested code fences (`` ``` `` or `~~~` blocks without a matching closing delimiter).
- Why it matters: An unclosed code fence causes everything after it to be interpreted as code rather than prose. The agent sees documentation text, API descriptions, and instructions as if they were inside a code block, which fundamentally changes how it processes the content. A model treats code blocks as literal content to reproduce or analyze, not as natural language instructions to follow. If an unclosed fence appears early in a page, the agent effectively loses the rest of the document’s meaning. This applies to any markdown the site serves directly: pages via `.md` URLs or content negotiation, and `llms.txt` files themselves.
- Result levels:
- Pass: All code fences in the markdown content are properly opened and closed.
- Warn: Code fences are technically balanced but use inconsistent delimiters (e.g., opening with `` ``` `` and closing with `~~~`), which some parsers may not match correctly.
- Fail: One or more unclosed code fences detected.
- Automation: Full. Parse the markdown for fence delimiters (`` ``` `` and `~~~`, with optional info strings) and verify each opening delimiter has a matching close. Run against markdown served via `.md` URLs, content negotiation responses, and `llms.txt` files.
- Notes: This check applies to markdown the site authors and serves directly. Code fences broken by an HTML-to-markdown conversion pipeline are outside the site owner’s control, though implementations may optionally flag them as informational findings.
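The pass/fail portion of this check is mechanical. The sketch below follows CommonMark matching rules (a closing fence must use the same character as its opener and be at least as long, with no info string); it does not implement the warn level, and the function name is illustrative:

```python
import re

# Up to 3 leading spaces, then a run of 3+ backticks or tildes,
# then an optional info string.
FENCE_RE = re.compile(r"^\s{0,3}(`{3,}|~{3,})(.*)$")

def has_unclosed_fence(markdown: str) -> bool:
    """Return True if the document ends with a code fence still open."""
    open_fence = None  # (char, length) of the currently open fence
    for line in markdown.splitlines():
        m = FENCE_RE.match(line)
        if not m:
            continue
        marker, info = m.group(1), m.group(2)
        char, length = marker[0], len(marker)
        if open_fence is None:
            open_fence = (char, length)  # opening fence
        elif char == open_fence[0] and length >= open_fence[1] and not info.strip():
            open_fence = None  # matching close
        # Otherwise the fence-like line is content inside the open block.
    return open_fence is not None
```

Note that a `~~~` line inside an open `` ``` `` block is content, not a close, so a mismatched pair registers as unclosed rather than balanced.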
Category 5: URL Stability and Redirects #
These checks evaluate whether documentation URLs behave in ways that agents can handle, given that agents retrieve URLs from training data and have limited ability to discover moved content.
http-status-codes #
- What it checks: Whether pages return correct HTTP status codes. In particular, whether “not found” pages return 404 (not 200 with a friendly error page).
- Why it matters: Soft 404s (200 status with “page not found” content) are worse than real 404s for agents. The agent sees a 200 and tries to extract information from the error page content rather than recognizing the page doesn’t exist. A clean 404 tells the agent to try a different approach.
- Result levels:
- Pass: Error pages return appropriate 4xx status codes.
- Fail: Error pages return 200 (soft 404).
- Automation: Full. Test known-bad URLs (e.g., append random strings to real page paths) and check status codes.
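The probe described above can be sketched in a few lines. The classification is a simplification: any 200 for a known-bad URL is treated as a soft 404, and 3xx/5xx responses are left for manual inspection:

```python
import random
import string

def probe_url(base_url: str) -> str:
    """Build a known-bad URL by appending a random slug to a real path."""
    slug = "".join(random.choices(string.ascii_lowercase, k=12))
    return f"{base_url.rstrip('/')}/{slug}"

def classify_not_found(status: int, body_text: str) -> str:
    """Classify the response to a known-bad URL."""
    if 400 <= status < 500:
        return "pass"   # clean 404/410: the agent knows to try another approach
    if status == 200:
        return "fail"   # soft 404: the agent will mine the error page for content
    return "warn"       # redirect or 5xx: ambiguous, inspect manually
```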
redirect-behavior #
- What it checks: Whether redirects are same-host (transparent to agents) or cross-host (a friction point), and whether redirects use proper HTTP status codes (301/302) vs. JavaScript-based redirects.
- Why it matters: Same-host redirects work transparently because the HTTP client follows them automatically. Cross-host redirects are a known failure point; Claude Code, for example, doesn’t automatically follow cross-host redirects (security measure against open-redirect attacks). JavaScript redirects don’t work at all because agents don’t execute JavaScript.
- Result levels:
- Pass: All redirects are same-host HTTP redirects (301/302).
- Warn: Cross-host HTTP redirects are present (agents may or may not follow them depending on the platform).
- Fail: JavaScript-based redirects are detected.
- Automation: Partial. HTTP redirects are detectable. JavaScript redirects require fetching the page and scanning for `window.location`, meta refresh tags, or similar patterns.
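A sketch of both halves of the automation. The regex patterns are illustrative, not exhaustive; real pages hide redirects in many ways:

```python
import re

# Patterns that commonly indicate a JavaScript or meta-refresh redirect.
JS_REDIRECT_RES = [
    re.compile(r"window\.location(?:\.href)?\s*=", re.I),
    re.compile(r"location\.replace\s*\(", re.I),
    re.compile(r'<meta[^>]+http-equiv=["\']refresh["\']', re.I),
]

def has_js_redirect(html: str) -> bool:
    """Heuristic: does the page appear to redirect via JavaScript or a
    meta refresh? Agents that don't execute JS never follow these."""
    return any(p.search(html) for p in JS_REDIRECT_RES)

def classify_redirect(from_host: str, to_host: str) -> str:
    """'pass' for same-host HTTP redirects, 'warn' for cross-host ones."""
    return "pass" if from_host == to_host else "warn"
```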
Category 6: Agent Discoverability Directives #
These checks evaluate whether the site includes signals that help agents find and navigate content effectively.
llms-txt-directive #
- What it checks: Whether documentation pages include a directive, visible to agents but not necessarily to human readers, pointing to `llms.txt` or another discovery resource.
- Why it matters: Anthropic embeds a directive at the top of every Claude Code docs page telling agents to fetch the documentation index at `llms.txt`. In practice, agents see this directive, follow it, and use the index to find what they need. It’s simple, low-effort, and observed to work in real agent workflows. This is the agent equivalent of a “You Are Here” marker. The directive can be visually hidden (e.g., using a CSS clip-rect technique) as long as it remains in the DOM and survives HTML-to-markdown conversion. Avoid `display: none`, which some converters strip.
- Result levels:
- Pass: A directive pointing to `llms.txt` (or an equivalent index) is present in the page HTML, ideally near the top of the content.
- Warn: A directive exists but is buried deep in the page (may be past truncation).
- Fail: No agent-facing directive detected.
- Automation: Heuristic. Search the page HTML for patterns like links to `llms.txt`, phrases like “documentation index”, or directives near the top of the content area. Check both visible text and visually-hidden elements.
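One way this heuristic might be implemented. The 50,000-character boundary reused here as the "past truncation" cutoff is an assumption borrowed from this spec's own pass threshold:

```python
import re

# Illustrative patterns; real implementations would tune these.
DIRECTIVE_PATTERNS = [
    re.compile(r'href=["\'][^"\']*llms\.txt', re.I),
    re.compile(r"documentation index", re.I),
    re.compile(r"llms\.txt", re.I),
]

def detect_llms_directive(html: str, truncation_chars: int = 50_000) -> str:
    """'pass' if a directive appears before the truncation boundary,
    'warn' if it only appears deep in the page, 'fail' if absent."""
    for pat in DIRECTIVE_PATTERNS:
        m = pat.search(html)
        if m:
            return "pass" if m.start() < truncation_chars else "warn"
    return "fail"
```

Because the check also covers visually-hidden elements, the search runs over raw HTML rather than extracted visible text.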
Category 7: Observability and Content Health #
These checks evaluate whether the site’s agent-facing resources stay accurate
and up to date over time. Categories 1-6 can be evaluated as point-in-time
audits; this category addresses the ongoing maintenance dimension. llms.txt
files and markdown endpoints are secondary outputs that often aren’t wired
into existing monitoring, so they can go stale, break, or drift from primary
HTML content without anyone noticing.
llms-txt-freshness #
- What it checks: Whether `llms.txt` content reflects the current state of the documentation site.
- Why it matters: An `llms.txt` that was accurate at launch but hasn’t been updated since is a silent failure mode. New pages won’t appear in the index, deleted pages will send agents to 404s, and renamed pages will produce redirect chains or broken links. Unlike `llms-txt-links-resolve` (which catches broken links), this check catches missing coverage: pages that exist on the site but aren’t represented in `llms.txt`.
- Result levels:
- Pass: `llms.txt` links cover the site’s primary pages and no links point to removed content.
- Warn: Some live pages are missing from `llms.txt`, or `llms.txt` hasn’t been updated recently relative to site changes.
- Fail: `llms.txt` contains significant stale links or is missing large sections of the documentation.
- Automation: Heuristic. Compare links in `llms.txt` against a sitemap or crawled page list; flag pages present in the sitemap but absent from `llms.txt`. Check `Last-Modified` or `ETag` headers on `llms.txt` vs. recently changed doc pages.
- Notes: The definition of “primary pages” requires judgment. Not every page needs to be in `llms.txt` (changelog pages, release notes archives, and similar low-value pages can reasonably be omitted). Implementations should allow configurable exclusion patterns.
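The coverage comparison can be sketched as a set difference. The exclusion substrings and the warn/fail thresholds are illustrative defaults, not values the spec mandates:

```python
def freshness_coverage(sitemap_urls: set, llms_txt_urls: set,
                       exclude: tuple = ("changelog",)) -> dict:
    """Compare sitemap coverage against llms.txt links."""
    # "Primary pages" after applying configurable exclusion patterns.
    primary = {u for u in sitemap_urls if not any(s in u for s in exclude)}
    missing = primary - llms_txt_urls       # live pages absent from llms.txt
    stale = llms_txt_urls - sitemap_urls    # llms.txt links to removed pages
    if not missing and not stale:
        level = "pass"
    elif len(stale) > 5 or len(missing) > len(primary) // 2:
        level = "fail"                      # significant drift
    else:
        level = "warn"
    return {"level": level, "missing": missing, "stale": stale}
```

Run on every deploy, this doubles as the CI check suggested in the monitoring recommendations below.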
markdown-content-parity #
- What it checks: Whether markdown versions of pages contain the same substantive content as their HTML counterparts.
- Why it matters: When markdown is generated separately from HTML (rather than being the source that HTML is built from), the two can drift. A site might update an HTML page but forget to regenerate the markdown version, leaving agents with outdated instructions or code examples. This is particularly insidious because agents that receive the markdown version have no signal that a newer HTML version exists.
- Result levels:
- Pass: Markdown and HTML versions contain equivalent content.
- Warn: Minor differences detected (formatting variations, whitespace, navigation elements present in one but not the other).
- Fail: Substantive content differences: missing sections, outdated code examples, or different instructions between the two versions.
- Automation: Heuristic. Fetch both versions, extract text content from HTML (strip tags), and compare key sections (headings, code blocks, paragraph content) for meaningful differences. Minor formatting differences should be ignored.
- Notes: Sites where markdown is the source format and HTML is generated from it are less likely to have parity issues, but the check is still valuable as a safety net for build pipeline failures.
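A rough sketch of the comparison step, using a word-overlap ratio so whitespace and formatting noise are ignored. The skip-list of HTML elements and the 0.9/0.6 thresholds are assumptions an implementation would tune:

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping navigation and script/style elements."""
    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.parts, self._skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def parity_level(html: str, markdown: str) -> str:
    """'pass' if the markdown covers the HTML's substantive content."""
    extractor = _TextExtractor()
    extractor.feed(html)
    html_words = _words(" ".join(extractor.parts))
    if not html_words:
        return "warn"
    overlap = len(html_words & _words(markdown)) / len(html_words)
    if overlap >= 0.9:
        return "pass"
    return "warn" if overlap >= 0.6 else "fail"
```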
cache-header-hygiene #
- What it checks: Whether `llms.txt` and markdown endpoints have cache headers that allow timely updates.
- Why it matters: Aggressive caching on agent-facing resources means that even after a site owner updates their `llms.txt` or markdown content, agents (and intermediary CDNs) may continue serving stale versions for hours or days. Conversely, no cache headers at all leads to ambiguous behavior where different CDN providers apply their own defaults. For resources that are relatively small and infrequently fetched, short cache lifetimes with revalidation are appropriate.
- Result levels:
- Pass: Cache headers allow timely updates (e.g., `max-age` under 3600, or `must-revalidate` combined with `ETag`/`Last-Modified`).
- Warn: Moderate caching (1-24 hours) that could delay updates.
- Fail: Aggressive caching (over 24 hours) with no revalidation mechanism, or no cache headers at all (ambiguous behavior).
- Automation: Full. Inspect `Cache-Control`, `Expires`, `ETag`, and `Last-Modified` response headers.
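The grading logic maps directly onto the result levels above. This sketch only parses `Cache-Control` and validator headers; a fuller implementation would also consider `Expires`:

```python
import re

def cache_hygiene(headers: dict) -> str:
    """Grade cache headers on an llms.txt or markdown endpoint."""
    h = {k.lower(): v for k, v in headers.items()}
    cc = h.get("cache-control", "")
    has_validator = "etag" in h or "last-modified" in h
    m = re.search(r"max-age=(\d+)", cc)
    if m:
        max_age = int(m.group(1))
        if max_age <= 3600 or ("must-revalidate" in cc and has_validator):
            return "pass"      # timely updates possible
        if max_age <= 86400:
            return "warn"      # moderate caching, could delay updates
        return "warn" if has_validator else "fail"  # aggressive caching
    if "must-revalidate" in cc and has_validator:
        return "pass"
    return "fail"              # no cache headers: behavior left to CDN defaults
```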
Ongoing Monitoring Recommendations #
The three checks above can be run as one-time audits, but they’re most valuable when run on a schedule. This section offers non-normative guidance on integrating agent-facing resources into existing monitoring workflows.
Include llms.txt and markdown endpoints in uptime monitoring. These
resources should be monitored alongside your primary documentation site. A
200 response from your docs homepage doesn’t guarantee that /llms.txt or
.md URL variants are also healthy. Add them to whatever uptime tool you
already use (Pingdom, Uptime Robot, Checkly, etc.) as separate check targets.
Set up alerting for response time degradation. If your llms.txt or
markdown endpoints start responding slowly, agents may time out before
receiving content. This is especially relevant for dynamically generated
markdown (as opposed to static files), where a backend issue could cause
latency spikes that don’t affect the HTML site.
Run freshness and parity checks on a schedule. Rather than treating
llms-txt-freshness and markdown-content-parity as one-time audits, run
them weekly or on every deploy. A CI check that compares llms.txt link
coverage against the sitemap can catch missing pages before they reach
production.
Monitor for silent failures. A 200 response with empty content, a
generic error message, or a login page is worse than a clean 404, because
agents will try to extract information from the response. Check that
llms.txt and markdown responses contain expected content markers (e.g., an
H1, a minimum character count) rather than just checking for a 200 status
code.
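A content-marker health check along these lines might look like the following. The marker phrases and 500-character minimum are illustrative defaults; tune them to your own content (and note that phrases like "log in" can appear in legitimate docs prose):

```python
def looks_healthy(status: int, body: str, min_chars: int = 500) -> bool:
    """Reject silent failures: a 200 whose body is empty, tiny, or a
    login/error page rather than documentation content."""
    if status != 200:
        return False
    text = body.strip()
    if len(text) < min_chars:
        return False                      # empty or near-empty response
    lowered = text.lower()
    bad_markers = ("sign in to continue", "something went wrong")
    if any(m in lowered for m in bad_markers):
        return False                      # login page or generic error
    # Expect a heading: markdown H1 or an <h1> tag.
    return text.startswith("#") or "\n# " in text or "<h1" in lowered
```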
Category 8: Authentication and Access #
These checks evaluate whether documentation is accessible to agents without requiring interactive authentication. Docs behind login walls are effectively invisible to coding agents, which has significant implications as agent-assisted development becomes a standard workflow.
Why This Matters #
Enterprises often gate documentation behind authentication to protect intellectual property, enforce licensing terms, or comply with access control policies. These are legitimate business reasons. However, the tradeoff is sharper than most organizations realize: authenticated docs are not just inconvenient for agents, they are completely inaccessible.
When an agent encounters an auth-gated page, it sees one of these:
- A 401 or 403 response, which tells it nothing useful.
- A login page returned as 200, which is a soft 404 from the agent’s perspective. The agent tries to extract documentation from the login form HTML and produces nonsensical results.
- A redirect to an SSO provider, which is a cross-host redirect the agent cannot follow, even if it wanted to.
In all three cases, the agent may take one of two actions:
- Fall back on whatever it absorbed during training, which may be outdated, incomplete, or wrong.
- Leave your official product website and look for secondary sources to learn about your product, including blogs or articles that may be inaccurate, outdated, or out of line with your official best practices.
In these scenarios, the developer either gets bad guidance, or has to manually copy-paste docs into the conversation, losing the workflow benefits that agents provide. This may also be completely invisible to the developer, as an agent may “helpfully” turn to blog posts or secondary references without disclosing to the human user that it used secondary sources which should be verified.
The competitive dimension is real. If your product’s documentation requires a login and your competitor’s doesn’t, developers using agents will have a dramatically better experience with the competitor’s product. The agent can read the competitor’s API reference, find code examples, and verify patterns in real time. For your product, the agent is guessing.
auth-gate-detection #
- What it checks: Whether documentation pages require authentication to access content.
- Why it matters: A documentation site that returns login pages, 401/403 responses, or SSO redirects for its content pages is completely opaque to agents. This check identifies the problem so site owners can make an informed decision about the tradeoff.
- Result levels:
- Pass: Documentation pages return content (200 with substantive body) without requiring authentication.
- Warn: Some pages are accessible but others require authentication (partial gating). This is common for sites that gate advanced content or API references while keeping tutorials public.
- Fail: All or most documentation pages require authentication.
- Automation: Full. Fetch a sample of documentation URLs and classify responses: 200 with content (accessible), 401/403 (auth required), 200 with login form heuristics (soft auth gate), or redirect to known SSO providers (auth redirect). Login form detection uses heuristics: look for `<input type="password">`, common SSO redirect domains (okta.com, auth0.com, login.microsoftonline.com), or page titles containing “sign in” or “log in”.
- Notes: This check is informational for sites that intentionally gate content. It doesn’t prescribe that all docs must be public. It ensures the site owner is aware of the agent accessibility impact and can evaluate whether alternative access paths (see below) are warranted.
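The response classification can be sketched directly from the heuristics listed above:

```python
import re
from urllib.parse import urlparse

# SSO domains and the login-title pattern mirror the heuristics in the check.
SSO_DOMAINS = ("okta.com", "auth0.com", "login.microsoftonline.com")
LOGIN_TITLE_RE = re.compile(r"<title>[^<]*(sign in|log in)", re.I)

def classify_auth(status: int, body: str, redirect_to=None) -> str:
    """Classify one fetched docs URL per auth-gate-detection."""
    if status in (401, 403):
        return "auth-required"
    if redirect_to:
        host = urlparse(redirect_to).netloc
        if any(host == d or host.endswith("." + d) for d in SSO_DOMAINS):
            return "auth-redirect"
    if status == 200:
        if '<input type="password"' in body or LOGIN_TITLE_RE.search(body):
            return "soft-auth-gate"
        return "accessible"
    return "other"
```

Aggregating these classifications over a URL sample yields the pass (all accessible), warn (partial gating), or fail (all/most gated) result.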
auth-alternative-access #
- What it checks: Whether an auth-gated documentation site provides alternative access paths that agents can use.
- Why it matters: Sites that must gate their primary docs can still serve agents through secondary channels. This check looks for evidence that such channels exist, giving the site credit for providing agent access even when the main docs require a login.
- Result levels:
- Pass: At least one alternative access path is detected (see list below).
- Warn: The site provides partial alternative access (e.g., an `llms.txt` exists but only covers a subset of the gated content).
- Fail: No alternative access paths detected for auth-gated content.
- Automation: Partial. Some access paths can be detected automatically; others require manual verification.
- Detectable access paths:
- Public `llms.txt`: The site serves an `llms.txt` file that doesn’t require authentication, even if the underlying docs pages do. This gives agents at least a navigational index.
- Public markdown or API endpoint: Some pages or a content API respond to unauthenticated requests even when the main docs UI requires login.
- Bundled documentation: The product ships docs as part of its package or SDK (e.g., a `docs/` directory, man pages, or built-in `help` subcommands). Agents can read local files without authentication.
- CLI-based doc access: The product provides a CLI command (e.g., `yourproduct docs search "topic"`) that the developer has already authenticated, making content available to agents through tool use.
- MCP server: The organization provides an MCP server that exposes documentation through tool calls, with authentication handled server-side. This is the most capable option for private docs because it preserves full content access while keeping credentials out of the agent context. (Detection is manual; there’s no standard way to discover whether a company offers an MCP server.)
- Notes: Only applies when `auth-gate-detection` returns warn or fail. If docs are publicly accessible, this check is skipped.
Making Private Docs Agent-Accessible #
This section offers non-normative guidance for organizations that gate their documentation. The options below are ordered roughly by implementation effort, from lowest to highest.
1. Ungating reference documentation. The simplest option: make API references, SDK docs, and integration guides public while keeping truly sensitive content (internal architecture, security configurations, pricing tiers) behind auth. Many enterprises already do this for developer experience reasons. Agents benefit from the same split.
2. Shipping docs with the product. Include documentation as local files
in your SDK, package, or CLI tool. A docs/ directory with markdown files,
comprehensive README content, or built-in help text is always available to
agents reading the local filesystem. This is particularly effective for
API clients and libraries where the docs are version-specific anyway.
3. Providing a public llms.txt. Even if page content is gated, a
public llms.txt that describes what documentation exists and how it’s
organized gives agents a map. They can tell the developer “the rate limiting
docs are at /docs/api/rate-limits, but I can’t access them; could you paste
the relevant section?” This is better than the agent having no idea what
docs exist at all.
4. Supporting token-based access for agent-facing endpoints. Serve
llms.txt and markdown content behind API key or bearer token
authentication rather than browser-based SSO. Agents and their tooling can
be configured to pass static credentials, similar to how npm or pip
authenticate with private registries. This preserves access control while
enabling programmatic access.
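On the client side, configuring an agent's tooling for token-based access is little more than attaching a static header. A minimal sketch, assuming a bearer-token scheme; the exact header your gateway expects is an implementation detail, and the URL here is illustrative:

```python
import urllib.request

def build_agent_request(url: str, token: str) -> urllib.request.Request:
    """Request for an agent-facing endpoint using a static bearer token,
    analogous to how npm or pip authenticate with private registries."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            # Prefer markdown, per the content-negotiation checks earlier.
            "Accept": "text/markdown, text/plain;q=0.9",
        },
    )

def fetch_markdown(url: str, token: str, timeout: float = 10.0) -> str:
    with urllib.request.urlopen(build_agent_request(url, token),
                                timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```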
5. Building an MCP server. An MCP server gives agents structured,
authenticated access to documentation through tool calls like
search_docs("rate limiting") or get_doc("api/authentication"). Auth
credentials are configured on the server; the agent never sees them. This
is the richest option because the MCP server can provide search, filtering,
and context-aware responses rather than just serving raw files. It also
allows fine-grained access control (different API keys could see different
content tiers).
6. Providing a CLI with doc access. If your product already has a CLI
that developers authenticate with, adding a docs subcommand gives agents
access through a channel the developer has already authorized. The agent
calls the CLI tool; the CLI handles authentication using the developer’s
existing credentials.
Organizations don’t need to implement all of these. A public llms.txt
combined with ungated reference docs covers the most common agent use cases
with minimal effort. MCP servers are for organizations that want to provide
a first-class agent experience with their private documentation.
Checks Summary #
| ID | Category | Automation | Severity | Depends On |
|---|---|---|---|---|
| `llms-txt-exists` | llms.txt | Full | High | – |
| `llms-txt-valid` | llms.txt | Full | Medium | `llms-txt-exists` |
| `llms-txt-size` | llms.txt | Full | High | `llms-txt-exists` |
| `llms-txt-links-resolve` | llms.txt | Full | High | `llms-txt-exists` |
| `llms-txt-links-markdown` | llms.txt | Full | Medium | `llms-txt-exists` |
| `markdown-url-support` | Markdown Availability | Full | High | – |
| `content-negotiation` | Markdown Availability | Full | Medium | – |
| `page-size-markdown` | Page Size | Full | High | `markdown-url-support` or `content-negotiation` |
| `page-size-html` | Page Size | Full | High | – |
| `content-start-position` | Page Size | Heuristic | High | – |
| `tabbed-content-serialization` | Content Structure | Heuristic | High | – |
| `section-header-quality` | Content Structure | Heuristic | Medium | `tabbed-content-serialization` |
| `markdown-code-fence-validity` | Content Structure | Full | Medium | `markdown-url-support` or `content-negotiation` |
| `http-status-codes` | URL Stability | Full | Medium | – |
| `redirect-behavior` | URL Stability | Partial | Medium | – |
| `llms-txt-directive` | Agent Discoverability | Heuristic | Medium | – |
| `llms-txt-freshness` | Observability | Heuristic | High | `llms-txt-exists` |
| `markdown-content-parity` | Observability | Heuristic | Medium | `markdown-url-support` or `content-negotiation` |
| `cache-header-hygiene` | Observability | Full | Medium | – |
| `auth-gate-detection` | Authentication | Full | High | – |
| `auth-alternative-access` | Authentication | Partial | Medium | `auth-gate-detection` (warn or fail) |
Appendix A: Known Platform Truncation Limits #
The thresholds used in this spec’s pass/warn/fail levels are derived from observed and documented platform behavior. This appendix tracks known limits so that implementations can calibrate their thresholds appropriately, and so that the spec’s default thresholds can be updated as more data becomes available.
Thresholds Used in This Spec #
The spec uses two threshold tiers across its size-related checks:
- 50,000 characters: The “pass” threshold. Content under this size fits comfortably within all known platform limits.
- 100,000 characters: The “fail” threshold. Content over this size will be truncated by Claude Code and likely by most other platforms.
These are conservative defaults based on the best-documented platform (Claude Code). Implementations should allow these thresholds to be configurable so users can evaluate against specific platform limits or adjust as new data becomes available.
Known Platform Limits #
| Platform | Truncation Limit | Source | Confidence | Notes |
|---|---|---|---|---|
| Claude Code | ~100,000 chars | Reverse engineering | High | Trusted sites serving text/markdown under 100K chars bypass summarization model entirely. Content over this threshold goes through a summarization model that may lose information. |
| MCP Fetch (reference server) | 5,000 chars (default) | Official docs | High | Default max_length is 5,000 chars. Configurable up to 1,000,000. Supports chunked reading via start_index. |
| Claude API (web_fetch tool) | ~20,700 chars (default, unset) | Empirical testing | Medium | Distinct implementation from the Claude Code client-side tool. With no max_content_tokens set, truncation was observed at ~20,700 chars, ending mid-word; no default limit is documented. The optional max_content_tokens parameter is approximate: setting 5,000 returned 17,186 chars, truncated mid-token. CSS is stripped effectively, unlike Claude Code. HTML boilerplate made up 81–97.5% of content before the first heading; requesting markdown reduced content size by 77%. JS-rendered pages return only the static shell. |
| Google Gemini (URL context) | Unknown | Empirical testing | Medium | Docs state a 34 MB max fetch size per URL, but this is a retrieval ceiling, not a processing limit; how much content actually reaches the model after fetching is undocumented. Hard limit of 20 URLs per request; exceeding it returns 400 INVALID_ARGUMENT with zero tokens consumed. The truncation boundary is unknown: retrieved content is injected into context without a testable field, and tool_use_prompt_token_count is the only available size proxy (<1% variance across runs). PDF fetches failed consistently despite being a documented supported type; YouTube succeeded despite being documented as unsupported. url_context_metadata ordering is non-deterministic. Tested on gemini-2.5-flash only; behavior may vary across supported models. |
| OpenAI (web search) | Unknown | – | – | 128K token context window for web search. search_context_size parameter (low/medium/high) controls context amount but no per-page truncation limit is documented. |
| Cursor | Unknown | – | – | Requests text/markdown via Accept header. No documented truncation limit. |
| GitHub Copilot | Unknown | – | – | No documented web fetch or truncation details. |
| Windsurf | Unknown | – | – | Docs state it “chunks up web pages” and “skims to the section we want.” No specific limits documented. |
Thank you to contributors!
- Claude API (web_fetch tool) limitations contributed by Rhyannon Rodriguez
- Google Gemini (URL context) limitations contributed by Rhyannon Rodriguez
What This Means for Threshold Selection #
The MCP Fetch reference server’s default of 5,000 characters is worth noting. Many agent setups use MCP-based fetch tools, and if users haven’t changed the default, they’re working with a limit 20x smaller than Claude Code’s. A page that passes at the 50K threshold may still be unusable for MCP Fetch users with default settings.
Implementations may want to support named profiles (e.g., --profile claude-code, --profile mcp-default) that set thresholds to match specific
platforms, in addition to allowing custom threshold values.
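Such profiles could be as simple as a lookup table. The claude-code values come from this spec's defaults; the mcp-default fail threshold matches the reference server's 5,000-char max_length, while its pass threshold is an illustrative margin below it:

```python
# Named profiles mapping platforms to size thresholds, in characters.
PROFILES = {
    "default":     {"pass_under": 50_000, "fail_over": 100_000},
    "claude-code": {"pass_under": 50_000, "fail_over": 100_000},
    "mcp-default": {"pass_under": 4_000,  "fail_over": 5_000},
}

def size_level(char_count: int, profile: str = "default") -> str:
    """Map a page's character count to pass/warn/fail under a profile."""
    limits = PROFILES[profile]
    if char_count < limits["pass_under"]:
        return "pass"
    if char_count > limits["fail_over"]:
        return "fail"
    return "warn"
```

A page that passes under the default profile can still fail under mcp-default, which is exactly the gap the paragraph above describes.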
Appendix B: Notable Exclusions #
This section documents topics that were considered for the spec but intentionally excluded, along with the rationale.
robots.txt and AI User-Agent Blocking #
robots.txt can block known AI training crawlers (ClaudeBot, GPTBot,
Google-Extended, etc.) that identify themselves via user-agent strings.
However, this is a crawling policy concern, not an agent-friendliness concern,
and the two audiences are distinct.
Training crawlers and coding agents are different request paths with different
user-agents. The agents this spec targets (coding assistants fetching docs
during real-time workflows) are largely invisible to robots.txt:
| Agent | User-Agent | Identifiable as AI? |
|---|---|---|
| Claude Code | `axios/1.8.4` | No (generic HTTP library) |
| Cursor | Standard Chrome UA | No |
| OpenCode | Standard Chrome UA | No |
| GitHub Copilot | Electron/VS Code UA | No (looks like normal IDE traffic) |
| OpenAI Codex | `ChatGPT-User/1.0` | Yes |
| Gemini CLI | `GoogleAgent-URLContext` | Yes |
| Windsurf | `colly` | Somewhat (Go scraping library) |
Source: Checkly, “State of AI Agent Content Negotiation”
Most coding agents use standard browser user-agent strings and are
indistinguishable from human traffic. A site blocking ClaudeBot in
robots.txt is blocking Anthropic’s training crawler, not Claude Code
fetching a docs page. Since this spec is about making documentation accessible
to agents in real-time workflows, robots.txt configuration is out of scope.
GitHub Raw URL Fallback #
GitHub raw URLs (raw.githubusercontent.com/...) were observed to be the
single most reliable documentation access pattern in practice. When official
docs failed (rate-limited, JavaScript-rendered, or hard to navigate), GitHub
was almost always a viable fallback.
However, this is a fallback strategy for agent users, not a property of the documentation site itself. Whether a project’s docs source happens to be on GitHub, and whether the raw content there is usable as standalone documentation, is outside the control of a docs site evaluation. This spec focuses on what documentation site owners can do to improve agent accessibility of their own sites.
Contributing #
This spec is a living document. Feedback, corrections, and contributions are welcome.
- Discussion and feedback: Open an issue on the GitHub repository.
- Proposing changes: Submit a pull request. For significant changes (new checks, changes to pass/warn/fail criteria, new categories), please open an issue first to discuss the proposal.
- Platform truncation data: If you have data about a platform’s web fetch truncation limits (from official documentation, reverse engineering, or empirical testing), please contribute it to the Known Platform Limits table via issue or PR.
- Real-world validation: If you’ve run these checks against your own documentation site and have findings to share, we’d love to hear about it.
References #
- llmstxt.org proposal
- Dachary Carey, “Agent-Friendly Docs”
- Dachary Carey, “Agent Web Fetch Spelunking”
- Giuseppe Gurgone, reverse-engineered Claude Code Web Fetch
- Mikhail Shilkov, “Claude Code Web Tools”
- Liran Yoffe, “Reverse Engineering Claude Code Web Tools”
- Checkly, “State of AI Agent Content Negotiation”
- Kody Jackson, “The speculative and soon to be outdated AI consumability scorecard”
- Longato, “LLMs.txt - Why Almost Every AI Crawler Ignores it”
- OtterlyAI, “llms.txt and AI Visibility: Results from OtterlyAI’s GEO Study”
Changelog #
v0.1.0 (2026-02-22) - Initial Draft #
- Initial spec with 21 checks across 8 categories.
- Progressive disclosure recommendation for large `llms.txt` files.
- Authentication and access category: auth gate detection, alternative access paths, and guidance for making private docs agent-accessible.
- Known platform truncation limits (Appendix A).
- Notable exclusions with rationale (Appendix B).