1 · Structure
Clean semantic HTML
Real <h1>, <h2>, <article>, <section>, <table>, <dl>. No <div> soup. No layouts that need JavaScript to render their content. LLM parsers extract structure first, content second.
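One rough way to spot <div> soup is to count tags. The sketch below uses Python's standard html.parser; the names SAMPLE_PAGE and audit_structure are illustrative assumptions, and what counts as "too many" <div>s is left to the reader.

```python
from collections import Counter
from html.parser import HTMLParser

# The semantic elements named in the card above.
SEMANTIC_TAGS = {"h1", "h2", "article", "section", "table", "dl"}

# Hypothetical page fragment used only for illustration.
SAMPLE_PAGE = (
    "<article><h1>Pricing</h1><section><h2>Plans</h2>"
    "<table><tr><td>Basic</td></tr></table></section></article>"
)


class TagCounter(HTMLParser):
    """Counts every opening tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def audit_structure(html_text: str) -> dict:
    """Return simple structure metrics for one HTML document."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    counts = parser.counts
    return {
        "h1_count": counts["h1"],
        "semantic_tags": sum(counts[t] for t in SEMANTIC_TAGS),
        "div_tags": counts["div"],
    }


if __name__ == "__main__":
    print(audit_structure(SAMPLE_PAGE))
```

A page with exactly one <h1> and more semantic tags than <div>s is a reasonable starting signal, not a guarantee.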
2 · Schema
Complete JSON-LD
Organization, LocalBusiness, Service, Product, FAQPage, HowTo, Article, BreadcrumbList — whichever fit the page. JSON-LD is the highest-confidence input an LLM has about what a page is.
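A minimal sketch of emitting a JSON-LD block, assuming a LocalBusiness page. The business details, the BUSINESS constant, and the jsonld_script helper are placeholders made up for illustration; the property names come from the schema.org vocabulary.

```python
import json

# Hypothetical business data; replace with the real, verified values.
BUSINESS = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Plumbing Co.",
    "url": "https://example.com/",
    "telephone": "+1-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Springfield",
        "addressCountry": "US",
    },
}


def jsonld_script(data: dict) -> str:
    """Serialize a schema.org object into a <script> element for the page head."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so string content cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(jsonld_script(BUSINESS))
```

Generating the block from one data structure, rather than hand-editing it per page, helps keep it complete and valid.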
3 · FAQ format
Definitive Q&A blocks
Real questions a buyer would ask, with definitive, citable answers. LLMs cite FAQ blocks at higher rates than any other content shape — both as raw markup and as visible text.
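Since the Q&A should exist both as markup and as visible text, one option is to render both from a single list. This is a sketch under that assumption; the FAQS entries and the helper names are placeholders, not real answers.

```python
import html
import json

# Hypothetical question/answer pairs; the wording is placeholder content.
FAQS = [
    ("How long does delivery take?", "Standard delivery takes 5 business days."),
    ("Do you offer refunds?", "Yes, within 30 days of purchase."),
]


def faq_jsonld(faqs) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


def faq_html(faqs) -> str:
    """Render the same pairs as a visible <dl> block."""
    rows = "\n".join(
        f"  <dt>{html.escape(q)}</dt>\n  <dd>{html.escape(a)}</dd>" for q, a in faqs
    )
    return f"<dl>\n{rows}\n</dl>"


if __name__ == "__main__":
    print(faq_html(FAQS))
    print(json.dumps(faq_jsonld(FAQS), indent=2))
```

Because both outputs come from FAQS, the visible answer and the structured answer cannot drift apart.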
4 · Tables
Comparison tables with numbers
"X vs Y" tables with concrete figures get quoted verbatim. Vague prose gets paraphrased — or skipped.
5 · Quotability
Specific, numbered facts
"Savings of 50–70%" beats "significant savings." "5 business days" beats "fast turnaround." LLMs reach for the precise sentence and ignore the hand-wavy one.
6 · llms.txt
An llms.txt at the site root
A plain-text (Markdown) summary file at /llms.txt that tells LLMs what the site is, lists key pages with one-line descriptions, and surfaces quick facts. It is a proposed convention that generative engines increasingly look for.
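A sketch of generating such a file, assuming the layout used by the llms.txt proposal: a title line, a one-line summary as a blockquote, then Markdown link lists. The site name, URLs, facts, and the build_llms_txt helper are hypothetical.

```python
# Hypothetical site data; every value here is a placeholder.
SITE_NAME = "Example Co"
SUMMARY = "Example Co repairs residential plumbing in Springfield."
PAGES = [
    ("Pricing", "https://example.com/pricing", "Hourly rates and fixed-price jobs."),
    ("Services", "https://example.com/services", "Full list of repair services."),
]
FACTS = ["Hourly rate: $50/hr", "Typical turnaround: 5 business days"]


def build_llms_txt(name, summary, pages, facts) -> str:
    """Assemble the file: title, summary, key pages, quick facts."""
    lines = [f"# {name}", "", f"> {summary}", "", "## Key pages", ""]
    lines += [f"- [{title}]({url}): {desc}" for title, url, desc in pages]
    lines += ["", "## Quick facts", ""]
    lines += [f"- {fact}" for fact in facts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the result to the web root so it is served at /llms.txt.
    print(build_llms_txt(SITE_NAME, SUMMARY, PAGES, FACTS))
```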
7 · Crawler access
Explicit allow for AI crawlers
robots.txt with explicit allow rules for GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, Applebot-Extended, Bytespider, CCBot, cohere-ai, Meta-ExternalAgent, Amazonbot. Many sites accidentally block them.
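The sketch below writes one allow group per token listed above and then checks the result with Python's standard urllib.robotparser. The helper names and the example.com URL are assumptions; a real file would also carry the site's existing rules for other crawlers.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the list above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai", "PerplexityBot",
    "Perplexity-User", "Google-Extended", "Applebot-Extended", "Bytespider",
    "CCBot", "cohere-ai", "Meta-ExternalAgent", "Amazonbot",
]


def build_robots_txt(agents) -> str:
    """Emit one explicit 'Allow: /' group per AI crawler token."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Check whether the given robots.txt text lets an agent fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLERS)
    print(robots)
    # A blanket disallow is the usual accidental block.
    print(is_allowed("User-agent: *\nDisallow: /\n", "GPTBot"))
```

Running is_allowed against the live file for each token is a quick way to catch a blanket "Disallow: /" that blocks these crawlers by accident.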
8 · Authorship
Clear author + date
datePublished, dateModified, named author, named publisher. LLMs surface recent, attributed sources over anonymous or undated ones.
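A minimal sketch of the authorship fields as schema.org Article markup. The headline, names, dates, and the article_jsonld helper are placeholders chosen for illustration.

```python
import json
from datetime import date


def article_jsonld(headline, author, publisher, published: date, modified: date) -> dict:
    """Build Article markup with a named author, named publisher, and ISO dates."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }


# Hypothetical article metadata.
EXAMPLE_ARTICLE = article_jsonld(
    "How to choose a plumber",
    "Jane Doe",
    "Example Co",
    date(2024, 1, 15),
    date(2024, 3, 2),
)

if __name__ == "__main__":
    print(json.dumps(EXAMPLE_ARTICLE, indent=2))
```

The same dates and byline should also appear in the visible page text.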
9 · Consistency
Same facts everywhere they appear
If your pricing is "$50/hr" on one page and "$60/hr" on another, an LLM sees contradictory data and may avoid citing either. Same facts, same phrasing, every page.
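A contradiction like the one above can be caught mechanically. This sketch assumes page text is already available as strings and that hourly rates are written in the "$50/hr" form; the page paths, RATE_PATTERN, and find_rate_conflicts are illustrative names.

```python
import re
from collections import defaultdict

# Matches hourly rates written like "$50/hr" or "$49.50/hr".
RATE_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)/hr")

# Hypothetical page texts keyed by URL path.
PAGES = {
    "/pricing": "Our standard rate is $50/hr.",
    "/about": "We charge $60/hr for all work.",
    "/contact": "Call us for a quote.",
}


def find_rate_conflicts(pages: dict) -> dict:
    """Map each distinct rate to the pages stating it; empty if rates agree."""
    seen = defaultdict(list)
    for path, text in pages.items():
        for amount in RATE_PATTERN.findall(text):
            seen[amount].append(path)
    # More than one distinct amount means the site contradicts itself.
    return dict(seen) if len(seen) > 1 else {}


if __name__ == "__main__":
    print(find_rate_conflicts(PAGES))
```

The same pattern extends to other repeated facts such as turnaround times or phone numbers, each with its own regular expression.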