How to get your website cited by ChatGPT, Claude, and Perplexity

Generative engine optimization (GEO) is the practice of structuring your website so AI search engines can find, understand, and cite your content. This guide covers the 18 technical checks, schema markup, llms.txt, robots.txt, and content structure that determine whether AI engines cite your site or ignore it.

Chapter 1

What Is Generative Engine Optimization

Generative engine optimization (GEO) is the practice of structuring your website so AI search engines can find, understand, and cite your content. When someone asks ChatGPT "what tools help startups get customers," you want your site in the answer. GEO is how you get there.

Traditional SEO focuses on ranking in a list of ten blue links. GEO focuses on being included in a generated answer. The difference matters because AI search engines do not just rank pages. They read pages, extract facts, and synthesize responses. If these models cannot parse your content, your site is unlikely to appear in the answers they generate.

GEO is not a replacement for SEO. It is an extension of it. A site that ranks well on Google has a head start with AI search engines. But ranking alone is not enough. You also need to make your content easy for language models to extract, attribute, and cite. That requires specific technical and content changes that most SEO guides do not cover.

Chapter 2

How AI Search Engines Decide What to Cite

AI search engines like ChatGPT, Claude, Perplexity, and Google AI Overviews follow a common pattern when generating answers. They crawl the web (or use a search index), retrieve relevant pages, extract information, and synthesize a response. Understanding each step helps you optimize for the entire pipeline.

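Crawling and access. Before an AI can cite your content, its crawler needs permission to access your site. OpenAI uses GPTBot to collect training data and OAI-SearchBot to surface sites in ChatGPT search, Anthropic uses ClaudeBot, Perplexity uses PerplexityBot, and Google AI Overviews rely on the standard Googlebot. If your robots.txt blocks the crawler an engine relies on, that engine cannot read your pages, and your content is effectively invisible to it.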
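To see how a robots.txt file treats each of these crawlers, you can parse it with Python's standard-library urllib.robotparser. This is a minimal sketch, not the checker's actual code. The sample robots.txt and the example.com URL are hypothetical, and the crawler list assumes the user-agent tokens named in this guide plus OAI-SearchBot. Python's parser applies the first matching rule in a group, which can differ from how an individual crawler resolves conflicting Allow and Disallow lines.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens discussed in this guide, plus OAI-SearchBot (ChatGPT search).
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Googlebot"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, for each AI crawler, whether the robots.txt rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


if __name__ == "__main__":
    # Hypothetical robots.txt: GPTBot is blocked everywhere, every other crawler is allowed.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
    for agent, allowed in crawler_access(sample, "https://example.com/blog/").items():
        print(f"{agent:15} {'allowed' if allowed else 'BLOCKED'}")
```

With this sample file, only GPTBot is reported as blocked. To audit your own site, paste in the contents of yoursite.com/robots.txt and a URL you care about.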

Retrieval and relevance. AI search engines use a combination of traditional search signals (backlinks, domain authority, content relevance) and semantic understanding to decide which pages to retrieve for a given query. Pages that answer a question directly in their opening paragraph are easier to match to that question.

Extraction and citation. Once a page is retrieved, the AI reads it and extracts facts. Content that follows a definition-first structure ("X is Y") is easier to extract than content buried in long narratives. Schema markup, particularly FAQ and HowTo schemas, provides structured data the AI can parse directly.

Attribution. AI search engines cite sources when they can clearly attribute a claim to a specific page. Pages with clear authorship, publication dates, and canonical URLs are more likely to receive attribution than anonymous or undated content.

Chapter 3

The 18 Checks That Determine AI Citation Readiness

Your AI citation readiness comes down to 18 specific technical and content checks. These are the same checks our free AI citation checker runs on your site.

Crawler access checks:
1. robots.txt allows GPTBot (OpenAI). GPTBot collects training data; ChatGPT search results rely on a separate crawler, OAI-SearchBot, which should not be blocked either.
2. robots.txt allows ClaudeBot (Anthropic). Same principle for Claude. Perplexity uses its own crawler, PerplexityBot.
3. robots.txt allows Googlebot. Google AI Overviews use Googlebot to crawl content.
4. No blanket disallow rules that accidentally block AI crawlers.
5. Server responds with HTTP 200 status codes for key pages (not redirects or errors).

Structured data checks:
6. Organization schema present. AI engines use this to identify who published the content.
7. Article or WebPage schema on content pages. Helps AI understand page type and authorship.
8. FAQ schema on relevant pages. Gives AI engines explicit question-and-answer pairs to extract.
9. BreadcrumbList schema for navigation context.
10. Author and datePublished fields populated in schema.

Content structure checks:
11. Definition-first paragraphs. The first sentence of key sections directly answers the question the section heading poses.
12. Clear H1, H2, H3 hierarchy. AI models use heading structure to understand content organization.
13. Concise paragraphs under 150 words. Long paragraphs are harder for AI to extract clean quotes from.
14. No content hidden behind JavaScript-only rendering. AI crawlers often cannot execute JavaScript.

Technical foundation checks:
15. SSL/TLS certificate valid and active, so pages are served over HTTPS.
16. XML sitemap present and accessible at /sitemap.xml.
17. Page load time under 3 seconds. Slow pages may be deprioritized or skipped.
18. Mobile-responsive layout. Google uses mobile-first indexing, so Googlebot mainly crawls and evaluates the mobile version of your pages.
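Two of the content structure checks, heading hierarchy (12) and paragraph length (13), can be approximated with a short script. This sketch uses Python's built-in html.parser. Treating "exactly one H1 and no skipped heading levels" as a clear hierarchy is our assumption for illustration, and the word count is approximate: for example, paragraphs whose closing </p> tag is omitted are not counted.

```python
from html.parser import HTMLParser

MAX_PARAGRAPH_WORDS = 150  # threshold from check 13


class StructureAudit(HTMLParser):
    """Records heading levels and per-paragraph word counts from raw HTML."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.heading_levels: list[int] = []
        self.paragraph_words: list[int] = []
        self._in_paragraph = False
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.heading_levels.append(int(tag[1]))
        elif tag == "p":
            self._in_paragraph, self._text = True, []

    def handle_endtag(self, tag):
        # Only explicitly closed <p> elements are counted (an approximation).
        if tag == "p" and self._in_paragraph:
            self.paragraph_words.append(len("".join(self._text).split()))
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._text.append(data)


def audit_structure(html: str) -> list[str]:
    """Return problems for check 12 (heading hierarchy) and check 13 (paragraph length)."""
    audit = StructureAudit()
    audit.feed(html)
    problems = []
    h1_count = audit.heading_levels.count(1)
    if h1_count != 1:
        problems.append(f"expected one H1, found {h1_count}")
    # A heading more than one level deeper than the previous one skips a level.
    for prev, cur in zip(audit.heading_levels, audit.heading_levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading jumps from H{prev} to H{cur}")
    for index, words in enumerate(audit.paragraph_words, start=1):
        if words > MAX_PARAGRAPH_WORDS:
            problems.append(f"paragraph {index} has {words} words")
    return problems


if __name__ == "__main__":
    page = "<h1>Guide</h1><h3>Details</h3><p>" + "word " * 160 + "</p>"
    for problem in audit_structure(page):
        print(problem)
```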

You can check all 18 of these for your site right now using our free AI citation checker at /tools/ai-citation-checker.

Get your free growth plan from Distro

Stop reading about distribution. Start doing it with a plan built for your business.

Get My Free Growth Plan
Chapter 4

How to Create an llms.txt File

The llms.txt file is a proposed standard that tells AI language models about your site. Despite the resemblance in name to robots.txt, it does not control crawler access. It is a Markdown file that provides a structured summary of what your site does, what content is available, and how the AI should understand your business.

llms.txt is not yet supported by all AI providers, so treat it as a low-cost addition rather than a guaranteed citation signal. The file lives at the root of your domain: yoursite.com/llms.txt.

A basic llms.txt file includes: your site name and URL, a one-paragraph description of what your site does, a list of your most important pages with brief descriptions, and any specific instructions for how AI models should represent your content.

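Here is what a good llms.txt structure looks like for a startup. The proposal uses Markdown, in this order:

1. An H1 heading with your company name.
2. A blockquote that says what the company does in one sentence.
3. A short paragraph with your primary URL and any instructions for how AI models should represent your content.
4. One or more H2 sections listing key pages as Markdown links, each followed by a one-sentence description of what the page covers.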
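If you keep your page list in code, you can generate llms.txt instead of editing it by hand, which makes it easier to keep current. This minimal sketch follows the Markdown layout of the llms.txt proposal: an H1 with the name, a blockquote summary, a short context paragraph, then an H2 section of links with one-sentence descriptions. The Page type, the "Primary URL" line, the "Key pages" section title, and the Acme example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str  # one factual sentence, no marketing language


def build_llms_txt(name: str, summary: str, site_url: str, pages: list[Page]) -> str:
    """Render llms.txt: H1 name, blockquote summary, context paragraph, then an H2 list of links."""
    lines = [f"# {name}", "", f"> {summary}", "", f"Primary URL: {site_url}", "", "## Key pages", ""]
    lines += [f"- [{p.title}]({p.url}): {p.description}" for p in pages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Acme Analytics",  # hypothetical company
        "Acme Analytics is a dashboard that shows early-stage startups where their signups come from.",
        "https://acme.example",
        [
            Page("Pricing", "https://acme.example/pricing", "Plans, prices, and what each plan includes."),
            Page("Docs", "https://acme.example/docs", "Setup guides and API reference for the Acme dashboard."),
        ],
    ))
```

Because the file is rebuilt from the same list each time, adding a page to the list and redeploying keeps llms.txt in sync with the site.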

The key principle is clarity. Write your llms.txt the way you would write a brief for someone who has never seen your site. Be factual, be specific, and avoid marketing language. AI models respond better to clear descriptions than to persuasive copy.

Keep your llms.txt file updated whenever you add significant new pages or change your core offering. An outdated llms.txt can be worse than not having one, because it gives AI models incorrect information about your site.

Chapter 5

Schema Markup That AI Understands

Schema markup (structured data) is one of the most impactful technical changes you can make for AI citation readiness, second only to making sure AI crawlers can reach your pages at all. AI search engines can use schema to understand what a page is about, who wrote it, when it was published, and what questions it answers.

The most important schema types for AI citation:

Organization schema tells AI models who you are. Include your company name, URL, logo, description, and founder information. Place this on your homepage.

Article schema tells AI models that a page is a piece of content with an author, publication date, and headline. Every blog post, guide, and resource page should have Article schema.

FAQ schema (the FAQPage type) provides question-and-answer pairs that AI models can extract directly. If your page answers common questions, marking up those Q&A pairs with FAQ schema gives AI models clean, self-contained answers to quote.

HowTo schema structures step-by-step instructions. If your content explains how to do something, HowTo schema helps AI models extract and present those steps.

BreadcrumbList schema provides navigation context, helping AI models understand where a page sits in your site hierarchy.
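Here is a minimal sketch of JSON-LD objects for three of these types, built as Python dictionaries. The property names (name, url, logo, description, founder, headline, author, datePublished, mainEntityOfPage, publisher, itemListElement) are standard schema.org properties. The function names, argument choices, and example values are hypothetical, and real pages may need more properties than shown.

```python
import json


def organization_schema(name: str, url: str, logo: str, description: str, founder: str) -> dict:
    """Organization schema for the homepage: who publishes this site."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "description": description,
        "founder": {"@type": "Person", "name": founder},
    }


def article_schema(headline: str, author: str, date_published: str, page_url: str, publisher: str) -> dict:
    """Article schema for a blog post or guide; date_published is an ISO 8601 date string."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        "mainEntityOfPage": page_url,
        "publisher": {"@type": "Organization", "name": publisher},
    }


def breadcrumb_schema(trail: list[tuple[str, str]]) -> dict:
    """BreadcrumbList from (name, url) pairs ordered from the homepage down to the current page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration.
    crumbs = breadcrumb_schema([
        ("Home", "https://example.com/"),
        ("Guides", "https://example.com/guides/"),
        ("GEO guide", "https://example.com/guides/geo/"),
    ])
    print(json.dumps(crumbs, indent=2))
```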

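When implementing schema, use JSON-LD (a script tag with type="application/ld+json", usually placed in your HTML head), not Microdata or RDFa. JSON-LD is the format Google recommends, and because it keeps all the data in one self-contained block, it is the simplest format for any parser to read. Validate every type with the Schema.org validator, and use Google's Rich Results Test for the types Google supports as rich results.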
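JSON-LD goes inside a script element with type application/ld+json. The sketch below, a hypothetical helper rather than part of any library, serializes a schema.org dictionary into that tag with Python's json module. It also escapes "</" as "<\/", which is a valid JSON escape, so a string such as "</script>" inside your data cannot close the element early.

```python
import json


def jsonld_script(data: dict) -> str:
    """Wrap a schema.org dictionary in a JSON-LD script tag for the page's HTML."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # A literal "</" could end the <script> element early; "<\/" decodes to the same JSON string.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical page data.
    print(jsonld_script({
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": "How to get your website cited by AI search engines",
    }))
```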

Chapter 6

robots.txt Configuration for AI Crawlers

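Your robots.txt file controls which crawlers can access your site. For AI citation readiness, make sure the crawlers used by major AI search engines are allowed (a crawler that matches no rule is allowed by default), while still blocking them from private areas like dashboards, admin panels, and authentication pages. robots.txt is a request that well-behaved crawlers honor, not access control, so keep private areas behind authentication as well.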

The key AI crawlers to allow:

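GPTBot is OpenAI's crawler for collecting content that may be used to train its models. User-agent: GPTBot.

OAI-SearchBot is the crawler OpenAI uses to surface websites in ChatGPT search results. User-agent: OAI-SearchBot.

ChatGPT-User is the user agent ChatGPT uses when it visits a page in response to a user's request in conversation. User-agent: ChatGPT-User.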

ClaudeBot is Anthropic's crawler, used by Claude. User-agent: ClaudeBot.

PerplexityBot crawls for Perplexity AI. User-agent: PerplexityBot.

Googlebot crawls for Google Search and Google AI Overviews. User-agent: Googlebot.

A good robots.txt strategy for AI readiness: allow all AI crawlers to access your marketing pages, blog, resources, and tools. Disallow access to dashboard, authentication, API, admin, billing, and settings routes. Always reference your sitemap URL at the bottom of the file.
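As a sketch of that strategy, the following Python generates a robots.txt that names the AI crawlers explicitly, disallows private routes for every crawler, and ends with a Sitemap line. It then checks the result with urllib.robotparser. The route prefixes and example.com URLs are hypothetical placeholders for your own. The Disallow lines come before "Allow: /" on purpose. Python's parser uses the first matching rule, while RFC 9309 (the robots.txt standard) uses the most specific match, and this order gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Googlebot"]
# Hypothetical private routes; replace with your own.
PRIVATE_PREFIXES = ["/dashboard/", "/auth/", "/api/", "/admin/", "/billing/", "/settings/"]


def build_robots_txt(sitemap_url: str) -> str:
    disallows = [f"Disallow: {prefix}" for prefix in PRIVATE_PREFIXES]
    # One group naming every AI crawler explicitly.
    lines = [f"User-agent: {agent}" for agent in AI_CRAWLERS]
    # Disallow lines first, then Allow: / so first-match and longest-match parsers agree.
    lines += disallows + ["Allow: /", ""]
    # Every other crawler: same private routes blocked, everything else allowed by default.
    lines += ["User-agent: *"] + disallows + [""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    robots = build_robots_txt("https://example.com/sitemap.xml")
    print(robots)
    parser = RobotFileParser()
    parser.parse(robots.splitlines())
    for agent in AI_CRAWLERS:
        blog_ok = parser.can_fetch(agent, "https://example.com/blog/")
        dash_ok = parser.can_fetch(agent, "https://example.com/dashboard/")
        print(f"{agent:15} blog={blog_ok} dashboard={dash_ok}")
```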

Common mistakes to avoid: using a blanket "Disallow: /" that blocks everything, forgetting to add new AI crawlers as they emerge, and blocking CSS or JavaScript files that crawlers need to render your pages properly.

Check your current robots.txt configuration with our free domain health checker at /tools/domain-health-checker.

Chapter 7

Content Structure That Gets Cited

The way you structure your content determines whether AI models can extract clean, citable facts from it. Two pages can contain the same information, but the one with better structure gets cited while the other gets ignored.

The definition-first rule. Start every section with a direct answer to the question posed by the heading. If your H2 is "What is generative engine optimization," the first sentence should be "Generative engine optimization is..." not a three-paragraph backstory. A direct first sentence gives AI models a clean, self-contained statement they can quote.
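A rough automated version of the definition-first rule: strip a leading "What is" or "What are" (and any article) from the heading, then check whether the section's first sentence opens with that subject followed by "is" or "are". This is a heuristic sketch with a hypothetical function name, not a precise test. It will miss valid definitions phrased differently, and abbreviations such as "e.g." can end the detected first sentence early.

```python
import re

# Leading question words and articles to strip (assumptions of this heuristic).
QUESTION_PREFIX = re.compile(r"^what\s+(is|are)\s+", re.IGNORECASE)
ARTICLE = re.compile(r"^(a|an|the)\s+")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def is_definition_first(heading: str, first_paragraph: str) -> bool:
    """Heuristic: does the first sentence open with the heading's subject plus "is" or "are"?"""
    subject = QUESTION_PREFIX.sub("", heading.strip()).rstrip("?").strip().lower()
    subject = ARTICLE.sub("", subject)
    first_sentence = SENTENCE_END.split(first_paragraph.strip(), maxsplit=1)[0].lower()
    first_sentence = ARTICLE.sub("", first_sentence)
    # Allow an optional parenthetical abbreviation, as in "Generative engine optimization (GEO) is".
    pattern = re.escape(subject) + r"\s+(\([^)]*\)\s+)?(is|are)\b"
    return re.match(pattern, first_sentence) is not None


if __name__ == "__main__":
    print(is_definition_first(
        "What is generative engine optimization",
        "Generative engine optimization (GEO) is the practice of structuring your website.",
    ))
```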

Short paragraphs. Keep paragraphs under 150 words. AI models extract quotes and facts from individual paragraphs. Long, rambling paragraphs make it harder for the AI to isolate a clean quote.

FAQ blocks. Include a frequently asked questions section on content pages. Structure each question as an H3 and answer it directly in the following paragraph. Then mark up the same questions and answers with FAQ schema, so the content is available both as visible text and as structured data.
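If your FAQ block follows the H3-question, paragraph-answer pattern, you can generate FAQ schema from the same HTML so the markup always matches the visible text. This sketch uses Python's html.parser and pairs each <h3> with the first <p> after it. The class and function names are hypothetical, and pages with multi-paragraph answers or different markup would need a more careful parser.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class FaqExtractor(HTMLParser):
    """Pairs each <h3> question with the text of the first <p> that follows it."""

    def __init__(self) -> None:
        super().__init__()
        self.pairs: list[tuple[str, str]] = []
        self._question: Optional[str] = None
        self._capturing: Optional[str] = None  # "h3" or "p" while inside that element
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Capture every <h3>, but only the <p> that directly answers a pending question.
        if tag == "h3" or (tag == "p" and self._question is not None):
            self._capturing, self._text = tag, []

    def handle_data(self, data):
        if self._capturing:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        text = " ".join("".join(self._text).split())
        if tag == "h3":
            self._question = text
        else:
            self.pairs.append((self._question, text))
            self._question = None
        self._capturing = None


def faq_schema(html: str) -> dict:
    """Build FAQPage JSON-LD (as a dict) from H3-question / paragraph-answer pairs."""
    extractor = FaqExtractor()
    extractor.feed(html)
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q, "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in extractor.pairs
        ],
    }


if __name__ == "__main__":
    page = "<h3>Is llms.txt required?</h3><p>No, it is an optional proposed standard.</p>"
    print(json.dumps(faq_schema(page), indent=2))
```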

Lists and steps. When explaining a process, use numbered lists or step-by-step formatting. AI models parse structured lists more reliably than prose descriptions of sequential actions.

Clear attribution. Include author names, publication dates, and your organization name on every content page. AI models are more likely to cite content they can attribute to a specific source.

No JavaScript-only content. Ensure your important content is in the initial HTML response, not loaded dynamically via JavaScript. Many AI crawlers do not execute JavaScript, so content behind client-side rendering may be invisible.
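One way to test this is to fetch a page's raw HTML with a tool that does not run JavaScript (curl, for example) and confirm your key sentences are already in it. The sketch below covers the second half: given raw HTML as a string, it lists which key phrases are missing from the text outside <script> and <style> tags. The function name and example page are hypothetical, and exact phrase matching is a simplification.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects text from raw HTML, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def missing_from_initial_html(raw_html: str, key_phrases: list[str]) -> list[str]:
    """Return the key phrases that do not appear in the page text before any JavaScript runs."""
    parser = PageText()
    parser.feed(raw_html)
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [phrase for phrase in key_phrases if " ".join(phrase.split()).lower() not in text]


if __name__ == "__main__":
    # Hypothetical client-rendered page: the pricing sentence only exists inside a script.
    raw = (
        '<html><body><h1>Acme helps startups get customers</h1>'
        '<div id="root"></div><script>render("Pricing starts at $9")</script></body></html>'
    )
    print(missing_from_initial_html(raw, ["Acme helps startups get customers", "Pricing starts at $9"]))
```

Any phrase this reports as missing is content a crawler that skips JavaScript would not see.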

Internal linking. Link between your content pages using descriptive anchor text. This helps AI models understand the relationships between your pages and the breadth of your expertise on a topic.
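To audit anchor text, you can collect a page's internal links and flag those whose text says nothing about the destination. In this sketch, the set of generic phrases is an illustrative assumption rather than a standard, "internal" simply means the link resolves to the same host as the page, and links with no text at all (such as a bare image link) are flagged too.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlparse

# Illustrative, non-exhaustive set of anchor texts that say nothing about the target page.
GENERIC_ANCHORS = {"click here", "here", "read more", "learn more", "this page", "link"}


class LinkCollector(HTMLParser):
    """Collects (absolute URL, anchor text) pairs for every <a href> on a page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") if tag == "a" else None
        if href:
            self._href, self._text = urljoin(self.page_url, href), []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def weak_internal_links(html: str, page_url: str) -> list[tuple[str, str]]:
    """Return internal links whose anchor text is empty or generic."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(page_url).netloc
    return [
        (href, text)
        for href, text in collector.links
        if urlparse(href).netloc == host and (not text or text.lower() in GENERIC_ANCHORS)
    ]


if __name__ == "__main__":
    page = '<p>Try <a href="/tools/ai-citation-checker">our AI citation checker</a> or <a href="/blog/geo">click here</a>.</p>'
    print(weak_internal_links(page, "https://example.com/guide"))
```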

Chapter 8

How to Check Your AI Citation Readiness

You can check your AI citation readiness score right now using our free AI citation checker. Enter your URL and get results across all 18 checks in under 60 seconds. No signup required, no credit card, no email.

The checker analyzes your robots.txt configuration, schema markup, content structure, page speed, SSL status, and crawlability. You get a score out of 100 with specific recommendations for what to fix first.

Most sites score between 30 and 60 on their first check. The most common issues are blocked AI crawlers in robots.txt, missing schema markup on content pages, and content that buries answers deep in long paragraphs instead of leading with definitions.

After you run the check, prioritize fixes in this order: first, fix any crawler access issues (these completely block AI citation). Second, add Organization and Article schema to your key pages. Third, restructure your most important content to follow the definition-first pattern. Fourth, create or update your llms.txt file.

Check your site now at /tools/ai-citation-checker. Then use the free domain health checker at /tools/domain-health-checker to verify the technical foundations.

Ready to start executing?

Stop reading about growth. Get a plan built for your business and start your first mission today.
