Getting cited by AI engines is a system, not a hack. There's no single trick that flips you on. But every business that consistently shows up in AI answers does the same fundamental things, in roughly the same sequence. This is the playbook.
It's organized in three layers: technical foundations (what makes you crawlable and parseable), content patterns (what makes you citable), and authority signals (what makes you trusted). Skip any layer and the others can't compensate.
Layer 1: Technical foundations
1.1 Make sure AI crawlers can reach your site
Every major AI engine has its own crawler. They follow robots.txt rules the same way Googlebot does. Many websites — especially older ones — implicitly or explicitly block them. Open yourdomain.com/robots.txt and verify that none of these user agents are disallowed:
- GPTBot — OpenAI's training crawler
- OAI-SearchBot — OpenAI's live search crawler
- ChatGPT-User — fetches pages when users click links in ChatGPT
- ClaudeBot / Claude-Web — Anthropic crawlers
- PerplexityBot / Perplexity-User — Perplexity crawlers
- Google-Extended — Google's AI training (separate from Googlebot)
- Bytespider, Amazonbot, meta-externalagent — emerging engines worth allowing
If your robots.txt doesn't explicitly mention them and has a permissive default (User-agent: * Allow: /), they can crawl. But explicit allow-lines remove ambiguity and signal that you welcome the traffic.
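As a sketch, a permissive robots.txt that names the major AI crawlers explicitly might look like this (keep any Disallow rules you already rely on; the exact set of user agents you allow is your call):

```
# Explicitly welcome the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Permissive default for everyone else
User-agent: *
Allow: /
```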
1.2 Server-render your content
AI crawlers vary in how aggressively they execute JavaScript. Some don't run JS at all. Others run it with strict timeouts. The safe assumption: if it's not in your HTML response, it's not getting indexed.
Static HTML, SSR (server-side rendering), and SSG (static site generation) all work. SPAs without SSR fail. If you're on a modern framework (Next.js, Astro, Nuxt, SvelteKit), you're probably fine. If you're on an older React or Vue SPA, this is a foundational fix worth prioritizing.
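A quick way to test this yourself: fetch the page the way a non-JS crawler would (a plain HTTP request, no browser) and check whether your key content is in the raw response. A minimal Python sketch — the URLs and marker text are placeholders, and `fetch_raw_html` is a hypothetical helper, not part of any library:

```python
import urllib.request

def server_rendered(html: str, marker: str) -> bool:
    """True if the marker text appears in the raw HTML response,
    i.e. without any JavaScript having executed."""
    return marker.lower() in html.lower()

def fetch_raw_html(url: str) -> str:
    # A plain fetch with no JS execution: the same view a non-JS crawler gets.
    req = urllib.request.Request(url, headers={"User-Agent": "raw-html-check"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Inline examples (a real check would pass fetch_raw_html(url) instead):
ssr_page = "<html><body><h1>What does Reffed do?</h1><p>Answer...</p></body></html>"
spa_page = "<html><body><div id='root'></div><script src='app.js'></script></body></html>"
print(server_rendered(ssr_page, "What does Reffed do?"))  # True
print(server_rendered(spa_page, "What does Reffed do?"))  # False
```

If your most important copy only shows up after a headless browser runs the page, assume some AI crawlers never see it.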
1.3 Make your pages fast and clean
AI crawlers have crawl budgets like Google. A slow site gets less coverage. A site with broken links, redirect loops, or 5xx errors during crawling gets de-prioritized. The same Core Web Vitals work that helps Google rankings also helps AI coverage. There's no separate "AI speed" target — just be fast for everyone.
1.4 Deploy comprehensive JSON-LD schema
This is the single highest-leverage technical fix. Schema gives AI engines structured truth they can rely on instead of guesses. Required schema for most businesses:
- Organization (or LocalBusiness if you have a physical location) on every page
- WebSite with SearchAction on every page
- BreadcrumbList on internal pages
- Article on blog posts — with author, datePublished, dateModified, and inLanguage
- FAQPage wherever you have Q&A content (it's the #1 schema for AI extraction)
- Product with Offer for e-commerce; Service for services
- HowTo for step-by-step content
- Review / AggregateRating where you have honest review data
Validate everything with Google's Rich Results Test before deploying. Bad schema is sometimes worse than no schema.
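Because FAQPage is the highest-value type for AI extraction, it's worth seeing the shape. A minimal sketch that builds FAQPage JSON-LD from question–answer pairs (the Q&A text here is hypothetical; follow the schema.org FAQPage type for the real field names):

```python
import json

def faq_page_jsonld(questions: list[tuple[str, str]]) -> dict:
    """Build FAQPage JSON-LD from (question, answer) pairs,
    following the schema.org FAQPage / Question / Answer types."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in questions
        ],
    }

# Hypothetical Q&A content:
faq = faq_page_jsonld([
    ("How long does it take to see results?",
     "Most sites see citation-share movement within 3-4 months."),
])
# Embed the output in the page head as:
#   <script type="application/ld+json"> ... </script>
print(json.dumps(faq, indent=2))
```

The same pattern extends to Organization, Article, and the other types above: build a plain dict matching the schema.org vocabulary, serialize it, and embed it server-side so it's present in the raw HTML.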
Layer 2: Content patterns that get cited
2.1 Question-first structure
AI engines extract content. The shape they prefer: an H2 that's a real question, followed immediately by a self-contained 40–80 word answer paragraph, followed by detail and nuance. This is the inverted-pyramid pattern news writers have used for a century, and it's now the dominant pattern for AI-citable content.
Bad headings: "Our Solution," "How We Work," "About the Process."
Good headings: "What does Reffed actually do?", "How long does it take to see results?", "Why doesn't ChatGPT mention my business?"
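A minimal skeleton of the pattern, with placeholder content:

```markdown
## How long does it take to see results?

Most sites see movement within 3-4 months. [40-80 word self-contained
answer: the number, the conditions, the one caveat that matters.]

### What affects the timeline

[Detail and nuance: edge cases, methodology, everything that did not
fit in the answer paragraph.]
```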
2.2 Factual density over marketing prose
Compare two sentences:
- "Our innovative platform unlocks meaningful results for our customers."
- "Reffed customers see citation share grow an average of 31% within 60 days, measured across ChatGPT, Claude, Perplexity, Google AI Overviews, Gemini, and Copilot."
The first is uncitable. The second is extractable, specific, verifiable. Every page you want cited should be packed with concrete claims: numbers, named comparisons, dated references, specific time-windows, named techniques. Marketing fluff doesn't get pulled into AI answers.
2.3 Original frameworks and naming
AI engines love sites that contribute novel structure. If you name a concept ("the seven reasons ChatGPT doesn't mention your business"), explain it cleanly, and other sites start referencing it, you become the canonical source. Naming is a defensible moat in content — it's much harder to compete with a named framework than with a generic explainer.
Don't force this. Don't invent terms that don't add clarity. But when you genuinely have a clear way to organize a problem, claim it.
2.4 First-party data and original research
Almost nothing earns citations like original data. Run a survey, analyze your own customer base, publish findings. Even a small dataset, well-explained, becomes a magnet for citations because every competing piece of content will reach for it as supporting evidence. AI engines particularly reward sources that originate data versus aggregating it.
2.5 Comparison content
"X vs Y" is one of the most prompted question shapes in AI assistants. If your category has named competitors, write honest comparison pages — even comparisons against tools that aren't yours. The traffic is exceptional and the citation rate is high, because AI engines love structured comparison content.
Honesty matters. The fastest way to get demoted across AI engines is to write thinly disguised promotional content masquerading as comparison. Genuinely acknowledge competitors' strengths. Engines that detect bias steer around you.
Layer 3: Authority signals that compound
3.1 Third-party mentions in your category
AI engines build trust through corroboration. They believe what multiple credible sources independently report. The single biggest predictor of whether a business gets cited isn't anything on the page itself — it's how many times the business is mentioned, by name, in trusted third-party content about its category.
High-leverage sources:
- Industry-specific "best of" roundups — these get crawled and recrawled constantly
- Trade publications in your category
- Niche directories (Capterra, G2, Clutch, etc., for software; geographic + industry directories for local)
- Podcast appearances — most major podcasts publish transcripts, which get crawled
- Guest posts on category-relevant blogs
- Wikipedia where editorially defensible
- Reddit and forum discussions — AI engines now actively crawl and weight these
3.2 Author bylines and expertise signals
If your content is published anonymously, AI engines have nothing to weigh it against. Named authors with consistent bylines, linked LinkedIn profiles, demonstrated expertise (published work, credentials, speaking history) and clear domain reputation get cited more than identical anonymous content.
If you can't put a real human name on the byline — for legitimate reasons — at least put an organizational identity with verifiable credentials. "Reffed Editorial" is weaker than "Maria Chen, Reffed" but stronger than nothing.
3.3 Citation networks and outbound links
When you cite authoritative sources, you become part of a credibility network the AI engines map. Quote experts (by name). Link to original sources. Reference primary data instead of secondary writeups. The act of citing well makes you more citable.
3.4 Update cadence
AI engines explicitly weight content recency for many query types. A page that hasn't been touched in two years competes poorly against a similar page updated last month. Establish a quarterly review cadence for your most important pages, refresh the data, update the "last updated" date, and re-publish.
The 30-day starter sequence
For a small business or solo founder doing this without help:
Week 1 — Technical foundations
Allow AI crawlers in robots.txt. Audit schema and deploy Organization, WebSite, BreadcrumbList. Add FAQPage schema to your FAQ section if you have one. Verify everything in Google's Rich Results Test.
Week 2 — Content restructure
Pick your three most important commercial pages. Rewrite headings as questions. Lead each section with a 40–80 word direct answer. Add one specific number or named comparison per section. Tighten meta titles and descriptions.
Week 3 — Original publication
Publish one substantive article. Ideally 1,500+ words, with original framing or first-party data, structured for citation. Add it to your sitemap, link to it from your homepage, share it to category communities. This piece becomes long-term citation fuel.
Week 4 — Third-party signals
Identify 15 third-party sources where you should be mentioned but aren't. Reach out to half of them with genuine pitches (guest post, expert quote, product inclusion). The other half: submit to listing directories where applicable.
What progress looks like
Measure with a fixed prompt list — 20 queries that real customers would type — checked monthly across ChatGPT, Claude, and Perplexity. Track citation share (% of prompts naming you) and competitor share (% naming a specific named competitor instead).
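The bookkeeping here is simple enough to sketch. A minimal Python example of computing citation share from a month's prompt checks — the prompts, sources, and brand names below are hypothetical:

```python
def citation_share(results: dict[str, list[str]], brand: str) -> float:
    """Share of tracked prompts whose AI answer names the brand.
    `results` maps each prompt to the sources cited in that answer."""
    if not results:
        return 0.0
    hits = sum(
        1 for sources in results.values()
        if any(brand.lower() in s.lower() for s in sources)
    )
    return hits / len(results)

# Hypothetical results for 4 of the 20 tracked prompts in one month:
january = {
    "best GEO tools": ["Reffed", "CompetitorX"],
    "how to get cited by ChatGPT": ["CompetitorX"],
    "GEO vs SEO": ["Reffed"],
    "AI visibility audit": [],
}
print(citation_share(january, "Reffed"))       # 0.5
print(citation_share(january, "CompetitorX"))  # 0.5
```

Run the same fixed prompt list each month, per engine, and track both your share and a named competitor's share over time; the trend matters more than any single month's number.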
Realistic expectations:
- Months 1–2: technical fixes deploy, schema is recognized, AI crawl coverage improves. Citation share may not move yet.
- Months 3–4: content restructure and original publications start getting referenced. Citation share visibly improves.
- Months 4–6: third-party signals compound. Citation share growth accelerates. Competitor share starts dropping.
- Month 6+: maintain cadence. The site becomes a default citation source in your category.
Don't pixel-overlay your fixes
Worth repeating because it's the most common mistake: tools that "improve your SEO" by injecting changes via JavaScript at the user's browser don't work for AI crawlers. The crawlers don't execute the overlay. Your real HTML, what crawlers actually see, hasn't changed. The whole premise is broken for GEO.
Whatever you do, do it in the actual source — your CMS, your repo, your real meta tags. If you cancel a tool tomorrow, you should be able to see the change still on your site. If it disappears when the tool turns off, it was overlay, not optimization.
Run a free audit
Reffed automatically checks all 40+ signals AI engines use to pick sources. Full report in 60 seconds.