AI engines don't publish their ranking algorithms. There's no equivalent of Google's quality rater guidelines. But after running the Reffed pipeline on hundreds of sites, watching which signals correlate with citation gains, and triangulating against public statements from engine teams, twelve signals consistently show up as the load-bearing ones.

They're not equal in weight. The first six matter much more than the last six. Work top-down.

1. Underlying search retrieval rank

The most underappreciated signal. Every major AI engine starts with a retrieval step: it runs your prompt (or a derived query) against an index. Results in the top 10–20 of that index are the working set the model considers. If you're not in the top 20 organic results for your target queries on Google or Bing (depending on which the engine uses), you're invisible no matter how perfectly optimized your site is.

Fix: traditional SEO. Rank for your queries. The foundation of GEO is still SEO.

2. Structured data (JSON-LD schema)

Schema gives AI engines explicit truth instead of inference. Sites with complete Organization, WebSite, FAQPage, Article, and Product/Service schema get cited noticeably more than sites without. FAQPage in particular is among the highest-correlation signals we observe.

Fix: see our schema guide for which types matter and how to deploy them.
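
As an illustration of what deploying FAQPage schema involves, here is a minimal Python sketch that builds the JSON-LD from question and answer pairs and wraps it in a script tag. The helper names `faq_jsonld` and `as_script_tag` and the "Example Co" content are hypothetical; the property names (`mainEntity`, `acceptedAnswer`, `text`) come from the schema.org vocabulary.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize to a JSON-LD script tag, escaping '</' so the tag cannot be closed early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical content for illustration only.
example = faq_jsonld([
    ("What does Example Co do?", "Example Co is a payroll platform for restaurants."),
])
print(as_script_tag(example))
```

The answers in the markup should match the visible text on the page, since the schema is meant to describe content that is actually there.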

3. Entity clarity and consistency

AI engines build an entity model of your business by reading your site and what others say about it. If your homepage describes you one way, your about page another, and third-party listings a third, the engine concludes you're not a coherent entity. Sites that are unambiguous about who they are, what category they belong to, and who they serve get cited more.

Fix: write a single sentence describing your business (category + customer + outcome). Use it in your homepage, your about page, your meta description, your Organization schema, and your social profiles. Consistency compounds.
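
One way to audit this is to check that the canonical sentence actually appears on every surface. The sketch below assumes you have already copied the text of each surface into a dictionary by hand; the function name `surfaces_missing_description` and all sample strings are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(text.lower().split())


def surfaces_missing_description(canonical, surfaces):
    """Return the names of surfaces whose text does not contain the canonical sentence."""
    target = normalize(canonical)
    return [name for name, text in surfaces.items() if target not in normalize(text)]


# Hypothetical business and copy, for illustration only.
canonical = "Example Co is a payroll platform that helps restaurants pay staff on time."
surfaces = {
    "homepage": "Welcome. Example Co is a payroll platform that helps restaurants pay staff on time.",
    "about": "Example Co builds innovative HR solutions.",
    "meta_description": "Example Co is a payroll platform that helps  restaurants pay staff on time.",
}
print(surfaces_missing_description(canonical, surfaces))  # prints ['about']
```

An exact-match check is deliberately strict. It flags paraphrases too, which is usually what you want when the goal is one consistent description.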

4. Third-party citation density

AI engines weight what other credible sources say about you. The more often you're mentioned, by name, in trusted third-party content about your category, the more confident the engine becomes that you're a real authority. This is the single biggest off-site signal.

Fix: identify 10–15 high-authority third-party sources in your category. Pitch each one. Offer an original contribution (data, an expert quote, a guest post) rather than simply asking for a mention.

5. Question-structured content

AI engines extract content. The shape they reward most: H2 headings written as real questions, followed immediately by a self-contained 40–80 word answer. This pattern is so consistent across engines that it's worth treating as a hard rule for any content you want cited.

Fix: rewrite the headings of your top 5–10 commercial pages as questions. Lead each section with a direct answer in 40–80 words.
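
The rule is easy to check mechanically. This is a minimal sketch, assuming you have extracted each section's heading and first paragraph yourself; `check_section` is a hypothetical helper, and counting words by splitting on whitespace is a simplification.

```python
def check_section(heading, first_paragraph, min_words=40, max_words=80):
    """Return a list of problems with a section's heading and lead answer."""
    problems = []
    if not heading.strip().endswith("?"):
        problems.append("heading is not phrased as a question")
    word_count = len(first_paragraph.split())
    if not min_words <= word_count <= max_words:
        problems.append(
            f"lead answer is {word_count} words, outside {min_words}-{max_words}"
        )
    return problems


# Hypothetical section for illustration.
print(check_section("Our services", "We do great work."))
```

An empty list means the section fits the pattern. Anything else is a candidate for rewriting.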

6. Factual density and specificity

Marketing prose ("innovative solutions, world-class outcomes") is dead weight: it gives an engine nothing to extract. Specific claims (numbers, named comparisons, dated references, concrete examples) get pulled into answers. AI engines actively prefer content packed with verifiable specifics over content full of adjectives.

Fix: for every page you want cited, audit each section. Can you replace adjectives with numbers? Can you name a specific comparison? Can you cite a date or source? Pack the specifics in.

7. AI-crawler accessibility

If GPTBot, ClaudeBot, PerplexityBot, and Google-Extended can't reach your site (blocked in robots.txt, hidden behind JS that doesn't render server-side, behind aggressive bot-blocking firewalls), you're invisible. This sounds basic, but many sites have at least one crawler blocked by accident.

Fix: explicitly allow AI crawlers in robots.txt. Server-render your content. Don't run aggressive WAF rules that block legitimate crawler IPs.
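
To check the robots.txt part, Python's standard-library `urllib.robotparser` can evaluate a robots.txt file against each agent name. The sketch below parses the file contents from a string, so it makes no network request; `blocked_agents` and the sample rules are hypothetical. Note that, as far as Google's documentation describes it, Google-Extended is a robots.txt control token rather than a crawler with its own user-agent string, but it can be tested the same way.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt, url, agents=AI_AGENTS):
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks one AI crawler, perhaps by accident.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_agents(sample, "https://example.com/pricing"))  # prints ['GPTBot']
```

This covers only robots.txt. Firewall blocks and content that renders only in client-side JavaScript need to be tested separately, for example by requesting the page without JavaScript and inspecting the HTML.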

8. Content recency

AI engines weight freshness heavily for most query types. A page with a 2023 dateModified competes poorly against a 2026 page covering the same topic, even if the older page is technically more comprehensive. Recent content signals current relevance.

Fix: establish a quarterly review cycle for your most important pages. Update statistics, refresh examples, update dateModified to match, and republish. Even small content updates that trigger a fresh crawl help.
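
A quarterly cycle is easier to keep if stale pages are flagged automatically. This sketch assumes you already have each page's dateModified as an ISO date string; the 90-day threshold (roughly one quarter), the function name `stale_pages`, and the sample URLs and dates are all assumptions for illustration.

```python
from datetime import date


def stale_pages(pages, today, max_age_days=90):
    """Return (url, age_in_days) for pages older than max_age_days, oldest first."""
    stale = []
    for url, modified in pages.items():
        age = (today - date.fromisoformat(modified)).days
        if age > max_age_days:
            stale.append((url, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical pages and dates.
pages = {
    "/pricing": "2026-01-10",
    "/guide": "2023-06-01",
}
for url, age in stale_pages(pages, today=date(2026, 2, 1)):
    print(f"{url} was last modified {age} days ago")
```

Passing `today` explicitly keeps the function deterministic; in a scheduled job you would pass `date.today()`.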

9. Author and publisher authority

Content with named human authors and verifiable expertise outperforms anonymous content. Author bios, linked LinkedIn profiles, named credentials, and consistent bylines all feed the engine's confidence model. Publisher reputation works the same way: established brands get cited more, but emerging brands can close the gap faster through consistent publication and clean entity signaling.

Fix: add author bylines with linked profiles. If you publish under a brand byline, ensure the brand has consistent and verifiable identity signals (Organization schema, sameAs links, etc.).

10. Outbound citation network

When you cite authoritative sources by name and link to original data, you become part of the citation network the engines map. The act of citing well makes you more citable. Pages with strong outbound citation networks get cited at meaningfully higher rates than pages that float in isolation.

Fix: quote experts by name. Link to original sources. Reference primary research. Build outbound citations into your content as a habit, not an afterthought.

11. Originality (anti-AI-generic signals)

AI engines are increasingly aware that they're being fed their own output back. Content that pattern-matches to generic AI generation (common phrasings, predictable structure, surface-level coverage of well-known topics) gets down-weighted. Original frameworks, first-party data, unique reasoning, and idiosyncratic voice all get rewarded.

Fix: contribute something only you can contribute: original data from your business, named frameworks for problems in your space, a distinct point of view. Don't just aggregate what's already been said. Synthesize and add.

12. User signals (CTR, dwell time)

For engines that have access to user behavior data (Google AI Overviews via Google's broader telemetry, Microsoft Copilot via Bing's), engagement signals from real users contribute to citation decisions. Pages that get clicked, read, and acted on get cited more than pages users bounce from. This signal is weaker for engines without their own search property but still present indirectly through retrieval quality.

Fix: standard UX work. Fast loading, clear structure, content that delivers what the user came for, internal linking that keeps users engaged.

How to use this list

If you have limited time and want maximum impact, focus on signals 1–6. Those produce the most measurable citation share gains in the first 60 days. Signals 7–8 are critical foundations but mostly binary (you either have them or you don't). Signals 9–12 compound over months but rarely produce overnight gains.

Build a baseline: pick 20 target prompts, run them across ChatGPT, Claude, and Perplexity, and note which answers mention you. Work the signals in priority order. Re-check monthly. Trust the trend line.
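
The baseline can be kept as a simple per-engine mention rate. The sketch below assumes you have pasted each engine's answers into lists by hand (it calls no engine API); `mention_share`, the brand "Example Co", and the sample answers are hypothetical, and a case-insensitive substring match is a crude stand-in for reading each answer.

```python
def mention_share(answers_by_engine, brand):
    """Return, per engine, the fraction of answers that mention the brand."""
    needle = brand.lower()
    shares = {}
    for engine, answers in answers_by_engine.items():
        hits = sum(1 for text in answers if needle in text.lower())
        shares[engine] = hits / len(answers) if answers else 0.0
    return shares


# Hypothetical answers collected for two prompts per engine.
baseline = mention_share(
    {
        "ChatGPT": ["Popular options include Example Co and others.", "No clear leader."],
        "Claude": ["Consider several vendors.", "It depends on your needs."],
        "Perplexity": ["Example Co is often cited.", "example co appears in reviews."],
    },
    brand="Example Co",
)
print(baseline)  # prints {'ChatGPT': 0.5, 'Claude': 0.0, 'Perplexity': 1.0}
```

Saving each month's output alongside the date gives you the trend line to compare against.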

Audit all 12 signals at once

Reffed's free audit checks each of these signals on your site and ranks fixes by expected impact.
