How Does ChatGPT Decide Which Sites to Recommend? 5 Signals That Actually Matter

"How does ChatGPT decide which sites to recommend?" is the question every business owner asks once they realize AI search matters. The answers floating around online are usually one of two extremes: either "it's a black box, no one knows" or a 5,000-word agency post selling consulting services. Both are unhelpful. The actual mechanics aren't a mystery. They're just not widely understood yet because the category is new.

Here's a clear, no-jargon explanation of the five signals that actually drive AI citations. Understanding these helps you stop chasing the wrong things and start fixing the right ones.

The two modes ChatGPT operates in

Before getting into the signals, one critical distinction: ChatGPT has two completely different ways of generating answers.

Training mode uses what the model "remembers" from its training data. This mode is static: you can't influence it after the fact. If your brand wasn't widely discussed on the web before the training cutoff, you're not in there.

Browsing mode queries the live web (via Bing) at the moment the user asks. ChatGPT pulls in current pages, reads them, and synthesizes an answer with citations. This is the mode you can actively optimize for today.

Most "how do I get cited" optimization is really about browsing mode. The five signals below all apply to browsing mode. Training-mode citation requires years of authority building, which is a different and slower game.

Signal 1: Bing indexation (the gatekeeper)

When ChatGPT browses for current information, it queries Bing — not Google. This is the single fact that most people miss. If your site isn't well-indexed in Bing, ChatGPT literally cannot find you during live retrieval. It doesn't matter how well you rank in Google. Bing is the gatekeeper for ChatGPT browsing.

This explains a lot of mysteries. Sites that rank #1 on Google but get zero ChatGPT citations almost always have a Bing indexation problem. Sites that rank middling on Google but score well in Bing often get cited more than their Google rankings would suggest.

What this means for you: register at Bing Webmaster Tools, submit your sitemap, and request indexing on your top pages. This is the highest-leverage 30 minutes most business owners can spend on AI search optimization.
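If your site doesn't publish a sitemap yet, a short script can generate a minimal one. The sketch below uses only Python's standard library. The build_sitemap helper and the example.com URLs are hypothetical placeholders, and submitting the resulting file in Bing Webmaster Tools remains a manual step.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of (url, last_modified) pairs.

    last_modified is an ISO date string such as "2026-01-15".
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages; replace these with your own top URLs.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2026-01-15"),
    ("https://example.com/pricing", "2026-01-10"),
])
print(sitemap_xml)
```

Save the output as sitemap.xml at your site root, then submit that URL in Bing Webmaster Tools.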

Signal 2: Information density and "answer capsules"

When ChatGPT reads candidate pages, it's looking for specific kinds of passages, which content strategists now call "answer capsules." These are self-contained chunks of text, typically 250–400 characters, that directly answer a query: question-and-answer pairs, definitions, comparison tables, and step-by-step lists.

Long marketing prose with mid-paragraph pivots doesn't extract well. Even if the prose is excellent, the AI struggles to pull a clean citation. Conversely, an FAQ section with 8 clear Q&A pairs is a citation magnet: the AI can grab any single Q&A and cite it on its own.
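One rough way to audit your own pages is to count how many paragraphs fall inside the 250–400 character range mentioned above. The sketch below is a crude proxy, not how any AI engine actually scores content: it checks length only, and the find_answer_capsules helper and the sample text are hypothetical.

```python
def find_answer_capsules(text, min_chars=250, max_chars=400):
    """Return paragraphs whose length falls inside the capsule range.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [p for p in paragraphs if min_chars <= len(p) <= max_chars]


# Stand-in content: one short line, one capsule-sized block, one long block.
page_text = "\n\n".join([
    "Too short to stand alone.",
    "A" * 300,  # stands in for a 300-character direct answer
    "B" * 900,  # stands in for a long marketing paragraph
])

capsules = find_answer_capsules(page_text)
print(len(capsules))  # prints 1
```

A page where this count is zero is a reasonable candidate for restructuring, though a human still needs to judge whether each chunk actually answers a question.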

The pattern AI engines reward: clear question, direct answer, in a structured format (with FAQPage schema wrapping it).

What this means for you: rewrite your content so the most important information lives in extractable chunks. Add an FAQ section to your homepage and main pages. Format answers as standalone paragraphs that work without surrounding context. Wrap them in JSON-LD FAQPage schema.
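For the schema step, the FAQPage structure defined by schema.org is a list of Question items, each with an acceptedAnswer. Here is a minimal sketch that builds that JSON-LD from plain question-and-answer pairs. The helper names and the sample Q&A are hypothetical; the property names follow schema.org.

```python
import json


def build_faq_schema(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def to_json_ld_script(schema):
    """Wrap a schema object in the script tag used for JSON-LD."""
    # Escape "</" so answer text cannot close the script tag early.
    payload = json.dumps(schema, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical Q&A pair; use the same text that appears on the page.
faq = build_faq_schema([
    ("Do you offer same-day service?", "Yes, for orders placed before noon."),
])
print(to_json_ld_script(faq))
```

Paste the printed script tag into the page that shows the same questions and answers, so the markup matches the visible content.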

Signal 3: Brand authority signals from across the web

AI engines weight authority heavily — sometimes more heavily than relevance. A site cited by 50 other sources gets surfaced more than an equally relevant site with 5 sources. The authority signals that matter include backlinks, brand mentions across the web (even unlinked), Wikipedia presence, Crunchbase entries, press coverage, and discussion in communities like Reddit and Hacker News.

This is why famous brands sometimes get cited despite imperfect technical setup. Linear scores 68 on our audit but still gets cited by ChatGPT because of overwhelming brand-authority signals from elsewhere on the web. That authority compensates for what's missing on-site.

For smaller businesses without that authority cushion, the calculus reverses. You can't outrank a brand-authority site by being slightly better optimized. You can only win the queries that don't depend on brand authority — long-tail, niche-specific, intent-specific queries.

What this means for you: work on technical readiness first (it's controllable), but plan a long-term authority play. Pitch one industry publication a month. Get listed in 5 directories. Submit your story to local press. Authority compounds.

Signal 4: Schema markup and structured data

JSON-LD schema is how you explicitly tell AI engines what your site is, what it offers, where it operates, and how it relates to other entities. Without schema, engines have to infer everything from prose, and they often guess wrong or skip you.

The schemas that matter most:

Organization (or LocalBusiness for local services) — declares what kind of entity you are
WebSite — basic site metadata
FAQPage — wraps your FAQ sections so engines extract them cleanly
Product or Service — declares specific offerings
Article — for blog posts, includes author and dates for freshness signals

The sameAs property in Organization schema is especially powerful — it links your site to your verified profiles on LinkedIn, Crunchbase, Wikipedia, and Twitter/X. This verification helps AI engines confirm you're a real entity, not a placeholder site.

What this means for you: if you have no schema today, adding Organization + WebSite + FAQPage schemas is a 2–3 hour project that moves citation signals more than almost any other single change.
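As a starting point, the Organization and WebSite objects can be sketched as below. The company name, URLs, and profile links are hypothetical placeholders, and the helper functions are illustrative; the property names (name, url, sameAs) come from schema.org.

```python
import json


def build_organization_schema(name, url, profile_urls):
    """Build a schema.org Organization object with sameAs profile links."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists profiles elsewhere on the web for the same entity.
        "sameAs": list(profile_urls),
    }


def build_website_schema(name, url):
    """Build a schema.org WebSite object with basic site metadata."""
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": name,
        "url": url,
    }


# Hypothetical business and profile URLs.
org = build_organization_schema(
    "Example Co",
    "https://example.com",
    [
        "https://www.linkedin.com/company/example-co",
        "https://www.crunchbase.com/organization/example-co",
    ],
)
site = build_website_schema("Example Co", "https://example.com")
print(json.dumps([org, site], indent=2))
```

The printed JSON goes inside a script tag of type application/ld+json on your homepage. For a local service business, swapping "Organization" for "LocalBusiness" is the usual adjustment.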

Signal 5: Content freshness

AI engines prefer pages that have been updated recently. A blog post from 2022 about "current AI trends" hurts more than it helps because it signals stale content. Engines look at Article schema modified dates, on-page "last updated" indicators, and time-anchored claims in the content itself.

This affects time-sensitive queries the most. "What's the best [tool category] in 2026" weights freshness heavily. "What is [established concept]" weights freshness less.

The implication for your content strategy: maintenance beats publishing volume. A site with 20 pages, all updated annually, often outranks a site with 200 pages, half of which haven't been touched in 3 years. Stale content drags down site-wide signals.

What this means for you: establish a refresh cadence. Every 6 months, audit your main pages. Update specific data points, refresh examples, and update the dateModified value in your Article schema. This is unglamorous but reliable work.
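A refresh audit can start from a simple list of pages and their last-modified dates. The sketch below flags anything older than roughly six months. The find_stale_pages helper, the 182-day threshold, and the sample pages are assumptions for illustration; the dates would come from your CMS or from the dateModified values in your Article schema.

```python
from datetime import date, timedelta


def find_stale_pages(pages, today, max_age_days=182):
    """Return paths whose last-modified date is older than max_age_days.

    pages maps a path to an ISO date string such as "2025-11-02".
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        path
        for path, modified in pages.items()
        if date.fromisoformat(modified) < cutoff
    )


# Hypothetical pages and dates.
pages = {
    "/pricing": "2025-11-02",
    "/blog/ai-trends": "2022-03-14",
    "/faq": "2025-04-01",
}

stale = find_stale_pages(pages, today=date(2026, 1, 15))
print(stale)  # prints ['/blog/ai-trends', '/faq']
```

Change the modified date only after the content itself has actually been updated, so the date and the page stay consistent.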

The five signals, prioritized

If you can only work on three things, here's the order:

Bing indexation — the gatekeeper. Without it, nothing else matters for browsing mode.
Schema markup + FAQ structure — together, these handle Signals 2 and 4.
Authority signals — the long game, but start now because it compounds.

Signal 5 (freshness) is maintenance, not a one-time fix. Build it into your workflow.

What AI engines do NOT use (despite what some agencies claim)

A few things get sold as "AI search signals" but don't actually matter much:

Keyword density. AI engines work on semantic understanding, not keyword stuffing. Repeating "best [keyword] in 2026" 40 times in your content doesn't help.
Meta descriptions. Important for Google clicks, mostly ignored by AI synthesis.
Word count for its own sake. 5,000 words of fluff loses to 800 words of substance.
"AI-friendly" hosting. Hosting doesn't affect citation. Anyone selling you "AI-optimized hosting" is selling vapor.

The 60-second diagnostic

If you want to see where your site sits on these five signals right now, run our free Reffed audit. It checks all of them — robots.txt, Bing visibility, schema coverage, content depth, freshness signals — and gives you a ranked priority list of what to fix first. No signup, no credit card, 60 seconds.

For deeper dives on the individual signals: the 2026 robots.txt setup covers crawler access in detail, our schema markup guide covers which JSON-LD types actually move citation share, and the 12 signals AI engines use expands the list above to include secondary factors that compound the primary five.

The mechanics behind AI citation aren't a black box. They're a few specific signals you can audit, fix, and improve over time. The businesses that understand this and act on it now will be the ones AI assistants recommend in 12 months. The rest will keep wondering why ChatGPT keeps recommending their competitors.