Disclosure up top: I run Reffed, which competes with Otterly in the AI search visibility space. So you should read this with appropriate skepticism. I'm going to try to be fair about where Otterly outperforms Reffed and where it doesn't. If you want a sales pitch, this isn't it. If you want my actual workflow for tracking AI citations on my own sites, this is.

I subscribed to Otterly AI in February 2026 at the $29/mo Lite tier. I ran it for three months on three different domains — Reffed itself, a personal portfolio site, and a client's B2B SaaS. In May 2026 I downgraded to free and built the rest of my measurement stack from other tools. This post explains why, and what I'd recommend depending on what you're actually trying to do.

What Otterly does well

Three things, genuinely.

Prompt-by-prompt visibility tracking. Otterly lets you define a library of buyer prompts ("what's the best CRM for a 10-person sales team," "AI search visibility tools for small businesses") and runs each one regularly across ChatGPT, Perplexity, and a few other engines. You see exactly which prompts cite your brand and which don't. This is genuinely useful, and Otterly does it better than the alternatives I've tried.

Competitor benchmarking on those prompts. When Otterly runs a prompt, it logs every brand mentioned, not just yours. Over time this builds a competitive intelligence dataset showing share of voice per query category. If you want to know whether your brand is gaining or losing ground against named competitors in specific buyer prompts, Otterly is the cleanest tool I've used for it.
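Share of voice, in this sense, is each brand's fraction of all brand mentions within a prompt category. Here's a minimal sketch of that calculation, assuming you keep your own log of (category, brand) pairs, one pair per brand mentioned in a response. The record shape, the `share_of_voice` name, and the brand names are hypothetical. This is not Otterly's export format or its internal method.

```python
from collections import Counter, defaultdict

def share_of_voice(mentions):
    """Return {category: {brand: share}} from (category, brand) records.

    Each record is one brand appearing in one AI response (hypothetical
    log format). Shares within a category sum to 1.0.
    """
    counts = defaultdict(Counter)
    for category, brand in mentions:
        counts[category][brand] += 1
    result = {}
    for category, brand_counts in counts.items():
        total = sum(brand_counts.values())
        result[category] = {b: n / total for b, n in brand_counts.items()}
    return result

# Hypothetical log: brands mentioned across runs of "CRM" buyer prompts.
log = [
    ("crm", "AcmeCRM"), ("crm", "OurBrand"),
    ("crm", "AcmeCRM"), ("crm", "BetaSales"),
]
print(share_of_voice(log)["crm"]["AcmeCRM"])  # 0.5
```

Tracking these shares week over week is what turns the raw mention log into the "gaining or losing ground" signal.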

Historical trend data. Three months of weekly snapshots gave me a clear picture of how citation rates moved as I shipped content. Not perfect — ChatGPT response variance adds noise — but directionally useful enough to make decisions on.

What Otterly didn't do for me

Four things mattered enough that I downgraded.

The prompts are still my prompts. Otterly only tracks the prompts I tell it to track. If a buyer asks ChatGPT something I didn't think to define, I have no visibility into whether I was cited or not. This is a fundamental limitation of any prompt-library tool — you only see what you measure, and the AI citation landscape has too many high-intent queries for a hand-curated library to cover them all.

Citations vs mentions are blurred. Otterly logs when your brand appears in an AI response, but doesn't always distinguish between a citation (with link) and a mention (without link). Those are different beasts strategically — a citation drives traffic, a mention drives recall and won't necessarily produce a session. The reporting is improving but wasn't there for me three months ago.
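If you log responses yourself, you can approximate the split mechanically. A citation means one of the response's linked sources is on your domain. A mention means your brand name appears in the text without such a link. Here's a rough sketch under those assumptions. `classify`, its inputs, and the example domain are hypothetical, and real responses may need fuzzier brand matching than a substring check.

```python
from urllib.parse import urlparse

def classify(response_text: str, cited_urls: list[str], brand: str, domain: str) -> str:
    """Label one AI response as 'citation', 'mention', or 'absent'.

    citation: a cited URL points at our domain or one of its subdomains.
    mention:  the brand name appears in the text, but no cited URL is ours.
    absent:   neither.
    """
    domain = domain.lower()
    for url in cited_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return "citation"
    if brand.lower() in response_text.lower():
        return "mention"
    return "absent"

# Hypothetical examples
print(classify("See ExampleBrand's guide.", ["https://www.example.com/guide"], "ExampleBrand", "example.com"))  # citation
print(classify("ExampleBrand is popular.", ["https://other.org/post"], "ExampleBrand", "example.com"))  # mention
```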

It doesn't tell you why. Otterly shows you that your brand was or wasn't cited. It doesn't tell you what about your content, or what was missing from it, drove that outcome. To act on the data you have to bring your own audit, which most operators don't have time to do.

$29 felt high for a teaser. The $29 Lite tier limits you to 3 engines and 15 prompts. To track ChatGPT plus Claude plus Perplexity plus Gemini plus Copilot on more than 15 prompts I'd need to upgrade to Standard ($189/mo) or Pro ($989/mo). That math doesn't work for an independent operator.

What I use instead

My current measurement stack is three tools that together cover more of the citation landscape than Otterly did on its own. Total cost is $59/month: two of the three are free, and Reffed Watch is the only paid piece.

Bing Webmaster Tools AI Performance dashboard (free). This is the biggest unlock. Microsoft launched the AI Performance dashboard in February 2026. It shows real ChatGPT citation events — the actual queries that drove the AI to cite your page, the actual sub-queries from query fan-out, the actual pages cited. First-party data, not sampled prompt estimation. Setup takes 4 minutes if you're already in Google Search Console.

The catch: BWT only covers ChatGPT (which runs on Bing's index) and Microsoft Copilot. It doesn't show Claude, Perplexity, or Gemini data. But ChatGPT plus Copilot is 60-70% of the AI search market in 2026, so this single tool covers more than half the citation landscape with first-party precision that no third-party tool can match.

Reffed Watch ($59/mo). Disclosure: I run Reffed. But the reason I use Watch isn't loyalty to my own product — it's that Watch fills in the gap BWT leaves. Watch runs a weekly deep audit on whatever domain you point it at, scoring readiness across all six AI engines (ChatGPT, Claude, Perplexity, Google AI Overviews, Gemini, Copilot) with per-engine breakdown. It's not citation tracking in the Otterly sense — it's readiness tracking. It tells you whether your site is structurally set up to be cited, not which specific prompts have cited you.

I use Watch and BWT together: BWT tells me what's actually happening, Watch tells me whether I'm structurally improving or slipping. Different questions, complementary answers.

Manual prompt testing once a month (free). Once a month I spend 30 minutes running about 12 buyer prompts manually through ChatGPT, Claude, and Perplexity. I log which ones cite me. This is the cheap, low-tech version of what Otterly automates. The downside is variance: a single manual run of a prompt isn't reliable, because the same prompt can return different citations the next time you ask. The upside is I can change the prompts month to month based on the real buyer query patterns BWT shows me. Otterly has you commit to a static prompt library; manual testing lets you adapt monthly to the actual citation landscape.
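The monthly log itself can stay simple. Here's a minimal sketch, assuming one record per prompt run with the engine name and a yes/no for whether the response cited your site. The `Run` fields and the `citation_rate_by_engine` name are hypothetical. Aggregating by engine rather than by a fixed prompt list keeps the numbers comparable even when the prompts change from month to month.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Run:
    prompt: str
    engine: str   # e.g. "chatgpt", "claude", "perplexity"
    cited: bool   # did this response cite our site?

def citation_rate_by_engine(runs):
    """Fraction of logged runs per engine in which our site was cited."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for run in runs:
        totals[run.engine] += 1
        hits[run.engine] += int(run.cited)
    return {engine: hits[engine] / totals[engine] for engine in totals}

# Hypothetical month: two prompts, each run once on two engines.
may = [
    Run("best crm for a 10-person sales team", "chatgpt", True),
    Run("best crm for a 10-person sales team", "claude", False),
    Run("ai search visibility tools", "chatgpt", False),
    Run("ai search visibility tools", "claude", False),
]
print(citation_rate_by_engine(may))  # {'chatgpt': 0.5, 'claude': 0.0}
```

With one run per prompt, each rate rests on only a handful of observations, so the trend across several months says more than any single month's number.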

When Otterly is actually the right answer

There are real scenarios where Otterly outperforms my stack.

If you're an in-house marketing team at a $20M+ ARR company with multiple stakeholders who need a shared dashboard, Otterly Standard or Pro is probably worth it. The competitor benchmarking specifically is something neither BWT nor Reffed does as well. If your job is to show your CMO that your brand is gaining share of voice against three named competitors, Otterly is built for that conversation.

If you specifically need to track Perplexity citations (Perplexity gives you native Brand Analytics, but it's noisy), Otterly's Perplexity coverage is competent. BWT doesn't cover Perplexity at all.

If you run an agency tracking citations for multiple clients, Otterly's multi-client features may save you time vs running BWT and Reffed Watch separately per client.

If you have $200+/mo in budget for AI citation tracking specifically and you want one tool to manage it, Otterly Standard is a reasonable choice. The product is well-built and improving fast.

When my stack is the right answer

If you're a solo founder, indie operator, or small marketing team with a budget under $100/mo, my stack covers more of the citation picture than Otterly Lite does, at less than a third of the cost of Otterly Standard.

If you care more about why you're cited or not (the actionable layer) than that you're cited (the reporting layer), BWT plus Reffed plus manual prompts gives you both. Otterly gives you mostly the reporting layer with limited diagnostic depth.

If ChatGPT is your primary AI citation target — which it is for most B2B and many D2C categories — BWT alone covers it better than any third-party tool can. Use the free Microsoft tool first. Pay for third-party tracking only when you've outgrown BWT's coverage.

The Profound / Athena / Peec / Brand Radar comparison

A few people will ask why I didn't compare Otterly to Profound, Athena, Peec, or Ahrefs Brand Radar. Short answer: all of those are similar to Otterly architecturally. They run prompts against AI engines and log brand mentions. The differences are mostly UI quality, pricing tiers, and which engines they cover. The fundamental "you only see what you measure" limitation applies to all of them. So my analysis of why I downgraded Otterly transfers reasonably well to those alternatives.

If you're choosing between Otterly and one of those: try the free tier of each, run the same week's worth of tracking, and see which UI fits your workflow. The data quality differences are smaller than the marketing makes them seem.

What this means for you

If you're spending money on AI citation tracking right now, ask yourself: are you using the data to do something specific, or is it dashboard furniture? If it's furniture, downgrade. If it's actually changing what you ship, the spend is justified. For most independent operators and small teams, BWT + Reffed Watch + monthly manual testing covers more of the picture for less money than any single third-party tool at a tier with comparable engine coverage.

If you want to see your site's current AI search readiness across all six engines, run a free Reffed audit — no signup, no credit card, full per-engine breakdown. If you want the weekly Monday automation, Reffed Watch is $59/mo and complements Bing Webmaster Tools nicely.