
How to Measure Whether AI Actually Mentions Your Brand

You can't improve what you don't measure — and "I asked ChatGPT once and it mentioned us" isn't measurement. Here's how to build a real AI-visibility baseline.

Ozvor Research | 26 June 2026 | 8 min read

Key takeaways

AI answers are non-deterministic: ask the same prompt twice and the named brands can differ — so single checks are noise, not data.
A real baseline means a fixed prompt set, run repeatedly, across multiple engines, scored consistently.
Track three things: presence (are you named?), position (how prominently?), and sentiment (how favourably?).
Re-run on a schedule so you can see whether your GEO work is actually moving the needle.

Ask an AI assistant whether it recommends your business and it might say yes. Ask again an hour later and it might not mention you at all. That's not a glitch; it's how these systems work, which means the casual "I checked and we're in there" tells you almost nothing. To know where you really stand, you need to measure like a scientist, not a tourist.

Why measuring AI visibility is genuinely hard

Large language models are non-deterministic: the same prompt can yield different answers across runs, because of sampling, live-retrieval variation, and personalisation. Search Engine Land's repeated-run testing found only about five brands tend to surface per category — and the set shifts between runs. So a single query is a coin flip, not a measurement.

Search Engine Land — repeated ChatGPT runs & brand visibility (~5 brands surface per category) — searchengineland.com/repeated-chatgpt-runs-brand-visibility-468552

The implication is simple but easy to ignore: you must run each prompt multiple times and aggregate, or you're measuring randomness.
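The aggregation step can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed tool: the brand names and response texts are made up, and the whole-word matching rule is an assumption that a real tracker would probably need to refine (aliases, misspellings, and so on).

```python
import re
from collections import Counter


def brands_mentioned(response, brands):
    """Return the set of brands named in one response (case-insensitive, whole-word)."""
    found = set()
    for brand in brands:
        if re.search(rf"\b{re.escape(brand)}\b", response, flags=re.IGNORECASE):
            found.add(brand)
    return found


def presence_rates(responses, brands):
    """Fraction of responses that name each brand, aggregated over repeated runs."""
    counts = Counter()
    for response in responses:
        counts.update(brands_mentioned(response, brands))
    return {brand: counts[brand] / len(responses) for brand in brands}


# Hypothetical answers to the SAME prompt, asked four times.
runs = [
    "For plumbers in Leeds, try Acme or Birch.",
    "Popular options include Birch and Cedar.",
    "Acme and Birch are both well reviewed.",
    "Birch is frequently recommended.",
]

rates = presence_rates(runs, ["Acme", "Birch", "Cedar"])
print(rates)  # {'Acme': 0.5, 'Birch': 1.0, 'Cedar': 0.25}
```

In this made-up sample, Acme appears in half of the runs, so a single check would have been equally likely to report "named" or "not named".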

A measurement method that holds up

Fix a prompt set. Write 20–50 prompts a real customer would ask — "best [category] in [city]", "alternatives to [competitor]", "is [your brand] any good?". Keep them stable so results are comparable over time.
Cover the engines that matter. Run the set across ChatGPT, Claude, Perplexity, Gemini, and Google AI Overviews — visibility differs sharply between them.
Repeat each prompt. Run every prompt several times per engine (the starter baseline below uses three) and aggregate the results. One run is an anecdote; several are data.
Score consistently. For each result, record presence (named or not), position (first, in a list, an afterthought), and sentiment (positive, neutral, negative).
Log the sources. Note which sites the engine cited — that tells you where to invest (your site, Reddit, reviews, LinkedIn).
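One way to keep the five steps above honest is to write the plan and the scoring record down as data before running anything. The sketch below assumes a hypothetical `Observation` record and `build_run_plan` helper; neither comes from a particular tool, and how each prompt actually gets sent to an engine (by hand or through an API) is deliberately left out.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional

# Allowed scores, taken from the "score consistently" step.
POSITIONS = ("first", "list", "afterthought")
SENTIMENTS = ("positive", "neutral", "negative")


@dataclass
class Observation:
    """One scored answer: a single run of one prompt on one engine."""
    prompt: str
    engine: str
    run: int
    present: bool
    position: Optional[str] = None
    sentiment: Optional[str] = None
    sources: list = field(default_factory=list)

    def __post_init__(self):
        # Position and sentiment are only required when the brand was named.
        if self.present:
            if self.position not in POSITIONS:
                raise ValueError(f"unknown position: {self.position!r}")
            if self.sentiment not in SENTIMENTS:
                raise ValueError(f"unknown sentiment: {self.sentiment!r}")


def build_run_plan(prompts, engines, repeats):
    """List every (prompt, engine, run number) combination to execute."""
    return [
        (prompt, engine, run)
        for prompt, engine in product(prompts, engines)
        for run in range(1, repeats + 1)
    ]


prompts = ["best [category] in [city]", "alternatives to [competitor]"]
engines = ["ChatGPT", "Claude", "Perplexity", "Gemini", "Google AI Overviews"]
plan = build_run_plan(prompts, engines, repeats=3)
print(len(plan))  # 2 prompts x 5 engines x 3 runs = 30

# Scoring the first planned run (values are illustrative only).
example = Observation(*plan[0], present=True, position="list",
                      sentiment="neutral", sources=["reddit.com"])
```

Rejecting scores outside the fixed vocabularies is a small design choice that keeps results comparable from one month to the next.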

The three metrics that matter

Presence (Share of Voice). Across your prompt set and engines, how often are you named at all — and how does that compare to competitors? This is your headline number.
Position. Being named first or in the lead sentence is worth far more than a passing mention at the end. Weight mentions by position when you score, so a lead recommendation counts for more than a trailing one.
Sentiment. AI doesn't just list you — it characterises you. "Reliable and well-reviewed" and "a budget option with mixed feedback" are both mentions; only one wins customers.
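As a rough illustration, the three metrics can be rolled up for one brand as follows. The position weights here are an assumption chosen for the example (the only firm requirement is that earlier mentions count for more), and the observations are invented.

```python
from collections import Counter

# Assumed weights: a lead mention counts fully, a trailing one counts least.
POSITION_WEIGHTS = {"first": 1.0, "list": 0.5, "afterthought": 0.25}


def summarise(observations):
    """Roll scored runs up into presence, position-weighted visibility, and sentiment."""
    total = len(observations)
    mentions = [o for o in observations if o["present"]]
    weighted = sum(POSITION_WEIGHTS[o["position"]] for o in mentions)
    return {
        "presence": len(mentions) / total,          # share of runs that name the brand
        "weighted_visibility": weighted / total,    # same, discounted by position
        "sentiment": dict(Counter(o["sentiment"] for o in mentions)),
    }


# Hypothetical scored runs for a single brand.
observations = [
    {"present": True, "position": "first", "sentiment": "positive"},
    {"present": True, "position": "list", "sentiment": "neutral"},
    {"present": False, "position": None, "sentiment": None},
    {"present": True, "position": "afterthought", "sentiment": "positive"},
]

summary = summarise(observations)
print(summary)
# {'presence': 0.75, 'weighted_visibility': 0.4375, 'sentiment': {'positive': 2, 'neutral': 1}}
```

Running the same function over a competitor's observations gives the comparison that turns presence into a share-of-voice figure.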

Why cadence beats one-off checks

AI search rewards freshness, and the underlying models and indexes change constantly. A baseline you measure once decays immediately. Re-running your prompt set on a schedule — monthly is a sensible floor — turns a snapshot into a trend line, which is the only way to know whether publishing that content, earning those reviews, or fixing your schema actually changed anything.

Ahrefs — fresh content and AI citations — ahrefs.com/blog/fresh-content/
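Turning snapshots into a trend line needs very little machinery. The sketch below assumes each scheduled run has already been reduced to a single presence rate per month; the figures are invented and `presence_trend` is a hypothetical helper name.

```python
def presence_trend(history):
    """Return (period, rate, change vs previous period) in chronological order."""
    rows = []
    previous = None
    for period in sorted(history):  # 'YYYY-MM' strings sort chronologically
        rate = history[period]
        change = None if previous is None else rate - previous
        rows.append((period, rate, change))
        previous = rate
    return rows


# Hypothetical monthly presence rates for one brand.
history = {"2026-07": 0.25, "2026-06": 0.25, "2026-08": 0.5}

for period, rate, change in presence_trend(history):
    delta = "baseline" if change is None else f"{change:+.2f}"
    print(period, f"{rate:.2f}", delta)
# 2026-06 0.25 baseline
# 2026-07 0.25 +0.00
# 2026-08 0.50 +0.25
```

A change only means something if the prompt set, engines, and scoring rules stayed the same between periods.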

DIY or tooled?

You can absolutely start by hand: a spreadsheet, a fixed prompt list, and a disciplined monthly hour. It's tedious, and the repetition-and-aggregation step is easy to skimp on, but it's real measurement and it's free. As the workload grows (more prompts, more engines, more competitors, more runs), automated tracking pays for itself by doing the repetition consistently and scoring it the same way every time. Either way, the principle is identical: fixed prompts, multiple runs, multiple engines, consistent scoring, on a schedule.
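For the by-hand route, the spreadsheet can be as simple as one row per run. The column names below are an assumed layout, not a standard, and the rows are placeholders; the sketch writes CSV to an in-memory buffer so it runs anywhere, though a real log would go to a file.

```python
import csv
import io

# Assumed column layout: one row per prompt, engine, and run.
FIELDS = ["date", "engine", "prompt", "run", "present",
          "position", "sentiment", "sources"]


def write_log(rows, handle):
    """Write scored runs as CSV with a fixed header, ready to open in a spreadsheet."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"date": "2026-06-26", "engine": "ChatGPT",
     "prompt": "best [category] in [city]", "run": 1, "present": "yes",
     "position": "list", "sentiment": "neutral", "sources": "reddit.com"},
    {"date": "2026-06-26", "engine": "ChatGPT",
     "prompt": "best [category] in [city]", "run": 2, "present": "no",
     "position": "", "sentiment": "", "sources": ""},
]

buffer = io.StringIO()
write_log(rows, buffer)
print(buffer.getvalue())
```

Keeping the header fixed is what lets later months be appended to the same sheet and compared directly.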

Start with a baseline this week

Pick ten prompts your customers actually ask. Run each three times across two or three engines. Tally presence, position, and sentiment, and note the sources cited. That single afternoon gives you something most of your competitors have never had: an honest answer to "when a customer asks AI, does it name us — and what does it say?" Everything else in GEO is about moving those numbers.

Sources

Search Engine Land — repeated ChatGPT runs & brand visibility (~5 brands surface per category) — searchengineland.com/repeated-chatgpt-runs-brand-visibility-468552
Ahrefs — fresh content and AI citations — ahrefs.com/blog/fresh-content/
Siana Marketing — where ChatGPT gets its information (2026 report) — sianamarketing.com
Semrush — the most-cited domains in AI: a 3-month study — semrush.com/blog/most-cited-domains-ai/
Aggarwal et al., "GEO: Generative Engine Optimization," Princeton / Georgia Tech / Allen Institute for AI / IIT Delhi, KDD 2024 — arxiv.org/abs/2311.09735