What is the AI Referral Index (ARI)?

ARI is the percentage of times a practice is surfaced as a trusted option across a structured 75-prompt panel of patient queries, measured monthly across the five major AI engines (ChatGPT, Google AI Overviews, Perplexity, Gemini, and Claude). ARI measures recommendation surface area — when AI is asked, how often does your name come up?

How is ARI calculated?

ARI is calculated in two steps. Per-engine ARI = (Referred Queries ÷ Total Queries) × 100. Composite ARI = Σ (ARI per engine × Engine Weight). Engine weights for v1.2: ChatGPT 0.32, Google AI Overviews 0.28, Perplexity 0.16, Gemini 0.14, Claude 0.10.

What is the Apicus Query Panel?

The Query Panel is the set of 75 prompts measured for each practice each month. It is custom-built around the practice's metro, case mix, local competitor set, and patient acquisition priorities. Prompts span seven categories: Direct local, Procedure-specific local, Decision-stage, Cost and financing, Comparative, Symptom-driven, and Re-treatment.

Which AI engines does Apicus measure?

Apicus measures five AI engines: ChatGPT (OpenAI), Google AI Overviews (Google), Perplexity (Perplexity AI), Gemini (Google), and Claude (Anthropic). All measurements use public, consumer-facing engine surfaces — not API access — to mirror what a real prospective patient sees.

What is the Source Attribution Gap?

The Source Attribution Gap measures the difference between a practice's current authority sources and those of the highest-ARI competitor in their market. AI engines consistently weight corroborated third-party signals — such as Healthgrades, local media coverage, Reddit discussions, and industry publications — more heavily than isolated self-published claims. Closing this gap is the core mechanism of ARI improvement.

What is Recommendation Strength?

Recommendation Strength scores each AI citation on a 0–3 scale: 0 = not cited, 1 = mentioned in a list without endorsement, 2 = described positively as a credible option, 3 = positioned as the primary recommendation with reinforcing language. ARI measures whether a practice appears; Recommendation Strength measures how strongly the engine positions the practice when it does.

What is the 120-Day Results Guarantee?

Apicus guarantees that any qualifying practice will be referred across at least 60% of its custom 75-prompt Query Panel within 120 days of engagement start, or Apicus continues working at no additional cost for up to 90 additional days (through Day 210). The guarantee applies to practices that complete onboarding, approve and publish all content Apicus produces, and maintain the citations Apicus places.

The AI Referral Index Methodology

§1 Definition

ARI is the percentage of times a practice is surfaced as a trusted option across a structured 75-prompt panel of patient queries, measured monthly across the five major AI engines.

ARI measures recommendation surface area. It does not measure search rank, click-through, website traffic, or conversion. ARI answers a single question: when AI is asked, how often does your name come up?

ARI measures recommendation frequency, not recommendation strength. A practice briefly mentioned among many options counts toward ARI exactly as a practice positioned as the primary recommendation does. Qualitative recommendation analysis (Recommendation Strength, §8) is reported separately where applicable.

In plain language

If a hundred prospective patients each opened ChatGPT and asked the most common implant-related questions about your market, how many of those answers would mention your practice?

§2 The formula

ARI is computed with two formulas. The first is per-engine ARI, the cleanest measurement, taken inside a single AI engine. The second is Engine-Weighted ARI, the composite headline number reported in client deliverables.

PER-ENGINE ARI

ARI(engine) = ( Referred Queries ÷ Total Queries ) × 100

ENGINE-WEIGHTED ARI (COMPOSITE HEADLINE)

ARI = Σ ( ARI(engine) × Engine Weight )

Engine weights for v1.2 are calibrated to consumer search behavior in U.S. dental implant queries:

Engine	Weight	Rationale
ChatGPT	0.32	Largest active user base for purchase-intent queries.
Google AI Overviews	0.28	Highest blended exposure via standard Google search.
Perplexity	0.16	Strong citation transparency, growing intent share.
Gemini	0.14	Google ecosystem integration, Android default.
Claude	0.10	Smaller user base, premium intent profile.
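The two formulas can be sketched in Python as follows. This is a minimal illustration, assuming per-engine results arrive as counts of referred queries out of the panel; the function names and sample counts are hypothetical, and only the weights come from the table above.

```python
# Minimal sketch of the §2 formulas with the v1.2 engine weights.
# Function names and the sample counts below are hypothetical.

ENGINE_WEIGHTS_V1_2 = {
    "ChatGPT": 0.32,
    "Google AI Overviews": 0.28,
    "Perplexity": 0.16,
    "Gemini": 0.14,
    "Claude": 0.10,
}


def per_engine_ari(referred_queries: int, total_queries: int) -> float:
    """ARI(engine) = (Referred Queries / Total Queries) x 100."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    if not 0 <= referred_queries <= total_queries:
        raise ValueError("referred_queries must be between 0 and total_queries")
    return referred_queries / total_queries * 100


def composite_ari(per_engine: dict, weights: dict = ENGINE_WEIGHTS_V1_2) -> float:
    """Composite ARI = sum over engines of ARI(engine) x Engine Weight."""
    # Weights must sum to 1.0 so the composite stays on the 0-100 scale.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("engine weights must sum to 1.0")
    missing = set(weights) - set(per_engine)
    if missing:
        raise ValueError(f"missing per-engine ARI for: {sorted(missing)}")
    return sum(per_engine[engine] * weight for engine, weight in weights.items())


# Hypothetical referred-query counts on a 75-prompt panel.
referred = {"ChatGPT": 30, "Google AI Overviews": 24, "Perplexity": 45,
            "Gemini": 15, "Claude": 9}
scores = {engine: per_engine_ari(n, 75) for engine, n in referred.items()}
print(round(composite_ari(scores), 2))
```

With these hypothetical counts the per-engine scores are 40, 32, 60, 20, and 12, and the composite works out to 35.36. The strong Perplexity result is diluted by its 0.16 weight, which is the intended effect of weighting by consumer usage.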

Note

Engine weights are reviewed every 90 days. Material changes increment the methodology version (see §13).

§3 The Query Panel

The Query Panel is the set of 75 prompts measured for each practice. It is custom-built during onboarding around four inputs: practice metro, case mix (single-tooth-heavy vs full-arch-heavy), local competitor set, and the practice's stated patient acquisition priorities. Because the panel follows the practice's actual case mix, it can span implants, full-arch, gum grafting, bone grafting, oral surgery, cosmetic perio, and sedation, not implants exclusively.

Prompts are distributed across seven categories that mirror the actual decision journey of a high-value procedure patient:

Category	Prompts	Example prompt
Direct local	12	best implant dentist in [metro]
Procedure-specific local	18	best gum grafting specialist in [metro]
Decision-stage	10	oral surgery recovery time
Cost & financing	8	how much does bone grafting cost in [metro]
Comparative	8	[competitor practice] reviews
Symptom-driven	10	missing teeth options near me
Re-treatment / failure	9	failed dental implant fix [metro]
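The category quotas above can be enforced mechanically when a panel is assembled. The sketch below is hypothetical: it assumes prompts are stored as templates with a literal [metro] placeholder, and the PanelPrompt and build_panel names are illustrative. Other placeholders, such as [competitor practice], would be filled the same way.

```python
from dataclasses import dataclass

# Prompt counts per category, taken from the §3 table (v1.2).
CATEGORY_COUNTS = {
    "Direct local": 12,
    "Procedure-specific local": 18,
    "Decision-stage": 10,
    "Cost & financing": 8,
    "Comparative": 8,
    "Symptom-driven": 10,
    "Re-treatment / failure": 9,
}
PANEL_SIZE = 75


@dataclass(frozen=True)
class PanelPrompt:
    category: str
    text: str


def build_panel(templates: dict, metro: str) -> list:
    """Fill the [metro] placeholder and enforce each category's quota."""
    unknown = set(templates) - set(CATEGORY_COUNTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    panel = []
    for category, expected in CATEGORY_COUNTS.items():
        prompts = templates.get(category, [])
        if len(prompts) != expected:
            raise ValueError(
                f"{category}: expected {expected} prompts, got {len(prompts)}")
        panel.extend(PanelPrompt(category, p.replace("[metro]", metro))
                     for p in prompts)
    if len(panel) != PANEL_SIZE:
        raise ValueError(f"panel has {len(panel)} prompts, expected {PANEL_SIZE}")
    return panel
```

The seven quotas sum to exactly 75, so a panel that passes every per-category check also passes the final size check.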

The 25 highest-value queries

A subset of 25 queries, drawn primarily from the procedure-specific, direct local, and cost categories, is designated as the practice's highest-value query set. These prompts are the ones most likely to convert to high-ticket consultations (full-arch, All-on-X, full-mouth restoration). The 120-Day Results Guarantee, however, is benchmarked against the full Query Panel, not this subset: a practice must be referred across at least 60% of all 75 queries (45 queries) by Day 120.
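The guarantee threshold reduces to a whole number of queries. This is a hypothetical helper: it assumes the count of referred queries has already been determined, since this section does not specify how per-engine outcomes roll up into a single "referred" flag per query.

```python
# Hypothetical helper: the §3 guarantee threshold, in whole queries.
PANEL_SIZE = 75
GUARANTEE_PCT = 60


def min_referred_queries(panel_size: int = PANEL_SIZE,
                         pct: int = GUARANTEE_PCT) -> int:
    """Smallest whole number of queries that reaches pct% of the panel."""
    # Integer ceiling division avoids floating-point edge cases.
    return -(-panel_size * pct // 100)


def meets_guarantee(referred_queries: int, panel_size: int = PANEL_SIZE) -> bool:
    """True if the referred-query count reaches the guarantee threshold."""
    return referred_queries >= min_referred_queries(panel_size)
```

For the 75-prompt panel, 60% is exactly 45 queries, so 44 misses the benchmark and 45 meets it.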

Query categories are intentionally weighted toward commercial-intent and treatment-selection behavior rather than informational traffic volume alone. A citation for "All-on-4 specialist [metro]" carries materially greater strategic value than a citation for broad informational queries with low treatment intent.

§4 The five AI engines

ARI is currently measured across the five engines that account for the overwhelming majority of consumer-facing AI search activity in the United States.

—

ChatGPT

OpenAI's default web model, accessed via chatgpt.com in an anonymous session.

—

Claude

Anthropic's default web model, accessed via claude.ai in an anonymous session.

—

Perplexity

Default model, web mode (not pro/research mode), accessed via perplexity.ai.

—

Gemini

Google's default web model, accessed via gemini.google.com.

—

Google AI Overviews

The AI-generated answer block as it renders directly on google.com search results.

Surface choice

All measurements use public, consumer-facing engine surfaces. We do not use API access. The goal is to mirror what a real prospective patient sees.

§5 Sampling protocol

Frequency

Once per month, executed during the first seven business days of the month.

Repetition

Each prompt is run three times per engine. The median outcome is recorded to control for the inherent stochasticity of LLM responses.
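For a single prompt on a single engine, median-of-three can be sketched as below, assuming each run is recorded as 1 (practice referred) or 0 (not referred). The function name is illustrative.

```python
import statistics


def recorded_outcome(runs: list) -> int:
    """Return the median of the three runs for one prompt on one engine.

    With binary outcomes (1 = practice referred, 0 = not referred), the
    median of three is a majority vote: referred in at least two runs.
    """
    if len(runs) != 3:
        raise ValueError("the v1.2 protocol runs each prompt three times")
    if any(r not in (0, 1) for r in runs):
        raise ValueError("runs must be 0 or 1")
    return int(statistics.median(runs))
```

A practice named in only one of three runs is recorded as not referred for that prompt, which keeps a single favorable response from inflating ARI.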

Geographic targeting

Each measurement is geo-anchored to the practice's primary metro using IP geolocation. Secondary runs are taken from suburban anchor points where the catchment extends.

Session hygiene

All measurements are taken in fresh, anonymous sessions. No logged-in accounts, no chat history, no personalization signals.

§6 Engine Coverage Weight

ARI alone can mislead. A practice cited 70% of the time on a single engine but never named on the other four has a real distribution problem. Most of its theoretical patient pool is being pointed elsewhere.

ENGINE COVERAGE

Coverage = Engines citing the practice for ≥1 query ÷ 5

A practice referred on 50% of queries in a single engine, and by no other engine, has Coverage = 0.20. A practice with 30% ARI distributed across four engines has Coverage = 0.80 and is, in most market scenarios, the more durable position. Both numbers are reported side by side in client deliverables.
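Engine Coverage is a simple count, sketched here under the assumption that monthly results are available as a map from engine name to the number of panel queries on which that engine referred the practice. The input format and function name are illustrative.

```python
ENGINES = ("ChatGPT", "Google AI Overviews", "Perplexity", "Gemini", "Claude")


def engine_coverage(referred_counts: dict) -> float:
    """Coverage = (engines citing the practice for at least one query) / 5.

    referred_counts maps engine name to the number of panel queries on which
    that engine referred the practice; engines absent from the dict count as 0.
    """
    citing = sum(1 for engine in ENGINES if referred_counts.get(engine, 0) >= 1)
    return citing / len(ENGINES)
```

A practice cited only by ChatGPT scores 0.20 no matter how many queries that engine cites it on; a practice cited at least once by four engines scores 0.80.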

§7 Implied Authority Index

The Implied Authority Index (IAI) is an optional secondary metric used in highly competitive multi-doctor metros. It is a 0–100 composite combining three inputs to measure the qualitative depth of a practice's visibility.

Component	Weight	What it captures
ARI	50%	Recommendation surface area across the engine set.
Engine Coverage	25%	Distribution of citations across all five engines.
Citation Source Quality	25%	The authority of the third-party sources the engines reference when naming the practice.

Citation Source Quality is the most nuanced input. A practice cited because the engine references the practice's own property (its website) is weaker than a practice cited because Healthline, the local NBC affiliate, or a peer-reviewed registry references it. The IAI captures that difference numerically.

The IAI also incorporates Source Attribution Gap analysis, which evaluates the relative authority and diversity of the external sources AI engines use when recommending competing practices. A practice cited across a broader network of trusted third-party sources generally maintains a more durable recommendation position than a practice dependent primarily on first-party content.
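The IAI weighting can be sketched as a weighted sum. The component weights come from the table above; the input scales are assumptions, because the methodology does not state how Engine Coverage is rescaled or how Citation Source Quality is scored. The function name is illustrative.

```python
def implied_authority_index(ari: float, coverage: float,
                            source_quality: float) -> float:
    """0-100 IAI using the §7 weights (50% / 25% / 25%).

    Assumed scales (not specified in the methodology):
      ari            composite ARI, 0-100
      coverage       Engine Coverage, 0-1, rescaled here to 0-100
      source_quality Citation Source Quality, already scored 0-100
    """
    if not 0 <= ari <= 100:
        raise ValueError("ari must be in [0, 100]")
    if not 0 <= coverage <= 1:
        raise ValueError("coverage must be in [0, 1]")
    if not 0 <= source_quality <= 100:
        raise ValueError("source_quality must be in [0, 100]")
    return 0.50 * ari + 0.25 * (coverage * 100) + 0.25 * source_quality
```

For example, a practice with 30% ARI, Coverage of 0.80, and a placeholder source-quality score of 60 would land at 50 under these assumed scales.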

§8 Recommendation Strength

Not all AI citations carry equal weight.

A practice briefly listed among ten options is qualitatively different from a practice positioned as the primary recommendation with strong endorsement language. ARI measures whether a practice appears. Recommendation Strength measures how strongly the engine positions the practice when it does appear.

Recommendation Strength is reported as a secondary qualitative layer in competitive metros and high-value query audits.

Scoring model

Each engine response to a panel query is classified into one of four categories:

Score	Classification	Example behavior
0	Not cited	Practice not named
1	Mentioned	Practice included in a broader list without endorsement
2	Recommended	Practice described positively or surfaced as a credible option
3	Strongly endorsed	Practice positioned as a leading or preferred choice with reinforcing language

Example

Practice A is cited in 40% of queries, but only as one name among many.

Practice B is cited in 35% of queries, but repeatedly described as “one of the best implant specialists in the metro” across multiple engines.

In real patient behavior, Practice B may hold the stronger perceived authority position despite the lower raw ARI.
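The Practice A and Practice B contrast can be reproduced with the four-level scale. This sketch is hypothetical: the methodology does not say how scores are aggregated, so averaging over cited responses only is an assumption, and the 100-response samples are invented to mirror the example.

```python
from enum import IntEnum


class Strength(IntEnum):
    """The four-level scale from the §8 scoring table."""
    NOT_CITED = 0
    MENTIONED = 1
    RECOMMENDED = 2
    STRONGLY_ENDORSED = 3


def citation_rate(scores: list) -> float:
    """Share of responses (0-100) in which the practice was cited at all."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100 * sum(1 for s in scores if s > Strength.NOT_CITED) / len(scores)


def mean_strength_when_cited(scores: list) -> float:
    """Average strength over cited responses only (hypothetical aggregation)."""
    cited = [int(s) for s in scores if s > Strength.NOT_CITED]
    return sum(cited) / len(cited) if cited else 0.0


# Hypothetical 100-response samples mirroring Practice A and Practice B.
practice_a = [Strength.MENTIONED] * 40 + [Strength.NOT_CITED] * 60
practice_b = [Strength.STRONGLY_ENDORSED] * 35 + [Strength.NOT_CITED] * 65
```

Practice A has the higher citation rate (40 vs 35), while Practice B has the higher mean strength when cited (3.0 vs 1.0). Keeping the two numbers separate mirrors how the methodology keeps Recommendation Strength out of the ARI calculation.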

Relationship to ARI

Recommendation Strength does not alter ARI calculations.

ARI remains the primary visibility metric because each query outcome is binary (cited or not cited), which keeps the score stable and benchmarkable across markets and time periods. Recommendation Strength exists to add qualitative context to how AI engines frame the practice once cited.

§9 Source Attribution Gap

In many markets, the visibility gap is not caused by clinical capability or treatment quality. It is caused by uneven third-party corroboration.

Example

Competitor practices may be repeatedly supported by Healthgrades, local media features, Reddit discussions, “best implant dentist” articles, and review-platform signals.

Meanwhile, the audited practice may rely primarily on first-party website content with limited external authority reinforcement.

AI engines consistently weight corroborated third-party signals more heavily than isolated self-published claims.

Common citation-source categories

Source type	Example
Review platforms	Healthgrades, Zocdoc, Yelp
Local authority	Tampa Magazine, NBC affiliate coverage
Industry publications	Dental Economics, Inside Dentistry
Community discussion	Reddit, Quora, local forums
Structured educational content	FAQ pages, procedure explainers
Comparative content	"Best implant dentist" lists
First-party content	Practice website, blog, service pages
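A basic Source Attribution Gap comparison can be sketched as a per-category set difference against the highest-ARI competitor. The data shape and function name are hypothetical, as are the sample inventories. First-party content is excluded by default because a competitor's own website is not a source the audited practice can close the gap on.

```python
def source_attribution_gap(practice: dict, leader: dict,
                           exclude: tuple = ("First-party content",)) -> dict:
    """Sources the highest-ARI competitor has that the practice lacks.

    Both arguments map a source category (e.g. "Review platforms") to a set
    of source names; the result keeps only categories with a non-empty gap.
    """
    gap = {}
    for category, leader_sources in leader.items():
        if category in exclude:
            continue
        missing = leader_sources - practice.get(category, set())
        if missing:
            gap[category] = missing
    return gap


# Hypothetical source inventories for an audited practice and its market leader.
practice = {
    "First-party content": {"practice website", "practice blog"},
    "Review platforms": {"Yelp"},
}
leader = {
    "First-party content": {"leader website"},
    "Review platforms": {"Healthgrades", "Yelp"},
    "Local authority": {"Tampa Magazine"},
    "Community discussion": {"Reddit"},
}
```

Here the gap is Healthgrades, Tampa Magazine, and Reddit: three third-party categories in which the leader is corroborated and the audited practice is not.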

Two practices with similar ARI scores may have dramatically different long-term positioning durability.

A practice supported by:

— local authority coverage
— high-trust healthcare platforms
— structured educational content
— and broad third-party corroboration

is generally more resilient across engine updates than a practice dependent almost entirely on its own website.

Relationship to the Implied Authority Index (IAI)

Source Attribution Gap analysis informs the Citation Source Quality component inside the Implied Authority Index (IAI).

Practices supported primarily by authoritative third-party sources will generally score higher than practices relying mostly on first-party references.

§10 Entity Association

AI engines organize information through relationships between practices, procedures, locations, and authority signals.

Practices are more likely to surface for high-value implant queries when strong associations exist between the practice and specific treatment entities such as:

— All-on-4
— full-mouth restoration
— gum grafting
— bone grafting
— oral surgery
— sedation dentistry

Weak or inconsistent associations can reduce visibility even when a practice clinically performs the procedure.

Common entity signals

Signal type	Example
Structured data	FAQ Schema, Organization Schema
Procedure pages	Dedicated All-on-4 or sedation pages
Third-party mentions	Local articles or directories
Review language	Patients mentioning implants or sedation
Geographic association	"Dental implants in South Tampa"
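The structured-data signal in the table above is typically expressed as schema.org JSON-LD. FAQPage and Dentist are real schema.org types; the practice name, URL, location, and answer text below are placeholders, and this is a sketch rather than a complete markup implementation.

```python
import json


def faq_page_jsonld(faqs: list) -> str:
    """Serialize (question, answer) pairs as schema.org FAQPage JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    return json.dumps(data, indent=2)


def dentist_jsonld(name: str, url: str, locality: str, region: str) -> str:
    """Serialize a schema.org Dentist entity tied to a specific geography."""
    data = {
        "@context": "https://schema.org",
        "@type": "Dentist",
        "name": name,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": locality,
            "addressRegion": region,
        },
    }
    return json.dumps(data, indent=2)


# Hypothetical practice details; real markup would use the practice's own data.
faq_json = faq_page_jsonld([
    ("Do you offer All-on-4 dental implants in South Tampa?",
     "Yes. Replace this with the practice's own answer."),
])
html_snippet = f'<script type="application/ld+json">\n{faq_json}\n</script>'
```

Pairing the procedure (All-on-4) with the geography (South Tampa) in the same question is one way to express the practice, procedure, and location association described in this section.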

AI engines favor practices with clear, repeated, and corroborated associations between:

— the practice
— the procedure
— the geography
— and trusted external sources

A substantial portion of AI visibility optimization is the process of strengthening those associations across the web.

§11 Beyond ARI · visibility as the contract metric

A reasonable reader of this methodology asks the obvious question: why measure referrals rather than traffic, leads, or consults? The answer is structural. AI search has broken attribution in ways conventional analytics tools cannot fully recover.

The answer-only patient

When ChatGPT answers “best implant dentist in Charlotte” and names two or three practices, a substantial fraction of patients do not click anything. They take the names, open Google, search the practice directly, and arrive at the website as branded organic traffic, or as direct traffic, or as a phone call from the parking lot. The journey began in ChatGPT. The analytics tag never sees it.

Referrer data is unreliable across engines

Even when a patient does click a citation, the referrer signal varies widely by engine. Perplexity passes clean referrer data. ChatGPT often passes none. Clicks from Google AI Overviews arrive with ordinary Google search referrers and blend into organic traffic. Voice assistants and in-app surfaces (Siri, Alexa, the ChatGPT mobile app) typically pass no referrer at all.

Conversion is a function of the practice, not the visibility

Whether an AI-referred patient books a consult depends on the practice's intake script, treatment coordinator, financing options, and consult experience. A vendor measuring its work by consult volume is measuring the practice's operations, not its own delivery. ARI isolates the variable Apicus actually controls.

Why this matters for the guarantee

The 120-Day Results Guarantee commits Apicus to a metric we can deliver with discipline. If we miss the benchmark by Day 120, we keep working at no additional cost until we hit it, for up to 90 additional days (through Day 210). A traffic or consult guarantee would tie us to dozens of variables outside a visibility vendor's control. We decline to offer that kind of guarantee because we believe the ARI guarantee is the honest one.

The three-tier measurement model

Every Apicus client receives three layers of measurement. Only the first is the basis of the engagement and the guarantee.

Tier	Metric	Status
Tier 1	ARI (per the methodology above)	Agreement metric · what the 120-Day Results Guarantee commits to
Tier 2	AI-Referred Sessions	Directional signal · honestly under-counted
Tier 3	Practice-Reported Intake	Qualitative confirmation · gathered quarterly

Tier 2 · AI-Referred Sessions

We instrument client analytics for traffic that arrives with identifiable AI provenance: clicks from Perplexity, ChatGPT, and Google AI Overview citations; UTM-tagged inbound from citation-stack assets we deploy; and dynamic call-tracking on dedicated landing pages. This is reported as a directional trend, not a benchmark.
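Identifying AI provenance for a session can be sketched as a referrer and UTM check. The referrer hostnames are assumptions to verify against each client's analytics data, and the utm_source value and function name are hypothetical. As noted above, Google AI Overview clicks cannot be separated from organic Google traffic this way.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed referrer hostnames per engine; verify against real analytics data.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
    "claude.ai": "Claude",
}
# Hypothetical utm_source values used on citation-stack assets.
CITATION_STACK_UTM_SOURCES = {"apicus-citation"}


def ai_provenance(referrer: Optional[str], landing_url: str) -> Optional[str]:
    """Label a session's AI provenance, or return None if it cannot be identified."""
    utm_source = parse_qs(urlparse(landing_url).query).get("utm_source", [""])[0]
    if utm_source in CITATION_STACK_UTM_SOURCES:
        return "Citation stack (UTM)"
    if referrer:
        host = (urlparse(referrer).hostname or "").removeprefix("www.")
        if host in AI_REFERRER_HOSTS:
            return AI_REFERRER_HOSTS[host]
    # Google AI Overview clicks arrive with ordinary google.com referrers,
    # so they cannot be separated from organic search here.
    return None
```

Every session that returns None is either non-AI traffic or AI traffic that lost its provenance, which is one reason Tier 2 undercounts.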

Disclosed in every monthly report

Tier 2 figures undercount the true AI-driven funnel, likely by a factor of two or more, because of the answer-only patient and the referrer-data gaps described above. We do not let this number drive engagement decisions.

Tier 3 · Practice-Reported Intake

Once per quarter, the practice's treatment coordinator or front-desk team logs how new consult patients describe finding the practice. Phrases like “I asked ChatGPT,” “my AI assistant referred you,” or “Google said you specialize in” are recorded.
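A quarterly intake log can be tallied with simple pattern matching. This is a hypothetical sketch: the patterns are modeled on the phrases quoted above, the function name is illustrative, and keyword matching will miss paraphrases, so a human read of the notes remains the better check.

```python
import re

# Hypothetical patterns modeled on the phrases quoted in §11; extend per practice.
AI_MENTION_PATTERNS = [
    r"\bchatgpt\b",
    r"\bperplexity\b",
    r"\bgemini\b",
    r"\bclaude\b",
    r"\bai\b",
    r"\bgoogle said\b",
]
AI_MENTION_RE = re.compile("|".join(AI_MENTION_PATTERNS), re.IGNORECASE)


def tally_ai_mentions(intake_notes: list) -> tuple:
    """Return (notes mentioning an AI source, total notes) for one quarter's log."""
    hits = sum(1 for note in intake_notes if AI_MENTION_RE.search(note))
    return hits, len(intake_notes)
```

Word boundaries keep the short "ai" pattern from matching inside words such as "Main" or "said".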

The act of asking is itself an intervention. Most practices have no idea how many of their new patients first heard their name from an AI engine until they start listening for it. Tier 3 frequently surfaces the most powerful evidence in the engagement, and it costs the practice roughly 60 seconds per new-patient intake.

In summary

ARI is the metric Apicus guarantees because it is the metric Apicus can guarantee honestly. The other two tiers exist to confirm, directionally and qualitatively, that the visibility work is moving real patient behavior. Neither substitutes for ARI; both reinforce it.

§12 Limitations & honest caveats

We are explicit about what ARI cannot do. A measurement standard worth citing is one that names its own failure modes.

— ARI measures visibility, not conversion. A practice with 60% ARI can close fewer consults than a practice with 40%. That is a function of the practice's patient experience, treatment coordinator, and intake process, not of AI visibility. We do not promise consult volume.
— AI engine behavior is non-deterministic. Even with median-of-three sampling, a month-to-month swing of about 2 ARI points is normal and does not by itself indicate methodology drift or performance regression.
— Engine algorithm changes can shift baselines without warning. When this occurs, we publish a methodology bulletin and re-baseline affected client benchmarks.
— Geo-targeting is imperfect. AI engines do not always honor IP-based location signals cleanly. Patient queries originating outside the targeted metro are not measured.
— We do not currently measure voice assistants or in-app surfaces. Siri, Alexa, Google Assistant, ChatGPT mobile app, and Bing Chat are out of scope for v1.2. They are candidates for future versions.
— The methodology is opinionated, not neutral. Engine weights, query category distribution, and IAI components reflect Apicus's judgment of what matters for high-value oral health procedures specifically. They may not generalize to other verticals.

§13 Versioning

Each material change to the methodology, such as engine weight adjustments, query panel revisions, or sampling protocol changes, increments the version number and is documented below.

v1.2 · May 2026 · Current

Added Engine Coverage Weight as a co-reported metric
Engine weights revised based on Q1 2026 usage data
Query Panel expanded from 60 to 75 prompts
Standardized median-of-three sampling protocol

v1.1 · February 2026

Added Perplexity to the engine set
Standardized monthly measurement cadence
Introduced the highest-value query subset

v1.0 · November 2025

Initial methodology release
Four-engine baseline (ChatGPT, Claude, Gemini, Google AI Overviews)
60-prompt query panel

§14 Citation

If you reference this methodology in research, content, vendor evaluations, or client deliverables, please cite as follows:

SUGGESTED CITATION

Apicus Research. (2026). The AI Referral Index (ARI) Methodology, v1.2. Retrieved from apicus.ai/methodology

Direct attribution is appreciated but not required. Apicus maintains the ARI methodology as a public standard for AI visibility measurement.

The AI Referral Index.