v1.2 · PUBLIC
Last revised · May 2026
Status · Active standard
Methodology

The AI Referral Index.

A public methodology for measuring whether AI engines name your practice when patients ask which specialist to see.

Maintained by · Apicus Research
License · Public, attribution requested
Reading time · ≈ 9 minutes

ARI is the headline metric every Apicus client tracks. It exists to answer that question rigorously and repeatably: when patients ask AI which specialist to see, is your practice named?

§1 Definition

ARI is the percentage of prompts, across a structured 75-prompt panel of patient queries, for which AI engines surface a practice as a trusted option, measured monthly across the five major AI engines.

ARI measures recommendation surface area. It does not measure search rank, click-through, website traffic, or conversion. ARI answers a single question: when AI is asked, how often does your name come up?

ARI measures recommendation frequency, not recommendation strength. A practice briefly mentioned among many options is qualitatively different from one positioned as the primary recommendation, but ARI counts both as a single referral. That qualitative difference is reported separately, as Recommendation Strength (§8), where applicable.

In plain language

If a hundred prospective patients each opened ChatGPT and asked the most common implant-related questions about your market, how many of those answers would mention your practice?


§2 The formula

Two formulas. The first is the per-engine ARI, the cleanest measurement, taken inside a single AI engine. The second is the Engine-Weighted ARI, the composite headline number reported in client deliverables.

PER-ENGINE ARI

ARI(engine) = ( Referred Queries ÷ Total Queries ) × 100

ENGINE-WEIGHTED ARI (COMPOSITE HEADLINE)

ARI = Σ ( ARI(engine) × Engine Weight )

Engine weights for v1.2 are calibrated to consumer search behavior in U.S. dental implant queries:

Engine | Weight | Rationale
ChatGPT | 0.32 | Largest active user base for purchase-intent queries.
Google AI Overviews | 0.28 | Highest blended exposure via standard Google search.
Perplexity | 0.16 | Strong citation transparency, growing intent share.
Gemini | 0.14 | Google ecosystem integration, Android default.
Claude | 0.10 | Smaller user base, premium intent profile.

Note

Engine weights are reviewed every 90 days. Material changes increment the methodology version (see §13).
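The two formulas can be sketched in Python as follows. This is a minimal illustration, not Apicus tooling: the function names and the input format (a count of referred queries per engine, after median-of-three sampling) are assumptions, while the weights are the published v1.2 values from the table above.

```python
# Minimal sketch of the two ARI formulas (§2). Function names and the input
# format are illustrative assumptions; weights are the published v1.2 values.

V1_2_WEIGHTS = {
    "ChatGPT": 0.32,
    "Google AI Overviews": 0.28,
    "Perplexity": 0.16,
    "Gemini": 0.14,
    "Claude": 0.10,
}


def per_engine_ari(referred_queries: int, total_queries: int = 75) -> float:
    """ARI(engine) = (Referred Queries / Total Queries) x 100."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    if not 0 <= referred_queries <= total_queries:
        raise ValueError("referred_queries must lie between 0 and total_queries")
    return referred_queries * 100 / total_queries


def engine_weighted_ari(per_engine: dict, weights: dict = V1_2_WEIGHTS) -> float:
    """ARI = sum of ARI(engine) x Engine Weight over all weighted engines."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("engine weights must sum to 1.0")
    # An engine missing from per_engine is treated as ARI 0 (never referred).
    return sum(per_engine.get(engine, 0.0) * w for engine, w in weights.items())


# Hypothetical month: referred queries out of the 75-prompt panel, per engine.
referred = {"ChatGPT": 30, "Google AI Overviews": 15, "Perplexity": 45,
            "Gemini": 0, "Claude": 15}
scores = {engine: per_engine_ari(n) for engine, n in referred.items()}
print(scores)  # per-engine scores: 40.0, 20.0, 60.0, 0.0, 20.0
print(round(engine_weighted_ari(scores), 1))  # 30.0
```

With these hypothetical counts the composite is 12.8 + 5.6 + 9.6 + 0 + 2.0 = 30.0; rounding guards against floating-point residue in the weighted sum.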


§3 The Query Panel

The Query Panel is the set of 75 prompts measured for each practice. It is custom-built during onboarding around four inputs: practice metro, case mix (single-tooth-heavy vs full-arch-heavy), local competitor set, and the practice's stated patient-acquisition priorities. The panel covers the practice's full procedure mix (implants, full-arch, gum grafting, bone grafting, oral surgery, cosmetic periodontal procedures, and sedation), not implants alone.

Prompts are distributed across seven categories that mirror the actual decision journey of a high-value procedure patient:

Category | Prompts | Example prompt
Direct local | 12 | best implant dentist in [metro]
Procedure-specific local | 18 | best gum grafting specialist in [metro]
Decision-stage | 10 | oral surgery recovery time
Cost & financing | 8 | how much does bone grafting cost in [metro]
Comparative | 8 | [competitor practice] reviews
Symptom-driven | 10 | missing teeth options near me
Re-treatment / failure | 9 | failed dental implant fix [metro]
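One way to hold the category allocation in code, with a check that the counts add up to the 75-prompt panel, is sketched below. The data structure and helper names are hypothetical; only the category names and counts come from the table.

```python
# Hypothetical representation of the v1.2 Query Panel allocation.
# Category names and counts come from the table; the helpers are illustrative.

PANEL_SIZE = 75

PANEL_ALLOCATION = {
    "Direct local": 12,
    "Procedure-specific local": 18,
    "Decision-stage": 10,
    "Cost & financing": 8,
    "Comparative": 8,
    "Symptom-driven": 10,
    "Re-treatment / failure": 9,
}


def validate_allocation(allocation: dict, size: int = PANEL_SIZE) -> None:
    """Raise if the category counts do not add up to the panel size."""
    total = sum(allocation.values())
    if total != size:
        raise ValueError(f"allocation sums to {total}, expected {size}")


def category_shares(allocation: dict) -> dict:
    """Share of the panel, in percent, devoted to each category."""
    total = sum(allocation.values())
    return {name: count * 100 / total for name, count in allocation.items()}


validate_allocation(PANEL_ALLOCATION)  # passes: 12+18+10+8+8+10+9 = 75
print(category_shares(PANEL_ALLOCATION)["Procedure-specific local"])  # 24.0
```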

The 25 highest-value queries

A subset of 25 queries, drawn primarily from the procedure-specific, direct local, and cost categories, is designated as the practice's highest-value query set. These are the prompts most likely to convert to high-ticket consultations (full-arch, All-on-X, full-mouth restoration). The 120-Day Results Guarantee, however, is benchmarked against the full Query Panel, not this subset: a practice must be referred in at least 60% of all 75 queries (45 queries) by Day 120.
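The guarantee threshold is simple arithmetic: 60% of 75 is 45 referred queries. The sketch below uses integer ceiling division so that panel sizes that do not divide evenly still round up. It assumes the benchmark is applied to a single count of referred queries; the text does not say whether that count is taken per engine or from the engine-weighted composite, and the function names are hypothetical.

```python
def queries_needed(panel_size: int = 75, threshold_pct: int = 60) -> int:
    """Smallest referred-query count that meets threshold_pct of the panel."""
    # Negate, floor-divide, negate again: integer ceiling division,
    # which avoids floating-point rounding at the boundary.
    return -(-panel_size * threshold_pct // 100)


def guarantee_met(referred_queries: int, panel_size: int = 75,
                  threshold_pct: int = 60) -> bool:
    """True if the practice is referred in at least threshold_pct of queries."""
    return referred_queries >= queries_needed(panel_size, threshold_pct)


print(queries_needed())    # 45
print(guarantee_met(44))   # False
print(guarantee_met(45))   # True
```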

The category allocation intentionally favors commercial-intent and treatment-selection queries over raw informational traffic volume. A citation for "All-on-4 specialist [metro]" carries materially greater strategic value than a citation for a broad informational query with low treatment intent.


§4 The five AI engines

ARI is currently measured across the five engines that account for the overwhelming majority of consumer-facing AI search activity in the United States.

ChatGPT

OpenAI's default web model, accessed via chatgpt.com in an anonymous session.

Claude

Anthropic's default web model, accessed via claude.ai in an anonymous session.

Perplexity

Default model, web mode (not pro/research mode), accessed via perplexity.ai.

Gemini

Google's default web model, accessed via gemini.google.com.

Google AI Overviews

The AI-generated answer block as it renders directly on google.com search results.

Surface choice

All measurements use public, consumer-facing engine surfaces. We do not use API access. The goal is to mirror what a real prospective patient sees.


§5 Sampling protocol

Frequency

Once per month, during the first seven business days of the month.

Repetition

Each prompt is run three times per engine. The median outcome is recorded to control for the inherent stochasticity of LLM responses.
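Because a single run's outcome is binary (the practice is either named or not), the median of three runs is the same as a two-out-of-three majority vote. A minimal sketch, assuming runs are encoded as 1 (named) and 0 (not named); the function name is hypothetical:

```python
from statistics import median


def median_of_three(runs: list) -> int:
    """Recorded outcome for one prompt on one engine (1 = referred, 0 = not)."""
    if len(runs) != 3:
        raise ValueError("the protocol specifies exactly three runs per prompt")
    if any(r not in (0, 1) for r in runs):
        raise ValueError("each run must be encoded as 0 or 1")
    # With an odd count, median() returns the middle value of the sorted runs.
    return int(median(runs))


print(median_of_three([1, 0, 1]))  # 1: named in two of three runs
print(median_of_three([0, 0, 1]))  # 0: named in only one run
```

Under this encoding, summing the recorded outcomes across the 75 prompts gives the Referred Queries term of the per-engine formula.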

Geographic targeting

Each measurement is geo-anchored to the practice's primary metro using IP geolocation. Secondary runs are taken from suburban anchor points where the catchment extends.

Session hygiene

All measurements are taken in fresh, anonymous sessions. No logged-in accounts, no chat history, no personalization signals.

Archive

All raw responses, timestamps, and engine version metadata are archived for twelve months.


§6 Engine Coverage Weight

ARI alone can mislead. A practice cited 70% of the time on a single engine but never named on the other four has a real distribution problem. Most of its theoretical patient pool is being pointed elsewhere.

ENGINE COVERAGE

Coverage = Engines citing the practice for ≥1 query ÷ 5

A practice with a 50% per-engine ARI on one engine and no citations elsewhere has Coverage = 0.20. A practice with a 30% composite ARI distributed across four engines has Coverage = 0.80 and is, in most market scenarios, the more durable position. Both numbers are reported side-by-side in client deliverables.
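Engine Coverage can be computed directly from per-engine referral counts. A minimal sketch; the engine labels match §4, while the function name and input format are assumptions:

```python
ENGINES = ("ChatGPT", "Google AI Overviews", "Perplexity", "Gemini", "Claude")


def engine_coverage(referred_by_engine: dict) -> float:
    """Coverage = engines citing the practice for at least one query / 5."""
    citing = sum(1 for engine in ENGINES if referred_by_engine.get(engine, 0) >= 1)
    return citing / len(ENGINES)


concentrated = {"ChatGPT": 38}  # hypothetical: cited on one engine only
distributed = {"ChatGPT": 20, "Perplexity": 25, "Gemini": 22, "Claude": 30}
print(engine_coverage(concentrated))  # 0.2
print(engine_coverage(distributed))   # 0.8
```

Combined with the v1.2 weights, this also bounds the composite: a practice cited on only one engine can score at most 100 × that engine's weight on the Engine-Weighted ARI, which is 32 even when the single engine is ChatGPT.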


§7 Implied Authority Index

The Implied Authority Index (IAI) is an optional secondary metric used in highly competitive multi-doctor metros. It is a 0–100 composite combining three inputs to measure the qualitative depth of a practice's visibility.

Component | Weight | What it captures
ARI | 50% | Recommendation surface area across the engine set.
Engine Coverage | 25% | Distribution of citations across all five engines.
Citation Source Quality | 25% | The authority of the third-party sources the engines reference when naming the practice.

Citation Source Quality is the most nuanced input. A practice cited because the engine references the practice's own property (its website) is weaker than a practice cited because Healthline, the local NBC affiliate, or a peer-reviewed registry references it. The IAI captures that difference numerically.

The IAI also reflects Source Attribution Gap analysis (§9), which feeds the Citation Source Quality component by evaluating the relative authority and diversity of the external sources AI engines use when recommending competing practices. A practice cited across a broader network of trusted third-party sources generally holds a more durable recommendation position than a practice that depends primarily on first-party content.
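The IAI weighting can be sketched as a weighted sum. The methodology does not publish the scale of Citation Source Quality or how Engine Coverage is rescaled, so this sketch assumes ARI and Citation Source Quality are already on 0–100 scales and multiplies Coverage (0.0–1.0) by 100; the function name is hypothetical.

```python
def implied_authority_index(ari: float, coverage: float,
                            citation_source_quality: float) -> float:
    """IAI (0-100) = 50% ARI + 25% Engine Coverage + 25% Citation Source Quality."""
    checks = (("ari", ari, 100.0), ("coverage", coverage, 1.0),
              ("citation_source_quality", citation_source_quality, 100.0))
    for name, value, upper in checks:
        if not 0.0 <= value <= upper:
            raise ValueError(f"{name} must be between 0 and {upper}")
    # Coverage is a 0-1 fraction (§6); rescale it to 0-100 before weighting.
    return 0.50 * ari + 0.25 * (coverage * 100) + 0.25 * citation_source_quality


# Hypothetical inputs: ARI 30, Coverage 0.80, Citation Source Quality 60.
print(implied_authority_index(30, 0.80, 60))  # 50.0
```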


§8 Recommendation Strength

Not all AI citations carry equal weight.

A practice briefly listed among ten options is qualitatively different from a practice positioned as the primary recommendation with strong endorsement language. ARI measures whether a practice appears. Recommendation Strength measures how strongly the engine positions the practice when it does appear.

Recommendation Strength is reported as a secondary qualitative layer in competitive metros and high-value query audits.

Scoring model

Each engine response is scored on a four-level scale:

Score | Classification | Example behavior
0 | Not cited | Practice not named
1 | Mentioned | Practice included in a broader list without endorsement
2 | Recommended | Practice described positively or surfaced as a credible option
3 | Strongly endorsed | Practice positioned as a leading or preferred choice with reinforcing language

Example

Practice A is cited in 40% of queries, but only as one name among many.

Practice B is cited in 35% of queries, but repeatedly described as “one of the best implant specialists in the metro” across multiple engines.

In real patient behavior, Practice B may hold the stronger perceived authority position despite the lower raw ARI.

Relationship to ARI

Recommendation Strength does not alter ARI calculations.

ARI remains the primary visibility metric because it is binary, stable, and benchmarkable across markets and time periods. Recommendation Strength exists to add qualitative context to how AI engines frame the practice once cited.
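The separation between the two measures can be made concrete: ARI only asks whether a response's score is at least 1, so strength never changes it. In the sketch below, the toy 20-query panels loosely mirror Practices A and B above; the average-strength summary is a hypothetical illustration, since the methodology does not specify how strength scores are aggregated.

```python
from statistics import mean

# Recommendation Strength scale from the scoring table.
NOT_CITED, MENTIONED, RECOMMENDED, STRONGLY_ENDORSED = 0, 1, 2, 3


def ari_from_scores(scores: list) -> float:
    """Binary ARI: any score of MENTIONED or higher counts as one referral."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= MENTIONED) * 100 / len(scores)


def mean_strength_when_cited(scores: list) -> float:
    """Hypothetical summary: average strength across cited responses only."""
    cited = [s for s in scores if s >= MENTIONED]
    return float(mean(cited)) if cited else 0.0


practice_a = [MENTIONED] * 8 + [NOT_CITED] * 12          # cited 8 of 20
practice_b = [STRONGLY_ENDORSED] * 7 + [NOT_CITED] * 13  # cited 7 of 20
print(ari_from_scores(practice_a), mean_strength_when_cited(practice_a))  # 40.0 1.0
print(ari_from_scores(practice_b), mean_strength_when_cited(practice_b))  # 35.0 3.0
```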


§9 Source Attribution Gap

In many markets, the visibility gap is not caused by clinical capability or treatment quality. It is caused by uneven third-party corroboration.

Example

Competitor practices may be repeatedly supported by Healthgrades, local media features, Reddit discussions, “best implant dentist” articles, and review-platform signals.

Meanwhile, the audited practice may rely primarily on first-party website content with limited external authority reinforcement.

AI engines generally weight corroborated third-party signals more heavily than isolated self-published claims.

Common citation-source categories

Source type | Example
Review platforms | Healthgrades, Zocdoc, Yelp
Local authority | Tampa Magazine, NBC affiliate coverage
Industry publications | Dental Economics, Inside Dentistry
Community discussion | Reddit, Quora, local forums
Structured educational content | FAQ pages, procedure explainers
Comparative content | "Best implant dentist" lists
First-party content | Practice website, blog, service pages

Two practices with similar ARI scores may have dramatically different long-term positioning durability.

A practice supported by:

  • local authority coverage
  • high-trust healthcare platforms
  • structured educational content
  • and broad third-party corroboration

is generally more resilient across engine updates than a practice dependent almost entirely on its own website.

Relationship to the Implied Authority Index (IAI)

Source Attribution Gap analysis informs the Citation Source Quality component inside the Implied Authority Index (IAI).

Practices supported primarily by authoritative third-party sources will generally score higher than practices relying mostly on first-party references.
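The methodology does not publish a formula for the Source Attribution Gap. As a hypothetical illustration only, one could compare how much of each practice's citation footprint comes from third-party sources, and how many distinct third-party source types support it, using the source categories from the table above:

```python
FIRST_PARTY = "First-party content"


def third_party_share(source_types: list) -> float:
    """Fraction of citations whose source is not the practice's own property."""
    if not source_types:
        return 0.0
    first_party = sum(1 for s in source_types if s == FIRST_PARTY)
    return 1 - first_party / len(source_types)


def attribution_gap(practice_sources: list, competitor_sources: list) -> dict:
    """Positive values mean the competitor has the stronger third-party footprint."""
    def diversity(sources: list) -> int:
        # Distinct third-party source types; first-party content is excluded.
        return len(set(sources) - {FIRST_PARTY})

    return {
        "third_party_share_gap":
            third_party_share(competitor_sources) - third_party_share(practice_sources),
        "source_type_gap": diversity(competitor_sources) - diversity(practice_sources),
    }


audited = [FIRST_PARTY, FIRST_PARTY, FIRST_PARTY, "Review platforms"]
competitor = ["Review platforms", "Local authority",
              "Community discussion", "Comparative content"]
print(attribution_gap(audited, competitor))
# {'third_party_share_gap': 0.75, 'source_type_gap': 3}
```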


§10 Entity Association

AI engines organize information through relationships between practices, procedures, locations, and authority signals.

Practices are more likely to surface for high-value implant queries when strong associations exist between the practice and specific treatment entities such as:

  • All-on-4
  • full-mouth restoration
  • gum grafting
  • bone grafting
  • oral surgery
  • sedation dentistry

Weak or inconsistent associations can reduce visibility even when a practice clinically performs the procedure.

Common entity signals

Signal type | Example
Structured data | FAQ Schema, Organization Schema
Procedure pages | Dedicated All-on-4 or sedation pages
Third-party mentions | Local articles or directories
Review language | Patients mentioning implants or sedation
Geographic association | "Dental implants in South Tampa"

AI engines favor practices with clear, repeated, and corroborated associations between:

  • the practice
  • the procedure
  • the geography
  • and trusted external sources

A substantial portion of AI visibility optimization is the process of strengthening those associations across the web.
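As an illustration of the structured-data signal, the sketch below uses Python's standard json module to emit schema.org JSON-LD that ties a practice to procedures and a geography, the kind of markup typically embedded in a `<script type="application/ld+json">` tag. The practice name, URL, answer text, and property choices are hypothetical, and structured data is only one of the corroborating signals described above.

```python
import json

# Hypothetical practice; every value below is illustrative.
practice = {
    "@context": "https://schema.org",
    "@type": "Dentist",
    "name": "Example Implant & Periodontal Center",
    "url": "https://www.example.com/",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Tampa",
        "addressRegion": "FL",
        "addressCountry": "US",
    },
    # Practice <-> geography association.
    "areaServed": "South Tampa",
    # Practice <-> treatment-entity associations.
    "knowsAbout": ["All-on-4", "Full-mouth restoration", "Bone grafting",
                   "Gum grafting", "Sedation dentistry"],
}

faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Do you offer All-on-4 dental implants in South Tampa?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "Yes. Example Implant & Periodontal Center places All-on-4 "
                    "implants for patients across South Tampa.",
        },
    }],
}

for block in (practice, faq):
    print(json.dumps(block, indent=2))
```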


§11 Beyond ARI · visibility as the contract metric

A reasonable reader of this methodology asks the obvious question: why measure referrals rather than traffic, leads, or consults? The answer is structural. AI search has broken attribution in ways conventional analytics tools cannot fully recover.

The answer-only patient

When ChatGPT answers “best implant dentist in Charlotte” and names two or three practices, a substantial fraction of patients do not click anything. They take the names, open Google, search for the practice directly, and arrive at the website as branded organic traffic, as direct traffic, or as a phone call from the parking lot. The journey began in ChatGPT. The analytics tag never sees it.

Referrer data is unreliable across engines

Even when a patient does click a citation, the referrer signal varies widely across engines. Perplexity passes clean referrer data. ChatGPT often passes none. Google AI Overviews blend in with organic search referrers. Voice assistants and in-app surfaces (Siri, Alexa, the ChatGPT mobile app) leave little or no referrer trail.

Conversion is a function of the practice, not the visibility

Whether an AI-referred patient books a consult depends on the practice's intake script, treatment coordinator, financing options, and consult experience. A vendor measuring its work by consult volume is measuring the practice's operations, not its own delivery. ARI isolates the variable Apicus actually controls.

Why this matters for the guarantee

The 120-Day Results Guarantee commits Apicus to a metric we can deliver with discipline. If we miss the benchmark by Day 120, we keep working at no additional cost until we hit it, for up to 90 additional days. A traffic or consult guarantee would tie us to dozens of variables outside a visibility vendor's control. We decline to offer one because we believe the ARI guarantee is the honest one.

The three-tier measurement model

Every Apicus client receives three layers of measurement. Only the first is the basis of the engagement and the guarantee.

Tier | Metric | Status
Tier 1 | ARI (per the methodology above) | Agreement metric · what the 120-Day Results Guarantee commits to
Tier 2 | AI-Referred Sessions | Directional signal · openly disclosed as an undercount
Tier 3 | Practice-Reported Intake | Qualitative confirmation · gathered quarterly

Tier 2 · AI-Referred Sessions

We instrument client analytics for traffic that arrives with identifiable AI provenance: clicks from Perplexity, ChatGPT, and Google AI Overview citations; UTM-tagged inbound from citation-stack assets we deploy; and dynamic call-tracking on dedicated landing pages. This is reported as a directional trend, not a benchmark.
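A minimal sketch of Tier 2 provenance tagging, using only Python's standard urllib.parse. The referrer hostnames and utm_source values are assumptions to be checked against a practice's own analytics, and the function name is hypothetical. Consistent with the caveats above, Google clicks are not split into AI Overview versus organic, and sessions without a usable signal stay unattributed.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed referrer hostnames; verify against real analytics before relying on them.
AI_REFERRER_HOSTS = {
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
}


def classify_session(referrer: Optional[str], landing_url: str) -> str:
    """Directional AI-provenance label for one session (Tier 2 only)."""
    # UTM tags on the landing URL are checked first, since many AI clicks
    # arrive with no referrer at all.
    query = parse_qs(urlparse(landing_url).query)
    utm_source = query.get("utm_source", [""])[0].lower()
    if "chatgpt" in utm_source:
        return "ChatGPT"
    if "perplexity" in utm_source:
        return "Perplexity"

    host = urlparse(referrer).netloc.lower() if referrer else ""
    if host in AI_REFERRER_HOSTS:
        return AI_REFERRER_HOSTS[host]
    if host == "google.com" or host.startswith("www.google."):
        # AI Overview citation clicks blend with organic Google referrers.
        return "Google (organic or AI Overview)"
    return "Unattributed"


print(classify_session("https://www.perplexity.ai/", "https://example.com/"))  # Perplexity
print(classify_session(None, "https://example.com/?utm_source=chatgpt.com"))   # ChatGPT
```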

Disclosed in every monthly report

Tier 2 figures undercount the true AI-driven funnel, likely by a factor of two or more, because of the answer-only patient and the referrer-data gaps described above. We do not let this number drive engagement decisions.

Tier 3 · Practice-Reported Intake

Once per quarter, the practice's treatment coordinator or front-desk team logs how new consult patients describe finding the practice. Phrases like “I asked ChatGPT,” “my AI assistant referred you,” or “Google said you specialize in” are recorded.

The act of asking is itself an intervention. Most practices have no idea how many of their new patients first heard their name from an AI engine until they start listening for it. Tier 3 frequently surfaces the most powerful evidence in the engagement, and it costs the practice roughly 60 seconds per new-patient intake.

In summary

ARI is the metric Apicus guarantees because it is the metric Apicus can guarantee honestly. The other two tiers exist to confirm, directionally and qualitatively, that the visibility work is moving real patient behavior. Neither substitutes for ARI; both reinforce it.


§12 Limitations & honest caveats

We are explicit about what ARI cannot do. A measurement standard worth citing is one that names its own failure modes.

  • ARI measures visibility, not conversion. A practice with 60% ARI can close fewer consults than a practice with 40%. That is a function of the practice's patient experience, treatment coordinator, and intake process, not of AI visibility. We do not promise consult volume.
  • AI engine behavior is non-deterministic. Even with median-of-three sampling, a month-over-month swing of about 2 points is normal and does not necessarily indicate methodology drift or performance regression.
  • Engine algorithm changes can shift baselines without warning. When this occurs, we publish a methodology bulletin and re-baseline affected client benchmarks.
  • Geo-targeting is imperfect. AI engines do not always honor IP-based location signals cleanly. Patient queries originating outside the targeted metro are not measured.
  • We do not currently measure voice assistants or in-app surfaces. Siri, Alexa, Google Assistant, the ChatGPT mobile app, and Microsoft Copilot (formerly Bing Chat) are out of scope for v1.2. They are candidates for future versions.
  • The methodology is opinionated, not neutral. Engine weights, query category distribution, and IAI components reflect Apicus's judgment of what matters for high-value oral health procedures specifically. They may not generalize to other verticals.
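Reading the 2-point caveat on non-determinism as a tolerance band of ±2 ARI points (an interpretation; the text does not define the band precisely), month-over-month changes could be triaged as follows. The function name is hypothetical, and a move outside the band is a prompt for review, not proof of a real shift:

```python
NOISE_BAND_POINTS = 2.0  # assumed: the "normal" month-over-month swing


def classify_change(previous_ari: float, current_ari: float,
                    band: float = NOISE_BAND_POINTS) -> str:
    """Label a month-over-month ARI change relative to the noise band."""
    delta = current_ari - previous_ari
    if abs(delta) <= band:
        return "within normal variance"
    return "gain to review" if delta > 0 else "decline to review"


print(classify_change(41.0, 42.5))  # within normal variance
print(classify_change(41.0, 44.0))  # gain to review
print(classify_change(41.0, 38.0))  # decline to review
```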

§13 Versioning

Each material change to the methodology, such as engine weight adjustments, query panel revisions, or sampling protocol changes, increments the version number and is documented below.

v1.2 · May 2026 · Current
  • Added Engine Coverage Weight as a co-reported metric
  • Engine weights revised based on Q1 2026 usage data
  • Query Panel expanded from 60 to 75 prompts
  • Standardized median-of-three sampling protocol
v1.1 · February 2026
  • Added Perplexity to the engine set
  • Standardized monthly measurement cadence
  • Introduced the highest-value query subset
v1.0 · November 2025
  • Initial methodology release
  • Four-engine baseline (ChatGPT, Claude, Gemini, Google AI Overviews)
  • 60-prompt query panel

§14 Citation

If you reference this methodology in research, content, vendor evaluations, or client deliverables, please cite as follows:

SUGGESTED CITATION

Apicus Research. (2026). The AI Referral Index (ARI) Methodology, v1.2. Retrieved from apicus.ai/methodology

Direct attribution is appreciated but not required. Apicus maintains the ARI methodology as a public standard for AI visibility measurement.

First step

See your practice's ARI.

The fastest way to understand the methodology is to see it run on your own practice. We'll measure your top 25 patient queries across all five engines and send you the audit. No call, no commitment, just the PDF.
