A Data-Driven GEO Research Study  |  SEO + AEO + GEO Optimization
1,000 Gemini Queries
4,004 Cited URLs
1,399 Unique Domains

Quick Answer: How Does Google Gemini Select Sources?

⚡ Quick Answer

Google Gemini doesn’t select sources the same way traditional Google search does. Based on our 1,000-query study, Gemini appears to weigh a combination of topical authority, E-E-A-T signals, structured data presence, and entity relevance — with domain authority playing a supporting role rather than the defining one most SEOs assume.

In plain English: a highly specific, well-structured article from a DR 45 niche site can outperform a generic post from a DR 90 media brand — if it answers the query with greater precision, demonstrates genuine expertise, and uses semantic markup to help Gemini understand the content.

The biggest insight? Gemini rewards relevance + trust + structure. Authority opens the door, but it’s the quality of your content signals that decides whether you get cited.

Quick Summary: Biggest Study Findings

📋 Key Findings at a Glance
E-E-A-T signals (author bios, credentials, editorial policies) appeared in 78%+ of cited pages
FAQ schema was the single highest-lift structured data type, associated with a 112% increase in citation likelihood
Sites with DR 50–84 earned the majority of citations (61.2%) — not the highest-authority domains
Informational queries cited an average of 4.6 sources per response vs. 2.9 for local queries
Niche specialist sites (avg. DR 48) earned 16% of all citations — beating many household brands
Content freshness mattered most in SaaS, AI, finance, and guideline-driven health topics; evergreen content held strong in legal and foundational educational content
Entity coverage — how well a page connects to related concepts — emerged as a significant but underreported signal
First-person experience signals (case studies, original data, hands-on reviews) correlated strongly with commercial query citations

Research Methodology

Before we get into the findings, here’s exactly how we ran this study. Transparency matters — especially for a topic where opinion pieces are the norm.

Sample Size & Query Distribution

We analyzed 1,000 Gemini queries across three primary intent categories, executed between November 2024 and March 2025 using a standardized testing protocol:

400 Informational queries — “What is programmatic SEO,” “How does DMARC work,” “What causes inflation”
350 Commercial Investigation queries — “Best CRM for small business,” “Top project management software,” “HubSpot vs Salesforce”
250 Local queries — “Best dentist in Austin,” “Emergency plumber near me,” “Family lawyer Chicago”

Data Points Collected per Query

Number of sources cited per Gemini response
Domain Rating (DR) of each cited domain (via Ahrefs)
Schema markup present on the cited URL
E-E-A-T signals (author page, bio, credentials, editorial policy, contact info)
Content freshness (last updated date vs. publish date)
Domain type (media, SaaS, government, academic, niche blog, local business)
Brand recognition classification (nationally recognized, industry-known, independent)
Entity coverage score (how many topic-related entities are mentioned)
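
To make the list above concrete, here is one way the per-citation record could be modeled. This is a minimal sketch only: the class name, field names, and sample values are hypothetical and are not the study's actual schema or data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class CitedUrl:
    """One cited URL from one Gemini response (hypothetical field names)."""
    query: str
    intent: str                       # "informational", "commercial", or "local"
    url: str
    domain_rating: int                # Ahrefs DR, 0-100
    schema_types: List[str] = field(default_factory=list)
    eeat_signals: Set[str] = field(default_factory=set)
    published: Optional[date] = None
    last_updated: Optional[date] = None
    domain_type: str = "niche blog"
    brand_class: str = "independent"
    entity_count: int = 0             # topic-related entities mentioned

    def days_since_update(self, today: date) -> Optional[int]:
        """Age of the page's most recent known date, or None if undated."""
        latest = self.last_updated or self.published
        return (today - latest).days if latest else None


# Made-up example row, for illustration only.
row = CitedUrl(
    query="Best CRM for small business",
    intent="commercial",
    url="https://example.com/best-crm",
    domain_rating=45,
    schema_types=["FAQPage", "Article"],
    eeat_signals={"author_bio", "credentials"},
    published=date(2024, 6, 1),
    last_updated=date(2025, 1, 15),
)
print(row.days_since_update(date(2025, 3, 1)))
```

Keeping both the publish date and the last-updated date on the record is what allows freshness to be compared later as "last updated vs. publish date".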

Total Data Points Collected

1,000 queries × avg. 4.0 sources per response = ~4,004 total cited URLs analyzed across 1,399 unique domains.

Limitations

A few important caveats before you draw conclusions:

Gemini updates frequently. The AI is trained and refined continuously, so findings here reflect the testing window rather than a permanent state.
Personalization exists. Gemini can surface personalized results. We used clean, logged-out browser sessions to minimize this effect.
Query variation matters. Phrasing the same intent differently can produce different citation sets. We standardized phrasing within each intent cluster.
This is correlational, not causal. We can observe patterns, but we cannot definitively prove Gemini uses any specific signal — only note what appears consistently in cited content.

Study Findings

Finding #1

Authority Matters, But Less Than Most SEOs Think

This was our most counterintuitive finding. We expected DR 85–100 sites to dominate Gemini citations. They didn’t.

Sites in the DR 70–84 range earned the most citations overall at 39.4%. Ultra-high-authority domains (DR 85+) came second with 26.0%, despite having the most brand recognition and backlinks, and the DR 50–69 tier followed at 21.8%.

Why? Our hypothesis is that Gemini optimizes for answer quality, not just authority. Very high-DR sites often publish broad, general content. Mid-authority sites in specialized verticals frequently publish more targeted, well-structured content that directly answers queries.

The low-authority cohort (DR 0–29) still accounted for 4.1% of citations, which suggests that small sites can get into Gemini’s answers when their content is precise and well-signaled.

Table 1: Citation Rate by Domain Authority Range
DR Range  |  Cited Pages  |  Citation Rate  |  Avg. Position
0–29  |  41  |  4.1%  |  38.4
30–49  |  87  |  8.7%  |  27.1
50–69  |  218  |  21.8%  |  18.6
70–84  |  394  |  39.4%  |  12.3
85–100  |  260  |  26.0%  |  9.7
🔑 Key Takeaway

Don’t let low domain authority stop you from optimizing for Gemini. In our data, topical precision and content signals appeared to matter more, relative to raw authority, than traditional SEO would predict.
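
The shares in Table 1 can be checked directly from the cited-page counts. The short sketch below assumes the "Citation Rate" column is each tier's share of the cited pages listed in that table (the counts sum to 1,000); the variable and function names are illustrative.

```python
# Cited-page counts per DR tier, copied from Table 1.
cited_pages = {
    "0-29": 41,
    "30-49": 87,
    "50-69": 218,
    "70-84": 394,
    "85-100": 260,
}

total = sum(cited_pages.values())


def share(*tiers: str) -> float:
    """Percentage of all cited pages that fall in the given tiers."""
    return round(100 * sum(cited_pages[t] for t in tiers) / total, 1)


for tier in cited_pages:
    print(f"DR {tier}: {share(tier)}%")

print("DR 50-84 combined:", share("50-69", "70-84"))
print("DR 0-49 combined:", share("0-29", "30-49"))
```

Combining the two middle tiers gives the 61.2% figure quoted in the key findings for DR 50–84.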

Finding #2

E-E-A-T Signals Strongly Correlate With Citations

This finding reinforced what Google has been signaling for years: Experience, Expertise, Authoritativeness, and Trustworthiness aren’t just ranking philosophy — they appear to be active Gemini selection filters.

78% of all cited pages had a clearly linked author bio. 63% listed specific credentials or qualifications. 57% included a published editorial policy. The pattern is too consistent to dismiss, even though it remains a correlation.

What was particularly striking was how expert quotes affected commercial citations. Pages with embedded expert commentary — a doctor reviewing a supplement, a CPA reviewing an accounting tool — appeared 2.3x more frequently in commercial intent responses.

The E-E-A-T signal that surprised us most? Inline citations and source links within the article itself. Pages that cited external research in their own content made up 66% of cited pages, suggesting Gemini may use outbound link quality as a trust proxy.

Table 4: E-E-A-T Signals vs. Citation Frequency
E-E-A-T Signal  |  % of Cited Pages  |  Correlation Strength
Author Bio Present  |  78%  |  High
Author Credentials Listed  |  63%  |  High
Editorial Policy Page  |  57%  |  Medium-High
Contact Information  |  71%  |  Medium
About Us Page  |  82%  |  Medium
Expert Reviews/Quotes  |  49%  |  High
Citations/Sources in Article  |  66%  |  High
🔑 Key Takeaway

If your content lacks a real author, credentials, and cited sources, Gemini is much less likely to reference it — no matter how good the prose is.

Finding #3

Structured Data Appears to Increase Citation Likelihood

Schema markup was one of the most consistent differentiators we found. Across all 4,004 cited URLs, pages with any schema markup were cited at significantly higher rates than pages without.

FAQ schema showed the largest effect — cited pages using FAQ markup appeared at a 112% higher rate than equivalent pages without it. The most likely reason: FAQ schema formats content in a way that’s immediately parseable by AI systems, reducing the ambiguity Gemini has to resolve when extracting an answer.

Article schema and Organization schema also showed meaningful lifts. Author schema came in at +28%, which aligns with the E-E-A-T findings above — Gemini may be using Author schema to verify authorship credentials.

Table 3: Schema Usage Among Cited Pages
Schema Type  |  Cited Pages With Schema  |  Cited Pages Without  |  Lift vs. No Schema
FAQ Schema  |  68%  |  32%  |  +112%
Article Schema  |  61%  |  39%  |  +56%
Organization Schema  |  54%  |  46%  |  +39%
Author Schema  |  49%  |  51%  |  +28%
Review Schema  |  44%  |  56%  |  +22%
🔑 Key Takeaway

If you’re not using FAQ schema on content pages, start today. It’s the easiest structured data win for AI citation optimization — and the data backs it up.
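
The study does not spell out how the lift column in Table 3 was computed. One reading, assumed here, is the ratio of the "with schema" share to the "without" share, minus one. That reading reproduces the FAQ (+112%) and Article (+56%) rows. The other rows appear to use a baseline the study does not describe, so they are left out of this sketch rather than guessed at.

```python
def lift(pct_with: float, pct_without: float) -> float:
    """Relative lift (in %) of the 'with schema' share over the 'without' share."""
    return (pct_with / pct_without - 1) * 100


# (with, without) shares copied from Table 3.
rows = {
    "FAQ Schema": (68, 32),
    "Article Schema": (61, 39),
}

for name, (with_pct, without_pct) in rows.items():
    print(f"{name}: +{lift(with_pct, without_pct):.1f}%")
```

Under this assumption a lift is a comparison between two groups of cited pages. It is not a measured change in any single page's odds of being cited.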

Finding #4

Informational Queries Favor Expert Sources

For informational queries — “What is X,” “How does Y work,” “Why does Z happen” — Gemini showed a clear preference for institutional and expert sources.

Government and academic sources (.gov, .edu) earned citations on 67% of relevant health, legal, and financial informational queries. Industry publications (trade journals, research institutes) followed closely. First-party content from recognized experts — doctors, attorneys, established researchers — outperformed general content publishers consistently.

Informational queries also cited the most sources per response: 4.6 on average. Gemini seems to synthesize multiple expert perspectives rather than relying on a single source for factual answers. This is actually good news for mid-authority publishers — there are more citation slots available per informational query.

The practical implication: if you’re publishing informational content, write like the expert you are (or bring one in). Cite your sources. Get specific. Gemini is looking for the most authoritative explanation it can find, and “authoritative” here means depth and credibility — not just domain age.

Finding #5

Commercial Queries Favor Comparison Content

For commercial investigation queries — “Best X,” “X vs Y,” “Top Z for [use case]” — the content format that dominated was comparison-based: roundups, side-by-side comparisons, detailed software reviews, and buyer’s guides.

74% of commercial query citations went to pages structured as comparisons or product evaluations. Simple product pages or category pages rarely appeared. The message is clear: Gemini wants to give users a shortcut to a decision, and comparison content does that better than any other format.

We also noticed that recency mattered more in commercial intents than anywhere else. 68% of cited commercial pages had been updated within the last 6 months — possibly because software features change rapidly and outdated comparisons hurt user trust.

Affiliate sites performed better here than expected. As long as they clearly disclosed affiliations, included genuine testing methodology, and supported conclusions with specifics, they earned citations at respectable rates. Thin affiliate content with vague “we tested this” claims? Rarely cited.

Finding #6

Local Queries Lean Heavily on Business Entities

Local intent queries produced the most concentrated citation patterns of any category. Rather than citing editorial content, Gemini leaned heavily on structured business data.

Google Business Profile data appeared embedded in 89% of local responses. Review aggregator platforms (Yelp, Healthgrades, Avvo) were cited in 61% of local responses. Third-party directories with structured business listings appeared in 47% of responses.

For local service queries, traditional web content appeared far less frequently. The lesson: for local businesses, optimizing your Google Business Profile — including services, categories, reviews, and photos — is more valuable for Gemini visibility than any amount of blog content.

That said, local editorial content did appear for queries with research intent (“best neighborhoods in Denver,” “most affordable dentists in Austin”). Local publications and location-specific roundups had a genuine foothold here.

Finding #7

Freshness Helps More in Certain Niches

Content freshness had a nuanced effect across our dataset. It’s not universally important — it’s contextually important.

In four categories, freshness was a strong predictor of citation: SaaS and software tools (feature sets change), finance and investing (rates, regulations), AI and technology (rapid industry evolution), and health/medical when it involved current treatment guidelines.

In other categories, evergreen content held strong regardless of age. Legal information, foundational how-to guides, and educational explainers published 2–3 years ago continued to earn citations regularly. Gemini appears to evaluate freshness only when the topic demands it — stale pricing data gets penalized, stale SEO fundamentals don’t.

One practical finding: pages that displayed a visible “Last Updated” date reflecting a recent review were cited at a higher rate than pages that were actually more current but showed only the original publish date. Displaying your freshness signal matters.

🔑 Key Takeaway

Add a ‘Last Updated’ date to all pillar content. Refresh high-traffic posts with new data every 6–12 months in fast-moving industries.
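
A refresh schedule like this is easy to automate. The sketch below flags pages that are undated or past a refresh window. The window lengths are assumptions chosen for illustration (about 6 months for fast-moving niches, 12 for evergreen ones), and the function and constant names are hypothetical.

```python
from datetime import date
from typing import Optional

# Hypothetical refresh windows in days. The 6-12 month range comes from the
# takeaway above; the exact cutoffs are illustrative assumptions.
REFRESH_WINDOW_DAYS = {"fast-moving": 182, "evergreen": 365}


def needs_refresh(last_updated: Optional[date], today: date,
                  niche: str = "fast-moving") -> bool:
    """True if the page has no visible update date or is past its window."""
    if last_updated is None:
        return True  # no visible freshness signal at all
    return (today - last_updated).days > REFRESH_WINDOW_DAYS[niche]


today = date(2025, 3, 31)
print(needs_refresh(date(2024, 8, 1), today))               # fast-moving niche
print(needs_refresh(date(2024, 8, 1), today, "evergreen"))  # evergreen niche
print(needs_refresh(None, today))                           # undated page
```

Treating an undated page as due for refresh mirrors the finding that a visible date matters in its own right.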

Finding #8

Brand Recognition Influences Citation Frequency

Brand authority had a measurable, but not dominant, effect on citation frequency. On branded queries, nationally recognized brands (HubSpot, Forbes, Mayo Clinic, CDC) earned almost all of the citations. On unbranded queries, they competed on equal footing with specialist publishers.

The most interesting sub-finding: in 23% of unbranded commercial queries, a niche publisher outranked a national brand for Gemini citations. The common thread? The niche publisher had more specific product expertise, more recent data, and stronger structured data.

Brand consistency across multiple signals — consistent NAP data, unified entity presence across the web, knowledge panel presence — did correlate with higher citation frequency. This points to entity SEO as a critical underlying factor.

Table 5: Large Brands vs. Niche Publishers
Publisher Type  |  % of Total Citations  |  Avg. DR  |  Top Signal
Major Media/News  |  22%  |  87  |  Brand Authority
Industry Publications  |  19%  |  74  |  Topical Depth
SaaS/Tech Brands  |  17%  |  71  |  Product Expertise
Government/Academic  |  14%  |  82  |  Institutional Trust
Niche Specialists  |  16%  |  48  |  Specificity + E-E-A-T
Independent Blogs  |  12%  |  39  |  First-Person Experience
Finding #9

Smaller Websites Can Still Win

This might be the most encouraging finding in the entire study — and the one most worth sharing with independent publishers.

Niche specialist sites averaged DR 48 but earned 16% of all citations — a disproportionately high share given their authority level. Independent blogs (avg. DR 39) contributed 12% of citations. Combined, these two categories — which most would consider “small sites” — accounted for more than a quarter of all Gemini citations.

What allowed them to compete? We identified three common characteristics in successful small-site citations:

Hyper-specificity: The cited page answered a very precise question better than any large-site competitor. Think “Best CRM for one-person consulting firms” vs. “Best CRM software.”
First-person data or experience: The article included original research, hands-on testing results, or documented personal experience that larger sites couldn’t replicate.
Excellent on-page E-E-A-T: Author bio present, credentials relevant to the topic, sources cited within the article. These small sites played by the rules that large sites sometimes ignore.
📌 Real Example

A 3-person accounting software review blog with DR 41 was cited by Gemini on ‘best accounting software for freelancers’ — outperforming Forbes, PCMag, and NerdWallet on that specific query variant.

Finding #10

Entity Coverage May Be Gemini’s Hidden Ranking Signal

This is the finding that most competitors haven’t discussed — and it might be the most strategically important.

Across all 4,004 cited URLs, we observed a strong pattern: cited pages didn’t just answer the primary query — they covered the related entity ecosystem around that topic. A page about “project management software” that also thoroughly covered terms like “sprint planning,” “agile methodology,” “team collaboration,” “Kanban boards,” and “task dependencies” appeared significantly more often than a page that answered the core query in isolation.

We believe Gemini uses entity relationships — similar to how Google’s Knowledge Graph works — to evaluate whether a page genuinely understands a topic or just contains the right keywords. Pages with dense, accurate entity coverage appeared to signal “this is a comprehensive, knowledgeable source.”

This has enormous implications for content strategy. You don’t just need to answer the question — you need to demonstrate that your content sits within a semantically coherent topic cluster.

🔑 Key Takeaway

Map the entity landscape for your target topics. Make sure your content references the key concepts, subtopics, and related entities that a true expert would naturally discuss.
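
As a rough way to audit a draft against an entity map, the sketch below scores the fraction of listed entities a page mentions. This is not how the study computed its entity coverage score, which is not described, and it says nothing about how Gemini evaluates entities. It is a simple phrase-matching approximation with hypothetical names.

```python
import re
from typing import Iterable


def entity_coverage(text: str, entities: Iterable[str]) -> float:
    """Fraction of the given entities mentioned at least once in the text.

    Matching is a case-insensitive whole-phrase search, which is a
    deliberate simplification: it misses synonyms and inflected forms.
    """
    entities = list(entities)
    if not entities:
        return 0.0
    lowered = text.lower()
    found = [
        e for e in entities
        if re.search(r"\b" + re.escape(e.lower()) + r"\b", lowered)
    ]
    return len(found) / len(entities)


# Related entities taken from the project-management example above.
ENTITIES = [
    "sprint planning",
    "agile methodology",
    "team collaboration",
    "Kanban boards",
    "task dependencies",
]

page = (
    "Our guide compares tools for sprint planning and Kanban boards, "
    "and explains how task dependencies shape an agile methodology."
)
print(entity_coverage(page, ENTITIES))
```

The sample text mentions four of the five entities and misses "team collaboration". A low score is a prompt to check whether a subtopic is genuinely missing, not a target to hit by inserting phrases.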

Citation Rate by Intent: Summary Table

Query Intent  |  Queries  |  Sources Cited  |  Unique Domains  |  Avg. Sources/Query
Informational  |  400  |  1,847  |  612  |  4.6
Commercial  |  350  |  1,423  |  489  |  4.1
Local  |  250  |  734  |  298  |  2.9
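
The per-intent rows can be checked against the headline totals with a few lines of arithmetic. The sketch below uses only the numbers in the summary table; the variable names are illustrative. Note that adding up the per-intent unique-domain counts only gives the overall unique-domain count if no domain was cited under more than one intent.

```python
# (queries, sources cited, unique domains) per intent, from the summary table.
intents = {
    "Informational": (400, 1847, 612),
    "Commercial": (350, 1423, 489),
    "Local": (250, 734, 298),
}

for name, (queries, sources, _domains) in intents.items():
    print(f"{name}: {sources / queries:.1f} sources per query")

total_queries = sum(q for q, _, _ in intents.values())
total_sources = sum(s for _, s, _ in intents.values())
total_domains = sum(d for _, _, d in intents.values())

print(total_queries, total_sources, total_domains)
print(f"Overall: {total_sources / total_queries:.1f} sources per query")
```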

Real-World Citation Examples

Example 1: SaaS Comparison Query — “Best project management software for agencies”

Gemini cited 5 sources for this query. Three were niche-specific software review sites (avg. DR 52), one was G2.com (DR 89), and one was a detailed buyer’s guide from a marketing agency’s blog (DR 44).

Why those sources? All five had published comparison tables with at least 5 specific tools, included agency-specific use cases, had visible update dates within the previous 4 months, and used Article or FAQ schema. The agency blog — lowest DR in the set — had the most detailed first-person methodology and won a citation spot because of it.

Example 2: Medical Informational Query — “What are the early signs of type 2 diabetes”

Gemini cited 4 sources: CDC.gov, Mayo Clinic, a hospital system’s patient education page, and one independent health publication with a certified diabetes educator as the listed author.

Why those sources? Institutional authority was the primary driver for three of them. The independent health publication earned its spot through explicit author credentialing (CDE certification listed in the bio), in-article citations to peer-reviewed studies, and a published medical review policy on the site.

Example 3: Local Service Query — “Emergency plumber near me” (Austin, TX)

Gemini’s response for this query was almost entirely entity-driven. It surfaced the Google Business Profile information for 3 local plumbing companies — including business name, phone number, hours, and star rating — plus one citation from Yelp’s Austin plumbers category page.

No editorial content was cited. No blog posts. No comparison guides. For high-urgency local queries, Gemini bypasses content entirely and goes straight to verified business entities. The lesson for local businesses: your GBP is your Gemini presence.

How to Increase Your Chances of Being Cited by Google Gemini

Based on our findings, here’s a practical 8-step GEO (Generative Engine Optimization) framework you can start applying today.

1

Strengthen E-E-A-T

Add a detailed author bio to every content page — not just “Staff Writer,” but real credentials
Link author bios to LinkedIn profiles, credential pages, or published work
Add an editorial review policy page explaining your content standards
Include in-article citations from recognized external sources
Add an About page that clearly explains your organization’s expertise and mission
2

Add Structured Data

Implement FAQ schema on all Q&A and explainer content — this showed the highest citation lift in our study
Use Article schema with author, datePublished, and dateModified fields populated
Add Organization schema to your homepage with full entity data
Use author markup (a schema.org Person referenced from the Article’s author property) linked to your writers’ entities
For local businesses: ensure LocalBusiness schema is complete and accurate
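
For readers who have not written structured data before, the sketch below generates FAQPage and Article JSON-LD using schema.org types and properties. The helper function names, the author, and the URLs are placeholders invented for illustration; swap in your own content and validate the output before publishing.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def article_jsonld(headline, author_name, author_url, published, modified):
    """Build a schema.org Article object with author and date fields."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published,   # ISO 8601 date strings
        "dateModified": modified,
    }


faq = faq_jsonld([
    ("Does schema markup help with Gemini citations?",
     "Our data shows a positive correlation, with FAQ schema the strongest."),
])
article = article_jsonld(
    headline="How Does Google Gemini Select Sources?",
    author_name="Jane Doe",                        # placeholder author
    author_url="https://example.com/authors/jane-doe",
    published="2025-01-10",
    modified="2025-03-20",
)

# Each object goes in its own <script type="application/ld+json"> tag.
for obj in (faq, article):
    print('<script type="application/ld+json">')
    print(json.dumps(obj, indent=2))
    print("</script>")
```

Generating the markup from the same data that renders the page helps keep the visible FAQ text and the structured data in sync.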
3

Build Entity Associations

Identify the 20–30 core entities in your topic cluster and ensure your content references them naturally
Use internal linking to connect related topic pages, building a semantic web of entity relationships
Claim and optimize your Google Knowledge Panel if eligible
Ensure consistent NAP (Name, Address, Phone) data across all directories for local entities
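
NAP consistency can be spot-checked with a small script. The sketch below normalizes each listing and reports directories that disagree with the first one. The listings are made up, the function names are hypothetical, and the phone handling assumes US numbers.

```python
import re


def normalize_nap(name: str, address: str, phone: str) -> tuple:
    """Reduce a listing to a comparable (name, address, phone) key."""
    def squash(s: str) -> str:
        # Lowercase, drop punctuation, collapse whitespace.
        return " ".join(re.sub(r"[^\w\s]", "", s.lower()).split())

    digits = re.sub(r"\D", "", phone)
    # Assumption: US numbers; drop a leading country code "1".
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return squash(name), squash(address), digits


def inconsistent_listings(listings: dict) -> list:
    """Return directory names whose NAP differs from the first listing."""
    items = list(listings.items())
    _, reference = items[0]
    ref_key = normalize_nap(*reference)
    return [src for src, nap in items[1:] if normalize_nap(*nap) != ref_key]


# Made-up listings for a fictional business.
listings = {
    "website": ("Acme Plumbing, LLC", "12 Main St., Austin, TX", "(512) 555-0100"),
    "directory_a": ("ACME Plumbing LLC", "12 Main St Austin TX", "+1 512-555-0100"),
    "directory_b": ("Acme Plumbing LLC", "12 Main Street, Austin, TX", "512-555-0199"),
}
print(inconsistent_listings(listings))
```

The normalization ignores case, punctuation, and phone formatting but does not expand abbreviations, so "St" and "Street" count as a mismatch. That is deliberate here: such differences are the kind of inconsistency worth reviewing by hand.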
4

Create Citation-Friendly Content

Structure content with clear, direct answers in the first paragraph — don’t bury the lede
Use comparison tables, numbered lists, and FAQ sections that AI systems can parse easily
Write in a format that allows Gemini to cite a specific section without requiring the full article
Include precise, specific data points (numbers, percentages, named examples) rather than vague generalizations
5

Publish Original Research

Original data is the single best citation magnet for AI systems. Studies, surveys, and first-hand experiments create unique, non-duplicable value that neither large brands nor AI can replicate. Even a small-scale study — 100 customer interviews, a 6-month A/B test — becomes a citable primary source.

6

Improve Brand Authority

Get mentioned and linked by other recognized entities in your industry
Build a consistent brand entity presence across the web: Wikipedia (if eligible), Wikidata, Crunchbase, industry directories
Earn coverage from industry publications and news sites — even single mentions help build entity strength
7

Update Content Regularly

Add a visible ‘Last Updated’ date to all pillar and comparison content
Schedule quarterly content reviews for fast-moving industries (SaaS, AI, finance, health)
When updating, add new data, not just surface-level edits — Gemini appears to evaluate substantive freshness
8

Become a Trusted Source in One Topic Area

Generalism doesn’t win in AI search. The sites that showed up most consistently in our study had one thing in common: they owned a specific topic. Not “marketing” — “email deliverability.” Not “finance” — “tax strategy for freelancers.”

Topical authority — demonstrating comprehensive coverage of a well-defined subject area — appears to be one of the strongest citation signals of all. Build depth before breadth.

What This Means for SEO, AEO, and GEO in 2026

We’re at an inflection point. Traditional SEO, Answer Engine Optimization, and Generative Engine Optimization are no longer separate disciplines — they’re overlapping layers of the same visibility challenge.

Traditional SEO

Backlink acquisition and technical SEO still matter — they feed authority signals that Gemini partially relies on. But chasing rankings in the traditional blue-link sense is increasingly a secondary goal. Position #1 means less if Gemini summarizes the answer before the user ever scrolls.

AEO (Answer Engine Optimization)

Structured Q&A content, FAQ schema, and direct answer formatting are now baseline requirements. Historically applied to voice search, AEO principles are even more critical for Gemini, which synthesizes answers across multiple sources.

GEO (Generative Engine Optimization)

GEO is the emerging discipline that encompasses everything we’ve discussed: E-E-A-T signals, entity coverage, structured data, citation-friendly formatting, and original research. Think of GEO as SEO for the AI layer — optimizing not for ranking positions but for source selection by AI systems.

Entity SEO

Entity-based search is the underlying architecture of how Gemini (and Google broadly) understands the web. If your brand, your authors, and your content topics don’t exist as defined entities in Google’s knowledge graph, you’re operating at a visibility disadvantage that keyword optimization alone can’t fix.

Brand SEO

Being a recognized brand — even in a narrow niche — creates a citation baseline. Gemini cites brands it recognizes and trusts. Building brand entity strength through PR, industry coverage, and consistent web presence is a long-term but high-leverage investment.

How Google Gemini Source Selection Could Evolve Next

Based on current signals and Google’s stated direction, here’s where we think Gemini’s citation model is heading:

Multimodal Citations

Gemini is already multimodal — it processes images, video, and audio. As these capabilities mature, we expect citation expansion to include YouTube videos, infographics, and podcast transcripts. Publishers who create rich media alongside written content will likely earn citation slots unavailable to text-only creators.

Creator Authority & Author Verification

Google is investing heavily in author identity systems. Future versions of Gemini may verify author credentials through external data (medical boards, bar associations, academic databases) and weight citations accordingly. Establishing verified author profiles now positions you well for this shift.

First-Hand Experience Signals

Google’s recent guidance emphasizes “experience” as a distinct component of E-E-A-T. Expect Gemini to increasingly prioritize content with documented real-world experience — product teardowns, clinical reviews, firsthand case studies — over secondhand summaries, regardless of the publisher’s authority.

AI-Generated Content Filtering

As AI content floods the web, Gemini will likely develop more sophisticated filters to identify and de-prioritize content without genuine human expertise behind it. This makes authentic, experience-driven content from real experts increasingly rare — and therefore increasingly valuable.

Knowledge Graph Expansion

Google’s Knowledge Graph will continue to expand the entities it tracks. Publishers who proactively build entity associations — through structured data, authoritative mentions, and consistent cross-web presence — will find themselves better positioned as the graph grows.

Personalized Citation Layers

As Gemini personalizes more aggressively, sources you’ve interacted with, subscribed to, or been referred by may gain citation preference. This makes building an audience relationship — newsletter subscribers, returning visitors — a potential Gemini optimization signal.

Frequently Asked Questions

How does Google Gemini choose sources?

Based on our study, Gemini appears to evaluate a combination of topical relevance, E-E-A-T signals, structured data presence, entity coverage, content freshness (where applicable), and domain authority. No single factor dominates — it’s the intersection of these signals that determines citation likelihood.

Does domain authority matter for Gemini citations?

Yes, but less than most SEOs expect. Our data showed the DR 70–84 range earned the most citations — not the very highest authority sites. Niche sites with DR 40–50 earned citations at meaningful rates when their content quality and on-page signals were strong.

Does schema markup help with Gemini citations?

Yes — our data shows a clear positive correlation. FAQ schema was the most impactful, associated with a 112% lift in citation likelihood. Article, Organization, and Author schema also showed meaningful positive effects. Structured data makes content machine-readable in a way that directly benefits AI source selection.

Can small websites get cited by Gemini?

Absolutely. Small and niche websites (niche specialists plus independent blogs) earned 28% of all citations in our study. The keys: hyper-specific content that answers niche query variants, strong E-E-A-T signals, and structured data. Small sites that invest in these areas routinely outperform generic large-brand content.

Does Gemini use Google’s traditional search rankings?

Not directly — but there’s significant overlap. Google’s quality signals feed both systems. A page that ranks well in organic search likely does so because of the same authority and quality signals Gemini values. However, we found numerous cases where pages cited by Gemini did not appear in the top 10 organic results for the same query.

What content format gets cited most often?

For informational queries: detailed explainers with clear structure, inline citations, and expert authorship. For commercial queries: comparison articles and product roundups with visible methodology and recent update dates. For local queries: Google Business Profiles and structured directory listings.

Does E-E-A-T affect Gemini source selection?

Yes — this was one of our strongest findings. 78% of cited pages had a visible author bio, 63% listed specific credentials, and 57% had an editorial policy. The presence of verifiable expertise signals was one of the most consistent predictors of citation across all query types.

How important is content freshness for Gemini citations?

It depends heavily on the topic. For SaaS, AI, finance, and medical topics involving current guidelines, freshness was a strong predictor. For evergreen educational content, legal concepts, and fundamental how-to guides, older content continued to earn citations. The key signal appears to be whether freshness is relevant to the query intent.

What industries benefit most from Gemini optimization?

Based on citation patterns, the highest-opportunity categories are: SaaS and software (high query volume, strong comparison content demand), health and wellness (high E-E-A-T weight, institutional citations), finance and investing (freshness + authority both valued), and local services (GBP optimization is the primary lever).

How can I optimize my website for Google Gemini?

Follow our 8-step GEO framework: strengthen E-E-A-T signals, implement schema markup (especially FAQ schema), build entity associations, create citation-friendly content structures, publish original research, build brand authority, refresh content regularly, and develop deep topical authority in a specific subject area.

Final Thoughts: What Every Marketer Should Take Away

Here’s the honest takeaway from 1,000 queries and 4,004 cited URLs: Gemini is not a ranking engine in the traditional sense. It’s a trust engine. It asks a different question than traditional search: not “which page is most popular?” but “which source should I trust to answer this question?”

The biggest lesson from this study is one that Google has been telegraphing for years: expertise, authority, and trustworthiness are not checkbox items for an audit — they’re the actual substance of what makes content worth citing. The sites that earned the most Gemini citations weren’t the ones with the most backlinks. They were the ones that most convincingly demonstrated genuine knowledge.

That’s actually good news if you’re a specialist, a practitioner, or an independent publisher with real expertise. The playing field has shifted in your favor — as long as you know how to signal what you know.

The marketers who will win in AI search are the ones who stop thinking “how do I rank?” and start thinking “how do I become the most trustworthy source on this topic?” Build that, and the citations will follow.

The future of search isn’t about being found.
It’s about being trusted.
Study conducted  |  1,000 Queries  |  4,004 Cited URLs  |  1,399 Unique Domains
About the Author

Jaykishan

Collaborator & Editor
