Research Study
A Data-Driven GEO Research Study  ·  SEO + AEO + GEO Optimization
1,000 Queries · 4,004 URLs Analyzed

Quick Answer: How Does Google Gemini Select Sources?

Google Gemini doesn’t select sources the same way traditional Google search does. Based on our 1,000-query study, Gemini appears to weigh a combination of topical authority, E-E-A-T signals, structured data presence, and entity relevance — with domain authority playing a supporting role rather than the defining one most SEOs assume.

In plain English: a highly specific, well-structured article from a niche site with an Ahrefs Domain Rating (DR) of 45 can outperform a generic post from a DR 90 media brand. That happens when the smaller site answers the query with greater precision, demonstrates genuine expertise, and uses semantic markup to help Gemini understand the content.

Key Insight

Gemini rewards relevance + trust + structure. Authority opens the door, but it’s the quality of your content signals that decides whether you get cited.

Quick Summary: Biggest Study Findings

78% of cited pages had clearly linked author bios with E-E-A-T signals

+112% lift in citation likelihood for FAQ schema, the highest of any markup type

61% of all citations went to DR 50–84 sites, not to the highest-authority domains

4.6 sources cited on average per informational query, vs. 2.9 for local queries

16% of all citations went to niche specialist sites (avg. DR 48), ahead of government and academic sources (14%, avg. DR 82)

2.3x more citations for pages with expert quotes on commercial queries

How We Ran This Study

Before we get into the findings, here’s exactly how we ran this study. Transparency matters — especially for a topic where opinion pieces are the norm.

Sample Size & Query Distribution

We analyzed 1,000 Gemini queries across three primary intent categories, executed between November 2024 and March 2025 using a standardized testing protocol:

400 informational queries: “What is programmatic SEO,” “How does DMARC work,” “What causes inflation”
350 commercial investigation queries: “Best CRM for small business,” “Top project management software,” “HubSpot vs Salesforce”
250 local queries: “Best dentist in Austin,” “Emergency plumber near me,” “Family lawyer Chicago”

Data Points Collected per Query

Number of sources cited per Gemini response
Domain Rating (DR) of each cited domain via Ahrefs
Schema markup present on the cited URL
E-E-A-T signals — author page, bio, credentials, editorial policy, contact info
Content freshness — last updated date vs. publish date
Domain type — media, SaaS, government, academic, niche blog, local business
Brand recognition classification — nationally recognized, industry-known, independent
Entity coverage score — how many topic-related entities are mentioned

Totals: 4,004 cited URLs analyzed · 1,399 unique domains · 1,000 queries analyzed

Limitations

Gemini updates frequently. The AI is trained and refined continuously, so findings reflect the testing window rather than a permanent state.

Personalization exists. We used clean, logged-out browser sessions to minimize this effect.

Query variation matters. We standardized phrasing within each intent cluster.

This is correlational, not causal. We can observe patterns, but we cannot definitively prove that Gemini uses any specific signal; we can only note what appears consistently.

10 Key Findings

Finding #1

Authority Matters, But Less Than Most SEOs Think

This was our most counterintuitive finding. We expected DR 85–100 sites to dominate Gemini citations. They didn’t.

Sites in the DR 70–84 range earned the most citations overall, at 39.4%. Ultra-high-authority domains (DR 85+) came second with just 26.0%, despite having the most brand recognition and backlinks, and the DR 50–69 tier followed at 21.8%.

Why? Our best explanation is that Gemini optimizes for answer quality, not just authority. Very high-DR sites often publish broad, general content, while mid-authority sites in specialized verticals frequently publish more targeted, well-structured content that directly answers queries. Even the low-authority cohort (DR 0–29) accounted for 4.1% of citations, evidence that small sites can get into Gemini’s answers when their content is precise and well-signaled.

Table 1: Citation Rate by Domain Authority Range
DR Range | Cited Pages | Citation Rate | Avg. Position
0–29 | 41 | 4.1% | 38.4
30–49 | 87 | 8.7% | 27.1
50–69 | 218 | 21.8% | 18.6
70–84 | 394 | 39.4% | 12.3
85–100 | 260 | 26.0% | 9.7

Key Takeaway: Don’t let low domain authority stop you from optimizing for Gemini. Topical precision and content signals outweigh raw authority far more than traditional SEO would predict.

Finding #2

E-E-A-T Signals Strongly Correlate With Citations

This finding reinforced what Google has been signaling for years: Experience, Expertise, Authoritativeness, and Trustworthiness aren’t just ranking philosophy — they appear to be active Gemini selection filters.

78% of all cited pages had a clearly linked author bio. 63% listed specific credentials or qualifications. 57% included a published editorial policy. Patterns this consistent are hard to dismiss as coincidence.

What was particularly striking was how expert quotes affected commercial citations. Pages with embedded expert commentary — a doctor reviewing a supplement, a CPA reviewing an accounting tool — appeared 2.3x more frequently in commercial intent responses.

The E-E-A-T signal that surprised us most? Inline citations and source links within the article itself. 66% of cited pages referenced external research in their own content, suggesting Gemini may use outbound link quality as a trust proxy.

Table 2: E-E-A-T Signals vs. Citation Frequency
E-E-A-T Signal | % of Cited Pages | Correlation Strength
Author Bio Present | 78% | High
Author Credentials Listed | 63% | High
Editorial Policy Page | 57% | Medium-High
Contact Information | 71% | Medium
About Us Page | 82% | Medium
Expert Reviews/Quotes | 49% | High
Citations/Sources in Article | 66% | High

Key Takeaway: If your content lacks a real author, credentials, and cited sources, Gemini is much less likely to reference it — no matter how good the prose is.

Finding #3

Structured Data Appears to Increase Citation Likelihood

Schema markup was one of the most consistent differentiators we found. Across all 4,004 cited URLs, pages with any schema markup were cited at significantly higher rates than pages without.

FAQ schema showed the largest effect — cited pages using FAQ markup appeared at a 112% higher rate than equivalent pages without it. The most likely reason: FAQ schema formats content in a way that’s immediately parseable by AI systems, reducing the ambiguity Gemini has to resolve when extracting an answer.
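FAQ schema is usually added as JSON-LD embedded in the page. The following is a minimal Python sketch that builds a schema.org FAQPage object from question-and-answer pairs and wraps it in the script element that goes into the HTML. The `FAQPage`, `Question`, `Answer`, `mainEntity`, and `acceptedAnswer` terms are standard schema.org vocabulary. The helper names (`build_faq_jsonld`, `to_script_tag`) are hypothetical illustrations, not part of any library, and the sample answers paraphrase this study's own FAQ.

```python
import json


def build_faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_script_tag(data):
    """Wrap JSON-LD in the <script> element that goes into the page HTML."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the <script> element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


faq = build_faq_jsonld([
    ("Can small websites get cited by Gemini?",
     "Yes. Niche specialist sites and independent blogs earned 28% of citations in our study."),
    ("Does schema markup help with Gemini citations?",
     "FAQ schema showed the largest association with citation likelihood of any markup type."),
])
print(to_script_tag(faq))
```

Each visible question on the page should have a matching `Question` entry, so that the markup describes content readers can actually see.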

Article schema and Organization schema also showed meaningful lifts. Author schema came in at +28%, which aligns with the E-E-A-T findings above — Gemini may be using Author schema to verify authorship credentials.

Table 3: Schema Usage Among Cited Pages
Schema Type | Cited Pages With Schema | Cited Pages Without Schema | Citation Lift
FAQ Schema | 68% | 32% | +112%
Article Schema | 61% | 39% | +56%
Organization Schema | 54% | 46% | +39%
Author Schema | 49% | 51% | +28%
Review Schema | 44% | 56% | +22%

Key Takeaway: If you’re not using FAQ schema on content pages, start today. It’s the easiest structured data win for AI citation optimization — and the data backs it up.

Finding #4

Informational Queries Favor Expert Sources

For informational queries — “What is X,” “How does Y work,” “Why does Z happen” — Gemini showed a clear preference for institutional and expert sources. Government and academic sources (.gov, .edu) earned citations on 67% of relevant health, legal, and financial informational queries.

Informational queries also cited the most sources per response: 4.6 on average. Gemini seems to synthesize multiple expert perspectives rather than relying on a single source for factual answers. This is actually good news for mid-authority publishers — there are more citation slots available per informational query.

The practical implication: if you’re publishing informational content, write like the expert you are (or bring one in). Cite your sources. Get specific. Gemini is looking for the most authoritative explanation it can find, and “authoritative” here means depth and credibility — not just domain age.

Finding #5

Commercial Queries Favor Comparison Content

For commercial investigation queries — “Best X,” “X vs Y,” “Top Z for [use case]” — the content format that dominated was comparison-based: roundups, side-by-side comparisons, detailed software reviews, and buyer’s guides. 74% of commercial query citations went to pages structured as comparisons or product evaluations.

We also noticed that recency mattered more in commercial queries than anywhere else. 68% of cited commercial pages had been updated within the last 6 months, possibly because software features change rapidly and outdated comparisons hurt user trust.

Affiliate sites performed better here than expected. As long as they clearly disclosed affiliations, included genuine testing methodology, and supported conclusions with specifics, they earned citations at respectable rates. Thin affiliate content with vague “we tested this” claims? Rarely cited.

Finding #6

Local Queries Lean Heavily on Business Entities

Local intent queries produced the most concentrated citation patterns. Rather than citing editorial content, Gemini leaned heavily on structured business data. Google Business Profile data appeared embedded in 89% of local responses. Review aggregator platforms (Yelp, Healthgrades, Avvo) were cited in 61% of local responses.

For local businesses, optimizing your Google Business Profile — including services, categories, reviews, and photos — is more valuable for Gemini visibility than any amount of blog content.

Finding #7

Freshness Helps More in Certain Niches

Content freshness had a nuanced effect. It’s not universally important — it’s contextually important. In four categories, freshness was a strong predictor of citation: SaaS and software tools, finance and investing, AI and technology, and health/medical when it involved current treatment guidelines.

One practical finding: pages displaying a recent “Last Updated” date were cited at a higher rate than pages that were actually more current but showed only their original publish date. Displaying your freshness signal matters.

Key Takeaway: Add a ‘Last Updated’ date to all pillar content. Refresh high-traffic posts with new data every 6–12 months in fast-moving industries.

Finding #8

Brand Recognition Influences Citation Frequency

Brand authority had a measurable — but not dominant — effect on citation frequency. In 23% of unbranded commercial queries, a niche publisher outranked a national brand for Gemini citations. The common thread? The niche publisher had more specific product expertise, more recent data, and stronger structured data.

Brand consistency across multiple signals did correlate with higher citation frequency: consistent name, address, and phone (NAP) data, a unified entity presence across the web, and knowledge panel presence. This points to entity SEO as a critical underlying factor.

Table 4: Large Brands vs. Niche Publishers
Publisher Type | % of Citations | Avg. DR | Top Signal
Major Media/News | 22% | 87 | Brand Authority
Industry Publications | 19% | 74 | Topical Depth
SaaS/Tech Brands | 17% | 71 | Product Expertise
Government/Academic | 14% | 82 | Institutional Trust
Niche Specialists | 16% | 48 | Specificity + E-E-A-T
Independent Blogs | 12% | 39 | First-Person Experience

Finding #9

Smaller Websites Can Still Win

This might be the most encouraging finding in the entire study. Niche specialist sites averaged DR 48 but earned 16% of all citations. Independent blogs (avg. DR 39) contributed 12% of citations. Combined, these two categories accounted for more than a quarter of all Gemini citations.

Three common characteristics in successful small-site citations:

1. Hyper-specificity: The cited page answered a very precise question better than any large-site competitor. Think “Best CRM for one-person consulting firms” vs. “Best CRM software.”

2. First-person data or experience: The article included original research, hands-on testing results, or documented personal experience that larger sites couldn’t replicate.

3. Excellent on-page E-E-A-T: Author bio present, credentials relevant to the topic, sources cited within the article.

★ Real Example

A 3-person accounting software review blog with DR 41 was cited by Gemini on “best accounting software for freelancers” — outperforming Forbes, PCMag, and NerdWallet on that specific query variant.

Finding #10

Entity Coverage May Be Gemini’s Hidden Ranking Signal

This is the finding that most competitors haven’t discussed — and it might be the most strategically important. Across all 4,004 cited URLs, we observed a strong pattern: cited pages didn’t just answer the primary query — they covered the related entity ecosystem around that topic.

A page about “project management software” that also thoroughly covered terms like “sprint planning,” “agile methodology,” “team collaboration,” “Kanban boards,” and “task dependencies” appeared significantly more often than a page that answered the core query in isolation.

We believe Gemini uses entity relationships — similar to how Google’s Knowledge Graph works — to evaluate whether a page genuinely understands a topic or just contains the right keywords.

Key Takeaway: Map the entity landscape for your target topics. Make sure your content references the key concepts, subtopics, and related entities that a true expert would naturally discuss.
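The study doesn't publish how its entity coverage score was calculated, so the following is only a rough, hypothetical proxy. It checks which entities from a hand-built list appear as whole phrases in a page's text, then reports the share covered and the gaps. The entity list reuses the project-management examples from this finding. Real entity analysis would more likely rely on NLP entity recognition than on exact string matching.

```python
import re


def entity_coverage(text, entities):
    """Return (score, missing): the share of entities the text mentions, and the gaps."""
    lowered = text.lower()
    missing = []
    for entity in entities:
        # \b word boundaries stop "agile" from matching inside "fragile".
        pattern = r"\b" + re.escape(entity.lower()) + r"\b"
        if not re.search(pattern, lowered):
            missing.append(entity)
    score = (len(entities) - len(missing)) / len(entities) if entities else 0.0
    return score, missing


# Hypothetical entity map for a "project management software" page.
entities = ["sprint planning", "agile methodology", "team collaboration",
            "Kanban boards", "task dependencies"]
page = "We compare tools for sprint planning, Kanban boards, and team collaboration."
score, missing = entity_coverage(page, entities)
print(f"coverage {score:.0%}, missing: {missing}")
```

The `missing` list is the practical output here: it points to subtopics a true expert would likely expect the page to cover.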

Citation Rate by Intent: Summary Table

Query Intent | Queries | Sources Cited | Unique Domains | Avg. Sources/Query
Informational | 400 | 1,847 | 612 | 4.6
Commercial | 350 | 1,423 | 489 | 4.1
Local | 250 | 734 | 298 | 2.9
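As a quick consistency check, the per-intent averages and the headline totals can be recomputed from the table's own rows. This minimal Python sketch does that. The per-intent unique-domain counts also sum to the 1,399 headline figure.

```python
# (intent, queries, sources cited, unique domains), copied from the table above.
rows = [
    ("Informational", 400, 1847, 612),
    ("Commercial", 350, 1423, 489),
    ("Local", 250, 734, 298),
]

for intent, queries, sources, _domains in rows:
    print(f"{intent}: {sources / queries:.1f} sources per query")

total_queries = sum(r[1] for r in rows)
total_sources = sum(r[2] for r in rows)
total_domains = sum(r[3] for r in rows)
print(total_queries, total_sources, total_domains)  # 1000 4004 1399
```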

Real-World Citation Examples

Example 1

SaaS Comparison Query — “Best project management software for agencies”

Gemini cited 5 sources for this query. Three were niche-specific software review sites (avg. DR 52), one was G2.com (DR 89), and one was a detailed buyer’s guide from a marketing agency’s blog (DR 44).

Why those sources? All five had published comparison tables with at least 5 specific tools, included agency-specific use cases, had visible update dates within the previous 4 months, and used Article or FAQ schema. The agency blog had the lowest DR in the set, but it also had the most detailed first-person methodology, and we believe that is what won it a citation spot.

Example 2

Medical Informational Query — “What are the early signs of type 2 diabetes”

Gemini cited 4 sources: CDC.gov, Mayo Clinic, a hospital system’s patient education page, and one independent health publication with a certified diabetes educator as the listed author.

Why those sources? Institutional authority was the primary driver for three of them. The independent health publication earned its spot through explicit author credentialing (CDE certification listed in the bio), in-article citations to peer-reviewed studies, and a published medical review policy on the site.

Example 3

Local Service Query — “Emergency plumber near me” (Austin, TX)

Gemini’s response was almost entirely entity-driven. It surfaced the Google Business Profile information for 3 local plumbing companies — including business name, phone number, hours, and star rating — plus one citation from Yelp’s Austin plumbers category page.

The lesson: No editorial content was cited. No blog posts. No comparison guides. For high-urgency local queries, Gemini bypasses content entirely and goes straight to verified business entities. Your GBP is your Gemini presence.

How to Increase Your Chances of Being Cited by Google Gemini

Based on our findings, here’s a practical 8-step GEO (Generative Engine Optimization) framework you can start applying today.

Step 1: Strengthen E-E-A-T

Add a detailed author bio to every content page — real credentials, not “Staff Writer”
Link author bios to LinkedIn profiles, credential pages, or published work
Add an editorial review policy page explaining your content standards
Include in-article citations from recognized external sources
Add an About page that clearly explains your organization’s expertise and mission
Step 2: Add Structured Data

Implement FAQ schema on all Q&A and explainer content — highest citation lift in our study
Use Article schema with author, datePublished, and dateModified fields populated
Add Organization schema to your homepage with full entity data
For local businesses: ensure LocalBusiness schema is complete and accurate
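Here is a minimal sketch of the Article markup described above, assuming the JSON-LD is generated in Python. In schema.org, authorship is expressed through the `author` property pointing to a `Person`. That person's `sameAs` list can carry the LinkedIn or credential links recommended in step 1. The author name and URLs are hypothetical placeholders. The date-order check is our own guard, not a schema.org requirement.

```python
import json
from datetime import date


def build_article_jsonld(headline, author, published, modified):
    """schema.org Article with author, datePublished and dateModified populated."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Authorship is expressed as a Person linked through the "author" property.
        "author": {
            "@type": "Person",
            "name": author["name"],
            "url": author["bio_url"],        # on-site author bio page
            "sameAs": author["profiles"],    # external profiles such as LinkedIn
        },
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }


# Hypothetical author and URLs, for illustration only.
author = {
    "name": "Jane Example",
    "bio_url": "https://example.com/authors/jane-example",
    "profiles": ["https://www.linkedin.com/in/jane-example"],
}
article = build_article_jsonld("How Does Google Gemini Select Sources?",
                               author, date(2025, 3, 1), date(2025, 6, 15))
print(json.dumps(article, indent=2))
```

Keeping `dateModified` in sync with the visible “Last Updated” date keeps the markup and the on-page freshness signal consistent.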
Step 3: Build Entity Associations

Identify the 20–30 core entities in your topic cluster and reference them naturally
Use internal linking to connect related topic pages — build a semantic web
Claim and optimize your Google Knowledge Panel if eligible
Ensure consistent NAP data across all directories for local entities
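NAP consistency can be spot-checked with a small script. This hypothetical sketch normalizes each directory's name, address, and phone. It lowercases the text, strips punctuation, and keeps only the last 10 phone digits, which assumes North American numbers. It then flags listings that differ from the most common version. It deliberately does not reconcile abbreviations such as “St” vs. “Street”, so a real audit would likely need an address parser.

```python
import re
from collections import Counter


def _clean(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def normalize_nap(name, address, phone):
    """Normalize one Name/Address/Phone listing so formatting differences don't count."""
    digits = re.sub(r"\D", "", phone)[-10:]  # assumes 10-digit North American numbers
    return (_clean(name), _clean(address), digits)


def nap_conflicts(listings):
    """Return sources whose normalized NAP differs from the most common version."""
    normalized = {source: normalize_nap(*nap) for source, nap in listings.items()}
    canonical, _count = Counter(normalized.values()).most_common(1)[0]
    return sorted(source for source, nap in normalized.items() if nap != canonical)


# Hypothetical listings for one business.
listings = {
    "Google Business Profile": ("Acme Plumbing", "123 Main St, Austin, TX", "(512) 555-0100"),
    "Yelp": ("Acme Plumbing", "123 Main St., Austin, TX", "512-555-0100"),
    "Old directory": ("Acme Plumbing LLC", "45 Oak Ave, Austin, TX", "512.555.0199"),
}
print(nap_conflicts(listings))  # ['Old directory']
```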
Step 4: Create Citation-Friendly Content

Structure content with clear, direct answers in the first paragraph — don’t bury the lede
Use comparison tables, numbered lists, and FAQ sections that AI systems can parse easily
Write in a format that allows Gemini to cite a specific section without requiring the full article
Include precise, specific data points — numbers, percentages, named examples — rather than vague generalizations
Step 5: Publish Original Research

Original data is, in our view, the strongest citation magnet for AI systems. Studies, surveys, and first-hand experiments create unique value that competitors and AI systems cannot duplicate. Even a small-scale study, such as 100 customer interviews or a 6-month A/B test, becomes a citable primary source.

Step 6: Improve Brand Authority

Get mentioned and linked by other recognized entities in your industry
Build a consistent brand entity presence: Wikipedia (if eligible), Wikidata, Crunchbase, industry directories
Earn coverage from industry publications and news sites — even single mentions help build entity strength
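One concrete way to tie profiles like Wikidata or Crunchbase to a single brand entity is Organization markup with a `sameAs` list. This is a sketch under stated assumptions. The brand name and profile URLs are placeholders. The helper only deduplicates the list while preserving its order.

```python
import json


def build_organization_jsonld(name, url, logo, profiles):
    """schema.org Organization whose sameAs links tie the brand to its other profiles."""
    # Drop duplicate profile URLs while keeping the original order.
    same_as = list(dict.fromkeys(profiles))
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,
    }


# Hypothetical brand and profile URLs; substitute your real entity profiles.
org = build_organization_jsonld(
    name="Example Co",
    url="https://example.com",
    logo="https://example.com/logo.png",
    profiles=[
        "https://www.linkedin.com/company/example-co",
        "https://www.crunchbase.com/organization/example-co",
        "https://www.linkedin.com/company/example-co",  # duplicate, removed
    ],
)
print(json.dumps(org, indent=2))
```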
Step 7: Update Content Regularly

Add a visible ‘Last Updated’ date to all pillar and comparison content
Schedule quarterly content reviews for fast-moving industries — SaaS, AI, finance, health
When updating, add new data rather than making surface-level edits. Gemini appears to evaluate substantive freshness.
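A review schedule like this can be automated. The sketch below flags pages whose last substantive update is older than a per-category interval. The 90-day value mirrors the quarterly cadence suggested for fast-moving industries. The category names and the 365-day default for everything else are our own assumptions.

```python
from datetime import date, timedelta

# Assumed review intervals in days: quarterly for fast-moving categories.
REVIEW_INTERVAL_DAYS = {"saas": 90, "ai": 90, "finance": 90, "health": 90}
DEFAULT_INTERVAL_DAYS = 365  # assumption for evergreen content


def pages_due_for_review(pages, today):
    """Return URLs whose last update is older than their category's review interval."""
    due = []
    for url, category, last_updated in pages:
        interval = REVIEW_INTERVAL_DAYS.get(category, DEFAULT_INTERVAL_DAYS)
        if today - last_updated > timedelta(days=interval):
            due.append(url)
    return due


# Hypothetical pages: (URL, category, last substantive update).
pages = [
    ("/best-crm-for-consultants", "saas", date(2025, 1, 10)),
    ("/what-is-dmarc", "evergreen", date(2024, 11, 2)),
    ("/ai-writing-tools", "ai", date(2025, 5, 20)),
]
print(pages_due_for_review(pages, today=date(2025, 6, 30)))
```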
Step 8: Become a Trusted Source in One Topic Area

Generalism doesn’t win in AI search. The sites that showed up most consistently in our study had one thing in common: they owned a specific topic. Not “marketing” — “email deliverability.” Not “finance” — “tax strategy for freelancers.” Topical authority appears to be one of the strongest citation signals of all. Build depth before breadth.

What This Means for SEO, AEO, and GEO in 2026

We’re at an inflection point. Traditional SEO, Answer Engine Optimization, and Generative Engine Optimization are no longer separate disciplines — they’re overlapping layers of the same visibility challenge.

Traditional SEO

Backlink acquisition and technical SEO still matter — they feed authority signals that Gemini partially relies on. But chasing rankings in the traditional blue-link sense is increasingly a secondary goal. Position #1 means less if Gemini summarizes the answer before the user ever scrolls.

AEO (Answer Engine Optimization)

Structured Q&A content, FAQ schema, and direct answer formatting are now baseline requirements. Historically applied to voice search, AEO principles are even more critical for Gemini, which synthesizes answers across multiple sources.

GEO (Generative Engine Optimization)

GEO is the emerging discipline that encompasses everything we’ve discussed: E-E-A-T signals, entity coverage, structured data, citation-friendly formatting, and original research. Think of GEO as SEO for the AI layer — optimizing not for ranking positions but for source selection by AI systems.

Entity SEO

Entity-based search is the underlying architecture of how Gemini understands the web. If your brand, your authors, and your content topics don’t exist as defined entities in Google’s Knowledge Graph, you’re operating at a visibility disadvantage that keyword optimization alone can’t fix.

Brand SEO

Being a recognized brand — even in a narrow niche — creates a citation baseline. Gemini cites brands it recognizes and trusts. Building brand entity strength through PR, industry coverage, and consistent web presence is a long-term but high-leverage investment.

How Google Gemini Source Selection Could Evolve Next

Based on current signals and Google’s stated direction, here’s where we think Gemini’s citation model is heading:


Multimodal Citations

Citations may expand to YouTube videos, infographics, and podcast transcripts as Gemini’s multimodal capabilities mature.

Creator Authority Verification

Future versions of Gemini may verify author credentials through external databases such as medical boards, bar associations, and academic records.


First-Hand Experience Signals

Increasing priority for content with documented real-world experience — product teardowns, clinical reviews, firsthand case studies.


AI-Generated Content Filtering

More sophisticated filters may identify and de-prioritize content that lacks genuine human expertise, so authentic, experience-driven content stands out more and becomes more valuable.


Knowledge Graph Expansion

Publishers who proactively build entity associations — through structured data, authoritative mentions, and consistent cross-web presence — will be better positioned.


Personalized Citation Layers

Sources you’ve interacted with or subscribed to may gain citation preference — making audience-building a potential Gemini optimization signal.

Frequently Asked Questions

How does Google Gemini choose sources?
Based on our study, Gemini appears to evaluate a combination of topical relevance, E-E-A-T signals, structured data presence, entity coverage, content freshness (where applicable), and domain authority. No single factor dominates. The intersection of these signals is what determines citation likelihood.

Does domain authority matter for Gemini citations?
Yes, but less than most SEOs expect. Our data showed that the DR 70–84 range earned the most citations, not the very highest-authority sites. Niche sites with DR 40–50 earned citations at meaningful rates when their content quality and on-page signals were strong.

Does schema markup help with Gemini citations?
Yes. Our data shows a clear positive correlation. FAQ schema was the most impactful, associated with a 112% lift in citation likelihood. Article, Organization, and Author schema also showed meaningful positive effects. Structured data makes content machine-readable in a way that appears to benefit AI source selection.

Can small websites get cited by Gemini?
Absolutely. Niche specialist sites and independent blogs together earned 28% of all citations in our study. The keys are hyper-specific content that answers niche query variants, strong E-E-A-T signals, and structured data. Small sites that invested in these areas frequently outperformed generic large-brand content.

Does Gemini use Google’s traditional search rankings?
Not directly, but there’s significant overlap. Google’s quality signals feed both systems. However, we found numerous cases where pages cited by Gemini did not appear in the top 10 organic results for the same query.

What content format gets cited most often?
For informational queries: detailed explainers with clear structure, inline citations, and expert authorship. For commercial queries: comparison articles and product roundups with visible methodology and recent update dates. For local queries: Google Business Profiles and structured directory listings.

Does E-E-A-T affect Gemini source selection?
Yes. This was one of our strongest findings. 78% of cited pages had a visible author bio, 63% listed specific credentials, and 57% had an editorial policy. The presence of verifiable expertise signals was one of the most consistent predictors of citation across all query types.

How important is content freshness for Gemini citations?
It depends heavily on the topic. For SaaS, AI, finance, and medical topics involving current guidelines, freshness was a strong predictor. For evergreen educational content, legal concepts, and fundamental how-to guides, older content continued to earn citations. The key signal appears to be whether freshness is relevant to the query intent.

What industries benefit most from Gemini optimization?
Based on citation patterns, the highest-opportunity categories are SaaS and software (high query volume, strong comparison content demand), health and wellness (high E-E-A-T weight, institutional citations), finance and investing (freshness and authority both valued), and local services (GBP optimization is the primary lever).

How can I optimize my website for Google Gemini?
Follow our 8-step GEO framework: strengthen E-E-A-T signals, implement schema markup (especially FAQ schema), build entity associations, create citation-friendly content structures, publish original research, build brand authority, refresh content regularly, and develop deep topical authority in a specific subject area.

Final Thoughts: What Every Marketer Should Take Away

Here’s the honest takeaway from 1,000 queries and 4,004 cited URLs: Gemini is not a ranking engine in the traditional sense. It’s a trust engine. It asks a different question than traditional search. It isn’t asking “which page is most popular?” but “which source should I trust to answer this question?”

The biggest lesson from this study is one that Google has been telegraphing for years: expertise, authority, and trustworthiness are not checkbox items for an audit — they’re the actual substance of what makes content worth citing. The sites that earned the most Gemini citations weren’t the ones with the most backlinks. They were the ones that most convincingly demonstrated genuine knowledge.

That’s actually good news if you’re a specialist, a practitioner, or an independent publisher with real expertise. The playing field has shifted in your favor — as long as you know how to signal what you know.

The marketers who will win in AI search are the ones who stop thinking “how do I rank?” and start thinking “how do I become the most trustworthy source on this topic?” Build that, and the citations will follow.

The future of search isn’t about being found.
It’s about being trusted.

Study conducted November 2024–March 2025  |  1,000 Queries  |  4,004 Cited URLs  |  1,399 Unique Domains

About the Author

Jaykishan

Collaborator & Editor
