Claude consistently favors sources that combine genuine authority with clear, well-structured content. In our 500-query study, government and educational domains dominated citation share. The likely reason is not that Claude is hard-coded to prefer them, but that they consistently display the signals Claude appears to treat as trustworthy.
Three pillars predict citation selection: domain authority and reputation, content structure and depth, and evidence of real expertise. Publishers who score well on all three dramatically outperform those who focus on one alone.
“Claude appears to prioritize authoritative, well-cited, and clearly structured sources over purely promotional content.”
- 500 queries analyzed across 10 industry categories
- Government sites (.gov) earned the highest citation share at 31%
- Original research and statistics pages were the most-cited content formats
- Thin affiliate pages performed poorly — averaging under 4% citation share
- Freshness matters more in news-adjacent topics than in evergreen categories
- Structured content (clear headings, FAQs, data tables) consistently outperformed dense prose
- The biggest surprise: well-optimized commercial sites outperformed forums in nearly every category
Why We Ran This 500-Query Study
Generative Engine Optimization — GEO — is no longer a niche concern for early adopters. It has become a core part of digital visibility strategy for anyone whose business depends on content traffic. And yet, most guidance on the topic is either too vague to act on or too focused on other AI systems like ChatGPT to be directly applicable to Claude.
Here’s what surprised us when we started researching: almost no published studies focused specifically on how Claude makes source selection decisions. There was plenty of content about AI citations generally, a lot about Google AI Overviews, and some useful material on Perplexity. But Claude? Largely untouched.
That gap matters. Claude is used by millions of people for research, purchasing decisions, health questions, financial queries, and more. If you are a publisher, marketer, or content strategist and you are not thinking about Claude citation behavior, you are leaving a significant visibility opportunity on the table.
We designed this study to give practitioners something concrete. Not broad principles. Not vague guidance. Actual data — directional but actionable — on what types of content, domains, and formats earn citations from Claude across real-world query categories.
Research Methodology
Our methodology was designed to replicate real-world user behavior as closely as possible while maintaining enough consistency to draw comparisons across categories.
Query Categories
We tested 500 queries distributed evenly across 10 topic categories — 50 queries per category. Categories were selected based on high commercial intent and existing SEO competition, so results would be relevant to publishers making real optimization decisions.
- Finance — investment, banking, credit, debt
- Insurance — auto, health, life, homeowners
- Healthcare — symptoms, treatments, medications, mental health
- Technology — software, hardware, SaaS tools, cybersecurity
- Travel — destinations, booking, visas, packing
- Education — degree programs, online courses, student loans
- Home Improvement — renovation, contractors, DIY, tools
- Ecommerce — product comparisons, buying guides, reviews
- Legal — contracts, rights, small business law, family law
- Local Services — plumbers, HVAC, pest control, cleaning
What We Tracked
For every query, we recorded the cited domains, classified each domain by type, noted the primary content format of the cited page, and assessed visible authority signals including author bios, citations and references, publication dates, and structured formatting.
We also tracked whether the cited content included original data, third-party references, expert authorship, or proprietary research — all signals we suspected might influence citation selection.
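To make the tracking concrete, here is a minimal sketch of how one observation per cited page might be recorded. The class name, field names, and domain-type labels are illustrative assumptions, not the schema used in the study.

```python
from dataclasses import dataclass

# Hypothetical labels, loosely mirroring the domain types in the results table.
DOMAIN_TYPES = {
    "government", "educational", "news", "commercial",
    "nonprofit", "saas_blog", "affiliate", "forum",
}


@dataclass
class CitationRecord:
    """One cited page observed in the response to one query (illustrative)."""
    query: str
    category: str            # e.g. "Finance", "Healthcare"
    cited_domain: str        # e.g. "irs.gov"
    domain_type: str         # one of DOMAIN_TYPES
    content_format: str      # e.g. "statistics page", "long-form guide"
    # Visible authority signals
    has_author_bio: bool = False
    has_references: bool = False
    has_publication_date: bool = False
    has_structured_formatting: bool = False
    # Content characteristics
    has_original_data: bool = False
    has_expert_authorship: bool = False

    def __post_init__(self) -> None:
        # Reject labels outside the assumed taxonomy so typos surface early.
        if self.domain_type not in DOMAIN_TYPES:
            raise ValueError(f"unknown domain type: {self.domain_type!r}")

    def authority_signal_count(self) -> int:
        """Number of the four visible authority signals present on the page."""
        return sum([
            self.has_author_bio,
            self.has_references,
            self.has_publication_date,
            self.has_structured_formatting,
        ])
```

Keeping one record per citation instance, rather than one per domain, makes it straightforward to aggregate by category, domain type, or format later.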
Limitations
Claude evolves. Any AI system trained on new data or updated with new fine-tuning will shift its behaviors. These results are directional and reflect patterns observed during our testing window in 2026. They should be treated as a strong baseline, not a permanent formula. Additionally, Anthropic does not publicly disclose how Claude selects the sources it cites. Our conclusions are inferred from observed patterns, not confirmed by internal documentation. Think of this as applied research: practical and useful, but subject to the natural variability of any AI system.
The Biggest Findings
Here’s what the data told us — and where it surprised us.
In finance, healthcare, and legal categories, .gov domains accounted for over 40% of citations. CDC, NIH, IRS, and SEC pages were cited with striking regularity for queries that touched regulatory, health, or tax topics.
Pages that contained proprietary surveys, studies, or benchmark data were cited roughly 3× more often than pages covering the same topic without original data — even when the latter had stronger domain authority scores.
Reddit and Quora appeared in results far less frequently than most industry observers predicted. User-generated content without clear editorial standards was largely bypassed in favor of structured, authoritative sources.
Pages with clear H2 and H3 headings, data tables, FAQ sections, and scannable bullet points were cited more often than pages with equivalent information delivered in unbroken paragraphs.
For commercial and affiliate sites, the key is demonstrating genuine expertise, citing credible third-party sources, and presenting product information without heavy promotional language. Affiliate pages that read like editorial content outperformed those that opened with heavy disclosure language and product-first framing.
For evergreen topics like financial planning basics or insurance fundamentals, page age had minimal impact. For news-adjacent topics, recent publication dates correlated with higher citation rates.
Which Domain Types Claude Cites Most Often
Below is the full breakdown of citation share by domain type across all 500 queries. Note that these percentages reflect share of all citations recorded — not unique domains — so a single high-authority government page cited across multiple queries could represent several citation instances.
| Domain Type | Citation % | Strengths | Weaknesses |
|---|---|---|---|
| Government (.gov) | 31% | Authoritative, neutral, trusted by AI | Often outdated; limited commercial coverage |
| Educational (.edu) | 18% | Research-backed, expert authorship | Niche focus; not always practical |
| News & Media | 17% | Fresh, verified, widely indexed | Can be paywalled; political bias risk |
| Commercial (.com) | 14% | Broad coverage, actionable insights | Variable quality; promotional tone |
| Nonprofit (.org) | 9% | Trusted, mission-driven content | Narrow scope; smaller publishing volume |
| SaaS / Tech Blogs | 5% | Original data, product expertise | Self-promotional if not carefully written |
| Affiliate Sites | 4% | Practical buyer guides | Often thin; lacks authority signals |
| Forums / UGC | 2% | Real user experiences | Unverified; inconsistent quality |
The numbers tell a clear story: trust architecture matters. Claude appears to use domain type as a proxy for credibility, particularly for factual claims where a wrong answer carries real-world consequences.
The interesting exception is the commercial category. At 14%, commercial .com domains punched above their weight compared to what many publishers expected. The difference between commercial sites that got cited and those that did not came down to editorial quality, author credentials, and the presence of original data.
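The distinction between citation instances and unique domains is easy to get wrong, so here is a minimal sketch of how a share-of-citations figure can be computed. The function name and the sample data are made up for illustration; they are not the study's data or code.

```python
from collections import Counter


def citation_share(citations):
    """Share of all citation instances per domain type, in percent.

    `citations` is a list of (domain, domain_type) pairs, one pair per
    citation instance. A domain cited in three responses appears three times.
    """
    counts = Counter(domain_type for _domain, domain_type in citations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        domain_type: round(100 * n / total, 1)
        for domain_type, n in counts.most_common()
    }


# Made-up sample: cdc.gov is cited twice and counts twice.
sample = [
    ("cdc.gov", "government"),
    ("cdc.gov", "government"),
    ("example.edu", "educational"),
    ("example.com", "commercial"),
]
shares = citation_share(sample)
```

Counting unique domains instead would give government one of three domains (about 33%) rather than two of four citations (50%), which is why the two measures should not be compared directly.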
Content Formats That Earn Citations
Domain type creates the ceiling for citation potential. Content format determines whether you reach it. Here is what we found by format.
| Content Format | Citation Frequency | Why It Works |
|---|---|---|
| Original Research / Studies | High | Unique data earns citations that no other page can replicate |
| Statistics & Data Pages | High | AI systems anchor factual claims to numbered sources |
| Long-Form Guides (2,500+ words) | High | Depth signals expertise and covers multiple query intents |
| Comparison / Best-Of Pages | Medium | Useful for decision-oriented queries; AI highlights options |
| FAQ Pages | Medium | Structure matches AI answer-extraction patterns perfectly |
| Expert Roundups | Medium | Multiple credentials increase authority signals |
| Product Reviews | Low–Med | Cited when affiliate bias is minimized and data is present |
| Thin Listicles / Opinion Posts | Low | Insufficient depth for most AI citation thresholds |
Original Research
Nothing beat original research. If your site publishes a proprietary survey, a dataset from real customers, or a benchmark study from your own platform, you are creating something that no other page can replicate. Claude cannot source that claim anywhere else. That uniqueness is citation gold.
Think about it this way: if you run a budgeting app and publish an annual survey of spending habits across demographics, that data becomes a primary source. Government sites do not have it. Wikipedia does not have it. You do.
Statistics Pages
Dedicated statistics pages — the kind that compile data from multiple sources into one well-organized reference — performed exceptionally well. They match the exact pattern of how AI systems like Claude anchor factual claims: they need a number, they need a source, and they need to find it quickly.
If you are in any data-rich vertical — finance, health, SaaS, insurance — a regularly updated statistics page is one of the highest-ROI GEO investments you can make.
Long-Form Guides
Comprehensive guides of 2,500 words or more consistently earned citations when they demonstrated genuine depth, not just word count. Claude appears to reward coverage breadth combined with structural clarity. Guides that answered multiple related questions under well-organized headings outperformed those that answered one question at length.
Comparison Content
Comparison pages — best-of lists, head-to-head breakdowns, tool comparisons — were cited regularly for decision-oriented queries. When someone asks Claude to compare credit monitoring services or explain the difference between term and whole life insurance, comparison pages that provided structured, balanced analysis were consistently surfaced.
FAQs
FAQ sections within longer content performed strongly. Their structure maps almost perfectly to how AI systems extract direct answers: a clear question, a direct answer, and enough context to make that answer complete. If your content already has FAQs, make sure they are schema-ready and placed within clearly structured sections.
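One common reading of "schema-ready" is schema.org `FAQPage` markup embedded as JSON-LD. The sketch below generates that markup from question and answer pairs. Note the assumptions: the helper names are hypothetical, and this study did not test whether the markup itself, as opposed to the visible FAQ structure, affects Claude's citations.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize JSON-LD for embedding in an HTML page."""
    # Escape "</" so answer text cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Illustrative content only.
faq = faq_jsonld([
    ("What is term life insurance?",
     "Term life insurance covers a fixed period, such as 20 years."),
])
tag = as_script_tag(faq)
```

The markup should mirror the questions and answers that are visible on the page, not replace them.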
Product Reviews
Product review content can earn citations — but the bar is high. Reviews that cited manufacturer specifications, included real testing methodology, disclosed affiliate relationships transparently, and avoided promotional language consistently outperformed those that read like sponsored content. The closer a review looked to editorial journalism, the better it performed.
Authority Signals Claude Appears to Value
Authority is not just domain rating or backlink count. From our analysis, Claude appears to evaluate authority through several visible signals, and these are signals you can actually optimize for.
- Author Credentials: Pages that displayed clear author bios with relevant professional credentials outperformed anonymous content consistently. A page about tax deductions written by a CPA who is named and credentialed performs better than the same page published under a faceless brand account.
- Expert Review: Content that displayed “reviewed by” labels, particularly in healthcare, legal, and finance, showed higher citation rates. This is the GEO equivalent of E-E-A-T signals for traditional search.
- Citations and References: The irony is real: pages that cite their own sources earn more citations from AI systems. Including links to government reports, peer-reviewed studies, or credible third-party data signals that your content is part of the broader knowledge ecosystem, not isolated opinion.
- First-Party Data: As noted in the format analysis, original data is one of the strongest authority signals available. It signals that your organization has done primary research, a step most content producers skip.
- Transparency: About pages, editorial policies, disclosure statements, and business information all contribute to the trust architecture that AI systems use to evaluate credibility. Sites with clear ownership and transparent business models were cited more reliably.
- Freshness: For time-sensitive verticals, keeping content updated matters. A healthcare page with a 2019 publication date and no update notice is a weaker citation candidate than one updated in the current year, even if the underlying content is similar.
- Structured Content: Clear H2/H3 hierarchy, data tables, bullet points, and FAQ-style formatting all helped Claude locate and extract relevant information. Structure is not just user experience; it is AI readability.
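Several of the signals above (named author, reviewer, dates, references) can also be expressed as machine-readable metadata alongside the visible page elements. The sketch below builds a schema.org `WebPage` object with those properties. Treat it as an assumption-laden illustration: the function name and sample values are hypothetical, and the study measured visible on-page signals, not structured data.

```python
import json
from datetime import date


def page_metadata(headline, author_name, author_title,
                  published, modified,
                  reviewer_name=None, references=()):
    """Build schema.org WebPage JSON-LD carrying common authority signals."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "headline": headline,
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_title,
        },
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    if reviewer_name:
        # reviewedBy is defined on WebPage in schema.org.
        data["reviewedBy"] = {"@type": "Person", "name": reviewer_name}
    if references:
        # citation accepts text or CreativeWork; plain URLs are used here.
        data["citation"] = list(references)
    return data


# Hypothetical page and people, for illustration only.
meta = page_metadata(
    headline="Common Tax Deductions for Freelancers",
    author_name="Jane Doe",
    author_title="Certified Public Accountant",
    published=date(2024, 3, 1),
    modified=date(2026, 1, 15),
    reviewer_name="John Roe",
    references=["https://www.irs.gov/"],
)
serialized = json.dumps(meta, indent=2)
```

Whatever goes into the metadata should match what a reader sees on the page: the same author name, the same dates, and the same references.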
Industry-by-Industry Results
Citation behavior is not uniform across industries. Here is what we found broken down by vertical.
| Industry | Top Domain Types | Top Format | Optimization Takeaway |
|---|---|---|---|
| Finance | Government (.gov, SEC), .edu | Statistics pages, guides | Add first-party benchmark data; cite regulatory sources |
| Insurance | State regulators, nonprofit orgs | Comparison pages, FAQs | Build transparent comparison tools with verified pricing data |
| Healthcare | NIH, CDC, Mayo Clinic | Research summaries, FAQs | Expert authorship and medical reviewer credentials are essential |
| SaaS / Tech | Company blogs, research firms | Original studies, benchmarks | Publish annual benchmark reports with downloadable data |
| Travel | Gov tourism boards, major publishers | Destination guides, listicles | Include current seasonal data and practical logistics |
| Education | University sites, .edu, nonprofits | Long-form guides, FAQs | Focus on curricula-aligned content with cited methodology |
| Ecommerce | Brand sites, review platforms | Product comparisons, reviews | Reduce promotional tone; add price data, specs, and user feedback |
The finance and healthcare categories showed the highest concentration of government citations. Legal followed a similar pattern, with state and federal government pages dominating factual query responses. Technology and SaaS showed the most opportunity for commercial publishers, with original research being the primary differentiator.
What Affiliate Marketers Can Learn
Affiliate content has a reputation problem with AI systems — and some of that reputation is earned. Pages that are thin, promotional, keyword-stuffed, and lacking in genuine expertise are exactly what AI citation systems seem designed to deprioritize.
But here’s the important part: affiliate content that mirrors editorial quality can and does earn citations.
What Works
- Product Reviews with Real Methodology: Explain how you tested or evaluated products. Include specifications, comparison data, and clear evaluation criteria. Affiliate reviews that read like Consumer Reports journalism significantly outperformed standard “top 5” listicles.
- Comparison Pages with Neutral Framing: Insurance comparison tools, credit monitoring service breakdowns, and budgeting software head-to-heads all performed well when they presented information neutrally and cited pricing from official sources.
- Buyer Guides with Original Insights: Guides that answered specific use-case questions (“best identity theft protection for families,” “best credit monitoring for small business owners”) outperformed generic “best credit monitoring” pages when they demonstrated genuine understanding of the audience’s situation.
- Statistics Pages as Lead-in Content: Several of the highest-performing affiliate-adjacent pages in our study were statistics pages that established credibility before linking to product recommendations. Publishing a comprehensive data page first, then referencing your review content, creates an authority bridge that improves both pages.
What Hurts
Heavy disclosure language at the top of a page, product-first framing without editorial context, and the absence of author information were the three clearest predictors of poor citation performance among affiliate-style content. These are also the easiest things to fix.
Claude vs ChatGPT vs Gemini vs Perplexity: Citation Behavior Compared
Not all AI answer engines cite sources the same way. Understanding the differences helps you prioritize your GEO efforts across platforms.
| Platform | Citation Style | Preferred Sources | Transparency | Publisher Opportunity |
|---|---|---|---|---|
| Claude | In-text + listed | Gov, edu, research, structured guides | High — source links shown | Strongest for research-heavy content |
| ChatGPT | In-text primarily | News, Wikipedia, official sites | Medium — varies by version | Strong for broad informational queries |
| Gemini | Inline citations | Google-indexed authoritative pages | Medium-High — Google-backed | Favors Google ecosystem properties |
| Perplexity | Always cited | Broad web; strong news & forums | Very High — core feature | Best opportunity for varied content types |
The key takeaway: Perplexity offers the most citation opportunities for diverse content types, but Claude’s citation behavior carries particularly strong weight for complex research and advisory queries. If your content targets high-trust, high-stakes decisions — financial planning, healthcare, legal questions — Claude optimization should be a priority.
Step-by-Step GEO Optimization Framework
Here is the framework we would recommend for any publisher looking to increase AI citation visibility, based on the patterns from this study.
- Publish Original Research: Identify a question your audience asks that no one in your industry has answered with primary data. Survey your customers. Pull data from your platform. Commission a study. Publish it with full methodology and make it citable.
- Show Your Expertise: Add named authors with relevant credentials to every piece of content. Include professional titles, relevant certifications, and links to professional profiles. If content requires specialized knowledge, add a “reviewed by” credential from a relevant expert.
- Add References: Cite your sources. Link to government reports, academic studies, and authoritative third-party sources within your content. AI systems look for evidence that your content is part of a larger knowledge network.
- Create Statistics Pages: Build at least one authoritative statistics page per major content cluster. Keep it updated. Use clear headings for each statistic or data point. This becomes a reference page that AI systems return to repeatedly.
- Improve Entity Coverage: Make sure your brand, your authors, and your core topics are represented consistently across your website, your About page, LinkedIn, and any professional profiles. Entity clarity helps AI systems understand who you are and what you cover.
- Update Regularly: Add update dates to all time-sensitive content. Create a content refresh calendar. For evergreen content, a note that it was reviewed in the current year signals maintenance that AI systems can detect.
- Use Structured Headings: Organize all long-form content with clear H2/H3 hierarchy. Use headings that mirror the actual questions your audience asks. This structure makes it easy for AI to extract the relevant portion of your page.
- Answer Questions Directly: The first sentence after every heading should directly answer what that heading asks. AI systems that extract answers for citation look for direct, declarative statements, not build-ups or preambles.
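The structural items in this framework can be spot-checked automatically. Below is a minimal sketch using only Python's standard library: it lists a page's headings, flags skipped heading levels, and counts question-style headings and data tables. The class and function names are hypothetical, and the checks are a rough proxy for the framework, not a measure of what Claude actually evaluates.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureAudit(HTMLParser):
    """Collect headings and count tables in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []        # list of (level, text)
        self.table_count = 0
        self._open_level = None   # level of the heading being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._open_level = int(tag[1])
            self._buffer = []
        elif tag == "table":
            self.table_count += 1

    def handle_data(self, data):
        if self._open_level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._open_level is not None and tag == f"h{self._open_level}":
            text = "".join(self._buffer).strip()
            self.headings.append((self._open_level, text))
            self._open_level = None


def audit(html):
    """Return a small structural report for one page."""
    parser = StructureAudit()
    parser.feed(html)
    parser.close()

    issues = []
    previous = None
    for level, text in parser.headings:
        # Flag jumps such as h2 -> h4 that break the heading hierarchy.
        if previous is not None and level > previous + 1:
            issues.append(f"h{previous} jumps to h{level}: {text!r}")
        previous = level

    return {
        "headings": parser.headings,
        "question_headings": sum(
            1 for _level, text in parser.headings if text.endswith("?")
        ),
        "tables": parser.table_count,
        "issues": issues,
    }


# Illustrative input only.
report = audit(
    "<h2>What is GEO?</h2><p>GEO is ...</p>"
    "<h4>Details</h4><table><tr><td>1</td></tr></table>"
)
```

A check like this cannot judge whether the first sentence under a heading answers the question directly; that part still needs an editor.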
Real-Life Examples of the Framework in Action
Finance Website Increases AI Visibility
A personal finance site covering credit cards and budgeting tools was earning minimal citations from AI systems. The content was accurate and well-written, but all articles were published under a brand byline with no named author, no external references, and no original data.
After implementing the framework, they added named author profiles for their three main writers (all with finance backgrounds), added a monthly “consumer spending benchmark” statistics page using anonymized data from their own users, and rewrote their comparison pages to include pricing directly sourced from provider websites.
Within two months of republishing, the statistics page had been cited by Claude in 14 of 20 test queries related to consumer spending habits. Citation performance on comparison pages improved by an estimated 3×. The change was not the content — it was the authority architecture around it.
Insurance Comparison Site Gains Citations
An insurance comparison platform noticed their pages were rarely appearing in Claude’s responses to insurance queries, despite strong Google rankings. The pages ranked well but lacked the authority signals AI systems look for.
They restructured their auto insurance comparison page to include state-by-state average premium data sourced from state insurance regulatory filings, added a licensed insurance agent as named reviewer, and broke the content into FAQ sections using schema markup.
The results were clear: Claude began citing the state premium data table in responses to average auto insurance cost queries. The FAQ section earned direct extraction in response to policy coverage questions. The same content, restructured for AI readability, dramatically changed citation performance.
SaaS Company Publishes Benchmark Report
A project management SaaS company published an annual “State of Remote Work Productivity” benchmark report using aggregated anonymized data from their platform. The report covered team size, productivity metrics, meeting frequency, and tool usage patterns.
This single piece of content became one of the most-cited commercial sources in their category for AI query responses about remote work productivity. Because no other page could offer the same data, the report was irreplaceable as a citation source. It also drove backlinks, press coverage, and social shares — a GEO win that delivered traditional SEO benefits simultaneously.
Common Mistakes Publishers Make
If your content is not getting cited, one of these is probably why.
- Thin affiliate content with no methodology, no author, and no original insight — the most common citation killer we observed
- No author profiles, or author information buried in bios with no credentials listed
- Outdated statistics with no update date visible — content from 2020 with no indication of review
- Missing citations within the content — pages that make claims without linking to supporting sources
- Weak or absent entity signals — no clear About page, no named business entity, no consistent author presence
- Keyword stuffing that destroys readability — AI systems extract and evaluate natural language, not keyword density
- Ignoring FAQ structure — pages that contain answers but do not format them for easy extraction
- Publishing content that is promotional in tone even when the topic is informational — this is the biggest trust signal problem for affiliate publishers
Future Trends in GEO and AI Source Selection
The landscape is moving fast. Here is where we see things heading in 2026 and beyond.
Just as technical SEO became a baseline requirement for search visibility, GEO optimization is moving from competitive advantage to baseline necessity. Publishers who do not adapt will see AI-driven traffic erode steadily.
Claude, ChatGPT, Gemini, and Perplexity are the current major players — but that landscape is expanding. Enterprise AI assistants, embedded AI in browsers, and specialized AI tools for finance, healthcare, and law will all make citation decisions. The principles here apply broadly.
AI providers are under increasing pressure to show their sources clearly. This is good news for publishers with strong authority signals — it means citations are increasingly visible and attributable. Building your citation profile now positions you for a more transparent ecosystem.
As AI systems get better at distinguishing original insights from repurposed information, the competitive advantage of original research grows. The publishers who invest in primary data collection now will build moats that are difficult for competitors to replicate.
The clearer your organization’s entity presence — consistent name, industry, author profiles, and content topical authority — the better your citation performance. This is the GEO equivalent of brand building, and it compounds over time.
Final Thoughts
The question “how does Claude choose its sources?” has a deceptively simple answer: Claude appears to choose sources that demonstrate trust, authority, originality, and structure, and it does so in patterns that are consistent enough to optimize for.
This study suggests that GEO is not a guessing game. Publishers who invest in original research, transparent expertise, well-structured content, and credible references are building exactly the kind of content that AI systems appear to surface. The principles are not new. The application to a new visibility channel is.
If you take one action after reading this: start with your statistics page. Create one authoritative, well-structured, regularly updated data page in your core topic area. That single piece of content — done right — has the potential to become your most-cited asset in AI systems for years to come.
The publishers who understand AI citation behavior now are building competitive advantages that will compound as AI-driven search continues to grow. Start optimizing for the citation, not just the click.
Want help building your citation strategy? TechCognate specializes in GEO, AEO, and AI search optimization for publishers, SaaS companies, and content teams. Get in touch and let’s make your content the one Claude cites.


