
Is Your Content Discoverable by Humans Using AI?

How I implemented LLM SEO across 3 sites with FAQ schema, entity linking, and 14 AI crawlers — plus an honest look at what actually works and what doesn't.

Tags: dev, ai, seo, astro

I run three websites. A media portfolio for my videography business, a tech blog for AI consulting, and a truck build site documenting my Tacoma Trailhunter. Traditional SEO covered Google. But when someone asks ChatGPT “who shoots Cirque Series trail running in Arizona” or asks Perplexity “what are 74Weld portal axles,” my sites were invisible.

That is a problem, and the data says it is getting worse fast.

The Zero-Click Crisis

58.5% of Google searches now end without a click. The searcher gets their answer from a featured snippet, a knowledge panel, or an AI Overview and never visits your site. For searches where Google’s AI Overview appears, organic click-through rates drop by 61%. At their peak, AI Overviews appeared in as many as 25% of Google searches.

This is not a future problem. Publishers are already reporting 20-90% traffic losses from AI-generated answers that summarize their content without sending visitors. Gartner projects a 25% decline in traditional search volume by the end of 2026.

But here is the part most people miss. AI search is not just stealing traffic. It is creating a new channel. By June 2025, AI referrals to top websites were up 357% year-over-year, reaching 1.13 billion visits per month. And early data suggests those visitors convert well — Similarweb found around 7% for transactional AI referrals, which is competitive with organic search.

The question is not whether people are using AI to find businesses like yours. They already are. The question is whether your content shows up when they ask.

What Actually Works (and What Might)

I spent time researching what techniques have real evidence behind them before implementing anything. Here is what I found.

Proven techniques with data:

Researchers at IIT Delhi and Princeton published a study on Generative Engine Optimization at KDD 2024. They found that adding citations, statistics, and direct quotations to your content increases visibility in AI-generated answers by up to 40%. This is the single most impactful finding in the space right now.

FAQ schema helps AI systems extract your content. BrightEdge found a 44% increase in AI search citations for sites with structured data, and a 50-site study by Relixir found a median 22% citation lift with FAQ and HowTo schema. The reason is simple: FAQ schema gives LLMs clean, structured question-answer pairs they can quote directly with attribution. One important caveat: Google restricted FAQ rich results to government and health sites in August 2023, so you will not see expandable FAQ accordions in Google SERPs. The value now is in AI citation, not Google rich snippets.

Cross-site entity linking via sameAs structured data has supporting evidence. In a Schema App case study, sites that added entity links for location disambiguation saw 46% more impressions and 42% more clicks on non-branded queries. That study was specifically about clarifying which geographic entity a page referred to, not cross-domain linking between your own properties. But the underlying principle is sound: when an LLM understands that your three websites, your LinkedIn, and your Instagram all belong to the same person, it builds a stronger entity profile. Whether that directly translates to more citations is logical but not yet proven at scale.

Speculative but low-cost:

The llms.txt specification is a proposed standard by Jeremy Howard (fast.ai co-founder) for helping LLMs understand your site. Think of it as a markdown file at your site root that summarizes who you are and what you do. I should be upfront: the evidence is not encouraging. No major AI company has confirmed their crawlers use it. Google’s John Mueller called it “the new meta keywords”. An SE Ranking study of 300,000 domains found no correlation between having llms.txt and being cited by AI. Server log analyses show major AI crawlers do not request the file. It takes 20 minutes to create and costs nothing, so I keep mine as a hedge. But I would not prioritize it over the proven techniques above.

SpeakableSpecification schema was designed to tell voice assistants which parts of your page to read aloud. It has been in beta since 2018, limited to US Google News publishers only. Google Assistant — the only platform that ever used it — is shutting down in March 2026, replaced by Gemini. No AI chatbot has confirmed reading it. I still have it in my templates because it costs nothing, but I would not recommend anyone prioritize it.

I implemented all of these techniques. The proven ones because the data supports them. The speculative ones because they cost almost nothing — but I want to be honest about which is which.

What I Built

Here is what changed across all three sites.

llms.txt and llms-full.txt

Each site now has two files at the root. llms.txt is a short summary: business name, services, contact info, notable clients, links to related sites. About 40 lines. llms-full.txt is the deep version with full service descriptions, blog post summaries, equipment lists, and FAQs. About 200 lines.

The key insight from the GEO research: include the actual answers to questions people ask. Not marketing copy. If someone asks an AI “what camera does Bob Ulrich use,” the answer should be right there: Sony FX30, Sony A7V, Sony A1, GM II lenses, DaVinci Resolve. If someone asks “what are portal axles,” the explanation should be complete enough that an LLM can quote it directly.

Each file cross-references the other two sites, creating a connected information graph that any AI reading one site can follow to the others.
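To make the shape concrete, here is a condensed sketch of what a short llms.txt might look like, following the proposed spec’s structure (H1 name, blockquote summary, linked sections). This is an illustration assembled from facts mentioned in this post, not the live file; the business name and section wording are my shorthand.

```markdown
# B Digital Media

> Videography and media production by Bob Ulrich. Sony FX30, A7V, and A1
> bodies with GM II lenses; editing in DaVinci Resolve; FAA-certified
> drone operation.

## Services
- Event and trail-running coverage (Cirque Series)
- Commercial video and drone work

## Related sites
- [Tech blog](https://tech.bdigitalmedia.io): AI consulting and development
- [Truck build](https://truck.bdigitalmedia.io): Tacoma Trailhunter build documentation
```

The llms-full.txt follows the same outline but expands each section with complete, quotable answers.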

AI Crawler Permissions

Before this work, my main site’s robots.txt welcomed 5 AI crawlers. The tech and truck sites had zero AI-specific rules.

Now all three sites welcome 14 crawlers:

| Crawler | Service |
| --- | --- |
| GPTBot | ChatGPT / OpenAI |
| ChatGPT-User | ChatGPT browsing |
| OAI-SearchBot | OpenAI search |
| ClaudeBot | Claude / Anthropic |
| anthropic-ai | Anthropic training |
| Google-Extended | Google AI / Gemini |
| PerplexityBot | Perplexity AI search |
| Applebot-Extended | Apple Intelligence / Siri |
| Meta-ExternalAgent | Meta AI |
| Bytespider | ByteDance / TikTok |
| cohere-ai | Cohere models |
| CCBot | Common Crawl (training data) |
| Amazonbot | Amazon / Alexa |
| YouBot | You.com |

That is 14 crawlers times 3 sites, which means 42 explicit access grants where previously there were 5.

A note on strategy: some site owners distinguish between training bots (GPTBot, CCBot) and retrieval bots (ChatGPT-User, PerplexityBot). They block training but allow retrieval. That is a valid choice if you are worried about your content being used for model training. I chose to allow everything because my goal is maximum surface area, and training data inclusion means LLMs are more likely to know about my business even without live retrieval.
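In robots.txt terms, each grant is just an explicit group for that user agent. A minimal excerpt might look like the following; the domain and sitemap URL are placeholders, and in practice the pattern repeats for every crawler in the table above.

```text
# Explicit grants for AI crawlers (excerpt — one group per bot)
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Default rules for everyone else
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

If you wanted the block-training/allow-retrieval split instead, you would give the training bots (GPTBot, CCBot, anthropic-ai) a `Disallow: /` group and leave the retrieval bots on `Allow: /`.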

Cross-Site Entity Linking

Schema.org’s sameAs property tells search engines and LLMs that multiple URLs belong to the same entity. Before this work, each site’s structured data linked to its own social profiles but not to the other sites.

Now every site’s JSON-LD includes sameAs links to all three domains plus LinkedIn, GitHub, Instagram, and YouTube. The main site uses LocalBusiness schema, the tech site uses ConsultingService, the truck site uses WebSite. All three cross-reference each other.

Every blog post’s BlogPosting schema includes author sameAs arrays that cross-reference all three sites. An LLM reading a truck blog post about CBI bumpers now understands that the author also runs an AI consulting practice and a media production business.

The principle here is entity consolidation. When an LLM builds its understanding of who you are, it connects all your properties into one entity instead of treating them as three strangers. The Schema App case study showed this helps with location disambiguation — whether it has the same effect for cross-domain personal branding is logical but unproven.
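A minimal sketch of the cross-site sameAs markup looks like this. The three domains come from this post; the business name and social profile URLs are placeholders you would swap for your own.

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "B Digital Media",
  "url": "https://bdigitalmedia.io/",
  "sameAs": [
    "https://tech.bdigitalmedia.io/",
    "https://truck.bdigitalmedia.io/",
    "https://www.linkedin.com/in/your-handle",
    "https://github.com/your-handle",
    "https://www.instagram.com/your-handle",
    "https://www.youtube.com/@your-handle"
  ]
}
```

Each of the other two sites carries the mirror image: its own `@type` and `url`, with the remaining domains and profiles in its sameAs array.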

FAQ Schema for LLM Citation

This is the highest-ROI change. FAQPage JSON-LD gives LLMs structured question-answer pairs they can quote with attribution. Each pair is a citable unit.

I added FAQ schema to three high-value pages:

Main site /services — 5 FAQs covering turnaround time, travel availability, usage rights, FAA drone certification, and camera equipment.

Tech site homepage — 4 FAQs covering vibe coding, AI consulting services, tools used, and Claude Code remote control.

Truck site /build — 5 FAQs covering portal axles, daily driving feasibility, tire sizing, CBI Covert bumper, and billet flex joints vs Heim joints.

That is 14 structured question-answer pairs across three sites. The llms-full.txt files mirror these same questions with even more detailed answers, creating multiple paths for AI systems to find the same information. Note: Google restricted FAQ rich results to government and health sites in August 2023, so these will not produce visible FAQ accordions in Google search results. The value is in giving AI systems structured, quotable content — not in Google rich snippets.
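As a reference, here is the shape of one of those FAQPage blocks. The question wording is illustrative; the answer facts are the ones given earlier in this post.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What camera equipment does Bob Ulrich use?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Sony FX30, Sony A7V, and Sony A1 bodies with GM II lenses, edited in DaVinci Resolve."
      }
    }
  ]
}
```

Each additional question is one more object in the mainEntity array — one more self-contained, citable unit.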

Speakable Schema on Blog Posts

I added SpeakableSpecification to every blog post template:

"speakable": {
  "@type": "SpeakableSpecification",
  "cssSelector": ["article h1", "article header p", "article .prose p:first-of-type"]
}

I want to be honest about this one: there is no evidence that any AI system reads this markup. Google’s speakable implementation has been in beta since 2018, limited to US Google News publishers. Google Assistant — the only platform that ever supported it — is being replaced by Gemini in March 2026. No LLM vendor has confirmed using speakable selectors for content extraction.

I keep it in my templates because it is three lines of code that cost nothing. But if I were starting over today, this would be at the bottom of my priority list. The real value is in the content structure itself — clear headings, concise opening paragraphs, information-dense summaries — not the markup pointing at them.

The Compounding Math

Here is what changed in concrete terms.

Before:

  • 1 sparse llms.txt on 1 site
  • 5 AI crawlers allowed on 1 site
  • 0 FAQ schema anywhere
  • 3 isolated entity graphs

After:

  • 6 LLM context files across 3 sites (unproven but low-cost)
  • 42 total crawler access grants (14 per site)
  • 14 FAQ structured answers across 3 sites (supported by multiple studies)
  • 1 unified entity graph linking 3 domains + 4 social profiles (logical, limited direct evidence)

A single question like “what camera does Bob Ulrich use” can now be answered from the main site’s FAQ schema and the main site’s llms-full.txt. Multiple sources confirming the same answer makes an LLM more likely to surface it. That is the GEO research in action: citations and repetition across sources boost visibility by up to 40%.

Cross-site entity linking means a question about portal axles on the truck site can lead an LLM to discover the media services on the main site. A question about vibe coding on the tech site can lead to the truck build documentation. Each site acts as a discovery vector for the others — in theory.

Every new blog post automatically inherits the cross-site author linking and AI crawler access. The infrastructure compounds with every piece of content published. But I want to be clear: the proven techniques (FAQ schema, GEO content principles, entity linking) are doing the heavy lifting. The speculative ones (llms.txt, speakable) are cheap insurance with no confirmed payoff.

What You Are Missing Without This

If you have not thought about AI search at all, here is what is at stake.

Traffic you are already losing. AI Overviews summarize content from other sites and show it above your organic listing. That 61% CTR drop is real and measured. Sites with well-structured content and FAQ schema are more likely to be the ones cited.

The fastest-growing referral channel. AI referral traffic grew 357% year-over-year to 1.13 billion visits per month. If your content is invisible to ChatGPT, Perplexity, Google AI Overviews, Apple Intelligence, and Meta AI, you are missing a channel that is growing faster than any other.

Entity authority. Without sameAs linking, your LinkedIn, your website, your Instagram, and your YouTube look like separate people to an LLM. You are splitting your authority across disconnected properties instead of building one recognizable entity.

Citation opportunities. FAQ schema gives AI systems structured, quotable answers. Without it, the AI has to parse your unstructured content and may prefer a competitor whose content is easier to extract from. But the biggest factor remains content quality — no amount of schema compensates for thin content.

How to Implement This on Your Own Site

Here is the priority order based on what the research says has the most impact.

1. Add FAQ schema to your highest-traffic pages. This is the highest-ROI change for AI discoverability. Google will not show FAQ rich results for most sites (they restricted that to government and health sites in August 2023), but the structured data still helps AI systems extract and cite your content. Pick your services page, your about page, your most-visited blog posts. Add FAQPage JSON-LD with real questions people actually ask. Not marketing questions. Real ones. “How much does X cost?” “What equipment do you use?” “Where are you located?” Make the answers specific and quotable.

2. Cross-link your properties with sameAs. If you have multiple sites, social profiles, or directory listings, link them all with sameAs in your structured data. Build one entity, not many. The evidence for this is strongest in location disambiguation, but the logic extends to personal and business branding — helping AI systems consolidate your authority instead of fragmenting it.

3. Update robots.txt for AI crawlers. Add explicit Allow: / rules for every AI crawler you want to welcome. The full list is in the table above. This takes 5 minutes and opens 14 doors that were previously closed.

4. Create llms.txt and llms-full.txt (optional). This is the most speculative technique on this list. Large-scale studies have found no correlation between having llms.txt and being cited by AI, and server log analyses show major AI crawlers do not request the file. Google’s John Mueller has compared it to the deprecated meta keywords tag. That said, it takes 20 minutes and costs nothing. If you create one, focus on answering real questions with specific, quotable facts — that content is useful regardless of whether any crawler reads the file.
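You can run the same check those server log analyses ran on your own site: grep your access logs for AI crawler user agents and see who is actually fetching what. Here is a rough sketch, assuming the common Apache/nginx "combined" log format (request line in the first quoted field, user agent in the last); the substring match on user agents is deliberately loose.

```python
import re
from collections import Counter

# User-agent substrings for the AI crawlers discussed above
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "Google-Extended", "PerplexityBot", "Applebot-Extended",
    "Meta-ExternalAgent", "Bytespider", "cohere-ai", "CCBot",
    "Amazonbot", "YouBot",
]

def count_ai_hits(log_lines, path=None):
    """Count requests per AI crawler, optionally filtered to one path.

    Assumes combined log format. Matching is by substring, so it is a
    rough census, not a forensic one.
    """
    hits = Counter()
    for line in log_lines:
        if path is not None:
            # Request line looks like: "GET /llms.txt HTTP/1.1"
            m = re.search(r'"[A-Z]+ (\S+)', line)
            if not m or m.group(1) != path:
                continue
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[bot] += 1
                break  # count each request once
    return hits
```

Something like `count_ai_hits(open("access.log"), path="/llms.txt")` answers the question directly: if the result is empty after months of traffic, no AI crawler has ever asked for your llms.txt.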

5. Skip speakable schema. I originally recommended SpeakableSpecification here, but after further research I cannot justify it. It has been in beta for 8 years, only ever worked for US Google News publishers on Google Assistant, and Google Assistant is shutting down in March 2026. No AI chatbot reads it. I still have it in my templates, but I would not tell anyone else to spend time on it. Focus on writing clear, structured content instead — that is what actually helps AI systems extract and cite your work.

6. Apply GEO principles to your content. The KDD 2024 research is clear: citations, statistics, and direct quotations boost AI visibility by up to 40%. When you write content, include specific numbers, cite sources, and use quotable sentences. This is not just good for AI. It is good writing.

The whole implementation took about an hour across three sites. Every build still passes, all tests pass, and the changes are purely additive. No existing SEO was harmed.

If you want to see the actual files, the llms.txt for each site is live: bdigitalmedia.io/llms.txt, tech.bdigitalmedia.io/llms.txt, truck.bdigitalmedia.io/llms.txt.

My portfolio is where you can see the media work. The tech site covers AI consulting and development. If you want help implementing LLM SEO for your own sites, get in touch.