Why Isn't My Website Showing Up in ChatGPT? Fix It

You ask ChatGPT a question your site answers perfectly, and it cites three competitors instead. Here is why that happens and the specific fix for each cause.

By Outline Technologies | June 26, 2026 | 14 min read

The short version: ChatGPT ignores your site for a handful of fixable reasons, chiefly blocked crawlers, unreadable content, weak entity signals, and thin answers. Diagnose which one applies, fix it, and give the index 4 to 12 weeks to catch up.

How ChatGPT Actually Finds Websites

Before you fix anything, you need to know how the machine works. ChatGPT pulls from your site through two completely separate paths, and they fail for different reasons.

The first path is training data. OpenAI trains its models on a giant snapshot of the web collected by a crawler called GPTBot, plus public datasets like Common Crawl. This is frozen knowledge. When ChatGPT answers from training data, it is recalling patterns it absorbed months or years ago. There are no live links. Any reference it produces in this mode is reconstructed from memory, which is exactly why it sometimes invents URLs that never existed.

The second path is live browsing, branded as ChatGPT Search (you may remember the SearchGPT preview). When ChatGPT decides a question needs fresh information, it runs a real web search, reads the top pages, and returns an answer with numbered clickable citations. This is where most of the citations you actually see come from. The search index behind it leans heavily on Bing. Studies suggest roughly 85 to 90 percent of ChatGPT Search citations also rank in Bing's top results.

Two paths, two failure modes. Training data decides whether ChatGPT "knows" your brand exists. Live browsing decides whether it cites your specific page today. You can be invisible in one and present in the other.

That split matters for diagnosis. If ChatGPT has never heard of your company even when you name it directly, you have a training-data and entity problem. If it knows your brand but never pulls your pages into a live answer, you have a crawl, content, or ranking problem. Most of the fixes below target the browsing path, because that is the one you can move in weeks instead of years. Keep both paths in mind as you read.

Reason 1: You Are Blocking the AI Crawlers

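This is the most common cause and the most embarrassing one, because it is usually an accident. OpenAI runs three separate bots: GPTBot collects training data, OAI-SearchBot builds the index for ChatGPT Search, and ChatGPT-User fetches pages on request. Blocking the wrong one quietly removes you from ChatGPT's live answers.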

A lot of sites added a blanket Disallow for GPTBot during the 2023 panic about AI scraping, then forgot. Worse, some security plugins, CDN bot-management rules, and WAF presets silently block all three under an "AI bots" toggle. You can have a perfect article and still be invisible because a checkbox in Cloudflare is doing its job.

The fix: open your robots.txt and confirm you are not disallowing OAI-SearchBot or ChatGPT-User. To stay in training data as well as search, allow all three bots; at minimum, allow OAI-SearchBot. Then check the layer above robots.txt: look in your CDN or firewall for any "block AI crawlers" rule and turn it off for the OpenAI agents. Finally, verify the bots actually reach you by checking server logs for OAI-SearchBot hits.
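The robots.txt part of that check can be scripted. What follows is a minimal sketch using Python's standard-library robots.txt parser; the sample file and the example.com URL are made up for illustration, and the parser reflects Python's reading of the rules, which may differ in edge cases from how each bot interprets them. It also says nothing about CDN or firewall rules, which sit above robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The three OpenAI user agents named in this article.
OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def openai_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {agent: allowed} for each OpenAI agent under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in OPENAI_AGENTS}


# Hypothetical file: opts out of training but leaves search and user fetches open.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for agent, allowed in openai_access(SAMPLE_ROBOTS).items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

Paste your own robots.txt text in place of the sample. If OAI-SearchBot comes back blocked, that is the line to fix first.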

Generate a clean file with the robots.txt generator, then confirm nothing downstream is overriding it with the AI Crawler Checker. This single fix resolves more "why am I not in ChatGPT" cases than everything else combined, so start here before you touch your content.
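Server logs are the other way to confirm nothing downstream is blocking the bots. Here is a rough sketch that counts log lines mentioning each OpenAI agent name. The sample lines are invented and use simplified user-agent strings; real log formats and real user-agent strings vary, and a user-agent header can be spoofed, so treat the counts as a hint rather than proof.

```python
from collections import Counter

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def count_bot_hits(log_lines) -> Counter:
    """Count log lines whose text mentions each OpenAI agent name (case-insensitive)."""
    hits = Counter({agent: 0 for agent in OPENAI_AGENTS})
    for line in log_lines:
        lowered = line.lower()
        for agent in OPENAI_AGENTS:
            if agent.lower() in lowered:
                hits[agent] += 1
    return hits


# Invented access-log lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [26/Jun/2026:10:00:00 +0000] "GET /blog/schema HTTP/1.1" 200 5120 "-" "OAI-SearchBot/1.0"',
    '203.0.113.9 - - [26/Jun/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [26/Jun/2026:10:02:00 +0000] "GET /pricing HTTP/1.1" 403 0 "-" "ChatGPT-User/1.0"',
]

if __name__ == "__main__":
    for agent, count in count_bot_hits(SAMPLE_LOG).items():
        print(f"{agent}: {count} hit(s)")
```

To scan a real file, pass an open file handle instead of the sample list, for example `count_bot_hits(open("access.log"))`, where the path is whatever your server uses. Zero OAI-SearchBot hits over a few weeks, with robots.txt allowing it, points at the CDN or firewall layer.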

Reason 2: No Schema Markup

Schema markup is structured data in your HTML that tells machines what a page actually is. A human reads a page and infers "this is a how-to guide written by an SEO expert at a company called Acme." A crawler needs that spelled out. Schema does the spelling.

ChatGPT and the search index behind it use structured data to understand entities, relationships, and content type with high confidence. Without it, the model has to guess from raw text, and guesses are weaker than facts. Schema will not force a citation on its own. But it removes ambiguity, and ambiguity is friction. When two pages answer a question equally well and one has clean Article and Organization schema while the other has none, the marked-up page is easier to trust and cite.

The types that move the needle for AI visibility:

The fix: add valid JSON-LD schema to your key pages. Do not fake it. Marking up a review rating that does not exist, or stuffing FAQ schema with questions no human asked, is the kind of thing that gets a page distrusted. Build correct markup with the schema generator, paste it into your page head, and validate it. Start with Organization sitewide and Article on every blog post, then expand. Most teams see the easiest wins from getting authorship and dates right, because that is what AI freshness checks key on.
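For a sense of what the baseline markup looks like, here is a small sketch that builds Article markup with a nested Organization publisher and wraps it in the script tag that goes in the page head. The property names are standard schema.org vocabulary; the function names, the author, the organization, and the URL are placeholders, and a real page would fill them from its own metadata.

```python
import json


def article_jsonld(headline, author, published, modified, org_name, org_url):
    """Build Article markup with a nested Organization publisher as a Python dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published,
        "dateModified": modified,
        "publisher": {"@type": "Organization", "name": org_name, "url": org_url},
    }


def as_script_tag(data: dict) -> str:
    """Serialize the markup and wrap it in a JSON-LD script tag."""
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Placeholder values for illustration.
    markup = article_jsonld(
        "How Schema Helps AI Visibility", "Jane Doe",
        "2026-06-26", "2026-06-26", "Acme", "https://example.com",
    )
    print(as_script_tag(markup))
```

Only mark up facts that are true of the page. The dates and author here matter most, since the article argues those are what freshness checks key on.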

Reason 3: Your Content Is Not Quotable

Here is an uncomfortable truth. A lot of content ranks fine in Google and never gets cited by ChatGPT, because ranking and being quotable are different skills. Google can reward a 2,000-word page where the answer is buried in paragraph nine. ChatGPT wants a clean, liftable claim it can drop into a sentence with attribution.

Think about how the model uses your page. It reads the source, extracts a specific statement, and rewrites the answer around it. If your writing never makes a flat, standalone claim, there is nothing to extract. Vague, hedged, marketing-speak prose is quotation poison. "Our innovative solution helps businesses achieve their goals" gives the model nothing. "Schema markup does not directly improve rankings, but it increases citation rate by removing ambiguity" gives it something to lift.

The fix: write in extractable units. Concrete tactics:

Run a draft through the content grader to see whether your page has clear, extractable answers or just rambles toward them. The test that matters: skim any section and ask whether one sentence could stand alone as a cited answer. If not, rewrite until one can. Quotability is the cheapest lever here, because you control it entirely and it does not depend on anyone else's index.

Reason 4: Weak Entity and Brand Signal

ChatGPT does not think in pages. It thinks in entities. A brand, a person, a product, a concept, all connected in a web of relationships the model built during training. If your brand is not a recognized entity, you are not a candidate the model reaches for, no matter how good one page is.

Test it directly. Ask ChatGPT "what is [your company]?" If it confidently describes you, you are an entity and your problem is downstream (crawling, content, ranking). If it says it has no information, or confuses you with something else, or hedges, your entity signal is weak. That is the root cause, and no amount of on-page tweaking fixes it.

Entity strength comes from being mentioned, consistently, across many independent sources the model trusts. Not backlinks in the old SEO sense. Mentions. Your brand name showing up in articles, directories, forums, podcast transcripts, and reference sites, described the same way each time.

The fix: build entity presence deliberately.

This is the slowest fix on the list and the highest-leverage. A strong entity gets cited across hundreds of queries because the model already trusts who you are. Most teams find this takes months of steady mentions, not a single campaign. Start now, because the training-data path only updates when models retrain.

Reason 5: Thin or Buried Answers

Two failure modes hide under one symptom here. Thin content, and buried content. Both leave ChatGPT with nothing solid to cite, and they need different fixes.

Thin means the page technically covers the topic but says little. Three hundred words that restate the question, add a stock photo, and link to a contact form. The model reads it, finds no substantive claim, and moves to a competitor who actually answered. Thin pages also signal low effort, which drags trust down across your whole domain.

Buried is the sneakier one. The answer is genuinely there, just wrapped in 600 words of preamble before you reach it. ChatGPT often reads efficiently and weights the early, structured parts of a page. If your real answer arrives after a long origin story and three calls to action, it may never get extracted even though it exists.

The fix for thin: add real depth or merge the page into something stronger. Cover the actual sub-questions a person would ask next. Include specifics, examples, edge cases, and numbers. Depth here means answering completely, not padding to a word count. A focused 900-word page that fully answers one question beats a bloated 3,000-word one.

The fix for buried: restructure so answers come first. Use descriptive headings phrased as the questions people actually ask. Put the direct answer immediately under each heading. Move backstory, methodology, and persuasion below the answer, not above it. Add a short summary block at the top of long pieces.
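One crude way to spot a buried answer is to count how many words a reader passes before reaching it. The helper below is a hypothetical sketch, not how ChatGPT or any grader measures this: you supply the page text and a short snippet of the sentence you consider the answer, and it reports how much preamble sits in front of it.

```python
def words_before(page_text: str, answer_snippet: str):
    """Return how many words precede the first match of answer_snippet, or None if absent."""
    position = page_text.lower().find(answer_snippet.lower())
    if position == -1:
        return None
    return len(page_text[:position].split())


if __name__ == "__main__":
    # Made-up page: three sentences of origin story, then the actual answer.
    page = "Our story began in 2010. " * 3 + "Schema markup removes ambiguity."
    print(words_before(page, "schema markup removes"))
```

If the number is in the hundreds, as in the 600-word preamble described above, move the answer up under its heading.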

The content grader flags pages where the answer-to-fluff ratio is off and where your key point sits too far down. Fix the structure first, because it is faster than writing new depth and often surfaces an answer that was hiding all along.

Reason 6: JavaScript-Rendered Content the Crawler Cannot Read

You can see your content perfectly in a browser and the crawler can see nothing. That is the JavaScript-rendering trap, and it silently zeroes out sites built on heavy client-side frameworks.

The mechanics: your server sends a near-empty HTML shell, then JavaScript runs in the browser and paints in the real content. Humans never notice. But many crawlers, including AI crawlers, are far less patient and far less capable at executing JavaScript than Google's renderer. If your article text only appears after a React or Vue bundle hydrates, the bot may grab the empty shell, see no content, and leave. To ChatGPT, that page is blank.

How to check fast: view the raw HTML source, not the rendered DOM. In your browser, use "view source" (not inspect element) or fetch the URL with a plain tool that does not run scripts. If your headings and body copy are missing from that raw HTML, AI crawlers are probably missing them too.
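The same check can be scripted. This sketch extracts the text a non-rendering crawler could read from raw HTML, skipping anything inside script and style tags, and tests whether a phrase from your article is present. The two sample pages are invented: one is a client-rendered shell with the copy locked inside a script, the other ships the copy in the markup. It assumes a crawler that runs no JavaScript at all, which is the worst case rather than a description of any specific bot.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of script and style elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def raw_html_has_text(html: str, phrase: str) -> bool:
    """True if phrase appears in the readable text of the raw HTML (case-insensitive)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return phrase.lower() in text.lower()


# Invented examples: an empty shell versus server-rendered markup.
CLIENT_RENDERED = """<html><head><title>Acme Blog</title></head><body>
<div id="root"></div>
<script>window.__POST__ = {"body": "Schema markup tells machines what a page is."};</script>
</body></html>"""

SERVER_RENDERED = """<html><head><title>Acme Blog</title></head><body>
<h1>What Is Schema Markup?</h1>
<p>Schema markup tells machines what a page is.</p>
</body></html>"""

if __name__ == "__main__":
    phrase = "Schema markup tells machines"
    print("client-rendered:", raw_html_has_text(CLIENT_RENDERED, phrase))
    print("server-rendered:", raw_html_has_text(SERVER_RENDERED, phrase))
```

To test a live page, fetch its HTML with any client that does not execute scripts, such as `urllib.request.urlopen`, and pass the decoded body in along with a sentence from your article.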

The fix: get your real content into the HTML the server sends, before JavaScript runs.

This one is binary. Either the content is in the source or it is not. Fix it and you do not improve your visibility incrementally, you turn it on. Many teams discover this is the entire reason a technically excellent site gets zero citations.

Reason 7: You Are Not in Common Crawl

Common Crawl is a free, open snapshot of the web, refreshed roughly monthly, and it is one of the foundational datasets AI models train on. If your pages are not in Common Crawl, you are missing from a major slice of the training-data path. The model literally never read you.

Why a site gets skipped: it is new and not yet discovered, it has few or no inbound links so crawlers never find a path to it, it blocks the CCBot user agent in robots.txt, or its content sits behind JavaScript and registers as empty even when fetched. Sometimes it is just bad luck on crawl timing, but usually it is one of those four.

Checking is straightforward. Common Crawl publishes its index, and several free tools let you query whether a domain appears in recent crawls. Search the Common Crawl index for your domain. If you get nothing across the last few monthly crawls, you have confirmed the gap.
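If you would rather script the lookup, the sketch below queries Common Crawl's public index server. It rests on a few assumptions worth verifying against Common Crawl's own documentation: that `collinfo.json` lists crawls newest first with a `cdx-api` field per crawl, that the index accepts `url`, `output`, and `limit` parameters, and that an HTTP 404 means no captures were found. The function names are illustrative.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

COLLINFO_URL = "https://index.commoncrawl.org/collinfo.json"


def build_query(cdx_api: str, domain: str, limit: int = 5) -> str:
    """Build an index query URL asking for up to `limit` captures under the domain."""
    params = {"url": f"{domain}/*", "output": "json", "limit": limit}
    return f"{cdx_api}?{urllib.parse.urlencode(params)}"


def captures_in_recent_crawls(domain: str, crawls: int = 3) -> dict:
    """Return {crawl_id: captures returned, capped at the query limit}. Needs network."""
    with urllib.request.urlopen(COLLINFO_URL, timeout=30) as response:
        collections = json.load(response)[:crawls]  # assumed to be newest first
    counts = {}
    for collection in collections:
        query = build_query(collection["cdx-api"], domain)
        try:
            with urllib.request.urlopen(query, timeout=30) as response:
                lines = response.read().decode("utf-8").splitlines()
            counts[collection["id"]] = sum(1 for line in lines if line.strip())
        except urllib.error.HTTPError as error:
            if error.code != 404:  # assumption: 404 means no captures for this crawl
                raise
            counts[collection["id"]] = 0
    return counts
```

Calling `captures_in_recent_crawls("example.com")` with your own domain returns a count per crawl. Zeros across the board confirm the gap described above.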

The fix:

Set expectations honestly. Common Crawl feeds training, and training is the slow path. Getting indexed there pays off when models retrain, not next week. It is worth doing for long-term presence, but for fast wins, prioritize the live-search fixes. Use the AI Crawler Checker to confirm CCBot is allowed before anything else.

Reason 8: Low Topical Authority

Even with crawlers allowed and content clean, ChatGPT still has to choose which sources to trust for a given question. It does not pick at random. It leans toward sites that demonstrate authority on the specific topic, and the live-search path inherits much of that judgment from the underlying search index, which heavily reflects Bing's ranking.

Topical authority is depth on a subject, not size of domain. A focused site that has published twenty genuinely useful pieces on schema markup can outrank a giant general-interest publisher on a schema question, because it has shown sustained expertise in that lane. The model treats that depth as a trust signal. One isolated post on a topic you otherwise never cover reads as a one-off, and one-offs rarely get cited.

The fix: build subject depth on purpose, not breadth for its own sake.

Authority compounds. The third strong piece in a cluster lifts the first two, and a well-covered topic starts getting cited across many related queries at once. Run your cornerstone pages through the content grader to find weak spots, and treat topical authority as the project that makes every other fix work harder.

Reason 9: No llms.txt File

The llms.txt file is a newer convention, and being honest about its status matters. It is a plain-text or markdown file at the root of your domain that gives AI systems a clean, curated map of your most important content, free of navigation, ads, and clutter. Think of it as a guided tour you hand to a model instead of making it parse your full site.

What it is not, yet, is a guaranteed ticket into ChatGPT. Adoption across AI providers is still uneven, and no major model has promised to honor it. So treat llms.txt as low-cost insurance, not a silver bullet. It takes an hour to create, it cannot hurt you, and it positions you well if and when providers lean on it more. That is a good trade. Just do not expect citations to appear the moment you publish one.

A useful llms.txt does a few things:

The fix: generate one with the llms.txt generator, place it at the root of your domain, and keep it updated as your important pages change. Pair it with solid schema and clean HTML, because llms.txt complements those signals rather than replacing them. It belongs near the bottom of your priority list, below crawler access and content quality, but it is genuinely worth the hour it costs.
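If you prefer to assemble the file yourself, here is a sketch of a generator. It assumes the commonly proposed layout: a title line, a one-line blockquote summary, then sections of markdown links with short descriptions. Because no provider has committed to a format, treat the layout as a convention rather than a specification. The site name, URLs, and descriptions are placeholders.

```python
def build_llms_txt(site_name, summary, sections):
    """Render llms.txt text: title, blockquote summary, then sections of links.

    `sections` maps a section title to a list of (page_title, url, description) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for title, pages in sections.items():
        lines.append(f"## {title}")
        lines.append("")
        for page_title, url, description in pages:
            lines.append(f"- [{page_title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder content for illustration.
    print(build_llms_txt(
        "Acme",
        "Guides on schema markup and AI search visibility.",
        {
            "Guides": [
                ("Schema basics", "https://example.com/schema", "What schema is and why it matters"),
                ("Crawler access", "https://example.com/crawlers", "How to allow AI crawlers"),
            ],
        },
    ))
```

Save the output as `llms.txt` at the root of your domain and regenerate it whenever your list of important pages changes.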

Reason 10: Your Site Is Simply Too New

Sometimes nothing is broken. Your site is just new, and the systems that feed ChatGPT have not caught up to you yet. This is real, it is common, and it is the one cause you cannot engineer your way around in a weekend.

Both paths have lag. The training-data path is the worst offender. A model trained before your site existed has zero knowledge of you, and that does not change until the next training run, which is on OpenAI's schedule, not yours. The live-search path is faster but still not instant. A brand-new domain has no inbound links, no crawl history, and no track record, so search indexes treat it cautiously before they will surface it in answers.

There is also a trust component. New domains start with low authority by default. Search systems have learned to be wary of fresh sites because spam churns out new domains constantly. You have to earn your way out of the probation period with signals that you are real and useful.

The fix is mostly patience plus the right groundwork:

Do not panic at week three. A new site that does everything right still typically waits a couple of months before live search starts citing it, and longer before training data reflects it. The work you do now is what makes that arrival happen at all.

How to Actually Check Which Reason Applies

Ten possible causes is a lot. Do not guess and do not fix randomly. Run a quick diagnostic to find which ones actually apply to your site, then fix in priority order.

Start with the two checks that catch the most common failures:

  1. Verify crawler access. Run your domain through the AI Crawler Checker. It reads your robots.txt and tells you whether GPTBot, OAI-SearchBot, ChatGPT-User, and CCBot can reach you. If OAI-SearchBot is blocked, stop here, fix it, and recheck. Nothing else matters until the bots can get in.
  2. Run a full audit. The AI SEO Audit checks the rest in one pass: whether your content renders in raw HTML, whether schema is present and valid, whether you have an llms.txt, whether answers are structured and extractable, and where the gaps are. It turns "something is wrong" into a specific punch list.

Then do the manual checks the tools cannot:

Fix in this order: crawler access first, then rendering, then content quality and schema, then entity and authority, then llms.txt. The early items are fast and binary. The later ones are slow and compounding. Do the cheap fixes before the patient ones.
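That fix order is easy to encode if you track findings in a script or spreadsheet export. The sketch below sorts a list of failed checks into the order given above. The check labels and the tier numbers are illustrative choices, not output from any audit tool.

```python
# Tiers follow the fix order in this article; items in the same tier share a number.
FIX_PRIORITY = {
    "crawler access": 1,
    "rendering": 2,
    "content quality": 3,
    "schema": 3,
    "entity": 4,
    "authority": 4,
    "llms.txt": 5,
}


def punch_list(failed_checks):
    """Sort failed checks by fix tier, then alphabetically; unknown labels go last."""
    last = max(FIX_PRIORITY.values()) + 1
    return sorted(failed_checks, key=lambda check: (FIX_PRIORITY.get(check, last), check))


if __name__ == "__main__":
    print(punch_list(["llms.txt", "schema", "crawler access", "rendering"]))
```

Work the list from the top and re-sort after each round of fixes.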

Re-run the audit after each round of fixes so you can see the punch list shrink. That feedback loop keeps you honest about what is actually resolved versus what you only think you fixed.

A Realistic Timeline for Showing Up

Set honest expectations, because the gap between fixing something and seeing a citation is where most people give up too early. The timeline depends entirely on which path you are fixing.

Days, not weeks: crawler access. Unblocking OAI-SearchBot or removing a CDN bot rule takes effect as soon as the index recrawls you, often within days. This is the only near-instant fix, which is another reason to check it first.

Two to eight weeks: live-search visibility. Fixing rendering, content structure, schema, and on-page answers feeds the live-browsing path. Once the search index recrawls and reranks your improved pages, you can start appearing in browsed answers. Most teams find meaningful movement in the four-to-eight week range after the fixes land, assuming the pages are genuinely good.

Two to six months: entity and authority. Building brand mentions, topical depth, and inbound links is steady compounding work. You will not see it flip on. You will notice, over a couple of months, that ChatGPT cites you across more queries and describes your brand more accurately. This is the work that turns occasional citations into reliable ones.

Longest and least predictable: training data. Getting into the next model's parametric knowledge depends on OpenAI's retraining schedule, which you do not control. Being in Common Crawl and earning broad mentions positions you for it, but the payoff lands whenever the next major model ships.

The practical playbook: do the fast, binary fixes this week, the content and schema work this month, and the entity and authority work continuously starting now. Re-run the AI SEO Audit monthly to track progress. A site that does all of this typically goes from invisible to regularly cited within a quarter, sometimes faster on the live-search path. The sites that never show up are almost always the ones that fixed one thing, saw nothing in a week, and quit.

Frequently Asked Questions

Usually because their pages are easier for the live-search path to read and trust. Common gaps: they allow OAI-SearchBot while you block it, their answers are front-loaded and quotable while yours are buried, or they have stronger entity and topical authority signals. Ask ChatGPT your target question with search on, see who it cites, and compare their crawler access, page structure, and brand presence against yours.

Not from live search, but from future training. GPTBot collects training data, OAI-SearchBot builds the index for ChatGPT Search, and ChatGPT-User fetches pages on request. Blocking GPTBot opts you out of model training while leaving live citations intact. Blocking OAI-SearchBot is the one that removes you from browsed answers. Many sites accidentally block all three through a single CDN or firewall AI-bot toggle, so check that layer too.

No. Schema removes ambiguity about what your page is, who wrote it, and when, which makes you easier to trust and cite. It does not force a citation by itself. Treat it as one signal among several. Add valid Organization and Article schema as a baseline, never fake ratings or FAQs, and pair it with quotable content and good crawler access. Schema helps most when two pages are otherwise equal.

View the raw page source, not the inspected DOM. In your browser, use view-source rather than inspect element, or fetch the URL with a tool that does not run scripts. If your headings and body text are missing from that raw HTML, AI crawlers likely see a blank page. The fix is server-side rendering, static generation, or prerendering for bots so your real content ships in the initial HTML.

It is a markdown file at your domain root that gives AI systems a clean, curated map of your important pages without navigation and clutter. Adoption is still uneven and no major model guarantees it honors the file, so treat it as cheap insurance, not a silver bullet. It takes about an hour to create, cannot hurt you, and positions you well if providers lean on it more. Put it low on your priority list, below crawler access and content quality.

It depends on the path. Unblocking crawlers can take effect within days. Content, schema, and rendering fixes feed live search and typically show movement in four to eight weeks after a recrawl. Entity and authority building compounds over two to six months. Getting into training data depends on OpenAI's retraining schedule and is the slowest. A site that fixes everything usually goes from invisible to regularly cited within a quarter.

Very possibly. New domains have no crawl history, no inbound links, and no track record, so live search treats them cautiously and training data has never seen them at all. There may be nothing broken. Do every other fix now so nothing else holds you back once you are discovered, earn early links and brand mentions, and publish consistently. Most new sites wait a couple of months before live search starts citing them.

Outline Technologies

We build SEO, GEO, and AI optimization tools and strategies. FreeGPTSEO is our free toolkit for checking and improving AI search visibility.

Last updated: June 26, 2026