What is a Noindex Meta Tag?

Noindex is a meta tag that instructs search engines not to include a specific page in their search results.
The noindex tag can be placed in the <head> section of the page’s HTML, or the same directive can be sent in an HTTP response header (X-Robots-Tag) returned by the web server.
If a page has already been indexed by Google and a noindex tag is added later, the page is removed from Google search results after Google crawls it again.


Example:

<meta name='robots' content='noindex, follow' />
Noindex Meta Tag in view-source.
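
To make the two placements concrete, here is a minimal Python sketch of how a crawler might check a page for noindex. It uses only the standard library. The names RobotsMetaParser and is_noindex are hypothetical, chosen for this illustration. The sketch is deliberately simplified: it ignores crawler-specific meta tags (such as name="googlebot"), the "none" directive, and X-Robots-Tag values that carry a user-agent prefix.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html, headers=None):
    """Return True if the meta tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        return True
    # Header names are case-insensitive, so compare in lowercase.
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [p.strip().lower() for p in value.split(",")]:
                return True
    return False
```

For example, is_noindex("<meta name='robots' content='noindex, follow' />") is true, and so is is_noindex("<html></html>", {"X-Robots-Tag": "noindex"}).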

Noindex Pages and LLM Usage

What “noindex” actually does

  • A noindex tag tells search engines not to include a page in their public search results.
  • It does not hide the page from the internet.
  • It does not block access.
  • It does not prevent an AI system from reading it if the system is given the URL or the text directly.

Effect on LLM training

  • LLM training datasets are collected long before the model is released.
  • Whether a noindex page is included depends entirely on how the dataset was built.
  • Some crawlers respect noindex; others don’t.
  • Even if included, the model does not store the page as a retrievable document — it only learns statistical patterns.
  • Therefore:
    A noindex page cannot be reliably “cited” by a pretrained LLM unless the system explicitly provides that page during inference.
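
The point that inclusion depends on how the dataset was built can be sketched in a few lines of Python. This is an illustration only, not a description of any real training pipeline: build_dataset, the respect_noindex switch, the page records, and the example.com URLs are all hypothetical.

```python
def build_dataset(pages, respect_noindex=True):
    """Return the text of the pages a (hypothetical) pipeline would keep."""
    kept = []
    for page in pages:
        directives = {d.strip().lower() for d in page["robots"].split(",")}
        # A pipeline that honors noindex skips the page; one that does not keeps it.
        if respect_noindex and "noindex" in directives:
            continue
        kept.append(page["text"])
    return kept


# Hypothetical crawl results: each record carries the page's robots directives.
pages = [
    {"url": "https://example.com/a", "robots": "index, follow", "text": "Public page"},
    {"url": "https://example.com/b", "robots": "noindex, follow", "text": "Noindex page"},
]

respectful = build_dataset(pages)                          # drops the noindex page
permissive = build_dataset(pages, respect_noindex=False)   # keeps both pages
```

The same two pages produce different datasets depending on a single pipeline setting, which is why the page owner cannot tell from the tag alone whether the content was collected.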

Effect on retrieval‑augmented systems (RAG)

This is the one scenario in which a noindex page reliably becomes a usable source.

If a system:

  • downloads the page,
  • indexes it in a private vector store,
  • or feeds it directly into the LLM’s context window,

then:

  • the LLM can read it,
  • ground its answer in it,
  • and cite it as a source even if the page is noindex.

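In these RAG systems, noindex is irrelevant because the content is not discovered through a search engine; it is supplied directly. A system that retrieves pages through a public search index is a different case, since it cannot find pages that index excludes.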
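
The flow above can be sketched as a toy retrieval step in Python. It is a simplified illustration under stated assumptions: retrieve and build_prompt are hypothetical helpers, the word-overlap scoring stands in for a real vector store, the documents are invented, and no actual LLM is called. Note that the noindex field is stored but never consulted.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many question words they share (toy scoring)."""
    words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(words & set(d["text"].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Assemble a prompt that asks the model to cite the supplied sources."""
    sources = retrieve(question, documents)
    context = "\n".join(f"[{d['url']}] {d['text']}" for d in sources)
    return (
        "Answer using only these sources and cite them:\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical private document store; the first page carries a noindex tag.
documents = [
    {"url": "https://example.com/pricing", "noindex": True,
     "text": "The premium plan costs 20 dollars per month"},
    {"url": "https://example.com/about", "noindex": False,
     "text": "The company was founded in Berlin"},
]

prompt = build_prompt("How much does the premium plan cost", documents)
```

The resulting prompt contains the text and URL of the noindex page, so a model given this prompt can ground its answer in that page and cite it.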

What noindex does not do

  • It does not prevent an LLM from using the page if the developer/user feeds it in.
  • It does not guarantee privacy.
  • It does not guarantee exclusion from future datasets.
  • It does not make the page “invisible” to AI.

The real‑world truth

  • Noindex only affects search engine indexing.
  • LLMs themselves do not rely on search engine indexing; only tools that fetch pages through a search engine do.
  • LLMs can use noindex pages as sources ONLY when the content is explicitly provided to them.
