What can cause "Discovered - currently not indexed" in the new GWT

42,169

Solution 1

It is just part of the process. There is nothing for you to do until you actually get errors.

To understand the categories, you need to understand how indexing works which is something that is done pretty much continuously:

Googlebot fetches a page which means downloading its content onto Google servers. When this happens, the page is crawled.
It later puts the content of the downloaded page into the index. This means the page is indexed.
While crawling the page, it finds like and places them in a queue. Those links are discovered.

So:

Discovered not indexed means the link has been added to the queue of things that Googlebot may eventually crawl. Since the web is virtually infinite and there is a prioritization, it may never actually get there.
Crawled not currently indexed means that the page was downloaded onto Google servers but its contents have not been inserted into the index.

Solution 2

Google may discover and crawl your pages, but it does not mean that it will necessarily index them.

There are many reasons why Google might not index a page. Perhaps it found duplicate content. Perhaps it does not feel that it offers enough value for any specific search queries. Google may have found something about your page that it doesn't like.

Whatever the reason might be, Google just hasn't decided to index some of the pages that it has discovered and crawled on your site. It's pretty normal for some of your pages not to be indexed. Some of the pages on my sites aren't indexed despite it having better content than many indexed pages and having a substantial amount of internal links. Google's indexing system is a machine learning algorithm. And so it may decide not to index some pages sometimes for a variety of unknown reasons.

Solution 3

Googlebot has its own queue, based on different parameters. This is the message about those URLs are even queued.

Aj Cohn has a kind of funny opinion about the meaning of this message:

Discovered – currently not indexed seems to indicate that they see it in your sitemap but based on how other content looks they’re not even going to bother crawling it. Essentially, “Ya ugly!” Or, maybe it’s just a representation of poor crawl efficiency.

Frankly, I’m not entirely sure that the definition of Discovered is accurate since many of the sample URLs under this status have a Last crawled date. That seems to contradict the definition provided.

Solution 4

Even though Google discovered the URL it did not feel it was important enough to spend time crawling. If you want this page to receive organic search traffic, consider linking to it more from within your own website. Be sure to promote this content to others with the hope that you can earn backlinks from external websites. External links to your content is a signal to Google that a page is valuable and considered to be trustworthy, which increases the odds of it being indexed.

View more solutions

42,169

matt137

Updated on September 18, 2022

Comments

matt137 over 1 year

The new GWT shows sitemaps links divided into new categories. Two that confuse me: 1. Discovered- currently not indexed 2. Crawled - currently not indexed

What are the possible reasons for this, and are there any site wide implications? Is this a sign from Google I should consider removing these?
- BigName over 3 years
  
  There can be many reasons. One of them is that Google did not find the content important or useful to crawl and index. Here is a fix for that: therealweblog.blogspot.com/2020/07/…
Stephen Ostermiller over 6 years

I removed the "read the manual" bit. That isn't helpful. Especially since Google's documentation is pretty poor.
Stephen Ostermiller over 6 years

I think "not currently indexed" could also mean that Google is choosing not to index it. It could be duplicate or look low quality. It might not have enough Pagerank. Google might think it won't match enough searches.
markof over 6 years

This is what I was hinting at with prioritization. This happens with links that are suspected duplicates (sometimes because they differ on only a parameter or match another canonical).
sylbru over 5 years

"Perhaps it found duplicate content" > I’d like to add that In this case though, since the message is "Discovered - currently not indexed," it indicates that the page hasn't even been fetched (or it would be "Crawled - currently not indexed"). So at this point they don't know anything about its content. So I guess prioritization at this stage depends on the URL itself, the number of pages linking to it, and possibiy other things.
GLCoder over 4 years

Pages drop in and out of "Crawled not currently indexed" without any attention from me. The information is also dubious. Example: an inspected url results in a message that it IS indexed, but not in any sitemap... the page is in the sitemap and has been for years. The recent algo update sent this report hither and thither.