Increasingly, websites suffer from Google indexing problems without the website owner's knowledge. It happens most often with online stores, but normal content websites suffer from it almost as often.
In recent years, out of every 10 website analyses I did, 5 or 6 revealed this exact problem. So that's a good 50%. And in most cases the website owners had no idea. A shame, because it means missing out on visitors!
Although indexation problems are nothing new, I increasingly see complaints about them in various SEO channels. This is probably due to the reports in Google Search Console. You used to find this out only by running an analysis with a tool like Screaming Frog or Sitebulb, but nowadays Google Search Console surfaces many of these indexing issues itself.
And in particular, I'm talking about the coverage report: Crawled – currently not indexed. See below for an example from my own site.
Below this graph are all kinds of URLs that Google has found but does not show in the search results.
Start with the basics: technology
If you want to rank higher in Google, you must first have a good foundation. And that means a website built in such a way that all pages can be found by following internal links. But that’s certainly not all …
In fact, only the pages that are actually NEEDED should be indexed. Nothing more, nothing less.
Just do a quick check in Google. Type site: directly followed by your own domain name, like this: site:mydomain.com. Then let Google run the search.
Then, under the name of your domain, you'll see something like this: About 76 results. This means that Google has indexed about 76 pages. It's only a rough estimate, because Google never shows the exact number, but it gives you an indication of whether something is wrong.
If that number does not match the number of pages your website actually has, you probably suffer from crawl and indexation problems.
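To know what number you are comparing against, you can count the URLs in your own sitemap.xml. Here is a minimal sketch in Python, assuming a plain sitemap at the standard /sitemap.xml location (a sitemap index with nested sitemaps would need one extra loop):

```python
# Count the URLs in a sitemap.xml to compare against Google's site: estimate.
# Assumes a plain sitemap at /sitemap.xml; a sitemap index needs one more loop.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://mydomain.com/sitemap.xml"  # replace with your own domain
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

urls = [loc.text for loc in tree.findall(".//sm:loc", NS)]
print(f"{len(urls)} URLs in the sitemap")
```

If Google's estimate is far above this number, too much is being indexed; far below it, and pages are missing from the index.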
Which pages shouldn’t be indexed by Google?
Images
I'll refer to an old Yoast SEO plugin problem, where image attachment pages were indexed as if they were normal pages. That became a very big problem for many websites: images should not be indexed as pages. Even today, some websites are still paying the price for that fiasco.
Don’t believe me?
Just search Google for something like img_1280 (or any other number) and click on a few of the search results. You'll see that some pages contain nothing but a single image. That's where everything went completely wrong. This was (and still is) a known problem, and of course you want to avoid something like this. It's exactly why an in-depth SEO audit is so valuable. Indexing problems are real!
Filters
Many online stores use layered navigation (filters) to search for products. For example, you can filter by price, size or color. Some websites are set up so that each option in a filter gets a unique, indexable URL. Imagine you have 4 filters, with each filter having 10 choices (for example, think of a filter for Color: and then 10 different colors).
How many combinations are there with 4 filters of 10 choices each? 10 × 10 × 10 × 10 = 10,000.
And do you think Google is happy with 10,000 URLs of which 99.9% have exactly the same content?
Nope!
Below is an example of such a filtering problem from a gardening accessories store. The URL contains the plant type, sun exposure level, hardiness zone, and soil moisture level.
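One common way to keep filter combinations out of the crawl is to disallow the filter parameters in robots.txt. A sketch, assuming hypothetical query parameters like color, size, and price (your shop system's parameter names will differ):

```text
# robots.txt — block crawling of hypothetical filter parameters
User-agent: *
Disallow: /*?*color=
Disallow: /*?*size=
Disallow: /*?*price=
```

Note that robots.txt only blocks crawling, not indexing. For shops that build filters into the URL path instead of query parameters, a meta robots noindex tag or a canonical tag pointing to the unfiltered category page is usually the better fit.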
Authors, Tags, Archives
Are you using WordPress?
WordPress automatically creates URLs for all Tags, Authors and Archives.
Tags, author pages, and the like generally do not have unique content. Usually they are just an overview of all articles belonging to that tag or author, with no added value of their own. Such an overview might be fine for a user who visits the site often, but not for search engines!
Search engines see this kind of uninteresting page as low quality, and it pollutes their index. The more of these pages you have indexed, the more likely Google is to rate your whole site lower. If you want to find out how well they actually perform, simply check in Google Search Console how many impressions and clicks they get.
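In WordPress you would normally switch these archives off via an SEO plugin (Yoast SEO and Rank Math both have toggles for tag, author, and date archives). Under the hood, such a setting simply places a meta robots tag in the head of those pages, roughly like this:

```html
<!-- In the <head> of a tag, author, or date archive you don't want indexed -->
<meta name="robots" content="noindex, follow">
```

The noindex part keeps the page out of the index, while follow lets crawlers keep following its links to your actual articles.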
Indexing problems caused by Canonical Tags
Canonical tags tell search engines which URL they should index. Consider, for example, a Shopify web store (a perfect example of this problem). Every product gets its own unique URL, for example mystore.com/product/red-nike-shoe-with-laces.

But depending on which category you view the product from, the URL changes. With Shopify it then often becomes: mystore.com/collections/nike/red-nike-shoe-with-laces

So there are now 2 URLs where the same product can be viewed. This is where a canonical URL comes in, so that the search engine knows exactly which URL to index and show in the search results.
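In HTML, that looks like this. Using the example URLs above, both product pages would carry the same canonical tag pointing at the primary URL:

```html
<!-- Placed in the <head> of BOTH
     mystore.com/product/red-nike-shoe-with-laces and
     mystore.com/collections/nike/red-nike-shoe-with-laces -->
<link rel="canonical" href="https://mystore.com/product/red-nike-shoe-with-laces">
```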
Unfortunately, things sometimes go wrong with the use of canonical tags. For example, I encountered a glaring error during an SEO audit for a car dealership. To place a car listing on their own website, they had to upload all the data, including photos, to a web builder's server, which then loaded the listing onto the dealer's website via an iframe.

This system was specially developed so that it could easily be connected to other large car sales platforms as well. Super convenient, because a listing could appear on multiple platforms without having to be uploaded to each one individually… but it was terribly executed.

The problem was that every listing was given the canonical URL of the main category: mystore.com/occassions/. Because all cars shared the same canonical URL, Google did not index all of the listings, and Bing did not index any cars at all!

Fortunately, Google treats the canonical tag as a hint rather than a directive, so in some (not all) cases it ignored the tag and indexed the cars under the correct URL anyway; Bing refused. Almost six years on, this company still has the same problem: some car listings don't show up in Google, and nothing shows up in Bing.
What is the result of an indexing problem?
First, it can result in Google indexing all sorts of unnecessary, unimportant pages.
Second, a large number of low-quality pages drags down the search performance of the entire site. Just think of those 10,000 filter URLs from above: Google may start ignoring parts of the site, causing some products or categories not to show up at all.

And if a page is not in Google, it will never receive visitors from Google search.
Once you know which pages don't belong in the index, you can exclude them from indexing. Start with the filters, tags, authors, and so on. Then look at pages like the default Hello World post that comes with WordPress. Another tip is to combine the site: search operator in Google with words like copy or -2, because many CMSes add those automatically when a page is duplicated.
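A few example queries with the placeholder domain from earlier. The quotes stop Google from reading -2 as an exclusion operator, and the third query (assuming WordPress-style /tag/ URLs) finds indexed tag archives:

```text
site:mydomain.com "copy"
site:mydomain.com "-2"
site:mydomain.com inurl:tag
```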
Do you need help solving your indexing problems?
First find the culprit.
- your website must be well structured, so that all important pages can be found by a crawler.
- use a sitemap.xml file.
- create an HTML sitemap.
- use the URL Inspection tool in Google Search Console for URLs that are not indexed.
- create a (back)link to the URL that has problems getting indexed.
- use the Google Indexing API (see the sketch below).
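Here is a minimal sketch of an Indexing API call in Python, assuming you have a service account JSON key with the Indexing API enabled and the google-api-python-client and google-auth packages installed. Note that Google officially supports this API only for job posting and livestream pages; for other content, treat it as experimental.

```python
# Minimal sketch of a Google Indexing API call.
# Assumes a service account key file with the Indexing API enabled
# (hypothetical path below — replace with your own).
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/indexing"]
KEY_FILE = "service-account.json"

credentials = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=SCOPES
)
service = build("indexing", "v3", credentials=credentials)

# Ask Google to (re)crawl a URL that is stuck out of the index.
body = {"url": "https://mydomain.com/page-that-wont-index", "type": "URL_UPDATED"}
response = service.urlNotifications().publish(body=body).execute()
print(response)
```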
Frequently asked questions about indexation
What are indexing problems?
Google keeps a whole library (the index) of all the websites and pages it finds, so this index is filled with everything it comes across. As a website owner, you can specify which pages may and may not be indexed. Yet it regularly goes wrong: either too many pages are indexed, or pages fail to get indexed at all.
What are the most well-known indexing problems with WordPress?
Probably the most well-known at the moment: pages are found by Google, but not indexed. The biggest problem I encounter, though, is still that far too many unnecessary pages are indexed.
How do you find these problems?
You can find them manually in Google by performing a site: search. You can also find problems in the reports within Google Search Console. Finally, you can find them with a crawler such as Screaming Frog.