Google indexing WordPress wp-content directories and all internal files


Solution 1

First of all, you should block the indexation through:

Secondly, start removing the URLs that have already been indexed from Google's index using Google's URL removal tool.

Sadly, that tool only lets you remove URLs one by one. You can also return a 410 error for those URLs (the second option I mentioned before), but keep in mind that this is an advanced server configuration, because you only want to prevent the URLs from being indexed while you still need to access them as admin.
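
As a rough illustration of the 410 approach, here is a minimal sketch. It assumes mod_rewrite is enabled and that the rules go in the .htaccess file at the site root, above the standard WordPress rewrite block; neither assumption is confirmed in the question. It answers 410 Gone only for requests that resolve to an existing directory under wp-content, so the files themselves (images, CSS, JS) stay reachable.

```apache
# Assumption: this is the .htaccess in the site root and mod_rewrite is loaded.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Only act when the requested path is an existing directory...
    RewriteCond %{REQUEST_FILENAME} -d
    # ...at or below wp-content. "-" means no substitution; [G] sends 410 Gone.
    RewriteRule ^wp-content(/|$) - [G]
</IfModule>
```

Note that this sketch does not distinguish an admin from any other visitor: everyone gets the 410 for those directory URLs. If you need to browse them yourself, you would have to add a condition of your own (for example on your IP address), which is where the configuration gets more advanced.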

Solution 2

The problem is that your server is creating an automatic page for all directories that lists all the files in that directory.

In Apache, this feature is provided by mod_autoindex. If there is no index.html file in the directory, this Apache module automatically creates a page that shows a directory listing.

One way to stop this from happening is to create a blank index.html file in each and every directory. Google won't index blank pages; it treats them as "soft 404" pages.

Another way is to disable directory listings. This can be done either in your Apache .conf files or in your .htaccess file, as long as AllowOverride permits the Options directive there.

Options -Indexes

For more examples see the "Directory Listings" section of https://wiki.apache.org/httpd/DirectoryListings
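
To make the .conf placement concrete, here is a minimal sketch for the main server configuration. The path /var/www/html is only an assumed document root, so adjust it to your own setup. It shows the listing being disabled directly, plus the AllowOverride setting that would let an .htaccess file do the same.

```apache
# Main server config (httpd.conf or a virtual host file).
# Assumption: /var/www/html is the site's document root.
<Directory "/var/www/html">
    # Turn off mod_autoindex listings here and in all subdirectories.
    Options -Indexes

    # Let .htaccess files change the Indexes option (so "Options -Indexes"
    # in .htaccess is honoured) and use FileInfo directives, which
    # mod_rewrite rules in .htaccess require.
    AllowOverride FileInfo Options=Indexes
</Directory>
```

With listings disabled, a request for a directory that has no index file gets a 403 Forbidden response instead of a generated listing. Changes to the main config only take effect after Apache is reloaded, whereas .htaccess changes apply on the next request.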

Author by

Kittu

Updated on September 18, 2022

Comments

  • Kittu
    Kittu over 1 year

    Recently I noticed that the total number of crawled pages has increased. So I searched on Google using site: and found several pages under the wp-content directory. I also checked Google Search Console, and there was a huge spike in the total number of crawled pages on 17th June 2018:

    enter image description here

    On the same date, website traffic dropped. If I am not wrong, Google crawled several unwanted pages under the wp-content directory, and that might be the main issue. Please see this screenshot (traffic started going down from 16th June 2018): enter image description here

    Here is a screenshot of the indexed wp-content directory:

    enter image description here

    Kindly advise how to stop Google from crawling the "wp-content" directory, so that duplicate and blank pages won't get crawled, without affecting the crawling of the original pages.

  • Kittu
    Kittu almost 6 years
    If I block the wp-content directory using robots.txt, then Google won't be able to crawl images and will start showing several 404 error messages. Same thing if I block it using the .htaccess file.
  • Emirodgar
    Emirodgar almost 6 years
    Robots.txt is not intended to remove URLs from the index, only to prevent crawling of URLs/folders/parameters/etc. If they have already been indexed, you need to remove them with the tool I mentioned before.
  • Simon Hayter
    Simon Hayter almost 6 years
    My advice is not to use either robots.txt or an .htaccess deny. Just edit the .htaccess and add Options -Indexes to the top of the file; done, sorted, and it'll help secure other folders that you may not be aware of being indexed.
  • Kittu
    Kittu almost 6 years
    Did exactly the same thing, Options -Indexes, thanks Simon!!
  • Kittu
    Kittu almost 6 years
    Not sure about the index.html approach; I tried this method long back but it had a bad impact. From searchenginejournal.com, "Potentially Misidentified as Soft 404": There are also cases wherein the page isn’t actually missing, but certain characteristics have triggered Google to categorize it as a missing page. Some of these characteristics include a small amount or lack of content and having too many similar pages on the site. These characteristics are also similar to the factors that the Panda algorithm tackles. The Panda update considers thin and duplicate content as negative ranking factors.