Remove subdomains from Google index and stop indexing them

Solution 1

You can remove the sub-domains in Webmaster Tools, but first you need to add each sub-domain as a separate site and then submit a site removal. They should be gone within a day or so.

See these instructions for removing a site from Google: https://support.google.com/webmasters/answer/1663427?hl=en

Solution 2

Short answer.

Put <meta name="robots" content="noindex"> in the <head> of your HTML for all pages. Once the search engines have spidered these pages and you are sure they have dropped out of the index, put

User-agent: *
Disallow: /

...in a robots.txt file in the root directory of each sub-domain.
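A quick way to sanity-check those two rules before deploying them is Python's standard `urllib.robotparser`. This is only a minimal sketch: the helper name `blocks_everything` and the `cdn1.example.com` URLs are made up for illustration, and the standard-library parser is not guaranteed to interpret every rule exactly the way Google's crawler does.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def blocks_everything(robots_txt, sample_urls, agent="Googlebot"):
    """Return True if none of the sample URLs may be fetched by `agent`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(agent, url) for url in sample_urls)


# Hypothetical URLs on one of the sub-domains.
urls = [
    "http://cdn1.example.com/",
    "http://cdn1.example.com/some/page.html",
]

if __name__ == "__main__":
    print(blocks_everything(ROBOTS_TXT, urls))
```

An empty robots.txt allows everything, so the same helper returns False for `""`, which makes it easy to spot a sub-domain where the file never got deployed.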

This will take time, of course. It typically takes 30-60 days for, say, Google to notice the changes and reflect them in the SERPs. It can take more or less time depending on how Google gauges freshness for your sub-domains.
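While waiting, it can help to confirm that each page really carries the noindex tag. The following is a minimal sketch, assuming you have already fetched the page's HTML yourself; `RobotsMetaParser` and `has_noindex` are hypothetical names, not part of any tool mentioned here. It only checks that the tag is present in the markup, not that a search engine has crawled the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # content may hold several comma-separated directives, e.g. "noindex, nofollow".
        self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
    print(has_noindex(page))
```

Run it over a sample of URLs from each sub-domain; any page where it returns False would still be eligible for indexing.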

Author by

CamSpy

Updated on September 18, 2022

Comments

  • CamSpy
    CamSpy over 1 year

    I am serving static content through a CDN, using the subdomains cdn1-cdn5 for that.

    I load only images, CSS and JS files this way, but apparently Google has indexed some pages on the subdomains. They now appear in the Google index as duplicates of my "normal" pages.

    The CDN is set up so that files appear on the subdomains without any extra uploading, meaning the subdomains are mirror copies of the content on the main site. I can't upload files to the subdomains. I can only upload to the main site, and changing www to cdn1 in the address bar shows the same content through the CDN.

    I have 2 questions:

    1. How do I remove the subdomains from the Google index in GWT if it only lets me enter what comes after http://domain.com/ ?
    2. How do I prevent bots from indexing the pages on the subdomains when I can't upload special robots.txt files or Google's verification files to them to prove my ownership in GWT?

    Maybe there is something else that I need to know related to this matter?

    UPDATE: text in bold is updated

    • PatomaS
      PatomaS about 10 years
      Are you only interested in robots.txt solutions? If so, the question will remain closed as a duplicate, but if other solutions work for you, rephrase the question and flag it for moderator attention.
    • Admin
      Admin almost 8 years
      When you remove/block the domain //cdn.xxxx.xx in WMT you also remove all the indexed pictures. Is that what you want?
  • CamSpy
    CamSpy about 10 years
    @PatomaS I want the pages removed from the index and not indexed in the future, without using canonicalization.
  • CamSpy
    CamSpy about 10 years
    I updated the question with more detailed info: the content on the subdomains is a mirror of the main site for CDN purposes, and I cannot edit pages on the subdomains or upload anything to them.
  • CamSpy
    CamSpy about 10 years
    How do I prove my ownership of the subdomains in GWT if I cannot upload files or edit the pages there? (Please see my updated question.)
  • Pensai
    Pensai about 10 years
    If you have access to the domain's DNS records, then you can verify by DNS.
  • CamSpy
    CamSpy about 10 years
    What do I add after Disallow: in the robots.txt of the xxx.com domain to keep cdn1.xxx.com from being indexed?
  • closetnoc
    closetnoc about 10 years
    Actually, you would put the robots.txt in the sub-domain root directories. You cannot refer to sub-domains from the parent domain's robots.txt file because search engines (and the web by default) see these as separate sites.
  • closetnoc
    closetnoc about 10 years
    Wow. You did say that before didn't you? You got me. I am stumped. You may have to deal with Google and Bing through their respective Webmaster Tools. One question: If someone accesses a file on the sub-domain, does it appear in the parent domain access log files? If so, perhaps a blanket 301 redirect from the sub-domains to the parent domain in the .htaccess on the parent domain would do.
  • PatomaS
    PatomaS about 10 years
    @CamSpy That's what the 301 is for.
  • Matija Nalis
    Matija Nalis about 9 years
    Use just the <meta name="robots" content="noindex"> tag, but DO NOT follow up with robots.txt (see support.google.com/webmasters/answer/93710). noindex is quite enough, and robots.txt would stop crawlers from seeing it, so the pages could still be indexed and shown.
  • closetnoc
    closetnoc about 9 years
    @MatijaNalis I appreciate your comment. I reread my answer, and while I tried to say "once spidered", it is not entirely clear what I meant: only once the pages have been removed from the index, THEN do the robots.txt thing. So your comment is very much appreciated. Thanks!!
  • buycanna.io
    buycanna.io almost 5 years
    It is advised to not use 301 redirects, but only 401 or 410 http status codes. I've experienced this recently on my own platform. My traffic is soaring! I believe it is because Google thinks you are "cloaking" pages, blackhat type SEO behavior is heavily penalized.
  • user3035649
    user3035649 about 2 years
    Note that it doesn't tell you this, but you must include either https:// or http:// before your domain name, otherwise it will give you a (seemingly) unrelated error that the domain doesn't match the property, or something along those lines. And if you're submitting a removal request for a subdomain (like dev.example.com), then I think this can only be done by first making a separate property for that subdomain. It can't be done on the property for your top-level domain name, example.com.