Another website is mirroring my site and ranks above it in search results

Solution 1

If they're just mirroring your site by feeding it through a proxy script or regurgitating your HTML verbatim, you can add canonical URLs to your pages. This tells Google that your content is the original source and that it should show your URL in the search results, not theirs.
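
As a rough illustration of the canonical URL idea, here is a minimal Python sketch that builds the tag for a page. The origin `https://www.example.com` and the function name `canonical_link_tag` are hypothetical, and dropping the query string and fragment is an assumption that may not suit every site.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: replace with the real scheme and host of the original site.
CANONICAL_ORIGIN = "https://www.example.com"


def canonical_link_tag(request_path: str) -> str:
    """Build a <link rel="canonical"> tag that always names the original host."""
    # Keep only the path so that URL variants collapse to one canonical URL.
    path = urlsplit(request_path).path or "/"
    scheme, netloc = urlsplit(CANONICAL_ORIGIN)[:2]
    url = urlunsplit((scheme, netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(canonical_link_tag("/articles/42?utm=x#top"))
```

The tag belongs in the `<head>` of every page. Because the host is hard-coded rather than taken from the incoming request, a page fetched through the mirror's hostname still names the original site, unless the proxy script rewrites the tag.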

Submit a DMCA request to Google. They're a little slow with them but they will ultimately remove those pages from their index.

Disavowing the links is a smart move.

I don't know that blocking the users is helpful, though. Putting a message at the top of your pages telling them that you are the original site and the other is a fraud might be a better solution.

Solution 2

You can file a DMCA complaint and, if you are in the U.S., you can file a civil copyright lawsuit.

Here is a link to a short answer that explains how the DMCA complaint can help anyone:

Do you have to be in the United States to file a DMCA complaint?

... and another one that explains more...

How much of your content needs to be copied before you can file a DMCA complaint?

If you are in the U.S., you can hire a lawyer who is familiar with copyright issues and have them send a cease and desist letter. Give the other party 10 days (calendar days, not business days, though business days would be fine too) to remove the content. Capture snapshots of the offending site as evidence, and snapshots of your own site as well. When you check whether a page has been removed, check the site directly and not the search results.

If the page has not been removed within that period, you can file a federal civil case that will cost the defendant at least $10,000 to defend and cost you hardly anything. You will be in the driver's seat. It is likely that a settlement can be had for at least $10,000 and possibly more. You can get your costs back too. What is important is to offer a no-cost option to rectify the problem, hence the cease and desist letter. After that, you are clear to file a case without responsibility to the defendant.

One other note is that you will need to demonstrate damages if you go to court. Loss of search traffic counts as damages. Here you will work with your attorney to collect metrics that illustrate the loss of traffic, and you will need to put a monetary value on that traffic. Of course you can assume higher numbers here, even a 100% conversion rate. Just in case, I would start collecting metrics on traffic loss today, using Google Analytics and your log file analysis software, and keep collecting them into the future.

Please know that filing a case is not difficult or very expensive, especially compared to the damages you are experiencing now and will experience in the future. Copyright infringements have been going down lately, but the few who do violate copyright are much bolder these days. We need to stop these people, and the only real way is to add a cost to their business strategy that makes copyright infringement unprofitable.

Solution 3

You could track their IP (or IPs) and return totally different content for them to mirror, whatever you like. This way you get free space to advertise anything you want, and you can use their high position in Google to your advantage.

I once used this simply to explain to the users of the mirrored website that they were on the wrong domain. You can also send a simple HTTP redirect header.
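
A minimal Python sketch of this approach, assuming you have already found the mirror's addresses in your logs. The networks below are documentation ranges, and `ORIGINAL_URL`, `is_mirror`, and `respond` are hypothetical names rather than part of any framework.

```python
import ipaddress
from html import escape

# Hypothetical values: use the networks seen in your own logs and your real URL.
MIRROR_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # documentation range, not a real scraper
    ipaddress.ip_network("198.51.100.7/32"),
]
ORIGINAL_URL = "https://www.example.com"
HTML_HEADERS = {"Content-Type": "text/html; charset=utf-8"}


def is_mirror(client_ip: str) -> bool:
    """Return True if the request comes from a known mirror address."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable address: treat as a normal visitor
    return any(addr in net for net in MIRROR_NETWORKS)


def respond(client_ip: str, path: str, normal_body: str, redirect: bool = False):
    """Return (status, headers, body); the mirror gets a notice or a redirect."""
    if not is_mirror(client_ip):
        return 200, dict(HTML_HEADERS), normal_body
    target = ORIGINAL_URL + path
    if redirect:
        # Only helps if the mirror relays the Location header to its visitors.
        return 302, {"Location": target}, ""
    link = escape(target, quote=True)
    notice = f'<p>You are on the wrong domain. The original is <a href="{link}">{link}</a>.</p>'
    return 200, dict(HTML_HEADERS), notice
```

In a real site the same check would sit in the web server or application middleware. It only works for as long as the mirror keeps fetching from the addresses you have identified.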

Solution 4

It's a little late for you, but the best idea for protecting your website in the future would be this: https://www.youtube.com/watch?v=I3pNLB3Cq24 (DEF CON 21, "Defense by Numbers"). Fake the return code so that users will still see the content but bots will

  • throw the content away
  • crawl in circles
  • stop working

Other possible ideas (make sure that your users don't see any of this):

  • let them save gigabytes of information (while there are only a few KB on your server)
  • make the bots flood their own memory with fake links
  • send fake content (100% boolsh*t - you need to write stuff like "Obama pregnant", "Spider-Man 5 - next summer", ... so your thieves can host it)
  • send fake files (like 42.zip; if they do not check the copied content, their users will have fun --> AV tools will show that something is wrong --> users will be p*ss*d)
  • let them wait for more data (announce a file size of 1-10 MB and send random cr*p at 1 byte/s or less)

Other ideas:

  • links protected by JavaScript (old, maybe of no use anymore, but if they remain unchanged, users will be sent to you, at least for a while)
  • dynamic garbage (use comments or invisible items to make the bots download stuff users can't see; good bots won't fall for this)
  • block IP addresses that download too much, too fast, or in the wrong way (bots do not behave like humans: 1) they follow every link on each page; 2) there is either a pattern or total chaos in the way they choose the next links)
  • use JavaScript to redirect to your server if the files are not hosted by your server (no help against theft, but the thieves have to remove it or their users won't stay on their page; you could code it into different routines, like content decryption)

Solution 5

This is called a Google Proxy Hack, and it happened to me as well.

First things first:

  • Submit a DMCA complaint to the Web Host. Use this link to create a correctly formed complaint, and send it to the host's support or abuse email. If the host is in the US, they must take down the site. Even if they are not based in the US, they may choose to take down the site anyway. (That happened to me once.)
  • Use Google's DMCA tool to request that the mirrored URLs be removed from its search results.
  • Use Google's Scraper Report to report the failure in Google's algorithm.

Fundamentally though, this is a failure on Google's part. For all that they say about ranking being based on "quality original content", this is an absurdly simple counter-example that, quite frankly, is just embarrassing.

Hopefully if enough people complain about it, eventually Google will get its act together and write the 10 lines of code it takes to check that a site is an exact mirror of a previously established site.

Also, be aware that using canonical URLs does not always work in this instance. Many of these proxy scripts change the canonical URLs to point to the mirror site, thus rendering them useless.

Finally, be aware that they may have also spammed your main site with garbage links in order to damage your rankings. (This happened to me as well.)

If you do some searching and creative thinking, there are some ways to fight back. I really don't think it's a good idea to publish a complete list here because that just makes the hackers' lives easier.

Author by

Marlboro Goodluck

Updated on September 18, 2022

Comments

  • Marlboro Goodluck
    Marlboro Goodluck over 1 year

    There is a site of ill repute known as thedirty which has completely mirrored my site and now has links appearing on Google at the #1 spot using my content. I checked my log files and noticed that this site has been crawling mine for some time, and also has 10,000 links from their site to mine.

    I have blocked user access which is referred from this site and reported them as web spam to Google already. I also disavowed the domain.

    How are they getting top links in Google (even overtaking mine) with such nefarious tactics? What are the steps to completely eliminate an issue such as this?

    UPDATE 8/28/2014:

    I thought I would provide an update on this as I have more information now. It turns out thedirty pointed their subdomains to my IP, which had the effect of making their subdomains look like my website.

    For a couple of days this didn't matter much because, using .htaccess, I redirected all requests whose Host was not my domain back to my domain, which basically meant I was getting the traffic from their subdomains' links on Google. After a couple of days thedirty changed their subdomains to point back to their website, so that I no longer benefit from this.

    So the whole point being they used my content to get top ranks on Google, and are now pointing those links back to their website to drive more traffic to theirs.

    It is a dirty tactic by a dirty website. My hope is that Google punishes such behavior.

    • John Conde
      John Conde over 9 years
      I edited out the part where you question their motive as that is off-topic here. But good question otherwise.
    • Bobby Tables
      Bobby Tables over 9 years
      Website cloning seems to be a new trick; a lot of websites are being misused at the moment. There is news at Heise (German) about this topic. The usual solution (aside from reporting the fake site) seems to be to feed special content to the IP addresses of the crawlers, so they will, for example, show a link to your real site.
    • Marlboro Goodluck
      Marlboro Goodluck over 9 years
      Another worry for the future -- now that Google has taken such a harsh stance against webspam -- is that competitors will purposely post my content on sites of bad reputation to hurt my reputation. Or will post spammy looking blogs pointing to my site without my knowledge.
    • user3055004
      user3055004 over 9 years
      @Jarrod Roberson: not really. Everybody knows about a lawsuit, but how many know of a technical solution?!
    • Admin
      Admin over 9 years
      @machineaddict - you miss my point, there is no technical solution to this, it is purely a legal problem.
    • thanby
      thanby over 9 years
      @JarrodRoberson Yet it is a problem faced exclusively by webmasters, and is therefore very on-topic for this site because asking it here will get responses from people who've also had to deal with it. It also seems people have submitted several technical solutions alongside the legal ones.
    • Admin
      Admin over 9 years
      It's done. They have the content, and there is no technical solution to change that! Only legal recourse now!
    • Gabe Spradlin
      Gabe Spradlin over 9 years
      @JarrodRoberson Where is thedaily located? If it is in a different country from the OP's, then good luck with a legal solution. From a tech standpoint, I use my .htaccess to prevent anything that is not a browser or Google from seeing my site. Obviously that doesn't help much now.
    • Admin
      Admin over 9 years
      @gabe - how is blocking anyone from accessing your site a solution if it is supposed to be accessed by everyone? That makes absolutely no sense and is completely illogical.
    • Gabe Spradlin
      Gabe Spradlin over 9 years
      @Jarrod - You misunderstand my suggestion. Most websites derive no benefit from most of the bots that visit the site. Those bots only tie up resources. It is pretty straightforward to block all (or at least most) bots while allowing Google, Bing, and Yahoo. You can also leave browser-based visitors unaffected. Browser-based scraping is possible but much more resource intensive for the scraper, at least in my experience.
    • Admin
      Admin over 9 years
      You can't block an anonymous bot that imitates a real user in a real browser. I know; I have written them, even proxying SSL transparently. Done correctly, they are completely transparent and virtually impossible to detect without sophisticated heuristics on the web server side. I know; I have written those also.
  • closetnoc
    closetnoc over 9 years
    I have my own code that I still need to tune some that blocks spiders. I will be looking into your ideas because that is the kind of guy I am! ;-) Great tips!
  • David Mulder
    David Mulder over 9 years
    The entire problem you're sidestepping is that discovering who is behind the site is virtually impossible. I mean, they would have to be idiots to make it easy to trace the site back to an actual individual.
  • closetnoc
    closetnoc over 9 years
    @DavidMulder No sidestepping. A lawyer can subpoena the companies for the information they need. Even a kind letter is enough. If the information is not provided, then the attorney can require a deposition in court before a judge, with a penalty of jail time if they do not show up or do not provide the information requested. In the U.S., there is no hiding from the law, civil or otherwise. This still works internationally, with some exceptions.
  • trlkly
    trlkly over 9 years
    The problem with blocking spiders and bots is that you probably don't want all of them blocked. Google's is pretty important, for example, if you want people to be able to find your website. (And since Google has your site in its cache, a web crawler doesn't actually need to crawl your site to duplicate it).
  • OJFord
    OJFord over 9 years
    If you're going to deliberately give them something alternate - I prefer Igor's answer of making it beneficial to you (redirect/say it's wrong/host ads) rather than fighting back.
  • Lakshmi Reddy
    Lakshmi Reddy over 9 years
    I actually find this pretty funny for some reason. +1
  • Marcks Thomas
    Marcks Thomas over 9 years
    @closetnoc: The company may not want to provide the information freely, or even be at liberty to do so. It is no certainty that a court will issue a subpoena or that the trail will remain in its jurisdiction. You may find yourself in a very costly and lengthy legal battle with a third party who might not even know the offender's real name. Judging by your thorough answer, you are undoubtedly aware of these obstacles, but I have to side with David Mulder: I think you're understating how difficult it is to trace the site back to an individual.
  • David Mulder
    David Mulder over 9 years
    @closetnoc: Yeah, except the hosting provider doesn't have the correct information. And the payment was probably made with a prepaid credit card or some other prepaid card, a stolen credit card, bitcoin, or some other untraceable transaction mechanism. Oh, and the hosting provider might not even be in the US in the first place. It's called the internet, like it or not.
  • closetnoc
    closetnoc over 9 years
    @DavidMulder I appreciate what you are saying. I am in the security business, especially in the area of research on how to find the bad guys. Most of what you are talking about would be Chinese, Russian, or Polish. Still, there are ways of determining who these people are through patterns and such. They give themselves away. This is specifically what I do. You have to try. You cannot just roll over. A good Internet lawyer knows people like me and how to get information. One thread is all I need and I usually get it. But it can be a real effort. But that is worst-case scenario stuff.
  • closetnoc
    closetnoc over 9 years
    @DavidMulder One thing you are missing is that most of the worst blatant abusers of copyright are not only traceable, but are companies and individuals that are easy to track down and bring to justice. Rarely are foreign sites stealing whole sites; they are more in the business of spamdexing foreign search engines with snippets of scraped content. That is what I have found.
  • Igor R
    Igor R over 9 years
    You can also use meta tags and JavaScript redirects; one of the three will almost surely work. In any case, this is not a stable solution and will work only until they find out and start working against it. @Mehrdad, I guess it's funny because it's hacky :)
  • Ángel
    Ángel over 9 years
    If there is a bot copying the content, he could simply include authorship information in the page, so the copied content would contain something like "This was created by Foo, all rights reserved", which makes a really clear case (you could, e.g., hide that in an acrostic, but since it is a bot, plain text will work and make your case stronger).
  • Igor R
    Igor R over 9 years
    It also occurred to me that it's funny because the attacker is actually opening a vulnerability on himself by letting the victim into his (the attacker's) playground, even if he can stop it at any time. That makes the attacker look pretty stupid.
  • Gabe Spradlin
    Gabe Spradlin over 9 years
    How does a site make money off the scraped content? Typically there is a product or ads on the page. And usually you can track the person through those ads and such. Or at least inform the ad network of their abuse.
  • closetnoc
    closetnoc over 9 years
    @gabe The scraped content is used to outperform the content owner's site and display ads. It is that simple. Sometimes it is a combination of several sites' content, selected for particular keywords and designed to perform extremely well. There is software for this. As for ad networks, the network cannot tell scraped content from original. Since the junk pages are designed for high-profit and high-volume keywords, this is a very profitable business, which is why people will do anything to get away with it.
  • Gabe Spradlin
    Gabe Spradlin over 9 years
    @closetnoc I only posed it as a question because I didn't know about this site in particular. I understand why they do it. There was discussion about not being able to determine the culprit. My point was that the ads can be used to track the culprit. If they are advertising with someone reputable, they may get dropped for scraping your content. It will hurt them in the pocketbook even if you can't get to them directly. Admittedly, some ad networks won't care so long as the revenue keeps coming in.
  • closetnoc
    closetnoc over 9 years
    @gabe Oh. I get it! Excellent point!! Seriously. I missed it entirely. I am not sure if filing a DMCA (with Google) and having an entry in Chilling Effects (a co-op of search engines, hosts, and other entities) would at least shut down the advertising by shutting down the site. I would assume AdSense should stop, but I am not sure if there is any history of tracking a culprit through ad revenue. It would make sense, of course. Bad guys always give themselves away somehow; they have to. It is a law of nature in profiting from cheating. Thanks for the note! I was a bit slow yesterday.
  • Gabe Spradlin
    Gabe Spradlin over 9 years
    @closetnoc I always wonder how effective legal recourse is really going to be. The majority of the world does not live in the US or care at all about DMCAs. The hosting company might, but only if they are based in the US and possibly a select few countries. And ultimately cutting off the money will get results a lot faster than you could probably get a court date.
  • closetnoc
    closetnoc over 9 years
    @gabe I do not know. In the U.S. it is very effective, and I suspect that is also true in Europe and other countries. That aside, think of it this way: I subpoena and file a lawsuit in the U.S. and require you to appear. You do not. I get a default judgement, present the judgement, and request a lien against property and income. What government is going to care enough to stand in the way? None. Getting the lien enforced could potentially be difficult; however, if you are a company, then it becomes easier to enforce. I place a lien on any property and income within any cooperating country.
  • closetnoc
    closetnoc over 9 years
    @gabe I know that copyright is enforceable internationally; however, it does take time. Why do I know this? Because my mother worked for one of the world's largest law firms, which represented corporations and governments. Copyright and international copyright with trademark infringement was a major part of her work.
  • Hirohito  Kuroumaru
    Hirohito Kuroumaru over 9 years
    Canonical URLs do not always help. The script that mirrored my site changed the canonical URLs to point to the fake site as well, so it was pointless.
  • Hirohito  Kuroumaru
    Hirohito Kuroumaru over 9 years
    Their high position in Google is replacing the high position of the original website, so it's not really "free advertising".