Want to throw 404 Errors when URL contains a certain string - Wordpress

Solution 1

This could be achieved using robots.txt, but since you're asking how to throw the 404 page manually, here it is:

<?php
if ( preg_match('/thisisnotwanted/i', $_SERVER['REQUEST_URI']) ) {
    // Send a real 404 status so crawlers drop the page from their index.
    header('HTTP/1.0 404 Not Found');
    // Render the theme's 404 template instead of the requested content.
    require TEMPLATEPATH . '/404.php';
    exit;
}
get_header();
?>

This bit of code is just an example of how you can display a 404 page, and it shouldn't be used in production; instead, use robots.txt as Michiel Pater suggested.
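The matching logic above can be exercised outside WordPress. This is a minimal sketch in plain PHP, assuming a hypothetical helper `should_404()` that mirrors the `preg_match()` check against the request URI:

```php
<?php
// Returns true when the request path contains the unwanted string,
// case-insensitively (the /i flag), mirroring the check in Solution 1.
function should_404(string $request_uri): bool
{
    return (bool) preg_match('/thisisnotwanted/i', $request_uri);
}

var_dump(should_404('/category/post?replytocom=123')); // bool(false)
var_dump(should_404('/thisisnotwanted/page'));         // bool(true)
var_dump(should_404('/ThisIsNotWanted'));              // bool(true), /i flag
```

In a real theme, this check would more idiomatically live in a `template_redirect` action callback that calls WordPress's `status_header(404)` before loading the 404 template, rather than being pasted into a template file.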

Solution 2

If you want to disallow Google from indexing pages, you should add a robots.txt file to the root folder of your website.

You could then put something like this in the file:

User-agent: *
Disallow: /thisisnotwanted

I assume you want to disallow the page for all search engines, but if you only want to disallow Google, then you should change the first line to User-agent: Googlebot (the user-agent token of Google's crawler).

You can explicitly tell Google to remove the links using Webmaster Tools. It could take a few days before Google accepts your request and removes the pages from its index.

For more information, please visit this website:
The Web Robots Pages
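Note that robots.txt only stops crawling; it does not quickly remove pages that are already indexed, because Google must recrawl a URL to see that it should be dropped. One alternative, sketched here as an untested .htaccess fragment assuming Apache with mod_setenvif and mod_headers enabled, is to serve an `X-Robots-Tag: noindex` header on the unwanted URLs:

```apache
# Hypothetical .htaccess fragment -- assumes mod_setenvif and mod_headers.
# Flag any request whose URI contains the unwanted string...
SetEnvIf Request_URI "thisisnotwanted" NOINDEX_PAGE
# ...and tell crawlers not to index those responses.
Header set X-Robots-Tag "noindex" env=NOINDEX_PAGE
```

A crawler can only see this header if the URL is not blocked in robots.txt, so the two approaches should not be combined on the same URLs.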

Author: Sameer

Updated on June 04, 2022

Comments

  • Sameer
    Sameer almost 2 years

I am managing a WordPress blog and want to throw a 404 error whenever the URL contains a string pattern (example: if the URL contains "thisisnotwanted"). I was thinking I would be able to add something to the .htaccess file like: Redirect "thisisnotwanted" 404

    Can someone help? I just don't want Google to index pages with this parameter.

  • Poelinca Dorin
    Poelinca Dorin about 13 years
get_bloginfo('url' / 'home') - Returns the 'Site address (URI)' set in Settings > General. This data is retrieved from the '????' record in the wp_options table. Consider using home_url() instead. And it's not getting the requested URL but the home page URL. Please be careful before you post an answer.
  • Michiel Pater
    Michiel Pater about 13 years
You are redirecting the user to a 404 error page. This is bad practice. It is better to send a 404 header like @poelinca does.
  • Sameer
    Sameer about 13 years
Thanks for the answers, folks. I already have this pattern in the robots.txt file. Somehow these pages are still showing up in the index. Since it's tough to manually feed them into Webmaster Tools (for removing the URLs, as they are not in one directory), I was thinking I could just throw a 404 and have them removed faster.
  • Sameer
    Sameer about 13 years
Thanks, folks, for your answers. I do have that entry in robots.txt, but somehow Google has not removed them from the index. I was wondering if throwing a 404 would help get these 2000+ pages out of the index that were crawled by Google (due to some plugin I used). Thanks again for your suggestions. If you have alternate suggestions, please do let me know.
  • Poelinca Dorin
    Poelinca Dorin about 13 years
You still didn't get my point: get_bloginfo('url'), home_url(), and get_bloginfo('home') all return the same link on every page. Say I request www.example.com/category/post_name; home_url() will return www.example.com, and for testing the unwanted string I need category/post_name instead.
  • Michiel Pater
    Michiel Pater about 13 years
    @Sameer: That is correct. Google does not instantly remove the pages from their index. Well, that explains a lot. You should add this information to your answer. Try using the code of @poelinca too.
  • Danial
    Danial about 13 years
@poelinca: I did get it. Just made a mistake, that's all.
  • Poelinca Dorin
    Poelinca Dorin about 13 years
Using this bit of code will show the 404 page even when a regular user requests the page; it doesn't test whether the page was requested by Google or not. Since nobody could access the page/post, it would be easier to just make the page/post not public anymore from wp-admin. If you need to display the page to the regular user without it being indexed, then your only option is to use robots.txt and wait till Google removes the pages/posts from their index; also make sure the posts/pages are not inside the XML sitemap you're uploading in Google Webmaster Tools.
  • huff
    huff about 13 years
@Michiel: I don't know, but I don't see why it wouldn't work. Unless WordPress uses thisisnotwanted in a self-generated link (and that would render that link useless).
  • Michiel Pater
    Michiel Pater about 13 years
I think WordPress uses the RewriteEngine for its own configuration too. If that is the case, then you will need to find a way to merge the two configurations. It is a good idea, though.
  • huff
    huff about 13 years
@Michiel: Oh! I don't see it in my installation (the WordPress section in the .htaccess is empty) - I guess because I have 'pretty permalinks' disabled. Well, I would try adding the rule after # END WordPress.
  • Sameer
    Sameer about 13 years
What had happened was I used a comment plugin which has "replytocom" passed as a parameter when someone wanted to reply to anyone's comments. This link was also picked up by Google, and all the pages on the site that had comments got replicated as many times as there were comments on the page. E.g., if my page was mysite.com/category/file1.html and it had 20 comments, then in addition to this post I had 20 more files like mysite.com/category/file1.html?replytocom=1234, mysite.com/category/file1.html?replytocom=12345, etc., where "12345" etc. were the comment IDs.
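The .htaccess approach from the original question, with the placement after # END WordPress that the comments suggest, can be sketched as follows. This is an untested fragment assuming Apache with mod_alias enabled:

```apache
# BEGIN WordPress
# ... WordPress's own rewrite rules, left untouched ...
# END WordPress

# Hypothetical rule added after the WordPress block, as discussed in the
# comments: answer any URL containing the string with a bare 404 status.
RedirectMatch 404 thisisnotwanted
```

Unlike the PHP check in Solution 1, this serves Apache's default 404 response rather than the theme's 404 template.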