Facebook not able to scrape my url

Solution 1

The Facebook documentation includes details on the Open Graph Protocol and how to include the correct meta tags so that Facebook can scrape your URL accurately.

https://developers.facebook.com/docs/opengraphprotocol/

Essentially, what you'll want to do is include some special og: tags instead of (or in addition to) your existing meta tags.

  <head>
    <title>Ninja Site</title>
    <meta property="og:title" content="The Ninja"/>
    <meta property="og:type" content="movie"/>
    <meta property="og:url" content="http://www.nin.ja"/>
    <meta property="og:image" content="http://nin.ja/ninja.jpg"/>
    <meta property="og:site_name" content="Ninja"/>
    <meta property="fb:admins" content="USER_ID"/>
    <meta property="og:description"
          content="Superhuman or supernatural powers were often
                   associated with the ninja. Some legends include
                   flight, invisibility and shapeshifting..."/>
    ...
  </head>
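
As a quick local check before going back to the Facebook debugger, the sketch below parses a page's HTML and reports which Open Graph properties are missing. It is only an illustration in Python: the `REQUIRED` list is taken from the "Missing Required Property" errors quoted later in this thread, and the names `OGTagParser` and `missing_og_tags` are made up for this example.

```python
from html.parser import HTMLParser

# Properties that the debugger output quoted in this thread reports as required.
REQUIRED = ("og:url", "og:type", "og:title")


class OGTagParser(HTMLParser):
    """Collects <meta property="og:..."> and <meta property="fb:..."> tags."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith(("og:", "fb:")):
            self.tags[prop] = attributes.get("content") or ""


def missing_og_tags(html):
    """Return the required Open Graph properties that the HTML lacks."""
    parser = OGTagParser()
    parser.feed(html)
    return [name for name in REQUIRED if name not in parser.tags]
```

Calling `missing_og_tags(page_source)` on the markup above should return an empty list, while a page without `og:url` would get `["og:url"]` back.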

If you have an .htaccess file redirecting things and making it difficult for Facebook to scrape your URL, you might be able to get away with detecting Facebook's crawler in your .htaccess and feeding it the correct tags. I believe the user agent that the Facebook crawler provides is this:

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

The documentation also has a section talking about making sure that their crawlers can access your site.
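
One way to approximate what the crawler sees is to request the page yourself with that User-Agent and look at the status code. The Python sketch below uses only the standard library; the function names are hypothetical, and a real crawler may also differ in source IP and other headers, so treat the result as a hint rather than proof.

```python
import urllib.error
import urllib.request

FACEBOOK_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


def build_crawler_request(url):
    """Build a GET request that carries the Facebook crawler's User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": FACEBOOK_UA})


def crawler_status(url, timeout=10):
    """Return the HTTP status code the server sends for that User-Agent."""
    try:
        with urllib.request.urlopen(build_crawler_request(url), timeout=timeout) as resp:
            # Redirects are followed, so this is the status of the final response.
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code
```

If `crawler_status("http://your-site.example/some/page")` returns 404 or 500 while a normal browser gets 200, the rewrite rules are a likely suspect.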

Depending on your configuration, you can test this by looking at your server's access log. On a UNIX system running Apache, the access log is typically located at /var/log/httpd/access_log (or /var/log/apache2/access.log on Debian-based systems).
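
To pick the crawler's requests out of a busy log, something like the following Python sketch could help. It assumes the server writes Apache's "combined" log format, where the user agent is the last quoted field; the regular expression and the function name `facebook_hits` are illustrative and would need adjusting for a custom LogFormat.

```python
import re

# Request line, status code, size, referer and user agent as they appear at
# the end of a line in the "combined" log format (an assumption).
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def facebook_hits(lines):
    """Yield (request, status) for each line logged for the Facebook crawler."""
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and match.group("agent").startswith("facebookexternalhit"):
            yield match.group("request"), int(match.group("status"))
```

Passing an open log file to `facebook_hits` and printing the pairs shows which URLs the crawler asked for and which status codes it received.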

So you could use an entry similar to this in your .htaccess file:

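# Leave requests for real files (ogtags.php itself, images, CSS) untouched
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit
RewriteRule ^(.*)$ ogtags.php?$1 [L,QSA]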

The [L,QSA] flags state that this is the last rule applied to the current request (L) and that any query string on the original URL is passed along when the URL is rewritten (QSA, Query String Append). For example, a URL such as:

https://example.com/?id=foo&action=bar

will be passed to ogtags.php like this: ogtags.php?id=foo&action=bar. Your ogtags.php file will have to generate dynamic og: meta tags according to the parameters that were passed.
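
The idea of building the tags from the incoming parameters can be sketched as follows. This is a Python stand-in for illustration, not the PHP script itself: the `id` parameter comes from the example URL above, while the title format, the `website` type and the `base_url` default are placeholders for whatever the real page lookup would return.

```python
from html import escape
from urllib.parse import parse_qs


def render_og_tags(query_string, base_url="https://example.com/"):
    """Return og: meta tags for the page identified by the query string."""
    params = parse_qs(query_string)
    page_id = params.get("id", [""])[0]
    # Hypothetical content: a real script would look up the title, image and
    # description for page_id instead of deriving them from the id.
    tags = {
        "og:title": f"Page {page_id}",
        "og:type": "website",
        "og:url": f"{base_url}?{query_string}",
    }
    # escape() keeps parameter values from breaking out of the attribute.
    return "\n".join(
        f'<meta property="{escape(name)}" content="{escape(value)}"/>'
        for name, value in tags.items()
    )
```

Escaping matters here because the values come straight from the request; the `&` in the example URL, for instance, has to be written as `&amp;` inside the `content` attribute.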

Now, whenever your .htaccess file detects the Facebook user agent, it will serve the crawler the ogtags.php file (which can contain the correct og: meta information). Be aware of any other rules you have in your .htaccess and how they might interact with the new rule.

From the .htaccess entries that you have detailed, I would recommend placing this new "Facebook rule" as the very first rule.

Solution 2

I had the same problem, namely "Bad Response Code: URL returned a bad HTTP response code."

Oddly, this is what solved it. I added

    <meta property="og:locale" content="en_US" />

to my site's HEAD tag and it worked.

Also, don't forget that in your application dashboard (where you get your APP ID) you must have at least "Website with Facebook Login" enabled and enter the URL of the website. Otherwise it won't work, even if you are not using any Facebook Logins on your site.

Author by

Ninja

Software developer and entrepreneur.

Updated on August 04, 2022

Comments

  • Ninja
    Ninja almost 2 years

    The HTML structure of my page is given below. I have added all the og meta tags, but Facebook is still not able to scrape any info from my site.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"  xmlns:fb="http://www.facebook.com/2008/fbml">
        <head>
                <meta http-equiv="Content-Type" content="text/html;" charset=utf-8"></meta>
                <title>My Site</title>
                <meta content="This is my title" property="og:title">
                <meta content="This is my description" property="og:description">
                <meta content="http://ia.media-imdb.com/images/rock.jpg" property="og:image">
                <meta content="<MYPAGEID>" property="fb:page_id">
                .......
        </head>
        <body>
        .....
    

    When I enter the URL in the Facebook debugger (https://developers.facebook.com/tools/debug), I get the following messages:

    Scrape Information
    Response Code   404
    
    Critical Errors That Must Be Fixed
    Bad Response Code   URL returned a bad HTTP response code.
    
    
    Errors that must be fixed
    
    Missing Required Property   The 'og:url' property is required, but not present.
    Missing Required Property   The 'og:type' property is required, but not present.
    Missing Required Property   The 'og:title' property is required, but not present.
    
    
    Open Graph Warnings That Should Be Fixed
    Inferred Property   The 'og:url' property should be explicitly provided, even if a    value can be inferred from other tags.
    Inferred Property   The 'og:title' property should be explicitly provided, even if a value can be inferred from other tags.
    

    Why is Facebook not reading the meta tag info? The page is publicly accessible and not hidden behind a login.

    UPDATE

    OK, I did a bit of debugging and this is what I found. I am using the PHP CodeIgniter framework and have an .htaccess rule in my directory that removes index.php from the URL.

    When I feed the URL to the Facebook debugger (https://developers.facebook.com/tools/debug) without index.php, Facebook shows a 404, but when I feed it the URL with index.php, it is able to parse my page.

    Now, how do I make Facebook scrape the content when the URL doesn't contain index.php?

    This is my htaccess rule:

    <IfModule mod_rewrite.c>
        RewriteEngine On
        RewriteBase /
    
        #Removes access to the system folder by users.
        #Additionally this will allow you to create a System.php controller,
        #previously this would not have been possible.
        #'system' can be replaced if you have renamed your system folder.
        RewriteCond %{REQUEST_URI} ^system.*
        RewriteRule ^(.*)$ /index.php?/$1 [L]
    
        #When your application folder isn't in the system folder
        #This snippet prevents user access to the application folder
        #Submitted by: Fabdrol
        #Rename 'application' to your applications folder name.
        RewriteCond %{REQUEST_URI} ^application.*
        RewriteRule ^(.*)$ /index.php?/$1 [L]
    
        #Checks to see if the user is attempting to access a valid file,
        #such as an image or css document, if this isn't true it sends the
        #request to index.php
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule ^(.*)$ index.php?/$1 [L]
    </IfModule>
    
    <IfModule !mod_rewrite.c>
        # If we don't have mod_rewrite installed, all 404's
        # can be sent to index.php, and everything works as normal.
        # Submitted by: ElliotHaughin
    
        ErrorDocument 404 /index.php
    </IfModule>
    
  • Ninja
    Ninja about 12 years
    Hi Lix, thanks a lot for the update. I have an issue though: in the rewrite rule, you mentioned that I should load ogtags.html, but the meta tags will have dynamic content based on the page that is being requested. I can't serve a static HTML page there. I tried replacing ogtags.html with this rule: RewriteRule ^(.*)$ index.php?/$1 [L] but it didn't help. Any thoughts on how to achieve this?
  • sergio
    sergio almost 11 years
    @Lix: do you have any idea why I get a 500 error from the facebook debugger tool when I use your two rules? thanks in advance...
  • Lix
    Lix almost 11 years
    Hey there @ser - Have you checked your server logs for requests from Facebook that are being denied? I just added this link to my answer here, it might be useful to you too.
  • sergio
    sergio almost 11 years
    @Lix: thank you very much for your reply! strange thing is: facebook debug tool can access mysite.dom/ogtags.php but for mysite.dom it returns 500... from the server logs I get 206 for mysite.dom/ogtags.php and 500 for all URIs within ogtags.php (e.g., og:image)... I see now that there could be an infinite recursion going on...
  • DS9
    DS9 almost 10 years
    @Lix : I have the same problem.Here
  • Lix
    Lix almost 10 years
    @DS9 - in the future - please don't attempt to contact users in this manner. You are getting views on your post (and even an "answer"). You left this comment only 10mins after you answered it... Try exercise a little patience next time.
  • DS9
    DS9 almost 10 years
    OK, no problem :). Thanks for the advice. But I have actually been facing this problem for hours. I have been trying to find a solution and failed, which is why I tried to contact you. And in your profile you write: Feel free to leave me a comment! :)
  • Lix
    Lix almost 10 years
    @DS9 - yea... leave me a comment - but that's not what you did. You wanted to redirect my attention to your new question that was not relevant to that post at all... Comments are not meant to be used for instant messaging.