Google indexed my escaped_fragment pages

As stated in the Google specification, you can use a 302 redirect (but not a 301) to serve the content when Google requests a URL with ?_escaped_fragment_=. The problem I see with your implementation is that your HTML files may contain links that are relative to the redirected page or that point directly to other HTML files. For example, if an href in one of your HTML files points to something like http://yourdomain.com/htmlFiles/1.html (or to 1.html as a relative link), Google would index that page.

In short, starting from your sitemap, Google should index your pages correctly. But when Google follows the links in your HTML files, it probably indexes the targets directly, without relating them to the original #! URLs.

Here are several ways to fix the problem:

  1. Don't use a redirect. In your case it seems unnecessary: you can send the HTML files directly to Google when it requests them. Just remove the R flag from your Apache rewrite rule. This also avoids an extra round trip between Google and your server for each redirect, which saves you (and Google) bandwidth and CPU cycles.

  2. Verify that the href attributes in your HTML files contain the correct links. If they all point to the #! versions of your pages, it should work.

  3. Add a canonical link in each of your HTML files that points to the corresponding #! URL (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394). This could help Google determine the correct URL to index.

Note: Each of these solutions should work on its own, but you can also combine some or all of them.
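
As a rough sketch of option 1, the rewrite rule from the question could be changed as follows. Only the flags differ from the original rule (R=302 is dropped); the chair/ pattern and the htmlFiles/ folder are taken from the question, and I have not tested this on your server.

```apache
RewriteEngine on
# Match crawler requests such as /?_escaped_fragment_=chair/12
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=chair\/([\w]*)
# No R flag: Apache rewrites internally and serves the snapshot under the
# requested URL, so the crawler is never sent to the htmlFiles/ path.
# The trailing "?" drops the query string from the rewritten request.
RewriteRule ^(.*)$ htmlFiles/%1.html? [L]
```

Because the snapshot is now served under the requested URL, relative links, images, and stylesheets inside the HTML files resolve against that URL rather than against htmlFiles/, so they may need to become absolute.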

Author: DMin

Updated on September 18, 2022

Comments

  • DMin
    DMin almost 2 years

    My site is a single page web-app. I am following the suggestions based on making AJAX applications crawl-able.

    My URL looks like this:

    http://domain.com/#!pages/contactUs
    

    My understanding is:

    http://domain.com/#!chair/12 maps to http://domain.com/?_escaped_fragment_=chair/12

    As I am not using any server-side scripting on this project, I have created HTML pages with the application states and put them in a folder like so:

    http://domain.com/htmlFiles/1.html
    

    In Apache I redirect requests that include _escaped_fragment_= to the right HTML page:

    RewriteEngine on
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=chair\/([\w]*)
    RewriteRule ^(.*)$ htmlFiles/%1.html? [R=302,L]
    

    The redirect works correctly, and the appropriate page shows up when the _escaped_fragment_ URL is used.

    The sitemap I submitted to Google looks like this:

    <url>
    <loc>http://domain.com/#!pages/contactUs</loc>
    <lastmod>2012-12-30</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
    </url>
    

    The problem now is this: my whole htmlFiles folder (http://domain.com/htmlFiles/1.html) with the HTML files is indexed in Google. These pages are there in the first place just to show Google what content my actual pages contain.

    My entire website works from

    http://domain.com/
    

    These pages should not be coming up in the search results, since Google has said it will only index the pretty URLs. Still, I am reluctant to have them removed, as I don't know whether that would hamper something else.

    Could it be that 302 is not the right redirect and 301 should be used instead?

    Also, is there something wrong with this redirect approach in the first place?

  • DMin
    DMin over 11 years
    Thanks. From what you've mentioned, I think the R flag is causing the issue, because the pages don't contain any .html links. Nevertheless, I am doing a total rewrite so that everything goes through an index.php that serves the content dynamically based on the _escaped_fragment_ value. Learning the hard way: SEO needs to be built into the web-app while making it, rather than slapped onto the site afterwards. Cheers! :)