Google indexed my escaped_fragment pages
As stated in the Google specification, you can use a 302 redirect (but not a 301) to serve the content when Google requests a URL with ?_escaped_fragment_=.
One problem I can see in your implementation is that your HTML files may contain links that are relative to the redirected page or that point directly to other HTML files. For example, if an href in one of your HTML files points to something like http://yourdomain.com/htmlFiles/1.html
(or to 1.html as a relative link), that page will be indexed by Google.
In short: starting from your sitemap, Google should index your pages correctly, but when Google follows the links in your HTML files, it probably indexes the targets directly without relating them to the original #! URLs.
Here are several ways to fix the problem:
Don't use a redirect. In your case it seems unnecessary: you can send the HTML files directly to Google when it requests them. Just remove the R flag from your Apache rewrite rule. This also avoids an extra round trip between Google and your server for each redirect, saving you (and Google) bandwidth and CPU cycles.
Verify that the href attributes in your HTML files contain the correct links. If they all point to the #! versions, it should work.
Add a canonical link to each of your HTML files pointing to the corresponding #! URL (http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394). This could help Google determine the correct URL to index.
Note: Each of these solutions should work on its own, but you can also combine some or all of them.
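As a rough sketch of the first option, the rule from the question could be kept as it is with only the R=302 flag dropped, so that Apache rewrites the request internally instead of sending a redirect. This assumes the rules live in a .htaccess file at the document root and that the snapshots are stored as htmlFiles/<id>.html, as described in the question; adjust it to your own layout.

```apache
RewriteEngine on

# Match crawler requests such as /?_escaped_fragment_=chair/12
# and capture the id ("12") as %1.
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=chair\/([\w]*)

# Internal rewrite: no R flag, so no 302 is sent and the crawler
# never sees the htmlFiles/ URL. The trailing "?" drops the query string.
RewriteRule ^(.*)$ htmlFiles/%1.html? [L]
```

With this in place, a request for http://domain.com/?_escaped_fragment_=chair/12 should return the content of htmlFiles/12.html while the requested URL stays the same.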
DMin
Updated on September 18, 2022
Comments
-
DMin almost 2 years
My site is a single page web-app. I am following the suggestions based on making AJAX applications crawl-able.
My URL looks like this:
http://domain.com/#!pages/contactUs
My understanding is:
http://domain.com/#!chair/12
goes to http://domain.com/?_escaped_fragment_=chair/12
As I am not using any server-side scripting on this project, I have created HTML pages with the application states and put them in a folder like so:
http://domain.com/htmlFiles/1.html
In Apache I have forwarded requests that include
_escaped_fragment_=
to the right HTML page:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=chair\/([\w]*)
RewriteRule ^(.*)$ htmlFiles/%1.html? [R=302,L]
The forwarding works correctly and the appropriate page shows up if the
_escaped_fragment_
URL is used.
The sitemap I submitted to Google looks like this:
<url>
<loc>http://domain.com/#!pages/contactUs</loc>
<lastmod>2012-12-30</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
</url>
The problem now is this: my whole
htmlFiles
folder (http://domain.com/htmlFiles/1.html
) with the HTML files is indexed in Google. These pages are there in the first place just to show Google what content my actual pages contain.
My entire website works from
http://domain.com/
These pages should not be coming up in the search results, since Google has said it will only index the pretty URLs. Still, I am reluctant to have them remove these pages, as I don't know whether that would hamper something else.
Could it be that 302 is not the right redirect and 301 should be used instead?
Also, is there something wrong with this redirect approach in the first place?
-
DMin over 11 years
Thanks. From what you've mentioned, I think the R flag is causing the issue, because the pages don't contain any .html links. Nevertheless, I am doing a total rewrite so that everything goes through an index.php that serves the content dynamically based on the escaped_fragment. Learning the hard way: SEO needs to be built into the web app while making it, rather than slapped onto the site afterwards. Cheers! :)