Does Google penalize daily updated <lastmod> tags in sitemaps if the data is not daily updated?


Solution 1

I've never heard anything about a penalty due to this. At worst you're wasting the spider's time, but that's part of why we have computers in the first place: doing tedious repetitive things. Still, you should ideally be addressing the issue.

This...

My solution would be to change the entry only if the newly imported product data differs from the previous data.

...is what you should be doing in the first place, regardless of external considerations like sitemaps. If your content isn't different (and I would include deleting it and replacing it with identical information in that description), then your lastmod date shouldn't change. Here you're wasting your own resources. You haven't said how many products are involved, but at some point this process is going to get slow and computationally expensive.

Solution 2

I've never liked the idea of updating <lastmod> every day, as it's not just wrong, it's misleading to search engines.

In a post over on SO, Google's Gary Illyes wrote:

The lastmod tag is optional in sitemaps and in most of the cases it's ignored by search engines, because webmasters are doing a horrible job keeping it accurate.

I've generally advocated for either using <lastmod> correctly or not at all. Leaving it off (along with <changefreq> and <priority>) also makes the file smaller and quicker for search engines to read.
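For reference, the two options look like this in a sitemap (the URLs and date here are made up). The first entry carries a <lastmod> set only when the page content genuinely changed; the second omits the optional tags entirely:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- lastmod reflects the last real content change, not the last regeneration -->
  <url>
    <loc>https://example.com/products/widget</loc>
    <lastmod>2022-09-01</lastmod>
  </url>
  <!-- or leave out lastmod, changefreq and priority altogether -->
  <url>
    <loc>https://example.com/products/gadget</loc>
  </url>
</urlset>
```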

Solution 3

No, Google simply ignores the information you've provided when it is incorrect. In that case, its crawlers figure out on their own how often they should crawl your pages.

Solution 4

I don't work for Google, and can't say for sure what they actually do, but the sensible way for them to treat <lastmod> timestamps would be as hints not to waste time re-crawling pages that haven't changed.

So if you report all your pages as changed every day, Googlebot will just keep crawling all your pages in whatever order it feels like, rather than only focusing on the pages that have changed. In effect, it's just as if you didn't report any last modification timestamps at all.

The main reason to provide correct <lastmod> timestamps is to make changes to your site show up faster in Google's index. If you have hundreds of pages on your site, it's going to take a while for Google to crawl them all and find any changes. However, if you tell Googlebot which pages have changed recently, it can crawl those pages first and avoid wasting so much time on the rest.

Of course, you could just bump up Googlebot's crawl rate in Webmaster Tools instead and hope for the best. But really, it shouldn't be too hard to make your update script preserve timestamps. For example, I assume you're currently doing something like this:

for each product do:
    write new page content into product page file;
end do;

If so, just change it to something like this instead:

for each product do:
    read old page content from product page file into string A;
    write new page content into string B;
    if A is not equal to B then:
        write string B into product page file;
    end if;
end do;
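A minimal Python sketch of the pseudocode above might look like this. The file layout and the render_product function are hypothetical stand-ins for however your site actually builds its pages:

```python
import os
import tempfile

def render_product(product):
    # Hypothetical stand-in for whatever builds the page HTML.
    return f"<html><body>{product['name']}: {product['price']}</body></html>"

def write_if_changed(path, new_content):
    """Write new_content to path only if it differs from what is on disk.

    Returns True if the file was (re)written, False if it was left alone,
    so the file's mtime (and hence <lastmod>) only moves on real changes.
    """
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == new_content:
                return False  # identical content: keep the old timestamp
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_content)
    return True

# Demo with a throwaway directory and made-up product data.
products = [{"id": 1, "name": "Widget", "price": "9.99"}]
outdir = tempfile.mkdtemp()
path = os.path.join(outdir, "1.html")

first = write_if_changed(path, render_product(products[0]))   # new file: written
second = write_if_changed(path, render_product(products[0]))  # same data: skipped
```

The same "compare before writing" idea works whether the pages are static files or database rows; the point is that the stored modification time only moves when the content does.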

Solution 5

No. Google will use lastmod as a hint (same as all sitemap values) but if it decides that your content is not getting updated daily then it will simply ignore it and revisit your pages on its own schedule.


Author: Elicit

Updated on September 18, 2022

Comments

  • Elicit
    Elicit over 1 year

I've got a sitemap that is generated daily with a lot of links to product pages. These products are imported daily from another data source. Because the update consists of throwing away all current product info and replacing it with the newly imported info, the last-modified date jumps forward a day every time, even for products that haven't changed. This date is also used in the sitemap, so all product pages appear to have been updated.

    Will Google penalize the website for pretending the pages have changed from day to day while they haven't?

    My solution would be to change the entry only if the newly imported product data differs from the previous data. I just want to make sure this is a useful upgrade to make, since I could also spend my time on other improvements.

  • Elicit
    Elicit over 12 years
    I totally agree. However, I'm dependent on another company that delivers the data. They always send every product (200+) in their data exports, so updating the lot seemed the best solution a few years ago. My client doesn't have the budget to solve this properly. These exports/imports happen at night, so the extra resources used are not a big problem at the moment.
  • Anonymous Penguin
    Anonymous Penguin almost 8 years
    @Elicit if you still have this issue, just store the data exports from the day before in their original, parseable format and do a git diff-style comparison to see what products have changed. Although it's nice, you don't need them to send you the changed products only; you should be able to figure it out yourself.
  • Victor Schröder
    Victor Schröder about 5 years
    The link is broken...
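The diff-style comparison suggested in the comments above could be sketched like this in Python. The export format and field names here are made up; the idea is just to keep yesterday's export around and compare records by product id:

```python
def changed_ids(old_export, new_export, key="id"):
    """Return the ids of products that are new or whose data changed
    between two exports (each a list of dicts keyed by `key`)."""
    old = {p[key]: p for p in old_export}
    # A product counts as changed if it is absent from the old export
    # or any of its fields differ.
    return [p[key] for p in new_export if old.get(p[key]) != p]

# Made-up sample exports: product 2's price changed, product 3 is new.
yesterday = [{"id": 1, "price": "9.99"}, {"id": 2, "price": "4.50"}]
today = [{"id": 1, "price": "9.99"}, {"id": 2, "price": "4.75"},
         {"id": 3, "price": "1.00"}]

result = changed_ids(yesterday, today)
```

Only the pages for the returned ids would then be rewritten and get a fresh <lastmod>; everything else keeps its old date.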