Is there an XML sitemap generator with command line interface for nginx on Linux?

Did you try Googling this? The first result on the first page:

https://code.google.com/p/sitemap-generators/wiki/SitemapGenerators

Edit:

As per the comments, I tried out the following sitemap generator:

http://sitemap-generators.googlecode.com/svn/trunk/docs/en/sitemap-generator.html

The downloaded zip bundle contains a few files:

drwxr-xr-x  19 user  group     646 Apr 10 05:22 .
drwxr-xr-x   3 user  group     102 Apr 10 05:12 ..
-r--r-----@  1 user  group      23 Jun 16  2005 AUTHORS
-r--r-----@  1 user  group    1791 Jun 16  2005 COPYING
-r--r--r--@  1 user  group    2267 Dec  5  2005 ChangeLog
-rw-r--r--@  1 user  group     258 Dec  5  2005 PKG-INFO
-r--r--r--@  1 user  group    1111 Dec  5  2005 README
drwxr-xr-x   3 user  group     102 Apr 10 05:16 build
-r--r--r--@  1 user  group    5662 Sep  7  2005 example_config.xml
-r--r-----@  1 user  group     996 Jun 16  2005 example_urllist.txt
-r-xr-xr-x@  1 user  group     317 Dec  5  2005 setup.py
-r-xr-xr-x@  1 user  group   73063 Dec  5  2005 sitemap_gen.py
-r-xr-xr-x@  1 user  group   28551 Sep  7  2005 test_sitemap_gen.py

Using the provided example_config.xml, I modified it in the following manner:

<?xml version="1.0" encoding="UTF-8"?>

<site
  base_url="http://YOURDOMAIN.com/"
  store_into="/var/www/sitemap_gen-1.4/sitemap.xml"
  verbose="1"
  >

  <url  href="http://YOURDOMAIN.com/stats?q=name"  />
  <url
     href="http://YOURDOMAIN.com/stats?q=age"
     lastmod="2004-11-14T01:00:00-07:00"
     changefreq="yearly"
     priority="0.3"
  />


  <urllist  path="urllist.txt"  encoding="UTF-8"  />

  <!-- Exclude URLs that end with a '~'   (IE: emacs backup files)      -->
  <filter  action="drop"  type="wildcard"  pattern="*~"           />

  <!-- Exclude URLs within UNIX-style hidden files or directories       -->
  <filter  action="drop"  type="regexp"    pattern="/\.[^/]*"     />

</site>

I think that serves as the template for generating the sitemap.xml. The generator supports pulling URLs either from Apache-style access logs or from a URL list file. I opted for a URL list file, since I was testing from my laptop.

To generate the url list, I employed 'wget' to spider the site:

wget -mk --spider -r -l2 http://YOURDOMAIN.COM/

or

wget -mk --spider -r -l2 http://YOURDOMAIN.COM/ -o urlinfolist.txt

-r: recursive; -l2: limit the recursion depth to 2 (with -m, the depth is otherwise unlimited); -o: write the log to the named file. See the wget manual page.

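Then I extracted the URLs from the log that wget generated (wget-log is the default log name when wget runs in the background without -o; with the second command above, the log is urlinfolist.txt):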

cat wget-log | tr ' ' '\012' | grep "^http" | egrep -vi "[?]|[.]jpg$" | sort -u > urllist.txt

or

cat urlinfolist.txt | tr ' ' '\012' | grep "^http" | egrep -vi "[?]|[.]jpg$" | sort -u > urllist.txt

Note: Some of the exclusions I had in my line were not needed because the config file either already excluded them or could very easily exclude them.
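The extraction step can also be sketched in Python, for cases where the shell pipeline is awkward to maintain. This is a rough equivalent, not part of the original answer: the function names `extract_urls` and `write_url_list` are made up for illustration, and it splits on any whitespace rather than only on spaces.

```python
import re

# Same exclusions as the egrep in the pipeline: any URL containing "?"
# or ending in ".jpg" (case-insensitive).
EXCLUDE = re.compile(r"[?]|[.]jpg$", re.IGNORECASE)


def extract_urls(log_text):
    """Return the sorted, de-duplicated URLs found in a wget log."""
    tokens = log_text.split()  # roughly: tr ' ' '\012'
    urls = {
        token
        for token in tokens
        if token.startswith("http")   # grep "^http"
        and not EXCLUDE.search(token)  # egrep -vi "[?]|[.]jpg$"
    }
    return sorted(urls)  # sort -u


def write_url_list(log_path, out_path):
    """Read a wget log file and write one URL per line to out_path."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        urls = extract_urls(log_file.read())
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write("\n".join(urls) + "\n")
```

Calling `write_url_list("urlinfolist.txt", "urllist.txt")` would then produce the file that the `<urllist>` entry in the config points to.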

Then I ran the generator:

python sitemap_gen.py --config=example_config.xml 

This produced the sitemap.xml file.
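To sanity-check the result, something like the following could list the URLs that ended up in the generated file. It is only a sketch: `sitemap_locations` is a hypothetical helper, and because I have not checked which XML namespace the generator writes, it matches `loc` elements by local name only.

```python
import xml.etree.ElementTree as ET


def sitemap_locations(xml_text):
    """Return the text of every <loc> element, whatever namespace it uses."""
    root = ET.fromstring(xml_text)
    locations = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; keep the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locations.append(element.text.strip())
    return locations
```

For the file on disk, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `sitemap_locations`, using the `store_into` path from the config.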

The script appears to be designed to run unattended, and it worked in my test run. The wget crawl can take a while. However, if you don't have any special rewrites, you can instead scan your site's static content path with 'find', filter the results, and write them to the URL list file.
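The 'find'-style alternative might look like the sketch below. It assumes that files under the document root map one-to-one onto URLs (no rewrites), and the function name `urls_from_docroot` and its skip rules (hidden files and directories, names ending in '~', mirroring the filters in the config above) are my own choices rather than anything the generator requires.

```python
import os
from urllib.parse import quote


def urls_from_docroot(docroot, base_url):
    """Walk docroot and return a sorted list of URLs, one per static file."""
    urls = []
    for dirpath, dirnames, filenames in os.walk(docroot):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            # Skip hidden files and editor backup files.
            if name.startswith(".") or name.endswith("~"):
                continue
            relative = os.path.relpath(os.path.join(dirpath, name), docroot)
            url_path = quote(relative.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + url_path)
    return sorted(urls)
```

Writing `"\n".join(urls_from_docroot("/var/www", "http://YOURDOMAIN.com/"))` to urllist.txt would replace the wget crawl; `/var/www` here is a placeholder for your nginx root.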

Author by

Minifyre

Updated on September 18, 2022

Comments

  • Minifyre
    Minifyre almost 2 years

    I'm looking for an XML sitemap generator that can be triggered from the command line, supports nginx, and works on Linux (Debian). What can you recommend?

  • Minifyre
    Minifyre about 11 years
    @Wing Tang Wong: Yes, I did. I've also tried this generator, but it doesn't work for me.
  • Wing Tang Wong
    Wing Tang Wong about 11 years
    Hmm.. interesting, I tried the Google Python sitemap generator and it worked for me. I'll update my Answer with how I got it to work for me.
  • Minifyre
    Minifyre about 11 years
    It works now! Thank you very much for the explanation and a great example!