Convert Website to PDF (recursively)

Save a list of web pages as a PDF file

  • First install the wkhtmltopdf conversion tool (it requires a desktop environment; source):

    sudo apt install wkhtmltopdf 
    
  • Then create a file containing the URLs of the target web pages, one per line. Let's call this file url-list.txt and place it in ~/Downloads/PDF/. For example, its content could be:

    https://askubuntu.com/users/721082/tarek
    https://askubuntu.com/users/566421/pa4080
    
  • Then run the following command, which generates a PDF file for each URL, saved in the directory where the command is executed:

    while read -r i; do wkhtmltopdf "$i" "$(echo "$i" | sed -e 's/https\?:\/\///' -e 's/\//-/g').pdf"; done < ~/Downloads/PDF/url-list.txt
    

    The result of this command, executed in the directory ~/Downloads/PDF/, is:

    ~/Downloads/PDF/$ ls -1 *.pdf
    askubuntu.com-users-566421-pa4080.pdf
    askubuntu.com-users-721082-tarek.pdf
    
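    The output filename is derived from each URL by stripping the http(s):// scheme and turning the remaining slashes into dashes; the sed part of the loop can be checked on a single URL in isolation:

```shell
# Strip the scheme, then replace every "/" with "-" to build the PDF filename
echo "https://askubuntu.com/users/721082/tarek" | sed -e 's/https\?:\/\///' -e 's/\//-/g'
# → askubuntu.com-users-721082-tarek
```

    The loop then appends .pdf to this name.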
  • Merge the output files with the following command, executed in the same directory (source):

    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged-output.pdf *.pdf
    

    The result is:

    ~/Downloads/PDF/$ ls -1 *.pdf
    askubuntu.com-users-566421-pa4080.pdf
    askubuntu.com-users-721082-tarek.pdf
    merged-output.pdf
    
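    One caveat worth noting: if the merge is run a second time, the *.pdf glob will also match the previous merged-output.pdf and fold it into the new output. A minimal sketch of building the input list while excluding it (the directory and filenames below mirror the example above; the gs invocation itself stays the same):

```shell
# Scratch demo in a hypothetical directory
mkdir -p /tmp/pdf-merge-demo
cd /tmp/pdf-merge-demo
rm -f /tmp/pdf-merge-demo/*.pdf
touch askubuntu.com-users-566421-pa4080.pdf \
      askubuntu.com-users-721082-tarek.pdf \
      merged-output.pdf

# The filenames generated by the loop contain no spaces, so ls | grep is safe here
inputs=$(ls -1 *.pdf | grep -v '^merged-output\.pdf$')
echo "$inputs"
# gs ... -sOutputFile=merged-output.pdf $inputs
```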

Save an entire website as a PDF file

  • First we must create a file (url-list.txt) that contains a URL map of the site. Run these commands (source):

    TARGET_SITE="https://www.yahoo.com/"
    wget --spider --force-html -r -l2 "$TARGET_SITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' > url-list.txt
    
  • Then go through the steps from the section above.
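wget's --spider log prints each visited URL on a line starting with --; the grep/awk/grep stages of the command above can be exercised on a captured sample of that log (the lines below are made up for illustration):

```shell
# Simulated fragment of a wget --spider log (timestamps are invented)
printf '%s\n' \
  '--2022-09-18 10:00:00--  https://www.example.com/' \
  '--2022-09-18 10:00:01--  https://www.example.com/style.css' \
  '--2022-09-18 10:00:02--  https://www.example.com/about.html' |
  grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\)$'
```

Only the page URLs survive; the .css asset is filtered out by the final grep -v.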

Create a script that saves an entire website as a PDF file (recursively)

  • To automate the process, we can bring everything together in a script.

  • Create an executable file called site-to-pdf.sh:

    mkdir -p ~/Downloads/PDF/
    touch ~/Downloads/PDF/site-to-pdf.sh
    chmod +x ~/Downloads/PDF/site-to-pdf.sh
    nano ~/Downloads/PDF/site-to-pdf.sh
    
  • The script content is:

    #!/bin/sh
    TARGET_SITE="$1"
    wget --spider --force-html -r -l2 "$TARGET_SITE" 2>&1 | grep '^--' | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|txt\)$' > url-list.txt
    while read -r i; do wkhtmltopdf "$i" "$(echo "$i" | sed -e 's/https\?:\/\///' -e 's/\//-/g').pdf"; done < url-list.txt
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged-output.pdf *.pdf
    

    Copy the above content; in nano use Shift+Insert to paste, Ctrl+O then Enter to save, and Ctrl+X to exit.

  • Usage:

    ~/Downloads/PDF/$ ./site-to-pdf.sh https://www.example.com/
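The script assumes it always receives a URL as its first argument. An optional guard (my addition, not part of the original answer) makes it fail loudly instead of spidering an empty target; sketched here as a small helper function:

```shell
# Optional guard for site-to-pdf.sh: validate that a URL argument was given
check_args() {
    if [ $# -lt 1 ]; then
        echo "Usage: site-to-pdf.sh <site-url>" >&2
        return 1
    fi
    return 0
}

# Example: calling with and without an argument
check_args "https://www.example.com/" && echo "ok: argument present"
check_args || echo "error: no argument given"
```

Placed at the top of the script, the same if/return logic (with exit instead of return) would stop the run before wget is invoked.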


The answer to the original question:

Convert multiple PHP files to one PDF (recursively)

  • First install the package enscript, which converts plain-text files to PostScript (ps2pdf then turns that into a PDF):

    sudo apt update && sudo apt install enscript
    
  • Then run the following command, which generates a file called output.pdf in the directory where the command is executed, containing the content of all PHP files within /path/to/folder/ and its subdirectories:

    find /path/to/folder/ -type f -name '*.php' -exec printf "\n\n%s\n\n" "{}" \; -exec cat "{}" \; | enscript -o - | ps2pdf - output.pdf
    
  • An example from my system, which generated this file:

    find /var/www/wordpress/ -type f -name '*.php' -exec printf "\n\n%s\n\n" "{}" \; -exec cat "{}" \; | enscript -o - | ps2pdf - output.pdf
    
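The find portion of the pipeline prints each file's path as a header, then its contents; it can be tried on a scratch tree before pointing it at a real one (the paths below are made up, and %s is used as the printf format so the path is treated as data rather than as a format string):

```shell
# Build a small scratch tree with two PHP files (hypothetical paths)
mkdir -p /tmp/php-demo/sub
printf '<?php echo "index"; ?>\n' > /tmp/php-demo/index.php
printf '<?php echo "sub"; ?>\n'   > /tmp/php-demo/sub/page.php

# Print each file's path as a header, then its contents, like the command above
find /tmp/php-demo -type f -name '*.php' -exec printf "\n\n%s\n\n" "{}" \; -exec cat "{}" \;
```

Piping this output through enscript and ps2pdf, as above, turns the combined listing into a single PDF.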
Author: Tarek

Updated on September 18, 2022

Comments

  • Tarek (over 1 year)

    Is there any way to convert a web page and its sub pages into one PDF file?

    • Zanna (almost 7 years)
      Please edit your question to add some details of exactly what you want. Your comments on pa4080's answer suggest you have some specific requirements that aren't clear from the question.
    • Tarek (almost 7 years)
      Sorry for my English. I have PHP files that represent the pages of a website, grouped in various subdirectories. I would like to create a single PDF containing the text of all the files, formatted as if they were displayed in the browser.
  • Tarek (almost 7 years)
    To display the page as if it were html?
  • pa4080 (almost 7 years)
    @Tarek, please be more specific. Do you mean not the PHP code, but the result that you see in the web browser, or the HTML output from the PHP code?
  • Tarek (almost 7 years)
    For example, if I download a php page "www .... com / index.php", how do I create a pdf from this view as in the browser and not in PHP code?
  • pa4080 (almost 7 years)
    @Tarek, you mean that you have saved a web page and want to convert it to PDF? If so, why not just save it as a PDF?
  • Tarek (almost 7 years)
    Because I need a recursive solution to use for entire sites...
  • pa4080 (almost 7 years)
    @Tarek, I've updated the answer with a way that allows you to save an entire website as pdf.
  • Tarek (almost 7 years)
    Perfect, just what I needed, thank you for the help. Congratulations you're great!