How to convert a web page into a PDF?


Solution 1

I found the wkhtmltopdf project, which seems to do the trick. It's a command-line tool, so there is a bit of a learning curve, but it's not too bad.

Specifically to convert a web page, open a command window in the directory where wkhtmltopdf was installed and execute the following:

wkhtmltopdf.exe http://www.yourpage.com/index.htm c:\misc\cnn.pdf

The application has a large number of options for tweaking the output as needed, but the defaults give a pretty good result.
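For anyone who wants to script the conversion above rather than type it by hand, here is a minimal Python sketch. It assumes wkhtmltopdf is installed and on PATH. The function names `build_wkhtmltopdf_command` and `convert_page` are made up for illustration and are not part of wkhtmltopdf. The URL and output path are the placeholders from the command above.

```python
import shutil
import subprocess
from typing import List


def build_wkhtmltopdf_command(url: str, output_pdf: str,
                              executable: str = "wkhtmltopdf") -> List[str]:
    """Return the argument list for a basic URL-to-PDF conversion."""
    # wkhtmltopdf takes the input page first and the output file last.
    return [executable, url, output_pdf]


def convert_page(url: str, output_pdf: str,
                 executable: str = "wkhtmltopdf") -> None:
    """Run the conversion; raise if the tool is missing or exits with an error."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    subprocess.run(build_wkhtmltopdf_command(url, output_pdf, executable),
                   check=True)


# Show the command that would be run (placeholder URL and path from the answer).
command = build_wkhtmltopdf_command("http://www.yourpage.com/index.htm",
                                    r"c:\misc\cnn.pdf")
print(" ".join(command))
```

Calling `convert_page(...)` with the same two arguments would actually run the tool. Passing the arguments as a list avoids any shell quoting issues with the URL or the path.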

Solution 2

CutePDF Writer uses GhostScript for text processing and then ps2pdf to create searchable PDFs of web pages. This will, of course, not work if the text on the page is an image to begin with.

Solution 3

What's the problem with Print to PDF solutions? I have two virtual printers installed on my system: PDF Creator and the virtual printer from Adobe Acrobat X. Both work fine. I can search text in the generated PDFs easily, as long as my PDF viewer has OCR capabilities (which is common these days).
If you are thinking about creating something like a text or Word document, you can't. That is a limitation of the PDF format, or rather, it is how the PDF format works. Because of this, fonts and other resources can be embedded in the file without any external dependency, which makes the format universal.
Also, I don't think the wkhtmltopdf project can generate a PDF file in which text can be searched without OCR technology (because it would violate the PDF specification).

Solution 4

I use Adobe Acrobat 8 Professional (current version is Adobe Acrobat X). It has a menu option File... Create PDF... From Web Page... which asks me for a URL, then it downloads the page at the URL as a PDF file, with searchable text. It will also convert pages linked to from that page, recursively. You can end up with many HTML pages in one multi-page PDF file, with inter-page links preserved.

For some web pages, Acrobat Create PDF gets the formatting wrong. In that case I fall back to the Adobe PDF 8.0 printer driver which Acrobat 8 Professional installed on my system. It is very good at giving me a PDF equivalent of the web page I'm looking at, with searchable text.

Adobe Acrobat 8 Professional is not free software. It's fully priced proprietary software. However, IMHO it deserves a place on every knowledge worker's computer as much as Microsoft Office does. Also, you didn't specify that you needed a free way to convert a web page into a searchable PDF.


Author: gardenofwine

Updated on September 18, 2022

Comments

  • gardenofwine
    gardenofwine almost 2 years

    There are many ways to convert a web page to a PDF (online services, bookmarklets, Print to PDF solutions, etc...).

    But none of these produces a searchable PDF. It seems like they all convert the HTML into one gigantic image. Is there any way to convert a web page into a searchable PDF?

    • sean christe
      sean christe over 12 years
      If that has solved your problem, then you should post an answer to that effect, preferably with any details that might help someone else with the same question in the future. After a period of time you will be able to accept the answer, and then future readers with this issue will have a nice clean Q&A that they can find.
    • gardenofwine
      gardenofwine over 12 years
      @EBGreen You are right. Done.
  • gardenofwine
    gardenofwine over 12 years
    I tried it - it does not create searchable PDFs
  • Cute Bear
    Cute Bear over 11 years
    What if the web page contains authenticated data, as on MySpace or Facebook? Then this solution won't work.
  • HelpingHand
    HelpingHand about 11 years
    PrimoPDF is the best program for turning web pages into PDFs.
  • HelpingHand
    HelpingHand about 11 years
    There is also a Firefox add-on that turns the page into an image.
  • HelpingHand
    HelpingHand about 11 years
    That program can also write on the image created.
  • HelpingHand
    HelpingHand about 11 years
    And blur certain sections.
  • HelpingHand
    HelpingHand about 11 years
    I cannot find the name of it though.
  • Aventinus
    Aventinus over 4 years
    Tested this in 2020 (Windows 10) on my personal webpage: The tool works, however, it prints the mobile version of the webpage instead of the normal one. It's unclear why.
  • gardenofwine
    gardenofwine over 4 years
    @Aventinus Try adding --page-width 1200 to the command line.
  • Aventinus
    Aventinus over 4 years
    @AngryHacker Thanks, however it doesn't make a difference. I tried the command using several page widths and no matter the value the output is the same. This is the website I'm trying it on btw: https://mittos.xyz/
  • gardenofwine
    gardenofwine over 4 years
    @Aventinus The likely reason is that wkhtmltopdf is really old technology, and the server providing the page thinks it can't handle complex stuff. That's probably why you are getting the mobile version. To confirm, try spoofing the user agent, for instance to pretend that you are requesting the page from Firefox, by adding --custom-header 'User-Agent' 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'
  • Aventinus
    Aventinus over 4 years
    @AngryHacker Thanks again, however, the result is the same :(
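The two options suggested in the comments above (`--page-width` and `--custom-header`) can be combined in one command. The Python sketch below only assembles the argument list. The helper name `build_command` and the output file name `page.pdf` are assumptions for illustration. The user-agent string and the width of 1200 are the values quoted in the comments. Note that the commenter reported neither option changed the result for their site, so treat this as something to try rather than a known fix.

```python
from typing import List, Optional

# Firefox user-agent string quoted in the comment thread.
FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) "
              "Gecko/20100101 Firefox/73.0")


def build_command(url: str, output_pdf: str,
                  user_agent: Optional[str] = None,
                  page_width: Optional[str] = None) -> List[str]:
    """Assemble a wkhtmltopdf argument list with optional tweaks."""
    command = ["wkhtmltopdf"]
    if page_width is not None:
        # --page-width takes a single value.
        command += ["--page-width", page_width]
    if user_agent is not None:
        # --custom-header takes two arguments: header name, then header value.
        command += ["--custom-header", "User-Agent", user_agent]
    # Input page and output file always come last.
    command += [url, output_pdf]
    return command


command = build_command("https://mittos.xyz/", "page.pdf",
                        user_agent=FIREFOX_UA, page_width="1200")
print(command)
```

Passing this list to `subprocess.run(command, check=True)` sidesteps shell quoting entirely, which matters on Windows, where cmd.exe does not treat single quotes as quoting characters.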