How to convert a web page into a PDF?
Solution 1
I found the wkhtmltopdf project, which seems to do the trick. It's a command-line tool, so there is a bit of a learning curve, but it's not too bad.
Specifically, to convert a web page, open a command window in the directory where wkhtmltopdf was installed and execute the following:
wkhtmltopdf.exe http://www.yourpage.com/index.htm c:\misc\cnn.pdf
The application has a huge number of options for tweaking the output as needed, but the defaults give a pretty good result.
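If you would rather script the conversion than type it by hand, here is a minimal Python sketch of the same call. It assumes wkhtmltopdf is installed and on your PATH. The helper names build_command and convert are made up for this example and are not part of wkhtmltopdf.

```python
import shutil
import subprocess


def build_command(url, output_pdf, exe="wkhtmltopdf", extra_options=()):
    """Return the argument list for: wkhtmltopdf [options] <url> <output.pdf>."""
    return [exe, *extra_options, url, output_pdf]


def convert(url, output_pdf, extra_options=()):
    """Run wkhtmltopdf; raises CalledProcessError on a non-zero exit code."""
    # Assumption: the executable is reachable through PATH.
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    command = build_command(url, output_pdf, extra_options=extra_options)
    subprocess.run(command, check=True)


# Example call, using the same URL and output path as the command above:
# convert("http://www.yourpage.com/index.htm", r"c:\misc\cnn.pdf")
```

Passing the arguments as a list means no shell is involved, so spaces in paths or option values need no extra quoting.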
Solution 2
CutePDF Writer uses Ghostscript for text processing and then ps2pdf to create searchable PDFs of web pages. Of course, this will not work if the text on the page is an image to begin with.
Solution 3
What's the problem with Print to PDF solutions? I have two virtual printers installed on my system: PDF Creator and the virtual printer from Adobe Acrobat X. Both work fine. I can search text in the generated PDFs easily as long as my PDF viewer has OCR capabilities (which is common these days).
If you are thinking about creating something like a text or Word document, you can't. It's a limitation of the PDF format, or rather, it's how the PDF format works. Because of this, fonts and other resources can be embedded in the file without any external dependency, which makes the format universal.
And I don't think the wkhtmltopdf project can generate a PDF file in which text can be searched without using OCR technology (because it would violate the PDF specification).
Solution 4
I use Adobe Acrobat 8 Professional (the current version is Adobe Acrobat X). It has a menu option, File... Create PDF... From Web Page..., which asks me for a URL and then downloads the page at that URL as a PDF file with searchable text. It will also convert pages linked to from that page, recursively. You can end up with many HTML pages in one multi-page PDF file, with inter-page links preserved.
For some web pages, Acrobat's Create PDF gets the formatting wrong. In that case I fall back to the Adobe PDF 8.0 printer driver, which Acrobat 8 Professional installed on my system. It is very good at giving me a PDF equivalent of the web page I'm looking at, with searchable text.
Adobe Acrobat 8 Professional is not free software. It's full-priced proprietary software. However, IMHO it deserves a place on every knowledge worker's computer as much as Microsoft Office does. And you didn't specify that you insisted on a free way to convert a web page into a searchable PDF.
gardenofwine
Updated on September 18, 2022
Comments
-
gardenofwine almost 2 years
There are many ways to convert a web page to a PDF (online services, bookmarklets, Print to PDF solutions, etc.).
But none of these produce a searchable PDF. It seems like they all convert the HTML into one gigantic image. Is there any way to convert a web page into a searchable PDF?
-
sean christe over 12 years
If that has solved your problem, then you should post an answer to that effect, preferably with any details that might help someone else with the same question in the future. After a period of time you will be able to accept the answer, and then future people with this issue will have a nice clean Q&A that they can find.
-
gardenofwine over 12 years
@EBGreen You are right. Done.
-
gardenofwine over 12 years
I tried it - it does not create searchable PDFs.
-
Cute Bear over 11 years
What if the web page contains authenticated data, as on MySpace or Facebook? Then this solution won't work.
-
HelpingHand about 11 years
PrimoPDF is the best program for turning web pages into PDFs.
-
HelpingHand about 11 years
There is also a Firefox add-on that turns the page into an image.
-
HelpingHand about 11 years
That program can also write on the image created.
-
HelpingHand about 11 years
And blur certain sections.
-
HelpingHand about 11 years
I cannot find the name of it though.
-
Aventinus over 4 years
Tested this in 2020 (Windows 10) on my personal web page: the tool works; however, it prints the mobile version of the page instead of the normal one. It's unclear why.
-
gardenofwine over 4 years
@Aventinus Try adding --page-width 1200 to the command line.
-
Aventinus over 4 years
@AngryHacker Thanks, however it doesn't make a difference. I tried the command with several page widths, and no matter the value the output is the same. This is the website I'm trying it on, btw: https://mittos.xyz/
-
gardenofwine over 4 years
@Aventinus Likely the reason is that wkhtmltopdf is really old technology, and the server that's providing the page thinks it can't handle complex stuff. That's why you are likely getting the mobile version. To confirm, try spoofing the user agent. For instance, to pretend that the request comes from Firefox, add:
--custom-header "User-Agent" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0"
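Quoting a long user agent string by hand is easy to get wrong, so here is a rough Python sketch that builds the same call as an argument list. It assumes wkhtmltopdf is on your PATH. The function name user_agent_command is made up for this example, and the --custom-header-propagation switch is my own addition to the suggestion above, so that the header is also sent for images, scripts and other sub-resources.

```python
import subprocess

# Firefox user agent string taken from the comment above.
FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) "
    "Gecko/20100101 Firefox/73.0"
)


def user_agent_command(url, output_pdf, user_agent=FIREFOX_UA):
    """Build a wkhtmltopdf call that sends a custom User-Agent header."""
    return [
        "wkhtmltopdf",
        "--custom-header", "User-Agent", user_agent,
        # Addition, not in the original comment: repeat the header
        # for every resource request, not only the main page.
        "--custom-header-propagation",
        url,
        output_pdf,
    ]


# Example call (needs wkhtmltopdf installed):
# subprocess.run(user_agent_command("https://mittos.xyz/", "out.pdf"), check=True)
```

Because the arguments are passed as a list, the user agent string reaches wkhtmltopdf as one value without any shell quoting.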
-
Aventinus over 4 years
@AngryHacker Thanks again, however, the result is the same :(