How to deal with ContentNotFoundError when using wkhtmltopdf?

24,883

Solution 1

Unfortunately, wkhtmltopdf doesn't handle complex websites well, because it uses the Qt/QtWebKit library, which seems to have some issues.

One problem is that wkhtmltopdf doesn't support relative addresses (GitHub: #1634, #1886, #2359, QTBUG-46240) such as:

<img src="/images/filetypes/txt.png">
<script src="//cdn.optimizely.com/js/653710485.js">

and it loads them as local files. One workaround I've found is to correct the HTML file in place with the ex editor:

ex -V1 page.html <<-EOF
  %s,'//,'http://,ge 
  %s,"//,"http://,ge 
  %s,'/,'http://www.example.com/,ge
  %s,"/,"http://www.example.com/,ge
  wq " Update changes and quit.
EOF

However, this won't help with remote files that themselves contain these kinds of URLs, since only the local page is edited.

Another problem is that it doesn't handle missing resources. You can try specifying --load-error-handling ignore, but in most cases it doesn't work (see #2051), so this issue is still outstanding. The workaround is simply to remove the invalid resources before conversion.
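
The answer does not show how to remove such resources. One possible way is sketched below, assuming you already know which path fails to load. The helper name drop_resource is hypothetical (it is not part of wkhtmltopdf); it deletes every tag that mentions the given path, using a literal match, and prints the result to standard output.

```shell
# Hypothetical helper (not part of wkhtmltopdf): print an HTML file with
# every tag that mentions a given resource path removed.
#   $1 = HTML file, $2 = resource path, e.g. /images/filetypes/txt.png
drop_resource() {
  [ -n "$2" ] || return 1
  awk -v needle="$2" '
    { html = html $0 "\n" }                      # slurp the whole file
    END {
      while ((pos = index(html, needle)) > 0) {
        head = substr(html, 1, pos - 1)
        tail = substr(html, pos)
        # walk back to the "<" that opens the enclosing tag
        start = length(head)
        while (start > 0 && substr(head, start, 1) != "<") start--
        # and find the ">" that closes it
        close_at = index(tail, ">")
        if (start == 0 || close_at == 0 || index(substr(head, start), ">") > 0) {
          # this occurrence is not inside a tag: keep it and move on
          out = out head needle
          html = substr(tail, length(needle) + 1)
          continue
        }
        out = out substr(head, 1, start - 1)     # keep text before the tag
        html = substr(tail, close_at + 1)        # continue after the tag
      }
      printf "%s%s", out, html
    }
  ' "$1"
}
```

For example, drop_resource page.html /images/filetypes/txt.png > page.clean.html writes a copy without that image tag. This is only a sketch: for a script element it removes the opening tag but leaves the closing one behind, so check the output before converting it.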

As an alternative to wkhtmltopdf, you can use htmldoc, or PhantomJS with an additional script such as rasterize.js:

phantomjs rasterize.js http://example.com/ example.pdf

or dompdf (an HTML to PDF converter for PHP, installable via Composer), with the sample code below:

<?php
// somewhere early in your project's loading, require the Composer autoloader
// see: http://getcomposer.org/doc/00-intro.md
$HOMEDIR = "/Users/foo";
require $HOMEDIR . '/.composer/vendor/autoload.php';

// disable DOMPDF's internal autoloader if you are using Composer
define('DOMPDF_ENABLE_AUTOLOAD', FALSE);
define('DOMPDF_ENABLE_REMOTE', TRUE);

// include DOMPDF's default configuration
require_once $HOMEDIR . '/.composer/vendor/dompdf/dompdf/dompdf_config.inc.php';

// fetch the HTML to convert (the URL should point at an HTML page)
$htmlString = file_get_contents("https://example.com/foo.html");

// DOMPDF 0.6.x API; newer releases use the namespaced Dompdf\Dompdf class and loadHtml()
$dompdf = new DOMPDF();
$dompdf->load_html($htmlString);
$dompdf->render();
$dompdf->stream("sample.pdf");

Solution 2

My problem was solved by removing @font-face from the CSS.
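
No code was given for this. A minimal sketch of one way to do it is shown below, assuming the stylesheet is a local file you can edit before conversion. The function name strip_font_face is hypothetical; it prints the stylesheet with every @font-face block removed and leaves everything else untouched.

```shell
# Hypothetical helper: print a CSS file with all @font-face blocks removed.
#   $1 = path to the stylesheet
strip_font_face() {
  awk '
    { css = css $0 "\n" }                        # slurp the whole file
    END {
      # an @font-face block has no nested braces, so it ends at the
      # first closing brace that follows it
      while (match(css, /@font-face[^}]*[}]/)) {
        out = out substr(css, 1, RSTART - 1)     # keep text before the block
        css = substr(css, RSTART + RLENGTH)      # continue after the block
      }
      printf "%s%s", out, css
    }
  ' "$1"
}
```

For example, strip_font_face style.css > style.nofonts.css, then point the page at the cleaned stylesheet. Text that relied on the removed fonts will fall back to whatever other fonts the CSS names.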

Author: Murali Mopuru

Updated on May 29, 2020

Comments

  • Murali Mopuru almost 4 years

    Can someone tell me how to resolve the following issues?

    1. wkhtmltopdf doesn't have an option to pass proxy info (-p or --proxy), unlike previous versions, and it doesn't use the system $http_proxy and $https_proxy environment variables either.

    2. wkhtmltopdf is not working with HTTPS/SSL even though I set LD_LIBRARY_PATH for libssl.so and libcrypto.so

      [deploy@localhost ~]$ wkhtmltopdf https://www.google.co.in google.pdf
      loaded the Generic plugin 
      Loading page (1/2)
      Error: Failed loading page https://www.google.co.in (sometimes it will work just to ignore this error with --load-error-handling ignore)
      Exit with code 1 due to network error: UnknownNetworkError
      

      and

      [deploy@localhost ~]$ wkhtmltoimage https://www.google.co.in sample.jpg
      loaded the Generic plugin 
      Loading page (1/2)
      Error: Failed loading page https://www.google.co.in (sometimes it will work just to ignore this error with --load-error-handling ignore)
      Exit with code 1 due to network error: UnknownNetworkError
      
    3. wkhtmltopdf works only partially with HTTP. The output PDF files are missing some content/background/positions.

      [deploy@localhost ~]$ wkhtmltopdf http://localhost:8880/ sample.pdf
      loaded the Generic plugin 
      Loading page (1/2)
      Printing pages (2/2)                                               
      Done                                                           
      Exit with code 1 due to network error: ContentNotFoundError
      
      [deploy@localhost ~]$ wkhtmltoimage http://localhost:8880/ sample.jpg
      loaded the Generic plugin 
      Loading page (1/2)
      Rendering (2/2)                                                    
      Done                                                               
      Exit with code 1 due to network error: ContentNotFoundError
      

    Note: I'm using wkhtmltopdf-0.12.1-1.fc20.x86_64 and qt-4.8.6-10.fc20.x86_64

  • Murali Mopuru almost 9 years
    It went the way you described here. I too learned about the relative resource paths, broken links, etc. after a little more work with wkhtmltopdf. I fixed my problem with PhantomJS scripts.
  • shox almost 4 years
    Sorry, I don't quite follow what you are suggesting as an answer. Could you add a code example of your solution?