PhantomJS: exported PDF to stdout


Solution 1

As Niko pointed out, you can use renderBase64() to render the web page to an image buffer and return the result as a base64-encoded string.
For now, though, this only works for PNG, JPEG and GIF.

To write something from a phantomjs script to stdout just use the filesystem API.

I use something like this for images:

var base64image = page.renderBase64('PNG');
var fs = require("fs");
fs.write("/dev/stdout", base64image, "w");
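Whatever reads that stdout receives base64 text, not image bytes, so it has to decode it first. Here is a minimal sketch of the consumer side, assuming the consumer is a Node.js program (not PhantomJS); `decodeRenderedImage` and `isPng` are made-up helper names:

```javascript
// Node.js sketch of the consumer side (hypothetical helpers, not part of PhantomJS).
// Turns the base64 text produced by page.renderBase64('PNG') back into raw bytes.
function decodeRenderedImage(base64Text) {
  // Strip whitespace in case the producer or a shell added a trailing newline.
  return Buffer.from(base64Text.replace(/\s+/g, ''), 'base64');
}

// Every PNG file begins with these eight signature bytes.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Cheap sanity check that the decoded bytes are a PNG at all.
function isPng(bytes) {
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}
```

The signature check is optional, but it catches the case where something other than the image (a log line, say) ended up on stdout.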

I don't know whether renderBase64() will support PDF in a future version of PhantomJS, but as a workaround something along these lines may work for you:

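page.render(output);                // output must end in .pdf so the PDF format is chosen
var fs = require("fs");
var pdf = fs.read(output);          // read the rendered file back in
fs.write("/dev/stdout", pdf, "w");  // forward it to stdout
fs.remove(output);                  // delete the temporary file

Here, output is the path to a temporary file ending in .pdf. As noted in the comments, the file may need to be read in binary mode for the PDF to come out intact.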

Solution 2

You can output directly to stdout without a need for a temporary file.

page.render('/dev/stdout', { format: 'pdf' });

See here for history on when this was added.

If you want to get HTML from stdin and output the PDF to stdout, see here
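If the PDF is read by another program rather than redirected by the shell, stdout has to be treated as raw bytes (the comments mention needing binary encoding and a larger buffer in Node). Here is a minimal sketch of that consumer side in Node.js; `captureBinaryStdout` and `looksLikePdf` are made-up helper names, and the 50 MiB limit is an arbitrary choice:

```javascript
// Node.js sketch (not PhantomJS): capture a child's stdout as raw bytes.
const { spawnSync } = require('child_process');

// Runs `command` with `args` and returns everything it wrote to stdout as a Buffer.
function captureBinaryStdout(command, args) {
  const result = spawnSync(command, args, {
    encoding: 'buffer',           // keep the bytes as-is, no string decoding
    maxBuffer: 50 * 1024 * 1024,  // illustrative limit; rendered PDFs can be large
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error('child exited with status ' + result.status);
  }
  return result.stdout;
}

// A PDF file starts with the ASCII bytes "%PDF-".
function looksLikePdf(buffer) {
  return buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}
```

The intended use would be something like captureBinaryStdout('phantomjs', ['render.js', url]), where render.js is a hypothetical script containing the page.render call above. Whether PhantomJS can open /dev/stdout under a given parent process and platform is a separate question; several commenters report that it did not work for them.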

Solution 3

Sorry for the extremely long answer; I have a feeling that I'll need to refer to this method several dozen times in my life, so I'll write "one answer to rule them all". I'll first babble a little about files, file descriptors, (named) pipes, and output redirection, and then answer your question.


Consider this simple C99 program:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{

  if (argc < 2) {
    printf("Usage: %s file_name\n", argv[0]);
    return 1;
  }

  FILE* file = fopen(argv[1], "w");
  if (!file) {
    printf("No such file: %s\n", argv[1]);
    return 2;
  }

  fprintf(file, "some text...");

  fclose(file); 

  return 0;
}

Very straightforward. It takes an argument (a file name) and prints some text into it. Couldn't be any simpler.


Compile it with clang write_to_file.c -o write_to_file.o or gcc write_to_file.c -o write_to_file.o.

Now, run ./write_to_file.o some_file (which prints into some_file). Then run cat some_file. The result, as expected, is some text...

Now let's get more fancy. Type (./write_to_file.o /dev/stdout) > some_file in the terminal. We're asking the program to write to its standard output (instead of a regular file), and then we're redirecting that stdout to some_file (using > some_file). We could've used any of the following to achieve this:

  • (./write_to_file.o /dev/stdout) > some_file, which means "use stdout"

  • (./write_to_file.o /dev/stderr) 2> some_file, which means "use stderr, and redirect it using 2>"

  • (./write_to_file.o /dev/fd/2) 2> some_file, which is the same as above; stderr is the third file descriptor assigned to Unix processes by default (after stdin and stdout)

  • (./write_to_file.o /dev/fd/5) 5> some_file, which means "use your sixth file descriptor, and redirect it to some_file"

In case it's not clear, we're writing to a file descriptor instead of a regular file path (everything is a file in Unix, after all). We can do all sorts of fancy things with that descriptor: redirect it to a file, or point it at a named pipe and share it between different processes.
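To see that /dev/fd/N is just another name for an already-open descriptor, here is a small Node.js sketch that does inside one process what the shell's 5> redirection sets up for the C program. It assumes a Linux or macOS system where /dev/fd exists; `writeViaFdPath` and the demo file name are made up for illustration:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Opens targetFile, then writes to it through the /dev/fd/<n> alias of the
// open descriptor rather than through the file's own name.
function writeViaFdPath(targetFile, text) {
  const fd = fs.openSync(targetFile, 'w');   // plays the role of "5> some_file"
  try {
    fs.writeFileSync('/dev/fd/' + fd, text); // plays the role of fopen("/dev/fd/5", "w")
  } finally {
    fs.closeSync(fd);
  }
}

// Demo: the text ends up in the regular file behind the descriptor.
const demoFile = path.join(os.tmpdir(), 'fd_demo_' + process.pid + '.txt');
writeViaFdPath(demoFile, 'some text...');
```

The program doing the writing never learns the real file name; it only sees a path under /dev/fd, which is exactly the position the C program is in.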


Now, let's create a named pipe:

mkfifo my_pipe

If you type ls -l now, you'll see:

total 32
prw-r--r--  1 pooriaazimi  staff     0 Jul 15 09:12 my_pipe
-rw-r--r--  1 pooriaazimi  staff   336 Jul 15 08:29 write_to_file.c
-rwxr-xr-x  1 pooriaazimi  staff  8832 Jul 15 08:34 write_to_file.o

Note the p at the beginning of the second line. It means that my_pipe is a (named) pipe.

Now, let's specify what we want to do with our pipe:

gzip -c < my_pipe > out.gz &

It means: gzip whatever I put into my_pipe and write the result to out.gz. The & at the end asks the shell to run this command in the background. You'll get something like [1] 10449 and control returns to the terminal.

Then, simply redirect the output of our C program to this pipe:

(./write_to_file.o /dev/fd/5) 5> my_pipe

Or

./write_to_file.o my_pipe

You'll get

[1]+  Done                    gzip -c < my_pipe > out.gz

which means the gzip command has finished.

Now, do another ls -l:

total 40
prw-r--r--  1 pooriaazimi  staff     0 Jul 15 09:14 my_pipe
-rw-r--r--  1 pooriaazimi  staff    32 Jul 15 09:14 out.gz
-rw-r--r--  1 pooriaazimi  staff   336 Jul 15 08:29 write_to_file.c
-rwxr-xr-x  1 pooriaazimi  staff  8832 Jul 15 08:34 write_to_file.o

We've successfully gzipped our text!

Execute gzip -d out.gz to decompress this gzipped file. out.gz will be deleted and a new file (out) will be created. cat out gets us:

some text...

which is what we expected.

Don't forget to remove the pipe with rm my_pipe!


Now back to PhantomJS.

This is a simple PhantomJS script (render.coffee, written in CoffeeScript) that takes two arguments: a URL and a file name. It loads the URL, renders it and writes it to the given file name:

system = require 'system'

renderUrlToFile = (url, file, callback) ->
  page = require('webpage').create()
  page.viewportSize = { width: 1024, height : 800 }
  page.settings.userAgent = 'Phantom.js bot'

  page.open url, (status) ->
    if status isnt 'success'
      console.log "Unable to render '#{url}'"
    else
      page.render file

    delete page
    callback url, file


url         = system.args[1]
file_name   = system.args[2]

console.log "Will render to #{file_name}"
renderUrlToFile "http://#{url}", file_name, (url, file) ->
  console.log "Rendered '#{url}' to '#{file}'"
  phantom.exit()

Now type phantomjs render.coffee news.ycombinator.com hn.png in the terminal to render the Hacker News front page to the file hn.png. It works as expected. So does phantomjs render.coffee news.ycombinator.com hn.pdf.

Let's repeat what we did earlier with our C program:

(phantomjs render.coffee news.ycombinator.com /dev/fd/5) 5> hn.pdf

It doesn't work... :( Why? Because, as stated in the PhantomJS manual:

render(fileName)

Renders the web page to an image buffer and save it as the specified file.

Currently the output format is automatically set based on the file extension. Supported formats are PNG, JPEG, and PDF.

It fails simply because neither /dev/fd/5 nor /dev/stdout ends in a recognized extension such as .png or .pdf.
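The rule quoted from the manual can be pictured like this. To be clear, this is only an illustration of extension-based detection, not PhantomJS's actual implementation; `guessFormat` and the extension table are my own assumptions:

```javascript
// Illustrative only: mimics "the output format is set from the file extension".
const FORMATS_BY_EXTENSION = new Map([
  ['png', 'png'],
  ['jpg', 'jpeg'],
  ['jpeg', 'jpeg'],
  ['pdf', 'pdf'],
]);

// Returns the format implied by the file name, or null if there is none.
function guessFormat(fileName) {
  const baseName = fileName.split('/').pop();
  const dot = baseName.lastIndexOf('.');
  if (dot < 0) return null;                       // no extension at all
  const extension = baseName.slice(dot + 1).toLowerCase();
  return FORMATS_BY_EXTENSION.get(extension) || null;
}
```

Under a rule like this, hn.pdf and my_pipe.pdf both map to a format, while /dev/stdout and /dev/fd/5 map to nothing, which is why giving the pipe a .pdf name gets around the problem.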

But no fear, named pipes can help you!

Create another named pipe, but this time use the extension .pdf:

mkfifo my_pipe.pdf

Now, tell it to simply cat its input to hn.pdf:

cat < my_pipe.pdf > hn.pdf &

Then run:

phantomjs render.coffee news.ycombinator.com my_pipe.pdf 

And behold the beautiful hn.pdf!

Obviously you'll want to do something more sophisticated than just cat-ing the output, but I'm sure it's clear now what you should do :)


TL;DR:

  1. Create a named pipe with a ".pdf" file extension (so it fools PhantomJS into thinking it's a PDF file):

    mkfifo my_pipe.pdf
    
  2. Do whatever you want to do with the contents of the file, like:

    cat < my_pipe.pdf > hn.pdf &

    which simply cats it to hn.pdf (the trailing & keeps it waiting in the background until PhantomJS writes to the pipe)

  3. In PhantomJS, render to this file/pipe.

  4. Later on, you should remove the pipe:

    rm my_pipe.pdf
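If you'd rather drive these four steps from a program than from the shell, they can be sketched like this in Node.js. This is an assumption-heavy sketch: `renderThroughPipe` is a made-up helper, it relies on the mkfifo command being installed, and it expects the producer to take the output path as its last argument, as render.coffee does:

```javascript
const { spawn, execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Runs a producer that writes to the path given as its last argument and
// returns the bytes it wrote, using a named pipe instead of a regular file.
function renderThroughPipe(command, args) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pdfpipe-'));
  const pipePath = path.join(dir, 'my_pipe.pdf');  // step 1: pipe with a .pdf name
  execFileSync('mkfifo', [pipePath]);
  try {
    // Step 3: start the producer; it blocks on the pipe until a reader opens it.
    spawn(command, args.concat(pipePath), { stdio: 'ignore' });
    // Step 2: consume the pipe; this returns once the producer closes its end.
    return fs.readFileSync(pipePath);
  } finally {
    fs.unlinkSync(pipePath);                       // step 4: remove the pipe
    fs.rmdirSync(dir);
  }
}
```

With the script above, the call would look something like renderThroughPipe('phantomjs', ['render.coffee', 'news.ycombinator.com']). One caveat: if the producer never opens the pipe (for example because it fails to start), the read blocks forever, so real code would want a timeout.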
    

Solution 4

I don't know if it would address your problem, but you may also check the new renderBase64() method added to PhantomJS 1.6: https://github.com/ariya/phantomjs/blob/master/src/webpage.cpp#L623

Unfortunately, the feature is not documented on the wiki yet :/

Author by

user2161301

Curiosity. Amoeba-shaped. Currently working on Moqups.

Updated on July 19, 2022

Comments

  • user2161301 almost 2 years

    Is there a way to trigger the PDF export feature in PhantomJS without specifying an output file with the .pdf extension? We'd like to use stdout to output the PDF.

  • philfreo almost 11 years

    In order to get this method to work, I had to change the fs.read line to var pdf = fs.open(output, 'rb').read(); -- reading the file in binary form was important (otherwise redirecting stdout to a file led to an incorrect PDF). However I was later able to get this to work with no temporary file at all - see stackoverflow.com/a/17282463/137067

  • poof almost 10 years

    Works nicely, but if you are reading stdout in Node such as described here npmjs.org/package/phantomjs you need to set the execFile options for binary and probably increase the buffer size as described here stackoverflow.com/a/6170723/2297380

  • Rick Mohr about 9 years

    Here's documentation from the PhantomJS wiki. But for me this method does not respect page.viewportSize settings.

  • guidoman almost 9 years

    For some reason it works on Mac OS X, but it doesn't work on linux (PhantomJS version 1.9.8).

  • gabrielAnzaldo over 7 years

    even adding the options: {encoding: 'binary', maxBuffer: 5000*1024} this does not arrive to the stdout in node

  • gabrielAnzaldo over 7 years

    For me did not work in windows 7 and phantomjs-prebuilt 2.1.13