AWS Lambda function - convert PDF to Image


You can find a compiled version of Ghostscript for Lambda in the following repository. You should add the files to the zip file that you are uploading as the source code to AWS Lambda.

https://github.com/sina-masnadi/lambda-ghostscript

This is an npm package to call Ghostscript functions:

https://github.com/sina-masnadi/node-gs

After copying the compiled Ghostscript files to your project and adding the npm package, you can use the executablePath('path to ghostscript') function to point the package to the compiled Ghostscript files that you added earlier.
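
As a rough illustration of that last step, the sketch below skips the npm wrapper and calls the bundled binary directly through Node's built-in `child_process.execFile`. The binary location (`lambda-ghostscript/bin/gs` under the function's code directory), the `png16m` device, and the 150 dpi default are assumptions made for the example, not details confirmed by either repository; adjust them to wherever you actually copied the files. It also assumes a Node.js runtime newer than the 4.3 mentioned in the question.

```javascript
const path = require('path');
const { execFile } = require('child_process');

// Assumed location of the bundled binary inside the deployment package;
// change this to wherever you copied the compiled Ghostscript files.
const GS_PATH = path.join(
  process.env.LAMBDA_TASK_ROOT || process.cwd(),
  'lambda-ghostscript', 'bin', 'gs'
);

// Build the Ghostscript argument list for rendering a PDF to PNG files.
// outputPattern may contain %d so that each page gets its own file.
function buildGsArgs(inputPath, outputPattern, dpi) {
  return [
    '-dBATCH',
    '-dNOPAUSE',
    '-dSAFER',
    '-sDEVICE=png16m',
    '-r' + dpi,
    '-sOutputFile=' + outputPattern,
    inputPath,
  ];
}

// Run the bundled Ghostscript; resolves with its stdout on success.
function convertPdfToPng(inputPath, outputPattern, dpi = 150) {
  return new Promise((resolve, reject) => {
    const args = buildGsArgs(inputPath, outputPattern, dpi);
    execFile(GS_PATH, args, (err, stdout) => {
      if (err) return reject(err);
      resolve(stdout);
    });
  });
}
```

A handler could call `convertPdfToPng('/tmp/sourceFile.pdf', '/tmp/page-%d.png')` after downloading the file from S3 and then upload the generated images.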

Author by

A.Z.

Software Developer

Updated on June 04, 2022

Comments

  • A.Z.
    A.Z. almost 2 years

    I am developing an application where users can upload drawings in PDF format. Uploaded files are stored on S3. After uploading, the files have to be converted to images. For this purpose I have created a Lambda function which downloads the file from S3 to the /tmp folder in the Lambda execution environment and then calls the ‘convert’ command from ImageMagick.

    convert sourceFile.pdf targetFile.png

    The Lambda runtime environment is Node.js 4.3. Memory is set to 128 MB, timeout to 30 seconds.

    Now the problem is that some files are converted successfully while others are failing with the following error:

    { [Error: Command failed: /bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png
    convert: `%s' (%d) "gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/tmp/magick-QRH6nVLV--0000001" "-f/tmp/magick-B610L5uo" "-f/tmp/magick-tIe1MjeR" @ error/utility.c/SystemCommand/1890.
    convert: Postscript delegate failed `/tmp/sourceFile.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/678.
    convert: no images defined `/tmp/targetFile.png' @ error/convert.c/ConvertImageCommand/3046.
    ] killed: false, code: 1, signal: null, cmd: '/bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png' }

    At first I did not understand why this happens. Then I tried to convert the problematic files on my local Ubuntu machine with the same command. This is the output from the terminal:

    **** Warning: considering '0000000000 XXXXX n' as a free entry.
    **** This file had errors that were repaired or ignored.
    **** The file was produced by:
    **** >>>> Mac OS X 10.10.5 Quartz PDFContext <<<<
    **** Please notify the author of the software that produced this
    **** file that it does not conform to Adobe's published PDF
    **** specification.

    So the message is very clear, but locally the file gets converted to PNG anyway. If I first run convert source.pdf target.pdf and then convert target.pdf image.png, the file is repaired and converted without any errors. Neither approach works on Lambda.

    Since the same thing works in one environment but not in the other, my best guess is that the version of Ghostscript is the problem. The version installed on the AMI is 8.70. On my local machine the Ghostscript version is 9.18.
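
One way to confirm that guess is to log the Ghostscript version from inside the function itself. The following is a minimal sketch, assuming a `gs` binary is on the PATH and a Node.js runtime newer than 4.3; `parseGsVersion`, `isAtLeast`, and `logGsVersion` are hypothetical helper names, and 9.18 is used as a threshold only because it is the version that worked locally, not because it is a known minimum.

```javascript
const { execFile } = require('child_process');

// Parse a version string such as "9.18" or "8.70" into [major, minor].
function parseGsVersion(text) {
  const match = /(\d+)\.(\d+)/.exec(text);
  if (!match) throw new Error('Unrecognized Ghostscript version: ' + text);
  return [Number(match[1]), Number(match[2])];
}

// True when `actual` is the same as or newer than `required`.
function isAtLeast(actual, required) {
  const [aMajor, aMinor] = parseGsVersion(actual);
  const [rMajor, rMinor] = parseGsVersion(required);
  return aMajor > rMajor || (aMajor === rMajor && aMinor >= rMinor);
}

// Ask whichever `gs` is on the PATH for its version and log the comparison.
function logGsVersion(required) {
  execFile('gs', ['--version'], (err, stdout) => {
    if (err) return console.log('Could not run gs:', err.message);
    const found = stdout.trim();
    console.log('Ghostscript ' + found + ' is at least ' + required + ': ' +
      isAtLeast(found, required));
  });
}
```

Calling `logGsVersion('9.18')` at the start of the handler would write the result to the CloudWatch logs.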

    My questions are:

    • Is the version of Ghostscript the problem? Is this a bug in the older version of Ghostscript? If not, how can I tell Ghostscript (with or without ImageMagick) to repair or ignore errors like it does in my local environment?
    • If the old version is the problem, is it possible to build Ghostscript from source, create a Node.js module, and then use that version of Ghostscript instead of the one that is installed?
    • Is there an easier way to convert PDF to image without using ImageMagick and Ghostscript?

    UPDATE: Relevant part of the Lambda code:

    var exec = require('child_process').exec;
    var AWS = require('aws-sdk');
    var fs = require('fs');
    ...
    
    var localSourceFile = '/tmp/sourceFile.pdf';
    var localTargetFile = '/tmp/targetFile.png';
    
    // Write the S3 object body to the local /tmp folder.
    var writeStream = fs.createWriteStream(localSourceFile);
    writeStream.write(body);
    writeStream.end();
    
    writeStream.on('error', function (err) {
        console.log("Error writing data from s3 to tmp folder.");
        context.fail(err);
    });
    
    // Run the conversion only after the file has been fully written.
    writeStream.on('finish', function () {
        var cmd = 'convert ' + localSourceFile + ' ' + localTargetFile;
    
        exec(cmd, function (err, stdout, stderr) {
    
            if (err) {
                console.log("Error executing convert command.");
                context.fail(err);
                return; // do not fall through to the stderr check
            }
    
            if (stderr) {
                console.log("Command executed successfully but returned error.");
                context.fail(stderr);
            } else {
                //file converted successfully - do something...
            }
        });
    });
    
  • A.Z.
    A.Z. over 7 years
    Thank you very much for your answer, Ken. Unfortunately, I cannot replace the GS installation. This is specific to AWS: you can only execute code and use packages that come preinstalled. Everything else, including npm modules, has to be deployed with your application. I could probably create an EC2 instance and install the version that I need, but that would be overkill just to convert PDF files, I guess. Building from source and creating a node module is relevant because I would be able to deploy the compiled libraries with my code and use them instead of the installed ones.
  • KenS
    KenS over 7 years
    Be aware of the fact that Ghostscript is AGPL-licenced. If you build Ghostscript into an application, you'll need to be sure you are complying with the licence.
  • A.Z.
    A.Z. over 7 years
    Thank you for noticing that. Obtaining a commercial license would not be a problem if I have to include a specific version of GS with my Lambda function.
  • Alexyu
    Alexyu over 7 years
    We have solved some flakiness in ImageMagick by switching to GraphicsMagick instead, which can be used through the 'gm' module in Node. We had to compile it ourselves on Amazon Linux for deployment. But you could try it locally first to see if it works on your PDFs. Just offering that as another possible path.
  • A.Z.
    A.Z. over 7 years
    @ToddPrice thank you for your contribution; your comment points to how it should be done. The thing is, the 'gm' node module still uses Ghostscript. I thought that the issues with PDF files were somehow connected with the 'gm' node module, which is why I chose to generate the convert command myself. I was wrong, of course. Did you compile Ghostscript with your node modules, or did you just deliver the node module ('gs' in this particular case) with your code and use the existing Ghostscript installation?
  • A.Z.
    A.Z. over 7 years
    From the GM readme: "GraphicsMagick requires Ghostscript software (version 9.04 recommended) to read the Postscript or the Portable Document Format (PDF)."
  • Alexyu
    Alexyu over 7 years
    @A.Z. a colleague actually did the compiling work but he doesn't recall having to wrestle with ghostscript in our case. It could be that the PDF document errors wouldn't be resolved by GraphicsMagick alone. Have you tried creating a simple node.js script locally that uses the 'gm' module with GraphicsMagick to do the task that fails in Lambda?
  • A.Z.
    A.Z. over 7 years
    @ToddPrice Yes, I did try it locally and it works fine. I am sure that the problem is the old version of GS installed in the Lambda environment; I was just hoping that calling GS with some 'special' params could solve the issue. The old version of GS cannot repair PDF errors that the newer version can. I have to find an alternative solution or wait for Amazon to update the AMI on Lambda.
  • A.Z.
    A.Z. almost 7 years
    Hi Sina, thank you for your answer. This is actually quite useful, but I worked around my issue a long time ago. I will give it a try and let you know how it works. However, as @KenS noted, this kind of distribution may not comply with the free Ghostscript license if the source code of the target Lambda function is not open source. artifex.com/licensing
  • Sina Masnadi
    Sina Masnadi almost 7 years
    @A.Z. Thanks for your comment about licensing! Do you know any library that doesn't need licensing?
  • Alvaro Artano
    Alvaro Artano almost 6 years
    @A.Z. Could you explain what workaround you found for this problem? We are facing the same issue now.
  • A.Raza
    A.Raza over 4 years
    Using this package, I am getting timeout issues. It keeps trying to run the command, and in the end the CloudWatch logs show a timeout error.
  • A.Raza
    A.Raza over 4 years
    @SinaMasnadi any help? I am getting timeout issues if I write the whole command in a single .option() call. If I pass every option in a separate call, I get the following error in the logs:

    2019-08-06T09:40:45.192Z 6a66a9e2-3d6e-4812-8bb8-fa8ecfc630cd stdout: GPL Ghostscript 9.20 (2016-09-26)
    Copyright (C) 2016 Artifex Software, Inc. All rights reserved.
    This software comes with NO WARRANTY: see the file PUBLIC for details.
    Error: /undefinedfilename in (| grep "Page" | wc -l 2>/dev/null)
  • A.Raza
    A.Raza over 4 years
    Actually, this is the command we used to run before the Lambda AMI update:

    gs -dNODISPLAY -dBATCH -dNOPAUSE -o /dev/null /tmp/abc.pdf | grep "Page" | wc -l 2>/dev/null

    We were using GS to count the number of pages in a PDF file. Please help; this is damaging the business, as we are not able to process our PDF files.
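
The /undefinedfilename error in the log above suggests that the pipe and everything after it reached gs as a file name argument instead of being interpreted by a shell. That is an inference from the log, not something confirmed in the thread. Under that assumption, one possible approach is to run gs on its own with the same switches and do the `grep "Page" | wc -l` part in JavaScript. In this sketch `gsPath` is a hypothetical parameter for the path to the bundled binary, and `countPageLines` and `countPdfPages` are illustrative names.

```javascript
const { execFile } = require('child_process');

// Count the lines that contain "Page", mirroring `grep "Page" | wc -l`.
function countPageLines(output) {
  return output.split('\n').filter((line) => line.includes('Page')).length;
}

// Run gs without a shell and count the pages reported on its stdout.
// gsPath is hypothetical: pass the path to your bundled Ghostscript binary.
function countPdfPages(gsPath, pdfPath) {
  const args = ['-dNODISPLAY', '-dBATCH', '-dNOPAUSE', '-o', '/dev/null', pdfPath];
  return new Promise((resolve, reject) => {
    execFile(gsPath, args, (err, stdout) => {
      if (err) return reject(err);
      resolve(countPageLines(stdout));
    });
  });
}
```

Because `execFile` passes each argument to the process as-is, no shell syntax such as `|` or `2>/dev/null` is needed or interpreted.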