AWS Lambda function: converting PDF to images

Source: Internet

I am developing an application where users can upload drawings in PDF format. Uploaded files are stored on S3. After uploading, the files have to be converted to images. For this purpose I have created a Lambda function which downloads the file from S3 to the /tmp folder in the Lambda execution environment and then calls the ‘convert’ command from ImageMagick.

convert sourceFile.pdf targetFile.png

The Lambda runtime environment is Node.js 4.3. Memory is set to 128 MB and the timeout to 30 seconds.

Now the problem is that some files are converted successfully while others are failing with the following error:

{ [Error: Command failed: /bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png
convert: `%s' (%d) "gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/tmp/magick-QRH6nVLV--0000001" "-f/tmp/magick-B610L5uo" "-f/tmp/magick-tIe1MjeR" @ error/utility.c/SystemCommand/1890.
convert: Postscript delegate failed `/tmp/sourceFile.pdf': No such file or directory @ error/pdf.c/ReadPDFImage/678.
convert: no images defined `/tmp/targetFile.png' @ error/convert.c/ConvertImageCommand/3046.
] killed: false, code: 1, signal: null, cmd: '/bin/sh -c convert /tmp/sourceFile.pdf /tmp/targetFile.png' }

At first I did not understand why this happens, so I tried converting the problematic files on my local Ubuntu machine with the same command. This is the output from the terminal:

**** Warning: considering '0000000000 XXXXX n' as a free entry.
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Mac OS X 10.10.5 Quartz PDFContext <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.

So the message was very clear, but the file gets converted to PNG anyway. If I run convert source.pdf target.pdf and then convert target.pdf image.png, the file is repaired and converted without any errors. This doesn’t work on Lambda.
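For reference, the two-step workaround that succeeds locally could be scripted roughly as follows. This is only a sketch: the helper names buildRepairSteps and runSteps are made up for illustration, the file names are placeholders, and it assumes ImageMagick's convert is on the PATH. As noted, the same sequence still fails on Lambda.

```javascript
var execFile = require('child_process').execFile;

// Build the two ImageMagick invocations described above:
// 1) PDF -> PDF (rewrites the file), 2) rewritten PDF -> PNG.
// (Hypothetical helper, not part of any library.)
function buildRepairSteps(sourcePdf, repairedPdf, targetPng) {
    return [
        { cmd: 'convert', args: [sourcePdf, repairedPdf] },
        { cmd: 'convert', args: [repairedPdf, targetPng] }
    ];
}

// Run the steps one after another and stop at the first failure.
function runSteps(steps, callback) {
    if (steps.length === 0) {
        return callback(null);
    }
    var step = steps[0];
    execFile(step.cmd, step.args, function (err) {
        if (err) {
            return callback(err);
        }
        runSteps(steps.slice(1), callback);
    });
}

// Example call (nothing runs until this is uncommented):
// runSteps(buildRepairSteps('source.pdf', 'target.pdf', 'image.png'), function (err) {
//     console.log(err ? 'conversion failed' : 'conversion finished');
// });
```

execFile is used instead of exec so the file names are passed as separate arguments and never go through a shell.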

Since the same thing works in one environment but not in the other, my best guess is that the version of Ghostscript is the problem. The version installed on the AMI is 8.70. On my local machine the Ghostscript version is 9.18.

My questions are:

  • Is the Ghostscript version the problem? Is this a bug in the older version of Ghostscript? If not, how can I tell Ghostscript (with or without ImageMagick) to repair or ignore errors the way it does in my local environment?
  • If the old version is the problem, is it possible to build Ghostscript from source, create a Node.js module, and then use that version of Ghostscript instead of the installed one?
  • Is there an easier way to convert PDF to images without using ImageMagick and Ghostscript?

UPDATE: Relevant part of the Lambda code:

var exec = require('child_process').exec;
var AWS = require('aws-sdk');
var fs = require('fs');
...

var localSourceFile = '/tmp/sourceFile.pdf';
var localTargetFile = '/tmp/targetFile.png';

var writeStream = fs.createWriteStream(localSourceFile);
writeStream.write(body);
writeStream.end();

writeStream.on('error', function (err) {
    console.log("Error writing data from s3 to tmp folder.");
    context.fail(err);
});

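writeStream.on('finish', function () {
    // The source file is fully written to /tmp; run ImageMagick on it.
    var cmd = 'convert ' + localSourceFile + ' ' + localTargetFile;

    exec(cmd, function (err, stdout, stderr) {

        if (err) {
            console.log("Error executing convert command.");
            // Return here so the stderr check below cannot call context.fail a second time.
            return context.fail(err);
        }

        if (stderr) {
            // Exit code was 0 but something was printed to stderr.
            console.log("Command executed successfully but returned error.");
            context.fail(stderr);
        } else {
            //file converted successfully - do something...
        }
    });
});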

2 answers

#1


3  

You can find a compiled version of Ghostscript for Lambda in the following repository. You should add the files to the zip file that you are uploading as the source code to AWS Lambda.

https://github.com/sina-masnadi/lambda-ghostscript

This is an npm package to call Ghostscript functions:

https://github.com/sina-masnadi/node-gs

After copying the compiled Ghostscript files to your project and adding the npm package, you can use the executablePath('path to ghostscript') function to point the package to the compiled Ghostscript files that you added earlier.
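If you would rather not go through the npm package, a bundled binary can also be called with plain child_process. The following is only a sketch: the folder layout (lambda-ghostscript/bin/gs under the function's directory), the helper names, and the 150 dpi value are assumptions for illustration, so adjust them to wherever you actually placed the files. The switches themselves are standard Ghostscript options.

```javascript
var execFile = require('child_process').execFile;
var path = require('path');

// ASSUMPTION: the compiled files were copied into a "lambda-ghostscript"
// folder inside baseDir and the binary sits at bin/gs. Check your own layout.
function bundledGsPath(baseDir) {
    return path.join(baseDir, 'lambda-ghostscript', 'bin', 'gs');
}

// Standard Ghostscript switches for non-interactive rendering to PNG.
// A "%d" in outputPattern is replaced by the page number.
function buildGsArgs(inputPdf, outputPattern, dpi) {
    return [
        '-dBATCH', '-dNOPAUSE', '-dQUIET',
        '-sDEVICE=png16m',
        '-r' + dpi,
        '-sOutputFile=' + outputPattern,
        inputPdf
    ];
}

// Run the bundled binary; callback receives (err, stdout, stderr).
function renderPdf(gsPath, inputPdf, outputPattern, dpi, callback) {
    execFile(gsPath, buildGsArgs(inputPdf, outputPattern, dpi), callback);
}

// Example call inside a handler (nothing runs until this is uncommented):
// renderPdf(bundledGsPath(__dirname), '/tmp/sourceFile.pdf', '/tmp/page-%d.png', 150,
//     function (err, stdout, stderr) { /* handle result */ });
```

Running the binary once with the --version switch is a quick way to confirm that the bundled copy, and not the 8.70 one installed on the AMI, is the one being picked up.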

#2


0  

It's almost certainly a bug, or perhaps a limitation, in the older version of Ghostscript.

Many PDF producers create PDF files which do not conform to the specification, and yet will open without complaint in Adobe Acrobat. Ghostscript endeavours to do the same, but obviously we can't know what Acrobat is going to allow, so we are continually chasing this nebulous target. (FWIW, that warning is legitimate: the PDF file really is out of spec.)

There's nothing you can do with the old version other than replace it.

Yes, you can build Ghostscript from source. I have no idea about a Node.js module, and I'm not sure why that's relevant.

There are numerous other applications which will render a PDF file; MuPDF is one I know of. And, of course, you can use Ghostscript directly without using ImageMagick. If you can load another application, though, then you should simply be able to replace your Ghostscript installation too.
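As an illustration of the MuPDF route, a call to its mutool draw command from Node.js might look like the sketch below. It assumes a mutool binary is available at the path you pass in, which is not the case on Lambda by default; the helper names and paths are hypothetical.

```javascript
var execFile = require('child_process').execFile;

// Arguments for "mutool draw": -o is the output file (%d = page number),
// -r is the resolution in dpi.
function buildMutoolArgs(inputPdf, outputPattern, dpi) {
    return ['draw', '-o', outputPattern, '-r', String(dpi), inputPdf];
}

// mutoolPath is wherever the binary lives in your environment (assumption).
function renderWithMutool(mutoolPath, inputPdf, outputPattern, dpi, callback) {
    execFile(mutoolPath, buildMutoolArgs(inputPdf, outputPattern, dpi), callback);
}

// Example call (nothing runs until this is uncommented):
// renderWithMutool('mutool', '/tmp/sourceFile.pdf', '/tmp/page-%d.png', 150,
//     function (err, stdout, stderr) { /* handle result */ });
```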
