Edit *existing* PDF in a browser

45,182

Solution 1

Quick answer - no and it is quite unlikely you will find a cross-browser solution. It is very unlikely that you will find a PDF-perfect solution. Better to think about having the users edit HTML and generate the PDF at the server.

[Edit June 29th 2021- given this question is from 2017 you may think it is outdated and discount it. Well, as far as I am aware the answer is still relevant and every other week someone passes through and gives it an up-vote. But if you do find a good lib or util on your travels please come back and list it. Thanks.]

Long answer - the PDF format is both brilliant and fiendish at the same time. Brilliant because of its portability, but fiendish because of the internal structure and storage mechanisms. There is no friendly 'DOM' like with HTML. If we were starting out afresh to develop a portable document format it would not be PDF that we would choose. But PDF currently has too much momentum to be thrown away, period.

Younger viewers might be wondering how the hell this manic format got into its market leading position and where it came from. Well, when the founding fathers of PDF were laying down the design, before XML, JSON, HTML and even the Internet, they weren't working with today's document sharing in mind. They were working on a better way to encode printing instructions - the PostScript printer driver concept. These were never expected to be edited before the printer consumed them, and they were worthless for any other purpose. Then someone noticed that you could interpret the PostScript drawing instructions to a screen, and subsequently someone spotted the fantastic potential to employ this as a transportable, cross-device display concept. And here we are.

Back to the question - to edit a PDF in any meaningful GUI way, you would need to unpack the PDF and render the components (images, formatted text, pages) to the display device; then allow folks to mess with the layout; then re-pack the PDF. You would have to do this perfectly in line with the PDF standards otherwise you may find the downstream consumers of your edited PDF file crash or are unable to render it. You would have to cater for the various Acrobat standard levels, and the shortcuts and bloats that the editing package (Word, Illustrator, InDesign) vendors chuck into the PDF file; layers, thumbnails, etc.

Then we come to colors. Have a read of the PDF spec and you will see that there is an array of colorspace options that the original PDF producer can decide to use. You would have to interpret these to a reasonable device color on the screen and back, etc.

And then fonts. Fonts might be embedded, subset, or not embedded at all. To keep fidelity with the PDF you will need to realise the glyphs as vector graphics on your drawing surface at the scale defined in the PDF. This mostly means utilising some kind of platform-dependent type library, which is tricky cross-platform. On top of that, you will need to license the fonts for appropriate use, which can be pricey for the fonts most people want to use to look hip and professional.

Given the layering, scaling and rotating features in PDF, you would likely be looking at an html canvas as the drawing surface. Anyone who knows will tell you that in the world of canvas you are pretty much on your own for word-processing type functions.

Not impossible but hard.

Components that render PDF to a display are largely acting as print drivers, slavishly obeying the PDF drawing instructions, and usually generating a raster or sometimes an SVG graphic. This is a one-way street - they read and draw, but there is no sense of 'handles' to the objects drawn. No handles means no manipulation, and these guys certainly have little intention of letting you modify and write back.

You will find many 'save to pdf' products. When client-side they will be leaning toward grabbing a set of pixels and dumping a raster graphic into a file with the thinnest veneer of 'PDF' definition wrapped around it. Where they are server based then they can be quite powerful - there are plenty of tools like Aspose, and ABCPDF that truly offer some PDF wrangling server side - but this is not what you are looking for in your OP.

Summary - very complicated subject. If anything ever emerges as a potential it will likely have many constraints in terms of the PDF features covered and thus restrictions on what it can safely edit.

If you are looking for online editing of documents that are ultimately exported as PDF, then a way forward is to keep an HTML version of the document source and have the user edit this with TinyMCE, CKEditor, etc., then use one of the server-side tools to take the saved source HTML and render it out to PDF. Tools like ABCPDF render HTML faithfully and let you add images, headers and footers, page numbers, etc.

This is a pragmatic answer to your (assumed) need, though it still has some trade-offs in terms of the font (licencing) issues, clunkiness of browser-based editors, all-round weirdness of the HTML laid down by some HTML editing components, etc. But it IS viable.

Final thoughts - rethink the scope of what you need. If HTML editing and convert to PDF at server is usable for you it is a well-trodden path and you will find both free and commercial components for client and server to support it.

Edit: If you only need to annotate the PDF then things are much easier. On the server, you generate images of the pages of the document, send those to the client, display them to the user, let the user mark them up, send the co-ordinates of the annotations back to the server, and use a server-side PDF library to render the annotations into the PDF. It is achievable, though it requires various skillsets: server-side PDF-to-image manipulation, and client-side presentation and annotation capture.
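The co-ordinate capture step is where most people trip up, because a browser canvas puts its origin at the top-left with y growing downwards, while PDF user space normally puts it at the bottom-left with y growing upwards, measured in points. The following is a minimal sketch of that translation, not a definitive implementation. The type names and the `canvasRectToPdf` function are hypothetical, and it assumes an unrotated page rendered at a uniform scale with no crop-box offset.

```typescript
// Hypothetical types for this sketch; they do not come from any library.
interface CanvasRect {
  left: number;   // canvas pixels from the left edge
  top: number;    // canvas pixels from the top edge
  width: number;  // canvas pixels
  height: number; // canvas pixels
}

interface PdfRect {
  x: number;      // points from the left edge of the page
  y: number;      // points from the BOTTOM edge of the page
  width: number;  // points
  height: number; // points
}

/**
 * Convert a box drawn on the canvas into PDF user-space coordinates.
 * Assumptions: the page is unrotated, was rendered at a uniform `scale`
 * (canvas pixels per PDF point), and its origin is its bottom-left corner.
 */
function canvasRectToPdf(
  rect: CanvasRect,
  pageHeightPt: number,
  scale: number
): PdfRect {
  if (scale <= 0) {
    throw new RangeError("scale must be positive");
  }
  const width = rect.width / scale;
  const height = rect.height / scale;
  return {
    x: rect.left / scale,
    // Flip the y axis: measure up from the bottom of the page to the
    // bottom edge of the box.
    y: pageHeightPt - rect.top / scale - height,
    width,
    height,
  };
}

// Example: a US Letter page (612 x 792 pt) rendered at scale 2.
const box = canvasRectToPdf(
  { left: 100, top: 200, width: 300, height: 50 },
  792,
  2
);
console.log(box); // { x: 50, y: 667, width: 150, height: 25 }
```

If the viewer is pdf.js, its viewport object should offer a `convertToPdfPoint` helper that also accounts for rotation, which is worth checking before hand-rolling this.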

Edit: Readers may be interested in knowing if the picture I painted above has changed. As of Jan 2019 I stand by what I wrote. Suppliers are coming to the market with better tools and libraries that can do more than previously. However you still need to assess your needs and confirm their restrictions - it is likely that there will be some. No vendor I am aware of yet has a client-side, cross-browser, cross-device, full capability PDF editing lib for any PDF file - there is always some limitation. But I am happy to be corrected.

Solution 2

For future reference:

I found two libraries that enable you to edit existing PDFs in the browser to a certain extent. The second one isn't documented yet, so I don't know exactly what it does. It might be the solution for such a problem in the future.

Solution 3

Community:

Commercial:

Solution 4

Because other SO questions are being directed here, and considering how fast web technology advances (e.g. WASM), I am providing the following answer. Though PDFNetJS was able to do all this when the question was originally asked.

Since the requirement of "edit" was clarified to be "Basically what is needed is for users to open up a previously uploaded PDF, highlight or circle sections, and then save those annotations to the PDF back on the server." and "No text editing or manipulation of the document content needs to happen.", then yes, this is possible entirely client-side in any modern browser on any modern device.

PDFTron PDFNet SDK can do all this. A full fledged, out of the box document viewer is provided, with full annotation support. It is also possible to actually edit the PDF (change/replace text, redact, extract/add/replace images, and more). Not only are PDF files supported directly client side, but so are DOCX, PPTX, XLSX, PNG and JPG. Files can be loaded locally or remotely, and there is no need for slow base64 encoding/decoding.

Demo: http://www.pdftron.com/webviewer

Samples: http://www.pdftron.com/documentation/web/samples/universal-samples

The original question was also for support for Siebel and "PDFNetJS tries to retrieve a .mem file, which is some binary data. This cannot be served by the application I'm using (Siebel) so it doesn't look like this is an option.".

The .mem file is for PNaCl which is Chrome only, and this can be disabled. PDFTron for Web supports WASM and even emscripten, one of which, if not both, should then be compatible with Siebel.

Author: neilsimp1

Updated on June 29, 2021

Comments

  • neilsimp1
    neilsimp1 almost 3 years

    I have a web application that is currently getting a base64 representation of a PDF from the server. I'm able to use Mozilla's pdf.js to display this on a <canvas> and toggle through the pages with a dropdown.

    According to everything I've been able to find and Can Mozilla's pdf.js modify PDFs?, it's not possible to edit the PDF with pdf.js.

    I've found jsPDF, and while I'm able to take the canvas, call .toDataURL() on it for each page, and build a new PDF document from the results, there are two issues:

    1. The newly generated PDF will just be a series of images on each page, so any text in the original PDF will just be an image after I'm done with it.
    2. I generate a new PDF with jsPDF and then send the base64 of it back to pdf.js to display it on the canvas. Something happens between these steps where the images of the pages get scaled incorrectly, so each page takes up about 3/4 of the canvas after each new PDF change. I've been unable to get it to retain the same size/scale.

    jsPDF doesn't look like it has a way to load an existing PDF, it only creates new ones. pdfmake and PDFKit also look like they only create new PDF files.


    So my question:

    Is there anything that will allow for both viewing a pdf (from base64) and for making changes to it? Ideally I'd watch for changes to the canvas, then draw that change onto the pdf page. When done, export that to a base64 string to send back to the server.

  • neilsimp1
    neilsimp1 almost 7 years
    Basically what is needed is for users to open up a previously uploaded PDF, highlight or circle sections, and then save those annotations to the PDF back on the server. Due to the setup of the application, there's really nothing server side I can do besides send and receive the file's base64. My hopes were to take the PDF and draw an image of the highlights onto it. No text editing or manipulation of the document content needs to happen.
  • neilsimp1
    neilsimp1 almost 7 years
    Thank you for such an in-depth response, though. I'm going to see if we can't get the requirements here changed. If I don't find another answer to this soon I'll mark your answer as correct.
  • tklives
    tklives over 6 years
    Howdy! With regards to obtaining the coordinates of an 'annotation' box drawn 'over' a PDF (likely in a separate overlaying canvas), do you have any suggestions on how to accurately determine the PDF X,Y coords of the start of a drawn box (top left) along with the height/width of said box? I don't need to rewrite these to the PDF, just need to be able to get them and store them. Thanks in advance! :)
  • Vanquished Wombat
    Vanquished Wombat over 6 years
    @TimKelly That is a broad question. What tech domain are you working in? C#, php, etc?
  • tklives
    tklives over 6 years
    @VanquishedWombat - So, we don't need to actually add an "annotation" per se, we more so need to know the coords of a box drawn over a PDF, and I had read a bunch of posts claiming that the X,Y system was different in PDF than in a browser - but I'm just getting to develop that part of the app so it'll be interesting to find out! Basically, we're displaying a user uploaded PDF in the browser, and using Fabric.js, allowing the user to draw boxes over the PDF - with those coords/dimensions, we will later snip images out of the PDF. I'll post back if I run into specific issues. Thanks again!
  • Vanquished Wombat
    Vanquished Wombat over 6 years
    @TimKelly Those posts you saw are largely correct, but your library should provide some reasonable means of translating between them. Interested to know how you get along. Have fun.
  • Vanquished Wombat
    Vanquished Wombat almost 6 years
    Hi Ryan. WASM looks interesting as a generic technology - do you know what the adoption growth curve is like? And PDFTron looks powerful too - what are its limitations? Is it possible to capture the annotation details outside of the PDF being edited - so you could store in a history DB?
  • Ryan
    Ryan almost 6 years
    @VanquishedWombat "do you know what the adoption growth curve is like?" Not enough at the moment. We still use PNaCl on Chrome (it still offers better performance than Chrome WASM), and emscripten/asm.js as the final backup for IE11 (which is still popular with our typical customer). WASM is used on iOS 11 and Chrome 67+ for Android. However, if the Firefox implementation of WASM is any indication, the future is bright indeed. Startup times are really fast, and performance is great.
  • Ryan
    Ryan almost 6 years
    @VanquishedWombat "And PDFTron looks powerful too - what are its limitations?" For Web viewing? A server component is ideal, which we offer, which allows viewing+annotating on any modern device and browser, even older iOS and Android devices. This would be a hybrid solution, so if a user was on Chrome desktop, for example, then the server would just provide a few images to get started but then the client would take over completely and the server would be idle. While older devices would be assisted by the server more. Blog: pdftron.com/blog/webviewer/webviewer-3-2
  • Ryan
    Ryan almost 6 years
    @VanquishedWombat "Is it possible to capture the annotation details outside of the PDF being edited - so you could store in a history DB?" Yes, PDFTron uses the XML-based XFDF format, specified in the PDF ISO standard, for annotation data interchange. You can handle multiple users, real-time collaboration, and versioning using these XFDF strings, and they are compatible with any database technology. At any time, you can merge the XFDF data into the source PDF and provide users with a local annotated copy. You can also annotate images, text files, and office documents.
  • Vanquished Wombat
    Vanquished Wombat almost 6 years
    Sounds impressive - is there a free-to-use / open source option, or is this paid-for licensing? How does that work?
  • Ryan
    Ryan almost 6 years
    @VanquishedWombat Yes, there is Xodo, a free app available in all the stores (4.7/5 rating on Android for example), plus a fully web-based version that is also completely free. See www.xodo.com. For commercial use there are currently no free-to-use solutions, though PDFTron is looking at offering something for very specific use cases. As for licensing, there are lots of options. While the site advertises the well-known enterprise customers, there are lots of small and medium businesses too. I would encourage anyone interested to get in contact with sales and learn what the options are.
  • Vanquished Wombat
    Vanquished Wombat about 5 years
    Ryan - how do you see the future of WASM vs. compiled JavaScript? It's an intriguing case currently, where the better tech (WASM) might just not get enough traction to achieve full adoption.
  • Ryan
    Ryan about 5 years
    WASM has a bright future, not only in the browser, but server side (e.g. with NodeJS). I would not see it as versus javascript, but as a parallel (complementary) technology. For vendors like PDFTron that need high performance rendering (HTML5 canvas does not cover the PDF standard) WASM is very exciting. As for "full" adoption, sure, but as long as there are fallbacks, or those cases are edge, not really an issue. Note that Chrome PNaCl is still a great tech (Google has already delayed PNaCl deprecation a year, and I suspect they will delay again), since it is faster than WASM still.
  • Vidal
    Vidal over 4 years
    @Ryan I tried to find an example on pdfTron to be able to replace text or change it, I did not find it, also where I can see some examples of the docx viewer/editor?
  • Ryan
    Ryan over 4 years
    @Vidal "replace text or change it" It sounds like you are asking about a full editor, like MS Word for DOCX files, which is not available. This SO question meant "edit" with regards to annotating (see comments to question above). "docx viewer/editor" you can open office files in this demo: pdftron.com/webviewer/demo
  • Vidal
    Vidal over 4 years
    @Ryan for me, to edit a file is to change it. I saw that pdftron does not edit; it is just a layer on top. That's why I asked. Thanks for the response.
  • Paul
    Paul over 3 years
    Any free option for an open source non profit project?
  • Sharcoux
    Sharcoux over 2 years
    I'm using pdf-lib. It's very good for drawing elements over an existing pdf, or creating a new pdf from scratch. The documentation is quite clear and the API is friendly enough. But pdf is such an obscure format that there is still a long way to go, and the maintainer seems a bit overwhelmed by the success of his lib. I hope the community will come and rescue him so we can have a proper lib to handle pdf matters in javascript.
  • Pushkar
    Pushkar over 2 years
    @Sharcoux Does pdf-lib have functionality to add annotations and comments to pdfs from the browser side? I read the documentation but there is no explanation of that anywhere.
  • Sharcoux
    Sharcoux over 2 years
    I'm successfully using it with react-native-web, so yes, you can use it from the browser. But you'll still need to handle the UI. pdf-lib only provides a way to alter the file. You can use libs like react-pdf to draw the pdf.
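
For readers following the XFDF idea raised in the comments above (storing annotations outside the PDF and merging them later), here is a rough sketch of what serialising a captured box to an XFDF string might look like. The `SquareAnnotation` type and the `toXfdf` function are hypothetical names invented for this sketch. The element and attribute names (`square`, `page`, `rect`, `color`) reflect my understanding of XFDF and should be checked against the specification and against whichever library will consume the string.

```typescript
// Hypothetical shape for this sketch: one rectangle annotation in PDF points.
interface SquareAnnotation {
  page: number;                           // zero-based page index
  rect: [number, number, number, number]; // [left, bottom, right, top]
  color: string;                          // "#RRGGBB"
}

// Escape characters that must not appear raw inside an XML attribute value.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an XFDF document holding one <square> element per annotation.
// Assumption: a consumer accepts this minimal attribute set.
function toXfdf(annotations: SquareAnnotation[]): string {
  const annots = annotations
    .map(
      (a) =>
        `<square page="${a.page}" rect="${a.rect.join(",")}" ` +
        `color="${escapeXmlAttr(a.color)}"/>`
    )
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">` +
    `<annots>${annots}</annots>` +
    `</xfdf>`
  );
}

// Example: one red box on the first page, stored as a plain string.
const xfdf = toXfdf([{ page: 0, rect: [50, 667, 200, 692], color: "#FF0000" }]);
console.log(xfdf);
```

Because the result is just a string, it fits the constraint in the question of only being able to send and receive text, and it can be versioned per user in any database.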