Tracking Code Into a PDF or PostScript File

Solution 1

The PDF standard includes support for JavaScript, but as @Wes Hardaker pointed out, not every PDF reader supports it. Still, partial coverage is sometimes better than none.

See Adobe's official Acrobat JavaScript Scripting Guide. The part most likely to interest you is the doc object, which has a method called getURL(). To use it, you'd just call:

app.doc.getURL('http://www.google.com/');

Bind that call to the document's open event and you've got a tracker. I'm not too familiar with creating events from within Adobe Acrobat, but from code it's pretty easy. The code below is a full working VS2010 C# WinForms app that uses the open source library iTextSharp (5.1.1.0). It creates a PDF and adds the JavaScript to the open event.

Some notes:

First, Adobe Acrobat and Reader will both warn the user whenever a document accesses an external resource, and most other PDF readers will probably do the same. This is very annoying, and for this reason alone it shouldn't be done. Personally, I don't care if someone tracks my document opens; I just don't want to get a prompt every time.

Second, just to reiterate, this code works for Adobe Acrobat and Adobe Reader, probably as far back as V6, but may or may not work in other PDF readers.

Third, there's no safe way to uniquely identify the user. Doing so would require creating and storing some equivalent of a "cookie", which would mean writing to the user's file system, and that would be considered unsafe. This means that you can only detect the number of opens, not unique opens.

Fourth, this might not be legal everywhere. Some jurisdictions require that you notify users if you are tracking them and give them a way to see what information you are collecting.

But with all of the above, I can't not give an answer just because I don't like it.

using System;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //File that we will create
            string OutputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Events.pdf");

            //Standard PDF creation setup
            using (FileStream fs = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
                    {
                        //Open our document for writing
                        doc.Open();

                        //Create an action that points to the built-in app.doc object and calls the getURL method on it
                        PdfAction act = PdfAction.JavaScript("app.doc.getURL('http://www.google.com/');", writer);

                        //Set that action as the document's open action
                        writer.SetOpenAction(act);

                        //The PDF needs at least some content to be valid
                        doc.Add(new Paragraph("Hello"));

                        //Close the document
                        doc.Close();
                    }
                }
            }

            this.Close();
        }
    }
}
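
As a rough illustration of the third note, here is a minimal JavaScript sketch (meant to run in Node.js, not inside the PDF) of what open-counting amounts to. The endpoint http://example.com/open, the "doc" query parameter, and all three function names are assumptions made up for this example; none of them come from iTextSharp or Acrobat. The generated script string is the kind of value that could be passed to PdfAction.JavaScript in place of the hard-coded one above.

```javascript
// Sketch only: the endpoint and the "doc" parameter name are hypothetical.
const TRACKING_ENDPOINT = 'http://example.com/open';

// Build the URL that the PDF's open action would request for one document.
function buildTrackingUrl(endpoint, docId) {
  const url = new URL(endpoint);
  url.searchParams.set('doc', docId); // identifies the document, not the reader
  return url.toString();
}

// Build the script string for the open action.
// JSON.stringify quotes and escapes the URL so it stays inside the string literal.
function buildOpenActionScript(trackingUrl) {
  return 'app.doc.getURL(' + JSON.stringify(trackingUrl) + ');';
}

// Server side: record one hit. `counts` is a Map from document id to open count.
function recordOpen(counts, requestUrl) {
  const docId = new URL(requestUrl).searchParams.get('doc');
  if (docId === null) {
    return counts; // not a tracking request, ignore it
  }
  counts.set(docId, (counts.get(docId) || 0) + 1);
  return counts;
}
```

Because the request carries only a document identifier, two opens by one person and one open each by two people look identical to the server. That is the "number of opens, not unique opens" limitation in practice.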

Solution 2

The problem with tracking techniques like this is that they can never be fully reliable.

First, it's a security violation to trigger an external event, and the software writers likely wouldn't support it (or at least, I hope not).

Second, it depends on things like the network. What happens when someone downloads the file and then reads it offline on a plane, for example? You won't get the notification.

Third, there are multiple ways to read PDF files. Some people read them with readers you've likely not heard of (my favorite is a Linux application that I like much better than Adobe's AcroRead).

So the real answer is "no". Even if the software supported it (and I'd argue you shouldn't do it, though that's not answering your question), it still wouldn't be reliable.
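
If a tracking script is attempted anyway, the points above suggest it should at least fail quietly when the reader does not cooperate. The following is a minimal sketch under that assumption. The function name tryTrackOpen is hypothetical, and the `doc` parameter stands in for whatever document object a reader exposes to scripts. Whether a given reader runs scripts at all, or throws when a request is refused, is not something this sketch can guarantee.

```javascript
// Sketch only: `doc` is a stand-in for the reader's document object.
// Returns true if the request was attempted, false if it could not be.
function tryTrackOpen(doc, trackingUrl) {
  if (!doc || typeof doc.getURL !== 'function') {
    return false; // this reader does not expose getURL
  }
  try {
    doc.getURL(trackingUrl);
    return true;
  } catch (err) {
    return false; // the call was refused or failed; stay silent
  }
}
```

A return value of true only means the call was made. It says nothing about whether the request reached the server, so an offline reader would still go uncounted.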

Author: speedplane

Updated on June 13, 2022

Comments

  • speedplane
    speedplane almost 2 years

    Is there a way to track when a PDF is opened? Perhaps by embedding some script into the pdf itself?

    I saw the question below, and I suppose the answer is "no" for JavaScript, but I am wondering if this is possible at all.

    Google analytics tracking code insert in pdf file

  • speedplane
    speedplane over 12 years
    Obviously there are privacy and reliability concerns with any type of tracking. I'm not debating that. But why do you say "the real answer is 'no'"? Don't the newer PDFs have dynamic content? I think they have some sort of scripting capabilities which may support something like this.
  • KenS
    KenS over 12 years
    Not all readers support all possible content of PDF files. As Wes pointed out, just because you can do something with Acrobat doesn't mean it'll work in Foxit, Ghostscript, MuPDF, etc.
  • Wes Hardaker
    Wes Hardaker over 12 years
    And the "no" is because, though I admit I'm not a true PDF or PS expert, the PDF language as currently supported does not let a document query external entities (i.e., you can't say "go grab this pixel image from this remote website so I can track you"). PDFs are, by design, supposed to be self-contained.
  • speedplane
    speedplane over 12 years
    Wow, this is the answer I was hoping for but didn't expect. Thanks!
  • Sjoerd
    Sjoerd over 6 years
    The question is about PDF and your answer is about PostScript. Does PostScript actually run within a PDF? Do you have any source with more information about their relation?
  • Shark8
    Shark8 over 6 years
    (a) The title explicitly mentions PostScript, and (b) the relation between PostScript and PDF is essentially that PDF is [the result of] PS run through its processing/program; this thread on stackexchange is really informative: tex.stackexchange.com/questions/217511/…