Remove text from pdf

Solution 1

You can try pdftk, but it works only a fraction of the time, due to (I believe) a problem with fonts.

It works like this: first you need to uncompress the pdf file,

  pdftk myfile.pdf output unc.pdf uncompress

then you modify it with

  sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf

lastly you recompress it with

  pdftk mod_unc.pdf output myfile_modified.pdf compress

I have had only moderate success with this approach: sometimes it works and sometimes it doesn't, with no obvious pattern.
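One way to tell in advance whether the sed step can work is to check that the string is literally present in the uncompressed file. PDFs often store text hex-encoded, split into kerned fragments, or mapped through a subset font's own encoding, and in those cases sed has nothing to match. Below is a minimal sketch of such a check. The function names count_literal_matches and replace_literal are made up for illustration, and the sketch assumes that the strings contain no / or regex metacharacters and that grep supports -a (GNU and BSD grep do).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pdftk or sed.

# Print the number of lines in FILE ($1) that contain the fixed string $2.
# -a treats the binary PDF as text, -F disables regex interpretation.
# grep exits non-zero when nothing matches, hence the "|| true".
count_literal_matches() {
    LC_ALL=C grep -a -c -F -- "$2" "$1" || true
}

# Replace OLD ($3) with NEW ($4) in SRC ($1), writing the result to DST ($2).
# Refuses to run if OLD never occurs literally, since sed would change nothing.
# Assumes OLD and NEW contain no "/" and no regex metacharacters.
replace_literal() {
    src=$1 dst=$2 old=$3 new=$4
    if [ "$(count_literal_matches "$src" "$old")" -eq 0 ]; then
        echo "string not found literally; sed would change nothing" >&2
        return 1
    fi
    # LC_ALL=C makes sed work on raw bytes instead of locale-decoded characters.
    LC_ALL=C sed "s/$old/$new/g" < "$src" > "$dst"
}
```

With these functions loaded in the shell, the middle step of the sequence above would become:

  pdftk myfile.pdf output unc.pdf uncompress
  replace_literal unc.pdf mod_unc.pdf oldstring newstring
  pdftk mod_unc.pdf output myfile_modified.pdf compress

If replace_literal reports that the string was not found, the text is probably stored in one of the encoded forms mentioned above and this method will not help for that file.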

Solution 2

On Windows (possibly in a virtual machine) you can install PDF-XChange Editor: https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor

The free version can remove text (but not add text) without adding a watermark; the software itself tells you so.

I had to remove several pieces of text, so sed was too time-consuming, and it did not work with umlauts.

Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF


Author: DrYap

Updated on September 18, 2022

Comments

  • DrYap, over 1 year ago

    I have a pdf file with some text on each page which I would like to remove.

    The text is matched by a regex and I think it comes in one block of the pdf.

    I have used pdfedit to select and delete the text with the GUI but I was looking for a way to do this from the terminal.

    • mirh, about 5 years ago
      pdfedit-tools exists, you know
  • DrYap, over 10 years ago
    I gave it a go but the uncompress turned most of the text to white which never got recovered. Funnily the only text that wasn't white was what I'm trying to get rid of!!! Thanks for the suggestion.
  • slhck, about 10 years ago
    The OP is on Linux and they said they already used PDFEdit. Please read the question before posting an answer.
  • Eugene Pakhomov, almost 6 years ago
    One can also use qpdf: qpdf --stream-data=uncompress myfile.pdf unc.pdf.
  • Maxim, over 5 years ago
    Marius, pdftk is no longer open source, unfortunately. Do you know of another solution? Thank you.
  • MariusMatutiae, over 5 years ago
    @Maxim I still have it in my Debian (and Ubuntu) repos, which means the version available to Linux users is certainly open source. Why do you say it's not FOSS any longer? There are free and paid versions, but only for Windows and macOS users.
  • Maxim, almost 5 years ago
    @MariusMatutiae I took the following from the Fedora project mailing list: Jochen Schmitt 2014-03-04 Because pdftk depends on a gcj-feature which no more supported in Fedora I have retired this package for F20+. So no new version of pdftk will be available on Fedora. Whole discussion: https://lists.fedoraproject.org/pipermail/users/2014-December/thread.html#455977 The problem is that libgcj does not exist on F21. So we cannot run pdftk unless we go back to F20.
  • Maxim, almost 5 years ago
    @MariusMatutiae I have switched from Fedora to Linux Mint since then, and pdftk is no longer in the Mint repositories either. More importantly, libgcj is gone from both Fedora and Mint.
  • Hashim Aziz, about 4 years ago
    Regarding the discussion in the comments, see my answer here.
  • MariusMatutiae, about 4 years ago
    @Hashim Thank you, excellent answer, +1 from me. My answer is pretty old...