Remove text from pdf
Solution 1
You can try pdftk, but it works only a fraction of the time, due to (I believe) a problem with fonts.
It works like this: first you need to uncompress the pdf file,
pdftk myfile.pdf output unc.pdf uncompress
then you modify it with
sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf
lastly you recompress it with
pdftk mod_unc.pdf output myfile_modified.pdf compress
I have had only moderate success with this approach: sometimes it works and sometimes it doesn't, with no obvious pattern.
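The three steps above can be wrapped in a small shell script. This is only a sketch: it assumes pdftk, sed, grep and mktemp are installed, and the function names replace_bytes and pdf_replace_text are made up for illustration. It adds one check the manual steps lack: if the search string is not present literally in the uncompressed file, sed has nothing to replace, which may be what happens in the cases where this approach fails.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical, not part of pdftk.

# Replace pattern $1 with $2 in file $3, writing the result to $4.
# LC_ALL=C makes sed work on raw bytes instead of locale-encoded text.
# $1 is a sed basic regular expression; escape "/" and other specials.
replace_bytes() {
    LC_ALL=C sed "s/$1/$2/g" < "$3" > "$4"
}

# Usage: pdf_replace_text OLD NEW input.pdf output.pdf
pdf_replace_text() {
    work=$(mktemp -d) || return 1

    # Step 1: uncompress the streams so their content can be searched.
    if ! pdftk "$3" output "$work/unc.pdf" uncompress; then
        rm -rf "$work"
        return 1
    fi

    # If the string is not stored literally, sed cannot change anything.
    if ! LC_ALL=C grep -a -q -- "$1" "$work/unc.pdf"; then
        echo "'$1' not found in the uncompressed PDF; nothing to replace" >&2
        rm -rf "$work"
        return 2
    fi

    # Step 2: substitute. Step 3: recompress.
    replace_bytes "$1" "$2" "$work/unc.pdf" "$work/mod_unc.pdf" &&
        pdftk "$work/mod_unc.pdf" output "$4" compress
    status=$?

    rm -rf "$work"
    return "$status"
}
```

It would be called as pdf_replace_text oldstring newstring myfile.pdf myfile_modified.pdf. A return status of 2 means the string was not found in the uncompressed file, so the output file is not written.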
Solution 2
On Windows (maybe a virtual machine) you could install PDF-XChange Editor https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor
The free version can remove text (but not add text) without adding a watermark; the software itself tells you so.
I had to remove several pieces of text, so sed was too time-consuming and exhausting, and sed did not work with umlauts.
Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF
DrYap
Updated on September 18, 2022
Comments
-
DrYap over 1 year
I have a pdf file with some text on each page which I would like to remove.
The text is matched by a regex and I think it comes in one block of the pdf.
I have used pdfedit to select and delete the text with the GUI but I was looking for a way to do this from the terminal.
-
mirh about 5 years
pdfedit-tools exists, you know
-
DrYap over 10 years
I gave it a go, but the uncompress step turned most of the text white and it was never recovered. Funnily, the only text that wasn't white was what I'm trying to get rid of! Thanks for the suggestion.
-
slhck about 10 years
The OP is on Linux and they said they already used PDFEdit. Please read the question before posting an answer.
-
Eugene Pakhomov almost 6 years
One can also use qpdf: qpdf --stream-data=uncompress myfile.pdf unc.pdf
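The same workflow can be sketched with qpdf in place of pdftk. The --stream-data=uncompress and --stream-data=compress options are real qpdf options; the function names has_pattern and qpdf_replace_text, and the idea of recompressing with qpdf after the edit, are assumptions for illustration. Nothing in this thread confirms that qpdf copes any better with the cases where the pdftk route fails.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical, not part of qpdf.

# Succeeds if pattern $1 (a basic regular expression) occurs in file $2.
# LC_ALL=C and -a make grep treat the file as raw bytes.
has_pattern() {
    LC_ALL=C grep -a -q -- "$1" "$2"
}

# Usage: qpdf_replace_text OLD NEW input.pdf output.pdf
qpdf_replace_text() {
    work=$(mktemp -d) || return 1

    # Uncompress the streams, as in the comment above.
    if ! qpdf --stream-data=uncompress "$3" "$work/unc.pdf"; then
        rm -rf "$work"
        return 1
    fi

    if has_pattern "$1" "$work/unc.pdf"; then
        LC_ALL=C sed "s/$1/$2/g" < "$work/unc.pdf" > "$work/mod_unc.pdf"
        # qpdf may report warnings (exit status 3) if the edit changed
        # the length of a stream; check the output file in that case.
        qpdf --stream-data=compress "$work/mod_unc.pdf" "$4"
        status=$?
    else
        echo "'$1' not found in the uncompressed PDF; nothing to replace" >&2
        status=2
    fi

    rm -rf "$work"
    return "$status"
}
```

It would be called as qpdf_replace_text oldstring newstring myfile.pdf myfile_modified.pdf. Keeping the replacement the same length as the original string avoids shifting byte offsets inside the file.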
-
Maxim over 5 years
Marius, pdftk is no longer open source, unfortunately. Do you know of another solution? Thank you.
-
MariusMatutiae over 5 years
@Maxim I still have it in my Debian (and Ubuntu) repos, which means the version available to Linux users is certainly open source. Why do you say it's not FOSS any longer? There are free and paid versions, but those are for users of Windows and macOS.
-
Maxim almost 5 years
@MariusMatutiae Marius, I took the following from the Fedora project mailing list:
Jochen Schmitt 2014-03-04: Because pdftk depends on a gcj-feature which no more supported in Fedora I have retired this package for F20+. So no new version of pdftk will be available on Feodra.
Whole discussion: https://lists.fedoraproject.org/pipermail/users/2014-December/thread.html#455977
The problem is that libgcj does not exist on F21. So, we can not run pdftk unless we go back to F20
-
Maxim almost 5 years
@MariusMatutiae I have switched from Fedora to Linux Mint since then, and pdftk is no longer in the Mint repositories either. More importantly, libgcj is gone from both Fedora and Mint.
-
Hashim Aziz about 4 years
Regarding the discussion in the comments, see my answer here.
-
MariusMatutiae about 4 years
@Hashim Thank you, excellent answer, +1 from me. My answer is pretty old...