How to convert all pdf files to text (within a folder) with one command?
Solution 1
The following will convert all files in the current directory:
for file in *.pdf; do pdftotext "$file" "$file.txt"; done
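One detail of this command: the output files come out as filename.pdf.txt. If that doubled extension is unwanted, a minimal variant (my sketch, not part of the original answer) uses bash parameter expansion to strip .pdf before appending .txt:

```shell
# Sketch: ${file%.pdf} removes the trailing .pdf, so the output is
# name.txt rather than name.pdf.txt (assumes pdftotext is installed)
for file in *.pdf; do
    pdftotext "$file" "${file%.pdf}.txt"
done
```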
Solution 2
ls *.pdf | xargs -n1 pdftotext
xargs is often a quick solution for running the same command multiple times with just a small change each time. The -n1 option makes sure that only one PDF file is passed to pdftotext at a time.
Edit: If you're worried about spaces in filenames and such, you can use this alternative:
find . -name '*.pdf' -print0 | xargs -0 -n1 pdftotext
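As a further variant (a sketch of mine, not from the original answer), find can invoke pdftotext itself via -exec, which also handles spaces safely and needs no xargs at all:

```shell
# Sketch: find runs pdftotext once per matching file; -maxdepth 1
# restricts the search to the current directory only
find . -maxdepth 1 -name '*.pdf' -exec pdftotext {} \;
```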
Solution 3
Write a bash script:
for f in *.pdf; do
pdftotext "$f"
done
or type it in a one-line command as follows:
for f in *.pdf; do pdftotext "$f"; done
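One caveat: if the directory contains no PDFs, the glob *.pdf stays unexpanded and pdftotext receives the literal string "*.pdf". A sketch using bash's nullglob option (my addition, not part of the original answer) avoids that:

```shell
#!/bin/bash
# Sketch: with nullglob set, an unmatched *.pdf expands to nothing,
# so the loop runs zero iterations instead of calling pdftotext
# on a nonexistent file named "*.pdf"
shopt -s nullglob
for f in *.pdf; do
    pdftotext "$f"
done
```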
I hope this helps. I do not have a large group of .pdfs to test this on, but I use this strategy to convert my .flac files to .ogg files.
Updated on September 18, 2022

Comments
-
Admin over 1 year
I know that I can convert pdf files to text files one by one like this :
$ pdftotext filename.pdf
But is there a single command that would make that conversion without specifying separate file names so as to convert them all?
I see here, on Wikipedia, that "Wildcards (*), for example $ pdftotext *pdf, for converting multiple files, cannot be used because pdftotext expects only one file name."
-
Sam over 11 years
It's only one command; it can be typed on one line in the terminal (it's the pdftotext inside a for loop in one-line syntax, which is what the OP asked for).
-
Sam over 11 years
Check out these links for more info on how the for loop works: cyberciti.biz/faq/bash-for-loop and thegeekstuff.com/2011/07/bash-for-loop-examples
-
cprofitt over 11 years
Would this not cause issues with non-PDF files?
-
Admin over 11 years
Can it be done by opening the terminal in that folder and running a command, instead of inserting the path manually?
-
Admin over 11 years
Couldn't you paste it here as such and insert it into your answer? That would be a good answer. I was not able to arrive at the correct formula just by deleting part of what you posted.
-
cprofitt over 11 years
The find and xargs commands I initially suggested did not work when I got a chance to test them.
-
Ryan C. Thompson over 11 years
Wouldn't this produce files like "filename.pdf.txt"?
-
Sam over 11 years
Yes. If that's a problem, we could remove the .pdf extension using sed or awk, but it would add complexity to the command.
-
Admin over 11 years
This solution also has the advantage of converting only the files in the directory where it is run, without descending into subdirectories.
-
Martin over 11 years
Please see my answer: can that command be adapted to avoid the problem mentioned there? This doesn't mean your solution isn't good; on the contrary, it does something very specific that the other alternatives here do not. I was just curious.
-
Admin over 11 years
You can avoid the find command searching in subdirectories by using -maxdepth 1. Also, when putting it into Thunar's custom actions feature, you should probably replace find . with find %F to allow Thunar to properly pass the paths of the selected directories.
-
kenorb almost 10 years
Alternatively:
ls *.pdf | xargs -L1 -I% pdftotext % %.txt
-
user0 almost 9 years
You can just do
for file in *.pdf; do pdftotext "$file"; done
and it will replace .pdf with .txt. No need for sed or awk.