How to convert all pdf files to text (within a folder) with one command?

Solution 1

The following will convert all PDF files in the current directory:

for file in *.pdf; do pdftotext "$file" "$file.txt"; done
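As a commenter below points out, this names each output file like filename.pdf.txt. If you would rather drop the .pdf part, a small variation using bash parameter expansion should do it (a sketch, not tested on a large batch):

for file in *.pdf; do pdftotext "$file" "${file%.pdf}.txt"; done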

Solution 2

ls *.pdf | xargs -n1 pdftotext

xargs is often a quick solution for running the same command multiple times with just a small change each time. The -n1 option makes sure that only one pdf file is passed to pdftotext at a time.
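If you have many files, the -P option of xargs can run several conversions in parallel (a sketch; -P is not available in every xargs implementation, so check yours first):

ls *.pdf | xargs -n1 -P4 pdftotext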

Edit: If you're worried about spaces in filenames and such, you can use this alternative:

find . -name '*.pdf' -print0 | xargs -0 -n1 pdftotext
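A commenter also notes that find descends into subdirectories by default; if you only want the PDFs at the top level, the -maxdepth option restricts the search (a sketch based on that suggestion):

find . -maxdepth 1 -name '*.pdf' -print0 | xargs -0 -n1 pdftotext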

Solution 3

Write a bash script:

#!/bin/bash
# Convert every PDF in the current directory; pdftotext names each output file.txt
for f in *.pdf; do
  pdftotext "$f"
done

or type it as a one-line command:

for f in *.pdf; do pdftotext "$f"; done

I hope this helps. I do not have a large group of .pdfs to test this on, but I use this strategy to convert my .flac files to .ogg files.
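For comparison, the analogous audio loop looks like this (a sketch assuming oggenc from vorbis-tools is installed; it reads each FLAC file and writes a matching .ogg next to it):

for f in *.flac; do oggenc "$f"; done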

Comments

  • Admin
    Admin over 1 year

    I know that I can convert pdf files to text files one by one like this :

    $ pdftotext filename.pdf

    But is there a single command that would make that conversion without specifying separate file names so as to convert them all?

    I see here, on Wikipedia, that "Wildcards (*), for example $ pdftotext *pdf, for converting multiple files, cannot be used because pdftotext expects only one file name."

  • Sam
    Sam over 11 years
    It's only one command; it can be typed on one line in the terminal (it's the pdftotext inside a for loop in one-line syntax, which is what the OP asked for).
  • Sam
    Sam over 11 years
    Check out these links for more info on how the for loop works: cyberciti.biz/faq/bash-for-loop thegeekstuff.com/2011/07/bash-for-loop-examples
  • cprofitt
    cprofitt over 11 years
    Would this not cause issues with non-PDF files?
  • Admin
    Admin over 11 years
    Can it be done by opening the terminal in that folder and running a command, instead of inserting the path manually?
  • Admin
    Admin over 11 years
    Couldn't you paste it here as such and insert it into your answer? That would be a good answer. I was not able to arrive at the right command just by deleting part of what you posted.
  • cprofitt
    cprofitt over 11 years
    The find and xargs commands I initially suggested did not work when I got a chance to test them.
  • Ryan C. Thompson
    Ryan C. Thompson over 11 years
    Wouldn't this produce files like "filename.pdf.txt"?
  • Sam
    Sam over 11 years
    Yes. If that's a problem, we could remove the .pdf extension using sed or awk, but it would add complexity to the command.
  • Admin
    Admin over 11 years
    This solution also has the advantage of converting only the files in the directory where it is run, without descending into other directories and subdirectories.
  • Martin
    Martin over 11 years
    Please see my answer: can that command be adapted to avoid the problem mentioned there? This doesn't mean your solution is not good; on the contrary, it does something very specific that the other alternatives here do not. I was just curious.
  • Admin
    Admin over 11 years
    You can avoid the find command searching in subdirectories by using -maxdepth 1. Also, when putting it into Thunar's custom actions feature, you should probably replace find . with find %F to allow Thunar to properly pass the paths of the selected directories.
  • kenorb
    kenorb almost 10 years
    Alternatively: ls *.pdf | xargs -L1 -I% pdftotext % %.txt
  • user0
    user0 almost 9 years
    You can just use for file in *.pdf; do pdftotext "$file"; done and it will replace .pdf with .txt. No need for sed or awk.