How to convert all pdf files to text (within a folder) with one command?

Solution 1

The following will convert all PDF files in the current directory:

for file in *.pdf; do pdftotext "$file" "$file.txt"; done
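As a commenter below points out, this names each output file like filename.pdf.txt. If you would rather drop the .pdf part, a small variation using bash parameter expansion should do it (a sketch, not tested on a large batch):

for file in *.pdf; do pdftotext "$file" "${file%.pdf}.txt"; done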

Solution 2

ls *.pdf | xargs -n1 pdftotext

xargs is often a quick solution for running the same command multiple times with just a small change each time. The -n1 option makes sure that only one pdf file is passed to pdftotext at a time.
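If you have many files, the -P option of xargs can run several conversions in parallel (a sketch; -P is not available in every xargs implementation, so check yours first):

ls *.pdf | xargs -n1 -P4 pdftotext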

Edit: If you're worried about spaces in filenames and such, you can use this alternative:

find . -name '*.pdf' -print0 | xargs -0 -n1 pdftotext
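A commenter also notes that find descends into subdirectories by default; if you only want the PDFs at the top level, the -maxdepth option restricts the search (a sketch based on that suggestion):

find . -maxdepth 1 -name '*.pdf' -print0 | xargs -0 -n1 pdftotext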

Solution 3

Write a bash script:

#!/bin/bash
# Convert every PDF in the current directory; pdftotext names each output file.txt
for f in *.pdf; do
  pdftotext "$f"
done

or type it as a one-line command:

for f in *.pdf; do pdftotext "$f"; done

I hope this helps. I do not have a large group of .pdfs to test this on, but I use this strategy to convert my .flac files to .ogg files.
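For comparison, the analogous audio loop looks like this (a sketch assuming oggenc from vorbis-tools is installed; it reads each FLAC file and writes a matching .ogg next to it):

for f in *.flac; do oggenc "$f"; done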

Comments

  • Admin
    Admin over 1 year

    I know that I can convert pdf files to text files one by one like this :

    $ pdftotext filename.pdf

    But is there a single command that would make that conversion without specifying separate file names so as to convert them all?

    I see here, on Wikipedia, that "Wildcards (*), for example $ pdftotext *pdf, for converting multiple files, cannot be used because pdftotext expects only one file name."

  • Sam
    Sam over 11 years
    It's only one command; it can be typed on one line in the terminal (it's the pdftotext inside a for loop in one-line syntax, which is what the OP asked for).
  • Sam
    Sam over 11 years
    Check out these links for more info on how the for loop works: cyberciti.biz/faq/bash-for-loop thegeekstuff.com/2011/07/bash-for-loop-examples
  • cprofitt
    cprofitt over 11 years
    Would this not cause issues with non-PDF files?
  • Admin
    Admin over 11 years
    Can it be done by opening the terminal in that folder and running a command, instead of inserting the path manually?
  • Admin
    Admin over 11 years
    Couldn't you paste it here as such and insert it into your answer? That would be a good answer. I was not able to arrive at the right command just by deleting part of what you posted.
  • cprofitt
    cprofitt over 11 years
    The find and xargs commands I initially suggested did not work when I got a chance to test them.
  • Ryan C. Thompson
    Ryan C. Thompson over 11 years
    Wouldn't this produce files like "filename.pdf.txt"?
  • Sam
    Sam over 11 years
    Yes. If that's a problem, we could remove the .pdf extension using sed or awk, but it would add complexity to the command.
  • Admin
    Admin over 11 years
    This solution also has the advantage of converting only the files in the directory where it is run, without descending into other directories and subdirectories.
  • Martin
    Martin over 11 years
    Please see my answer: can that command be adapted to avoid the problem mentioned there? This doesn't mean your solution is not good; on the contrary, it does something very specific that the other alternatives here do not. I was just curious.
  • Admin
    Admin over 11 years
    You can avoid the find command searching in subdirectories by using -maxdepth 1. Also, when putting it into Thunar's custom actions feature, you should probably replace find . with find %F to allow Thunar to properly pass the paths of the selected directories.
  • kenorb
    kenorb almost 10 years
    Alternatively: ls *.pdf | xargs -L1 -I% pdftotext % %.txt
  • user0
    user0 almost 9 years
    You can just use for file in *.pdf; do pdftotext "$file"; done and it will replace .pdf with .txt. No need for sed or awk.