Running a command on many files

Solution 1

for file in xyz*
do
  ./transeq "$file" "${file}.faa" -table 11
done

This is a simple for loop that will iterate over every file that starts with xyz in the current directory and call the ./transeq program with the filename as the first argument, the filename followed by ".faa" as the second argument, followed by "-table 11".
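If a run is interrupted and restarted, the xyz* glob also matches the .faa outputs already written. Below is a minimal sketch of a loop that skips those, along with any input whose output already exists. The scratch directory, the sample file names, and the stub standing in for ./transeq are assumptions, there only so the example runs anywhere:

```shell
# Scratch directory with a few sample inputs (hypothetical names, for illustration).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch xyz1 xyz2 xyz3

# Stand-in for the real ./transeq: it only creates the output file.
printf '#!/bin/sh\n: > "$2"\n' > transeq
chmod +x transeq

# Pretend an earlier, interrupted run already produced this output.
touch xyz1.faa

count=0
for file in xyz*
do
  case $file in
    *.faa) continue ;;              # an output file, not an input
  esac
  [ -e "$file.faa" ] && continue    # already translated on a previous run
  ./transeq "$file" "$file.faa" -table 11
  count=$((count + 1))
done
echo "translated $count file(s)"
```

With the real program, only the for loop is needed; the case pattern plays the same role as bash's extglob pattern but also works in a plain POSIX shell.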

Solution 2

If you install GNU Parallel you can do it in parallel like this:

parallel ./transeq {} {}.faa -table 11 ::: xyz*

If your program is CPU-intensive, this should speed things up considerably.

Solution 3

You can do something like this on a bash command line:

printf '%s\n' {1..5025} | xargs -I {} -t ./transeq xyz{} xyz{}.faa -table 11

This generates the integers from 1 to 5025, one per line, and feeds them to xargs, which substitutes each one for {} in the ./transeq command line. The -t option prints each command before running it.

If your shell lacks {n..m} brace expansion, the seq utility can generate the numbers instead.

Or, you can always emulate the numeric generation via:

yes | sed -n =\;5025q | xargs ...
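Before launching thousands of runs, it can help to preview the commands on a short range. This is a small sketch, assuming seq is installed; putting echo in front of ./transeq turns it into a dry run, and the range 1 to 3 stands in for 1 to 5025:

```shell
# Dry run on a short range: echo prints each command instead of running it.
preview=$(seq 1 3 | xargs -I {} echo ./transeq xyz{} xyz{}.faa -table 11)
printf '%s\n' "$preview"

# The sed-based generator yields the same integers as seq.
[ "$(yes | sed -n '=;3q')" = "$(seq 1 3)" ] && echo "generators agree"
```

Once the preview looks right, drop the echo and widen the range.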

Solution 4

Assuming you have more than one core, and each invocation can run independently from the rest, you will gain quite a speedup with parallel runs.

A relatively simple way to do this is via the -P parameter of xargs - for example, if you have 4 cores:

printf '%s\n' xyz{1..5025} | \
    xargs -P 4 -I{} /path/to/transeq {} {}.faa -table 11

printf puts each file name on its own line, which matters because -I{} makes xargs run one command per input line, substituting that line for every {}. The -P 4 tells it to keep 4 processes running at the same time: when one finishes, a new one is started.
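Rather than hard-coding 4, the process count can follow the machine. This is a sketch, assuming nproc (GNU coreutils) is available and falling back to 4 otherwise. The scratch directory, the twenty sample inputs, and the stub transeq are stand-ins so the example runs without the real program:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real transeq: it only creates the output file.
printf '#!/bin/sh\n: > "$2"\n' > transeq
chmod +x transeq

# Twenty empty sample inputs, xyz1 .. xyz20 (the question has 5025).
for i in $(seq 1 20); do : > "xyz$i"; done

# One worker per core if nproc exists, else an assumed default of 4.
workers=$(nproc 2>/dev/null || echo 4)

# One name per line in; -I{} substitutes it; -P runs up to $workers commands at once.
seq 1 20 | sed 's/^/xyz/' | xargs -P "$workers" -I{} ./transeq {} {}.faa -table 11

# Count the outputs that were written.
set -- xyz*.faa
echo "$# outputs written"
```

Note that -P is not part of POSIX xargs, although both the GNU and BSD implementations provide it.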

IMHO, you don't need to install GNU parallel for this simple case - xargs suffices.

Solution 5

Use find, which is useful when your files are scattered across subdirectories:

find . -name "xyz*" -exec ./transeq {} {}.faa -table 11 \;
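POSIX leaves it up to the implementation whether find substitutes a {} embedded in a larger argument such as {}.faa (GNU find does). Here is a sketch of a more portable form that hands the name to a small sh -c script and also skips .faa files left by an earlier run. The directory layout and the stub transeq are assumptions for the demo:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for the real transeq: it only creates the output file.
printf '#!/bin/sh\n: > "$2"\n' > transeq
chmod +x transeq

# Sample inputs scattered over subdirectories, plus one stale output.
mkdir -p a b/c
touch a/xyz1 b/xyz2 b/c/xyz3 a/xyz1.faa

# -type f skips directories; ! -name '*.faa' skips earlier outputs.
# The trailing "sh" fills $0 of the inline script, so the file name arrives as $1.
find . -type f -name 'xyz*' ! -name '*.faa' \
  -exec sh -c './transeq "$1" "$1.faa" -table 11' sh {} \;

# List the outputs that now exist.
find . -name '*.faa' | sort
```

Because -exec runs the command from the directory where find was started, ./transeq still resolves to the copy in that directory even for files in subdirectories.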

Author: Manuel

Updated on September 18, 2022

Comments

  • Manuel
    Manuel over 1 year

    I've got a folder with many files (xyz1, xyz2, all the way up to xyz5025) and I need to run a script on every one of them, getting xyz1.faa, xyz2.faa, and so on as outputs.

    The command for a single file is:

    ./transeq xyz1 xyz1.faa -table 11
    

    Is there a way to do that automatically? Maybe a for-do combo?

  • Dave Tweed
    Dave Tweed almost 7 years
    Or, as a one-liner: for file in xyz*; do ./transeq "$file" "${file}.faa" -table 11; done. I type this sort of thing all the time. And if you want to verify that the file names, etc. are getting expanded the way you want, just put an echo right after the do the first time, and then go back in your shell history and delete it the second time.
  • Peter Cordes
    Peter Cordes almost 7 years
    "$file".faa is slightly easier to type as part of an interactive one-liner, and safe because .faa doesn't contain any shell metacharacters that need to be quoted.
  • Peter Cordes
    Peter Cordes almost 7 years
    That's way over-complicated. for i in {1..5025}; do ./transeq "xyz$i" "xyz$i".faa -table 11; done is way easier to think of and type. If you want it to print commands before executing them, use set -x.
  • Admin
    Admin almost 7 years
    Yeah that's correct, but the way the OP formulated the question seemed to me that only the files with names xyz1 .. xyz5025 were of interest. So I thought if we do it using for xyz* then we need a way to reject the non-conforming files ... hence this. Ideally if the OP wants all the files in a directory processed, then why bring up the 1 to 5025 thing? Just say that I want all files processed in a prescribed manner would have been sufficient.
  • Jeff Schaller
    Jeff Schaller almost 7 years
    As a note, if you end up with a partial run and want to restart the loop, the xyz* glob would pick up .faa files as well. For bash, run shopt -s extglob, then use for file in xyz!(*.faa) ... to exclude the .faa files from being sent through the loop.