How to loop through files that match a regular expression in a Unix shell script

Solution 1

You can use (GNU) find with the regex search option instead of parsing ls.

find . -regextype "egrep" \
       -iregex '.*/MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat' \
       -exec [[whatever you want to do]] {} \;

Here [[whatever you want to do]] stands for the command you want to run on each matching file.
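As one possible way to fill in the placeholder, the sketch below runs a small inline shell loop over every match. It assumes GNU find. The function name loop_matches, the -maxdepth 1 and -type f restrictions (added so that, like ls, only the given directory is searched), and the echo standing in for real work are illustrative assumptions rather than part of the original answer.

```shell
# Sketch only: assumes GNU find (-regextype and -iregex are GNU extensions).
# loop_matches is a hypothetical helper name; pass the directory to search.
loop_matches() {
    find "${1:-.}" -maxdepth 1 -type f -regextype posix-egrep \
        -iregex '.*/MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat' \
        -exec sh -c '
            # Each matching path arrives as a separate argument, so names
            # containing spaces or newlines stay intact.
            for file do
                echo "item: $file"
            done
        ' sh {} +
}
```

Calling loop_matches with a directory prints one "item:" line per matching file. Using {} + instead of {} \; hands many paths to a single sh invocation rather than starting one process per file.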

From the man page:

  -regextype type
          Changes the regular expression syntax understood by -regex and -iregex
          tests which occur later on the command line. Currently-implemented
          types are emacs (this is the default), posix-awk, posix-basic,
          posix-egrep and posix-extended.

  -regex pattern
          File name matches regular expression pattern. This is a match on the
          whole path, not a search. For example, to match a file named
          `./fubar3', you can use the regular expression `.*bar.' or `.*b.*3',
          but not `f.*r3'. The regular expressions understood by find are by
          default Emacs Regular Expressions, but this can be changed with the
          -regextype option.

  -iregex pattern
          Like -regex, but the match is case insensitive.
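
The whole-path rule is a common reason why a regex that works with egrep matches nothing with find. A minimal sketch of the man page's own fubar3 example, assuming GNU find and a throwaway directory created with mktemp (the variable names are illustrative):

```shell
# Sketch only: assumes GNU find. Shows that -regex must match the whole path.
demo_dir=$(mktemp -d)
touch "$demo_dir/fubar3"

# Run find from inside the directory so the tested path is "./fubar3".
whole_path=$(cd "$demo_dir" && find . -regex '.*bar.')   # covers the whole path
name_only=$(cd "$demo_dir" && find . -regex 'f.*r3')     # covers the name only

echo "whole path: ${whole_path:-no match}"
echo "name only:  ${name_only:-no match}"
```

The first pattern finds ./fubar3, while the second finds nothing because it does not account for the leading "./". This is why the pattern in the answer starts with ".*/".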

Solution 2

Based on the link Andy K provided, I used the following to loop over the files that match my criteria:

# ^ and $ anchor the pattern to the whole name; \. matches a literal dot.
for i in $(ls | egrep -i '^MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01])\.dat$'); do
    echo "item: $i"
done
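
For comparison, the same loop can be written without ls or egrep. The following is a minimal sketch, assuming bash (for [[ =~ ]] and shopt). The function name list_matches is hypothetical, and the pattern is anchored with its dot escaped so that only complete file names match.

```shell
# Sketch only: assumes bash. list_matches is a hypothetical helper name.
# The ( ... ) body runs in a subshell, so the shopt change does not leak out.
list_matches() (
    shopt -s nocasematch   # make =~ case insensitive
    re='^MYFILE[0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])\.dat$'
    for file in "${1:-.}"/*; do
        name=${file##*/}   # strip the directory part
        # A glob expands to whole file names, so no word splitting occurs.
        if [[ -f $file && $name =~ $re ]]; then
            echo "item: $name"
        fi
    done
)
```

Running list_matches with no argument searches the current directory. The glob supplies the candidate names and the regex does the filtering, so nothing depends on how ls formats its output.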
Author: paul frith

Updated on July 19, 2022

Comments

  • paul frith, almost 2 years

    I want to be able to loop through a list of files that match a particular pattern. I can get unix to list these files using ls and egrep with a regular expression, but I cannot find a way to turn this into an iterative process. I suspect that using ls is not the answer. Any help would be gratefully received.

    My current ls command looks as follows:

    ls | egrep -i 'MYFILE[0-9][0-9]([0][1-9]|1[0-2])([0][1-9]|[12][0-9]|[3][01]).dat'
    

    I would expect the above to match:

    • MYFILE160418.dat
    • myFILE170312.DAT
    • MyFiLe160416.DaT

    but not:

    • MYOTHERFILE150202.DAT
    • Myfile.dat
    • myfile.csv

    Thanks,

    Paul.

  • paul frith, about 8 years
    Interesting - funnily enough "find" was what I looked to in the first place, but I couldn't get my regex to work. -regextype "egrep" is what I needed!
  • paul frith, about 8 years
    I've looked at this and it seems that parsing ls is a bad idea because UNIX allows almost any character in a file name, including newlines. However, given that I am regex matching, surely that problem is mitigated in this instance. Are there other reasons not to parse ls output?
  • Rany Albeg Wein, about 8 years
    Do not use ls output for anything. ls is a tool for interactively looking at directory metadata. Any attempts at parsing ls output with code are broken. Globs are much simpler AND correct: for file in *.txt. Read "Parsing ls".
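
To make the objection to parsing ls concrete, here is a minimal sketch of what happens when a matching name contains a space. It assumes a POSIX-style shell with the default IFS. The directory, the file name and the shortened pattern are made up for the demonstration, and grep -E is used as the equivalent of egrep.

```shell
# Sketch only: demo_dir and the file name are made up for the demonstration.
demo_dir=$(mktemp -d)
touch "$demo_dir/old MYFILE160418.dat"

# Parsing ls: the unanchored pattern accepts the line, then the shell splits
# the result on the space, so the loop sees two words.
ls_items=0
for i in $(ls "$demo_dir" | grep -Ei 'MYFILE[0-9]{6}\.dat'); do
    ls_items=$((ls_items + 1))
done

# Globbing: each file name stays a single word.
glob_items=0
for f in "$demo_dir"/*MYFILE*; do
    glob_items=$((glob_items + 1))
done

echo "ls loop saw $ls_items items, glob loop saw $glob_items"
```

The ls loop counts two items ("old" and "MYFILE160418.dat", neither of which is an existing file), while the glob loop counts one. Anchoring the regex rejects names with spaces, but a name containing a newline is still printed by ls as two separate lines, so a fragment that matches can reach the loop even though no such file exists.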