How to extract specific elements from a filename?

Solution 1

Using parameter expansion

$ touch 2014-11-19.8.ext 2014-11-26.1.ext
$ for f in *.ext; do d="${f:0:4}${f:5:2}${f:8:2}"; echo "$d"; done
20141119
20141126
  • ${f:0:4} expands to 4 characters of the variable f, starting at index 0 (indices are zero-based)
  • replace echo "$d" with your code
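
As a rough sketch of the same substring expansion wrapped in a reusable function (the name date_from_name is hypothetical, not from the answer), assuming every name starts with YYYY-MM-DD:

```bash
#!/usr/bin/env bash
# date_from_name is a hypothetical helper name used only for this sketch.
# It assumes the name begins with YYYY-MM-DD and does no validation.
date_from_name() {
  local f=$1
  # Zero-based offsets: 0-3 year, 4 dash, 5-6 month, 7 dash, 8-9 day.
  printf '%s\n' "${f:0:4}${f:5:2}${f:8:2}"
}

# Sample names taken from the question, including ones with spaces.
for f in "2014-11-19.8.ext" "2014-11-26 613 adva.ext" "2014-12-032b.ext"; do
  d=$(date_from_name "$f")
  echo "$f -> $d"
done
```

Because the offsets are fixed, whatever follows the tenth character (a dot, a space, a digit) has no effect on the result.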

Solution 2

Loop over every file in the current directory, match each filename against the desired pattern, and set a variable containing the date pieces:

for f in *
do 
  [[ $f =~ ^([0-9][0-9][0-9][0-9])-([0-9][0-9])-([0-9][0-9])(.*) ]] && 
  yourvar="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
done

This uses bash's [[ ability to use regular expression matching to place the date pieces into the BASH_REMATCH array.
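
A minimal sketch of the same idea as a function that reports failure for names without a leading date, so the caller can skip them. The helper name ymd_from_name is hypothetical, and the regex is kept in a variable so the right-hand side of =~ needs no quoting:

```bash
#!/usr/bin/env bash
# ymd_from_name is a hypothetical helper name used only for this sketch.
ymd_from_name() {
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})'
  # Return non-zero when the name does not start with YYYY-MM-DD.
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
}

for f in "2014-11-26_3.ext" "2014-12-03.1. could be anything.ext" "notes.txt"; do
  if d=$(ymd_from_name "$f"); then
    echo "$f -> $d"
  else
    echo "$f -> no leading date, skipped"
  fi
done
```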

Solution 3

You can do it interactively by using GNU sed:

$ sed 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*/\1\2\3/g' stuff.txt

For multiple files (assuming they are all in the same directory and it contains no other files to consider):

for file in *
do
    if [ -f "$file" ]
    then
          # feed the file name (not the file's contents) to sed
          printf '%s\n' "$file" | sed 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*/\1\2\3/g'
    fi
done
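
To store the result in a variable rather than print it, one possible sketch pipes each name through sed inside a command substitution. The helper name name_to_ymd is hypothetical. It uses -E (extended regular expressions, accepted by GNU sed and BSD sed) and -n with the p flag, so names that do not match produce no output at all:

```bash
#!/usr/bin/env bash
# name_to_ymd is a hypothetical helper name used only for this sketch.
name_to_ymd() {
  # Prints YYYYMMDD for names starting with YYYY-MM-DD, nothing otherwise.
  printf '%s\n' "$1" |
    sed -n -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2}).*/\1\2\3/p'
}

for f in "2014-11-26.2.blah.ext" "2014-11-26.4.stuff_here.ext" "readme.ext"; do
  d=$(name_to_ymd "$f")
  if [ -n "$d" ]; then
    echo "$f -> $d"
  fi
done
```

This starts one sed process per file, so for large directories the pure-bash approaches are likely to be faster.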

Solution 4

Here is a zsh way of doing this, without loops:

autoload -U zmv
zmv -n '([0-9](#c4))-([0-9](#c2))-([0-9](#c2))(*)' '$1$2$3$4'
  • [0-9](#c4) means any digit repeated 4 times
  • $1 through $4 refer to the corresponding parenthesized groups in the pattern
  • -n prevents execution (only prints), remove this flag if you are happy with the result

Because zsh handles the globbing itself, corner cases (whitespace, special characters, etc.) are taken into account automatically.

Solution 5

If you have GNU Coreutils, date can do the reformatting:

$ date --date=2014-11-13 +"%Y%m%d"
20141113

However:

$ date --date=2014-11-130ABCJUNK +"%Y%m%d"
date: invalid date ‘2014-11-130ABCJUNK’

So the task is much simpler: extract the first ten characters of each YYYY-MM-DDetc filename to get the date by itself, then pass it to date for reformatting.

But, if we are on GNU Coreutils, we can skip the date command because touch has the exact same --date=STRING option.

for file in * ; do
  date=${file%"${file##??????????}"} # chop all but first ten; the quotes keep the suffix literal
  touch --date="$date" -- "$file"
done

But why do this ten character chopping in the POSIX portable way when we are relying on touch to be from GNU Coreutils?

for file in * ; do
  date=${file:0:10}
  touch --date="$date" -- "$file"
done

Author by

ylluminate

Updated on September 18, 2022

Comments

  • ylluminate
    ylluminate over 1 year

    I have a bunch of files in the following format:

    2014-11-19.8.ext
    2014-11-26.1.ext
    2014-11-26.2.blah.ext
    2014-11-26_3.ext
    2014-11-26.4.stuff_here.ext
    2014-12-03.1. could be anything.ext
    2014-12-032b.ext
    2014-11-26 613 adva.ext
    

    My goal is to iterate over the entire list of files and to take the date formatting from YYYY-MM-DD and store that in a variable in the format of YYYYMMDD for further processing (in my case it's going to be pushed into a touch command).

    So normally I would match against this regular expression: (\d{4})-(\d{2})-(\d{2}).*

    And then use $1$2$3 to get my desired pattern, however I'm not sure how to do this in bash / zsh.

    How can this be done within a shell script as such?

    • ylluminate
      ylluminate almost 7 years
      @Sundeep that latter option is better wrt parameter expansion. So how does that work precisely? Right now in your example you get the YYYY and MM, but then you just grab the rest with ${f:8}, when I would rather just grab DD and discard .* (everything after DD).
    • John Goofy
      John Goofy almost 7 years
      Please, could you post a desired output? Or is your goal to rename files?
    • ylluminate
      ylluminate almost 7 years
      @JohnGoofy please note my edit from ~30 min ago and the answer that Sundeep gave.
  • ylluminate
    ylluminate almost 7 years
    Okay, nice, but I'm personally liking the conciseness of the comments that @Sundeep is leaving above where you seem to have more readable control over the fields. My goal here is to extract out these elements and then use them in another command (specifically I'm setting times via touch). Not quite sure why he's not starting it off as an answer instead of comments...
  • FloHe
    FloHe almost 7 years
    You can e.g. pipe the output to another command.
  • ylluminate
    ylluminate almost 7 years
    Interesting, so the "cursor" is on the index and is not inclusive, but rather exclusive. so in this case 5:2 starts at the 1st dash, but does not include it. 8:2 starts at the 2nd dash and does not include it. Very interesting and great to know.
  • Sundeep
    Sundeep almost 7 years
    0 is starting index... so the first - index is 4...
  • smw
    smw almost 7 years
    @ylluminate you could do a two-step substitution like for f in *.ext; do d="${f%%.*}"; echo "${d//-}"; done though, (first remove the longest trailing string, then remove the dashes).
  • BallpointBen
    BallpointBen almost 7 years
    I always think of indices as pointing between characters instead of at them. The oddball case is then requesting a single character, in which case the index is short for "between that index and the subsequent index", i.e. the character right of that index.
  • ylluminate
    ylluminate almost 7 years
    I was told by someone that touch required YYYYMMDD only format when the -t parameter was issued...
  • dave_thompson_085
    dave_thompson_085 almost 7 years
    @ylluminate: -t requires [[cc]yy]mmddhhmm[.ss] -- which is not the same as you wrote, although it does omit punctuation other than possibly one dot -- but in the GNU version (as clearly stated) --date (or -d) is different.
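
Following the last comment, a hedged sketch of building a -t argument from the filename: the YYYYMMDD string alone is not enough, so hhmm has to be appended. The helper name touch_stamp is hypothetical, and midnight (0000) is an assumed time of day, not something the question specifies. The demonstration runs in a scratch directory so no real files are modified:

```bash
#!/usr/bin/env bash
# touch_stamp is a hypothetical helper; 0000 (midnight) is an assumption.
touch_stamp() {
  local f=$1
  # CCYYMMDD from the fixed offsets, followed by hhmm.
  printf '%s0000\n' "${f:0:4}${f:5:2}${f:8:2}"
}

# Create one sample file in a temporary directory.
tmpdir=$(mktemp -d)
: > "$tmpdir/2014-11-19.8.ext"

(
  cd "$tmpdir" || exit 1
  for f in *.ext; do
    # -t takes [[CC]YY]MMDDhhmm[.ss]; -- protects names starting with a dash.
    touch -t "$(touch_stamp "$f")" -- "$f"
  done
)

echo "stamp used: $(touch_stamp '2014-11-19.8.ext')"
```

The scratch directory in "$tmpdir" can be removed with rm -r once the result has been inspected.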