How to extract specific elements from a filename?
Solution 1
Using parameter expansion
$ touch 2014-11-19.8.ext 2014-11-26.1.ext
$ for f in *.ext; do d="${f:0:4}${f:5:2}${f:8:2}"; echo "$d"; done
20141119
20141126
- ${f:0:4} means 4 characters starting from index 0, and f is the variable name; replace echo "$d" with your code
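The same expansion can be wrapped in a small function. This is only a sketch: the name `date_from_name` is hypothetical (it does not appear in the answer), and the `case` guard is an added assumption so that names which do not start with YYYY-MM-DD are skipped rather than sliced blindly.

```bash
#!/usr/bin/env bash
# date_from_name is a hypothetical helper name, not part of the original answer.
# Prints YYYYMMDD for a name that starts with YYYY-MM-DD; returns 1 otherwise.
date_from_name() {
    local f=$1
    # Guard: the offsets below only make sense if the name has this shape.
    case $f in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*) ;;
        *) return 1 ;;
    esac
    # ${f:offset:length}: offsets are zero-based, so the dashes at
    # offsets 4 and 7 are simply never selected.
    printf '%s\n' "${f:0:4}${f:5:2}${f:8:2}"
}

date_from_name '2014-11-26.2.blah.ext'   # prints 20141126
date_from_name 'notes.txt' || echo 'skipped: notes.txt'
```

Inside a loop this would be used as `d=$(date_from_name "$f") || continue`.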
Solution 2
To loop over every file in the current directory, compare its filename to the desired pattern, and set a variable containing the date pieces:
for f in *
do
[[ $f =~ ^([0-9][0-9][0-9][0-9])-([0-9][0-9])-([0-9][0-9])(.*) ]] &&
yourvar="${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}"
done
This uses the ability of bash's [[ to do regular expression matching, which places the date pieces into the BASH_REMATCH array.
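A minimal sketch of the same match as a function that reports failure when a name has no date prefix. The name `extract_date` is hypothetical, and storing the pattern in a variable is a common bash habit rather than something the answer prescribes.

```bash
#!/usr/bin/env bash
# extract_date is a hypothetical name used only for this sketch.
# On a match it prints YYYYMMDD; otherwise it returns non-zero.
extract_date() {
    # Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
    local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[0] is the whole match; [1]..[3] are the three groups.
    printf '%s%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

for name in '2014-11-26 613 adva.ext' '2014-12-032b.ext' 'README'; do
    if d=$(extract_date "$name"); then
        echo "$name -> $d"
    else
        echo "$name -> no date prefix"
    fi
done
```

Returning a status lets the caller decide what to do with non-matching names instead of silently reusing a stale variable from a previous iteration.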
Solution 3
You can do it interactively by using GNU sed (assuming stuff.txt lists the file names, one per line):
$ sed 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*/\1\2\3/g' stuff.txt
For multiple files (if they are in the same directory and the directory holds no other files):
for file in *
do
if [ -f "$file" ]
then
sed 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*/\1\2\3/g' "$file"
fi
done
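Note that sed reads the contents of the files it is given, so the loop above processes what is inside each file. To apply the same substitution to the file names themselves, one option is to feed the names to sed on standard input. The sketch below assumes no file name contains a newline, and `names_to_dates` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# names_to_dates is a hypothetical helper: it reads one file name per line on
# standard input and prints YYYYMMDD for each name with a YYYY-MM-DD prefix.
# Assumption: no file name contains a newline.
names_to_dates() {
    # -n plus the p flag prints only the lines where the substitution matched.
    sed -n 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\).*/\1\2\3/p'
}

printf '%s\n' '2014-11-26_3.ext' 'notes.txt' '2014-12-03.1. could be anything.ext' |
    names_to_dates
```

For real files, `printf '%s\n' * | names_to_dates` would feed in every name in the current directory.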
Solution 4
Here is a zsh way of doing this, without loops:
autoload -U zmv
zmv -n '([0-9](#c4))-([0-9](#c2))-([0-9](#c2))(*)' '$1$2$3$4'
- [0-9](#c4) means any digit repeated 4 times
- $1 to $4 refer to the parenthesized groups of the pattern, in order
- -n prevents execution (only prints); remove this flag if you are happy with the result
As zsh takes care of the globbing, all corner cases (whitespace, special characters, etc.) should automatically be taken into account.
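For readers without zsh, a rough bash counterpart of the dry run might look like the following. It is a swapped-in technique (a loop with a regular expression, not zmv), `preview_renames` is a hypothetical name, and it only prints the `mv` commands it would run.

```bash
#!/usr/bin/env bash
# Hypothetical bash counterpart of the zmv dry run; not part of the answer.
# It only prints the mv commands and never renames anything.
preview_renames() {
    local f re='^([0-9]{4})-([0-9]{2})-([0-9]{2})(.*)$'
    for f in "$@"; do
        [[ $f =~ $re ]] || continue
        # %q quotes each name so the printed command is safe to paste.
        printf 'mv -- %q %q\n' "$f" \
            "${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[3]}${BASH_REMATCH[4]}"
    done
}

preview_renames '2014-11-19.8.ext' 'notes.txt'
# prints: mv -- 2014-11-19.8.ext 20141119.8.ext
```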
Solution 5
If you're on GNU Coreutils, you have this:
$ date --date=2014-11-13 +"%Y%m%d"
20141113
However:
$ date --date=2014-11-130ABCJUNK +"%Y%m%d"
date: invalid date ‘2014-11-130ABCJUNK’
So the task is much simpler: extract the first ten characters of each YYYY-MM-DDetc filename to get the date by itself, then pass it to date for reformatting.
But, if we are on GNU Coreutils, we can skip the date command because touch has the exact same --date=STRING option.
for file in * ; do
date=${file%"${file##??????????}"} # chop all but first ten
touch --date="$date" -- "$file"
done
But why do this ten character chopping in the POSIX portable way when we are relying on touch to be from GNU Coreutils?
for file in * ; do
date=${file:0:10}
touch --date="$date" -- "$file"
done
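Because `for file in *` visits every name in the directory, a guard that skips names without a date prefix may be worth adding. The following is a sketch under the same GNU Coreutils assumption as the answer (`--date` is a GNU touch option); `touch_from_names` is a hypothetical name, and it expects plain file names in the current directory rather than paths.

```bash
#!/usr/bin/env bash
# Sketch only: touch_from_names is a hypothetical name, and --date is a
# GNU touch extension, so this assumes GNU Coreutils as in the answer above.
touch_from_names() {
    local file date
    for file in "$@"; do
        # Skip anything that does not start with YYYY-MM-DD.
        [[ $file == [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]* ]] || continue
        date=${file:0:10}
        touch --date="$date" -- "$file" || echo "could not touch: $file" >&2
    done
}

# Usage (changes timestamps, so it is left commented out):
# touch_from_names *
```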
ylluminate
Updated on September 18, 2022
Comments
-
ylluminate over 1 year
I have a bunch of files in the following format:
2014-11-19.8.ext
2014-11-26.1.ext
2014-11-26.2.blah.ext
2014-11-26_3.ext
2014-11-26.4.stuff_here.ext
2014-12-03.1. could be anything.ext
2014-12-032b.ext
2014-11-26 613 adva.ext
My goal is to iterate over the entire list of files and to take the date formatting from YYYY-MM-DD and store that in a variable in the format of YYYYMMDD for further processing (in my case it's going to be pushed into a touch command).
So normally I would match against this regular expression:
(\d{4})-(\d{2})-(\d{2}).*
And then use $1$2$3 to get my desired pattern, however I'm not sure how to do this in bash/zsh.
How can this be done within a shell script as such?
-
ylluminate almost 7 years
@Sundeep that latter option is better wrt parameter expansion. So how does that work precisely? Right now in your example you get the YYYY and MM, but then you just grab the rest with ${f:8}, when I would rather just grab DD and discard .* (everything after DD).
-
John Goofy almost 7 years
Please, could you post a desired output? Or is your goal to rename files?
-
ylluminate almost 7 years
@JohnGoofy please note my edit from ~30 min ago and the answer that Sundeep gave.
-
-
ylluminate almost 7 years
Okay, nice, but I'm personally liking the conciseness of the comments that @Sundeep is leaving above where you seem to have more readable control over the fields. My goal here is to extract out these elements and then use them in another command (specifically I'm setting times via touch). Not quite sure why he's not starting it off as an answer instead of comments...
-
FloHe almost 7 years
You can e.g. pipe the output to another command.
-
ylluminate almost 7 years
Interesting, so the "cursor" is on the index and is not inclusive, but rather exclusive. So in this case 5:2 starts at the 1st dash, but does not include it. 8:2 starts at the 2nd dash and does not include it. Very interesting and great to know.
-
Sundeep almost 7 years
0 is starting index... so the first - index is 4 ...
-
smw almost 7 years
@ylluminate you could do a two-step substitution like
for f in *.ext; do d="${f%%.*}"; echo "${d//-}"; done
though (first remove the longest trailing string, then remove the dashes).
-
BallpointBen almost 7 years
I always think of indices as pointing between characters instead of at them. The oddball case is then requesting a single character, in which case the index is short for "between that index and the subsequent index", i.e. the character right of that index.
-
ylluminate almost 7 years
I was told by someone that touch required YYYYMMDD only format when the -t parameter was issued...
-
dave_thompson_085 almost 7 years
@ylluminate: -t requires [[cc]yy]mmddhhmm[.ss] -- which is not the same as you wrote, although it does omit punctuation other than possibly one dot -- but in the GNU version (as clearly stated) --date (or -d) is different.
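To illustrate the -t format mentioned in the last comment, here is a sketch that builds a ccyymmddhhmm string from a file name. `stamp_for_touch_t` is a hypothetical helper name, and the time of day 0000 (midnight) is an assumption, since the file names carry only a date.

```bash
#!/usr/bin/env bash
# Sketch: stamp_for_touch_t is a hypothetical helper. It turns a name that
# starts with YYYY-MM-DD into the ccyymmddhhmm form accepted by touch -t,
# using 0000 (midnight) as an assumed time of day.
stamp_for_touch_t() {
    local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})'
    [[ $1 =~ $re ]] || return 1
    printf '%s%s%s0000\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

stamp_for_touch_t '2014-11-19.8.ext'   # prints 201411190000

# Usage (changes timestamps, so it is left commented out):
# for f in *; do
#     t=$(stamp_for_touch_t "$f") && touch -t "$t" -- "$f"
# done
```

Unlike --date, the -t option is specified by POSIX, so this variant does not depend on GNU touch, although the function itself still relies on bash.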