Extracting number from filename
Solution 1
cut is the wrong tool for that. To manipulate short strings such as file names, use the shell's string manipulation facilities whenever possible. All sh-type shells¹ (sh, dash, bash, ksh, zsh, …) have some basic string manipulation as part of variable substitution. See e.g. the dash manual under “parameter expansion”. You can remove the shortest/longest prefix/suffix that matches a pattern.
You want the last sequence of digits in the file name, so:
- Determine the non-numeric suffix by stripping everything up to the last digit.
- Remove that suffix.
- Strip everything up to the last non-digit.
filename=1.raw_bank_details_211.trg
suffix="${filename##*[0-9]}"
number="${filename%"$suffix"}"
number="${number##*[!0-9]}"
¹ Except some pre-POSIX Bourne shells, but you don't care about those.
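As a minimal sketch, the three steps above can be wrapped in a small function so they can be applied to several names. The function name last_number is hypothetical and not part of the original answer, and the last step here uses [!0-9], so a hyphen counts as a non-digit.

```shell
# Hypothetical helper: print the last run of digits in a name.
# Plain POSIX sh has no "local", so these variables are global.
last_number() {
    name=$1
    suffix=${name##*[0-9]}       # text after the last digit, e.g. ".trg"
    number=${name%"$suffix"}     # drop that suffix: "1.raw_bank_details_211"
    number=${number##*[!0-9]}    # drop everything up to the last non-digit
    printf '%s\n' "$number"
}

last_number 1.raw_bank_details_211.trg
last_number 2.raw_bank_details_222.trg
```

On the two sample names this prints 211 and 222. A name without any digit yields an empty line.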
Solution 2
You would be better off using a standard text processing tool instead of a naive tool like cut.
Here are some ways:
With awk, getting the second-to-last field, with _ or . as the separator:
awk -F '[_.]' '{print $(NF-1)}' file.txt
With grep, using PCRE (-P, a GNU grep option):
grep -Po '\d+(?=[^_]*$)' file.txt
- -o only gets the matched portion
- \d+ matches one or more digits
- The zero-width positive lookahead, (?=[^_]*$), ensures that no _ follows until the end of the line
With sed:
sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
- .*_ matches everything up to the last _
- ([[:digit:]]+) matches the required digits and puts them in a captured group
- .* matches the rest
- In the replacement, only the captured group, \1, is used
With perl, using the same logic as the sed one:
perl -pe 's/.*_(\d+).*/$1/' file.txt
If you must use cut, do it in two steps: first get the 4th field separated by _, then get the 1st field separated by . from that:
cut -d_ -f4 file.txt | cut -d. -f1
This is not recommended as this requires the field numbers to be hardcoded.
If it were a string, I would do it using shell parameter expansion:
% str='1.raw_bank_details_211.trg'
% str=${str##*_}
% echo "${str%%.*}"
211
You can still use a while construct, read each line into a variable and apply the same expansions, but that would be slow for a large file. Alternatively, you could use _. as the IFS and pick the hardcoded field (as with cut) if you want.
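Both loop-based variants might look roughly like the sketch below. The function names are hypothetical, and the second variant assumes the number is always the 5th field once the line is split on _ and ., as it is in the sample names.

```shell
# Sketch only: function names are hypothetical, not from the original answer.

# Variant 1: read each line, then trim it with parameter expansion.
numbers_by_expansion() {
    while IFS= read -r line; do
        part=${line##*_}              # drop everything up to the last "_"
        printf '%s\n' "${part%%.*}"   # drop everything from the first "."
    done
}

# Variant 2: let read split on "_" and "." and pick a hardcoded field.
# Assumes the number is always the 5th field, as in the sample names.
numbers_by_ifs() {
    while IFS='_.' read -r f1 f2 f3 f4 f5 rest; do
        printf '%s\n' "$f5"
    done
}

# Demo on the two sample names; with a real file, redirect it instead,
# e.g. numbers_by_expansion < file.txt
printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg | numbers_by_expansion
printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg | numbers_by_ifs
```

Each call prints 211 and 222 for the sample input. The second variant shares the drawback of the cut approach: the field position is hardcoded.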
Example:
% cat file.txt
1.raw_bank_details_211.trg
2.raw_bank_details_222.trg
% awk -F '[_.]' '{print $(NF-1)}' file.txt
211
222
% grep -Po '\d+(?=[^_]*$)' file.txt
211
222
% sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
211
222
% perl -pe 's/.*_(\d+).*/$1/' file.txt
211
222
% cut -d_ -f4 file.txt | cut -d. -f1
211
222
Rak kundra
Updated on September 18, 2022
Comments
- Rak kundra, almost 2 years:
I have a filename following this model:
1.raw_bank_details_211.trg 2.raw_bank_details_222.trg
I need to use the cut command in unix and cut the above string to obtain 211 and 222 from the strings and echo the value. I already used grep, grep -o -E '[0-9]+', I need an alternative to this.
- RomanPerekhrest, about 7 years:
a filename OR a string?
- Rui F Ribeiro, about 7 years:
I advise not to (ab)use the word requirement. This is a pro-bono service.