Extracting number from filename

shell filenames string cut

8,038

Solution 1

cut is the wrong tool for that. To manipulate short strings such as file names, use the shell's string manipulation facilities whenever possible. All sh-type shells¹ (sh, dash, bash, ksh, zsh, …) have some basic string manipulation as part of variable substitution. See e.g. the dash manual under “parameter expansion”. You can remove the shortest/longest prefix/suffix that matches a pattern.
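Here is a quick illustration of those four forms on the question's first file name. It is a minimal sketch, and the variable names are only for illustration. `#`/`##` remove the shortest/longest matching prefix, and `%`/`%%` remove the shortest/longest matching suffix:

```shell
# The four prefix/suffix-removal forms of parameter expansion (POSIX sh).
filename=1.raw_bank_details_211.trg
short_prefix=${filename#*_}    # shortest prefix matching *_ removed: bank_details_211.trg
long_prefix=${filename##*_}    # longest prefix matching *_ removed:  211.trg
short_suffix=${filename%.*}    # shortest suffix matching .* removed: 1.raw_bank_details_211
long_suffix=${filename%%.*}    # longest suffix matching .* removed:  1
printf '%s\n' "$short_prefix" "$long_prefix" "$short_suffix" "$long_suffix"
```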

You want the last sequence of digits in the file name, so:

1. Determine the non-numeric suffix by stripping everything up to the last digit.
2. Remove that suffix.
3. Strip everything up to the last non-digit.

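filename=1.raw_bank_details_211.trg
suffix="${filename##*[0-9]}"      # text after the last digit: .trg
number="${filename%"$suffix"}"    # remove that suffix: 1.raw_bank_details_211
number="${number##*[!0-9]}"       # remove everything up to the last non-digit: 211
echo "$number"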
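If you need this for more than one name, you could wrap the three steps in a function. This is a sketch only. `last_number` and its `_ln_` variables are hypothetical names, and the function prints an empty line when the name contains no digits at all.

```shell
# Hypothetical helper: print the last run of digits in its argument,
# using only POSIX parameter expansion (same three steps as above).
last_number() {
  _ln_name=$1
  _ln_suffix=${_ln_name##*[0-9]}         # text after the last digit, e.g. ".trg"
  _ln_number=${_ln_name%"$_ln_suffix"}   # drop that suffix (quoted, so matched literally)
  _ln_number=${_ln_number##*[!0-9]}      # drop everything up to the last non-digit
  printf '%s\n' "$_ln_number"
}

for f in 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg; do
  last_number "$f"
done
```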

¹ Except some pre-POSIX Bourne shells, but you don't care about those.

Solution 2

You would be better off using a standard text processing tool instead of a naive tool like cut.

Here are some ways:

With awk, printing the second-to-last field when both _ and . are field separators:

awk -F '[_.]' '{print $(NF-1)}' file.txt
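To see which field `$(NF-1)` refers to, you can feed the sample names on standard input instead of reading `file.txt`. This is a sketch, and `printf` here only stands in for the file:

```shell
# Split each line on "_" or "." and print the second-to-last field.
# For 1.raw_bank_details_211.trg the fields are:
#   $1=1  $2=raw  $3=bank  $4=details  $5=211  $6=trg   (NF=6)
printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg |
  awk -F '[_.]' '{ print $(NF-1) }'
```

This relies on the extension not containing a `.` itself. For a name like `3.raw_bank_details_333.tar.gz`, the second-to-last field would be `tar`.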

grep with PCRE (-P):

grep -Po '\d+(?=[^_]*$)' file.txt

-o prints only the matched portion
\d+ matches one or more digits
The zero-width positive lookahead, (?=[^_]*$), ensures that no _ follows the match before the end of the line
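The lookahead distinguishes this from the asker's `grep -o -E '[0-9]+'`, which prints every run of digits, including the leading `1`. The following sketch compares the two on one sample name. Note that `-P` is a GNU grep feature and may not be available in every `grep`.

```shell
# The leading "1" is followed later on the line by "_", so the lookahead
# (?=[^_]*$) rejects it; only the digits after the last "_" are printed.
printf '%s\n' 1.raw_bank_details_211.trg | grep -Po '\d+(?=[^_]*$)'

# The asker's ERE has no such condition and prints both 1 and 211.
printf '%s\n' 1.raw_bank_details_211.trg | grep -oE '[0-9]+'
```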

With sed:

sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt

.*_ matches everything up to the last _ that is followed by digits
([[:digit:]]+) matches the required digits and puts them in a capture group
.* matches the rest
In the replacement, only the captured group, \1, is used
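Be aware that this command prints a line with no `_` followed by digits unchanged. If you only want output for matching lines, a variant using `-n` and the `p` flag should work. `README.txt` here is just a made-up non-matching line:

```shell
# Plain s///: prints 211, then README.txt unchanged.
printf '%s\n' 1.raw_bank_details_211.trg README.txt |
  sed -E 's/.*_([[:digit:]]+).*/\1/'

# -n suppresses automatic printing; the p flag prints only lines
# where the substitution succeeded, so only 211 is printed.
printf '%s\n' 1.raw_bank_details_211.trg README.txt |
  sed -nE 's/.*_([[:digit:]]+).*/\1/p'
```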

With perl, using the same logic as the sed one:

perl -pe 's/.*_(\d+).*/$1/' file.txt

If you must use cut, do it in two steps: first get the 4th _-separated field, then get the 1st .-separated field of that:

cut -d_ -f4 file.txt | cut -d. -f1

This is not recommended, because it requires hardcoding the field numbers.
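The following sketch shows why hardcoding the field number is fragile. It runs the same pipeline on a sample name and on a hypothetical name with one extra `_`-separated word:

```shell
# cut counts fields from the left, so -f4 only works while exactly three
# "_"-separated parts come before the number. This prints 211:
printf '%s\n' 1.raw_bank_details_211.trg | cut -d_ -f4 | cut -d. -f1

# Hypothetical name with one extra "_": the fields shift and this prints "details".
printf '%s\n' 1.raw_bank_account_details_211.trg | cut -d_ -f4 | cut -d. -f1
```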

If it is a single string rather than a file, I would use shell parameter expansion:

% str='1.raw_bank_details_211.trg'
% str=${str##*_}
% echo "${str%%.*}"
211

You could still use a while read loop that reads each line into a variable and applies the same expansions, but that would be slow for a large file. Alternatively, you could set IFS to _. and pick a hardcoded field (as with cut).
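The two loop-based alternatives could look roughly like this. Both are sketches. `numbers_by_expansion` and `numbers_by_ifs` are hypothetical names, the input comes from `printf` rather than `file.txt`, and the IFS version hardcodes the fifth field just as `cut` does.

```shell
# Hypothetical helper: apply the same two expansions as the str example
# to every line read from stdin.
numbers_by_expansion() {
  while IFS= read -r line; do
    line=${line##*_}             # drop up to the last "_":   211.trg
    printf '%s\n' "${line%%.*}"  # drop from the first ".":   211
  done
}

# Hypothetical helper: let read split each line on "_" and "." and print
# the fifth field (hardcoded, like cut). "rest" absorbs the extension.
numbers_by_ifs() {
  while IFS=_. read -r f1 f2 f3 f4 f5 rest; do
    printf '%s\n' "$f5"
  done
}

printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg | numbers_by_expansion
printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg | numbers_by_ifs
```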

Example:

% cat file.txt                          
1.raw_bank_details_211.trg
2.raw_bank_details_222.trg

% awk -F '[_.]' '{print $(NF-1)}' file.txt
211
222

% grep -Po '\d+(?=[^_]*$)' file.txt         
211
222

% sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
211
222

% perl -pe 's/.*_(\d+).*/$1/' file.txt 
211
222

% cut -d_ -f4 file.txt | cut -d. -f1
211
222


Rak kundra

Updated on September 18, 2022

Comments

Rak kundra almost 2 years
I have filenames following this pattern:
```
 1.raw_bank_details_211.trg
 2.raw_bank_details_222.trg
```
I need to use the cut command in Unix to extract 211 and 222 from the strings above and echo the values.

I already used grep -o -E '[0-9]+'; I need an alternative to this.
- RomanPerekhrest about 7 years
  
  a filename OR a string?
- Rui F Ribeiro about 7 years
  
  I advise not to (ab)use the word requirement. This is a pro-bono service.