Extracting number from filename

Solution 1

cut is the wrong tool for that. To manipulate short strings such as file names, use the shell's string manipulation facilities whenever possible. All sh-type shells¹ (sh, dash, bash, ksh, zsh, …) have some basic string manipulation as part of variable substitution. See e.g. the dash manual under “parameter expansion”. You can remove the shortest/longest prefix/suffix that matches a pattern.
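
As a quick illustration of those four operators, here is a minimal sketch applied to one of the file names from the question. The variable names are made up for this example, and the patterns (`*_` and `.*`) were chosen only to show the shortest/longest difference.

```sh
# Illustrative only: the four POSIX prefix/suffix removal operators.
name='1.raw_bank_details_211.trg'

shortest_prefix=${name#*_}     # remove the shortest leading match of *_
longest_prefix=${name##*_}     # remove the longest leading match of *_
shortest_suffix=${name%.*}     # remove the shortest trailing match of .*
longest_suffix=${name%%.*}     # remove the longest trailing match of .*

printf '%s\n' "$shortest_prefix" "$longest_prefix" \
              "$shortest_suffix" "$longest_suffix"
```

With this input the four results should be bank_details_211.trg, 211.trg, 1.raw_bank_details_211 and 1, in that order.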

You want the last sequence of digits in the file name, so:

  1. Determine the non-numeric suffix by stripping everything up to the last digit.
  2. Remove that suffix.
  3. Strip everything up to the last non-digit.

filename=1.raw_bank_details_211.trg
suffix="${filename##*[0-9]}"
number="${filename%"$suffix"}"
number="${number##*[!0-9]}"
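
The three steps can also be wrapped in a small function so they are easy to try on several names. This is only a sketch: last_number is a hypothetical helper name, and the bracket expression [!0-9] is used so that only digits are kept.

```sh
# last_number is a hypothetical helper name, used here for illustration.
# Note: in plain POSIX sh these variables are global, not function-local.
last_number() {
    filename=$1
    suffix="${filename##*[0-9]}"      # step 1: text after the last digit, e.g. ".trg"
    number="${filename%"$suffix"}"    # step 2: remove that trailing text
    number="${number##*[!0-9]}"       # step 3: drop everything up to the last non-digit
    printf '%s\n' "$number"
}

last_number 1.raw_bank_details_211.trg
last_number 2.raw_bank_details_222.trg
```

For a name that contains no digit at all, the function prints an empty line, so callers may want to check for that case.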

¹ Except some pre-POSIX Bourne shells, but you don't care about those.

Solution 2

You would be better off using a standard text processing tool instead of a naive tool like cut.

Here are some ways:


With awk, splitting on _ or . and printing the second-to-last field:

awk -F '[_.]' '{print $(NF-1)}' file.txt

grep with PCRE (-P, available in GNU grep):

grep -Po '\d+(?=[^_]*$)' file.txt
  • -o only gets the matched portion

  • \d+ matches one or more digits

  • The zero-width positive lookahead, (?=[^_]*$), ensures that no _ follows the match before the end of the line


With sed:

sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
  • .*_ matches everything up to the last _ that is followed by a digit

  • ([[:digit:]]+) matches the required digits and puts them in a capture group

  • .* matches the rest

  • In the replacement, only the captured group, \1, is used


With perl, using the same logic as the sed command:

perl -pe 's/.*_(\d+).*/$1/' file.txt 

If you must use cut, do it in two steps: first get the 4th _-separated field, then get the 1st .-separated field of that result:

cut -d_ -f4 file.txt | cut -d. -f1

This is not recommended, because it requires the field numbers to be hardcoded.
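
To see why the hardcoded field number matters, here is a small sketch using a hypothetical file name with one fewer _ than the names in the question. The name 3.raw_bank_211.trg is an assumption made up for this example.

```sh
# Hypothetical name with only three _-separated fields instead of four.
name='3.raw_bank_211.trg'

# Field 4 does not exist here, so cut yields an empty result.
with_cut=$(printf '%s\n' "$name" | cut -d_ -f4 | cut -d. -f1)

# awk counts from the end of the line, so it still finds the number.
with_awk=$(printf '%s\n' "$name" | awk -F '[_.]' '{print $(NF-1)}')

printf 'cut: "%s"\nawk: "%s"\n' "$with_cut" "$with_awk"
```

The cut pipeline silently returns nothing for this name, while the awk command still yields 211 because it addresses the field relative to the end.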


If it were a string, I would do it using shell parameter expansion:

% str='1.raw_bank_details_211.trg'

% str=${str##*_} 

% echo "${str%%.*}"
211

You can still apply this to a file by reading each line into a variable in a while loop, but that would be slow for a large file. Alternatively, you could set IFS to _. and pick out a hardcoded field, as with cut.
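
Both loop variants might look roughly like the sketch below. The function names expand_numbers and ifs_numbers are hypothetical, both read lines from standard input, and the second one assumes the number is always the fifth field when splitting on _ and .

```sh
# Variant 1: parameter expansion applied to each line read from stdin.
expand_numbers() {
    while IFS= read -r line; do
        line=${line##*_}               # keep what follows the last _
        printf '%s\n' "${line%%.*}"    # drop everything from the first . onward
    done
}

# Variant 2: let read split on _ and . via IFS.
# Like cut, this hardcodes the position of the number (fifth field).
ifs_numbers() {
    while IFS='_.' read -r f1 f2 f3 f4 number rest; do
        printf '%s\n' "$number"
    done
}

printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg | expand_numbers
printf '%s\n' 1.raw_bank_details_211.trg 2.raw_bank_details_222.trg | ifs_numbers
```

To process a file, redirect it into either function, for example expand_numbers < file.txt. For large inputs the awk, sed or perl commands are likely to be much faster than a shell loop.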


Example:

% cat file.txt                          
1.raw_bank_details_211.trg
2.raw_bank_details_222.trg

% awk -F '[_.]' '{print $(NF-1)}' file.txt
211
222

% grep -Po '\d+(?=[^_]*$)' file.txt         
211
222

% sed -E 's/.*_([[:digit:]]+).*/\1/' file.txt
211
222

% perl -pe 's/.*_(\d+).*/$1/' file.txt 
211
222

% cut -d_ -f4 file.txt | cut -d. -f1
211
222

Author: Rak kundra

Updated on September 18, 2022

Comments

  • Rak kundra, almost 2 years

    I have a filename following this model:

     1.raw_bank_details_211.trg
     2.raw_bank_details_222.trg
    

    I need to use the cut command in unix and cut the above string to obtain 211 and 222 from the strings and echo the value.

    I already tried grep (grep -o -E '[0-9]+'), but I need an alternative to this.

    • RomanPerekhrest, about 7 years
      A filename OR a string?
    • Rui F Ribeiro, about 7 years
      I advise not to (ab)use the word requirement. This is a pro-bono service.