Grabbing the extension in a file name


Solution 1

If the file name is file-1.0.tar.bz2, the extension is bz2. The method you're using to extract the extension (fileext=${filename##*.}) is perfectly valid¹.

How do you decide that you want the extension to be tar.bz2 and not bz2 or 0.tar.bz2? You need to answer this question first. Then you can figure out what shell command matches your specification.
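To make the choice concrete, here is a small sketch of what the four standard prefix/suffix-stripping expansions return for the file name from the question. Only the sample value of filename is assumed; the expansions themselves are standard POSIX.

```shell
filename=file-1.0.tar.bz2

echo "${filename##*.}"   # longest prefix ending in a dot removed:   bz2
echo "${filename#*.}"    # shortest prefix ending in a dot removed:  0.tar.bz2
echo "${filename%.*}"    # shortest suffix starting at a dot removed: file-1.0.tar
echo "${filename%%.*}"   # longest suffix starting at a dot removed:  file-1
```

None of these yields tar.bz2 on its own, which is why a specification (and a loop or a pattern list) is needed.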

  • One possible specification is that extensions must begin with a letter. This heuristic fails for a few common extensions like 7z, which might be best treated as a special case. Here's a bash/ksh/zsh implementation:

    basename=$filename; fileext=
    while [[ $basename = ?*.* &&
             ( ${basename##*.} = [A-Za-z]* || ${basename##*.} = 7z ) ]]
    do
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
    done
    fileext=${fileext%.}
    

    For POSIX portability, you need to use a case statement for pattern matching.

    while case $basename in
            ?*.*) case ${basename##*.} in [A-Za-z]*|7z) true;; *) false;; esac;;
            *) false;;
          esac
    do …
    
  • Another possible specification is that some extensions denote encodings and indicate that further stripping is needed. Here's a bash/ksh/zsh implementation (requiring shopt -s extglob under bash and setopt ksh_glob under zsh):

    basename=$filename
    fileext=
    while [[ $basename = ?*.@(bz2|gz|lzma) ]]; do
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
    done
    if [[ $basename = ?*.* ]]; then
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
    fi
    fileext=${fileext%.}
    

    Note that this considers 0 to be an extension in file-1.0.gz.
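For shells without [[ … ]] or extended globs, the second specification could be written with case only. This is a minimal sketch, not a definitive implementation: the function name split_ext and the result variables base and ext are made up for illustration, and the list of compression suffixes is the same short one used above.

```shell
# split_ext NAME: sets $base and $ext (hypothetical helper, POSIX sh only)
split_ext() {
    base=$1
    ext=
    # Peel off compression suffixes first
    while case $base in
            ?*.bz2|?*.gz|?*.lzma) true ;;
            *) false ;;
          esac
    do
        ext=${base##*.}.$ext
        base=${base%.*}
    done
    # Then take one more component as the underlying extension
    case $base in
        ?*.*)
            ext=${base##*.}.$ext
            base=${base%.*}
            ;;
    esac
    ext=${ext%.}   # drop the trailing dot left by the concatenation
}

split_ext file-1.0.tar.bz2; echo "$ext"   # tar.bz2
split_ext file-1.0.gz;      echo "$ext"   # 0.gz
```

The second call shows the caveat mentioned above: 0 is treated as an extension in file-1.0.gz.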

¹ ${VARIABLE##SUFFIX} and related constructs are in POSIX, so they work in any non-antique Bourne-style shell such as ash, bash, ksh or zsh.

Solution 2

You might simplify matters by just doing pattern matching on the filename rather than extracting the extension twice:

case "$filename" in
    *.tar.bz2) bunzip_then_untar ;;
    *.bz2)     bunzip_only ;;
    *.tar.gz)  untar_with -z ;;
    *.tgz)     untar_with -z ;;
    *.gz)      gunzip_only ;;
    *.zip)     unzip ;;
    *.7z)      : ;;   # placeholder: handle 7z archives here
    *)         : ;;   # unknown type: do nothing
esac
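Note that the order of the patterns matters: *.tar.bz2 has to come before *.bz2, because case runs the first branch that matches. As a rough sketch of how the branches might be filled in, the function below only prints the command it would run. The name extract_cmd is hypothetical, and the commands (tar -xjf, bunzip2, tar -xzf, gunzip, unzip, 7z x) are assumptions about which tools are installed, not part of the answer above.

```shell
# extract_cmd NAME: print the command that would unpack NAME.
# Hypothetical helper; printing keeps the sketch free of side effects.
extract_cmd() {
    case $1 in
        *.tar.bz2)      echo "tar -xjf $1" ;;
        *.bz2)          echo "bunzip2 $1" ;;
        *.tar.gz|*.tgz) echo "tar -xzf $1" ;;
        *.gz)           echo "gunzip $1" ;;
        *.zip)          echo "unzip $1" ;;
        *.7z)           echo "7z x $1" ;;
        *)              echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

extract_cmd file-1.0.tar.bz2   # tar -xjf file-1.0.tar.bz2
extract_cmd notes.gz           # gunzip notes.gz
```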

Solution 3

$ echo "thisfile.txt"|awk -F . '{print $NF}'

Comments on this here: http://liquidat.wordpress.com/2007/09/29/short-tip-get-file-extension-in-shell-script/
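One caveat with $NF: for a name that contains no dot at all, the whole name is the last field, so it gets printed as if it were the extension. A possible guard, sketched here with a hypothetical wrapper name ext_awk, is to print only when there is more than one field:

```shell
# ext_awk NAME: print the text after the last dot, or nothing if NAME has no dot.
# Hypothetical wrapper around the awk one-liner above.
ext_awk() {
    printf '%s\n' "$1" | awk -F . 'NF > 1 { print $NF }'
}

ext_awk thisfile.txt   # txt
ext_awk README         # (prints nothing)
```

Like the original one-liner, this still returns only the last component (gz for a .tar.gz file).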

Solution 4

Here's my shot at it: Translate dots to newlines, pipe through tail, get last line:

$> TEXT=123.234.345.456.456.567.678
$> echo "$TEXT" | tr . '\n' | tail -n1
678

Solution 5

I once wrote these handy functions:

# args: string how_many
function get_last_letters(){ echo "${1:${#1}-$2:$2}"; }
function cut_last_letters(){ echo "${1:0:${#1}-$2}"; }

I've found this straightforward approach very useful in many cases, not only for extensions.

For checking extensions, it's simple and reliable, provided you know the length of the extension in advance:

~$ get_last_letters file.bz2 4
.bz2
~$ get_last_letters file.0.tar.bz2 4
.bz2

For cutting-off extension:

~$ cut_last_letters file.0.tar.bz2 4
file.0.tar

For changing extension:

~$ echo "$(cut_last_letters file.0.tar.bz2 4).gz"
file.0.tar.gz

Or, if you like "handy" functions:

~$ function cut_last_letters_and_add(){ echo "${1:0:${#1}-$2}$3"; }
~$ cut_last_letters_and_add file.0.tar.bz2 4 .gz
file.0.tar.gz
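If the length of the extension is not known in advance, a similar effect can be had with the suffix-stripping expansion used elsewhere on this page. This is only a sketch; change_ext is a made-up name, and it always replaces just the last dot-separated component.

```shell
# change_ext NAME NEWEXT: replace the last extension of NAME with NEWEXT.
# Hypothetical helper; works in any POSIX shell, no length argument needed.
change_ext() {
    printf '%s\n' "${1%.*}.$2"
}

change_ext file.0.tar.bz2 gz   # file.0.tar.gz
```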

P.S. If you liked those functions or found them useful, please refer to this post :) (and hopefully leave a comment).

Author: uray

Updated on September 17, 2022

Comments

  • uray
    uray almost 2 years

    How do I get the file extension from bash? Here's what I tried:

    filename=`basename $filepath`
    fileext=${filename##*.}
    

    That gives me the extension bz2 for the path /dir/subdir/file.bz2, but I have a problem with the path /dir/subdir/file-1.0.tar.bz2.

    I would prefer a solution using only bash without external programs if it is possible.

    To make my question clear: I am writing a bash script that extracts any given archive with a single command, extract path_to_file. The script decides how to extract the file from its compression or archive type, which could be .tar.gz, .gz, .bz2, etc. I think this requires string manipulation: for example, if I get the extension .gz, I should check whether .tar comes before it, and if so, the extension should be .tar.gz.

    • Kurt
      Kurt almost 14 years
      file="/dir/subdir/file-1.0.tar.bz2"; echo ${file##*.} prints 'bz2' here. What is the output that you're expecting?
    • uray
      uray almost 14 years
      I need .tar.bz2
    • kenorb
      kenorb about 6 years
  • uray
    uray almost 14 years
    This does not work for the .tar.gz extension.
  • Kurt
    Kurt almost 14 years
    Does not work for all cases. Try with 'foo.7z'
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    You need quotes, and better use printf in case the file name contains a backslash or begins with -: "${filename#$(printf %s "$filename" | sed 's/\.[^[:digit:]].*$//g;')}"
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @axel_c: right, and I've implemented the same specification as Maciej as an example. What heuristic do you suggest that's better than “begins with a letter”?
  • uray
    uray almost 14 years
    That could be solved by checking whether the string before the last . is an archive type such as tar; if it is not an archive type (like 0), the iteration should end.
  • Kurt
    Kurt almost 14 years
    @Gilles: I just think there's no solution unless you use a precomputed list of known extensions, because an extension can be anything.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 14 years
    @uray: that works in this particular case, but it's not a general solution. Consider Maciej's example of .patch.lzma. A better heuristic would be to consider the string after the last .: if it's a compression suffix (.7z, .bz2, .gz, ...), continue stripping.
  • Chris
    Chris almost 14 years
    Well, a .tar.gz is actually a tar inside a gzip file, so it does work in the sense that it removes a gz extension from a gzip file.
  • AsymLabs
    AsymLabs over 7 years
    This solution is beautifully simple.
  • Gilles 'SO- stop being evil'
    Gilles 'SO- stop being evil' almost 5 years
    @NoamM What was wrong with the indentation? It's definitely broken after your edit: doubly-nested code is indented the same as singly-nested.