Grabbing the extension in a file name
Solution 1
If the file name is file-1.0.tar.bz2, the extension is bz2. The method you're using to extract the extension (fileext=${filename##*.}) is perfectly valid¹.
How do you decide that you want the extension to be tar.bz2 and not bz2 or 0.tar.bz2? You need to answer this question first. Then you can figure out what shell command matches your specification.
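To see why the question matters, here is a small sketch of what the four standard prefix/suffix expansions yield for this file name. The variable names other than filename are made up for illustration.

```shell
filename=file-1.0.tar.bz2

# '##*.' removes the longest prefix matching '*.': text after the last dot.
last_ext=${filename##*.}
# '#*.' removes the shortest prefix matching '*.': text after the first dot.
after_first_dot=${filename#*.}
# '%.*' removes the shortest suffix matching '.*': name without the last component.
stem=${filename%.*}
# '%%.*' removes the longest suffix matching '.*': text before the first dot.
short_stem=${filename%%.*}

printf '%s\n' "$last_ext" "$after_first_dot" "$stem" "$short_stem"
# prints, one per line: bz2, 0.tar.bz2, file-1.0.tar, file-1
```

None of these produces tar.bz2 on its own, which is why a specification has to come first.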
-
One possible specification is that extensions must begin with a letter. This heuristic fails for a few common extensions like 7z, which might be best treated as a special case. Here's a bash/ksh/zsh implementation:
basename=$filename; fileext=
while [[ $basename = ?*.* &&
         ( ${basename##*.} = [A-Za-z]* || ${basename##*.} = 7z ) ]]
do
  fileext=${basename##*.}.$fileext
  basename=${basename%.*}
done
fileext=${fileext%.}
For POSIX portability, you need to use a case statement for pattern matching.
while case $basename in
        ?*.*) case ${basename##*.} in [A-Za-z]*|7z) true;; *) false;; esac;;
        *) false;;
      esac
do …
-
Another possible specification is that some extensions denote encodings and indicate that further stripping is needed. Here's a bash/ksh/zsh implementation (requiring shopt -s extglob under bash and setopt ksh_glob under zsh):
basename=$filename
fileext=
while [[ $basename = ?*.@(bz2|gz|lzma) ]]; do
  fileext=${basename##*.}.$fileext
  basename=${basename%.*}
done
if [[ $basename = ?*.* ]]; then
  fileext=${basename##*.}.$fileext
  basename=${basename%.*}
fi
fileext=${fileext%.}
Note that this considers 0 to be an extension in file-1.0.gz.
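The second specification can also be written without extglob, using the same case-in-a-while trick as the POSIX variant of the first one. This is only a sketch: the function name split_ext is hypothetical, and it reports its result through the global variables basename and fileext.

```shell
# Hypothetical helper: sets the globals 'basename' and 'fileext' for the name in $1.
split_ext() {
  basename=$1
  fileext=
  # Keep stripping trailing compression suffixes while there are any.
  while case $basename in
          ?*.bz2|?*.gz|?*.lzma) true;;
          *) false;;
        esac
  do
    fileext=${basename##*.}.$fileext
    basename=${basename%.*}
  done
  # Then take one more component as the underlying extension, if there is one.
  case $basename in
    ?*.*)
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
      ;;
  esac
  # Drop the trailing dot left over from the concatenation.
  fileext=${fileext%.}
}

split_ext file-1.0.tar.bz2
printf '%s | %s\n' "$basename" "$fileext"   # file-1.0 | tar.bz2
split_ext file-1.0.gz
printf '%s | %s\n' "$basename" "$fileext"   # file-1 | 0.gz
```

The second call shows the same caveat as the extglob version: 0 ends up in the extension.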
¹ ${VARIABLE##SUFFIX} and related constructs are in POSIX, so they work in any non-antique Bourne-style shell such as ash, bash, ksh or zsh.
Solution 2
You might simplify matters by just doing pattern matching on the filename rather than extracting the extension twice:
case "$filename" in
*.tar.bz2) bunzip_then_untar ;;
*.bz2) bunzip_only ;;
*.tar.gz) untar_with -z ;;
*.tgz) untar_with -z ;;
*.gz) gunzip_only ;;
*.zip) unzip ;;
*.7z) do something ;;
*) do nothing ;;
esac
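Since the actions above are placeholders, here is a runnable sketch of the same dispatch that only prints a label for the detected type. The function name archive_kind and the labels are invented for illustration. It relies on case taking the first matching pattern, so *.tar.bz2 has to come before *.bz2.

```shell
# Hypothetical helper: print a label for the archive type of the name in $1.
archive_kind() {
  case "$1" in
    *.tar.bz2)      echo tar+bzip2 ;;   # must precede *.bz2
    *.bz2)          echo bzip2 ;;
    *.tar.gz|*.tgz) echo tar+gzip ;;    # must precede *.gz
    *.gz)           echo gzip ;;
    *.zip)          echo zip ;;
    *.7z)           echo 7z ;;
    *)              echo unknown ;;
  esac
}

archive_kind /dir/subdir/file-1.0.tar.bz2   # tar+bzip2
archive_kind /dir/subdir/file.bz2           # bzip2
```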
Solution 3
$ echo "thisfile.txt"|awk -F . '{print $NF}'
Comments on this here: http://liquidat.wordpress.com/2007/09/29/short-tip-get-file-extension-in-shell-script/
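One thing to watch with the awk one-liner: for a name without any dot, $NF is the whole name. A possible guard, sketched here with a hypothetical wrapper called ext_awk, is to print only when there is more than one field.

```shell
# Hypothetical helper: print the text after the last dot, or nothing if $1 has no dot.
ext_awk() {
  # printf avoids surprises with names that start with '-' or contain backslashes.
  printf '%s\n' "$1" | awk -F . 'NF > 1 { print $NF }'
}

ext_awk thisfile.txt        # txt
ext_awk file-1.0.tar.bz2    # bz2
ext_awk README              # prints nothing
```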
Solution 4
Here's my shot at it: translate dots to newlines, pipe through tail, and get the last line:
$> TEXT=123.234.345.456.456.567.678
$> echo $TEXT | tr . \\n | tail -n1
678
Solution 5
I once wrote these handy functions:
# args: string how_many
function get_last_letters(){ echo "${1:${#1}-$2:$2}"; }
function cut_last_letters(){ echo "${1:0:${#1}-$2}"; }
I've found this straightforward approach very useful in many cases, not only when dealing with extensions.
For checking extensions, it's simple and reliable:
~$ get_last_letters file.bz2 4
.bz2
~$ get_last_letters file.0.tar.bz2 4
.bz2
For cutting-off extension:
~$ cut_last_letters file.0.tar.bz2 4
file.0.tar
For changing extension:
~$ echo $(cut_last_letters file.0.tar.bz2 4).gz
file.0.tar.gz
Or, if you like "handy" functions:
~$ function cut_last_letters_and_add(){ echo ${1:0:${#1}-$2}"$3"; }
~$ cut_last_letters_and_add file.0.tar.bz2 4 .gz
file.0.tar.gz
P.S. If you liked those functions or found them useful, please refer to this post :) (and hopefully leave a comment).
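When the suffix itself is known rather than its length, the same extension swap can be sketched with the % expansion, which needs no character counting and also works in plain POSIX sh. The function name swap_suffix is made up for this example.

```shell
# Hypothetical helper: replace the suffix $2 of the name $1 with $3.
# If $1 does not end in $2, the name is printed unchanged.
swap_suffix() {
  case $1 in
    *"$2")
      stem=${1%"$2"}              # quoting $2 makes it match literally
      printf '%s\n' "$stem$3"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

swap_suffix file.0.tar.bz2 .bz2 .gz   # file.0.tar.gz
swap_suffix notes.txt .bz2 .gz        # notes.txt
```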
Comments
-
uray almost 2 years
How do I get the file extension from bash? Here's what I tried:
filename=`basename $filepath`
fileext=${filename##*.}
By doing that I can get the extension bz2 from the path /dir/subdir/file.bz2, but I have a problem with the path /dir/subdir/file-1.0.tar.bz2.
I would prefer a solution using only bash, without external programs, if possible.
To make my question clear, I was creating a bash script to extract any given archive with a single command, extract path_to_file. The script determines how to extract the file from its compression or archive type, which could be .tar.gz, .gz, .bz2, etc. I think this should involve string manipulation: for example, if I get the extension .gz, then I should check whether it has the string .tar before .gz; if so, the extension should be .tar.gz.
-
Kurt almost 14 years
file="/dir/subdir/file-1.0.tar.bz2"; echo ${file##*.} prints '.bz2' here. What is the output that you're expecting?
-
uray almost 14 years
I need .tar.bz2
-
kenorb about 6 years
Related: Extract filename and extension in Bash.
-
-
uray almost 14 years
Doesn't work for the .tar.gz extension.
-
Kurt almost 14 years
Does not work for all cases. Try with 'foo.7z'
-
Gilles 'SO- stop being evil' almost 14 years
You need quotes, and better use printf in case the file name contains a backslash or begins with -:
"${filename#$(printf %s "$filename" | sed 's/\.[^[:digit:]].*$//g;')}"
-
Gilles 'SO- stop being evil' almost 14 years
@axel_c: right, and I've implemented the same specification as Maciej as an example. What heuristic do you suggest that's better than “begins with a letter”?
-
uray almost 14 years
That should be solved by checking whether the string before the last . token is an archive type, for example tar; if it's not an archive type, like 0, iteration should end.
-
Kurt almost 14 years
@Gilles: I just think there's no solution unless you use a precomputed list of known extensions, because an extension can be anything.
-
Gilles 'SO- stop being evil' almost 14 years
@uray: that works in this particular case, but it's not a general solution. Consider Maciej's example of .patch.lzma. A better heuristic would be to consider the string after the last .: if it's a compression suffix (.7z, .bz2, .gz, ...), continue stripping.
-
Chris almost 14 years
Well, a .tar.gz is actually a tar inside a gzip file, so it does work in the sense that it removes a gz extension from a gzip file.
-
AsymLabs over 7 years
This solution is beautifully simple.
-
Gilles 'SO- stop being evil' almost 5 years
@NoamM What was wrong with the indentation? It's definitely broken after your edit: doubly-nested code is indented the same as singly-nested.