Sort files according to their extensions

Solution 1

While the general problem of identifying extensions is hard, you can clean up the script a bit:

  1. Tell find to only consider files with an extension: -name '*.*'
  2. Use awk to take the last dot-separated field instead of chaining cut commands.
  3. Use a script, and then tell find to exec that script.

Thus: a script called, say, move.sh:

#! /bin/bash
for i
do
    ext=/some/where/else/$(awk -F. '{print $NF}' <<<"$i")
    mkdir -p "$ext"
    mv "$i" "$ext"
done
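
To see what the awk call in the loop yields, here is a minimal sketch. The helper name last_ext and the sample paths are made up for illustration, and printf with a pipe is used in place of the bash here-string so that it runs in any POSIX shell.

```shell
# last_ext is a hypothetical helper, not part of move.sh
last_ext() {
    # -F. splits on dots; $NF is the last field, i.e. the text after the final dot
    printf '%s\n' "$1" | awk -F. '{print $NF}'
}

last_ext ./pics/abc.jpg
last_ext ./abcdg.tar.bz
last_ext ./i.am.live.in.india.and.i.study.computer.science.txt
```

Only the part after the last dot is kept, so a name with many dots is handled, but abcdg.tar.bz is filed under bz rather than tar.bz.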

Then run find thus:

find . -name '*.*' -type f -exec move.sh {} +

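This has the problem that you can't rearrange files within the folder being searched, because find is still walking it while the files are being moved. To sort them in place, save the list first and then use xargs: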

find . -name '*.*' -type f -print0 > /tmp/temp
xargs -0 move.sh < /tmp/temp
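
A self-contained sketch of this list-then-move variant, run in a scratch directory. The file names are invented, the inline sh -c loop stands in for move.sh, and ${i##*.} replaces the awk call; it assumes an xargs that supports -0.

```shell
work=$(mktemp -d)                   # scratch directory, so nothing real is moved
cd "$work" || exit 1
touch 'a.jpg' 'b c.jpg' 'notes.txt'

list=$(mktemp)                      # plays the role of /tmp/temp
find . -name '*.*' -type f -print0 > "$list"

# The list is complete before anything moves, so sorting in place is safe
xargs -0 sh -c '
    for i; do
        ext=${i##*.}                # text after the last dot
        mkdir -p "$ext"
        mv "$i" "$ext"
    done' _ < "$list"
rm -f "$list"

find . -type f | sort               # show where the files ended up
```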

I'm not too sure of the efficiency involved, but another approach would be to get all the extensions, then move all the files involved in one swoop.

Something like:

find . -name '*.*' -type f -print0 | sed -z 's/.*\.//g' | sort -zu > /tmp/file-exts

This should get you a list of unique file extensions. Then our move.sh will look like:

#!/bin/bash
for i
do
    mkdir -p "$i"
    find . -name "*.$i" -type f -exec mv -t "$i" {} +
done

And we'll run it:

xargs -0 move.sh < /tmp/file-exts

I make quite a few assumptions in this post, such as sed and sort supporting -z (allowing them to work with the NUL-terminated lines that find and xargs thrive on).
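
The extension-listing pipeline can be tried without touching any files. This sketch feeds it made-up names, NUL-terminated the way find -print0 would emit them; list_exts is a hypothetical wrapper, and it assumes GNU sed and sort for the -z option.

```shell
# list_exts is a hypothetical name for the sed | sort stage of the pipeline
list_exts() {
    # Per NUL-terminated record: drop everything up to the last dot, then de-duplicate
    sed -z 's/.*\.//' | sort -zu
}

# tr turns the NULs into newlines purely for display
printf '%s\0' ./a.jpg ./dir/b.jpg ./c.tar.gz './d e.txt' | list_exts | tr '\0' '\n'
```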

Solution 2

Recursing into subdirectories

Parsing the output of find is unreliable. What if there was a file name with a newline in it? Use find … -exec …, which guarantees reliable processing.

find . -type f -exec sh -c '…' {} \;

The shell snippet receives the file name in $0. Note that this is a separate shell process: it doesn't inherit unexported variables or functions from the grandparent script. You can speed up processing by using the same shell subprocess to handle multiple files.

find . -type f -exec sh -c 'for x; do … done' _ {} +

This time, inside the loop, the file name is in the variable x.
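
How the arguments land in the inline shell can be checked without find. In this sketch the names are passed by hand, and each_arg is a hypothetical helper wrapping the same kind of loop.

```shell
# each_arg is a hypothetical helper that runs the inline loop on its arguments
each_arg() {
    sh -c 'for x; do printf "<%s>\n" "$x"; done' _ "$@"
}

# Without a placeholder, the first argument after the script becomes $0
sh -c 'printf "%s\n" "$0"' 'first name'

# With _ filling $0, every name is a positional parameter, and
# "for x" with no "in" list walks them intact, embedded newline included
each_arg 'first name' 'second
name'
```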

Breaking up the file name

Invoking external utilities such as sed, cut, etc. is fragile: you have to be extremely careful to avoid mangling some file names. You don't need that: the shell's built-in string processing features are enough for what you want to do here. Given a file name $x:

directory=${x%/*}
basename=${x##*/}
extension=…
if [ -n "$extension" ]; then
  mkdir -p "$directory/$extension"
  mv "$x" "$directory/$extension"
fi
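
For reference, this is what the two expansions produce on a made-up path. The variables last_part and stem are added here only to illustrate the same operators applied to dots.

```shell
x='./photos/holiday/img.001.jpg'   # hypothetical file name

directory=${x%/*}        # remove the shortest suffix matching /*
basename=${x##*/}        # remove the longest prefix matching */
last_part=${basename##*.}   # text after the last dot
stem=${basename%.*}         # name with the last dot and what follows removed

printf '%s\n' "$directory" "$basename" "$last_part" "$stem"
```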

The extension

What is the extension of a file? It's the part after one of the dots in the name. There's no standard that says which one. It's up to you to decide what you consider to be the extension in cases like foo.tar.gz or bar-1.2.

Here's some example code that considers common compression extensions to nest, and that requires extensions to contain a letter, so that foo-1.2.tar.gz is considered to have the extension tar.gz.

extension=
while case "$basename" in
        # match on the whole name so that a name without a dot, such as gz, ends the loop
        *.gz|*.bz2|*.xz) extension=.${basename##*.}$extension;; # stackable extension
        *) false;;
      esac
do
  basename=${basename%.*}
done
case "${basename##*.}" in
  "$basename") :;; # no . ==> no extension
  *[!0-9A-Za-z]*) :;; # only allow alphanumeric characters
  *[A-Za-z]*) extension=${basename##*.}$extension;; # non-stackable extension
  *) false;; # require at least one letter
esac
extension=${extension#.}
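
To try the rules on a few names, the snippet can be wrapped in a function. This is only a sketch: get_extension is a hypothetical name and the sample names are made up. The loop condition matches on the whole name (*.gz and so on) so that a dotless name such as gz terminates.

```shell
# get_extension is a hypothetical wrapper; it prints the extension, or an empty line
get_extension() {
    basename=${1##*/}
    extension=
    # Peel off compression suffixes while the name still ends in one
    while case "$basename" in
            *.gz|*.bz2|*.xz) extension=.${basename##*.}$extension;;
            *) false;;
          esac
    do
        basename=${basename%.*}
    done
    case "${basename##*.}" in
        "$basename") :;;                 # no dot left: nothing to add
        *[!0-9A-Za-z]*) :;;              # reject non-alphanumeric candidates
        *[A-Za-z]*) extension=${basename##*.}$extension;;
        *) false;;                       # digits only, e.g. the 2 in bar-1.2
    esac
    extension=${extension#.}
    printf '%s\n' "$extension"
}

get_extension ./foo-1.2.tar.gz
get_extension ./abc.jpg
get_extension ./bar-1.2
```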

Author: Edward Torvalds

Updated on September 18, 2022

Comments

  • Edward Torvalds
    Edward Torvalds over 1 year

    I have made a script that will sort files according to their extension and place them in the proper folder. For example, place abc.jpg in the directory jpg.

    #!/bin/bash
    #this script sorts files according to their extensions
    oldIFS=$IFS
    IFS=$'\n'
    (find . -type f) > /tmp/temp
    for var in `cat /tmp/temp`
    do
    name=`basename "$var"`
    ext=`echo $name | cut -d'.' -f2- | cut -d'.' -f2- | cut -d'.' -f2- | cut -d'.' -f2- | cut -d'.' -f2- | cut -d'.' -f2- | cut -d'.' -f2-`
    mkdir -p $ext
    mv "$var" $ext/ 2> /dev/null
    done
    IFS=$oldIFS
    

    Problems with this script:

    1. It relies on changing IFS, which is said to be best avoided as much as possible.
    2. It does not sort files that have no extension.
    3. It sorts a file like abc.tar.bz into a folder named bz, but such a file should go into a tar.bz folder.
    4. See line 9 of my script: if a file name contains more dots than there are cut -d'.' -f2- calls in the script, part of the file name ends up in the extension.
      For example, a file named i.am.live.in.india.and.i.study.computer.science.txt will be placed in a folder named study.computer.science.txt

    You may also suggest any tweaks to make this script smaller and neater.

  • muru
    muru over 9 years
    @edwardtorvalds it's the path where you want the sorted files to go to. Use it if you use the first approach (find with -exec). If you want the files to go in the same directory where you ran find, remove it and use the second approach (find, followed by xargs).
  • Edward Torvalds
    Edward Torvalds over 9 years
    how come the script works fine for i.am.live.in.india.and.i.study.computer.science.txt and not for abcdg.tar.bz ?
  • Edward Torvalds
    Edward Torvalds over 9 years
    oh i see you grabbing the last part :p
  • muru
    muru over 9 years
    @edwardtorvalds as I said, determining whether an arbitrary string is an extension or not is too hard. Gilles used extra code for some known extensions. I didn't.
  • Mauricio Gracia Gutierrez
    Mauricio Gracia Gutierrez about 3 years
    I have just used the second approach and it works very fast and fine, thanks