Correct file extensions

Solution 1

You can do it relatively easily in bash:

for f in *jpg; do
    type=$(file -0 -F" " "$f" | grep -aPo '\0\s*\K\S+')
    mv "$f" "${f%.*}.${type,,}"
done

This is the same idea as @A.B's answer, but it uses shell globs instead of find. ${f%.*} is the file name with its last extension removed: only the shortest trailing match of .* is stripped, so dots elsewhere in the name or in a directory name are left alone. The -0 option makes file print a \0 after the file name, and grep uses that to pick out the file type. This should work with arbitrary file names, including those that contain spaces, newlines or anything else. ${type,,} (bash 4 or later) converts the type to lower case, so PNG becomes png.

You didn't say in your question, but if you need this to be recursive and descend into subdirectories, you could use this instead:

shopt -s globstar
for f in **/*jpg; do
    type=$(file -0 -F" " "$f" | grep -aPo '\0\s*\K\S+')
    mv "$f" "${f%.*}.${type,,}"
done
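
For readers who find the expansions dense, the same three steps (split at the NUL byte, take the first word of the description, swap the extension) can be sketched in Python. The function names and the sample byte string are illustrative assumptions, not part of the answer; the sample only mimics the shape of what file -0 -F" " prints.

```python
import os


def type_from_file_output(raw: bytes) -> str:
    """Return the first word of the description that follows the NUL byte."""
    _name, _, description = raw.partition(b"\0")
    words = description.split()
    return words[0].decode() if words else ""


def corrected_name(path: str, file_type: str) -> str:
    """Swap the last extension of path for the lower-cased file type."""
    stem, _ext = os.path.splitext(path)
    return f"{stem}.{file_type.lower()}"


# Assumed sample of what `file -0 -F" "` might print for a mislabelled PNG.
sample = b"my photo.jpg\0  PNG image data, 1192 x 774, 8-bit/color RGBA"
print(corrected_name("my photo.jpg", type_from_file_output(sample)))
```

Keeping the parsing separate from the rename makes it easy to try the logic on a few captured lines before touching any files.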

shopt -s globstar enables bash's globstar option, which lets ** match subdirectories. The bash manual describes it as follows:

globstar

If set, the pattern ** used in a pathname expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a /, only directories and subdirectories match.
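
To preview what a recursive pattern would pick up before renaming anything, a rough Python equivalent can help. This is only a sketch: find_jpgs is a hypothetical helper, the directory layout is invented for the demonstration, and pathlib's matching may differ from bash's in edge cases such as hidden files.

```python
import tempfile
from pathlib import Path


def find_jpgs(top: str) -> list[str]:
    """List every *.jpg file below top, at any depth, as sorted relative paths."""
    base = Path(top)
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*.jpg") if p.is_file()
    )


# Build a throwaway tree (made-up names) and list the matches.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub" / "deeper").mkdir(parents=True)
    for rel in ("a.jpg", "sub/b.jpg", "sub/deeper/c.jpg", "sub/notes.txt"):
        (Path(tmp) / rel).touch()
    print(find_jpgs(tmp))
```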

Solution 2

The script below can be used to (recursively) rename an incorrectly set extension, .jpg, to the correct one. In case it finds an unreadable file, it will report it in the script's output.

The script uses the imghdr module to recognize the following types: rgb, gif, pbm, pgm, ppm, tiff, rast, xbm, jpeg, bmp, png. More on the imghdr module can be found in the Python documentation, which also describes how to extend the list with more types.

As it is, it specifically renames files with the extension .jpg, as mentioned in the question. With a minor change, it can be adapted to rename any extension, or a specific set of extensions, to the correct one (or to handle files with no extension at all).

The script:

#!/usr/bin/env python3
import imghdr
import os
import shutil
import sys

directory = sys.argv[1]

for root, dirs, files in os.walk(directory):
    for name in files:
        file = os.path.join(root, name)
        # find files with the (incorrect) extension to rename
        if name.endswith(".jpg"):
            # find the correct extension
            ftype = imghdr.what(file)
            # rename the file, replacing only the trailing extension
            # (a plain str.replace would also change "jpg" in directory names)
            if ftype is not None:
                shutil.move(file, os.path.splitext(file)[0] + "." + ftype)
            # in case it can't be determined, mention it in the output
            else:
                print("could not determine: " + file)

How to use

  1. Copy the script into an empty file, save it as rename.py
  2. Run it by the command:

    python3 /path/to/rename.py <directory>
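
Note that imghdr was deprecated in Python 3.11 and removed in Python 3.13, so the script needs an older interpreter or a replacement for imghdr.what. As a minimal sketch, assuming only PNG, JPEG, GIF and BMP matter, the detection can be done by comparing the leading bytes against well-known signatures. The names SIGNATURES, sniff_image_type and sniff_file are hypothetical and not part of the original script.

```python
from typing import Optional

# Leading byte signatures for a handful of common image formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
)


def sniff_image_type(header: bytes) -> Optional[str]:
    """Return a type name for the given leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of path and classify them."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(16))


# A PNG header followed by padding bytes, used here as a stand-in for a file.
print(sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

In the script above, sniff_file(file) could stand in for imghdr.what(file), at the cost of recognizing fewer formats.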
    

Solution 3

Note: My approach seems to be too complex. I would prefer terdon's answer in your place.


You can use the command file to determine the file type:

% file 20050101_14-24-37_330.jpg 
20050101_14-24-37_330.jpg: JPEG image data, EXIF standard 2.2, baseline, precision 8, 1200x1600, frames 3

% file test.jpg
test.jpg: PNG image data, 1192 x 774, 8-bit/color RGBA, non-interlaced

With this information, the files can be renamed:

Please test the command on a copy before you apply it to your images:

find . -type f -iname "*.jpg" -print0 | xargs -0 -I{} file -F"<separator>" {} | 
 awk -F " image data" '{print $1}' | 
  awk -F"<separator> " '{
   system("mv \""$1"\" $(dirname \""$1"\")/$(basename -s .jpg \"" $1 "\")."$2)
   }'

Example

% find . -type f -name "*.jpg"
./test.jpg
./sub/20050101_14-24-37_330.jpg

% find . -type f -iname "*.jpg" -print0 | xargs -0 -I{} file -F"<separator>" {} | awk -F " image data" '{print $1}' | awk -F"<separator> " '{system ("mv \""$1"\" $(dirname \""$1"\")/$(basename -s .jpg \"" $1 "\")."$2)}'

% find . -type f -iname "*"    
./test.PNG
./sub/20050101_14-24-37_330.JPEG
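
One way to follow the advice about testing first is to compute the renames without performing them. The sketch below parses lines in the shape that file -F"<separator>" produces and returns (old, new) pairs. plan_renames is a hypothetical helper, and the sample lines are shortened from the file output shown earlier in this answer.

```python
import os


def plan_renames(file_lines, separator="<separator> "):
    """Turn `file -F"<separator>"` lines into (old, new) pairs without renaming."""
    plan = []
    for line in file_lines:
        path, sep, description = line.partition(separator)
        if not sep or " image data" not in description:
            continue  # not an image, or not a line this sketch understands
        file_type = description.split(" image data", 1)[0].strip()
        stem, _ext = os.path.splitext(path)
        plan.append((path, f"{stem}.{file_type}"))
    return plan


lines = [
    "./test.jpg<separator> PNG image data, 1192 x 774, 8-bit/color RGBA",
    "./sub/20050101_14-24-37_330.jpg<separator> JPEG image data, EXIF standard 2.2",
]
for old, new in plan_renames(lines):
    print(f"{old} -> {new}")
```

Like the awk pipeline, this keeps the type in upper case and assumes one file per line, so file names containing newlines would still confuse it.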

Author: BilboX

Updated on September 18, 2022

Comments

  • BilboX
    BilboX over 1 year

    I have about 12000 images of different file types, but every one of them was renamed to *.jpg.

    Now I want to give them their proper extensions back. How can I do it?

  • terdon
    terdon almost 9 years
    Note that this will break in the unlikely case that any of the file names contain newlines.
  • A.B.
    A.B. almost 9 years
    @terdon Yes, I've been thinking. Unfortunately I have no idea what I can do. Can you help?
  • terdon
    terdon almost 9 years
    I have no idea how to do this properly using awk. It's not the right tool for the job. Either use find -exec bash -c "..." and do everything in there or use while read -d '' name type to split the file name and file output and then parse $type to get the file type. Not worth it really, see my answer for how to do it much more easily in pure(ish) bash.
  • Davide
    Davide almost 9 years
    +1 for simple and easy to read, unlike the bash based solutions.
  • terdon
    terdon almost 9 years
    @A.B. see update. It allows ** to recurse into subdirectories.
  • Paddy Landau
    Paddy Landau almost 9 years
    Those semicolons at the end of each line are redundant, aren't they?
  • terdon
    terdon almost 9 years
    @PaddyLandau yes, I was testing it as a one liner and added newlines for clarity here. I forgot to remove them. Note that they're not wrong, just redundant as you say.
  • terdon
    terdon almost 9 years
    @Campa no, of course not. It would also add bogus extensions to binary files, normal text files, perl and python scripts and the list goes on. The question was asking about images specifically and those do tend to have the same name as their usual extensions. Remember that extensions on Linux are optional, with very few exceptions, they don't actually do anything. They help the user organize their data, the OS doesn't care about them.