How to find all JPG files on the file system when the .jpg extension is not obligatory?

To search a directory and all of its subdirectories:

find /home/place/to/crawl -type f -exec file --mime-type {}  \; | awk '{if ($NF == "image/jpeg") print $0 }'

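What does it do?

  • find visits every entry of type regular file (-type f) under the given directory.
  • For each one it runs file --mime-type, which inspects the file's contents rather than its name and prints the path followed by the MIME type, for example image/jpeg.
  • awk keeps only the lines whose last field is exactly image/jpeg and prints each of them in full.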
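
The three steps can be wrapped in a small shell function for reuse. This is only a sketch: the name jpeg_lines is made up for illustration, and it assumes a file implementation that supports --mime-type, as the command above does.

```shell
#!/bin/sh
# jpeg_lines DIR  (hypothetical helper name, not part of any tool)
# Prints one "path: image/jpeg" line per JPEG file found under DIR.
jpeg_lines() {
    # Step 1: find selects regular files only.
    # Step 2: file reports the MIME type detected from each file's contents.
    # Step 3: awk keeps the lines whose last field is exactly image/jpeg.
    find "${1:-.}" -type f -exec file --mime-type {} \; |
        awk '$NF == "image/jpeg"'
}
```

For example, jpeg_lines /home/place/to/crawl lists JPEG files whatever their names are, while a text file called notes.jpg is left out because its contents are not JPEG data.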

Edit: Added @Franklin's tip to run file with -i so that it prints standard MIME type strings instead of free-form descriptions. This reduces false positives caused by matching the word "jpeg".

Edit2: Added @don_crissti's tip. awk now tests only the last column and prints the whole line if it matches image/jpeg. Changed the file switch to --mime-type to suppress the charset information.

Author: Abdul Al Hazred

Updated on November 28, 2022

Comments

  • Abdul Al Hazred
    Abdul Al Hazred over 1 year

    The first thing I noticed when I switched from Windows to Linux was that Linux has no strict naming convention and no obligatory file name extensions like .bmp, .jpg, .exe etc. Therefore I cannot tell a file's format from its name alone.

    If all JPEG files on my file system had the .jpg extension, I could simply find all JPEG files by:

    find / -type f -name "*.jpg"
    

    But if that is not the case, I have no idea how to find all JPEG files.

    • Admin
      Admin about 9 years
      No, it's not the same one. Here he is asking about the cases where the files have NO .jpg EXTENSION.
    • Admin
      Admin about 9 years
      @Miline - the other question (same poster) asks "for a method to find only JPEG files"; it doesn't say anything about the files actually having the extension .jpg, so it's pretty much the same question.
    • Admin
      Admin about 9 years
      I personally think that this question focuses on searching by content rather than by name; nwildner gave a very good answer.
    • Admin
      Admin about 9 years
      @AbdulAlHazred - you already have an answer there that focuses on "content" (if you actually bothered reading it).
    • Admin
      Admin about 9 years
      Frankly speaking, I now realize (after learning some Linux here) that the wording of my question was bad. By the way, its focus is different from this one, even if the answer is contained in the other as a subset of a wider explanation.
    • Admin
      Admin about 9 years
      @AbdulAlHazred - you prolly haven't read my replies so have it your way.
    • Admin
      Admin about 9 years
      I cherish the idea of presumption of innocence.
    • Admin
      Admin about 9 years
      @Miline The previous question has answers that cover both cases, so there's no need for a new question.
  • Admin
    Admin about 9 years
    Very helpful info about that "file header reader". I read the headers of a JPG, a GIF and a PNG file with the file command and all had the word "image" in them. Does this mean that if I replaced "...| grep JPEG" with "...| grep image", all images regardless of format would be found?
  • Admin
    Admin about 9 years
    @AbdulAlHazred: Yes, it means that. grep is a tool that filters lines of text that contain a certain substring. If you grep JPEG some text, you'll get only the lines containing "JPEG". If you grep image some text, you'll get only the lines containing "image".
  • Admin
    Admin about 9 years
    Not all image formats. BMP, for example, is an exception, and you will find something like: PC bitmap, Windows 3.x format, 3264 x 2448 x 24. You will get almost all image format headers this way, but you will have to deal with the black sheep, as the .bmp format has shown ;)
  • Admin
    Admin about 9 years
    Fixed. Grepping for JPEG image data should do the trick.
  • Admin
    Admin about 9 years
    Using file -i prints the MIME type, like image/jpeg. That's easier and more reliable to grep, since MIME types are standardized and much more stable than file's free-form descriptions. It also makes it easy to list any or all image formats. Example to list all JPEG images: find /home/dir/example -type f -exec file -i {} \; | grep ': image/jpeg\>'
  • Admin
    Admin about 9 years
    @FranklinPiat - that's better, indeed, but you're still left with a couple of problems either way because you're grepping find+file output: grep will not give you the expected output for filenames containing newlines; you'll also have to consider the fact that paths could be e.g. /dir/: image/jpeg/etc; assuming no such filenames, you still have to parse grep output again to get just the filenames.
  • Admin
    Admin about 9 years
    Filenames with newlines are not common practice (not justifying my mistake) and must be hell to manage. People with common sense will not do this kind of bizarre thing. However, I agree about the grep issue, and I modified my answer because I agree with your point about directory names. Take a look at it :)
  • Admin
    Admin about 9 years
    nwildner, OK, but your output still doesn't list just the filenames. I'm not trying to be an ***, I'm just pointing out the weak link in your answer, which is parsing the output of find+file (I expect other people here to do the same when they spot possible problems in my answers). And no, managing filenames with newlines (or other funky chars) isn't hell, but you'll have to take a different approach.
  • Admin
    Admin over 7 years
    I strongly recommend using -exec command {} + instead of -exec command {} \; so that file is called as few times as possible (instead of once for every file). It will save a lot of time. You may want to combine this with file's --no-pad option. The completed command would look something like this: find -type f -exec file --no-pad --mime-type {} + | awk '$NF == "image/jpeg" {$NF=""; sub(": $", ""); print}'. This prints only the filenames, without any extraneous output, as requested by @don_crissti.
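
The problems raised in the comments (newlines or ": image/jpeg" inside paths, and listing every image format instead of just JPEG) can be sidestepped by testing each file's MIME type directly instead of parsing the combined find+file output. Below is a minimal sketch of that idea. find_by_mime is a made-up helper name, and it assumes a file command that supports -b (brief, no filename in the output) and --mime-type.

```shell
#!/bin/sh
# find_by_mime DIR PATTERN  (hypothetical helper, for illustration only)
# Prints the NUL-terminated path of every regular file under DIR whose
# MIME type, as reported by file, matches the shell pattern PATTERN,
# e.g. 'image/jpeg' or 'image/*'.
find_by_mime() {
    dir=${1:-.}
    pattern=${2:-image/jpeg}
    # "{} +" hands many paths to each inner sh; the loop tests them one by one.
    find "$dir" -type f -exec sh -c '
        pattern=$1
        shift
        for f do
            # -b drops the "name:" prefix, so odd filenames cannot confuse the test.
            mime=$(file -b --mime-type -- "$f")
            case $mime in
                $pattern) printf "%s\0" "$f" ;;
            esac
        done' sh "$pattern" {} +
}
```

Because the paths are NUL-terminated, they can be passed safely to a tool that understands that format, for example find_by_mime /home/dir/example 'image/*' | xargs -0 ls -l (xargs -0 is widely available but not required by POSIX). The price is one file process per file inside the loop, so this is slower than the single pipeline above.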