How to find all JPG files on the file system when the .jpg extension is not obligatory?
If you want to crawl directories and their subdirectories:
find /home/place/to/crawl -type f -exec file --mime-type {} \; | awk '{if ($NF == "image/jpeg") print $0 }'
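What does it do?
- find searches for all inodes of type file (regular files).
- It executes the file command on each one to get the file's MIME type, for example image/jpeg.
- awk checks the last field of each output line and prints the whole line if it equals image/jpeg.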
Edit: Added @Franklin's tip to use file with -i, so that file types are output as standard MIME strings. This reduces false positives from matching the word "jpeg".
Edit 2: Added @don_crissti's tip. awk now filters on just the last column and prints the whole line if it matches image/jpeg. Changed the file switch to --mime-type to suppress charset information.
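To see the pipeline at work, here is a small self-contained sketch. The function name find_jpegs, the scratch directory and the file names photo and notes.jpg are all made up for illustration, and it assumes the file utility is installed.

```shell
#!/bin/sh
# Hypothetical wrapper around the pipeline above: list files under "$1"
# whose content (not name) is identified as JPEG.
find_jpegs() {
    find "$1" -type f -exec file --mime-type {} \; |
        awk '$NF == "image/jpeg"'
}

# Demo in a scratch directory (illustrative names only).
demo=$(mktemp -d)
# A minimal JFIF header, saved without any extension.
printf '\377\330\377\340\000\020JFIF\000\001\001\000\000\001\000\001\000\000' > "$demo/photo"
# A plain text file that merely claims to be a JPEG.
printf 'just some text\n' > "$demo/notes.jpg"

# Expected to report photo and to skip notes.jpg.
find_jpegs "$demo"
rm -rf "$demo"
```

The file name plays no part in the match; only the MIME type that file derives from the content does.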
Abdul Al Hazred
Updated on November 28, 2022
Comments
-
Abdul Al Hazred over 1 year
The first thing I noticed when I switched from Windows to Linux was that Linux has no strict naming convention and no obligatory file name extensions like .bmp, .jpg, .exe, etc. Therefore I cannot tell a file's format from its name alone.
If all JPEG files on my file system had the .jpg extension, I could simply find all JPEG files by:
find / -type f -name "*.jpg"
But if that is not the case, I have no idea how to find all JPEG files.
-
Admin about 9 years
Isn't this the same question as this one?
-
Admin about 9 years
No, it's not the same one. Here he is asking about cases where the files do NOT have the .jpg extension.
-
Admin about 9 years
@Miline - the other question (same poster) asks "for a method to find only JPEG files"; it doesn't say anything about the files actually having the extension .jpg, so it's pretty much the same question.
-
Admin about 9 years
I personally think this question focuses on searching by content rather than by name. nwildner gave a very good answer.
-
Admin about 9 years
@AbdulAlHazred - you already have an answer there that focuses on "content" (if you actually bothered reading it).
-
Admin about 9 years
Frankly speaking, I now realize (after learning some Linux here) that my wording of the question was bad. By the way, its focus is different from this one, even if the answer is contained in the other as a subset of a wider explanation.
-
Admin about 9 years
@AbdulAlHazred - you probably haven't read my replies, so have it your way.
-
Admin about 9 years
I cherish the idea of presumption of innocence.
-
Admin about 9 years
@Miline The previous question has answers that cover both cases, so there's no need for a new question.
-
Admin about 9 years
Very helpful info about that "file header reader". I read the headers of a jpg, gif and png file with the file command and all had the word "image" in them. Does this mean that if I exchanged "...| grep JPEG" with "...| grep image", all images regardless of format would be found?
-
Admin about 9 years
@AbdulAlHazred: Yes, it means that. grep is a tool that filters lines of text that contain a certain substring. If you grep JPEG some text, you'll get only the lines containing "JPEG". If you grep image some text, you'll get only the lines containing "image".
-
Admin about 9 years
Not all image formats. BMP, for example, is an exception, and you will find something like: PC bitmap, Windows 3.x format, 3264 x 2448 x 24. You will get almost all image format headers this way, but you will have to deal with the black sheep, as the .bmp format has shown ;)
-
Admin about 9 years
Fixed. Grepping for JPEG image data should do the trick.
-
Admin about 9 years
Using file -i prints the MIME type, like image/jpeg. That's easier and more reliable to grep, since MIME type strings are standardized and much less likely to change than the descriptive text. Also, it's easy to list all or any image formats. Example to list all JPEG images:
find /home/dir/example -type f -exec file -i {} \; | grep ': image/jpeg\>'
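As a sketch of the "list all image formats" idea, the same pipeline can match on the image/ prefix of the MIME type instead of one exact string. The match then no longer depends on the word "image" appearing in the human-readable description, which is what tripped up BMP; the exact MIME string reported for a given format can still vary between versions of file. The function name find_images is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list every file under "$1" whose MIME type,
# as reported by file, starts with "image/".
find_images() {
    find "$1" -type f -exec file --mime-type {} + |
        awk '$NF ~ /^image\//'
}

# Usage: find_images /home/dir/example
```

Like the commands above, this prints whole "path: type" lines and is line-based, so it shares their limits with unusual file names.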
-
Admin about 9 years
@FranklinPiat - that's better, indeed, but you're still left with a couple of problems either way because you're grepping find+file output: grep will not give you the expected output for filenames containing newlines; you'll also have to consider the fact that paths could be e.g. /dir/: image/jpeg/etc; assuming no such filenames, you still have to parse grep output again to get just the filenames.
-
Admin about 9 years
Filenames with newlines are not common practice (not justifying my mistake) and must be hell to manage. People with common sense will not do this kind of bizarre thing. However, I agree about grep, and I modified my answer because I agree with your point about directories. Take a look at it :)
-
Admin about 9 years
nwildner, OK, but your output still doesn't list just the filenames. I'm not trying to be an ***, I'm just pointing out the weak link in your answer, which is parsing the output of find+file (I expect other people here to do the same when they spot possible problems in my answers). And no, managing filenames with newlines (or other funky chars) isn't hell, but you'll have to take a different approach.
-
Admin over 7 years
I strongly recommend using -exec command {} + instead of -exec command {} \; so that file is called once per large batch of files (instead of once for every file). It will save a lot of time. You may want to combine this with file's --no-pad option. The completed command would look something like this:
find -type f -exec file --no-pad --mime-type {} + | awk '$NF == "image/jpeg" {$NF=""; sub(": $", ""); print}'
This prints only the filenames, with no extraneous output, as requested by @don_crissti.
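One detail of that awk program: assigning to $NF makes awk rebuild the line with single spaces, so a name containing a run of spaces comes out altered. Below are two hedged sketches of alternatives, both with hypothetical function names. The first keeps the batched call and only strips the suffix; the second avoids parsing file's output for names at all, at the cost of running file once per file.

```shell
#!/bin/sh
# Variant 1 (hypothetical name): keep the batched -exec ... + call, but strip
# the ": image/jpeg" suffix with sub() rather than rebuilding the record.
# Still line-based, so names containing newlines are not handled.
jpeg_names_batched() {
    find "$1" -type f -exec file --no-pad --mime-type {} + |
        awk '$NF == "image/jpeg" { sub(/:[ \t]*image\/jpeg$/, ""); print }'
}

# Variant 2 (hypothetical name): test each file inside a small sh loop, so the
# output of file is never parsed for names. Slower: file runs once per file.
jpeg_names_loop() {
    find "$1" -type f -exec sh -c '
        for f do
            if [ "$(file -b --mime-type "$f")" = image/jpeg ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}

# Usage: jpeg_names_batched /home/place/to/crawl
#        jpeg_names_loop /home/place/to/crawl
```

In the second variant, changing the printf format from "%s\n" to "%s\0" would give NUL-terminated names for tools such as xargs -0, which is one way to cope with newlines in names.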