Bash: Fastest way of determining dimensions of image from URL


Solution 1

As you note, you don't need the whole ImageMagick package. You just need identify.

You will also need the libraries the executable links to (and the libraries those libraries link to).

> whereis identify
identify: /bin/identify /usr/bin/identify /usr/share/man/man1/identify.1.gz
> ldd /bin/identify

ldd will show a list. When I did this, it included some X libs, libjpeg, etc. and two libraries clearly from the ImageMagick package, libMagickCore and libMagickWand. Those look to be linked to the same bunch of things, so if you have that, identify should work.

You don't have to download an entire image in order to get the dimensions, because these are in a header at the beginning of the file and that's what identify looks at. For example, here I'm copying the first 4 kB from a complete jpeg into a new file:

dd if=real.jpg of=test.jpg bs=1024 count=4

4 kB should be more than enough to include the header -- I'm sure you could do it with 1/4 that amount. Now:

>identify test.jpg 
test.jpg JPEG 893x558 893x558+0+0 8-bit DirectClass 4.1KB 0.000u 0:00.000

Those are the correct dimensions for real.jpg. Notice, however, that the size (4.1KB) is the size of the truncated file, since that information is not from the image header.

So: you only have to download the first kilobyte or so of each image.

Solution 2

You can use curl to download only part of the image. How much you need depends on how robust it has to be. A reasonable test case is the first 500 bytes, which seems to work for a lot of png and jpg files; then use identify or the like to check the size.

curl -o 500-peek -r0-500 "http://example.net/some-image.png"
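The two steps can also be joined in a pipe so that nothing is written to disk. This is only a rough sketch: byte_range and peek_dims are names made up for illustration, and it assumes both curl and identify are installed. Note that HTTP byte ranges are inclusive, so -r0-500 above actually requests 501 bytes. A server is also free to ignore the range and send the whole file.

```shell
# HTTP byte ranges are inclusive, so the first N bytes are 0..N-1.
byte_range() {
    echo "0-$(( $1 - 1 ))"
}

# Hypothetical wrapper: print WIDTHxHEIGHT for the image at URL $1,
# asking the server for only the first $2 bytes (default 500).
# identify reads the truncated data from stdin ("-"); its complaints
# about the missing remainder are discarded.
peek_dims() {
    curl -s -r "$(byte_range "${2:-500}")" "$1" |
        identify -format '%wx%h\n' - 2>/dev/null
}
```

Usage would be something like peek_dims "http://example.net/some-image.png" 1024. If identify prints nothing, the header probably did not fit in the requested bytes and a larger value is worth a try.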

Edit:


Long time since I wrote image parsers, but gave it some thought and refreshed some of my memory.

I suspect you want to check all kinds of images (but then again, perhaps not). I'll describe some of the more common ones: PNG, JPEG (JFIF) and GIF.


PNG:

These are simple when it comes to extraction of size. A png header stores the size within the first 24 bytes. First comes a fixed header:

byte  value  description
   0  0x89   Bit-check. 0x89 has bit 7 set.
 1-3  PNG    The letters P,N and G
 4-5  \r\n   Newline check.
   6    ^z   MS-DOS won't print data beyond this using `print`
   7    \n   *nix newline.

Next come chunks throughout the file. Each consists of fixed fields for length, type and checksum, plus an optional data section whose size is given by the length field.

Luckily the first chunk is always an IHDR with this layout:

byte  description
0-3   Image Width
4-7   Image Height
  8   Bits per sample or per palette index
...   ...

From this we have that the width is bytes 16-19 and the height is bytes 20-23 of the file. You can dump the data with e.g. hexdump:

hexdump -vn29 -e '"Bit-test: " /1 "%02x" "\n" "Magic   : " 3/1 "%_c" "\n" "DOS-EOL : " 2/1 "%02x" "\n" "DOS-EOF : " /1 "%02x" "\n" "NIX-EOL : " /1 "%02x" "\n" "Chunk Size: " 4/1 "%02u" "\n" "Chunk-type: " 4/1 "%_c" "\n" "Img-Width : " 4/1 "%02x" "\n" "Img-Height: " 4/1 "%02x" "\n" /1 "Depth : %u bit" "\n" /1 "Color : %u" "\n" /1 "Compr.: %u" "\n" /1 "Filter: %u" "\n" /1 "Interl: %u" "\n"' sample.png

On a Big Endian/Motorola machine one could also print the sizes directly by:

hexdump -s16 -n8 -e '1/4 "%u" "\n"' sample.png

However, on Little Endian / Intel it is not that easy, and it is not very portable either.

From this we could implement a bash + hexdump script such as:

png_hex='16/1 "%02x" " " 4/1 "%02x" " " 4/1 "%02x" "\n"'
png_valid="89504e470d0a1a0a0000000d49484452"

function png_wh()
{
    read -r chunk1 img_w img_h<<<$(hexdump -vn24 -e "$png_hex" "$1")
    if [[ "$chunk1" != "$png_valid" ]]; then
        printf "Not valid PNG: \`%s'\n" "$1" >&2
        return 1
    fi
    printf "%10ux%-10u\t%s\n" "0x$img_w" "0x$img_h" "$1"
    return 0
}

if [[ "$1" == "-v" ]]; then verbose=1; shift; fi

while [[ "$1" ]]; do png_wh "$1"; shift; done

However, this is not particularly efficient. Although it requires a bigger chunk (75-100 bytes), identify is considerably faster. Alternatively, write the routine in e.g. C, which would be faster than library calls.
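Since only the first 24 bytes matter, the same check can read from standard input instead of a file, which lets it sit at the end of a curl pipe. A minimal sketch, assuming od, tr and cut are available; png_wh_stdin is a name invented here, and it sticks to plain sh constructs in case bash is not installed on the embedded system:

```shell
# Hypothetical helper: read the start of a PNG on stdin and print
# "WIDTHxHEIGHT".
png_wh_stdin() {
    # First 24 bytes as one lowercase hex string, 48 characters long.
    hex=$(od -An -v -tx1 -N24 | tr -d '[:space:]')
    # 8-byte signature, chunk length 13 and chunk type "IHDR",
    # followed by exactly 8 more bytes (16 hex digits).
    case $hex in
        89504e470d0a1a0a0000000d49484452????????????????) ;;
        *) echo "Not a valid PNG" >&2; return 1 ;;
    esac
    # Width is bytes 16-19, height bytes 20-23, both big-endian.
    w=$(echo "$hex" | cut -c33-40)
    h=$(echo "$hex" | cut -c41-48)
    echo "$((0x$w))x$((0x$h))"
}
```

For example: curl -s -r0-23 "http://example.net/some-image.png" | png_wh_stdin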


JPEG:

When it comes to jpg it isn't that easy. It also starts out with a signature header, but the size chunk isn't at a fixed offset. After the header:

 byte  value
 0-1   ffd8          SOI (Start Of Image)
 2-3   ffe0          JFIF marker
 4-5   <block-size>  Size of this block including this number
 6-10  JFIF\0        ...
11-12  <version>
   13  ...

a new block comes along, introduced by a two-byte marker starting with 0xff. The one holding information about dimensions has the value 0xffc0 (baseline JPEG; progressive files use 0xffc2 instead), but it can be buried quite far down in the data.

In other words, one skips block-size bytes, checks the marker, skips block-size bytes, reads the marker, and so on until the correct one comes along.

When found, the sizes are stored as two bytes each, most significant byte first, at offsets 3 and 5 after the marker (bytes 5-6 and 7-8 counted from the start of the marker):

 0-1   ffc0          SOF marker
 2-3   <block-size>  Size of this block including this number
   4   <bits>        Sample precision.
 5-6   <Y-size>      Height
 7-8   <X-size>      Width
   9   <components>  Three for color baseline, one for grayscale.
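The skip-and-check loop described above could look roughly like this in shell. It is a sketch, not a robust parser: hexslice and jpeg_wh_stdin are names invented here, od, tr and cut are assumed to be available, and besides 0xffc0 it also accepts 0xffc1 and 0xffc2 (extended and progressive frames), which is an addition to the description above. Padding bytes between blocks are not handled.

```shell
# hexslice OFFSET COUNT: print COUNT bytes of $hex (set by the caller),
# starting at byte OFFSET, as hex digits. Prints less if data runs out.
hexslice() {
    echo "$hex" | cut -c$(( $1 * 2 + 1 ))-$(( ($1 + $2) * 2 ))
}

# Hypothetical helper: read the start of a JPEG on stdin and print
# "WIDTHxHEIGHT", or fail if no SOF block is found in the data given.
jpeg_wh_stdin() {
    hex=$(od -An -v -tx1 | tr -d '[:space:]')
    [ "$(hexslice 0 2)" = ffd8 ] || { echo "Not a JPEG" >&2; return 1; }
    pos=2                                  # first marker follows SOI
    while :; do
        marker=$(hexslice "$pos" 2)
        case $marker in
            ffc0|ffc1|ffc2)                # SOF: height, then width
                h=$(hexslice $((pos + 5)) 2)
                w=$(hexslice $((pos + 7)) 2)
                [ ${#w} -eq 4 ] || break   # block was cut off
                echo "$((0x$w))x$((0x$h))"
                return 0 ;;
            ff??)                          # other block: skip it
                len=$(hexslice $((pos + 2)) 2)
                [ ${#len} -eq 4 ] || break
                pos=$((pos + 2 + 0x$len)) ;;
            *) break ;;                    # out of data or out of sync
        esac
    done
    echo "No SOF block in the data read" >&2
    return 1
}
```

Because the SOF block has no fixed position, the caller decides how much to feed it, e.g. curl -s -r0-4095 "$url" | jpeg_wh_stdin, and retries with a larger range on failure.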

I wrote a simple C program to check some files: of about 10.000 jpg images, approximately 50% had the size information within the first 500 bytes, most of those between ca. 100 and 200 bytes. The worst was around 80.000 bytes. A picture, since we are talking pictures:

[Image: JFIF_SOF_graph]


GIF:

Though a gif can have multiple images stored within, it has a canvas size specified in the header, and this is big enough to house all the images. It is as easy as with PNG, and requires even fewer bytes: 10. After magic and version we find the sizes. Example from a 364x472 image:

<byte>  <hex>   <value>
  0-2   474946  GIF  Magic
  3-5   383961  89a  Version (87a or 89a)
  6-7   6c01    364  Logical Screen Width
  8-9   d801    472  Logical Screen Height

In other words you can check the first six bytes to see if it is a gif, then read the next four for sizes.
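As the hex column in the table shows, GIF stores the low byte first (6c01 is 0x016c = 364), so the two bytes of each value have to be swapped before conversion. A minimal sketch along the same lines as for PNG, assuming od, tr and cut are available; gif_wh_stdin is a made-up name:

```shell
# Hypothetical helper: read the start of a GIF on stdin and print the
# logical screen size as "WIDTHxHEIGHT".
gif_wh_stdin() {
    # First 10 bytes as one lowercase hex string, 20 characters long.
    hex=$(od -An -v -tx1 -N10 | tr -d '[:space:]')
    # Magic "GIF" plus version "87a" or "89a", then 4 size bytes.
    case $hex in
        474946383761????????|474946383961????????) ;;
        *) echo "Not a valid GIF" >&2; return 1 ;;
    esac
    # Little-endian: byte 7 is the high byte of the width, byte 6 the
    # low byte; likewise bytes 9 and 8 for the height.
    w=$(echo "$hex" | cut -c15-16)$(echo "$hex" | cut -c13-14)
    h=$(echo "$hex" | cut -c19-20)$(echo "$hex" | cut -c17-18)
    echo "$((0x$w))x$((0x$h))"
}
```

For example: curl -s -r0-9 "http://example.net/some-image.gif" | gif_wh_stdin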


Other formats:

Could have continued, but guess I stop here for now.

Solution 3

Assumes you have "identify". Put this in a script and chmod +x <scriptname>. To run it type <scriptname> picture.jpg and you will get the height and width of the image. The die function at the top prints an error message and exits. The next 2 sections check that an image was given and set it as the IMAGE variable. The section after that makes sure the file is actually there. The last 2 sections take the relevant information from the 'identify' output and display it.

#!/bin/bash
# Print an error message to stderr and exit.
die() {
    echo "$*" >&2
    exit 1
}

if [[ "${#}" -ne 1 ]]
then
    die "Usage: $0 <image>"
fi

IMAGE="${1}"

if [[ ! -f "${IMAGE}" ]]
then
    die "File not found: ${IMAGE}"
fi

# The third space-separated field of identify's output is WIDTHxHEIGHT
# (this assumes the file name contains no spaces).
IMG_CHARS=$(identify "${IMAGE}" | cut -f 3 -d' ')
WIDTH=$(echo "$IMG_CHARS" | cut -d'x' -f 1)
HEIGHT=$(echo "$IMG_CHARS" | cut -d'x' -f 2)

echo "W: ${WIDTH} H: ${HEIGHT}"


exvance
Author by

exvance

Updated on September 18, 2022

Comments

  • exvance
    exvance almost 2 years

    I'm trying to figure out a really fast method in bash of determining an image's dimensions.

    I know I could wget the image and then use imagemagick to determine the height and width of the image. I'm concerned that this may not be the fastest way of doing it.

    I'm also concerned with having to install imagemagick when I only need a very small subset of functionality. I'm on an embedded system that has very limited resources (CPU, RAM, storage).

    Any ideas?

    • Admin
      Admin over 10 years
      What image types do you need to support?
  • goldilocks
    goldilocks over 10 years
    file doesn't give dimensions for, e.g., .jpg files.
  • user2914606
    user2914606 over 10 years
    nice script. however, it'd be nice if you could explain what it does (since Stack Exchange is about learning).
  • peterph
    peterph over 10 years
    I'm not sure PHP is well suited for a low-resource embedded system. Plus this seems to fetch the whole file.
  • peterph
    peterph over 10 years
    Still it will load the whole PHP engine which is a memory hog. Plus a reasonable portion of PHP would have to be installed, which might be an issue for embedded system as well (disk space might be limited). For a regular system it might be an option, though you'd need to modify it to prevent fetching whole image (see Sukminder's answer).