If you need to extract information from a large number of images on disk (and you’re using a *nix system), you could do worse than using find with ImageMagick’s command line tools.
If you’re unfamiliar with find, I’d recommend reading the beginner’s guide on Linux.ie. find has a terse and initially daunting syntax, but it is one of the most powerful tools available to *nix users, and proficiency with it is massively useful, especially for sysadmins and developers.
Here’s how you’d go about finding all jpg, gif, png and bmp images in a directory, excluding anything in a “thumbs” directory, getting their dimensions, image format and file size, separating each piece of information with a comma and writing it out to a file:
find . -path "*/thumbs/*" -prune -o -type f \( \
  -iname "*.jp*g" -o -iname "*.gif" -o -iname "*.png" -o -iname "*.bmp" \) \
  -exec identify -format "%i,%wx%h,%m,%[size]\n" {} + > /tmp/images.info
Broken down:
find .
Searches in the current directory (.); you can specify a path just as easily (find /path/to/directory/).
-path "*/thumbs/*" -prune
Excludes (prunes) paths that match the preceding pattern. You can specify this multiple times (or not at all).
-o
This is the OR operator. AND is implied between tests when no operator is given. Here, it means the rest of the expression only applies to paths that were not pruned.
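To see -prune and -o working together, here’s a minimal sketch against a throwaway directory. The file names are made up for illustration, and I’ve used -print as the action where the full command uses -exec.

```shell
# Build a small temporary tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/photos/thumbs"
touch "$demo/photos/a.jpg" "$demo/photos/thumbs/a_small.jpg"

# Left of -o: anything under a thumbs directory is pruned and produces no output.
# Right of -o: everything else is tested with -type f and printed.
kept=$(cd "$demo" && find . -path "*/thumbs/*" -prune -o -type f -print)
echo "$kept"

rm -rf "$demo"
```

Only ./photos/a.jpg is printed. The explicit action on the right-hand side matters: with no action at all, find applies its default -print to the whole expression, and the pruned paths would be listed too.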
-type f
Specifies that we’re looking for a file (a directory would be -type d).
\( -iname "*.jp*g" -o -iname "*.gif" -o -iname "*.png" -o -iname "*.bmp" \)
( opens a group, ) closes it. The backslashes before the parentheses stop the shell from interpreting them, and a backslash at the end of a line continues the command onto the next line (I’ve only used the line breaks to make it more readable). The -iname test matches file names case-insensitively, in this case against file extensions. The usage of the -o operator is more obvious here: without it we’d be asking that each file match .jpg AND .png AND .gif, which wouldn’t really work.
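The grouping is easy to get wrong, so here’s a small sketch of what the parentheses buy you. The directory and file names are invented for the demo, and -print stands in for the real action.

```shell
demo=$(mktemp -d)
mkdir "$demo/old.png"              # a directory whose name happens to end in .png
touch "$demo/a.gif" "$demo/b.png"

# Grouped: -type f applies to both name tests, so the directory is skipped.
grouped=$(cd "$demo" && find . -type f \( -iname "*.gif" -o -iname "*.png" \) -print | sort)

# Ungrouped: AND binds tighter than OR, so this is parsed as
# (-type f AND -iname "*.gif") OR (-iname "*.png" AND -print).
# a.gif is matched but never printed, and the old.png directory is printed.
ungrouped=$(cd "$demo" && find . -type f -iname "*.gif" -o -iname "*.png" -print | sort)

printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
rm -rf "$demo"
```

The grouped version lists ./a.gif and ./b.png; the ungrouped one lists ./b.png and ./old.png instead.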
-exec ... {} +
This executes a command on the items found, with the {} placeholder standing in for the found paths. + is the terminator in this case. \; can also be used (again, the backslash is an escape), but it runs the command once per file, whereas the + terminator batches many paths into each invocation and performs much better with large numbers of files. Batching with + is roughly equivalent to piping into xargs, which is the fallback on older systems that may not have the + terminator available (pre-2005 builds).
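A quick way to see the difference between the terminators is to swap identify for echo and count the lines that come back. This is only a sketch: the three empty files are placeholders, and the xargs line assumes your find and xargs support -print0 and -0 (GNU and BSD versions do).

```shell
demo=$(mktemp -d)
touch "$demo/1.png" "$demo/2.png" "$demo/3.png"

# \; runs the command once per file, so echo prints three lines.
per_file=$(find "$demo" -type f -exec echo {} \; | wc -l | tr -d ' ')

# + packs the paths onto a single command line, so echo prints one line.
batched=$(find "$demo" -type f -exec echo {} + | wc -l | tr -d ' ')

# The xargs equivalent; NUL separators keep odd file names intact.
via_xargs=$(find "$demo" -type f -print0 | xargs -0 echo | wc -l | tr -d ' ')

echo "per file: $per_file, batched: $batched, xargs: $via_xargs"
rm -rf "$demo"
```

With only three files everything fits in one batch; with many thousands, find (or xargs) starts a new invocation whenever the command line would get too long.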
identify -format "%i,%wx%h,%m,%[size]\n"
In this case, the command we’re executing is ImageMagick’s identify tool. There’s quite a lot of information available here, so it’s prudent to use the -format option to limit the output to what you need. Helpfully, there’s a list of escape characters to let you know what can be extracted.
Here, I’m getting the file path (%i), the width (%w) and the height (%h), with a literal ‘x’ between them. After that, there’s the image format (%m) and the file size (%[size]), which identify prints with a unit suffix such as KB. I separate each value with a literal comma and end each line with a newline (\n).
> /tmp/images.info
Finally, rather than printing this information to the screen (the default), we redirect the output into a file in the /tmp directory. If there are a lot of files to process, you won’t immediately see data start to pour in here, as it’ll be batched by the + terminator mentioned before. You’ll probably see it populate in lumps of several thousand.
You should get a file containing results that look something like this:
./images/3tm9wzz4z9kzd51168cef0a9cc77ca616916128aaa3d.JPG,640x480,JPEG,22.8KB
./images/226te3jc3m85519d6348418bdde11ee08d77ffd338ff.JPG,626x639,JPEG,44.6KB
./images/2s9262f4uix2e26113b8007a2a3dfadb6aa3fa7aa0ee.JPG,384x288,JPEG,36.6KB
./images/3572wcuya3pi3fb0f68eff3d6104a7b94d5725b2b526.jpg,480x640,JPEG,50.9KB
./images/5wby49rxay9lcc890e914b4d52e9909700f8d5227bb9.jpg,354x142,JPEG,11.9KB
./images/1c6cf3icti8v9c2b997592c0c7c51c25e900969eaec4.JPG,478x640,JPEG,41.4KB
./images/53h1y0x1q37q22d65cc682f6d7994db2510cab013ddf.JPG,478x640,JPEG,28.1KB
./images/4r8ck3kn1ezi809f7d4a63c0fb95b4f07053641bd8d3.JPG,478x640,JPEG,33.5KB
./images/156m118zdn7n4a10fef7d6c88067482f0803db2837e6.JPG,478x640,JPEG,25.5KB
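Once the data is in this shape it’s easy to query with standard tools. As a rough sketch, here’s how you might pull out the portrait images (taller than wide) with awk. A few lines from the sample output stand in for /tmp/images.info, and the approach assumes no file name contains a comma.

```shell
# Stand-in for /tmp/images.info, using lines from the sample output.
info=$(mktemp)
cat > "$info" <<'EOF'
./images/3tm9wzz4z9kzd51168cef0a9cc77ca616916128aaa3d.JPG,640x480,JPEG,22.8KB
./images/226te3jc3m85519d6348418bdde11ee08d77ffd338ff.JPG,626x639,JPEG,44.6KB
./images/2s9262f4uix2e26113b8007a2a3dfadb6aa3fa7aa0ee.JPG,384x288,JPEG,36.6KB
./images/3572wcuya3pi3fb0f68eff3d6104a7b94d5725b2b526.jpg,480x640,JPEG,50.9KB
EOF

# Split each line on commas, split the WxH field on "x",
# and print the path when the height exceeds the width.
portrait=$(awk -F, '{ split($2, d, "x"); if (d[2] + 0 > d[1] + 0) print $1 }' "$info")
echo "$portrait"

rm -f "$info"
```

For these four lines, that prints the 626x639 and 480x640 images.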
If you spot any typos, mistakes or ways you think this might be improved, feel free to let me know.