How many images can fit in a TIFF file?

Posted by Steve Eddins, September 16, 2007

Note added July 30, 2008: See this post for an update.

In a comment on my R2007b post last week, Vincent wanted to know why imread and imwrite are slow when dealing with TIFF files containing tens of thousands of images. We have been hearing from other customers recently about the need to work with such TIFF files.

The TIFF format places no fixed limit on the number of separate images in a file. Each image has an IFD, or Image File Directory, that records where that image's data and metadata are stored in the file. The interesting thing is that the image data and the IFDs can be stored anywhere in the file, in any order. The first few bytes of the file tell you the location of the first image's IFD. The first image's IFD tells you the location of the second IFD, and so on, in the fashion of a linked list.

So to find the k-th IFD, you have to find and read the 1st IFD, then the 2nd, all the way up to the (k-1)-st IFD.
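To make the linked-list structure concrete, here is a minimal sketch in Python (not MATLAB, and not how imread is actually implemented). It assumes a little-endian classic TIFF layout: an 8-byte header ending in the offset of the first IFD, and each IFD made of a 2-byte entry count, 12-byte entries, and a 4-byte offset to the next IFD (0 at the end of the chain). The names build_toy_tiff and find_ifd are made up for illustration, and the toy file holds no pixel data.

```python
import struct

IFD_SIZE = 2 + 12 + 4  # entry count + one 12-byte entry + next-IFD offset


def build_toy_tiff(num_ifds):
    """Build little-endian TIFF-like bytes with num_ifds chained IFDs.

    Hypothetical helper: each IFD holds a single ImageWidth entry and
    there is no pixel data, so this is only good for walking the chain.
    """
    out = bytearray(struct.pack("<2sHI", b"II", 42, 8))  # header
    for i in range(num_ifds):
        offset = 8 + i * IFD_SIZE
        next_offset = offset + IFD_SIZE if i < num_ifds - 1 else 0
        out += struct.pack("<H", 1)                     # one entry
        out += struct.pack("<HHII", 256, 4, 1, i + 1)   # ImageWidth, LONG
        out += struct.pack("<I", next_offset)           # link to next IFD
    return bytes(out)


def find_ifd(data, k):
    """Return (offset of the k-th IFD, IFDs read on the way), k is 1-based."""
    byte_order, magic, offset = struct.unpack_from("<2sHI", data, 0)
    if byte_order != b"II" or magic != 42:
        raise ValueError("not a little-endian classic TIFF")
    reads = 0
    for _ in range(k - 1):
        if offset == 0:
            raise IndexError("file has fewer than k IFDs")
        (num_entries,) = struct.unpack_from("<H", data, offset)
        # The next-IFD offset sits right after the last 12-byte entry.
        (offset,) = struct.unpack_from("<I", data, offset + 2 + 12 * num_entries)
        reads += 1
    if offset == 0:
        raise IndexError("file has fewer than k IFDs")
    return offset, reads


data = build_toy_tiff(5)
print(find_ifd(data, 1))  # first IFD: no earlier IFDs to read
print(find_ifd(data, 5))  # fifth IFD: four earlier IFDs read first
```

There is no index to jump to: the only way to learn where IFD k lives is to follow the next-IFD offset out of each of the k-1 IFDs before it.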

When you call imread to read in the k-th image, the function returns only the image data. It does not return any information about the IFD locations. That means that when you call imread again to read in the (k+1)-st image, it has to start over, finding and reading all the IFDs from the first one.

Unfortunately, this scheme doesn't scale well. Reading all N images this way takes on the order of N² IFD reads in total, so it can be slow to read tens of thousands of images from the same file.
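A quick back-of-the-envelope count shows where the quadratic growth comes from. This is a sketch in Python under one simplifying assumption: reading image k costs k IFD reads when every call starts from the first IFD (the k-1 IFDs before it plus its own), and 1 IFD read if the previous position were remembered. The function names are made up for illustration, and the count ignores the cost of reading the pixel data itself.

```python
def ifd_reads_restarting(n):
    """Total IFD reads when each of n image reads starts from the first IFD."""
    return sum(k for k in range(1, n + 1))  # 1 + 2 + ... + n = n(n+1)/2


def ifd_reads_remembering(n):
    """Total IFD reads if each read could resume where the last one stopped."""
    return n


for n in (100, 10_000):
    print(n, ifd_reads_restarting(n), ifd_reads_remembering(n))
# 100 5050 100
# 10000 50005000 10000
```

For a file with ten thousand images, that is about fifty million IFD reads instead of ten thousand.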

There's a similar problem with imwrite. When using the 'WriteMode','append' option, imwrite is designed to append a new image to an existing, complete, valid TIFF file. The file may have been written in another session of MATLAB, or by a different application entirely. To append correctly, you have to find the last IFD, which unfortunately means that you have to search through all the IFDs. Each call to imwrite can't take advantage of what the previous calls learned about where the IFDs are.
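The append case can be sketched the same way. The Python below is an illustration of the file-format mechanics only, not of imwrite's actual code. It assumes a little-endian classic TIFF, uses toy IFDs with a single entry and no pixel data, and the names new_toy_tiff, toy_ifd, and append_ifd are hypothetical. Appending means walking to the IFD whose next-IFD offset is 0, writing the new IFD at the end of the file, and patching that 0 to point at it.

```python
import struct


def toy_ifd(width):
    """One-entry IFD: ImageWidth (tag 256, type LONG), next-IFD offset 0."""
    return struct.pack("<HHHIII", 1, 256, 4, 1, width, 0)


def new_toy_tiff():
    """Header followed by a single toy IFD at offset 8."""
    return bytearray(struct.pack("<2sHI", b"II", 42, 8) + toy_ifd(1))


def append_ifd(data, width):
    """Append one IFD to data in place; return how many IFDs were walked."""
    (offset,) = struct.unpack_from("<I", data, 4)  # first IFD offset
    walked = 0
    while True:
        (num_entries,) = struct.unpack_from("<H", data, offset)
        next_field = offset + 2 + 12 * num_entries
        (next_offset,) = struct.unpack_from("<I", data, next_field)
        walked += 1
        if next_offset == 0:  # this is the last IFD in the chain
            break
        offset = next_offset
    new_offset = len(data)
    data.extend(toy_ifd(width))
    # Patch the old last IFD so that it links to the new one.
    struct.pack_into("<I", data, next_field, new_offset)
    return walked


data = new_toy_tiff()
walks = [append_ifd(data, w) for w in range(2, 6)]
print(walks)  # each append walks one more IFD than the previous one
```

Because each call starts with nothing but the file, the walk gets one IFD longer with every image appended.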

I believe that improving performance for this kind of TIFF workflow requires new syntax designs, or possibly new functions. To help us prioritize and then design such work well, we'd like to hear more about your workflow that involves such TIFF files. Where do they come from? Why are you using TIFF and not some other format? Do you need to read them, create them, or both? Do you know before you begin writing how many images you'll be adding to the file? Do you need the output file to be a valid TIFF file after every step, or can you tolerate a final "close" operation? Do you continue adding images to the file in subsequent MATLAB sessions? I've seen a tool that takes a directory full of single-image TIFF files and merges them quickly into a multipage TIFF file. Would something like that be helpful to you?

Put your feedback here, and thanks for taking the time.