Steve on Image Processing

September 16th, 2007

How many images can fit in a TIFF file?

Note added July 30, 2008: See this post for an update.

In a comment on my R2007b post last week, Vincent wanted to know why imread and imwrite are slow when dealing with TIFF files containing tens of thousands of images. We have been hearing from other customers recently about the need to work with such TIFF files.

The TIFF format places no fixed limit on the number of separate images in a file. Each image has an IFD, or Image File Directory, that records where that image's data and metadata are stored in the file. The interesting thing is that the image data and the IFDs can be stored anywhere in the file, in any order. The first few bytes of the file tell you the location of the first image's IFD. The first image's IFD tells you the location of the 2nd IFD, and so on, in the fashion of a linked list.

So to find the k-th IFD, you have to find and read the 1st IFD, then the 2nd, all the way up to the (k-1)-st IFD.
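To make the linked-list structure concrete, here is a minimal sketch that follows the chain of IFD offsets from the file header. It assumes a classic TIFF file (version number 42, 32-bit offsets) rather than BigTIFF. The function name tiff_ifd_offsets is made up for illustration; it is not part of MATLAB, and it is not meant to show how imread is implemented.

```matlab
function offsets = tiff_ifd_offsets(filename)
% TIFF_IFD_OFFSETS  Hypothetical helper, for illustration only.
% Returns the byte offset of every IFD in a classic (non-BigTIFF) TIFF
% file by following the "next IFD" pointers from the file header.

fid = fopen(filename, 'r');
if fid < 0
    error('Could not open %s.', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file even on error

% Bytes 0-1 give the byte order: 'II' little-endian, 'MM' big-endian.
order = fread(fid, 2, '*char')';
if strcmp(order, 'II')
    fmt = 'ieee-le';
elseif strcmp(order, 'MM')
    fmt = 'ieee-be';
else
    error('%s does not look like a TIFF file.', filename);
end

% Bytes 2-3 hold the version number, 42 for classic TIFF.
version_number = fread(fid, 1, 'uint16', 0, fmt);
if version_number ~= 42
    error('This sketch only handles classic TIFF files.');
end

% Bytes 4-7 hold the offset of the first IFD.
offsets = [];
next = fread(fid, 1, 'uint32', 0, fmt);
while next ~= 0
    offsets(end+1) = next; %#ok<AGROW>
    fseek(fid, next, 'bof');
    num_entries = fread(fid, 1, 'uint16', 0, fmt);  % entry count
    fseek(fid, 12 * num_entries, 'cof');            % skip 12-byte entries
    next = fread(fid, 1, 'uint32', 0, fmt);         % 0 marks the last IFD
end
end
```

With a helper like this, numel(tiff_ifd_offsets('mystack.tif')) would give the number of images in the file. Note that reaching offsets(k) still takes k steps through the chain, since nothing in the header records where the later IFDs are.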

When you call imread to read in the k-th image, the function only returns the image data. It does not return any information about the IFD locations. That means that when you call imread again to read in the (k+1)-st image, it has to start over, finding and reading all the IFDs from the first one.

Unfortunately, this scheme doesn't scale well: reading all N images one call at a time takes on the order of N^2 IFD reads, so it can be slow to read tens of thousands of images from the same file.
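A quick back-of-the-envelope calculation shows where the quadratic growth comes from. The sketch below assumes that reading image k visits k IFDs (the first through the k-th) and that nothing is remembered between calls; the variable names are just for illustration.

```matlab
% Count IFD visits when every image of an N-image file is read with a
% separate call, assuming each call walks the chain from the first IFD.
N = 10000;
visits_restarting  = sum(1:N);   % 1 + 2 + ... + N = N*(N+1)/2
visits_single_pass = N;          % if one pass could remember its position

fprintf('%d vs %d IFD visits\n', visits_restarting, visits_single_pass);
% prints "50005000 vs 10000 IFD visits"
```

Doubling N roughly quadruples the first number but only doubles the second.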

There's a similar problem with imwrite. When using the 'WriteMode','append' option, imwrite is designed to append a new image to an existing, complete, valid TIFF file. The file may have been written in another session of MATLAB, or by another application completely. To append correctly, you have to find the last IFD, which unfortunately means that you have to search through all the IFDs. Each call to imwrite can't take advantage of what the previous calls learned about where the IFDs are.
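As a rough sketch of this append pattern, the loop below writes a small stack one image at a time. The image data and the file name in the temporary directory are made up for the example.

```matlab
% Build a multipage TIFF one image at a time with 'WriteMode','append'.
filename = fullfile(tempdir, 'append_demo.tif');   % hypothetical file name
stack = uint16(65535 * rand(64, 64, 5));           % made-up 16-bit frames

% The first call creates the file (the default mode overwrites).
imwrite(stack(:, :, 1), filename);

for k = 2:size(stack, 3)
    % Each call is handed a complete TIFF file and must find its last
    % IFD again before it can add the new image.
    imwrite(stack(:, :, k), filename, 'WriteMode', 'append');
end

info = imfinfo(filename);
num_images = numel(info);    % number of images now in the file
```

Because each append starts from scratch, the total work in the loop grows roughly with the square of the number of images, just as it does for reading.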

I believe that improving performance for this kind of TIFF workflow requires new syntax designs, or possibly new functions. To help us prioritize and then design such work well, we'd like to hear more about your workflow that involves such TIFF files. Where do they come from? Why are you using TIFF and not some other format? Do you need to read them, create them, or both? Do you know before you begin writing how many images you'll be adding to the file? Do you need the output file to be a valid TIFF file after every step, or can you tolerate a final "close" operation? Do you continue adding images to the file in subsequent MATLAB sessions? I've seen a tool that takes a directory full of single-image TIFF files and merges them quickly into a multipage TIFF file. Would something like that be helpful to you?

Put your feedback here, and thanks for taking the time.

49 Responses to “How many images can fit in a TIFF file?”

  1. Daphne replied on :

    Hi Steve,
    A specific application: We use multi-page Tifs to save video-like data captured from a 16 bit CCD camera. We usually know how many images there will be, depends on the frame rate and time of capture. My applications only call for 100’s of images per stack. However, we still feel the slow-down.
    We don’t work with AVI or another video format to maintain the 16bit info.
    Daphne

  2. David Schoppik replied on :

    I work with images that come off a confocal microscope. Most high-end digital cameras that are attached to microscopes can take images at high (12/16) bit depth and use the stack to store 3D data (i.e. each image is taken at a different depth). Almost every image capture program can export to a multi-page TIFF in a lossless high-bit-depth format.
    As microscopes improve and allow better resolution in the Z plane, I anticipate having TIFF stacks with ever-increasing numbers of files. I hope that Mathworks finds a way to deal with these files in an efficient manner. Many thanks for asking, and keep up the fine work!

  3. evan replied on :

    Funny, I am in the same boat doing time-lapse microscopy using a ccd camera. Maybe it’s a theme, eh? Our lab has a programmer, and he has created custom code to save files in a binary format, which is very fast to read and write, and can be read from matlab too. Of course, it is proprietary code, so we can’t use it everywhere. It would be nice to have TIFF read/write that are as fast as possible for imaging applications like these. Thanks.

  4. Steve replied on :

    Daphne, David, and Evan—Thanks!

  5. gianni replied on :

    hi STEVE,
    computed tomography (CT) is one typical application that requires a fast imread.
    In CT you read in, say, 720 (or even more) 2D projections in the same TIF format, and each time imread has to start over. If the k-th call to imread could use the parameters of the previous (k-1)-st call, that would speed up the reading task.

  6. Steve replied on :

    Gianni—Thanks for your input.

  7. Steve replied on :

    Vincent described his use case in a comp.soft-sys.matlab posting. Here’s what he said:

    “Here is some detail about our use case. We’re using MATLAB to acquire and analyze large sequences of fluorescence microscopic images. We typically read blocks of thousands of images, do some processing on them, then write the result back to the HD. TIFF stacks are standard for this application but we can’t use them because of the N^2 behavior described before. We are therefore bound to reading / writing the images to separate TIFF files which is not efficient and very demanding on the file system.

    As for possible solutions, modifying imread and imwrite to provide access to blocks of images as opposed to single images (through multidimensional arrays) would get us a long way.”

  8. Rob replied on :

    Steve,

    We only really use TIFF stacks for intermediate files when doing some type of quantitative analysis on the image, usually relating intensity to the amount of a stain present, etc. Output and presentation materials are then best saved as JPGs.

    Accessing and computing on larger TIFF stacks can get slow, so I second the suggestion of using a version or switch of imread/imwrite that keeps track of the k-th image and need only access the (k-1)-st or (k+1)-st image (relative to the most recently accessed image) to operate on it. Maybe some type of temporary FAT for the TIFF needs to be loaded into memory while it’s being worked on? Just spit-balling here.

    Hope this helps,
    Rob

  9. Steve replied on :

    Rob—Thanks.

  10. Thomas replied on :

    Steve,
    I use tens of thousands of tif files in a single stack and need to do some heavy image analysis on them. as the individual tif files are pretty large, typically 512^2 or 1024^2, I read one at a time, process it and move on to the next. Takes forever. I wrote the same code in ImageJ and it speeds up the process by a factor of 50.
    Thanks for your help in rewriting those functions!
    Let me know if you want to see my Matlab and ImageJ codes for comparison.

  11. Steve replied on :

    Thomas—Did you mean tens of thousands of files, or tens of thousands of images in a single file? Also, did you do any profiling in MATLAB to see which step in your processing appeared to be the bottleneck?

  12. Arup replied on :

    Hi,

    I had a question on being able to read video captured by minidv into Matlab with lossless compression.
    But I have to give you some background info first. I’m capturing video minidv (.avi) from a Sony Digital Camera using Adobe Premiere. I need to read this into Matlab. Matlab is unable to read the compressed .avi so I am totally uncompressing the video. Then I am converting the uncompressed video frames using Matlab into image frames and writing them into Multipage Tiffs. I’m then compressing the Multipage Tiff using IrfanView.

    Firstly, is there a simpler way to go from minidv to Matlab (without lossy compression)? For example, is there any code/procedure in your repository that does this? Secondly, Matlab is unable to read a multipage TIFF with ZIP compression, although the imread documentation says that the ‘Deflate’ standard is allowed. Matlab does read both uncompressed and LZW-compressed TIFFs, though.

    I’d be grateful if you had any specific or general suggestions.

    Arup

  13. Tim replied on :

    Steve,
    I use multipage TIFFs totaling 130000 images acquired from a high-speed CCD. I would like to make use of all the images in MATLAB, but currently am forced to use only the first few thousand out of time expense. I think a modification to imread to track IFDs would be extremely handy. Also, would this potentially be applicable to parallel processing? Thanks

  14. Steve replied on :

    Arup—You might want to consider upgrading to the R2007b release. Look at the new mmreader functionality - you may find that it can read your compressed AVI files. Also, imread in the new release is able to read more TIFF compression formats.

  15. Steve replied on :

    Tim—Thanks for the information about your images. I imagine this might indeed be a good application for parallel processing.

  16. Matt Kitching replied on :

    Hi Steve,
    I am processing synchrotron xray tomography images into image cubes. They are 4096 by 4096 pixels by 16 bit depth greyscale TIFFs. There are generally about 3072 slices in each stack. The work requires that the stacks get processed to view 3D features, and I am hoping to be able to eventually do some automated feature partitioning and volumetric rendering. Problems involve not wishing to lose information that will be required for edge detection and partitioning, which leads to slow load times and major memory issues. A 64-bit operating system has helped with memory issues, but any real-time viewing is still hindered by image read times. A multipage TIFF builder would be useful, but it would need to handle final file sizes of 100 GB.
    Matt

  17. Steve replied on :

    Matt—Interesting. Thanks for the information. How big are your files? The TIFF format is limited to 4 GB, and your description seems to imply a larger file than that. There is a group working on a variation called BigTIFF that will eliminate this limitation.

  18. Casper Coetzer replied on :

    We need the single TIFF file with multiple images in the file as input to Simulink. We have concluded by using other image “hand” processing software that we need to create a number of new image algorithms for electronic FPGA hardware for 16 bit values of which only a certain number of bits (8/12/14) could be used. We need the multiple TIFF read to test the new algorithms in Simulink as well as doing hardware-in-the-loop development. The unit must be part of the Video and Image Processing Blockset in Simulink, forming part of the Sources, and obviously read the file. I suggest it be part of the Sources: multimedia file block so that we can use it to develop our new algorithm. The algorithms rely heavily on statistical analysis, level settings and windowing techniques. The output could be of similar file format or onscreen (PC) using what already exists in the sinks so that we could verify our algorithms.

  19. Steve replied on :

    Thanks for your input, Casper. I’ll forward it to the Video and Image Processing Blockset team.

  20. Apostolos replied on :

    I need to be able to open a single TIFF file that contains a few thousand images that come from a very hi-speed digital camera. Initially, the camera saves the pictures in a proprietary (I think) format “.cin”. The only reasonable conversion option from the camera’s s/w is to a multipage TIFF file.

    Once the images are manipulated in MATLAB (w/ or w/o the Image Processing Toolbox) I need to be able to save them quickly into one new multi-page TIFF file.

    Any assistance in making this process fast would be appreciated.

  21. Shalin replied on :

    Steve,
    Interesting post on how the TIFF format works. I work with microscopy images. I like to use TIFF because it preserves data and metadata. Our acquisition software on some microscopes is naive and cannot produce multi-slice TIFFs. We acquire individual TIFFs and name them sequentially. They need to be combined into a 3D stack (I think an imread-based loop does the job well). I notice that writing files with imwrite takes very long. So it would be good to have a small utility function that can read all the files and combine them into a multipage TIFF very fast, and that allows use of regexp to specify the order of the files to be combined. Another suggestion in the same direction: we sometimes acquire multi-dimensional data, the dimensions being depth, time, channels, positions etc. Acquisition software can be asked to make up filenames so that the ‘co-ordinate’ of acquired files can be inferred in this multi-dimensional space. Can TIFF allow representation of such data efficiently (it will be a multi-multi-multi page TIFF I suppose)? It would be good to have an extension to the above utility that can produce such files quickly from the individual slices.

  22. Steve replied on :

    Apostolos—Thanks for your comments. Are you generating the multipage output TIFF using MATLAB now? If so, how long does it take, for how many images, and for what size images? And what is your expectation for “fast”?

  23. Steve replied on :

    Shalin—Thanks. Can you be more specific about “very long”? How long, for how many images, and for what size images? What is your expectation for speed?

    It doesn’t sound to me like TIFF is a good choice for your multidimensional application. There is only one natural dimension for relationships between images in a TIFF file. Any other meaning you would have to overlay yourself, and I doubt it would be efficient. You might consider using HDF5.

  24. Apostolos replied on :

    Steve–Yes, I generate the multipage output TIFF using MATLAB. The number of images varies between 1,000 and 2,500. Each image size is typically 512×1024 (8-bit gray), but can be as big as 1024×1024 (8-bit gray).

    In terms of reading the multipage TIFF, I have noticed considerable degradation as the page index increases. This is in tandem with significant disk activity. Not sure if the disk activity relates to searching inside the TIFF file or MATLAB accessing/swapping in virtual memory (i.e. hard drive).

    In my opinion significant improvements could be achieved if the imread function had the option of retrieving multiple pages of a single TIFF file with one call.

    For example, suppose one wanted to import pages 10, 12 and 15 from a multipage TIFF. Now the only way I know to do this is as follows:
    k=0; for i=[10 12 15]; k=k+1; xx(:,:,k) = imread('fname',i); end

    This is a very suboptimal way to get the data due to the lack of a master offset index at the beginning of the TIFF file that would have allowed for a quick jump to the appropriate page. In the above example, the file is opened and traversed to page 10 three times. Instead, a better way would be to open the file once; traverse the file down to the first requested page; acquire the page’s data; then scroll further down until the next requested page and acquire its data; and so on and so forth.

    In the proposed scheme the imread command could look like this: xx = imread('fname',[10 12 15]);

    Ditto for imwrite.

  25. Steve replied on :

    Apostolos—Thanks for the additional information and suggestions.

  26. Damodar replied on :

    Hi Steve;

    i would like to convert integer(4byte) to unsigned short integer (2 byte) when i write Tiff images in to arbitary image format, (e.g .ArbitaryImage, .ArbitaryHeader); is there any way to convert integer(4byte) to unsigned short integer (2 byte) in matlab?

    Happy new year 2008.

    damodar

  27. Steve replied on :

    Damodar—Scale your four-byte integer values as desired, and then use the uint16 function.

  28. Steve replied on :

    Is there any graceful way to loop through image stacks containing an unknown number of images? Is there a command I could use to get the number of images in a tif stack? Is there a way to jump from node to node in the tif linked list without loading the whole image?

    Right now I can just keep incrementing a counter in the imread call until it goes out of range. But I’d rather know the size of the stack going into the loop.

  29. Steve replied on :

    Steve—You can use imfinfo to get the number of images in a TIFF file:

    info = imfinfo('mystack.tif');
    num_images = numel(info);
    
  30. Andreas Engler replied on :

    We also have several high quality cameras and even APDs and PMTs providing a 16bit signal. This information is sometimes a sequence run with several hundreds of images or a 3D stack. As we use in-house software for analysis, TIFF is the format of our choice, as every program in the lab can read it. As the 3D stack comes from a 4pi-microscope, the number of images can increase to, say, 500. It would be a great improvement to have a function that reads in the first image, then a function that reads in the next image given only the pointer from the first function, and so on.

  31. Steve replied on :

    Andreas—Thanks for your input. For TIFF files containing “only” 500 images, though, I’m not sure it would make that much difference.

  32. Andrew Carter replied on :

    Hi Steve
    A couple of points relating to these posts which I thought might be helpful.

    1) Dave Wu (Caltech) found for long files that a try/catch code was much faster than using imfinfo as you can first find out how long the file is in multiples of 100, then 10 etc. See example below:

    % Work out stack length (assumes the stack holds at least 100 images)
    try
        for i=100:100:1000000
            imread(filel,'tif',i);
        end
    catch
        try
            for j=i-100:10:i
                imread(filel,'tif',j);
            end
        catch
            try
                for k=j-10:1:j
                    imread(filel,'tif',k);
                end
            catch
                stacklength=k-1;
            end
        end
    end

    2) We are trying to read in up to 10000 images for particle tracking purposes and did some tests with imread and another function I found on a discussion page (See below).

    tf = imformats('tif');
    for i=1:5000
        A = feval(tf.read, filename, i);
    end

    When I compared the time these functions took to run through different length tif stacks it became apparent that both slowed considerably with longer tifstacks (as reported … a faster bit of code would be great…), but that the feval based code was always faster:

    No. of images   imread (sec)   feval (sec)
    500             1.4            0.7
    1000            3.8            2.4
    2000            13.2           9.5
    5000            66.2           58.3
    10000           247.0          234.0

    3) In the manual it says that using the ‘PixelRegion’ modifier to the imread command increases speed. I tested this with 5000 images and it was just as fast to read 1 pixel as it was the whole image. Do you see the same thing or am I doing something wrong?

  33. Steve replied on :

    Andrew—Thanks for your comments. 1) The use of imread in this way is clever. We plan, though, to improve the speed of imfinfo for TIFFs so that such cleverness will no longer be necessary. 2) The simple explanation is that imread calls the same function you are calling in the feval. imread has to do some extra work to verify the file format before trying to read it, and that extra work takes time. 3) What version of MATLAB are you using? In versions before R2007b (I think), using the ‘PixelRegion’ modifier was not faster. Also, I would only expect it to be significantly faster when reading a subset of an extremely large image.

  34. Andreas replied on :

    Hi Steve,

    it’s been about 6 months since your initial post. Is there any hope for a new implementation that will overcome the performance issues anytime soon? In my experience, the imwrite problem is severely aggravated when writing to network shares.

  35. Steve replied on :

    Andreas—We have nothing to offer yet.

  36. Arya replied on :

    Hi Steve,

    I use these kinds of images a lot and after figuring that matlab is taking too long to read them (around 5000 images in each file) I hacked the c code to read 100 (or n) images at a time. This increased performance tremendously. Thus you can perhaps modify imread for tiff to read a variable number of images at a time.
    I also needed some code to determine the number of directories and wrote a simple code for that as well. This code works a little faster than counting the length of imfinfo and perhaps should be an option of a future version of imfinfo?

    Thanks
    Arya

  37. Steve replied on :

    Arya—Good ideas. We’ve discussed both issues here during our spec reviews. The first idea, reading in N images at a time, does help but doesn’t completely solve the performance problem. Each block of images you read would still need to start over at the beginning of the file. Also, there’s a difficult functional spec question to answer: Different images in the same TIFF file are completely independent. They do not have to have the same size, bitdepth, samples per pixel, or even image type. So how should multiple images be returned—as a cell array? What if there’s just one? You quickly get into behavior discontinuities, doc difficulties, and ease-of-use issues.

    Regarding counting the directories—we’ve put some work into speeding up imfinfo, but it might make sense to provide a specialized function. I’m coming around to the view that the use (and abuse) of TIFF files may call for a more extensive set of functionality than can be provided by imread, imwrite, and imfinfo alone.

  38. learner replied on :

    Hi..

    I have a database of similar (.pgm) images.How do I combine these images into a single generic one and then retrieve individual ones from this when required(could be in dft domain also)?

  39. Petr Strnad replied on :

    I am reading BigTIFF produced by Olympus microscope (>10GB files). I had to write my own BigTIFF reader. :-(

  40. Steve replied on :

    Petr—Thanks for the information.

  41. Steve replied on :

    Learner—Although the PGM format supports multiple images per file, the MATLAB functions imread and imwrite do not. I suggest that you consider using TIFF instead. Also, I can’t think of any particular reason for using the DFT in the process of storing multiple images.

  42. Sathya replied on :

    Hello,
    I have an avi video, from which I can read every 50th frame. I want to save these frames as a single tif file. Is that possible? How can I do that?

  43. Steve replied on :

    Sathya—Use imwrite with the ‘WriteMode’, ‘append’ option.

  44. Sathya replied on :

    Thanks for the reply.
    I have an avi file with more than 10000 frames. I can write each frame as a single tif file (overwriting it next time), process the image, and find the length of an object in the image. (I need to find the length of the object for all 10000 frames and plot a graph using the result.) But even for 100 frames, it takes a lot of time. I tried 250 frames too. I don’t dare try 10000 frames. Is there any way to do that in minimum time?

  45. Steve replied on :

    Sathya—You’ve given no details about what kind of processing you are doing, so I can’t even begin to guess what the issue might be. I suggest that you use the MATLAB Profiler to analyze the performance of your code and see where the bottlenecks are.

  46. jurgen replied on :

    Hello Steve,
    I read your post rather accidentally while I was searching for a method to read metadata from tif files, but:
    you wrote ” I’ve seen a tool that takes a directory full of single-image TIFF files and merges them quickly into a multipage TIFF file. Would something like that be helpful to you? ” I would be interested in the tool since we work with highspeed images of spreading processes, and then we could structure all images of one throw.

    kind regards, and thanks for the answer

  47. Steve replied on :

    Jurgen—Thanks for your input.

  48. Eran Mukamel replied on :

    Following up on Andrew Carter’s suggestion, here is a function that returns the number of frames in a tiff stack:

    
    function j = tiff_frames(fn)
    %
    % j = tiff_frames(fn)
    %
    % Returns the number of slices in a TIFF stack.
    %
    %
    
    status = 1; j=0;
    jstep = 10^3;
    while status
        try
            j=j+jstep;
            imread(fn,j);
        catch
            if jstep ~= 1
                j=j-jstep;
                jstep = jstep/10;
            else
                j=j-1;
                status = 0;
            end
        end
    end
    
  49. Steve replied on :

    Eran—Thanks. I corrected your syntax; you had <> instead of ~=.


Steve Eddins manages the Image & Geospatial development team at The MathWorks and coauthored Digital Image Processing Using MATLAB. He writes here about image processing concepts, algorithm implementations, and MATLAB.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.