File Exchange Pick of the Week

Our best user submissions

Cache Your Nuts to Eat Later 3

Posted by Guest Picker,

Greg's pick this week is cachedcall by Aslak Grinsted.

Contents

Call That Expensive Computation Once!

Aslak has created an ingenious utility that stores the results to a function call, so that if the same function is called again with the same inputs, it will return the stored results instead of recomputing the answer.

               StandardCall    FirstCacheCall    SecondCacheCall    SpeedUp
               ____________    ______________    _______________    _______

    isprime       2.4633         2.4228          0.0036991           99.85 
    urlread    0.0048197        0.01007          0.0038378          20.372 
    sim         0.029784       0.034612          0.0038576          87.048 

What Is Caching?

The definition is to store a resource for future use. In computing, it generally refers to storing data in memory in a fashion that it can be rapidly accessed at a later time.

In fact developing algorithms for implicitly creating caches is always one of those areas under research in development.

However, in this particular example, the creation of the cache is explicit.

How Is It Done?

The simplified version is, for unique instance of a function call, the results of that function call are stored in a file. A data table is developed to link the unique function calls to specific data files. When a function call is applied, CACHECALL determines if that particular function call situation has been performed before. If so, return the stored results, otherwise call the desired function and store the results for later use.

For me, the part I find most fascinating is Aslak is able to develop unique identifiers for each function call situation based on the function, inputs, classes, etc. He actually ends up leveraging another File Exchange entry DataHash by Jan Simon.

The DataHash uses MATLAB's inherent ability to call Java code to develop his hash tables and unique identifiers.

Doesn't MATLAB Cache Functions Already?

Yes, but generally not based on specific inputs. When a function is called the first time, if it hasn't been used yet in the current MATLAB session, then that function must be loaded into memory. Subsequent calls to the same function require less time because the function is already in memory, but it doesn't store the results of specific computations for future use.

The same thing applies if the function has been edited. It must be reloaded into memory the first time it is called.

Should Every Function Call Be Cached?

Probably not. There is a penalty for the caching process. If the resulting cached results are never used, then that penalty is wasted. Also caching involves writing to disc, which causes the slowdown in the following results.

                     StandardCall    FirstCacheCall    SecondCacheCall    SpeedUp
                     ____________    ______________    _______________    _______

    rand                 1.0061         19.194            4.3404          -331.4 
    triangulation    3.7253e-05      0.0066531         0.0051827          -13812 

It probably only makes sense in cases where there are a limited number of input scenarios that will occur multiple times, and takes longer than writing to disc.

That's not always predictable, so it may require some profiling to determine if it's worth the cost of caching.

Why Did I Choose This Entry?

This is a clever method to speed up repetitive computations when there are a relatively finite number of possible inputs.

I'm impressed that Aslak was able to figure out how to identify unique function calls. When I saw this entry, it was one of those "I wish I had thought of that!" moments.

In addition, Aslak does a nice job of architecting the software and documenting his public interface to the caching features.

Could You Use Function Caching?

Have you tried this out? What do you think about it? Should this type of caching be available in MATLAB?

You can leave your comments here.


Get the MATLAB code

Published with MATLAB® 8.5

Note

Comments are closed.

3 CommentsOldest to Newest

Andrew Wind replied on : 2 of 3

We used a similar technique, storing results in memory rather than on disk when our simulation attempts numerous trials. Each specific piece is computationally expensive so we build a lookup table of results for the individual pieces. Then as the simulation tries various combinations of the pieces, we first check if each piece of the puzzle has been solved previously and stored in the lookup table or if we need to solve it for the first time now. We have the benefit of being able to fit all the results in memory and a large number of trials to yield the benefit. Note that we initialized the entire lookup table so that we could be wasting memory if every individual piece was not required at some point, whereas Aslek’s solution would appear to only use the disk space required to store the results actually used.

Aslak Grinsted replied on : 3 of 3

I wrote cachedcall to deal with this specific case: I am currently working on tracking glacier flow between succesive satellite images taken roughly two weeks apart. I have a network drive with a database with a terabyte of satellite images of glaciers (geotiff files). I load a small region of both scenes (using my FEX submission geoimread ) and then I use standard normalized cross correlation to track the ice motion (using ImGRAFT also on FEX). But the one thing that really annoyed me was that I kept having to load these big files over the network. -and so i wrote cachedcall. I use it thus:
[A,x,y,I]=cachedcall(@geoimread,{'Y:\landsat\... path to scene.TIF', roix, roiy})
It really saves me alot of time, -and it also helps me that i have some cached images on my drive if I want to work while commuting. And it does not take up as much space on my drive because i only store the region of interest in my cache compared to caching the entire scene.

I am considering expanding cachedcall so that it has the option of caching the results in a persistent variable also. But I am not going to do that before I need it for something.