Cache Your Nuts to Eat Later

著者 Pick of the Week Team, March 27, 2015

20 ビュー (過去 30 日間) | 0 いいね | 3 コメント

Greg's pick this week is cachedcall by Aslak Grinsted.

Call That Expensive Computation Once!
What Is Caching?
How Is It Done?
Doesn't MATLAB Cache Functions Already?
Should Every Function Call Be Cached?
Why Did I Choose This Entry?
Could You Use Function Caching?

Call That Expensive Computation Once!

Aslak has created an ingenious utility that stores the results to a function call, so that if the same function is called again with the same inputs, it will return the stored results instead of recomputing the answer.

               StandardCall    FirstCacheCall    SecondCacheCall    SpeedUp
               ____________    ______________    _______________    _______

    isprime       2.4633         2.4228          0.0036991           99.85 
    urlread    0.0048197        0.01007          0.0038378          20.372 
    sim         0.029784       0.034612          0.0038576          87.048

What Is Caching?

The definition is to store a resource for future use. In computing, it generally refers to storing data in memory in a fashion that it can be rapidly accessed at a later time.

In fact developing algorithms for implicitly creating caches is always one of those areas under research in development.

However, in this particular example, the creation of the cache is explicit.

How Is It Done?

The simplified version is, for unique instance of a function call, the results of that function call are stored in a file. A data table is developed to link the unique function calls to specific data files. When a function call is applied, CACHECALL determines if that particular function call situation has been performed before. If so, return the stored results, otherwise call the desired function and store the results for later use.

For me, the part I find most fascinating is Aslak is able to develop unique identifiers for each function call situation based on the function, inputs, classes, etc. He actually ends up leveraging another File Exchange entry DataHash by Jan Simon.

The DataHash uses MATLAB's inherent ability to call Java code to develop his hash tables and unique identifiers.

Doesn't MATLAB Cache Functions Already?

Yes, but generally not based on specific inputs. When a function is called the first time, if it hasn't been used yet in the current MATLAB session, then that function must be loaded into memory. Subsequent calls to the same function require less time because the function is already in memory, but it doesn't store the results of specific computations for future use.

The same thing applies if the function has been edited. It must be reloaded into memory the first time it is called.

Should Every Function Call Be Cached?

Probably not. There is a penalty for the caching process. If the resulting cached results are never used, then that penalty is wasted. Also caching involves writing to disc, which causes the slowdown in the following results.

                     StandardCall    FirstCacheCall    SecondCacheCall    SpeedUp
                     ____________    ______________    _______________    _______

    rand                 1.0061         19.194            4.3404          -331.4 
    triangulation    3.7253e-05      0.0066531         0.0051827          -13812

It probably only makes sense in cases where there are a limited number of input scenarios that will occur multiple times, and takes longer than writing to disc.

That's not always predictable, so it may require some profiling to determine if it's worth the cost of caching.

Why Did I Choose This Entry?

This is a clever method to speed up repetitive computations when there are a relatively finite number of possible inputs.

I'm impressed that Aslak was able to figure out how to identify unique function calls. When I saw this entry, it was one of those "I wish I had thought of that!" moments.

In addition, Aslak does a nice job of architecting the software and documenting his public interface to the caching features.