Memoize Functions in MATLAB

By Loren Shure, July 5, 2018

Very early on in this blog (2006!), I wrote a post on memoizing functions, i.e., caching results so outputs that have already been calculated don't need to be calculated again (code included at the end of this post). Memoization can provide a significant performance boost, especially if the function in question is expensive to calculate and is likely to be called with repeated inputs.

It wasn't until recently that I realized this functionality (memoize) was added to MATLAB in R2017a. Needless to say, the shipping function is different from the solution I presented over 10 years ago, and it doesn't have the limitations that mine had (mine was limited to elementwise functions with a single input).

What is Memoization?
Let's Try It
Is That All?
Do You Use Memoization?
For Reference: Code from My 2006 Post

What is Memoization?

The idea of memoization is to cache function results from specific inputs so if these same inputs are used again, the function can simply return the values computed earlier, without rerunning the computation. This can be useful if you have a function that is very expensive to compute.
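To make the idea concrete, here is a minimal hand-rolled sketch of caching, assuming scalar double inputs and using a containers.Map as the lookup table. The variable names (cache, inputs, numComputed) are mine, chosen for illustration; this is not how the built-in memoize is implemented.

```matlab
% Minimal sketch of memoization: look up each input before computing it.
cache = containers.Map('KeyType', 'double', 'ValueType', 'double');
inputs = [0.5 1 0.5 1 2];          % some inputs repeat
results = zeros(size(inputs));
numComputed = 0;                   % counts how often sin is actually called

for k = 1:numel(inputs)
    x = inputs(k);
    if isKey(cache, x)
        results(k) = cache(x);     % seen before: reuse the stored value
    else
        results(k) = sin(x);       % new input: compute and store
        cache(x) = results(k);
        numComputed = numComputed + 1;
    end
end

fprintf('%d calls, %d computations\n', numel(inputs), numComputed);
```

With five calls and three distinct inputs, sin is evaluated only three times.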

Of course, if you run the memoized function a lot, it will take up increasing amounts of memory as unique inputs get added to the list, unless we do something to limit the cache size. That's what MATLAB does now with the function memoize.
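As a small sketch of limiting the cache, the CacheSize and Enabled properties of the object returned by memoize can be set directly. I use exp here only so the example doesn't disturb the sin example that follows; the value 3 is arbitrary.

```matlab
% Sketch: cap how many distinct inputs a memoized function remembers.
fexp = memoize(@exp);
fexp.CacheSize = 3;      % keep results for at most 3 distinct inputs
fexp.Enabled = true;     % set to false to bypass the cache entirely

disp(fexp.CacheSize)
```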

Let's Try It

As in my earlier post, let's try something simple, the function sin.

fmem = memoize(@sin)

fmem = 
  MemoizedFunction with properties:

     Function: @sin
      Enabled: 1
    CacheSize: 10

y = fmem(pi./(1:5)')

So, we still get the answers we expect.

Now let's compute some more values, some already in the cache and others not.

ymore = fmem(pi./(1:10)')

ymore = 10×1

   1.2246e-16
            1
      0.86603
      0.70711
      0.58779
          0.5
      0.43388
      0.38268
      0.34202
      0.30902

Again, no surprises in the output. The values are the ones we expect. I am not doing enough computation here for you to see the benefit of reduced time from caching, however.
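To get a feel for the time savings, here is a sketch with a more expensive computation. The choice of svd on an 800-by-800 magic square is my own assumption for illustration; actual timings depend on your machine, so I don't quote any.

```matlab
% Sketch: time a first (computed) call against a second (cached) call.
A = magic(800);                 % deterministic input, moderately expensive for svd
fsvd = memoize(@svd);
clearCache(fsvd);               % start from an empty cache

tic
s1 = fsvd(A);                   % first call: svd actually runs
tFirst = toc;

tic
s2 = fsvd(A);                   % same input again: result comes from the cache
tSecond = toc;

fprintf('first call: %.4f s, repeated call: %.4f s\n', tFirst, tSecond);
```

The second call still has to recognize that it has seen A before, so it is not free, but it avoids rerunning svd.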

Is That All?

Of course not! There are a bunch of choices you can use to control how much information gets cached, etc. The documentation for memoize and MemoizedFunction has more information.

Now let's see how this works. First, what is fmem?

fmem

fmem = 
  MemoizedFunction with properties:

     Function: @sin
      Enabled: 1
    CacheSize: 10

We see which function is being memoized, that caching is enabled, and the maximum number of distinct inputs that can be cached. Since the inputs are considered collectively and I have called fmem 3 times so far with 3 different inputs (never mind that some values are shared), I should have 3 "elements" in the cache.

Let's see what's been cached.

s = stats(fmem)

s = struct with fields:
                    Cache: [1×1 struct]
       MostHitCachedInput: [1×1 struct]
      CacheHitRatePercent: 77.778
    CacheOccupancyPercent: 40

s.Cache

ans = struct with fields:
         Inputs: {{1×1 cell}  {1×1 cell}  {1×1 cell}  {1×1 cell}}
        Nargout: [1 1 1 1]
        Outputs: {{1×1 cell}  {1×1 cell}  {1×1 cell}  {1×1 cell}}
       HitCount: [4 9 1 0]
      TotalHits: 14
    TotalMisses: 4
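The two percentages in the stats output appear to follow directly from the counts shown. This is inferred from the numbers displayed here rather than taken from the documentation, so treat the formulas as an assumption. The sketch below simply redoes the arithmetic with the values from the output above.

```matlab
% Sketch: reproduce the reported percentages from the counts shown above.
totalHits   = 14;
totalMisses = 4;
numEntries  = 4;     % number of cached inputs listed in s.Cache.Inputs
cacheSize   = 10;    % fmem.CacheSize

hitRate   = 100 * totalHits / (totalHits + totalMisses);
occupancy = 100 * numEntries / cacheSize;

fprintf('hit rate: %.3f%%, occupancy: %g%%\n', hitRate, occupancy);
```

This gives 77.778 and 40, matching CacheHitRatePercent and CacheOccupancyPercent.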

And now let's use another input.

ysomemore = fmem(pi./-(1:12)')

ysomemore = 12×1

  -1.2246e-16
           -1
     -0.86603
     -0.70711
     -0.58779
         -0.5
     -0.43388
     -0.38268
     -0.34202
     -0.30902
      ⋮

snew = stats(fmem)

snew = struct with fields:
                    Cache: [1×1 struct]
       MostHitCachedInput: [1×1 struct]
      CacheHitRatePercent: 78.947
    CacheOccupancyPercent: 40

snew.Cache

ans = struct with fields:
         Inputs: {{1×1 cell}  {1×1 cell}  {1×1 cell}  {1×1 cell}}
        Nargout: [1 1 1 1]
        Outputs: {{1×1 cell}  {1×1 cell}  {1×1 cell}  {1×1 cell}}
       HitCount: [4 9 1 1]
      TotalHits: 15
    TotalMisses: 4

Now see what happens to the cache if we repeat an input.

yrepeat = fmem(pi./(1:10)')

yrepeat = 10×1

   1.2246e-16
            1
      0.86603
      0.70711
      0.58779
          0.5
      0.43388
      0.38268
      0.34202
      0.30902

srepeat = stats(fmem)

srepeat = struct with fields:
                    Cache: [1×1 struct]
       MostHitCachedInput: [1×1 struct]
      CacheHitRatePercent: 80
    CacheOccupancyPercent: 40

srepeat.Cache

ans = struct with fields:
         Inputs: {{1×1 cell}  {1×1 cell}  {1×1 cell}  {1×1 cell}}
        Nargout: [1 1 1 1]
        Outputs: {{1×1 cell}  {1×1 cell}  {1×1 cell}  {1×1 cell}}
       HitCount: [4 10 1 1]
      TotalHits: 16
    TotalMisses: 4
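Here is a small sketch of the "inputs are considered collectively" behavior, starting from an empty cache. I use cos so the sin cache above is left alone. If whole arrays are the cache keys, as described above, then 1:3 and 1 should be stored as separate entries and the repeated 1:3 should register as a hit.

```matlab
% Sketch: each distinct input array is its own cache entry.
g = memoize(@cos);
clearCache(g);           % start from an empty cache

g(1:3);                  % new input
g(1);                    % also new, even though 1 is an element of 1:3
g(1:3);                  % repeat of the first input

s = stats(g);
numEntries = numel(s.Cache.Inputs);
fprintf('%d entries, %d hits, %d misses\n', ...
    numEntries, s.Cache.TotalHits, s.Cache.TotalMisses);
```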

I can also clear the cache for a particular function with clearCache, or clear the caches for all memoized functions with clearAllMemoizedCaches.
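A short sketch of both ways of clearing, using the fmem object from this post:

```matlab
% Sketch: clear one cache, then all caches.
fmem = memoize(@sin);
y1 = fmem(pi/2);           % make sure something is cached

clearCache(fmem);          % empties the cache of this one memoized function
clearAllMemoizedCaches;    % empties the caches of every memoized function in the session

y2 = fmem(pi/2);           % recomputed, since the cache is now empty
```

Clearing only discards stored results; the memoized function itself keeps working and starts filling its cache again.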

Do You Use Memoization?

Do you ever use memoization in your code, with or without the MATLAB functions? Let us know how you do this here.

For Reference: Code from My 2006 Post

function f = memoize2(F)
% one-arg F, inputs testable with ==
% allow nonscalar input.
x = [];
y = [];
f = @inner;
    function out = inner(in)
        out = zeros(size(in));  % preallocate output
        [tf,loc] = ismember(in,x);  % find which in's already computed in x
        ft = ~tf;  % ones to be computed
        out(ft) = F(in(ft));  % get output values for ones not already in
        % place new values in storage
        x = [x reshape(in(ft),1,[])];  % store new inputs as a row so x stays aligned with y
        y = [y reshape(out(ft),1,[])];
        out(tf) = y(loc(tf));  % fill in the rest of the output values
    end
end

and

function f = memoize1(F)
% one-arg F, inputs testable with ==
x = [];
y = [];
f = @inner;
    function out = inner(in)
        ind = find(in == x);
        if isempty(ind)
            out = F(in);
            x(end+1) = in;
            y(end+1) = out;
        else
            out = y(ind);
        end
    end
end