# Performance improvements in R2010a

Posted by **Steve Eddins**

Last month we shipped R2010a, the first of our two releases this year. R2010a includes version 7.0 of Image Processing Toolbox, a major update.

For the last several releases we've been working on improving the performance of various toolbox functions, and R2010a continues the trend. I want to show you a few plots of the cumulative improvements made over many releases.

The first plot shows the speed of `imfilter` going back to Release 12 Service Pack 1.

The `imfilter` plot shows eight different performance benchmarks over time. Each of the eight curves is individually normalized to the slowest time for that curve. So, for example, the double-precision image, 50-by-50 filter benchmark runs almost 100 times faster in R2010a than it did in R12.1 (on the same computer). The single-precision, 50-by-50 benchmark runs about 61 times faster.
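To make the normalization concrete, here is a minimal sketch with made-up timings. The release labels and the values in `t` are hypothetical, not measurements from these benchmarks. Dividing the slowest time by each time gives a relative speed of 1 for the slowest release and larger values for faster ones.

```matlab
% Hypothetical timings (seconds) for one benchmark across releases.
% These numbers are illustrative only, not measured data.
releases = {'R12.1', 'R13SP1', 'R14SP1', 'R2010a'};
t = [10.0 5.0 2.0 0.1];

% Normalize to the slowest time, so the slowest release has relative speed 1.
relative_speed = max(t) ./ t;

% Print one line per release, e.g. "R12.1       1.0x".
for k = 1:numel(t)
    fprintf('%-8s %6.1fx\n', releases{k}, relative_speed(k));
end
```

With these illustrative numbers, the last entry comes out 100 times faster than the first, which is how a point near 100 on one of the curves should be read.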

You can see in this plot how performance has changed over time. Several of the benchmarks got faster in R13 Service Pack 1, and then again in R14 Service Pack 1. Single-precision filtering improved in R2008b. It looks like we messed up something in R2009b, when several of the benchmark tests got slower. I have no idea what happened then, but we recovered that loss and much more with the latest R2010a release.

Here's another plot that shows the relative speed of several `bwmorph` operations over time. We made improvements in R2007b and again in R2009b.

A couple of the `bwmorph` curves also show a tick upward in R2010a. That's because we gave an extra boost to the `'skel'` (skeletonization) operation.

Below are benchmark plots for `imresize`, `iradon`, and `bwhitmiss`.

Later I'll show more of the performance improvements, and I'll discuss some of the other new features of the release.

PS. The Boston Marathon is still going on as I type this. I'd like to congratulate John and Alex, both developers in my area, for racing well. Alex, who is responsible for some of the speed improvements I described above, finished 49th overall (47th among men) out of a field of about 26,000 runners. That man is all about speed!

## 10 Comments

**1** of 10

Were these tests run on a multicore machine? How much of this speedup is due to code being parallelised, and how much faster would these functions be on a single core?

**2** of 10

Brad—The tests were run on a four-core machine. Speed-ups come from a combination of algorithm changes, MATLAB code optimization, C++ code optimization, processor optimization, and multithreading. The amount of speed-up due to multithreading varies considerably from function to function and even varies from syntax to syntax.

**3** of 10

Steve,

I use `imfilter` to compute the image gradient (see the lines below), and I am happy with its performance.

However, I do not know whether there is an even faster way to perform the calculation in one go, that is, with a single call to a MATLAB function or one Intel primitive rather than two successive calls to `imfilter`. I have searched the help but found nothing on computing the image gradient in one go.

Thanks, Gianni

    % Stencils for gradient calculation.
    % Assumes Img is a floating-point image (e.g. double); integer inputs
    % would saturate the negative filter responses.
    Cx_2d = [-ones(3,1), zeros(3,1), ones(3,1)];   % for CGx
    Cy_2d = [ones(1,3); zeros(1,3); -ones(1,3)];   % for CGy

    CGx = imfilter(Img, Cx_2d);     % gradient in x
    CGy = imfilter(Img, -Cy_2d);    % gradient in y

    M_Grad = sqrt(CGx.^2 + CGy.^2);      % gradient magnitude
    % M_Grad = abs(CGx) + abs(CGy);      % magnitude as sum of absolute values (faster)

    % Normalize to unit vectors; pixels with zero gradient become NaN (0/0).
    CGy = CGy ./ M_Grad;  CGx = CGx ./ M_Grad;

    phi_Grad = atan2(CGy, CGx);          % angle with the +x (East) direction

    % Map negative angles into [0, 2*pi).
    neg = phi_Grad < 0;
    phi_Grad(neg) = phi_Grad(neg) + 2*pi;

**4** of 10

Gianni—No, sorry.

**5** of 10

I’m hoping to see video processing run faster. `mmreader` is particularly slow.

**6** of 10

I have written some blog posts on iradon and radon at

http://aprendtech.com/wordpress/?p=74

http://aprendtech.com/wordpress/?p=89

I am interested if you have any comments.

Bob

**7** of 10

Bob—The offset problem you mention is currently being investigated by an Image Processing Toolbox developer. In a recent release of the toolbox, we reimplemented portions of iradon in C++ in order to take advantage of multiple threads and extended processor instruction sets. And I like your approach to generating simulated Radon transform data.

**8** of 10

I know that this blog post is a bit old, but it looks like you’re very quick to reply, and I seem to be unable to find your email address.

Anyhow, I’ve got some questions regarding `imfilter`. I’m using it to do 3D filtering, and hence am not able to make use of the IPP libraries. I’m wondering how optimized you consider `imfilter` to be for a single-threaded implementation, i.e., would it be possible to write a better C implementation with modest effort? I assume the answer is no, but I just want to check with you.

Daniel

**9** of 10

Also wondering about the difference in absolute timing for imfilter using single or double precision.

/Daniel

**10** of 10

Daniel, for a three-dimensional image and a three-dimensional filter, `imfilter` uses an implementation technique that is suitable for convolution in an arbitrary number of dimensions. It’s possible that a C function hand-coded specifically for three dimensions might do better, but I’m not certain. As for the speed of single precision vs. double precision, the answer varies over time and across processor architectures. I don’t have a simple answer, and I haven’t benchmarked it recently.
