<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Speeding Up MATLAB Applications</title>
	<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/</link>
	<description>Loren Shure  works on design of the MATLAB language at &#60;a href="http://www.mathworks.com/"&#62;The MathWorks&#60;/a&#62;. She writes here about once a week on MATLAB programming and related topics. &#60;br&#62;&#60;br&#62;&#60;a href="/images/loren-full.jpg"&#62;&#60;img src="/images/loren.jpg"&#62;&#60;/a&#62;</description>
	<pubDate>Mon, 23 Nov 2009 00:42:18 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.1</generator>
		<item>
		<title>By: B Cook</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30540</link>
		<dc:creator>B Cook</dc:creator>
		<pubDate>Tue, 18 Aug 2009 13:40:53 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30540</guid>
		<description>OK, that's interesting. The changes don't look like much at the moment; I think I will have to compare the versions side by side to really see the difference. My first trial makes me think your laptop runs a bit faster than mine! I made the first change you suggested and the run time improved to around 1.8 s, which is not bad. I don't think there should be a big problem improving on this now.

Thanks for the suggestions!</description>
		<content:encoded><![CDATA[<p>OK, that&#8217;s interesting. The changes don&#8217;t look like much at the moment; I think I will have to compare the versions side by side to really see the difference. My first trial makes me think your laptop runs a bit faster than mine! I made the first change you suggested and the run time improved to around 1.8 s, which is not bad. I don&#8217;t think there should be a big problem improving on this now.</p>
<p>Thanks for the suggestions!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sarah Z</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30526</link>
		<dc:creator>Sarah Z</dc:creator>
		<pubDate>Wed, 12 Aug 2009 15:40:07 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30526</guid>
		<description>Ben --

So, I played around with your code a bit.  Your initial code takes ~1.06 seconds to run on my laptop.  If I alter your vectorized code by changing every place where you index into the whole matrix with (:,:,:) so that it just uses the matrix itself: 

&lt;pre&gt;
% old code 
% Ex2(:,:,:)=(Ex1(:,:,:))+deltaE*(dHzdy(:,:,:)-dHydz(:,:,:));
%
% new code
Ex2 = Ex1+deltaE*(dHzdy -dHydz);
&lt;/pre&gt;

Your vectorized code runs in ~0.78 seconds. 

This is my first shot at vectorization:

&lt;pre&gt;


gridxs = 100;
gridys = 100;
gridzs = 100;
deltaH = 0.1;
deltaE = 0.1;


%%  Vectorized Version
RandStream.setDefaultStream(RandStream('mt19937ar','seed',1));

Hx = rand(gridxs,gridys,gridzs);
Hy = rand(gridxs,gridys,gridzs);
Hz = rand(gridxs,gridys,gridzs);

Ex = rand(gridxs,gridys,gridzs);
Ey = rand(gridxs,gridys,gridzs);
Ez = rand(gridxs,gridys,gridzs);

Hx22 = zeros(gridxs,gridys,gridzs);
Hy22 = zeros(gridxs,gridys,gridzs);
Hz22 = zeros(gridxs,gridys,gridzs);

Ex22 = zeros(gridxs,gridys,gridzs);
Ey22 = zeros(gridxs,gridys,gridzs);
Ez22 = zeros(gridxs,gridys,gridzs);

tic

% dExdx2 = diff(Ex(2:end-1,2:end-2,2:end-2),1,1);
dExdy2 = diff(Ex(2:end-2,2:end-1,2:end-2),1,2);
dExdz2 = diff(Ex(2:end-2,2:end-2,2:end-1),1,3);

dEydx2 = diff(Ey(2:end-1,2:end-2,2:end-2),1,1);
% dEydy2 = diff(Ey(2:end-2,2:end-1,2:end-2),1,2);
dEydz2 = diff(Ey(2:end-2,2:end-2,2:end-1),1,3);

dEzdx2 = diff(Ez(2:end-1,2:end-2,2:end-2),1,1);
dEzdy2 = diff(Ez(2:end-2,2:end-1,2:end-2),1,2);
% dEzdz2 = diff(Ez(2:end-2,2:end-2,2:end-1),1,3);

Hx22(2:end-2,2:end-2,2:end-2)= Hx(2:end-2,2:end-2,2:end-2)-deltaH*(dEzdy2-dEydz2);
Hy22(2:end-2,2:end-2,2:end-2)= Hy(2:end-2,2:end-2,2:end-2)-deltaH*(dExdz2-dEzdx2);
Hz22(2:end-2,2:end-2,2:end-2)= Hz(2:end-2,2:end-2,2:end-2)-deltaH*(dEydx2-dExdy2);

% dHxdx2 = diff(Hx22(1:end-2,2:end-2,2:end-2),1,1);
dHxdy2 = diff(Hx22(2:end-2,1:end-2,2:end-2),1,2);
dHxdz2 = diff(Hx22(2:end-2,2:end-2,1:end-2),1,3);

dHydx2 = diff(Hy22(1:end-2,2:end-2,2:end-2),1,1);
% dHydy2 = diff(Hy22(2:end-2,1:end-2,2:end-2),1,2);
dHydz2 = diff(Hy22(2:end-2,2:end-2,1:end-2),1,3);

dHzdx2 = diff(Hz22(1:end-2,2:end-2,2:end-2),1,1);
dHzdy2 = diff(Hz22(2:end-2,1:end-2,2:end-2),1,2);
% dHzdz2 = diff(Hz22(2:end-2,2:end-2,1:end-2),1,3);

Ex22(2:end-2,2:end-2,2:end-2)= Ex(2:end-2,2:end-2,2:end-2)+deltaE*(dHzdy2-dHydz2);
Ey22(2:end-2,2:end-2,2:end-2)= Ey(2:end-2,2:end-2,2:end-2)+deltaE*(dHxdz2-dHzdx2);
Ez22(2:end-2,2:end-2,2:end-2)= Ez(2:end-2,2:end-2,2:end-2)+deltaE*(dHydx2-dHxdy2);

toc   % elapsed time since tic

&lt;/pre&gt;

My code takes ~0.44 seconds to run.  With more tweaking I could perhaps make it faster still, but vectorization does improve the performance considerably. I also compared the answers from your code and mine, and they give the same outputs. I used diff because it made for cleaner code. 

Cheers,
Sarah</description>
		<content:encoded><![CDATA[<p>Ben &#8211;</p>
<p>So, I played around with your code a bit.  Your initial code takes ~1.06 seconds to run on my laptop.  If I alter your vectorized code by changing every place where you index into the whole matrix with (:,:,:) so that it just uses the matrix itself: </p>
<pre>
% old code
% Ex2(:,:,:)=(Ex1(:,:,:))+deltaE*(dHzdy(:,:,:)-dHydz(:,:,:));
%
% new code
Ex2 = Ex1+deltaE*(dHzdy -dHydz);
</pre>
<p>Your vectorized code runs in ~0.78 seconds. </p>
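<p>A minimal sketch of the comparison described here, assuming random 100-by-100-by-100 arrays and deltaE = 0.1 (illustrative values, not Ben&#8217;s actual fields). It times the same update written with (:,:,:) on every operand and written on the whole arrays, then checks that both give identical results. Single tic/toc timings are noisy, and newer MATLAB releases may show a smaller gap.</p>
<pre>
% Sketch: (:,:,:) indexing versus whole-array arithmetic.
% Grid size n and deltaE are assumptions chosen for illustration.
n = 100;
deltaE = 0.1;
Ex1   = rand(n,n,n);
dHzdy = rand(n,n,n);
dHydz = rand(n,n,n);
Ex2a  = zeros(n,n,n);   % preallocated outside the timed region

% Original style: every operand indexed with (:,:,:)
tic
Ex2a(:,:,:) = (Ex1(:,:,:)) + deltaE*(dHzdy(:,:,:) - dHydz(:,:,:));
tIndexed = toc;

% Suggested style: operate on the arrays directly
tic
Ex2b = Ex1 + deltaE*(dHzdy - dHydz);
tWhole = toc;

fprintf('(:,:,:) indexing: %.3f s, whole arrays: %.3f s\n', tIndexed, tWhole);
isequal(Ex2a, Ex2b)     % both forms compute the same values
</pre>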
<p>This is my first shot at vectorization:</p>
<pre>

gridxs = 100;
gridys = 100;
gridzs = 100;
deltaH = 0.1;
deltaE = 0.1;

%%  Vectorized Version
RandStream.setDefaultStream(RandStream('mt19937ar','seed',1));

Hx = rand(gridxs,gridys,gridzs);
Hy = rand(gridxs,gridys,gridzs);
Hz = rand(gridxs,gridys,gridzs);

Ex = rand(gridxs,gridys,gridzs);
Ey = rand(gridxs,gridys,gridzs);
Ez = rand(gridxs,gridys,gridzs);

Hx22 = zeros(gridxs,gridys,gridzs);
Hy22 = zeros(gridxs,gridys,gridzs);
Hz22 = zeros(gridxs,gridys,gridzs);

Ex22 = zeros(gridxs,gridys,gridzs);
Ey22 = zeros(gridxs,gridys,gridzs);
Ez22 = zeros(gridxs,gridys,gridzs);

tic

% dExdx2 = diff(Ex(2:end-1,2:end-2,2:end-2),1,1);
dExdy2 = diff(Ex(2:end-2,2:end-1,2:end-2),1,2);
dExdz2 = diff(Ex(2:end-2,2:end-2,2:end-1),1,3);

dEydx2 = diff(Ey(2:end-1,2:end-2,2:end-2),1,1);
% dEydy2 = diff(Ey(2:end-2,2:end-1,2:end-2),1,2);
dEydz2 = diff(Ey(2:end-2,2:end-2,2:end-1),1,3);

dEzdx2 = diff(Ez(2:end-1,2:end-2,2:end-2),1,1);
dEzdy2 = diff(Ez(2:end-2,2:end-1,2:end-2),1,2);
% dEzdz2 = diff(Ez(2:end-2,2:end-2,2:end-1),1,3);

Hx22(2:end-2,2:end-2,2:end-2)= Hx(2:end-2,2:end-2,2:end-2)-deltaH*(dEzdy2-dEydz2);
Hy22(2:end-2,2:end-2,2:end-2)= Hy(2:end-2,2:end-2,2:end-2)-deltaH*(dExdz2-dEzdx2);
Hz22(2:end-2,2:end-2,2:end-2)= Hz(2:end-2,2:end-2,2:end-2)-deltaH*(dEydx2-dExdy2);

% dHxdx2 = diff(Hx22(1:end-2,2:end-2,2:end-2),1,1);
dHxdy2 = diff(Hx22(2:end-2,1:end-2,2:end-2),1,2);
dHxdz2 = diff(Hx22(2:end-2,2:end-2,1:end-2),1,3);

dHydx2 = diff(Hy22(1:end-2,2:end-2,2:end-2),1,1);
% dHydy2 = diff(Hy22(2:end-2,1:end-2,2:end-2),1,2);
dHydz2 = diff(Hy22(2:end-2,2:end-2,1:end-2),1,3);

dHzdx2 = diff(Hz22(1:end-2,2:end-2,2:end-2),1,1);
dHzdy2 = diff(Hz22(2:end-2,1:end-2,2:end-2),1,2);
% dHzdz2 = diff(Hz22(2:end-2,2:end-2,1:end-2),1,3);

Ex22(2:end-2,2:end-2,2:end-2)= Ex(2:end-2,2:end-2,2:end-2)+deltaE*(dHzdy2-dHydz2);
Ey22(2:end-2,2:end-2,2:end-2)= Ey(2:end-2,2:end-2,2:end-2)+deltaE*(dHxdz2-dHzdx2);
Ez22(2:end-2,2:end-2,2:end-2)= Ez(2:end-2,2:end-2,2:end-2)+deltaE*(dHydx2-dHxdy2);

toc   % elapsed time since tic
</pre>
<p>My code takes ~0.44 seconds to run.  With more tweaking I could perhaps make it faster still, but vectorization does improve the performance considerably. I also compared the answers from your code and mine, and they give the same outputs. I used diff because it made for cleaner code. </p>
<p>Cheers,<br />
Sarah</p>
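<p>To check that diff computes the same forward differences as explicit shifted indexing, here is a small sketch on a random 8-by-7-by-6 array (the size is an arbitrary assumption), using the same index pattern as the dEydx2 and dEzdx2 lines above.</p>
<pre>
% Sketch: diff along dimension 1 versus explicit shifted indexing.
A = rand(8,7,6);

% Forward difference along dim 1, written as in the vectorized code above
dA_diff  = diff(A(2:end-1,2:end-2,2:end-2), 1, 1);

% The same differences taken by hand: element i+1 minus element i
dA_index = A(3:end-1,2:end-2,2:end-2) - A(2:end-2,2:end-2,2:end-2);

isequal(dA_diff, dA_index)   % same values and same size
size(dA_diff)
</pre>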
]]></content:encoded>
	</item>
	<item>
		<title>By: B Cook</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30519</link>
		<dc:creator>B Cook</dc:creator>
		<pubDate>Fri, 07 Aug 2009 11:09:57 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30519</guid>
		<description>Oops: in my previous post, the code below should be inside the inner loop, like all the rest of it.

     &lt;pre&gt;           %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                Ex2(i,j,k)= Ex(i,j,k)+deltaE*(dHzdy-dHydz);
                Ey2(i,j,k)= Ey(i,j,k)+deltaE*(dHxdz-dHzdx);
                Ez2(i,j,k)= Ez(i,j,k)+deltaE*(dHydx-dHxdy);

&lt;/pre&gt;</description>
		<content:encoded><![CDATA[<p>Oops: in my previous post, the code below should be inside the inner loop, like all the rest of it.</p>
<pre>           %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                Ex2(i,j,k)= Ex(i,j,k)+deltaE*(dHzdy-dHydz);
                Ey2(i,j,k)= Ey(i,j,k)+deltaE*(dHxdz-dHzdx);
                Ez2(i,j,k)= Ez(i,j,k)+deltaE*(dHydx-dHxdy);
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: B Cook</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30518</link>
		<dc:creator>B Cook</dc:creator>
		<pubDate>Fri, 07 Aug 2009 11:07:27 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-30518</guid>
		<description>I have some finite-difference time-domain code for solving Maxwell's equations (pretty basic stuff), and on a 100 by 100 by 100 grid it ran/runs very slowly. The loop part of the code looks like this:
&lt;pre&gt;
for t=2:1:T-2
   tic
    for i=2:1:gridxs-2
        for j=2:1:gridys-2
            for k=2:1:gridzs-2
                %%%%%%%%%

                dExdx=Ex(i+1,j,k)-Ex(i,j,k);
                dExdy=Ex(i,j+1,k)-Ex(i,j,k);
                dExdz=Ex(i,j,k+1)-Ex(i,j,k);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dEydx=Ey(i+1,j,k)-Ey(i,j,k);
                dEydy=Ey(i,j+1,k)-Ey(i,j,k);
                dEydz=Ey(i,j,k+1)-Ey(i,j,k);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dEzdx=Ez(i+1,j,k)-Ez(i,j,k);
                dEzdy=Ez(i,j+1,k)-Ez(i,j,k);
                dEzdz=Ez(i,j,k+1)-Ez(i,j,k);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

                Hx2(i,j,k)= Hx(i,j,k)-deltaH*(dEzdy-dEydz);                               
                Hy2(i,j,k)= Hy(i,j,k)-deltaH*(dExdz-dEzdx);
                Hz2(i,j,k)= Hz(i,j,k)-deltaH*(dEydx-dExdy);        
               %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dHxdx=Hx2(i,j,k)-Hx2(i-1,j,k);
                dHxdy=Hx2(i,j,k)-Hx2(i,j-1,k);
                dHxdz=Hx2(i,j,k)-Hx2(i,j,k-1);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dHydx=Hy2(i,j,k)-Hy2(i-1,j,k);
                dHydy=Hy2(i,j,k)-Hy2(i,j-1,k);
                dHydz=Hy2(i,j,k)-Hy2(i,j,k-1);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dHzdx=Hz2(i,j,k)-Hz2(i-1,j,k);
                dHzdy=Hz2(i,j,k)-Hz2(i,j-1,k);
                dHzdz=Hz2(i,j,k)-Hz2(i,j,k-1);
            end
         end
      end
toc
end

                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                Ex2(i,j,k)= Ex(i,j,k)+deltaE*(dHzdy-dHydz);
                Ey2(i,j,k)= Ey(i,j,k)+deltaE*(dHxdz-dHzdx);
                Ez2(i,j,k)= Ez(i,j,k)+deltaE*(dHydx-dHxdy);
&lt;/pre&gt;
I am sure there are bugs in the code, but it runs so slowly that I can't be bothered waiting for it to finish to find out.

tic/toc gives 2.8 seconds per time step.
I tried to vectorise my code (and probably added some extra bugs along the way).

It now looks like this:
&lt;pre&gt;

for t=2:1:T-2
    tic
  
    dExdx(2:iiend,:,:)=Ex1(3:iiend+1,:,:)-Ex1(2:iiend,:,:);
    dExdy(:,2:jjend,:)=Ex1(:,3:jjend+1,:)-Ex1(:,2:jjend,:);
    dExdz(:,:,2:kkend)=Ex1(:,:,3:kkend+1)-Ex1(:,:,2:kkend);

    dEydx(2:iiend,:,:)=Ey1(3:iiend+1,:,:)-Ey1(2:iiend,:,:);
    dEydy(:,2:jjend,:)=Ey1(:,3:jjend+1,:)-Ey1(:,2:jjend,:);
    dEydz(:,:,2:kkend)=Ey1(:,:,3:kkend+1)-Ey1(:,:,2:kkend);
    
    dEzdx(2:iiend,:,:)=Ez1(3:iiend+1,:,:)-Ez1(2:iiend,:,:);
    dEzdy(:,2:jjend,:)=Ez1(:,3:jjend+1,:)-Ez1(:,2:jjend,:);
    dEzdz(:,:,2:kkend)=Ez1(:,:,3:kkend+1)-Ez1(:,:,2:kkend);
   

    Hx2(:,:,:)= Hx1(:,:,:)-deltaH*(dEzdy(:,:,:)-dEydz(:,:,:));
    Hy2(:,:,:)= Hy1(:,:,:)-deltaH*(dExdz(:,:,:)-dEzdx(:,:,:));
    Hz2(:,:,:)= Hz1(:,:,:)-deltaH*(dEydx(:,:,:)-dExdy(:,:,:));


    dHxdx(2:iiend,:,:)=Hx1(3:iiend+1,:,:)-Hx1(2:iiend,:,:);
    dHxdy(:,2:jjend,:)=Hx1(:,3:jjend+1,:)-Hx1(:,2:jjend,:);
    dHxdz(:,:,2:kkend)=Hx1(:,:,3:kkend+1)-Hx1(:,:,2:kkend);

    dHydx(2:iiend,:,:)=Hy1(3:iiend+1,:,:)-Hy1(2:iiend,:,:);
    dHydy(:,2:jjend,:)=Hy1(:,3:jjend+1,:)-Hy1(:,2:jjend,:);
    dHydz(:,:,2:kkend)=Hy1(:,:,3:kkend+1)-Hy1(:,:,2:kkend);
    
    dHzdx(2:iiend,:,:)=Hz1(3:iiend+1,:,:)-Hz1(2:iiend,:,:);
    dHzdy(:,2:jjend,:)=Hz1(:,3:jjend+1,:)-Hz1(:,2:jjend,:);
    dHzdz(:,:,2:kkend)=Hz1(:,:,3:kkend+1)-Hz1(:,:,2:kkend);
    
    
    Ex2(:,:,:)= (Ex1(:,:,:))+deltaE*(dHzdy(:,:,:)-dHydz(:,:,:));
    Ey2(:,:,:)= (Ey1(:,:,:))+deltaE*(dHxdz(:,:,:)-dHzdx(:,:,:));
    Ez2(:,:,:)= (Ez1(:,:,:))+deltaE*(dHydx(:,:,:)-dHxdy(:,:,:));


    toc    
end
&lt;/pre&gt;

Result of tic/toc: almost exactly the same, 2.5 seconds???

I found that MATLAB has a numerical diff and a numerical curl; in a speed test my code was at least twice as fast as those (though, looking at their code, mine is clearly not as versatile).

Eventually there was nothing left to do except write a MEX-file. Now it runs without a hiccup in about 0.2 seconds.

Why didn't vectorisation work for me?

What could I have done wrong? I preallocated all my variables beforehand with code such as

Ex(gridxs,gridys,gridzs)=0;

and analysed everything with the profiler.

My only guess is memory problems, but since these arrays are preallocated, why should that matter?


-Ben</description>
		<content:encoded><![CDATA[<p>I have some finite-difference time-domain code for solving Maxwell&#8217;s equations (pretty basic stuff), and on a 100 by 100 by 100 grid it ran/runs very slowly. The loop part of the code looks like this:</p>
<pre>
for t=2:1:T-2
   tic
    for i=2:1:gridxs-2
        for j=2:1:gridys-2
            for k=2:1:gridzs-2
                %%%%%%%%%

                dExdx=Ex(i+1,j,k)-Ex(i,j,k);
                dExdy=Ex(i,j+1,k)-Ex(i,j,k);
                dExdz=Ex(i,j,k+1)-Ex(i,j,k);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dEydx=Ey(i+1,j,k)-Ey(i,j,k);
                dEydy=Ey(i,j+1,k)-Ey(i,j,k);
                dEydz=Ey(i,j,k+1)-Ey(i,j,k);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dEzdx=Ez(i+1,j,k)-Ez(i,j,k);
                dEzdy=Ez(i,j+1,k)-Ez(i,j,k);
                dEzdz=Ez(i,j,k+1)-Ez(i,j,k);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

                Hx2(i,j,k)= Hx(i,j,k)-deltaH*(dEzdy-dEydz);
                Hy2(i,j,k)= Hy(i,j,k)-deltaH*(dExdz-dEzdx);
                Hz2(i,j,k)= Hz(i,j,k)-deltaH*(dEydx-dExdy);
               %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dHxdx=Hx2(i,j,k)-Hx2(i-1,j,k);
                dHxdy=Hx2(i,j,k)-Hx2(i,j-1,k);
                dHxdz=Hx2(i,j,k)-Hx2(i,j,k-1);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dHydx=Hy2(i,j,k)-Hy2(i-1,j,k);
                dHydy=Hy2(i,j,k)-Hy2(i,j-1,k);
                dHydz=Hy2(i,j,k)-Hy2(i,j,k-1);
                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                dHzdx=Hz2(i,j,k)-Hz2(i-1,j,k);
                dHzdy=Hz2(i,j,k)-Hz2(i,j-1,k);
                dHzdz=Hz2(i,j,k)-Hz2(i,j,k-1);
            end
         end
      end
toc
end

                %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                Ex2(i,j,k)= Ex(i,j,k)+deltaE*(dHzdy-dHydz);
                Ey2(i,j,k)= Ey(i,j,k)+deltaE*(dHxdz-dHzdx);
                Ez2(i,j,k)= Ez(i,j,k)+deltaE*(dHydx-dHxdy);
</pre>
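<p>A minimal sketch of how the Hx update in this loop maps onto indexed array arithmetic, assuming a small 20-by-20-by-20 grid and deltaH = 0.1 for illustration. The index vectors I, J, K (hypothetical names) select the same interior points as the loop counters, and J+1, K+1 give the forward neighbours. Unlike the loop above, the sketch nests the loops k, j, i so the innermost index runs down columns, which suits MATLAB&#8217;s column-major storage. Only Hx is shown; the other components follow the same pattern.</p>
<pre>
% Sketch: Hx update written with loops and with index vectors.
% Grid size n and deltaH are illustrative assumptions.
n = 20;
deltaH = 0.1;
Hx = rand(n,n,n);
Ey = rand(n,n,n);
Ez = rand(n,n,n);

% Loop version: same stencil as above, nested k, j, i so the
% innermost index walks down a column (column-major order)
HxLoop = zeros(n,n,n);
for k = 2:n-2
    for j = 2:n-2
        for i = 2:n-2
            dEzdy = Ez(i,j+1,k) - Ez(i,j,k);
            dEydz = Ey(i,j,k+1) - Ey(i,j,k);
            HxLoop(i,j,k) = Hx(i,j,k) - deltaH*(dEzdy - dEydz);
        end
    end
end

% Vectorized version: I, J, K cover the same interior points,
% and J+1, K+1 select the forward neighbours
I = 2:n-2;
J = 2:n-2;
K = 2:n-2;
HxVec = zeros(n,n,n);
HxVec(I,J,K) = Hx(I,J,K) - deltaH*((Ez(I,J+1,K) - Ez(I,J,K)) ...
                                 - (Ey(I,J,K+1) - Ey(I,J,K)));

% Largest difference between the two results (round-off level at most)
maxDiff = max(abs(HxLoop(:) - HxVec(:)))
</pre>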
<p>I am sure there are bugs in the code, but it runs so slowly that I can&#8217;t be bothered waiting for it to finish to find out.</p>
<p>tic/toc gives 2.8 seconds per time step.<br />
I tried to vectorise my code (and probably added some extra bugs along the way).</p>
<p>It now looks like this:</p>
<pre>

for t=2:1:T-2
    tic

    dExdx(2:iiend,:,:)=Ex1(3:iiend+1,:,:)-Ex1(2:iiend,:,:);
    dExdy(:,2:jjend,:)=Ex1(:,3:jjend+1,:)-Ex1(:,2:jjend,:);
    dExdz(:,:,2:kkend)=Ex1(:,:,3:kkend+1)-Ex1(:,:,2:kkend);

    dEydx(2:iiend,:,:)=Ey1(3:iiend+1,:,:)-Ey1(2:iiend,:,:);
    dEydy(:,2:jjend,:)=Ey1(:,3:jjend+1,:)-Ey1(:,2:jjend,:);
    dEydz(:,:,2:kkend)=Ey1(:,:,3:kkend+1)-Ey1(:,:,2:kkend);

    dEzdx(2:iiend,:,:)=Ez1(3:iiend+1,:,:)-Ez1(2:iiend,:,:);
    dEzdy(:,2:jjend,:)=Ez1(:,3:jjend+1,:)-Ez1(:,2:jjend,:);
    dEzdz(:,:,2:kkend)=Ez1(:,:,3:kkend+1)-Ez1(:,:,2:kkend);

    Hx2(:,:,:)= Hx1(:,:,:)-deltaH*(dEzdy(:,:,:)-dEydz(:,:,:));
    Hy2(:,:,:)= Hy1(:,:,:)-deltaH*(dExdz(:,:,:)-dEzdx(:,:,:));
    Hz2(:,:,:)= Hz1(:,:,:)-deltaH*(dEydx(:,:,:)-dExdy(:,:,:));

    dHxdx(2:iiend,:,:)=Hx1(3:iiend+1,:,:)-Hx1(2:iiend,:,:);
    dHxdy(:,2:jjend,:)=Hx1(:,3:jjend+1,:)-Hx1(:,2:jjend,:);
    dHxdz(:,:,2:kkend)=Hx1(:,:,3:kkend+1)-Hx1(:,:,2:kkend);

    dHydx(2:iiend,:,:)=Hy1(3:iiend+1,:,:)-Hy1(2:iiend,:,:);
    dHydy(:,2:jjend,:)=Hy1(:,3:jjend+1,:)-Hy1(:,2:jjend,:);
    dHydz(:,:,2:kkend)=Hy1(:,:,3:kkend+1)-Hy1(:,:,2:kkend);

    dHzdx(2:iiend,:,:)=Hz1(3:iiend+1,:,:)-Hz1(2:iiend,:,:);
    dHzdy(:,2:jjend,:)=Hz1(:,3:jjend+1,:)-Hz1(:,2:jjend,:);
    dHzdz(:,:,2:kkend)=Hz1(:,:,3:kkend+1)-Hz1(:,:,2:kkend);

    Ex2(:,:,:)= (Ex1(:,:,:))+deltaE*(dHzdy(:,:,:)-dHydz(:,:,:));
    Ey2(:,:,:)= (Ey1(:,:,:))+deltaE*(dHxdz(:,:,:)-dHzdx(:,:,:));
    Ez2(:,:,:)= (Ez1(:,:,:))+deltaE*(dHydx(:,:,:)-dHxdy(:,:,:));

    toc
end
</pre>
<p>Result of tic/toc: almost exactly the same, 2.5 seconds???</p>
<p>I found that MATLAB has a numerical diff and a numerical curl; in a speed test my code was at least twice as fast as those (though, looking at their code, mine is clearly not as versatile).</p>
<p>Eventually there was nothing left to do except write a MEX-file. Now it runs without a hiccup in about 0.2 seconds.</p>
<p>Why didn&#8217;t vectorisation work for me?</p>
<p>What could I have done wrong? I preallocated all my variables beforehand with code such as</p>
<p>Ex(gridxs,gridys,gridzs)=0;</p>
<p>and analysed everything with the profiler.</p>
<p>My only guess is memory problems, but since these arrays are preallocated, why should that matter?</p>
<p>-Ben</p>
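<p>For reference, the preallocation idiom used in this comment and an explicit zeros call produce the same array, as this small sketch (10-by-10-by-10, an arbitrary size) checks. Preallocation covers the named arrays only; each vectorized expression may still create temporary arrays while it is evaluated.</p>
<pre>
% Sketch: two ways to preallocate a 3-D array of zeros.
% The grid size is an illustrative assumption.
gridxs = 10; gridys = 10; gridzs = 10;
clear Ex Ey
Ex(gridxs,gridys,gridzs) = 0;       % assigning the last element grows Ex, filling the rest with 0
Ey = zeros(gridxs,gridys,gridzs);   % explicit form: same size and class (double)
isequal(Ex, Ey)
</pre>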
]]></content:encoded>
	</item>
	<item>
		<title>By: Loren</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29611</link>
		<dc:creator>Loren</dc:creator>
		<pubDate>Thu, 24 Jul 2008 11:16:54 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29611</guid>
		<description>Oktay-

It very much depends on the details of the calculations you are doing.  Vectorization can sometimes require large intermediate arrays that would not be needed if you used cellfun, for example.  Grabbing the extra memory can cost time.  On the other hand, with cellfun, etc., you have the overhead of error checking/validation for each element instead of once for the whole array.  So it depends on the balance of what you are doing and what the typical array sizes are.  I suggest you use the profiler to try out your specific example.

--Loren</description>
		<content:encoded><![CDATA[<p>Oktay-</p>
<p>It very much depends on the details of the calculations you are doing.  Vectorization can sometimes require large intermediate arrays that would not be needed if you used cellfun, for example.  Grabbing the extra memory can cost time.  On the other hand, with cellfun, etc., you have the overhead of error checking/validation for each element instead of once for the whole array.  So it depends on the balance of what you are doing and what the typical array sizes are.  I suggest you use the profiler to try out your specific example.</p>
<p>&#8211;Loren</p>
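<p>As a sketch of the trade-off described above, the following times squaring 1e5 random values with one vectorized call and with arrayfun calling an anonymous function once per element. The data size is an assumption, and the numbers will depend on machine and release; for real code, running it under profile on and inspecting the results with profile viewer is the more reliable check.</p>
<pre>
% Sketch: one vectorized call versus one function call per element.
% The data size is an illustrative assumption.
x = rand(1, 1e5);

tic
y1 = x.^2;                        % a single call over the whole array
tVec = toc;

tic
y2 = arrayfun(@(v) v^2, x);       % one anonymous-function call per element
tArr = toc;

fprintf('vectorized: %.4f s, arrayfun: %.4f s\n', tVec, tArr);
maxDiff = max(abs(y1 - y2))       % should be zero or at round-off level
</pre>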
]]></content:encoded>
	</item>
	<item>
		<title>By: Oktay</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29610</link>
		<dc:creator>Oktay</dc:creator>
		<pubDate>Thu, 24 Jul 2008 09:08:47 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29610</guid>
		<description>Hello,

Is there any significant difference between using:
- Vectorization inside a subfunction
- arrayfun, cellfun, or maybe some MEX-functions outside the subfunction

For example:

I have two functions, f1 and f2, which have subfunctions f1sub and f2sub respectively:

&lt;pre&gt;
function y = f1
x = [1 2 3];
y = f1sub(x);
function z = f1sub(x)
% ... do some validations on x here ...
z = x.^2;
&lt;/pre&gt;

&lt;pre&gt;
function y = f2
x = [1 2 3];
y = arrayfun(@f2sub,x);
function z = f2sub(x)
% ... do some validations on x (x should have only a single element) ...
z = x^2;
&lt;/pre&gt;

Thank you.

Oktay</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>Is there any significant difference between using:<br />
- Vectorization inside a subfunction<br />
- arrayfun, cellfun, or maybe some MEX-functions outside the subfunction</p>
<p>For example:</p>
<p>I have two functions, f1 and f2, which have subfunctions f1sub and f2sub respectively:</p>
<pre>
function y = f1
x = [1 2 3];
y = f1sub(x);
function z = f1sub(x)
% ... do some validations on x here ...
z = x.^2;
</pre>
<pre>
function y = f2
x = [1 2 3];
y = arrayfun(@f2sub,x);
function z = f2sub(x)
% ... do some validations on x (x should have only a single element) ...
z = x^2;
</pre>
<p>Thank you.</p>
<p>Oktay</p>
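<p>A runnable version of the two styles in this question might look like the sketch below, saved as compare_square.m (a hypothetical file name). The validation checks are assumptions standing in for the unspecified ones: f1sub checks the whole array once, while f2sub checks each scalar that arrayfun passes in.</p>
<pre>
function y = compare_square
% Hypothetical driver: run both styles and confirm they agree.
y1 = f1;
y = f2;
assert(isequal(y1, y), 'f1 and f2 should return the same values');
end

function y = f1
x = [1 2 3];
y = f1sub(x);
end

function z = f1sub(x)
% Validation runs once for the whole array (assumed check: numeric input)
if ~isnumeric(x)
    error('f1sub:badInput', 'x must be numeric');
end
z = x.^2;
end

function y = f2
x = [1 2 3];
y = arrayfun(@f2sub, x);
end

function z = f2sub(x)
% Validation runs once per element; arrayfun passes one scalar at a time
if ~isnumeric(x) || ~isscalar(x)
    error('f2sub:badInput', 'x must be a numeric scalar');
end
z = x^2;
end
</pre>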
]]></content:encoded>
	</item>
	<item>
		<title>By: Sarah Zaranek</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29607</link>
		<dc:creator>Sarah Zaranek</dc:creator>
		<pubDate>Mon, 21 Jul 2008 20:44:38 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29607</guid>
		<description>Hi Jacob,

Sorry about the slow response. You are correct that the code would be slower without the JIT/accelerator.

The JIT/accelerator was first introduced in MATLAB 6.5 (Release 13).  When you first run code, it is analyzed and, when possible, machine code is generated and used in place of the interpreted code.  The JIT/accelerator initially sped up some calculations in loops that obeyed certain characteristics, e.g., scalar operations on double values.  The JIT/accelerator has continued to evolve and now handles much more of the MATLAB language, but it still incurs costs the first time (or first few times) through a loop so that assumptions, for example that variables are scalar, can be verified.  If the assumptions are ever broken, MATLAB knows to fall back to the interpreted version of the code.  But if the assumptions hold, the loops can execute quickly.

Let's look at the first case with nx1=nx2=50. When the JIT/accelerator is turned off, the code runs in about 371 seconds, compared with the 296 seconds I showed in the blog post. Interestingly, if you put a tic/toc pair inside the innermost for-loop itself to see what effect the JIT has on each iteration, you find that the iterations after the first are on average 27 times faster.

If you are working on elements of a matrix and iterating through them (as in one of the later iterations of our code), a single for-loop is, in certain cases, almost as fast as the matrix operation, especially if you remember that MATLAB stores matrices column by column, so each column occupies consecutive memory locations. This means that processing data column-wise is faster than processing data row-wise.

Stuart McGarrity has written a great News and Notes article about this:
http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html

We recommend not writing code specifically for the JIT, since the JIT is constantly evolving. You should, though, keep in mind general good MATLAB coding practices, such as those listed above.</description>
		<content:encoded><![CDATA[<p>Hi Jacob,</p>
<p>Sorry about the slow response. You are correct that the code would be slower without the JIT/accelerator.</p>
<p>The JIT/accelerator was first introduced in MATLAB 6.5 (Release 13).  When you first run code, it is analyzed and, when possible, machine code is generated and used in place of the interpreted code.  The JIT/accelerator initially sped up some calculations in loops that obeyed certain characteristics, e.g., scalar operations on double values.  The JIT/accelerator has continued to evolve and now handles much more of the MATLAB language, but it still incurs costs the first time (or first few times) through a loop so that assumptions, for example that variables are scalar, can be verified.  If the assumptions are ever broken, MATLAB knows to fall back to the interpreted version of the code.  But if the assumptions hold, the loops can execute quickly.</p>
<p>Let&#8217;s look at the first case with nx1=nx2=50. When the JIT/accelerator is turned off, the code runs in about 371 seconds, compared with the 296 seconds I showed in the blog post. Interestingly, if you put a tic/toc pair inside the innermost for-loop itself to see what effect the JIT has on each iteration, you find that the iterations after the first are on average 27 times faster.</p>
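<p>One way to look for the warm-up cost described here is to time several consecutive passes over the same loop, as in this sketch. The loop length and number of passes are arbitrary assumptions, and whether the first pass stands out may depend on the MATLAB release and on whether the code runs in a function or at the command line.</p>
<pre>
% Sketch: time identical passes over a scalar loop.
% nPasses and the loop length are illustrative assumptions.
nPasses = 5;
t = zeros(1, nPasses);
for p = 1:nPasses
    tic
    s = 0;
    for i = 1:1e6
        s = s + i;        % scalar double arithmetic, the kind the JIT targets
    end
    t(p) = toc;
end
disp(t)                   % per-pass times in seconds
</pre>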
<p>If you are working on elements of a matrix and iterating through them (as in one of the later iterations of our code), a single for-loop is, in certain cases, almost as fast as the matrix operation, especially if you remember that MATLAB stores matrices column by column, so each column occupies consecutive memory locations. This means that processing data column-wise is faster than processing data row-wise.</p>
<p>Stuart McGarrity has written a great News and Notes article about this:<br />
<a href="http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html" rel="nofollow">http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html</a></p>
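<p>A small sketch of the column-wise versus row-wise point: it sums the same 1000-by-1000 matrix (an arbitrary size) with two loop orders. The column-wise order visits consecutive memory locations; the row-wise order jumps a whole column length between accesses. The two sums can differ slightly because the additions happen in a different order, and the timings vary by machine and release.</p>
<pre>
% Sketch: sum every element of A with two loop orders.
% The matrix size is an illustrative assumption.
A = rand(1000);
[nr, nc] = size(A);

tic
s1 = 0;
for j = 1:nc            % outer loop over columns
    for i = 1:nr        % inner loop walks down one column (consecutive memory)
        s1 = s1 + A(i,j);
    end
end
tCol = toc;

tic
s2 = 0;
for i = 1:nr            % outer loop over rows
    for j = 1:nc        % inner loop jumps nr elements between accesses
        s2 = s2 + A(i,j);
    end
end
tRow = toc;

fprintf('column-wise: %.3f s, row-wise: %.3f s\n', tCol, tRow);
</pre>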
<p>We recommend not writing code specifically for the JIT, since the JIT is constantly evolving. You should, though, keep in mind general good MATLAB coding practices, such as those listed above.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Eddins</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29569</link>
		<dc:creator>Steve Eddins</dc:creator>
		<pubDate>Mon, 07 Jul 2008 18:24:44 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29569</guid>
		<description>Oliver&#8212;In addition to the links Loren provided about logical indexing, I have &lt;a href="http://blogs.mathworks.com/steve/2008/01/28/logical-indexing/" rel="nofollow"&gt;posted about logical indexing&lt;/a&gt; as well.</description>
		<content:encoded><![CDATA[<p>Oliver&mdash;In addition to the links Loren provided about logical indexing, I have <a href="http://blogs.mathworks.com/steve/2008/01/28/logical-indexing/" rel="nofollow">posted about logical indexing</a> as well.</p>
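<p>A minimal logical indexing sketch: a comparison produces a logical mask, which can then be used both to read and to assign the selected elements. magic(4) is used only as convenient test data.</p>
<pre>
% Sketch: logical indexing with a mask built from a comparison.
A = magic(4);
mask = A > 10;          % logical array, true where A exceeds 10
bigValues = A(mask)     % selected elements as a column, in column-major order
A(mask) = 0;            % zero out those same elements in place
</pre>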
]]></content:encoded>
	</item>
	<item>
		<title>By: Yakov</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29554</link>
		<dc:creator>Yakov</dc:creator>
		<pubDate>Sat, 28 Jun 2008 01:49:21 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29554</guid>
		<description>In response to jhofman: for this application, bsxfun is only slightly slower than the approach that website describes, but it uses much less memory. An additional advantage is that the code is easier to read.</description>
		<content:encoded><![CDATA[<p>In response to jhofman: for this application, bsxfun is only slightly slower than the approach that website describes, but it uses much less memory. An additional advantage is that the code is easier to read.</p>
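<p>The website referred to is not identified here, so this sketch assumes the comparison is between bsxfun and the common repmat approach, shown on a 1000-by-500 matrix (an arbitrary size) when subtracting each column mean. repmat first builds a full-size copy of the means, while bsxfun expands the 1-by-500 row along the first dimension without building that copy.</p>
<pre>
% Sketch: subtract each column mean, with repmat and with bsxfun.
% The matrix size is an illustrative assumption.
A = rand(1000, 500);
mu = mean(A, 1);                        % 1-by-500 row of column means

B1 = A - repmat(mu, size(A,1), 1);      % builds a full 1000-by-500 copy of mu first
B2 = bsxfun(@minus, A, mu);             % expands mu along dim 1 without that copy

isequal(B1, B2)                         % identical results
</pre>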
]]></content:encoded>
	</item>
	<item>
		<title>By: Loren</title>
		<link>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29550</link>
		<dc:creator>Loren</dc:creator>
		<pubDate>Wed, 25 Jun 2008 21:31:59 +0000</pubDate>
		<guid>http://blogs.mathworks.com/loren/2008/06/25/speeding-up-matlab-applications/#comment-29550</guid>
		<description>Dan-

I haven't looked into the innards of MATLAB, but I think what's happening is that the expression with nPts creates the vector of indices first and then uses it.  The expression with 'end' has more opportunities for optimization, since 'end' depends on context and can't be evaluated without knowing the lay of the land.  Once MATLAB has the expression in context, it might not need to create the full index vector first, and can construct the output without the intermediate vector.  That guess assumes the expression uses end together with :.  If you did something else, like [1 2 3 end], I don't know whether you'd see similar performance results.  In any case, this is all speculation on my part.

--Loren</description>
		<content:encoded><![CDATA[<p>Dan-</p>
<p>I haven&#8217;t looked into the innards of MATLAB, but I think what&#8217;s happening is that the expression with nPts creates the vector of indices first and then uses it.  The expression with &#8216;end&#8217; has more opportunities for optimization, since &#8216;end&#8217; depends on context and can&#8217;t be evaluated without knowing the lay of the land.  Once MATLAB has the expression in context, it might not need to create the full index vector first, and can construct the output without the intermediate vector.  That guess assumes the expression uses end together with :.  If you did something else, like [1 2 3 end], I don&#8217;t know whether you&#8217;d see similar performance results.  In any case, this is all speculation on my part.</p>
<p>&#8211;Loren</p>
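<p>Dan&#8217;s exact expression is not reproduced here, so this sketch uses a simple slice as a stand-in: the same elements taken with a precomputed length nPts and with end. The vector length is an assumption, any speed difference is likely to depend on the MATLAB release, and the check only confirms that both forms return the same data.</p>
<pre>
% Sketch: the same slice written with a precomputed length and with end.
% The vector length is an illustrative assumption.
x = rand(1, 1e6);
nPts = numel(x);

tic; y1 = x(2:nPts); tN = toc;
tic; y2 = x(2:end);  tE = toc;

fprintf('x(2:nPts): %.4f s, x(2:end): %.4f s\n', tN, tE);
isequal(y1, y2)      % both forms select the same elements
</pre>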
]]></content:encoded>
	</item>
</channel>
</rss>
