This week, I was going to make a great video on recursion to solve a fun little puzzle. However, my code runs too slow. Before my recursive solution to the larger puzzle is practical, I need to clean out this bottleneck in the solver.
I have found the bottleneck to be here (taken out of context for clarity):
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currently this take 3 seconds, would like to do significantly better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
%
connectedComponents = bwconncomp((board == oldColor),4);
vi = connectedComponents.PixelIdxList{1};
end
toc
Notice that I only want the indices of the pixels that are in the ‘four connected pool’ of pixels of the same value as the pixel in the upper left corner.
The benchmark to beat is 15000 calls in 3 seconds.
Here is a graphical example of what this should do.
My code uses the Image Processing Toolbox. I am hoping a dedicated function could be optimized to run faster than BWCONNCOMP since I am only using a small amount of the information that this function calculates.
The best solution to this by next week will get a MATLAB t-shirt, if it can cut the time on my machine by at least one half.
That does cut the time down well. The funny thing is, early on I used exactly that code and somehow convinced myself BWCONNCOMP was faster. OOPS! I must have changed some other stuff elsewhere too, or I was using larger board sizes at the time.
I still think/hope that someone will be able to make a custom MATLAB function that can really knock the time down.
I don’t have the image processing toolbox, but if I understand the puzzle correctly, this runs in .2 seconds.
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currently this take 3 seconds, would like to do significantly better
board = round(rand(sizeBoardEdge)*numColors);
connectedComponents = islandsmx(board);
if connectedComponents(1,1)
vi = find(connectedComponents == connectedComponents(1,1));
end
end
toc
The islands is on the file exchange, and was created as a result of one of your previous puzzlers!
That is what I am talking about! I just added an else statement such that if no connected components are found, just return vi = 1;
This is going to be hard to beat. I like that the file was already made, and I missed it on the File Exchange. Now to plug this in to my larger solver.
Here’s a solution. I’m sure it can be improved upon, but it sits at around 0.4 seconds on my computer. It’s very, umm… problem-specific, with an extra if clause to help it speed up the first iteration.
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currently this take 3 seconds, would like to do significantly better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
%
[I, J] = find(board==oldColor);
edgeIdxs = false(length(I),1);
usedIdxs = edgeIdxs;
edgeIdxs(1) = true;
if any(I + J == 3) % Just a cheat for the first instance
while any(edgeIdxs)
oldEdgeIdxs = edgeIdxs;
edgeIdxs(:) = false;
for ii=find(oldEdgeIdxs)'
testMask = abs(I - I(ii)) + abs(J - I(ii));
edgeIdxs(testMask==1 & ~usedIdxs) = 1;
end
usedIdxs = usedIdxs | oldEdgeIdxs;
end
vi = find(usedIdxs);
else
vi = 1;
end
end
toc
I tried this with the number of board colors set to three so that I would get bigger islands, and the function did not get the right answer. It was working for smaller islands that are produced in my original examples though.
I do not understand how this code works, but optimization like this often means the code is harder to understand! That is why I often say wait to optimize.
Thanks
-Doug
ps. If you post corrected code, I will just edit out this comment and your originals for clarity!
I had 1 typo, and one logico, the locations of which are in the code comments. Try this:
[I, J] = find(board==oldColor);
if any(I + J == 3) % Just a cheat for the first instance
edgeIdxs = false(length(I),1);
usedIdxs = edgeIdxs;
edgeIdxs(1) = true;
while any(edgeIdxs)
oldEdgeIdxs = edgeIdxs;
edgeIdxs(:) = false;
for ii=find(oldEdgeIdxs)'
testMask = abs(I - I(ii)) + abs(J - J(ii)); % Site of the typo
edgeIdxs(testMask==1) = 1;
end
usedIdxs = usedIdxs | oldEdgeIdxs;
edgeIdxs = edgeIdxs & ~usedIdxs;
end
vi = sub2ind([sizeBoardEdge sizeBoardEdge], I(usedIdxs), J(usedIdxs)); % Site of the logico... before I forgot (sub2ind)
else
vi = 1;
end
You’re welcome to leave my old stuff in as an example of what not to do. I agree that my current solution is also an example of overly specific code, that should really be avoided.
It works on the principle that, for 4-connectedness, a connected pixel is always adjacent to its connected neighbour. So if you use find() with [I,J] output, a connected pixel must have a difference of 0 rows and 1 column (or vice versa).
This is easily understood if you think about the shortcut I’m using. The only possible connected pixels to the (1,1) pixels must be at (1,2) or (2,1). Ie, the indices add up to 3 (or are different to (1,1) by a combined total of 1).
Phew, that’s still probably not that clear. Anyway, with the addition of my sub2ind() call at the end, it’s now a little bit slower.
This code seems to runs 3.7 times faster, it takes about 0.64 sec to calculate:
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currentlcol this take 3 seconds, would like to do significantlcol better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
boardPath = round(zeros(sizeBoardEdge)*numColors);
updated=1;
while updated==1
updated=0;
for row=1:numColors
for col=1:sizeBoardEdge
if row==1 && col==1 && boardPath(row,col)==0;
boardPath(row,col) = 1;
updated=1;
continue;
end
if board(row,col)== oldColor && row>1 && boardPath(row-1,col)~=0 && boardPath(row,col)==0;
boardPath(row,col) = 1;
updated=1;
continue;
end
if board(row,col)== oldColor && col>1 && boardPath(row,col-1)~=0 && boardPath(row,col)==0;
boardPath(row,col) = 1;
updated=1;
continue;
end
if board(row,col)== oldColor && col<sizeBoardEdge && boardPath(row,col+1)~=0 && boardPath(row,col)==0;
boardPath(row, col) = 1;
updated=1;
continue;
end
if board(row,col)== oldColor && row<numColors && boardPath(row+1,col)~=0 && boardPath(row,col)==0;
boardPath(row,col) = 1;
updated=1;
continue;
end
end
end
end
vi = find(boardPath==1);
end
toc
board = round(rand(sizeBoardEdge)*numColors);
w = zeros(sizeBoardEdge) ;
t = 1 ;
for j=1:sizeBoardEdge,
dobreak = 1 ;
for i=1:sizeBoardEdge,
w(i,j) = t ;
if i > 1 && board(i,j)==board(i-1,j),
w(w==w(i,j)) = w(i-1,j) ;
dobreak = 0 ;
end
if j > 1 && board(i,j)==board(i,j-1),
w(w==w(i,j)) = w(i,j-1) ;
dobreak = 0 ;
else
t = t + 1 ;
end
end
if dobreak,
break ;
end
end
vi1 = find(w==w(1)) ;
On my (old) PC, 15000 runs with this algorithm took 0.67 seconds vs. 3.96 seconds using BWLABEL.
It seems there is a problem in your code. For a matrix A, if the connected components are say index = [1 7 8], and also A(5)~=A(6), then your code returns only index 1.
Essentially the same algo as before but I’ve joined all the if-s into one. This simple optimization speed this algo up 500%. Now it takes 0.16 sec on my pc vs. 0.66 sec before.
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currentlcol this take 3 seconds, would like to do significantlcol better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
boardPath = round(zeros(sizeBoardEdge)*numColors);
updated=1;
while updated==1
updated=0;
for row=1:numColors
for col=1:sizeBoardEdge
if row==1 && col==1 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && row>1 && boardPath(row-1,col)~=0 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && col>1 && boardPath(row,col-1)~=0 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && col<sizeBoardEdge && boardPath(row,col+1)~=0 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && row<numColors && boardPath(row+1,col)~=0 && boardPath(row,col)==0;
boardPath(row,col) = 1;
updated=1;
end
end
end
end
vi = find(boardPath==1);
end
toc
This is a simple solution although may not be the fastest one.
sizeBoardEdge = 6;
numColors = 6;
% extended board to simplify programming
map = false(sizeBoardEdge+2);
% neighborhood offset
neighbor = [1 -1 sizeBoardEdge+2 -(sizeBoardEdge+2)];
tic
for i = 1:15000
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
%map of oldcolor locations
map(2:sizeBoardEdge+1,2:sizeBoardEdge+1) = board==oldColor;
%map of island
island = false(sizeBoardEdge+2);
island(2,2) = true;
%current island search map
currentIsland = island;
while any(any(currentIsland)) %search neighbors for at least one location
idx=find(currentIsland,1); %current location
currentIsland(idx)=false; %remove current location from search list
map(idx)=false; %remove current location from map
locs = idx + neighbor; %neighborhood locations
locs = locs(map(locs)); %beighborhood belong to the island
island(locs) = true; %set island location
currentIsland(locs) = true; %set current island search map
end
% island locations
vi = find(island(2:sizeBoardEdge+1,2:sizeBoardEdge+1));
end
toc
Wow, I am surprised that the pure MATLAB implementation was able to beat Matt’s MEX implementation, but it does, and at several sizes of boards and different densities.
There was a slight error where the row counter should go to sizeBoardEdge also, but that was easily changed. Here is the scoreboard.
Doug
Elapsed time is 3.086509 seconds.
Kenneth Eaton
Elapsed time is 1.720698 seconds.
Matt Fig
Elapsed time is 0.271876 seconds.
Sven final
Elapsed time is 1.665966 seconds.
Dimitry
Elapsed time is 0.256186 seconds.
Jos
Elapsed time is 0.713501 seconds.
Dimitry 2
Elapsed time is 0.125217 seconds.
Yi
Elapsed time is 0.713681 seconds.
Impressive. I am amazed at how much speed people are able to squeeze out of this.
I don’t think it really competes in terms of speed, but I was curious if Dimitry’s method could be done in a slightly more vectorized way. Here’s what I came up with if anyone else is curious:
sizeBoardEdge = 6;
numColors = 6;
v = -1; % fill value
ind = [1 -1 sizeBoardEdge+2 -sizeBoardEdge-2]; % offsets to linear index
% to reach 4-connected neighbors after image is padded 1 element on each
% side
tic
for i=1:15000
board = round(rand(sizeBoardEdge)*numColors);
oldBoard = board; % pad image so we can avoid checking edge conditions
board = ones(sizeBoardEdge+2)*-2;
board(2:end-1,2:end-1) = oldBoard;
oldColor = board(2,2);
board(2,2)=v; % start filling
q=[sizeBoardEdge+4 0]; % queue
n=1; % current queue index
N=2; % queue length
% queue loop
while n<N
match = board(q(n) + ind) == oldColor; % find all matching 4-connected neighbors
if any(match) % if any found...
l = sum(match);
q(N:N+l-1) = q(n)+ind(match); % add to queue (not preallocated)
board(q(N:N+l-1))=v; % fill corresponding image matrix elements
N=N+l;
end
n=n+1;
end
board = board(2:end-1,2:end-1); % de-pad image
vi = q - sizeBoardEdge-3; % de-pad indices
vi = vi - 2*floor(vi/(sizeBoardEdge+2));
vi = sort(vi(1:n-1)); % sort indices
end
toc
It runs in about .73 seconds on my machine, less if the sort at the end is not needed. Dimitry’s second post takes about .25 seconds for me.
Here is another ugly crack at it. I put it in a function.
function [idx] = connectorfig(A)
% Returns the indices of the 4-connected components of the first element of
% input matrix A.
A1 = A(1);
if A(2)~=A1 && A(1,2)~=A1
idx = 1;
return
end
[r,c] = size(A);
idx = zeros(r*c,1);
idx(1) = 1;
P = inf;
A(1) = inf;
bf = true;
while bf
bf = false;
% Examine first and last column.
for k = 2:c-1
if A(k,1)==A1 && (A(k-1,1)==P || A(k+1,1)==P || A(k,2)==P)
A(k,1) = P;
idx(k) = 1;
bf = true;
end
if A(k,c)==A1 && (A(k-1,c)==P || A(k+1,c)==P || A(k,c-1)==P)
A(k,c) = P;
idx(k + (c-1) * r) = 1;
bf = true;
end
end
% Examine first and last row.
for k = 2:r-1
if A(1,k)==A1 && (A(1,k-1)==P || A(1,k+1)==P || A(2,k)==P)
A(1,k) = P;
idx(1 + (k-1) * r) = 1;
bf = true;
end
if A(r,k)==A1 && (A(r,k-1)==P || A(r,k+1)==P || A(r-1,k)==P)
A(r,k) = P;
idx(r + (k-1) * r) = 1;
bf = true;
end
end
% Examine lower-left corner
if A(r,1)==A1 && (A(r-1,1)==P || A(r,2)==P)
A(r,1) = P;
idx(r) = 1;
bf = true;
end
% Examine lower-right corner
if A(r,c)==A1 && (A(r-1,c)==P || A(r,c-1)==P)
A(r,c) = P;
idx(r + (c-1) * r) = 1;
bf = true;
end
% Examine upper-right corner
if A(1,c)==A1 && (A(1,c-1)==P || A(2,c)==P)
A(1,c) = P;
idx(1 + (c-1) * r) = 1;
bf = true;
end
% Look at middle matrix.
for k = 2:r-1
for m = 2:c-1
if A(k,m)==A1
if A(k,m-1)==P || A(k,m+1)==P || A(k-1,m)==P || A(k+1,m)==P
A(k,m) = P;
idx(k + (m-1) * r) = 1;
bf = true;
end
end
end
end
end
idx = find(idx);
I still think there is an error in Jos’s code as I mentioned above. Also Dmitry’s code returns the wrong thing sometimes. It is probably a simple mistake on my part, but look at this one for example made by testing numColors = 2:
Dmitry’s code returns [1 2 7] when it should be: [1 2 3 7]. Can anyone verify that or refute it? I admit I may have made a mistake in putting his code into a function. Anyway, thanks for another great puzzler Doug!
Matt, you’re right. Insted of iterating the number of rows, I was iterating the number of colors, which lead to the defect when the number of colors isn’t equal to the number of rows. Sorry, I should have checked it better. here is the fixed version:
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currentlcol this take 3 seconds, would like to do significantlcol better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
boardPath = zeros(sizeBoardEdge);
updated=1;
while updated==1
updated=0;
for row=1:sizeBoardEdge
for col=1:sizeBoardEdge
if row==1 && col==1 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && row>1 && boardPath(row-1,col)~=0 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && col>1 && boardPath(row,col-1)~=0 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && col<sizeBoardEdge && boardPath(row,col+1)~=0 && boardPath(row,col)==0 ...
|| board(row,col)== oldColor && row<numColors && boardPath(row+1,col)~=0 && boardPath(row,col)==0;
boardPath(row,col) = 1;
updated=1;
end
end
end
end
vi = find(boardPath==1);
end
toc
I think that Dmitri’s code fails; at least for boards as above.
Here’s my updated function, that seems to work now:
function [vi,w] = jos2(board)
% VI - indices for elements 4-connected to board(1)
% W - labeling of 4-connected elements (like bwlabel)
w = zeros(size(board)) ;
t = 1 ;
for r = 1:size(board,1),
for c = 1:size(board,2),
w(r,c) = t ;
if r > 1 && board(r,c)==board(r-1,c),
w(w==w(r,c)) = w(r-1,c) ;
end
if c > 1 && board(r,c)==board(r,c-1),
w(w==w(r,c)) = w(r,c-1) ;
end
t = t + 1 ;
end
end
vi = find(w == w(1)) ;
Here is a simplified version of Dmitry’s code. It should be faster.
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currentlcol this take 3 seconds, would like to do significantlcol better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
updated=1;
while updated==1
updated=0;
for row=1:sizeBoardEdge
for col=1:sizeBoardEdge
if board(row,col)== oldColor && ((row==1 && col==1) ...
|| (row>1 && board(row-1,col)1 && board(row,col-1)<0) ...
|| (col<sizeBoardEdge && board(row,col+1)<0) ...
|| (row<sizeBoardEdge && board(row+1,col)<0))
board(row,col) = -1;
updated=1;
end
end
end
end
vi = find(board<0);
end
toc
The function version of the modified code is as follows.
function vi = colorgroup(board)
sizeBoardEdge=size(board,1);
oldColor = board(1,1);
% boardPath = zeros(sizeBoardEdge);
updated=1;
while updated==1
updated=0;
for row=1:sizeBoardEdge
for col=1:sizeBoardEdge
if board(row,col)== oldColor && ((row==1 && col==1) ...
|| (row>1 && board(row-1,col)1 && board(row,col-1)<0) ...
|| (col<sizeBoardEdge && board(row,col+1)<0) ...
|| (row<sizeBoardEdge && board(row+1,col)<0))
board(row,col) = -1;
updated=1;
end
end
end
end
vi = find(board<0);
I do not know why, the code actually displayed is different from what originally posted. Hope this time it is correct.
test code:
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currentlcol this take 3 seconds, would like to do significantlcol better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
updated=1;
while updated==1
updated=0;
for row=1:sizeBoardEdge
for col=1:sizeBoardEdge
if board(row,col)== oldColor && ((row==1 && col==1) ...
|| (row>1 && board(row-1,col)1 && board(row,col-1)<0) ...
|| (col<sizeBoardEdge && board(row,col+1)<0) ...
|| (row<sizeBoardEdge && board(row+1,col)<0))
board(row,col) = -1;
updated=1;
end
end
end
end
vi = find(board<0);
end
toc
Function code:
function vi = colorgroup(board)
sizeBoardEdge=size(board,1);
oldColor = board(1,1);
% boardPath = zeros(sizeBoardEdge);
updated=1;
while updated==1
updated=0;
for row=1:sizeBoardEdge
for col=1:sizeBoardEdge
if board(row,col)== oldColor && ((row==1 && col==1) ...
|| (row>1 && board(row-1,col)1 && board(row,col-1)<0) ...
|| (col<sizeBoardEdge && board(row,col+1)<0) ...
|| (row<sizeBoardEdge && board(row+1,col)<0))
board(row,col) = -1;
updated=1;
end
end
end
end
vi = find(board<0);
yes, I simplified it even more and moved to logical variables. It is now 30% faster than the last version. Interesting that these simplifications have such an impact. This was quite a learning experience, thanks Doug!
I tried to replace the internal loop with an index variable (like i=1:sizeBoardEdge, not sure I use the right term) but failed. Perhaps somebody can do it? It should increase the performance even more.
sizeBoardEdge = 6;
numColors = 6;
tic
for i = 1:15000
%optimize contents of loop
%currentlcol this take 3 seconds, would like to do significantlcol better
board = round(rand(sizeBoardEdge)*numColors);
oldColor = board(1,1);
boardPath = false(sizeBoardEdge);
boardPath(1,1)=true;
updated=true;
while updated
updated=false;
for row=1:sizeBoardEdge
for col=1:sizeBoardEdge
if ~boardPath(row,col) && board(row,col) == oldColor && ...
(row>1 && boardPath(row-1,col) ...
|| col>1 && boardPath(row,col-1) ...
|| col<sizeBoardEdge && boardPath(row,col+1) ...
|| row<sizeBoardEdge && boardPath(row+1,col) );
boardPath(row,col) = true;
updated=true;
end
end
end
end
vi = find(boardPath);
end
toc
Yi, I’ve checked your idea to store the path as negative numbers in the same array, it increases the performance by another 1.7%. I would still vote for logical variables because I think that with them we can somehow make use of index variables and get rid of the inner loop. This might provide significant performance increase also I failed to realize this.
I think the reason for your code to be successful is due to the JIT acceleration. To achieve this, the code need to be as simple as possible. Any vectorization to get rid of the inner loop may deteriorate the performance.
Wow, a guy steps away for lunch and finds a dozen new comments. I think I am testing the latest and greatest code here:
Doug
Elapsed time is 2.849295 seconds.
Kenneth Eaton
Elapsed time is 1.535686 seconds.
Matt Fig
Elapsed time is 0.191490 seconds.
Sven final
Elapsed time is 0.506205 seconds.
Dimitry
Elapsed time is 0.492957 seconds.
Jos
Elapsed time is 0.272301 seconds.
Dimitry 2
Elapsed time is 0.187197 seconds.
Yi
Elapsed time is 0.305517 seconds.
Dimitri final
Elapsed time is 0.125054 seconds.
Yi final
Elapsed time is 0.147474 seconds.
Though, at this point the code seems to be “community” owned. This is impressive. I was just telling someone in the office that no matter how efficient you think your code is, there is someone out there that can make it a little faster!
JIT is also different for different versions of MATLAB, which is probably one reason it is not well documented. TMW doesn’t want you writing code specifically for the JIT which is subject to change.
For example, I get completely different results for the relative timings of the solutions to this current puzzle than does Doug. I suspect he is using the latest version of MATLAB, while I am using 2007a.
The MATLAB code beats a mexFunction because the JIT handles just pure MATLAB, and because the matrix is so small. With a dummy mexFunction that does nothing at all (just returning immediately), the total time is about the same as a mexFunction that computes the complete result.
So with a larger matrix, the loop overhead in M (which is not JIT-ed when using a mexFunction) becomes neglible.
With a 100-by-100 board, Yi’s solution takes 22 seconds whereas a simple mexFunction implementation takes 14 seconds. Of that total time, the rand statement takes all the time. The mexFunction itself takes just 0.16 seconds.
So you really have to be careful when looking at performance comparisons. I suggest you try a larger board, and time just the puzzle solution, not the time to construct the board.
What you will find as the board size grows, is that searching the entire board at each iteration takes far too much time. The asymptotic complexity is way to high.
Here’s my mexFunction solution. Note that the result is returned as a row vector (as Doug requested). It is not sorted (which was not in Doug’s specifications).
#include "mex.h"
void mexFunction
(
int nargout,
mxArray *pargout [ ],
int nargin,
const mxArray *pargin [ ]
)
{
double *board, color, *result ;
char *visited ;
mwSignedIndex *queue, m, n, mn, i, k, head = 0, tail = 0, east, west,
north, south, col ;
if (nargin != 1 || nargout > 1)
{
mexErrMsgTxt ("Usage: s = puzzle(board)\n") ;
}
board = mxGetPr (pargin [0]) ;
m = mxGetM (pargin [0]) ;
n = mxGetN (pargin [0]) ;
mn = m*n ;
if (n == 0 || m == 0)
{
mexErrMsgTxt ("board cannot be empty") ;
}
visited = mxCalloc (m*n, sizeof (char)) ;
queue = mxMalloc (m*n * sizeof (mwSignedIndex)) ;
/* place the corner element in the breadth-first-search queue */
head = 0 ;
tail = 0 ;
queue [tail++] = 0 ;
visited [0] = 1 ;
color = board [0] ;
while (head = 0 && !visited [east] && board [east] == color)
{
queue [tail++] = east ;
visited [east] = 1 ;
}
/* West neighbor */
west = i+m ;
if (west = 0 && north/m == col &&
!visited [north] && board [north] == color)
{
queue [tail++] = north ;
visited [north] = 1 ;
}
/* South neighbor */
south = i+1 ;
if (south/m == col && !visited [south] && board [south] == color)
{
queue [tail++] = south ;
visited [south] = 1 ;
}
}
/* return the result */
pargout [0] = mxCreateDoubleMatrix (1, tail, mxREAL) ;
result = mxGetPr (pargout [0]) ;
for (k = 0 ; k < tail ; k++)
{
result [k] = queue [k] + 1 ;
}
mxFree (visited) ;
mxFree (queue) ;
}
Argh. Posting code in a comment is nasty. The “>” didn’t come through correctly. There needs to be a better method at posting code snippets in a comment in a blog. You can see the solution here:
When I replaced your loop contents with the following, I got a time of around 2.79 seconds (compared to about 5.4 seconds for your above code):
Kenneth,
That does cut the time down well. The funny thing is, early on I used exactly that code and somehow convinced myself BWCONNCOMP was faster. OOPS! I must have changed some other stuff elsewhere too, or I was using larger board sizes at the time.
I still think/hope that someone will be able to make a custom MATLAB function that can really knock the time down.
Thanks,
Doug
I don’t have the image processing toolbox, but if I understand the puzzle correctly, this runs in .2 seconds.
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currently this take 3 seconds, would like to do significantly better board = round(rand(sizeBoardEdge)*numColors); connectedComponents = islandsmx(board); if connectedComponents(1,1) vi = find(connectedComponents == connectedComponents(1,1)); end end tocThe islands is on the file exchange, and was created as a result of one of your previous puzzlers!
Matt,
That is what I am talking about! I just added an else statement such that if no connected components are found, just return vi = 1;
This is going to be hard to beat. I like that the file was already made, and I missed it on the File Exchange. Now to plug this in to my larger solver.
Doug
Here’s a solution. I’m sure it can be improved upon, but it sits at around 0.4 seconds on my computer. It’s very, umm… problem-specific, with an extra if clause to help it speed up the first iteration.
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currently this take 3 seconds, would like to do significantly better board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); % [I, J] = find(board==oldColor); edgeIdxs = false(length(I),1); usedIdxs = edgeIdxs; edgeIdxs(1) = true; if any(I + J == 3) % Just a cheat for the first instance while any(edgeIdxs) oldEdgeIdxs = edgeIdxs; edgeIdxs(:) = false; for ii=find(oldEdgeIdxs)' testMask = abs(I - I(ii)) + abs(J - I(ii)); edgeIdxs(testMask==1 & ~usedIdxs) = 1; end usedIdxs = usedIdxs | oldEdgeIdxs; end vi = find(usedIdxs); else vi = 1; end end tocOh, and a minor improvement if you bring the code:
inside the if-statement.
Sven,
I tried this with the number of board colors set to three so that I would get bigger islands, and the function did not get the right answer. It was working for smaller islands that are produced in my original examples though.
I do not understand how this code works, but optimization like this often means the code is harder to understand! That is why I often say wait to optimize.
Thanks
-Doug
ps. If you post corrected code, I will just edit out this comment and your originals for clarity!
Haha, oops.
I had 1 typo, and one logico, the locations of which are in the code comments. Try this:
[I, J] = find(board==oldColor); if any(I + J == 3) % Just a cheat for the first instance edgeIdxs = false(length(I),1); usedIdxs = edgeIdxs; edgeIdxs(1) = true; while any(edgeIdxs) oldEdgeIdxs = edgeIdxs; edgeIdxs(:) = false; for ii=find(oldEdgeIdxs)' testMask = abs(I - I(ii)) + abs(J - J(ii)); % Site of the typo edgeIdxs(testMask==1) = 1; end usedIdxs = usedIdxs | oldEdgeIdxs; edgeIdxs = edgeIdxs & ~usedIdxs; end vi = sub2ind([sizeBoardEdge sizeBoardEdge], I(usedIdxs), J(usedIdxs)); % Site of the logico... before I forgot (sub2ind) else vi = 1; endYou’re welcome to leave my old stuff in as an example of what not to do. I agree that my current solution is also an example of overly specific code, that should really be avoided.
It works on the principle that, for 4-connectedness, a connected pixel is always adjacent to its connected neighbour. So if you use find() with [I,J] output, a connected pixel must have a difference of 0 rows and 1 column (or vice versa).
This is easily understood if you think about the shortcut I’m using. The only possible connected pixels to the (1,1) pixels must be at (1,2) or (2,1). Ie, the indices add up to 3 (or are different to (1,1) by a combined total of 1).
Phew, that’s still probably not that clear. Anyway, with the addition of my sub2ind() call at the end, it’s now a little bit slower.
Cheers,
Sven.
This code seems to runs 3.7 times faster, it takes about 0.64 sec to calculate:
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currentlcol this take 3 seconds, would like to do significantlcol better board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); boardPath = round(zeros(sizeBoardEdge)*numColors); updated=1; while updated==1 updated=0; for row=1:numColors for col=1:sizeBoardEdge if row==1 && col==1 && boardPath(row,col)==0; boardPath(row,col) = 1; updated=1; continue; end if board(row,col)== oldColor && row>1 && boardPath(row-1,col)~=0 && boardPath(row,col)==0; boardPath(row,col) = 1; updated=1; continue; end if board(row,col)== oldColor && col>1 && boardPath(row,col-1)~=0 && boardPath(row,col)==0; boardPath(row,col) = 1; updated=1; continue; end if board(row,col)== oldColor && col<sizeBoardEdge && boardPath(row,col+1)~=0 && boardPath(row,col)==0; boardPath(row, col) = 1; updated=1; continue; end if board(row,col)== oldColor && row<numColors && boardPath(row+1,col)~=0 && boardPath(row,col)==0; boardPath(row,col) = 1; updated=1; continue; end end end end vi = find(boardPath==1); end tocHere is a solution, that mimicks BWLABEL:
board = round(rand(sizeBoardEdge)*numColors); w = zeros(sizeBoardEdge) ; t = 1 ; for j=1:sizeBoardEdge, dobreak = 1 ; for i=1:sizeBoardEdge, w(i,j) = t ; if i > 1 && board(i,j)==board(i-1,j), w(w==w(i,j)) = w(i-1,j) ; dobreak = 0 ; end if j > 1 && board(i,j)==board(i,j-1), w(w==w(i,j)) = w(i,j-1) ; dobreak = 0 ; else t = t + 1 ; end end if dobreak, break ; end end vi1 = find(w==w(1)) ;On my (old) PC, 15000 runs with this algorithm took 0.67 seconds vs. 3.96 seconds using BWLABEL.
Jos
Jos,
It seems there is a problem in your code. For a matrix A, if the connected components are say index = [1 7 8], and also A(5)~=A(6), then your code returns only index 1.
For example:
A =
1 1 5 4 6 1
5 1 5 0 2 2
0 5 1 2 0 3
2 5 3 5 0 2
3 0 3 2 3 6
4 1 6 5 2 6
But now change A(5) = 4, and your code gets it.
So far the results are as such:
Doug
Elapsed time is 2.995547 seconds.
Kenneth Eaton
Elapsed time is 1.664112 seconds.
Matt Fig
Elapsed time is 0.227788 seconds.
Sven
Elapsed time is 0.602323 seconds.
Dimitry
Elapsed time is 0.487081 seconds.
Jos
Elapsed time is 0.267618 seconds.
As I play around with the size and composition of the matrix, I find that Matt’s works the fastest under most conditions.
There is still time for more entries!
Thanks
Doug
Essentially the same algo as before but I’ve joined all the if-s into one. This simple optimization speed this algo up 500%. Now it takes 0.16 sec on my pc vs. 0.66 sec before.
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currentlcol this take 3 seconds, would like to do significantlcol better board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); boardPath = round(zeros(sizeBoardEdge)*numColors); updated=1; while updated==1 updated=0; for row=1:numColors for col=1:sizeBoardEdge if row==1 && col==1 && boardPath(row,col)==0 ... || board(row,col)== oldColor && row>1 && boardPath(row-1,col)~=0 && boardPath(row,col)==0 ... || board(row,col)== oldColor && col>1 && boardPath(row,col-1)~=0 && boardPath(row,col)==0 ... || board(row,col)== oldColor && col<sizeBoardEdge && boardPath(row,col+1)~=0 && boardPath(row,col)==0 ... || board(row,col)== oldColor && row<numColors && boardPath(row+1,col)~=0 && boardPath(row,col)==0; boardPath(row,col) = 1; updated=1; end end end end vi = find(boardPath==1); end tocThis is a simple solution although may not be the fastest one.
sizeBoardEdge = 6; numColors = 6; % extended board to simplify programming map = false(sizeBoardEdge+2); % neighborhood offset neighbor = [1 -1 sizeBoardEdge+2 -(sizeBoardEdge+2)]; tic for i = 1:15000 board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); %map of oldcolor locations map(2:sizeBoardEdge+1,2:sizeBoardEdge+1) = board==oldColor; %map of island island = false(sizeBoardEdge+2); island(2,2) = true; %current island search map currentIsland = island; while any(any(currentIsland)) %search neighbors for at least one location idx=find(currentIsland,1); %current location currentIsland(idx)=false; %remove current location from search list map(idx)=false; %remove current location from map locs = idx + neighbor; %neighborhood locations locs = locs(map(locs)); %beighborhood belong to the island island(locs) = true; %set island location currentIsland(locs) = true; %set current island search map end % island locations vi = find(island(2:sizeBoardEdge+1,2:sizeBoardEdge+1)); end tocWow, I am surprised that the pure MATLAB implementation was able to beat Matt’s MEX implementation, but it does, and at several sizes of boards and different densities.
There was a slight error where the row counter should go to sizeBoardEdge also, but that was easily changed. Here is the scoreboard.
Doug
Elapsed time is 3.086509 seconds.
Kenneth Eaton
Elapsed time is 1.720698 seconds.
Matt Fig
Elapsed time is 0.271876 seconds.
Sven final
Elapsed time is 1.665966 seconds.
Dimitry
Elapsed time is 0.256186 seconds.
Jos
Elapsed time is 0.713501 seconds.
Dimitry 2
Elapsed time is 0.125217 seconds.
Yi
Elapsed time is 0.713681 seconds.
Impressive. I am amazed at how much speed people are able to squeeze out of this.
Doug
I don’t think it really competes in terms of speed, but I was curious if Dimitry’s method could be done in a slightly more vectorized way. Here’s what I came up with if anyone else is curious:
sizeBoardEdge = 6; numColors = 6; v = -1; % fill value ind = [1 -1 sizeBoardEdge+2 -sizeBoardEdge-2]; % offsets to linear index % to reach 4-connected neighbors after image is padded 1 element on each % side tic for i=1:15000 board = round(rand(sizeBoardEdge)*numColors); oldBoard = board; % pad image so we can avoid checking edge conditions board = ones(sizeBoardEdge+2)*-2; board(2:end-1,2:end-1) = oldBoard; oldColor = board(2,2); board(2,2)=v; % start filling q=[sizeBoardEdge+4 0]; % queue n=1; % current queue index N=2; % queue length % queue loop while n<N match = board(q(n) + ind) == oldColor; % find all matching 4-connected neighbors if any(match) % if any found... l = sum(match); q(N:N+l-1) = q(n)+ind(match); % add to queue (not preallocated) board(q(N:N+l-1))=v; % fill corresponding image matrix elements N=N+l; end n=n+1; end board = board(2:end-1,2:end-1); % de-pad image vi = q - sizeBoardEdge-3; % de-pad indices vi = vi - 2*floor(vi/(sizeBoardEdge+2)); vi = sort(vi(1:n-1)); % sort indices end tocIt runs in about .73 seconds on my machine, less if the sort at the end is not needed. Dimitry’s second post takes about .25 seconds for me.
Here is another ugly crack at it. I put it in a function.
I still think there is an error in Jos’s code as I mentioned above. Also Dmitry’s code returns the wrong thing sometimes. It is probably a simple mistake on my part, but look at this one for example made by testing numColors = 2:
Dmitry’s code returns [1 2 7] when it should be: [1 2 3 7]. Can anyone verify that or refute it? I admit I may have made a mistake in putting his code into a function. Anyway, thanks for another great puzzler Doug!
@ Matt (in reply 11 & 17):
You’re right. It can be fixed by changing “dobreak = 1″ into “dobreak = j>1″.
Matt, you’re right. Insted of iterating the number of rows, I was iterating the number of colors, which lead to the defect when the number of colors isn’t equal to the number of rows. Sorry, I should have checked it better. here is the fixed version:
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currentlcol this take 3 seconds, would like to do significantlcol better board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); boardPath = zeros(sizeBoardEdge); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if row==1 && col==1 && boardPath(row,col)==0 ... || board(row,col)== oldColor && row>1 && boardPath(row-1,col)~=0 && boardPath(row,col)==0 ... || board(row,col)== oldColor && col>1 && boardPath(row,col-1)~=0 && boardPath(row,col)==0 ... || board(row,col)== oldColor && col<sizeBoardEdge && boardPath(row,col+1)~=0 && boardPath(row,col)==0 ... || board(row,col)== oldColor && row<numColors && boardPath(row+1,col)~=0 && boardPath(row,col)==0; boardPath(row,col) = 1; updated=1; end end end end vi = find(boardPath==1); end toc@ Matt, for some input your function (in reply 17) runs “forever”, for instance for:
A = [0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 1 0 1 0 1 0 ]I think that Dmitri’s code fails; at least for boards as above.
Here’s my updated function, that seems to work now:
function [vi,w] = jos2(board) % VI - indices for elements 4-connected to board(1) % W - labeling of 4-connected elements (like bwlabel) w = zeros(size(board)) ; t = 1 ; for r = 1:size(board,1), for c = 1:size(board,2), w(r,c) = t ; if r > 1 && board(r,c)==board(r-1,c), w(w==w(r,c)) = w(r-1,c) ; end if c > 1 && board(r,c)==board(r,c-1), w(w==w(r,c)) = w(r,c-1) ; end t = t + 1 ; end end vi = find(w == w(1)) ;Jos
I was refering to Dmitri’s code in reply 13, not 19 !
Here is a simplified version of Dmitry’s code. It should be faster.
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currentlcol this take 3 seconds, would like to do significantlcol better board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if board(row,col)== oldColor && ((row==1 && col==1) ... || (row>1 && board(row-1,col)1 && board(row,col-1)<0) ... || (col<sizeBoardEdge && board(row,col+1)<0) ... || (row<sizeBoardEdge && board(row+1,col)<0)) board(row,col) = -1; updated=1; end end end end vi = find(board<0); end tocThe function version of the modified code is as follows.
function vi = colorgroup(board) sizeBoardEdge=size(board,1); oldColor = board(1,1); % boardPath = zeros(sizeBoardEdge); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if board(row,col)== oldColor && ((row==1 && col==1) ... || (row>1 && board(row-1,col)1 && board(row,col-1)<0) ... || (col<sizeBoardEdge && board(row,col+1)<0) ... || (row<sizeBoardEdge && board(row+1,col)<0)) board(row,col) = -1; updated=1; end end end end vi = find(board<0);I do not know why, the code actually displayed is different from what originally posted. Hope this time it is correct.
test code:
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 %optimize contents of loop %currentlcol this take 3 seconds, would like to do significantlcol better board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if board(row,col)== oldColor && ((row==1 && col==1) ... || (row>1 && board(row-1,col)1 && board(row,col-1)<0) ... || (col<sizeBoardEdge && board(row,col+1)<0) ... || (row<sizeBoardEdge && board(row+1,col)<0)) board(row,col) = -1; updated=1; end end end end vi = find(board<0); end tocFunction code:
function vi = colorgroup(board) sizeBoardEdge=size(board,1); oldColor = board(1,1); % boardPath = zeros(sizeBoardEdge); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if board(row,col)== oldColor && ((row==1 && col==1) ... || (row>1 && board(row-1,col)1 && board(row,col-1)<0) ... || (col<sizeBoardEdge && board(row,col+1)<0) ... || (row<sizeBoardEdge && board(row+1,col)<0)) board(row,col) = -1; updated=1; end end end end vi = find(board<0);yes, I simplified it even more and moved to logical variables. It is now 30% faster than the last version. Interesting that these simplifications have such an impact. This was quite a learning experience, thanks Doug!
I tried to replace the internal loop with an index variable (like i=1:sizeBoardEdge, not sure I use the right term) but failed. Perhaps somebody can do it? It should increase the performance even more.
It is corrupted again! It is really strange. The error is in line 13. This is what displyed:
Actually, it should be read as:
and
I see. It is because “<”. Here is the code:
sizeBoardEdge = 6; numColors = 6; tic for i = 1:15000 board = round(rand(sizeBoardEdge)*numColors); oldColor = board(1,1); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if board(row,col)== oldColor && (row==1 && col==1 ... || row > 1 && board(row-1,col) < 0 ... || col > 1 && board(row,col-1) < 0 ... || col < sizeBoardEdge && board(row,col+1) < 0 ... || row < sizeBoardEdge && board(row+1,col) < 0) board(row,col) = -1; updated=1; end end end end vi = find(board<0); end tocAnd the function version:
function vi = colorgroup(board) sizeBoardEdge=size(board,1); oldColor = board(1,1); % boardPath = zeros(sizeBoardEdge); updated=1; while updated==1 updated=0; for row=1:sizeBoardEdge for col=1:sizeBoardEdge if board(row,col)== oldColor && (row==1 && col==1 ... || row>1 && board(row-1,col)<0 ... || col>1 && board(row,col-1)<0 ... || col<sizeBoardEdge && board(row,col+1)<0 ... || row<sizeBoardEdge && board(row+1,col)<0) board(row,col) = -1; updated=1; end end end end vi = find(board<0);Yi, I’ve checked your idea to store the path as negative numbers in the same array, it increases the performance by another 1.7%. I would still vote for logical variables because I think that with them we can somehow make use of index variables and get rid of the inner loop. This might provide significant performance increase also I failed to realize this.
Dmitry,
I think the reason for your code to be successful is due to the JIT acceleration. To achieve this, the code need to be as simple as possible. Any vectorization to get rid of the inner loop may deteriorate the performance.
Yi
Wow, a guy steps away for lunch and finds a dozen new comments. I think I am testing the latest and greatest code here:
Doug
Elapsed time is 2.849295 seconds.
Kenneth Eaton
Elapsed time is 1.535686 seconds.
Matt Fig
Elapsed time is 0.191490 seconds.
Sven final
Elapsed time is 0.506205 seconds.
Dimitry
Elapsed time is 0.492957 seconds.
Jos
Elapsed time is 0.272301 seconds.
Dimitry 2
Elapsed time is 0.187197 seconds.
Yi
Elapsed time is 0.305517 seconds.
Dimitri final
Elapsed time is 0.125054 seconds.
Yi final
Elapsed time is 0.147474 seconds.
Though, at this point the code seems to be “community” owned. This is impressive. I was just telling someone in the office that no matter how efficient you think your code is, there is someone out there that can make it a little faster!
-Doug
Yi, I didn’t know that matlab has a JIT compilation. Is that true? Are there some guidelines available to better make use of it?
Dmitry
JIT is one of things which are not well documented. You may search support website to get something to read.
Yi
Dmitry,
JIT is also different for different versions of MATLAB, which is probably one reason it is not well documented. TMW doesn’t want you writing code specifically for the JIT which is subject to change.
For example, I get completely different results for the relative timings of the solutions to this current puzzle than does Doug. I suspect he is using the latest version of MATLAB, while I am using 2007a.
The MATLAB code beats a mexFunction because the JIT handles just pure MATLAB, and because the matrix is so small. With a dummy mexFunction that does nothing at all (just returning immediately), the total time is about the same as a mexFunction that computes the complete result.
So with a larger matrix, the loop overhead in M (which is not JIT-ed when using a mexFunction) becomes neglible.
With a 100-by-100 board, Yi’s solution takes 22 seconds whereas a simple mexFunction implementation takes 14 seconds. Of that total time, the rand statement takes all the time. The mexFunction itself takes just 0.16 seconds.
So you really have to be careful when looking at performance comparisons. I suggest you try a larger board, and time just the puzzle solution, not the time to construct the board.
What you will find as the board size grows, is that searching the entire board at each iteration takes far too much time. The asymptotic complexity is way to high.
Here’s my mexFunction solution. Note that the result is returned as a row vector (as Doug requested). It is not sorted (which was not in Doug’s specifications).
#include "mex.h" void mexFunction ( int nargout, mxArray *pargout [ ], int nargin, const mxArray *pargin [ ] ) { double *board, color, *result ; char *visited ; mwSignedIndex *queue, m, n, mn, i, k, head = 0, tail = 0, east, west, north, south, col ; if (nargin != 1 || nargout > 1) { mexErrMsgTxt ("Usage: s = puzzle(board)\n") ; } board = mxGetPr (pargin [0]) ; m = mxGetM (pargin [0]) ; n = mxGetN (pargin [0]) ; mn = m*n ; if (n == 0 || m == 0) { mexErrMsgTxt ("board cannot be empty") ; } visited = mxCalloc (m*n, sizeof (char)) ; queue = mxMalloc (m*n * sizeof (mwSignedIndex)) ; /* place the corner element in the breadth-first-search queue */ head = 0 ; tail = 0 ; queue [tail++] = 0 ; visited [0] = 1 ; color = board [0] ; while (head = 0 && !visited [east] && board [east] == color) { queue [tail++] = east ; visited [east] = 1 ; } /* West neighbor */ west = i+m ; if (west = 0 && north/m == col && !visited [north] && board [north] == color) { queue [tail++] = north ; visited [north] = 1 ; } /* South neighbor */ south = i+1 ; if (south/m == col && !visited [south] && board [south] == color) { queue [tail++] = south ; visited [south] = 1 ; } } /* return the result */ pargout [0] = mxCreateDoubleMatrix (1, tail, mxREAL) ; result = mxGetPr (pargout [0]) ; for (k = 0 ; k < tail ; k++) { result [k] = queue [k] + 1 ; } mxFree (visited) ; mxFree (queue) ; }Argh. Posting code in a comment is nasty. The “>” didn’t come through correctly. There needs to be a better method at posting code snippets in a comment in a blog. You can see the solution here:
http://www.cise.ufl.edu/~davis/puzzle.c
Tim,
I always forget how to add code in blocks correctly, and the above didn’t work for me either. But if you look at the instructions here:
http://blogs.mathworks.com/videos/2008/08/18/puzzler-find-largest-connected-island/
Those seem to work (See post number 17 in the current blog comments.)
Yi, Matt, thanks for your comment. Tim, I like your method very much. It is beautifull.