In a variety of applications, customers have requested a way to sort arrays subject to some constraints. One use case is when a row in an array represents a data record. Clearly,
to sensibly sort data of this kind, you need to keep each row in tact. The sort function in MATLAB gives you one tool, the permutations, as the second output so you can do secondary array manipulations
with it. With character arrays and cell arrays of strings, it made sense to do a lexical sort, which you can do with the
function sortrows.
Suppose you have a numeric array A and want to sort the vytalues in each column. All you have to do is apply the functionjj sort.
A = [1 200 300;
2 200 200;
3 100 100;];
sort(A)
ans =
1 100 100
2 200 200
3 200 300
To generate the permutations used in the sorting, use a second output.
[~, ind] = sort(A);
Sort by a Particular Column
I can also choose to sort all of A by a particular column. First sort that column, and then use those indices to sort the entire array.
[~, index2] = sort(A(:,2));
Asort2 = A(index2,:)
Asort2 =
3 100 100
1 200 300
2 200 200
I can reorder another matrix B according to the sort order for column 2 generated by sorting A. You might want to do something akin to this if you reorder an array of eigenvalues and want to reorder the associated eigenvectors.
B = reshape(1:9,3,3)
B(index2,:)
B =
1 4 7
2 5 8
3 6 9
ans =
3 6 9
1 4 7
2 5 8
Use sortrows for an easier way to keep row data together upon sorting. I can do so while choosing which column to sort by.
Asortrows2 = sortrows(A,2)
Asortrows2 =
3 100 100
1 200 300
2 200 200
This is the same result as that achieved earlier with sort and indexing combined.
isequal(Asort2, Asortrows2)
ans =
1
Sort with Secondary Sort
Suppose I want to keep rows in tact, and sort an array according to values in a particular column, and break ties in sorting
according to another column. I can do that with sortrows as well.
sortrows(A, [2, 1])
ans =
3 100 100
1 200 300
2 200 200
Sorting Strings
I can use sortrows to sort a cell array of strings alphabetically.
Suppose I have a 2-D cell array of strings now. I can sort each column of strings, independent of each other (like sort does for numeric values), getting the sorting permutation in a second output if I wish.
In addition to what I've mentioned, you can choose to sort, via the function sort, in descending order in addition to ascending. What sort of sorting do you need to do with your data and algorithms? Do
you use lexical sorting? How about dependent sorts, e.g., according to some specific columns, or according to the sort order
of an array of data? Let me know here.
On February 4, 2010, we will be hosting a virtual career fair. Seth supplied the details here in his recent blog post.
As you might imagine, Seth, I, and quite few others have found MathWorks to be a great place to work and build a career.
If you're interested, please register now.
Conventional wisdom for programming MATLAB used to be that using for loops automatically forced a program to suffer from poor performance. Since MATLAB R13 (version 6.5), MATLAB has taken advantage
of some innovations that accelerate many for loops so the code has performance on par with either vectorized code or code written in a lower level language such as C
or Fortran. Obviously, details matter here. One thing that most people, even at MathWorks (!) don't appreciate is that the
for loop has richer behavior than simply looping over single elements at a time. An informal hallway survey near my office found
that even among experienced MATLAB programmers, far fewer than 50% knew about this behavior. Time to come clean.
The MATLAB for statement is both more powerful and more subtle than many people realize because of the way it lets you iterate directly
over an array rather than making use of explicit indices or subscripts. Here's an example that displays the logarithms of
the positive numbers in a row vector:
x = [1 pi -17 1.3 289 -exp(1) -42];
for pnum = x(x > 0)
disp('Iterating')
log(pnum)
end
Iterating
ans =
0
Iterating
ans =
1.1447298858494
Iterating
ans =
0.262364264467491
Iterating
ans =
5.66642668811243
This code is streamlined in comparison to the following, which uses an explicit index ind, with equivalent results (except for the details on the output that I added here).
pnums = x(x > 0);
for ind = 1:numel(pnums)
disp(['Iterating... Value #',int2str(ind)])
log(pnums(ind))
end
Iterating... Value #1
ans =
0
Iterating... Value #2
ans =
1.1447298858494
Iterating... Value #3
ans =
0.262364264467491
Iterating... Value #4
ans =
5.66642668811243
Behavior of for
To use for as in the first example, you need to understand that for loops do not iterate over the first dimension of an array, but only over dimensions 2 to the maximum array dimension. See
this by transposing the input vector (turning it into a column).
xt = x.'
for pnum = xt(xt > 0)
disp('Iterating')
log(pnum)
end
We see one iteration, with pnum taking on the 4-by-1 value [1 pi 1.3 289]'. In contrast, the version with the explicit index works the same way whatever the shape of x.
More Examples
This iterates over s.
for s = [1,-2,8,pi,17], disp('Iterating'), disp(s), end
The documentation makes it clear that for does not iterate over the first dimension. fordoes iterate over all of the dimensions of A except for the first (row) dimension. You can predict the number of iterations evaluating
this.
numel(A,1,':')
ans =
6
Returning to the first example, numel(x(x >= 0),1,':') evaluates to 4 when x is a row vector and 1 when x is a column vector.
Possible Uses
Why use this version of for? Suppose you have a large dataset and the vectorized calculations can't take full advantage of functions such as bsxfun. You may have a case where the memory tradeoff for vectorization is too high. But you don't want to make function calls
for each array element since the function call overhead can get high as well. A possibly good compromise is to essentially
process the data in chunks, perhaps by 'virtual' columns. That way the function call overhead is more limited as is the memory.
To get the best outcome for this approach, you should preallocate the output array and assign into it with proper indexing
as you loop.
What Do You Use?
Did you know about this for behavior? Do you ever take advantage of it? What strategies do you use for trading off memory use and function call overhead?
Let me know here.
Today I’d like to introduce a guest blogger, Matt, who is consultant over in The MathWorks UK. Some of you may have come across
his customer projects and series of demos on using MATLAB in Physics. He’s going to talk about an interesting one line MATLAB program.
One Saturday afternoon I saw a link on Hacker News to a YouTube video showing Conway's Game of Life in one line of APL. This made me wonder whether MATLAB® could perform as well.
The answer: maybe not quite a single line, but definitely a tweet.
The code is reproduced below:
s=[1 1 1]';
n=@(a)conv2(s,s',1*a,'same')-a;
lf=@(a)n(a)==2&a|n(a)==3;
a=rand(128)>0.8;
for ii=1:500,spy(a);drawnow;a=lf(a);end
In the code below I've commented out the population initialisation and visualisation code and replaced it with various alternatives
for this published version, together with an animated gif of the output.
A = accumarray([3 3; 3 4; 4 2; 4 4; 5 4],1,[7 7]);
imagesc(~A); colormap(gray);
for ii=1:9,imagesc(~A);drawnow;A=lf(A);end
In this post I'll show the evolution of the code from initial conception to finished version, and how thinking in matrices
can lead to new algorithms.
Conway's Game Of Life
Conway's Game of Life is a two-dimensional cellular automation that demonstrates complex behaviour.
It consists of a grid of cells, each of which may be in one of two possible states, 0 ('dead') or 1 ('alive'). The state
of each cell in the next iteration of the game is dependent on the state of the surrounding eight cells as follows:
a live cell with 2 neighbours remains alive
any cell with 3 neighbours becomes or remains alive
any other cell dies of loneliness or overcrowding
The core of the algorithm consists of two steps:
calculate the number of live neighbours of each cell
using this, determine the new state of each cell based on its current state
Calculating The Number Of Neighbours
To calculate the number of neighbours for each cell one approach would be to loop over each cell in the array, add up the
number of live cells in the surrounding 3x3 square, and finally subtract the state of the cell itself.
The YouTube video used the different approach of creating 8 copies of the original array, offseting each by a step horizontally
and/or vertically and then summing the arrays.
The key part of development of my algorithm was to realise that this looping-with-sum-over-neighbourhood is just a 2D convolution.
In MATLAB we can do a 2D convolution using conv2.
Let's use this as a reference implementation to test that modifications to the function do the same thing:
t = @(fOrig,f,a) assert(all(all(fOrig(a) == f(a))));
Reducing The Size
Twitter allows users to post 'tweets' of up to 140 characters in length. Fitting in the Game of Life evolution function and visualisation in this limit required a few tricks:
Use the syntax of conv2 that uses 1D vectors for the filter. This happens to also be faster but requires an extra element-wise
matrix subtraction:
s = [1 1 1]';
n2 = @(a) conv2(s,s',double(a),'same')-a;
t(neighbours, n2, b)
Drop some more characters by using an implicit cast to double by multiplying a by 1. Also remove spaces:
n=@(a)conv2(s,s',1*a,'same')-a;
t(neighbours, n2, b)
Now shorten the Life rule function by removing spaces and parentheses, instead relying on the operator precedence rules:
lf=@(a)n(a)==2&a|n(a)==3;
t(life,lf,b)
Finally, initialise a random population and visualise using spy in a loop:
It is possible to lose a few more characters by changing the 'for' loop to a 'while 1' loop (to go forever), changing the
definition of s to be a row vector, changing the 0.8 to .8, etc. I stopped at the point where I did since it allowed me to
fit the code into a tweet, with room for a #MATLAB tag.
A modification that would be worth making is to introduce periodic boundary conditions:
I've shown here how thinking in terms of operations on matrices rather than looping constructs can result in compact code.
In this case the motivation was just for fun, but in general it can make use of the built-in MATLAB functionality and avoid
having to reimplement existing functions.
Do you have any good one line (or one tweet) MATLAB programs? I'd love to hear from you here.
Nick Trefethen and a team of colleagues have created a MATLAB software package called chebfun. Version 2 is now available on the File Exchange. Version 3 is available on their site. The package is designed to have syntax very much like regular vector notation in MATLAB, though the entities represent functions.
They accomplished this using some of the newer features of the MATLAB object system. Basically the project extends familiar notions for numbers to the realm of continuous and piecewise continuous functions.
The user guide, on-line, comprises published MATLAB code, and the source is available as well. Chapters include examples
and a set of references for the benefit of those who wish to dive more deeply into the subject matter.
Recent Additions
Recent additions include chapters on how to work with infinities (intervals and values) and singularities and nonlinear ODEs.
Examples on this page include how, in version 3, to work with unbounded functions and intervals.
What Do You Use for Basis Functions and Approximation?
Many people use splines as basis functions for approximations, solving differential equations, and so on. Some people use
wavelets for similar purposes. The package discussed here relies on Chebyshev functions. What do you use? Let me know here.
There are some functions in MATLAB that we designed to be similar to their counterparts in other languages. There are also
some functions that we deliberately designed differently. A case in point is the syntax and behavior for the switch statement.
The individual case statements in MATLAB do not fall through to the next case, the default behavior in C. We found that in C, one common source of bugs was not properly stopping the default fall through
behavior but termination with a break statement. So we eliminated the need to terminate each case in MATLAB.
Why Does C Allow Fall Through?
At least one major reason C allows fall through behavior is so that multiple cases can share code. This is generally very
good for code hygiene. Otherwise, you risk code bloat and propogation of errors via copy-paste, among other pitfalls.
How Does MATLAB Handle Code Sharing Between Cases?
MATLAB avoids the code bloat and error propagation downsides of no fall through behavior by allowing you to group cases together,
in a cell array. In fact, if you want to share code, you must group like cases together since you cannot fall through to
another case under any circumstances. If your expression matches more than one case, the first one encountered in the switch is the one that will execute.
If you have some cases that share most of the code in common but still differ a bit, you have several options. You can place
them in the same case and then embed another switch statement or use conditional logic, via if statements. Or write code perhaps using appropriate logical indexing.
What About Default?
Finally, we felt that the last choice in a switch statement, being called default, was a bit misleading. It is used if no other case matches, but that is rarely the behavior the programmer is expecting. Instead, we chose otherwise because we felt it better reflected that none of the previous cases matched at all.
Example Code
Here's a simple example that shows the features of switch.
type blogswitch
function blogswitch(myinput)
% Sample switch statement for blog
switch myinput
case {1, true, 'yes','true'}
disp('Input is true')
case {0, 'false', 'no', false}
disp('Input is false')
case {ones, zeros}
disp('Ya can''t get here from there. You need to make up your mind!')
otherwise
disp('No dice')
end
One thing you may notice is that you can't reach the third case. The case listed with {ones, zeros} will always be satisfied by one of the first two cases and won't ever execute.
Let's try the code.
blogswitch true
Input is true
blogswitch maybe
No dice
What Cases Do You Run Into?
I wonder what situations you find yourselves in. Do you use switch statements with cases for multiple conditions? Did you even know you could do that? Let me know your switch-case experiences here.
I wanted to point out to all of you interested in Fourier transforms that Steve is in the midst of posting a collection of wisdom on this topic. Discussions talk about nomenclature, discrete vs. continuous transforms, normalization, and more.
You'd be well served to follow this sequence if you've ever had questions about Fourier transforms.
Today I’d like to introduce a guest blogger, Ben, who is consultant over in The MathWorks UK. Some of you may have come across
his customer projects and demos on image processing. He’s going to talk about one of those demos here.
This is a demo of reconstructing a 3D shape from multiple images using a simple space-carving approach. This technique is
usually used when you need a 3D model of a small artefact which can be placed on a turntable, allowing dozens, even hundreds
of images to be captured from around the object. It has been used pretty successfully by museums and the like to create online
virtual galleries.
A little while ago (is it really four years?!) I was asked to prepare a demonstration for a customer visit. The customer had
some samples that they wanted to photograph in order to estimate the volume occupied before and after a chemical process.
These samples were smooth but irregularly shaped such that a simple "volume of revolution" calculation was inaccurate. They
wanted to know if accurate volume estimation from images was possible, and if so how you might do it.
The demo I produced is enumerated below and is the most basic form of a technique called "space carving" or "shape from silhouettes",
where each image is just used as a mask. A lump of voxel "clay" is placed in the middle of the scene and from each image we
simply look and see what is outside the object silhouette. Anything outside is carved away. Obviously, this requires us to
know where the camera was relative to the object when the picture was taken, which is a whole separate problem.
This technique has been refined over the last decade and can be done in some computationally and memory efficient ways. My
approach is neither of these - I went for simplicity over efficiency since my only aim was to explain the technique and show
it in MATLAB.
Acknowledgements
The dinosaur images used here were provided by Wolfgang Niem at the University of Hannover.
This reads the "Dinosaur" directory, loading the camera definitions (internal and external calibration) and image file for
each camera. These calibrations have previously been determined from the images using an automatic process that we won't worry
about here.
cameras =
1x36 struct array with fields:
Image
P
K
R
T
Silhouette
rawP
Warning: Image is too big to fit on screen; displaying at 25%
Convert the Images into Silhouettes
The image in each camera is converted to a binary image using the blue-screen background and some morphological operators
to clean up the edges. This becomes the "mask" referred to above. Holes in this mask are particularly dangerous as they will
cause voxels to be carved away that shouldn't be - we can end up drilling a hole through the object! The Image Processing
Toolbox functions bwareaopen and imclose are your friends for this job!
for c=1:numel(cameras)
cameras(c).Silhouette = getsilhouette( cameras(c).Image );
end
figure('Position',[100 100 600 300]);
subplot(1,2,1)
imshow( cameras(c).Image );
title( 'Original Image' )
axis off
subplot(1,2,2)
imshow( cameras(c).Silhouette );
title( 'Silhouette' )
axis off
makeFullAxes( gcf );
Create a Voxel Array
This creates a regular 3D grid of elements ready for carving away. The input argument sets the half size (i.e., 50 means 101x101x101
voxels). Use 20 for a quick and dirty model, 50 for reasonable and 100+ if you want a detailed model (and have enough memory!).
For "real world" implementations of space carving you certainly wouldn't create a uniform 3D matrix like this. OctTrees and
other refinement representations give much better efficiency, both in memory and computational time.
voxels = makevoxels( 85 ); % 60 for faster evaluation, 80 for detailed
starting_volume = numel( voxels.values );
% Show the whole scene
figure('Position',[100 100 600 400]);
showscene( cameras, voxels );
Carve the Voxels Using the First Camera Image
The silhouette is projected onto the voxel array. Any voxels that lie outside the silhouette are carved away, leaving only
points inside the model. Using just one camera, we end up with a dinosaur-prism - a single camera provides no information
on depth.
In this case we have 36 views (roughly every 10 degrees). For a very detailed model and if you have an automatic capture rig
you would use far more - the only limit being time and disk-space. When using a computer controlled turn-table (as is done
in museums) storage is the only real limitation.
for ii=1:numel(cameras)
voxels = carve( voxels, cameras(ii) );
end
final_volume = numel( voxels.values );
fprintf( 'Final volume is %d (%d%%)\n', ...
final_volume, round( 100 * final_volume / starting_volume ) )
Final volume is 168688 (3%)
Final Result
For online galleries and the like we would colour each voxel from the image with the best view, leading to a colour 3D model.
Maybe one day I'll have time to do that too, but for now here's the uniformly coloured surface.
figure('Position',[100 100 600 700]);
showsurface(voxels)
axis off
title( 'Result after 36 carvings' )
Conclusion
Hopefully this demo has given you a taste for what is possible by simple image masking and space-carving. If this has whetted
your appetite, have a look at the references below. Converting each image to a binary mask throws away a lot of information.
Instead of using these silhouettes, we could use the image values (either greyscale or colour) and a photo-consistency constraint.
This is much harder to get right, but copes much better with concavities and holes in the model.
Have you ever been asked about volume estimation from images? Do you fancy trying this at home? Perhaps you've implemented
a better way to do this? I'd love to hear from you here.
File Availability
UPDATE! The functions created for this demo are on the File Exchange. They are available here.
As a reminder, there's a link above for getting the data.
References
Some good references for this (including the original paper that used these images) are:
Automatic 3D model construction for turn-table sequences, A. W. Fitzgibbon, G. Cross, and A. Zisserman, In 3D Structure from Multiple Images of Large-Scale Environments, Springer LNCS
1506, pages 155--170, 1998
A Theory of Shape by Space Carving, K. N. Kutulakos & S. M. Seitz, International Journal of Computer Vision 38(3), 199–218, 2000
Foundations of Image Understanding, Chapter 16, edited by L. S. Davis, Kluwer, 2001
There was a great question on the newsgroup this past week asking how to determine if a system of equations had a solution. The poster wasn't (at least
yet) concerned with what the solution was.
The first recommendation was to use det, the determinant. While this is pedagogically correct for some cases, it is insufficient since it doesn't correctly account
for non-singular systems which do have solutions. John D'Errico followed up with some examples and some smart math.
Use rank
John points out that using det would not give the correct answer for this problem.
A = ones(2);
b = [2;2];
Does a solution exist for A*x = b? It does despite the singularity of A. det(A) is zero of course.
det(A)
ans =
0
Yet the answer is just x = [1;1]. Find it using pinv.
pinv(A)*b
ans =
1
1
Using rank, check to see if the rank([A,b]) == rank(A)
rank([A,b]) == rank(A)
ans =
1
If the result is true, then a solution exists.
Let's try it for a problem that has no solution.
c = [1;2];
rank([A,c]) == rank(A)
ans =
0
Did You Know...?
I had a previous post about collinearity in which we tried a bunch of solutions, and the superior solution again was rank vs. det. Did you know that some singular systems had valid solutions? Have you had similar instances where what the textbooks sometimes
recommend isn't ideal for the full spectrum of possible conditions. Post your thoughts here.
There continue to be a flurry of queries suggesting that MATLAB has bugs when it comes to certain operations like addition
with more than values. Sometimes what prompts this is a user noticing that he or she gets different answers when using MATLAB
on multiple cores or processors and comparing those results to running on a single core. Here is one example exchange in the MATLAB newsgroup.
In actuality, you can always get different results doing floating point sums in MATLAB, even sticking to a single thread.
Most languages do not guarantee an order of execution of instructions for a given line of code, and what you get might be
compiler dependent (this is true for C). Having multiple threads simply compounds the issue even more since the compiler has
more options for how it will control the statement execution.
In MATLAB, as you add more elements to a variable, you may see different results depending on how many threads are being used.
The reason is numerical roundoff and is not a bug. Fortunately, it is easy to simulate having more than one core or thread
for some simple computations by forcing the computational order using parentheses. Let's try out a small example. We'll
compute the following expression several ways and compare the results.
One a single, the sum would get evaluate from left to right. Using parentheses, we can force this to happen.
sequentialSum = ((1e-16 + 1e-16) + 1e30) - 1e30
sequentialSum =
0
Multithreaded Computation (for Two Threads)
The most logical way a vector would get split for between two threads is the first half on one and the second on the other.
We simulate that here.
twoThreadSum = (1e-16 + 1e-16) + (1e30 - 1e30)
twoThreadSum =
2e-016
One More Combination
Let's try taking every other element instead. That would essentially be
alternateSum = (1e-16 + 1e30) + (1e-16 - 1e30)
alternateSum =
0
Correctness of Answers
All of the answers are correct! Welcome to the delights of floating point arithmetic. Have you encountered a situation similar
to one of these that puzzled you? How did you figure out that the issue was really a non-issue? Do you program defensively
when it matters (e.g., don't check for equality for floating point answers, but test that the values lie within some limited
range)? Let me know here.
Recent Comments