Skip to Main Content Skip to Search
File Exchange
MATLAB Newsgroup
Link Exchange
  Blogs  
 Contest 
MathWorks.com

Loren on the Art of MATLAB

March 15th, 2007

Selecting Data and Surrounding Values

I recently was working with Dave Schoppik on extracting a subset of elements from his dataset. In his lab, they work to understand the ways our brains enable us to move. Here's Dave's description.

  We record the activity of single neurons in the cortex of monkeys
  while the monkeys move their eyes around.  Neural activity is
  represented by the binary vector, usually sampled at 1kHz -- a "1"
  indicates that the neuron was active, while a "0" indicates that the
  neuron was quiet (for our purposes, individual neurons are either 100%
  active or they are not).
  The eye movements, also sampled at 1kHz are the vector of doubles.
  When we want to know what sort of eye movements are associated with
  neural activity, the most efficient computation we can perform is to
  simply convolve the two vectors; the appropriate segment of the
  convolution tells us the sum of the eye movement vector associated
  with the "true" values in the neural vector. There are a number of
  analyses, though, that need all the segments of the eye movements that
  are associated with the neural activity.
  I should mention that there are between 10,000-100,000 "true" values
  in the array of neural activity, and that the vector of eye movements
  can run for 10-30 minutes, so we're talking about a whole lot of data.

So Dave has lots of data, and using repmat becomes expensive in both time and memory. Resorting to double for loops is a bit slow though.

Contents

Vectorizing Selection of Chunks with repmat

Suppose we want to select samples near some known locations, including the exact sample plus some surrounding values. Assuming the values we want are not too near the ends (so I don't have to do any error checking here!), we could achieve this using repmat.

wb = 2
wa = 2

binvec = logical([0 0 0 1 0 0 0 1 0 1 0 0 0])
doubvec = [1 3 5 9 8 7 1 2 0 3 9 5 0]
desired = [3 5 9 8 7 ; 7 1 2 0 3 ; 2 0 3 9 5]
wb =
     2
wa =
     2
binvec =
     0     0     0     1     0     0     0     1     0     1     0     0     0
doubvec =
     1     3     5     9     8     7     1     2     0     3     9     5     0
desired =
     3     5     9     8     7
     7     1     2     0     3
     2     0     3     9     5

An algorithm that works is to find the locations of the points of interest and add a base vector to that to get all the locations.

base = -wb:wa
idx = find(binvec == true)'
locations = repmat(base,length(idx),1) + repmat(idx,1,length(base))
base =
    -2    -1     0     1     2
idx =
     4
     8
    10
locations =
     2     3     4     5     6
     6     7     8     9    10
     8     9    10    11    12

Then just index into the array.

repout = doubvec(locations)
repout =
     3     5     9     8     7
     7     1     2     0     3
     2     0     3     9     5

Check that we got the right output.

isequal(desired, repout)
ans =
     1

Remember though, Dave said he may have 100,000 points of interest. So we need to create 2 intermediate arrays of dimensions (length(base), # of points), which can be sizable.

Vectorizing Selection of Chunks with bsxfun

Instead of creating these intermediate arrays, we can use one of the new features in R2007a, bsxfun. bsxfun stands for binary singleton expansion, and allows me to create an array from two other arrays, performing singleton expansion along dimensions where only one has non-scalar value. It does this by marching down the columns (even for N dimensions) so the output array is created in contiguous chunks.

careabout = bsxfun(@plus,base,idx)
bsxout = doubvec(careabout);
isequal(desired, bsxout)
isequal(locations,careabout)
careabout =
     2     3     4     5     6
     6     7     8     9    10
     8     9    10    11    12
ans =
     1
ans =
     1

Comparing 3 Methods: Preallocated for loop, fsxfun, and a MEX-file

Dave graciously compared code written in a C MEX-file with bsxfun. You can try out the code with some of his data by downloading files here, including data, C code for the MEX-file (taken from colleague Katherine Nagel before it was fully polished for critiquing). Timings were done in R2007a on a G5 desktop (2x2.5GHz, 4GB RAM).

wb = 1000;
wa = 1000;
fortime = 13.5
bsxtime = 5.67
mextime = 0.99
fortime =
   13.5000
bsxtime =
    5.6700
mextime =
    0.9900

Using bsxfun, the speed improvement over a for loop with preallocated output is a factor of 2.4. You can get even more improvement, at least for the simple sum we are doing here, by using a MEX-file. Looking at the C code, I see that the entire output is calculated inside a double loop, using C's + operator. The C code doesn't make a call to a general function, unlike bsxfun, and, in fact, doesn't make a direct library call at all. As mentioned above, bsxfun marches over the columns, calling the specified function for each column. In being general, bsxfun does this for any function you might choose that's binary, including an M-file you might have written or using an overloaded method for an operator, if you've done that. bsxfun trades off speed for this extra flexibility. Now, with bsxfun in MATLAB, there's an option where memory doesn't have to be one of the trade-offs you consider.

More Brain Background Information

From Dave again:

  There's a nice illustration of what the Lisberger lab thinks about
  here: http://www.schoppik.com/data/articles/jneurosci2006.html, which
  I point you to for your enjoyment, as well as because the MEX code I'd
  mentioned that we use to solve the problem was written by the woman
  who drew the illustration.

Other Uses for bsxfun?

Does anyone else yet have an interesting story to tell about using the new function bsxfun? Tell us here.


Get the MATLAB code

Published with MATLAB® 7.4

Leave a Reply


Loren Shure works on design of the MATLAB language at The MathWorks. She writes here about once a week on MATLAB programming and related topics.

  • Ulla Vainio: That error bar width adjustment was extremely useful and I would never have figured it out myself....
  • Peter Perkins: Jessee, there is a property that you can use to tag variables with units. For example, >> load...
  • Jessee: I could potentially see myself using dataset for casually looking at data, but from an application standpoint...
  • Loren: Oktay- It very much depends on the details of the calculations you are doing. Vectorization can sometimes...
  • Oktay: Hello, Is there any significant difference between using: - Vectorization inside a subfunction - Benefiting...
  • Loren: Clare- Yes, sum can sum a double vector: x = [.3 .4 pi/3] y = sum(x) x = 0.3 0.4 1.0472 y = 1.7472 You must...
  • Clare J: R2007a - Student Version When I use sum to sum a vector of type double I get this error message: ???...
  • Sarah Zaranek: Hi Jacob, Sorry about the slow response. You are correct that the code would be slower without the...
  • Navaneethan Santhanam: Thanks a lot, Loren! That worked perfectly.
  • Mike N: Should it be OK to use “persistent 221; variables in a deployed application? What if I have two...

These postings are the author's and don't necessarily represent the opinions of The MathWorks.

Related Topics