{"id":84,"date":"2007-03-15T14:36:27","date_gmt":"2007-03-15T19:36:27","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2007\/03\/15\/selecting-data-and-surrounding-values\/"},"modified":"2016-08-02T12:58:24","modified_gmt":"2016-08-02T17:58:24","slug":"selecting-data-and-surrounding-values","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2007\/03\/15\/selecting-data-and-surrounding-values\/","title":{"rendered":"Selecting Data and Surrounding Values"},"content":{"rendered":"<div class=\"content\">\n<p>I recently was working with Dave Schoppik on extracting a subset of elements from his dataset. In his lab, they work to understand<br \/>\nthe ways our brains enable us to move. Here's Dave's description.<\/p>\n<pre>  We record the activity of single neurons in the cortex of monkeys\r\n  while the monkeys move their eyes around.  Neural activity is\r\n  represented by a binary vector, usually sampled at 1kHz -- a \"1\"\r\n  indicates that the neuron was active, while a \"0\" indicates that the\r\n  neuron was quiet (for our purposes, individual neurons are either 100%\r\n  active or they are not).<\/pre>\n<pre>  The eye movements, also sampled at 1kHz, are a vector of doubles.\r\n  When we want to know what sort of eye movements are associated with\r\n  neural activity, the most efficient computation we can perform is to\r\n  simply convolve the two vectors; the appropriate segment of the\r\n  convolution tells us the sum of the eye movement vector associated\r\n  with the \"true\" values in the neural vector. 
There are a number of\r\n  analyses, though, that need all the segments of the eye movements that\r\n  are associated with the neural activity.<\/pre>\n<pre>  I should mention that there are between 10,000 and 100,000 \"true\" values\r\n  in the array of neural activity, and that the vector of eye movements\r\n  can run for 10-30 minutes, so we're talking about a whole lot of data.<\/pre>\n<p>So Dave has lots of data, and using <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/repmat.html\"><tt>repmat<\/tt><\/a> becomes expensive in both time and memory. Resorting to double for loops is a bit slow, though.<\/p>\n<p>&nbsp;<\/p>\n<h3>Contents<\/h3>\n<div>\n<ul>\n<li><a href=\"#1\">Vectorizing Selection of Chunks with repmat<\/a><\/li>\n<li><a href=\"#6\">Vectorizing Selection of Chunks with bsxfun<\/a><\/li>\n<li><a href=\"#7\">Comparing 3 Methods: Preallocated for loop, bsxfun, and a MEX-file<\/a><\/li>\n<li><a href=\"#9\">More Brain Background Information<\/a><\/li>\n<li><a href=\"#10\">Other Uses for bsxfun?<\/a><\/li>\n<\/ul>\n<\/div>\n<h3>Vectorizing Selection of Chunks with repmat<a name=\"1\"><\/a><\/h3>\n<p>Suppose we want to select samples near some known locations, including the exact sample plus some surrounding values. 
Assuming<br \/>\nthe values we want are not too near the ends (so I don't have to do any error checking here!), we could achieve this using<br \/>\n<tt>repmat<\/tt>.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">wb = 2\r\nwa = 2\r\n\r\nbinvec = logical([0 0 0 1 0 0 0 1 0 1 0 0 0])\r\ndoubvec = [1 3 5 9 8 7 1 2 0 3 9 5 0]\r\ndesired = [3 5 9 8 7 ; 7 1 2 0 3 ; 2 0 3 9 5]<\/pre>\n<pre style=\"font-style: oblique;\">wb =\r\n     2\r\nwa =\r\n     2\r\nbinvec =\r\n     0     0     0     1     0     0     0     1     0     1     0     0     0\r\ndoubvec =\r\n     1     3     5     9     8     7     1     2     0     3     9     5     0\r\ndesired =\r\n     3     5     9     8     7\r\n     7     1     2     0     3\r\n     2     0     3     9     5\r\n<\/pre>\n<p>An algorithm that works is to find the locations of the points of interest and add a base vector to that to get all the locations.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">base = -wb:wa\r\nidx = find(binvec == true)'\r\nlocations = repmat(base,length(idx),1) + repmat(idx,1,length(base))<\/pre>\n<pre style=\"font-style: oblique;\">base =\r\n    -2    -1     0     1     2\r\nidx =\r\n     4\r\n     8\r\n    10\r\nlocations =\r\n     2     3     4     5     6\r\n     6     7     8     9    10\r\n     8     9    10    11    12\r\n<\/pre>\n<p>Then just index into the array.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">repout = doubvec(locations)<\/pre>\n<pre style=\"font-style: oblique;\">repout =\r\n     3     5     9     8     7\r\n     7     1     2     0     3\r\n     2     0     3     9     5\r\n<\/pre>\n<p>Check that we got the right output.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">isequal(desired, repout)<\/pre>\n<pre style=\"font-style: oblique;\">ans =\r\n     1\r\n<\/pre>\n<p>Remember though, Dave said he may have 100,000 points of interest. 
So we need to create 2 intermediate arrays of dimensions<br \/>\n<tt>(length(base), # of points)<\/tt>, which can be sizable.<\/p>\n<h3>Vectorizing Selection of Chunks with bsxfun<a name=\"6\"><\/a><\/h3>\n<p>Instead of creating these intermediate arrays, we can use one of the new features in R2007a, <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/bsxfun.html\"><tt>bsxfun<\/tt><\/a>. <tt>bsxfun<\/tt> stands for binary singleton expansion, and allows me to create an array from two other arrays, performing singleton expansion<br \/>\nalong dimensions where only one has a non-scalar value. It does this by marching down the columns (even for N dimensions) so<br \/>\nthe output array is created in contiguous chunks.<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">careabout = bsxfun(@plus,base,idx)\r\nbsxout = doubvec(careabout);\r\nisequal(desired, bsxout)\r\nisequal(locations,careabout)<\/pre>\n<pre style=\"font-style: oblique;\">careabout =\r\n     2     3     4     5     6\r\n     6     7     8     9    10\r\n     8     9    10    11    12\r\nans =\r\n     1\r\nans =\r\n     1\r\n<\/pre>\n<h3>Comparing 3 Methods: Preallocated for loop, bsxfun, and a MEX-file<a name=\"7\"><\/a><\/h3>\n<p>Dave graciously compared code written in a C MEX-file with <tt>bsxfun<\/tt>. 
You can try out the code with some of his data by downloading files <a href=\"http:\/\/www.keck.ucsf.edu\/~schoppik\/lorendata\/\">here<\/a>, including data, C code for the MEX-file (taken from colleague Katherine Nagel before it was fully polished for critiquing).<br \/>\nTimings were done in R2007a on a G5 desktop (2x2.5GHz, 4GB RAM).<\/p>\n<pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid #c8c8c8;\">wb = 1000;\r\nwa = 1000;\r\nfortime = 13.5\r\nbsxtime = 5.67\r\nmextime = 0.99<\/pre>\n<pre style=\"font-style: oblique;\">fortime =\r\n   13.5000\r\nbsxtime =\r\n    5.6700\r\nmextime =\r\n    0.9900\r\n<\/pre>\n<p>Using <tt>bsxfun<\/tt>, the speed improvement over a for loop with preallocated output is a factor of 2.4. You can get even more improvement, at<br \/>\nleast for the simple sum we are doing here, by using a MEX-file. Looking at the C code, I see that the entire output is calculated<br \/>\ninside a double loop, using C's <tt>+<\/tt> operator. The C code doesn't make a call to a general function, unlike <tt>bsxfun<\/tt>, and, in fact, doesn't make a direct library call at all. As mentioned above, <tt>bsxfun<\/tt> marches over the columns, calling the specified function for each column. In being general, <tt>bsxfun<\/tt> does this for any function you might choose that's binary, including an M-file you might have written or using an overloaded<br \/>\nmethod for an operator, if you've done that. <tt>bsxfun<\/tt> trades off speed for this extra flexibility. 
Now, with <tt>bsxfun<\/tt> in MATLAB, there's an option where memory doesn't have to be one of the trade-offs you consider.<\/p>\n<h3>More Brain Background Information<a name=\"9\"><\/a><\/h3>\n<p>From Dave again:<\/p>\n<pre>  There's a nice illustration of what the Lisberger lab thinks about\r\n  here: http:\/\/www.schoppik.com\/data\/articles\/jneurosci2006.html, which\r\n  I point you to for your enjoyment, as well as because the MEX code I'd\r\n  mentioned that we use to solve the problem was written by the woman\r\n  who drew the illustration.<\/pre>\n<h3>Other Uses for bsxfun?<a name=\"10\"><\/a><\/h3>\n<p>Does anyone else yet have an interesting story to tell about using the new function <tt>bsxfun<\/tt>? Tell us <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=84\">here<\/a>.<\/p>\n<p><script>\/\/ <![CDATA[\nfunction grabCode_fc2bb66e52f24138be24136a6ed77cbd() {\n        \/\/ Remember the title so we can use it in the new page\n        title = document.title;\n\n        \/\/ Break up these strings so that their presence\n        \/\/ in the Javascript doesn't mess up the search for\n        \/\/ the MATLAB code.\n        t1='fc2bb66e52f24138be24136a6ed77cbd ' + '##### ' + 'SOURCE BEGIN' + ' #####';\n        t2='##### ' + 'SOURCE END' + ' #####' + ' fc2bb66e52f24138be24136a6ed77cbd';\n    \n        b=document.getElementsByTagName('body')[0];\n        i1=b.innerHTML.indexOf(t1)+t1.length;\n        i2=b.innerHTML.indexOf(t2);\n \n        code_string = b.innerHTML.substring(i1, i2);\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\n\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \n        \/\/ in the XML parser.\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\n        \/\/ doesn't go ahead and substitute the less-than character. 
\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\n\n        author = 'Loren Shure';\n        copyright = 'Copyright 2007 The MathWorks, Inc.';\n\n        w = window.open();\n        d = w.document;\n        d.write('\n\n\n\n\n\n<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\n\n\n\n\n\n\n\\n');\n      \n      d.title = title + ' (MATLAB code)';\n      d.close();\n      }\n\/\/ ]]><\/script><\/p>\n<p style=\"text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;\"><a><span style=\"font-size: x-small; font-style: italic;\">Get<br \/>\nthe MATLAB code<br \/>\n<noscript>(requires JavaScript)<\/noscript><\/span><\/a><\/p>\n<p>Published with MATLAB\u00ae 7.4<\/p>\n<\/div>\n<p><!--\nfc2bb66e52f24138be24136a6ed77cbd ##### SOURCE BEGIN #####\n%% Selecting Data and Surrounding Values\n% I recently was working with Dave Schoppik on extracting a subset of\n% elements from his dataset.  In his lab, they work to understand the ways\n% our brains enable us to move.  Here's Dave's description.\n%\n%    We record the activity of single neurons in the cortex of monkeys\n%    while the monkeys move their eyes around.  
Neural activity is\n%    represented by a binary vector, usually sampled at 1kHz REPLACE_WITH_DASH_DASH a \"1\"\n%    indicates that the neuron was active, while a \"0\" indicates that the\n%    neuron was quiet (for our purposes, individual neurons are either 100%\n%    active or they are not).\n%\n%    The eye movements, also sampled at 1kHz, are a vector of doubles.\n%    When we want to know what sort of eye movements are associated with\n%    neural activity, the most efficient computation we can perform is to\n%    simply convolve the two vectors; the appropriate segment of the\n%    convolution tells us the sum of the eye movement vector associated\n%    with the \"true\" values in the neural vector. There are a number of\n%    analyses, though, that need all the segments of the eye movements that\n%    are associated with the neural activity.\n%\n%    I should mention that there are between 10,000 and 100,000 \"true\" values\n%    in the array of neural activity, and that the vector of eye movements\n%    can run for 10-30 minutes, so we're talking about a whole lot of data.\n%\n% So Dave has lots of data, and using\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/repmat.html |repmat|>\n% becomes expensive in both time and memory.  Resorting to\n% double for loops is a bit slow, though.\n\n%% Vectorizing Selection of Chunks with repmat\n% Suppose we want to select samples near some known locations, including\n% the exact sample plus some surrounding values.  
Assuming the values we\n% want are not too near the ends (so I don't have to do any error checking\n% here!), we could achieve this using |repmat|.\nwb = 2\nwa = 2\n\nbinvec = logical([0 0 0 1 0 0 0 1 0 1 0 0 0])\ndoubvec = [1 3 5 9 8 7 1 2 0 3 9 5 0]\ndesired = [3 5 9 8 7 ; 7 1 2 0 3 ; 2 0 3 9 5]\n%%\n% An algorithm that works is to find the locations of the points of\n% interest and add a base vector to that to get all the locations.\nbase = -wb:wa\nidx = find(binvec == true)'\nlocations = repmat(base,length(idx),1) + repmat(idx,1,length(base))\n%%\n% Then just index into the array.\nrepout = doubvec(locations)\n%%\n% Check that we got the right output.\nisequal(desired, repout)\n%%\n% Remember though, Dave said he may have 100,000 points of interest.  So we\n% need to create 2 intermediate arrays of dimensions |(length(base), # of points)|,\n% which can be sizable.\n%% Vectorizing Selection of Chunks with bsxfun\n% Instead of creating these intermediate arrays, we can use one of the\n% <https:\/\/www.mathworks.com\/access\/helpdesk\/help\/techdoc\/rn\/bq1zizq.html new features>\n% in R2007a,\n% <https:\/\/www.mathworks.com\/help\/matlab\/ref\/bsxfun.html |bsxfun|>.\n% |bsxfun| stands for binary singleton expansion, and allows me to create\n% an array from two other arrays, performing singleton expansion along\n% dimensions where only one has non-scalar value.  
It does this by marching\n% down the columns (even for N dimensions) so the output array is created\n% in contiguous chunks.\ncareabout = bsxfun(@plus,base,idx)\nbsxout = doubvec(careabout);\nisequal(desired, bsxout)\nisequal(locations,careabout)\n\n%% Comparing 3 Methods: Preallocated for loop, bsxfun, and a MEX-file\n% Dave graciously compared code written in a C MEX-file with |bsxfun|.\n% You can try out the code\n% with some of his data by downloading files\n% <http:\/\/www.keck.ucsf.edu\/~schoppik\/lorendata\/ here>, including data, C\n% code for the MEX-file (taken from colleague Katherine Nagel before it was\n% fully polished for critiquing).  Timings were done in R2007a on a G5 desktop\n% (2x2.5GHz, 4GB RAM).\nwb = 1000;\nwa = 1000;\nfortime = 13.5\nbsxtime = 5.67\nmextime = 0.99\n%%\n% Using |bsxfun|, the speed improvement over a for loop with preallocated\n% output is a factor of 2.4.  You can get even more improvement, at least\n% for the simple sum we are doing here, by using a MEX-file.  Looking at\n% the C code, I see that the entire output is calculated inside a double\n% loop, using C's |+| operator.  The C code doesn't make a call to a\n% general function, unlike |bsxfun|, and, in fact, doesn't make a direct\n% library call at all.  As mentioned above, |bsxfun| marches over the\n% columns, calling the specified function for each column.  In being\n% general, |bsxfun| does this for any function you might choose that's\n% binary, including an M-file you might have written or using an overloaded\n% method for an operator, if you've done that.  |bsxfun| trades off speed\n% for this extra flexibility.  
Now, with |bsxfun|\n% in MATLAB, there's an option where memory doesn't have to be one of the\n% trade-offs you consider.\n%% More Brain Background Information\n% From Dave again:\n%\n%    There's a nice illustration of what the Lisberger lab thinks about\n%    here: http:\/\/www.schoppik.com\/data\/articles\/jneurosci2006.html, which\n%    I point you to for your enjoyment, as well as because the MEX code I'd\n%    mentioned that we use to solve the problem was written by the woman\n%    who drew the illustration.\n%% Other Uses for bsxfun?\n% Does anyone else yet have an interesting story to tell about using the\n% new function |bsxfun|?  Tell us <https:\/\/blogs.mathworks.com\/loren\/?p=84 here>.\n\n##### SOURCE END ##### fc2bb66e52f24138be24136a6ed77cbd\n--><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\nI recently was working with Dave Schoppik on extracting a subset of elements from his dataset. In his lab, they work to understand<br \/>\nthe ways our brains enable us to move. Here's Dave's description.<br \/>\n ... 
<a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2007\/03\/15\/selecting-data-and-surrounding-values\/\">read more >><\/a><\/p>\n","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[10,7,6],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/84"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=84"}],"version-history":[{"count":1,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/84\/revisions"}],"predecessor-version":[{"id":1884,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/84\/revisions\/1884"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=84"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=84"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=84"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}