{"id":185,"date":"2009-05-20T15:31:33","date_gmt":"2009-05-20T15:31:33","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/2009\/05\/20\/from-struct-to-dataset\/"},"modified":"2009-05-21T10:46:39","modified_gmt":"2009-05-21T10:46:39","slug":"from-struct-to-dataset","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2009\/05\/20\/from-struct-to-dataset\/","title":{"rendered":"From struct to dataset"},"content":{"rendered":"<div xmlns:mwsh=\"https:\/\/www.mathworks.com\/namespace\/mcode\/v1\/syntaxhighlight.dtd\" class=\"content\">\r\n   <introduction>\r\n      <p>When I got to work last Friday, I saw an email discussion, on behalf of a customer, trying to find a good way to add a new\r\n         field to a <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/struct.html\"><tt>struct<\/tt><\/a> array.  So this post will start with that problem, and then show a different way to collect the same information, in a <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009a\/toolbox\/stats\/datasetclass.html\"><tt>dataset<\/tt><\/a> array.\r\n      <\/p>\r\n   <\/introduction>\r\n   <h3>Contents<\/h3>\r\n   <div>\r\n      <ul>\r\n         <li><a href=\"#1\">Initial struct and New Data<\/a><\/li>\r\n         <li><a href=\"#4\">First Pass - for loop<\/a><\/li>\r\n         <li><a href=\"#5\">Second Pass - arrayfun<\/a><\/li>\r\n         <li><a href=\"#6\">Third Pass - deal<\/a><\/li>\r\n         <li><a href=\"#7\">Fourth Pass - Comma-separated List<\/a><\/li>\r\n         <li><a href=\"#8\">Same Results?<\/a><\/li>\r\n         <li><a href=\"#9\">What's the Data Look Like?<\/a><\/li>\r\n         <li><a href=\"#12\">Completely Different View<\/a><\/li>\r\n         <li><a href=\"#15\">Concatenate dataset Arrays<\/a><\/li>\r\n         <li><a href=\"#22\">How Do You Arrange Your Data?<\/a><\/li>\r\n      <\/ul>\r\n   <\/div>\r\n   <h3>Initial struct and New Data<a name=\"1\"><\/a><\/h3>\r\n   <p>Let's create some 
information to store in a <tt>struct<\/tt>.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">names = {<span style=\"color: #A020F0\">'John'<\/span>; <span style=\"color: #A020F0\">'Henri'<\/span>};\r\nages = {26; 18};\r\ninitS = struct(<span style=\"color: #A020F0\">'Name'<\/span>, names, <span style=\"color: #A020F0\">'Age'<\/span>, ages);<\/pre><p>Note that the <tt>ages<\/tt> data is a cell array.  In addition to <tt>Name<\/tt> and <tt>Age<\/tt>, I have <tt>Height<\/tt> information in a numeric, not cell, array.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">Heights = [168; 175];<\/pre><p>How do I add this information to my <tt>struct<\/tt>?  What follows is a series of possibilities, <i>definitely not exhaustive<\/i>!\r\n   <\/p>\r\n   <h3>First Pass - for loop<a name=\"4\"><\/a><\/h3>\r\n   <p>Let's start with a <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/for.html\"><tt>for<\/tt><\/a> loop. I add Height information to each element of the <tt>struct<\/tt> array, one at a time.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">S1 = initS;\r\n<span style=\"color: #0000FF\">for<\/span> index = 1:length(S1)\r\n    S1(index).Height = Heights(index);\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><h3>Second Pass - arrayfun<a name=\"5\"><\/a><\/h3>\r\n   <p>I can use <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/arrayfun.html\"><tt>arrayfun<\/tt><\/a> to remove the loop.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">S2 = initS;\r\nF = @(S,h) setfield(S, <span style=\"color: #A020F0\">'Height'<\/span>, h);\r\nS2 = arrayfun(F, S2, Heights);<\/pre><h3>Third Pass - deal<a name=\"6\"><\/a><\/h3>\r\n   <p>If the data were in a cell array, I could easily distribute it to multiple outputs.  
Here I store the height data in a cell\r\n      and <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/deal.html\"><tt>deal<\/tt><\/a> it out.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">S3 = initS;\r\ncH = num2cell(Heights);\r\n[S3.Height] = deal(cH{:});<\/pre><h3>Fourth Pass - Comma-separated List<a name=\"7\"><\/a><\/h3>\r\n   <p>If the data is in a cell array already, I can skip the step with <tt>deal<\/tt> and just dish out different cells to different outputs.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">S4 = initS;\r\ncH = num2cell(Heights);\r\n[S4.Height] = cH{:};<\/pre><h3>Same Results?<a name=\"8\"><\/a><\/h3>\r\n   <p>Let's quickly check that we get the same results with each technique.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">allsame = isequal(S1,S2,S3,S4)<\/pre><pre style=\"font-style:oblique\">allsame =\r\n     1\r\n<\/pre><h3>What's the Data Look Like?<a name=\"9\"><\/a><\/h3>\r\n   <p>It's hard to look at the data here (in, e.g., <tt>S1<\/tt>) because the contents of each <tt>struct<\/tt> element are completely at the user's disposal.  So I can look at one array element at a time.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">S1(1)<\/pre><pre style=\"font-style:oblique\">ans = \r\n      Name: 'John'\r\n       Age: 26\r\n    Height: 168\r\n<\/pre><p>Or I can look at all of the data in a single field at once.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">[S1.Age]<\/pre><pre style=\"font-style:oblique\">ans =\r\n    26    18\r\n<\/pre><p>But I don't get to see <b>all<\/b> of the data in one glance.\r\n   <\/p>\r\n   <h3>Completely Different View<a name=\"12\"><\/a><\/h3>\r\n   <p>And now for something completely different.  
I've <a href=\"https:\/\/blogs.mathworks.com\/loren\/2008\/07\/23\/using-matlab-to-grade\/\">blogged before<\/a> about <tt>dataset<\/tt> arrays from <a href=\"https:\/\/www.mathworks.com\/products\/statistics\/\">Statistics Toolbox<\/a>. Here's another instance where one might be useful.  I treat the columns like individual fields, and the rows as individual\r\n      records.  Each column contains data of a single datatype. Here's the data.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">names = {<span style=\"color: #A020F0\">'John'<\/span>; <span style=\"color: #A020F0\">'Henri'<\/span>}\r\nages = [26; 18];\r\nd1 = dataset({names, <span style=\"color: #A020F0\">'Name'<\/span>}, {ages, <span style=\"color: #A020F0\">'Age'<\/span>})<\/pre><pre style=\"font-style:oblique\">names = \r\n    'John'\r\n    'Henri'\r\nd1 = \r\n    Name           Age\r\n    'John'         26 \r\n    'Henri'        18 \r\n<\/pre><p>Two things to note here in contrast to using a <tt>struct<\/tt> to contain the information. First, the arguments appear in a different order in the two solutions.  Second, the numeric data\r\n      doesn't need to be placed in a cell array for the <tt>dataset<\/tt>, making the data management more natural, in my opinion.\r\n   <\/p>\r\n   <p>Let me make a new dataset with additional data, heights.<\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">d2 = dataset({names, <span style=\"color: #A020F0\">'Name'<\/span>}, {[168 ;175] <span style=\"color: #A020F0\">'Height'<\/span>})<\/pre><pre style=\"font-style:oblique\">d2 = \r\n    Name           Height\r\n    'John'         168   \r\n    'Henri'        175   \r\n<\/pre><h3>Concatenate dataset Arrays<a name=\"15\"><\/a><\/h3>\r\n   <p>Now let me collect the original dataset <tt>d1<\/tt> with the new information in <tt>d2<\/tt>.  Here are some ways to achieve this.  
First, just use square brackets (<tt>[]<\/tt>) as you would for regular array concatenation.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">dnew1 = [d1 d2]<\/pre><pre style=\"font-style:oblique\">dnew1 = \r\n    Name           Age    Height\r\n    'John'         26     168   \r\n    'Henri'        18     175   \r\n<\/pre><p>Another way to do this is to add the information in a <tt>struct<\/tt>-like way to the original <tt>dataset<\/tt>.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">dnew2 = d1;\r\ndnew2.Height = [168; 175]<\/pre><pre style=\"font-style:oblique\">dnew2 = \r\n    Name           Age    Height\r\n    'John'         26     168   \r\n    'Henri'        18     175   \r\n<\/pre><p>Now let's make a different <tt>dataset<\/tt> with new information, but with the order of the 2 entries swapped.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">d3 = dataset({{<span style=\"color: #A020F0\">'Henri'<\/span>; <span style=\"color: #A020F0\">'John'<\/span>}, <span style=\"color: #A020F0\">'Name'<\/span>}, {[175; 168] <span style=\"color: #A020F0\">'Height'<\/span>})<\/pre><pre style=\"font-style:oblique\">d3 = \r\n    Name           Height\r\n    'Henri'        175   \r\n    'John'         168   \r\n<\/pre><p>What happens if we try to collect <tt>d1<\/tt> and <tt>d3<\/tt> together into one <tt>dataset<\/tt>?\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\"><span style=\"color: #0000FF\">try<\/span>\r\n    dnew3 = [d1 d3];\r\n<span style=\"color: #0000FF\">catch<\/span> ExcDataset\r\n    disp(ExcDataset.message)\r\n<span style=\"color: #0000FF\">end<\/span><\/pre><pre style=\"font-style:oblique\">Duplicate variable names with distinct data.\r\n<\/pre><p>As you can see, I can't just collect them together via concatenation. 
However, I can combine or <a href=\"https:\/\/www.mathworks.com\/help\/releases\/R2009a\/toolbox\/stats\/dataset.join.html\"><tt>join<\/tt><\/a> the two datasets correctly.\r\n   <\/p><pre style=\"background: #F9F7F3; padding: 10px; border: 1px solid rgb(200,200,200)\">dnew3 = join(d1,d3,<span style=\"color: #A020F0\">'Name'<\/span>)<\/pre><pre style=\"font-style:oblique\">dnew3 = \r\n    Name           Age    Height\r\n    'John'         26     168   \r\n    'Henri'        18     175   \r\n<\/pre><p>Notice how easily I can see all the data at once here, compared to the <tt>struct<\/tt> array.\r\n   <\/p>\r\n   <h3>How Do You Arrange Your Data?<a name=\"22\"><\/a><\/h3>\r\n   <p>Do you use either of these strategies for arranging your data (<tt>struct<\/tt> or <tt>dataset<\/tt> arrays)?  Or do you do something different?  I'd love to hear your experiences <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=185#respond\">here<\/a>.\r\n   <\/p><script language=\"JavaScript\">\r\n<!--\r\n\r\n    function grabCode_0c01710e0eab4b4fa85c77f34e43b8b4() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='0c01710e0eab4b4fa85c77f34e43b8b4 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 0c01710e0eab4b4fa85c77f34e43b8b4';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        
\/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        author = 'Loren Shure';\r\n        copyright = 'Copyright 2009 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add author and copyright lines at the bottom if specified.\r\n        if ((author.length > 0) || (copyright.length > 0)) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (author.length > 0) {\r\n                d.writeln('% _' + author + '_');\r\n            }\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n      \r\n      d.title = title + ' (MATLAB code)';\r\n      d.close();\r\n      }   \r\n      \r\n-->\r\n<\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_0c01710e0eab4b4fa85c77f34e43b8b4()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n            the MATLAB code \r\n            <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; 7.8<br><\/p>\r\n<\/div>\r\n<!--\r\n0c01710e0eab4b4fa85c77f34e43b8b4 ##### SOURCE BEGIN #####\r\n%% From struct to dataset\r\n% When I got to work last Friday, I saw an email discussion, on behalf of a\r\n% customer, trying to find a good way to add a new field to a\r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/struct.html |struct|>\r\n% array.  
So this post will start with that problem, and then show a\r\n% different way to collect the same information, in a \r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009a\/toolbox\/stats\/datasetclass.html |dataset|>\r\n% array.\r\n%%  Initial struct and New Data\r\n% Let's create some information to store in a |struct|.\r\nnames = {'John'; 'Henri'};\r\nages = {26; 18};\r\ninitS = struct('Name', names, 'Age', ages);\r\n%%\r\n% Note that the |ages| data is a cell array.  In addition to |Name| and\r\n% |Age|, I have |Height| information in a numeric, not cell, array.\r\nHeights = [168; 175];\r\n%%\r\n% How do I add this information to my |struct|?  What follows is a series\r\n% of possibilities, _definitely not exhaustive_!\r\n%% First Pass - for loop\r\n% Let's start with a \r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/for.html |for|>\r\n% loop. I add Height information to each element of the |struct| array, one\r\n% at a time.\r\nS1 = initS;\r\nfor index = 1:length(S1)\r\n    S1(index).Height = Heights(index);\r\nend\r\n%% Second Pass - arrayfun\r\n% I can use\r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/arrayfun.html |arrayfun|>\r\n% to remove the loop.\r\nS2 = initS;\r\nF = @(S,h) setfield(S, 'Height', h);\r\nS2 = arrayfun(F, S2, Heights);\r\n%% Third Pass - deal\r\n% If the data were in a cell array, I could easily distribute it to\r\n% multiple outputs.  
Here I store the height data in a cell and\r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009a\/techdoc\/ref\/deal.html |deal|>\r\n% it out.\r\nS3 = initS;\r\ncH = num2cell(Heights);\r\n[S3.Height] = deal(cH{:});\r\n%% Fourth Pass - Comma-separated List\r\n% If the data is in a cell array already, I can skip the step with |deal|\r\n% and just dish out different cells to different outputs.\r\nS4 = initS;\r\ncH = num2cell(Heights);\r\n[S4.Height] = cH{:};\r\n%% Same Results?\r\n% Let's quickly check that we get the same results with each technique.\r\nallsame = isequal(S1,S2,S3,S4)\r\n%% What's the Data Look Like?\r\n% It's hard to look at the data here (in, e.g., |S1|) because the contents \r\n% of each |struct| element are completely at the user's disposal.  So I can\r\n% look at one array element at a time.\r\nS1(1)\r\n%%\r\n% Or I can look at all of the data in a single field at once.\r\n[S1.Age]\r\n%%\r\n% But I don't get to see *all* of the data in one glance.\r\n%% Completely Different View\r\n% And now for something completely different.  I've \r\n% <https:\/\/blogs.mathworks.com\/loren\/2008\/07\/23\/using-matlab-to-grade\/ blogged before>\r\n% about |dataset| arrays from \r\n% <https:\/\/www.mathworks.com\/products\/statistics\/ Statistics Toolbox>. \r\n% Here's another instance where one might be useful.  I treat the columns\r\n% like individual fields, and the rows as individual records.  Each column\r\n% contains data of a single datatype. Here's the data.\r\nnames = {'John'; 'Henri'}\r\nages = [26; 18];\r\nd1 = dataset({names, 'Name'}, {ages, 'Age'})\r\n%%\r\n% Two things to note here in contrast to using a |struct| to contain the\r\n% information. First, the arguments appear in a different order in the two\r\n% solutions.  
Second, the numeric data doesn't need to be placed in a \r\n% cell array for the |dataset|, making the data management more natural, in\r\n% my opinion.\r\n%%\r\n% Let me make a new dataset with additional data, heights.\r\nd2 = dataset({names, 'Name'}, {[168 ;175] 'Height'})\r\n%% Concatenate dataset Arrays\r\n% Now let me collect the original dataset |d1| with the new information in\r\n% |d2|.  Here are some ways to achieve this.  First, just use square \r\n% brackets (|[]|) as you would for regular array concatenation.\r\ndnew1 = [d1 d2]\r\n%%\r\n% Another way to do this is to add the information in a |struct|-like way\r\n% to the original |dataset|.\r\ndnew2 = d1;\r\ndnew2.Height = [168; 175]\r\n%%\r\n% Now let's make a different |dataset| with new information, but with the\r\n% order of the 2 entries swapped.\r\nd3 = dataset({{'Henri'; 'John'}, 'Name'}, {[175; 168] 'Height'})\r\n%% \r\n% What happens if we try to collect |d1| and |d3| together into one\r\n% |dataset|?\r\n%%\r\ntry\r\n    dnew3 = [d1 d3];\r\ncatch ExcDataset\r\n    disp(ExcDataset.message)\r\nend\r\n%%\r\n% As you can see, I can't just collect them together via concatenation.\r\n% However, I can combine or\r\n% <https:\/\/www.mathworks.com\/help\/releases\/R2009a\/toolbox\/stats\/dataset.join.html |join|>\r\n% the two datasets correctly.\r\ndnew3 = join(d1,d3,'Name')\r\n%%\r\n% Notice how easily I can see all the data at once here, compared to the\r\n% |struct| array.  \r\n%% How Do You Arrange Your Data?\r\n% Do you use either of these strategies for arranging your data (|struct|\r\n% or |dataset| arrays)?  Or do you do something different?  
I'd love to\r\n% hear your experiences \r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=185#respond here>.\r\n\r\n\r\n\r\n\r\n##### SOURCE END ##### 0c01710e0eab4b4fa85c77f34e43b8b4\r\n-->","protected":false},"excerpt":{"rendered":"<p>\r\n   \r\n      When I got to work last Friday, I saw an email discussion, on behalf of a customer, trying to find a good way to add a new\r\n         field to a struct array.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2009\/05\/20\/from-struct-to-dataset\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,5],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/185"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=185"}],"version-history":[{"count":0,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/185\/revisions"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}