{"id":875,"date":"2014-04-07T10:28:36","date_gmt":"2014-04-07T15:28:36","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=875"},"modified":"2017-04-04T08:11:07","modified_gmt":"2017-04-04T13:11:07","slug":"debunking-bad-news-analysis-with-matlab","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2014\/04\/07\/debunking-bad-news-analysis-with-matlab\/","title":{"rendered":"Debunking Bad News Analysis with MATLAB"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>With spring comes the tax filing deadline. This post is also about taxes. I'd like to introduce this week's guest blogger, Toshi Takeuchi. Toshi analyzes web data and runs online ad campaigns here at MathWorks.<\/p><p>Hi, I am Toshi. I am a big fan of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Nate_Silver\">Nate Silver<\/a> who made analyzing data very cool and mainstream. Because I analyze data a lot, it bugs me when I see questionable analyses passed around in the news media.<\/p><p>So when I saw this <a href=\"\">CNBC post on Google+<\/a>, my &#8220;bogus data analysis&#8221; radar started sending high alerts.<\/p><p>\r\n\r\n<a href=\"https:\/\/blogs.mathworks.com\/loren\/files\/Debunking_Bad_News_Analysis_with_MATLAB_map.png\"><img decoding=\"async\" loading=\"lazy\" width=\"300\" height=\"191\" src=\"https:\/\/blogs.mathworks.com\/loren\/files\/Debunking_Bad_News_Analysis_with_MATLAB_map-300x191.png\" alt=\"\" class=\"alignnone size-medium wp-image-2269\" \/><\/a>\r\n\r\n<\/p><p>This map shows you the ranking of states based on average tax amount, adjusted to the cost of living index. 
Let me try my hand at some data journalism here.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#b6f1d239-f122-417e-a645-a219d9fbfacf\">What&#8217;s wrong with that?<\/a><\/li><li><a href=\"#91a77bf7-3e0c-4379-a2bc-8d691d52a9e1\">Data analysis is easy with MATLAB<\/a><\/li><li><a href=\"#97aee445-356a-4697-9fd0-b55b56451dc3\">Import data<\/a><\/li><li><a href=\"#113c0732-a9c2-4393-bf85-1a2ecf8e4e9e\">Merge the tables<\/a><\/li><li><a href=\"#bee060d8-5bee-464a-bf8d-56d54f009406\">Compare two rankings - Top 20<\/a><\/li><li><a href=\"#1a8f077c-7f2a-434c-b942-a46aa9726653\">Prepare a new map<\/a><\/li><li><a href=\"#7699cda6-5a0b-44c3-867e-520a5f972844\">Plot the new map.<\/a><\/li><li><a href=\"#ecf7f256-4755-465c-9cff-411bf483b5ad\">Download the data<\/a><\/li><li><a href=\"#af028fcc-5d74-4fb2-9dab-84d42c47150c\">Use MATLAB to Fight the Noise<\/a><\/li><\/ul><\/div><h4>What&#8217;s wrong with that?<a name=\"b6f1d239-f122-417e-a645-a219d9fbfacf\"><\/a><\/h4><p>Well, I happen to think that the tax amount is correlated more directly with income than with cost of living. The average tax amount should be higher if you live in a state with high median income. Cost of living may also be higher in those states, but that's a secondary effect.<\/p><p>In order to understand the true picture, you actually need to think in terms of tax to income ratio instead. This is what you get when you use this metric.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/map.png\" alt=\"\"> <\/p><p>You can see that the color shifted in a number of states if you compare the first map and the second map. Massachusetts, where I live, actually looks pretty good, and so do states in the Mid-Atlantic region, even though they were red in the first map. 
On the other hand, the color flips in the other direction in some Gulf Coast states in the South.<\/p><p>If you believed the original analysis and moved from Massachusetts to one of those states, then your taxes may go down, but your income may go down even more. Not a good move, IMHO.<\/p><pre class=\"codeinput\"><span class=\"comment\">% Disclaimer: don't trust my analysis, either - I only did it for<\/span>\r\n<span class=\"comment\">% debunking the original story; it is not meant as a robust analysis.<\/span>\r\n<span class=\"comment\">% Please don't plan your relocation based on this analysis, just in case.<\/span>\r\n<\/pre><h4>Data analysis is easy with MATLAB<a name=\"91a77bf7-3e0c-4379-a2bc-8d691d52a9e1\"><\/a><\/h4><p>If you are interested in playing data journalist, this type of analysis is fairly easy with MATLAB.<\/p><p>All I had to do was download the median income dataset from the Census Bureau website and merge the two datasets using the newly introduced <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/tables.html\">table<\/a> (available in MATLAB since R2013b). I also used <a href=\"https:\/\/www.mathworks.com\/products\/mapping\/\">Mapping Toolbox<\/a> to visualize the data.<\/p><h4>Import data<a name=\"97aee445-356a-4697-9fd0-b55b56451dc3\"><\/a><\/h4><p>First, I went to the data sources to get the data. You can use Excel to import HTML tables into a spreadsheet directly. Census data is also available in Excel format. 
To match the time period with the original analysis, I used<\/p><pre>Historical (1984 to 2012):\r\nMedian Household Income by State - Single-Year Estimates [XLS - 98k]<\/pre><p>Data sources<\/p><div><ul><li><a href=\"http:\/\/wallethub.com\/edu\/best-worst-states-to-be-a-taxpayer\/2416\/\">WalletHub<\/a><\/li><li><a href=\"https:\/\/www2.census.gov\/programs-surveys\/cps\/tables\/time-series\/historical-income-households\/h08.xls\">Census Bureau<\/a><\/li><\/ul><\/div><pre class=\"codeinput\"><span class=\"comment\">% load data from files into tables<\/span>\r\nTax = readtable(<span class=\"string\">'bestWorstStateTax.csv'<\/span>);\r\nIncome = readtable(<span class=\"string\">'medianIncome.csv'<\/span>);\r\n\r\n<span class=\"comment\">% inspect the content<\/span>\r\ndisp <span class=\"string\">Tax<\/span>\r\ndisp(Tax(1:5,:))\r\ndisp <span class=\"string\">Income<\/span>\r\ndisp(Income(1:5,:))\r\n<\/pre><pre class=\"codeoutput\">Tax\r\n    Rank        State         AvgAnnualStateLocalTaxes\r\n    ____    ______________    ________________________\r\n    1       'Wyoming'         2365                    \r\n    2       'Alaska'          2791                    \r\n    3       'Nevada'          3370                    \r\n    4       'Florida'         3648                    \r\n    5       'South Dakota'    3766                    \r\n\r\n    PercentDiffFromNationalAvg    AdjRankCostOfLivingIdx\r\n    __________________________    ______________________\r\n    -0.66                         1                     \r\n    -0.66                         4                     \r\n    -0.52                         2                     \r\n    -0.48                         3                     \r\n    -0.46                         5                     \r\nIncome\r\n       State        MedianIncome    StandardError\r\n    ____________    ____________    _____________\r\n    'Alabama'       43464           2529.4       \r\n    'Alaska'        63648           2839.1       
\r\n    'Arizona'       47044           2921.7       \r\n    'Arkansas'      39018           2811.5       \r\n    'California'    57020           1237.5       \r\n<\/pre><h4>Merge the tables<a name=\"113c0732-a9c2-4393-bf85-1a2ecf8e4e9e\"><\/a><\/h4><p>Now we have two tables in the workspace, and you can also see that each column has a header and can contain a different data type. Both tables contain a column called \"State\" that holds the state names as text. We can use that column as the key to join the two tables. We don't need all the columns for this analysis, so I will join just the columns I need.<\/p><pre class=\"codeinput\"><span class=\"comment\">% |join| is smart - it automatically uses the shared \"State\" column as the key.<\/span>\r\n<span class=\"comment\">% Use only |State| and |AvgAnnualStateLocalTaxes| from Tax, and |State| and<\/span>\r\n<span class=\"comment\">% |MedianIncome| from Income.<\/span>\r\nT1 = join(Tax(:,2:3),Income(:,1:2));\r\n<span class=\"comment\">% rename columns<\/span>\r\nT1.Properties.VariableNames = {<span class=\"string\">'State'<\/span>,<span class=\"string\">'Tax'<\/span>,<span class=\"string\">'Income'<\/span>};\r\n\r\n<span class=\"comment\">% compute tax to income ratio<\/span>\r\nT1.Ratio = T1.Tax.\/T1.Income;\r\n<span class=\"comment\">% create a new table ranked by tax to income ratio<\/span>\r\nT2 = sortrows(T1,{<span class=\"string\">'Ratio'<\/span>});\r\n\r\n<span class=\"comment\">% inspect the new table<\/span>\r\ndisp <span class=\"string\">T2<\/span>\r\ndisp(T2(1:5,:))\r\n<\/pre><pre class=\"codeoutput\">T2\r\n        State         Tax     Income     Ratio  \r\n    ______________    ____    ______    ________\r\n    'Wyoming'         2365    57512     0.041122\r\n    'Alaska'          2791    63648     0.043851\r\n    'Washington'      3823    62187     0.061476\r\n    'Nevada'          3370    47333     0.071197\r\n    'South Dakota'    3766    49415     0.076212\r\n<\/pre><h4>Compare two rankings - Top 20<a 
name=\"bee060d8-5bee-464a-bf8d-56d54f009406\"><\/a><\/h4><p>Check whether the new metric produced any meaningful differences.<\/p><pre class=\"codeinput\">disp(<span class=\"string\">'Top 20'<\/span>)\r\ndisp(<span class=\"string\">'By Avg. Tax               By Avg. Ratio'<\/span>)\r\ndisp([T1.State(1:20) T2.State(1:20)])\r\n<\/pre><pre class=\"codeoutput\">Top 20\r\nBy Avg. Tax               By Avg. Ratio\r\n    'Wyoming'          'Wyoming'             \r\n    'Alaska'           'Alaska'              \r\n    'Nevada'           'Washington'          \r\n    'Florida'          'Nevada'              \r\n    'South Dakota'     'South Dakota'        \r\n    'Washington'       'Florida'             \r\n    'Texas'            'Colorado'            \r\n    'Delaware'         'Texas'               \r\n    'North Dakota'     'North Dakota'        \r\n    'Colorado'         'Utah'                \r\n    'New Mexico'       'Delaware'            \r\n    'Alabama'          'Massachusetts'       \r\n    'Arizona'          'New Hampshire'       \r\n    'Utah'             'Virginia'            \r\n    'Mississippi'      'Maryland'            \r\n    'Indiana'          'District of Columbia'\r\n    'Louisiana'        'Rhode Island'        \r\n    'West Virginia'    'Arizona'             \r\n    'Montana'          'Hawaii'              \r\n    'Oklahoma'         'New Jersey'          \r\n<\/pre><h4>Prepare a new map<a name=\"1a8f077c-7f2a-434c-b942-a46aa9726653\"><\/a><\/h4><p>Now we will start using the functions from Mapping Toolbox. First we will assemble the required pieces of data to prepare a map.<\/p><p>Note: I also used <a href=\"https:\/\/www.mathworks.com\/products\/bioinfo\/\">Bioinformatics Toolbox<\/a> function <a href=\"https:\/\/www.mathworks.com\/help\/bioinfo\/ref\/redgreencmap.html\">redgreencmap<\/a> to create a colormap that goes from green to red, mirroring the scheme in the original map. If you don't have this toolbox, you can easily create a custom map in MATLAB. 
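<\/p><p>As a rough sketch of that toolbox-free alternative (my own illustration, not the code used for the map in this post), the snippet below builds a 51-step colormap by blending linearly from green to red. The variable names are made up for the example, and the result will not match redgreencmap exactly, which, as far as I know, passes through black in the middle.<\/p><pre class=\"codeinput\"><span class=\"comment\">% Hypothetical substitute for redgreencmap: a linear blend from green to red.<\/span>\r\nn = 51;                          <span class=\"comment\">% one color per state, plus DC<\/span>\r\nt = linspace(0,1,n)';            <span class=\"comment\">% 0 at the first row, 1 at the last row<\/span>\r\ncolors = [t, 1-t, zeros(n,1)];   <span class=\"comment\">% columns are R, G, B; first row is green, last row is red<\/span>\r\n<\/pre><p>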
<a href=\"https:\/\/www.mathworks.com\/help\/matlab\/colors-1.html\">Colormaps<\/a> are arrays of RGB values (triplets) in the range from 0 to 1.<\/p><pre class=\"codeinput\"><span class=\"comment\">% get the US geography data as a structure array<\/span>\r\nstates = shaperead(<span class=\"string\">'usastatelo'<\/span>, <span class=\"string\">'UseGeoCoords'<\/span>, true);\r\n\r\n<span class=\"comment\">% Get the state names as a cell array of strings.<\/span>\r\nnames = {states.Name};\r\n\r\n<span class=\"comment\">% This vector stores the ranking of each state.<\/span>\r\nranking = zeros(length(names),1);\r\n<span class=\"keyword\">for<\/span> i=1:length(names)\r\n    ranking(i)=find(strcmpi(names(i),T2.State));\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">% Create a colormap that goes from green to red in 51 steps.<\/span>\r\ncolors = redgreencmap(length(ranking));\r\n\r\n<span class=\"comment\">% Sort colors by state ranking.<\/span>\r\nstateColors = colors(ranking,:);\r\n\r\n<span class=\"comment\">% Separate Hawaii and Alaska from the Continental US.<\/span>\r\nindexHawaii = strcmp(<span class=\"string\">'Hawaii'<\/span>,names);\r\nindexAlaska = strcmp(<span class=\"string\">'Alaska'<\/span>,names);\r\nindexConus = 1:numel(states);\r\nindexConus(indexHawaii|indexAlaska) = [];\r\n<\/pre><h4>Plot the new map.<a name=\"7699cda6-5a0b-44c3-867e-520a5f972844\"><\/a><\/h4><p>Now we are ready to draw the map.<\/p><pre class=\"codeinput\"><span class=\"comment\">% This creates a figure with axes of US geography.<\/span>\r\n<span class=\"comment\">% It contains three axes - Continental  US, Alaska and Hawaii.<\/span>\r\nfigure; ax = usamap(<span class=\"string\">'all'<\/span>);\r\n\r\n<span class=\"comment\">% We don't need the axes, so turn them off.<\/span>\r\nset(ax, <span class=\"string\">'Visible'<\/span>, <span class=\"string\">'off'<\/span>)\r\n\r\n<span class=\"comment\">% Draw the states with specified color within the 
Continental US.<\/span>\r\n<span class=\"keyword\">for<\/span> j = 1:length(indexConus)\r\n    geoshow(ax(1), states(indexConus(j)),<span class=\"string\">'FaceColor'<\/span>,stateColors(indexConus(j),:))\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">% Now do the same for Alaska and Hawaii.<\/span>\r\ngeoshow(ax(2), states(indexAlaska),<span class=\"string\">'FaceColor'<\/span>,stateColors(indexAlaska,:))\r\ngeoshow(ax(3), states(indexHawaii),<span class=\"string\">'FaceColor'<\/span>,stateColors(indexHawaii,:))\r\n\r\n<span class=\"comment\">% We don't need geographical details, so turn them off for each axes.<\/span>\r\n<span class=\"keyword\">for<\/span> k = 1:3\r\n    setm(ax(k), <span class=\"string\">'Frame'<\/span>, <span class=\"string\">'off'<\/span>, <span class=\"string\">'Grid'<\/span>, <span class=\"string\">'off'<\/span>,<span class=\"keyword\">...<\/span>\r\n      <span class=\"string\">'ParallelLabel'<\/span>, <span class=\"string\">'off'<\/span>, <span class=\"string\">'MeridianLabel'<\/span>, <span class=\"string\">'off'<\/span>)\r\n<span class=\"keyword\">end<\/span>\r\n\r\n<span class=\"comment\">% Add a colorbar.<\/span>\r\ncolormap(flipud(colors))\r\nc= colorbar(<span class=\"string\">'YTickLabel'<\/span>,<span class=\"keyword\">...<\/span>\r\n    {<span class=\"string\">'51'<\/span>,<span class=\"string\">'41'<\/span>,<span class=\"keyword\">...<\/span>\r\n     <span class=\"string\">'31'<\/span>,<span class=\"string\">'21'<\/span>,<span class=\"string\">'11'<\/span>,<span class=\"string\">'1'<\/span>});\r\nylabel(c,<span class=\"string\">'Ranking'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/bestWorstStateTax_01.png\" alt=\"\"> <h4>Download the data<a name=\"ecf7f256-4755-465c-9cff-411bf483b5ad\"><\/a><\/h4><p>You can download the data files from these links:<\/p><div><ul><li><a 
href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014TaxCode\/bestWorstStateTax.csv\">Get bestWorstStateTax.csv<\/a><\/li><li><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014TaxCode\/medianIncome.csv\">Get medianIncome.csv<\/a><\/li><\/ul><\/div><h4>Use MATLAB to Fight the Noise<a name=\"af028fcc-5d74-4fb2-9dab-84d42c47150c\"><\/a><\/h4><p>Did you enjoy seeing how to use MATLAB to debunk some bad news analysis? Would you like to try? Perhaps you already do this yourself. Tell us about it <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=875#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_0b195c05b1e04cd2927d8479c3e0e234() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='0b195c05b1e04cd2927d8479c3e0e234 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 0b195c05b1e04cd2927d8479c3e0e234';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2014 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_0b195c05b1e04cd2927d8479c3e0e234()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014a<br><\/p><\/div><!--\r\n0b195c05b1e04cd2927d8479c3e0e234 ##### SOURCE BEGIN #####\r\n%% Debunking Bad News Analysis with MATLAB\r\n% With spring comes the tax filing deadline. This post is also about\r\n% taxes. I'd like to introduce this week's guest blogger, Toshi Takeuchi. \r\n% Toshi analyzes web data and runs online ad campaigns here at MathWorks.\r\n% \r\n% Hi, I am Toshi. I am a big fan of <http:\/\/en.wikipedia.org\/wiki\/Nate_Silver \r\n% Nate Silver> who made analyzing data very cool and mainstream. Because I \r\n% analyze data a lot, it bugs me when I see questionable analyses passed\r\n% around in the news media.  \r\n% \r\n% So when I saw this \r\n% < \r\n% CNBC post on Google+>, my \u201cbogus data analysis\u201d radar started sending \r\n% high alerts. 
\r\n% \r\n% <html>\r\n% <div style=\"position:relative;width:556px;height:347px\">\r\n% <iframe loading=\"lazy\" \r\n% src=\"http:\/\/d2e70e9yced57e.cloudfront.net\/common\/wallethub\/best-worst-states-to-be-a-taxpayer.html\" \r\n% width=\"556\" height=\"347\" frameBorder=\"0\" scrolling=\"no\"><\/iframe>\r\n% <a \r\n% href=\"http:\/\/wallethub.com\/edu\/best-worst-states-to-be-a-taxpayer\/2416\/\" \r\n% style=\"position:absolute;top:319px;left:465px;\">\r\n% <img decoding=\"async\" loading=\"lazy\" \r\n% src=\"http:\/\/d2e70e9yced57e.cloudfront.net\/wallethub\/images\/blog\/wh-gcharts-logo_V6185_.png\" \r\n% width=\"76\" height=\"13\" alt=\"WalletHub\" \/>\r\n% <\/a>\r\n% <\/div>\r\n% <\/html>\r\n%\r\n% This map shows you the ranking of states based on average tax amount,\r\n% adjusted to the cost of living index. Let me try my hand at some data \r\n% journalism here. \r\n%\r\n%%% What\u2019s wrong with that?\r\n% Well, I happen to think that the tax amount is correlated more directly \r\n% with income than with cost of living. The average tax amount should be\r\n% higher if you live in a state with high median income. Cost of living may\r\n% also be higher in those states, but that's a secondary effect. \r\n% \r\n% In order to understand the true picture, you actually need to think in \r\n% terms of tax to income ratio instead. This is what you get when you use\r\n% this metric. \r\n% \r\n% <<map.png>>\r\n% \r\n% You can see that the color shifted in a number of states if you compare\r\n% the first map and the second map. Massachusetts, where I live, actually\r\n% looks pretty good, and so do states in the Mid-Atlantic region, even\r\n% though they were red in the first map. On the other hand, the color flips\r\n% in the other direction in some Gulf Coast states in the South.\r\n% \r\n% If you believed the original analysis and moved from Massachusetts to\r\n% one of those states, then your taxes may go down, but your income may\r\n% go down even more. 
Not a good move, IMHO.\r\n\r\n% Disclaimer: don't trust my analysis, either - I only did it for\r\n% debunking the original story; it is not meant as a robust analysis. \r\n% Please don't plan your relocation based on this analysis, just in case.\r\n\r\n%% Data analysis is easy with MATLAB\r\n% If you are interested in playing data journalist, this type of analysis\r\n% is fairly easy with MATLAB.\r\n% \r\n% All I had to do was download the median income dataset from the Census \r\n% Bureau website and merge the two datasets using the newly introduced \r\n% <https:\/\/www.mathworks.com\/help\/matlab\/tables.html table>\r\n% (available in MATLAB since R2013b). I also used \r\n% <https:\/\/www.mathworks.com\/products\/mapping\/ Mapping Toolbox> to\r\n% visualize the data. \r\n\r\n%% Import data\r\n% First, I went to the data sources to get the data. You can use Excel to\r\n% import HTML tables into a spreadsheet directly. Census data is also\r\n% available in Excel format. To match the time period with the original\r\n% analysis, I used \r\n% \r\n%  Historical (1984 to 2012):\r\n%  Median Household Income by State - Single-Year Estimates [XLS - 98k]\r\n% \r\n% Data sources\r\n% \r\n% * <http:\/\/wallethub.com\/edu\/best-worst-states-to-be-a-taxpayer\/2416\/ WalletHub>\r\n% * <http:\/\/www.census.gov\/hhes\/www\/income\/data\/statemedian\/ Census Bureau>\r\n\r\n% load data from files into tables\r\nTax = readtable('bestWorstStateTax.csv');\r\nIncome = readtable('medianIncome.csv');\r\n\r\n% inspect the content\r\ndisp Tax\r\ndisp(Tax(1:5,:))\r\ndisp Income\r\ndisp(Income(1:5,:))\r\n\r\n%% Merge the tables\r\n% Now we have two tables in the workspace, and you can also see that each\r\n% column has a header and can contain a different data type. Both tables\r\n% contain a column called \"State\" that holds the state names as text.\r\n% We can use that column as the key to join the two tables. 
We\r\n% don't need all the columns for this analysis, so I will join just the\r\n% columns I need. \r\n\r\n% |join| is smart - it automatically uses the shared \"State\" column as the key.\r\n% Use only |State| and |AvgAnnualStateLocalTaxes| from Tax, and |State| and\r\n% |MedianIncome| from Income.\r\nT1 = join(Tax(:,2:3),Income(:,1:2)); \r\n% rename columns\r\nT1.Properties.VariableNames = {'State','Tax','Income'};\r\n\r\n% compute tax to income ratio\r\nT1.Ratio = T1.Tax.\/T1.Income;\r\n% create a new table ranked by tax to income ratio\r\nT2 = sortrows(T1,{'Ratio'});\r\n\r\n% inspect the new table\r\ndisp T2\r\ndisp(T2(1:5,:))\r\n\r\n%% Compare two rankings - Top 20\r\n% Check whether the new metric produced any meaningful differences.\r\n\r\ndisp('Top 20')\r\ndisp('By Avg. Tax               By Avg. Ratio')\r\ndisp([T1.State(1:20) T2.State(1:20)])\r\n\r\n%% Prepare a new map\r\n% Now we will start using the functions from Mapping Toolbox. First we will\r\n% assemble the required pieces of data to prepare a map. \r\n%\r\n% Note: I also used <https:\/\/www.mathworks.com\/products\/bioinfo\/\r\n% Bioinformatics Toolbox> function\r\n% <https:\/\/www.mathworks.com\/help\/bioinfo\/ref\/redgreencmap.html\r\n% redgreencmap> to create a colormap that goes from green to red, mirroring\r\n% the scheme in the original map. 
If you don't have this toolbox, you can\r\n% easily create a custom map in MATLAB.\r\n% <https:\/\/www.mathworks.com\/help\/matlab\/colors-1.html Colormaps> are arrays\r\n% of RGB values (triplets) in the range from 0 to 1.\r\n\r\n% get the US geography data as a structure array\r\nstates = shaperead('usastatelo', 'UseGeoCoords', true);\r\n\r\n% Get the state names as a cell array of strings.\r\nnames = {states.Name};\r\n\r\n% This vector stores the ranking of each state.\r\nranking = zeros(length(names),1);\r\nfor i=1:length(names)\r\n    ranking(i)=find(strcmpi(names(i),T2.State));\r\nend\r\n\r\n% Create a colormap that goes from green to red in 51 steps.\r\ncolors = redgreencmap(length(ranking));\r\n\r\n% Sort colors by state ranking.\r\nstateColors = colors(ranking,:);\r\n\r\n% Separate Hawaii and Alaska from the Continental US.\r\nindexHawaii = strcmp('Hawaii',names);\r\nindexAlaska = strcmp('Alaska',names);\r\nindexConus = 1:numel(states);\r\nindexConus(indexHawaii|indexAlaska) = []; \r\n\r\n%% Plot the new map.\r\n% Now we are ready to draw the map. \r\n\r\n% This creates a figure with axes of US geography. \r\n% It contains three axes - Continental  US, Alaska and Hawaii. 
\r\nfigure; ax = usamap('all');\r\n\r\n% We don't need the axes, so turn them off.\r\nset(ax, 'Visible', 'off')\r\n\r\n% Draw the states with specified color within the Continental US.\r\nfor j = 1:length(indexConus)\r\n    geoshow(ax(1), states(indexConus(j)),'FaceColor',stateColors(indexConus(j),:))\r\nend\r\n\r\n% Now do the same for Alaska and Hawaii.\r\ngeoshow(ax(2), states(indexAlaska),'FaceColor',stateColors(indexAlaska,:))\r\ngeoshow(ax(3), states(indexHawaii),'FaceColor',stateColors(indexHawaii,:))\r\n\r\n% We don't need geographical details, so turn them off for each axes.\r\nfor k = 1:3\r\n    setm(ax(k), 'Frame', 'off', 'Grid', 'off',...\r\n      'ParallelLabel', 'off', 'MeridianLabel', 'off')\r\nend\r\n\r\n% Add a colorbar.\r\ncolormap(flipud(colors))\r\nc= colorbar('YTickLabel',...\r\n    {'51','41',...\r\n     '31','21','11','1'});\r\nylabel(c,'Ranking')\r\n\r\n%% Download the data\r\n% You can download the data files from these links:\r\n%\r\n% * <https:\/\/blogs.mathworks.com\/images\/loren\/2014TaxCode\/bestWorstStateTax.csv Get bestWorstStateTax.csv>\r\n% * <https:\/\/blogs.mathworks.com\/images\/loren\/2014TaxCode\/medianIncome.csv Get medianIncome.csv>\r\n\r\n\r\n%% Use MATLAB to Fight the Noise\r\n% Did you enjoy seeing how to use MATLAB to debunk some bad news\r\n% analysis? Would you like to try? Perhaps you already do this yourself.\r\n% Tell us about it <https:\/\/blogs.mathworks.com\/loren\/?p=875#respond here>.\r\n%\r\n\r\n##### SOURCE END ##### 0b195c05b1e04cd2927d8479c3e0e234\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/bestWorstStateTax_01.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>With spring comes the tax filing deadline. This post is also about taxes. I'd like to introduce this week's guest blogger, Toshi Takeuchi. 
Toshi analyzes web data and runs online ad campaigns here at MathWorks.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2014\/04\/07\/debunking-bad-news-analysis-with-matlab\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[62,33,6,61],"tags":[67],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/875"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=875"}],"version-history":[{"count":14,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/875\/revisions"}],"predecessor-version":[{"id":2275,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/875\/revisions\/2275"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=875"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=875"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=875"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}