{"id":3831,"date":"2020-09-16T15:59:53","date_gmt":"2020-09-16T19:59:53","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=3831"},"modified":"2020-09-16T16:00:02","modified_gmt":"2020-09-16T20:00:02","slug":"excess-mortality-analysis-for-the-usa","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2020\/09\/16\/excess-mortality-analysis-for-the-usa\/","title":{"rendered":"Excess Mortality Analysis for the USA"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Today's guest blogger is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1141841\">Jos Martin<\/a>, from the Parallel Computing team at MathWorks. Whilst normally focussing on making parallel computing both easy and fast in MATLAB, occasionally he likes to use MATLAB to explore other problem spaces. Here, he writes about using some CDC provided datasets in MATLAB to analyse and visualize excess death statistics across the US.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#7e5f7125-1a9c-4400-8f3f-052bbbef3b56\">Introduction<\/a><\/li><li><a href=\"#6cf78fb9-09d5-4611-96dc-ff9f6d37993e\">Getting the Data<\/a><\/li><li><a href=\"#685bcabc-3ae0-492c-9e6f-d7da95c72ad4\">Initial Data Scrubbing<\/a><\/li><li><a href=\"#271f371b-5a19-4d42-a78e-40b7971d86db\">What's Happening in Massachusetts?<\/a><\/li><li><a href=\"#e9aeaf6d-9458-4d7b-9791-745580a0434e\">How About the Whole of the USA?<\/a><\/li><li><a href=\"#9d776c22-0bf3-4960-89b3-3976a53bcd09\">How Many States are Exceeding their Upper Bound?<\/a><\/li><li><a href=\"#6dbf6306-f833-41c4-bfad-94c7928d69ba\">Which States are Exceeding their Upper Bound?<\/a><\/li><\/ul><\/div><h4>Introduction<a name=\"7e5f7125-1a9c-4400-8f3f-052bbbef3b56\"><\/a><\/h4><p>I was curious to see data on how different US states were dealing with the pandemic and the evolution of the virus over time across the whole continent. I couldn't find any visualisations that told me what I wanted to know, so I started experimenting with examining data for myself. I'm a MathWorker so obviously I turned to my favourite analysis tool, MATLAB!<\/p><p>One of the issues we face when looking at the impact of SARS-COV-2 (COVID-19) on a population is that testing for and ascribing cause from the virus is fraught with problems. Different tests have different specificities, false positives and false negatives; not all the population is tested, and even those that are tested and return positive tests may or may not have symptoms, and may or may not ultimately die from the virus. Thus working out what is actually going on is difficult. Just as an example of this issue on 12 August 2020 Public Health England <a href=\"https:\/\/www.theguardian.com\/world\/2020\/aug\/12\/coronavirus-death-toll-in-england-revised-down-by-more-than-5000\">revised down the number of COVID-19<\/a> deaths in England by ~12% due to changes in how deaths may or may not have been impacted by COVID-19.<\/p><p>Given that a bottom up approach of classifying every event (hospitalization, death, etc.) as <i>due to COVID-19<\/i> <i>or not<\/i> is going to be difficult, what we can do is look at the overall aggregate statistics for a population and ask how they differ from normal (or previous data without the effects that are currently being observed). Where the data differs from past years we know that this is probably due to those effects. An excess mortality analysis is of this type. It simply looks at the total number of deaths reported in a given region and tracks that over time to build up a reasonable model for the expected number of deaths (usually during a given week of the year). An example of such an analysis is the <a href=\"https:\/\/www.euromomo.eu\/graphs-and-maps\">EUROMOMO Project<\/a> (for Europe) which collects this data for a number of countries around Europe and provides different views of the data to help show how current events are impacting deaths in different places.<\/p><p>The <a href=\"https:\/\/www.cdc.gov\/nchs\/nvss\/vsrr\/covid19\/excess_deaths.htm\">equivalent data for the USA<\/a> is available from the CDC but has much more limited viewing options and I was particularly keen to see how different states were dealing with the pandemic and comparing the evolution of the virus over time across the whole continent.<\/p><h4>Getting the Data<a name=\"6cf78fb9-09d5-4611-96dc-ff9f6d37993e\"><\/a><\/h4><p>The following analysis is done on the <a href=\"https:\/\/www.cdc.gov\/nchs\/nvss\/vsrr\/covid19\/excess_deaths.htm\">data available from the CDC<\/a> where the total number of deaths are recorded and can be compared to the expected number of deaths in that week from prior years data. We can use the MATLAB <tt>webread<\/tt> function to simply import all the data directly from the CDC website (<a href=\"https:\/\/data.cdc.gov\/api\/views\/xkkf-xrst\/rows.csv?accessType=DOWNLOAD&amp;bom=true&amp;format=true%20target=\">National and State Estimates of Excess Deaths<\/a>) and start working with the data. Since the underlying data is tabular in a CSV file the result in MATLAB is going to be a <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/tables.html\">table<\/a><\/p><pre class=\"codeinput\">t = webread(<span class=\"string\">'https:\/\/data.cdc.gov\/api\/views\/xkkf-xrst\/rows.csv'<\/span>);\r\n<\/pre><p>To help plot the data on a <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/geobubble.html\">geobubble<\/a> chart it is useful to have another table that has the names of states (and other locations like \"New York City\" found in the above data) along with a Latitude and Longitude for that location<\/p><pre class=\"codeinput\">locations = webread(<span class=\"string\">\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/stateLocation.csv\"<\/span>);\r\n<\/pre><h4>Initial Data Scrubbing<a name=\"685bcabc-3ae0-492c-9e6f-d7da95c72ad4\"><\/a><\/h4><p>We ONLY want to look at the <tt>\"All Causes\"<\/tt> data (as there are other partitions of the data in this CSV file) and we want all the data to be weighted to predicted values (since the recent data often lacks reported deaths since it takes time to get that data to the CDC - see <a href=\"https:\/\/www.cdc.gov\/nchs\/nvss\/vsrr\/covid19\/excess_deaths.htm\">technical note<\/a> on the CDC site).<\/p><p>Convert <tt>Type<\/tt>, <tt>State<\/tt> and <tt>Outcome<\/tt> to categorical as this simplifies much of the following code which sub-selects and joins on these variables. Convert <tt>ExceedsThreshold<\/tt> to a logical as this is easier to deal with that a set of different strings.<\/p><pre class=\"codeinput\">t.Type      = categorical(t.Type);\r\nt.Outcome   = categorical(t.Outcome);\r\nt.State     = categorical(t.State);\r\n\r\nt.ExceedsThreshold = strcmp(t.ExceedsThreshold, <span class=\"string\">'true'<\/span>);\r\n<\/pre><p>Also convert <tt>State<\/tt> in the <tt>locations<\/tt> variable to a categorical as we are going to use it in conjunction with <tt>State<\/tt> in the main table later on.<\/p><pre class=\"codeinput\">locations.State = categorical(locations.State);\r\n<\/pre><p>Select the data we want (<tt>All causes<\/tt> and <tt>Predicted (weighted)<\/tt>) and remove the general <tt>State == \"United States\"<\/tt> which is the amalgamated data for all the states<\/p><pre class=\"codeinput\">t = t(t.Type == <span class=\"string\">\"Predicted (weighted)\"<\/span> &amp; t.Outcome == <span class=\"string\">\"All causes\"<\/span> &amp; t.State ~= <span class=\"string\">\"United States\"<\/span>, :);\r\n<\/pre><p>Get a time range covering all the data for later use<\/p><pre class=\"codeinput\">timeRange = unique(t.WeekEndingDate);\r\n<\/pre><h4>What's Happening in Massachusetts?<a name=\"271f371b-5a19-4d42-a78e-40b7971d86db\"><\/a><\/h4><p>Sub-select the data using the <tt>State<\/tt> variable (you can change this to any state you prefer to see what is happening in that state)<\/p><pre class=\"codeinput\">m = t(t.State == <span class=\"string\">\"Massachusetts\"<\/span>, :);\r\n<\/pre><p>Plot both the observed number of deaths over the time period, along with the upper bound threshold which represents the expected number of deaths plus several standard deviations of the distribution. Where the number of deaths is significantly more than the upper bound there is significantly more excess mortality than expected.<\/p><pre class=\"codeinput\">plot(timeRange, m.ObservedNumber, <span class=\"string\">'b'<\/span>);\r\nhold <span class=\"string\">on<\/span>\r\nplot(timeRange, m.UpperBoundThreshold, <span class=\"string\">'r'<\/span>)\r\ngrid <span class=\"string\">on<\/span>\r\ntitle <span class=\"string\">'Recorded Deaths per week'<\/span>\r\nylabel <span class=\"string\">'Total number of recorded deaths per week'<\/span>\r\nhold <span class=\"string\">off<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysisInM_01.png\" alt=\"\"> <p>[ <i>Mid Sept 2020 comment<\/i> ] Good to see that for the last several weeks the excess mortality seems to be returning under the upper bound which suggests that MA has returned its mortality rates to close to normal even though there is still a pandemic across the US and the world.<\/p><h4>How About the Whole of the USA?<a name=\"e9aeaf6d-9458-4d7b-9791-745580a0434e\"><\/a><\/h4><p>What does the same data look like if plotted across the whole of the USA? Whilst we removed the aggregate numbers from the table earlier we can easily reproduce them by grouping the data by its date and summing across all the states. Use <a href=\"https:\/\/uk.mathworks.com\/help\/matlab\/ref\/findgroups.html\">findgroups<\/a> based on the <tt>WeekEndingDate<\/tt> to define the groups of data that all have the same week ending date so that <a href=\"https:\/\/uk.mathworks.com\/help\/matlab\/ref\/splitapply.html\">splitapply<\/a> on these groups can sum all the numbers for all states in each group to get the correct aggregate numbers.<\/p><pre class=\"codeinput\">G = findgroups(t.WeekEndingDate);\r\nplot(timeRange, splitapply(@sum, t.ObservedNumber, G), <span class=\"string\">'b'<\/span>)\r\nhold <span class=\"string\">on<\/span>\r\nplot(timeRange, splitapply(@sum, t.UpperBoundThreshold, G), <span class=\"string\">'r'<\/span>)\r\ngrid <span class=\"string\">on<\/span>\r\ntitle <span class=\"string\">'Recorded Deaths per week'<\/span>\r\nylabel <span class=\"string\">'Total number of recorded deaths per week'<\/span>\r\nhold <span class=\"string\">off<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysisInM_02.png\" alt=\"\"> <p>[ <i>Mid Sept 2020 comment<\/i> ] Note the significant spike during <a href=\"https:\/\/en.wikipedia.org\/wiki\/2017%E2%80%932018_United_States_flu_season\">winter 2017 - 2018 flu season<\/a>. In addition it appears that across the whole of the USA excess mortality doesn't seem to returned under the upper threshold, is running about 8,000 per week higher than expected for this time of year, and appears to be on an upward trend.<\/p><h4>How Many States are Exceeding their Upper Bound?<a name=\"9d776c22-0bf3-4960-89b3-3976a53bcd09\"><\/a><\/h4><p>Given that we know at any given time if a State is exceeding its threshold we can easily plot the number of states that exceed their upper bound for the time range.<\/p><pre class=\"codeinput\">plot(timeRange, splitapply(@sum, t.ExceedsThreshold, G));\r\ngrid <span class=\"string\">on<\/span>\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysisInM_03.png\" alt=\"\"> <p>Again note the significant jump in the flu season for 2017-2018, and that currently about 30 states are still above their upper bound. But this is just an aggregate number, it might be interesting to see which states are currently above their bound and to see the evolution of the pandemic across the continental USA over time by tracking which states exceeded their bound at a given point in time.<\/p><h4>Which States are Exceeding their Upper Bound?<a name=\"6dbf6306-f833-41c4-bfad-94c7928d69ba\"><\/a><\/h4><p>To look at which states are exceeding their upper bound (and by how much) we need to look at a point in time. We will measure this as a number of weeks since the latest data were recorded. We can then sub-select all the data so that we only have a particular week to analyse. Change the value below to see what happens week by week across the US<\/p><pre class=\"codeinput\">weeksAgo = 1;\r\nX = t(t.WeekEndingDate == timeRange(end-weeksAgo), :);\r\n<\/pre><p>Use the <tt>location<\/tt> table joined with the data on <tt>State<\/tt> so that we now have the Latitude and Longitude of the States on our geobubble chart<\/p><pre class=\"codeinput\"><span class=\"comment\">% Join tables<\/span>\r\nX = outerjoin(X, locations, <span class=\"string\">'Type'<\/span>, <span class=\"string\">'left'<\/span> ,<span class=\"string\">'Keys'<\/span>, {<span class=\"string\">'State'<\/span>});\r\n<\/pre><p>The bubble size should be normalized by the expected number of deaths in a given state. So compute the excess mortality (Observed - upper bound) and divide by something like the standard deviation (upper bound minus expected). This number is essentially the number of standard deviations from the expected for that State.<\/p><pre class=\"codeinput\">X.BubbleSize = max((X.ObservedNumber - X.UpperBoundThreshold).\/(X.UpperBoundThreshold-X.AverageExpectedCount), 0);\r\n<\/pre><p>Convert <tt>ExceedsThreshold<\/tt> to a categorical so that it can label the geobubble chart and set the plot limits to roughly the edges of the continental USA.<\/p><pre class=\"codeinput\">X.ExceedsThreshold = categorical(X.ExceedsThreshold);\r\nX = X(X.ExceedsThreshold == <span class=\"string\">\"true\"<\/span>, :);\r\ngeobubble(X, <span class=\"string\">\"Latitude\"<\/span>, <span class=\"string\">\"Longtitude\"<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">\"Basemap\"<\/span>,<span class=\"string\">\"darkwater\"<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">\"SizeVariable\"<\/span>, <span class=\"string\">\"BubbleSize\"<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">\"ColorVariable\"<\/span>, <span class=\"string\">\"ExceedsThreshold\"<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">\"MapLayout\"<\/span>, <span class=\"string\">\"normal\"<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">\"SizeLimits\"<\/span>, [0 10]);\r\n\r\ngeolimits([20 51], [-126 -65])\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysisInM_04.png\" alt=\"\"> <p>This plot shows which states are exceeding their expected threshold and the size of the bubble indicates (in a population normalized way) by how much they are over. By changing the slider above (i.e. changing the week you are looking at) you can investigate the evolution of the pandemic across the states, watching its evolution mostly from the east coast at the beginning of the pandemic to the southern states more recently.<\/p><p>Please download the <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysis.mlx\">MLX file for this example<\/a>, and investigate the data and its implications for yourself.  Let us know what you find <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=3831#respond\">here<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_5a014a9244234d7981f0bb7dfe3a18c6() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='5a014a9244234d7981f0bb7dfe3a18c6 ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' 5a014a9244234d7981f0bb7dfe3a18c6';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. \r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2020 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_5a014a9244234d7981f0bb7dfe3a18c6()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2020b<br><\/p><\/div><!--\r\n5a014a9244234d7981f0bb7dfe3a18c6 ##### SOURCE BEGIN #####\r\n%% Excess Mortality Analysis for the USA\r\n% Today's guest blogger is\r\n% <https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1141841 Jos\r\n% Martin>, from the Parallel Computing team at MathWorks. Whilst normally\r\n% focussing on making parallel computing both easy and fast in MATLAB,\r\n% occasionally he likes to use MATLAB to explore other problem spaces.\r\n% Here, he writes about using some CDC provided datasets in MATLAB to\r\n% analyse and visualize excess death statistics across the US.\r\n\r\n%% Introduction\r\n% I was curious to see data on how different US states were dealing with the \r\n% pandemic and the evolution of the virus over time across the whole continent. \r\n% I couldn't find any visualisations that told me what I wanted to know, so I \r\n% started experimenting with examining data for myself. I'm a MathWorker so obviously \r\n% I turned to my favourite analysis tool, MATLAB!\r\n% \r\n% One of the issues we face when looking at the impact of SARS-COV-2 (COVID-19) \r\n% on a population is that testing for and ascribing cause from the virus is fraught \r\n% with problems. Different tests have different specificities, false positives \r\n% and false negatives; not all the population is tested, and even those that are \r\n% tested and return positive tests may or may not have symptoms, and may or may \r\n% not ultimately die from the virus. Thus working out what is actually going on \r\n% is difficult. Just as an example of this issue on 12 August 2020 Public Health \r\n% England <https:\/\/www.theguardian.com\/world\/2020\/aug\/12\/coronavirus-death-toll-in-england-revised-down-by-more-than-5000 \r\n% revised down the number of COVID-19> deaths in England by ~12% due to changes \r\n% in how deaths may or may not have been impacted by COVID-19.\r\n% \r\n% Given that a bottom up approach of classifying every event (hospitalization, \r\n% death, etc.) as _due to COVID-19_ _or not_ is going to be difficult, what we \r\n% can do is look at the overall aggregate statistics for a population and ask \r\n% how they differ from normal (or previous data without the effects that are currently \r\n% being observed). Where the data differs from past years we know that this is \r\n% probably due to those effects. An excess mortality analysis is of this type. \r\n% It simply looks at the total number of deaths reported in a given region and \r\n% tracks that over time to build up a reasonable model for the expected number \r\n% of deaths (usually during a given week of the year). An example of such an analysis \r\n% is the <https:\/\/www.euromomo.eu\/graphs-and-maps EUROMOMO Project> (for Europe) \r\n% which collects this data for a number of countries around Europe and provides \r\n% different views of the data to help show how current events are impacting deaths \r\n% in different places.\r\n% \r\n% The <https:\/\/www.cdc.gov\/nchs\/nvss\/vsrr\/covid19\/excess_deaths.htm equivalent \r\n% data for the USA> is available from the CDC but has much more limited viewing \r\n% options and I was particularly keen to see how different states were dealing \r\n% with the pandemic and comparing the evolution of the virus over time across \r\n% the whole continent.\r\n%% Getting the Data\r\n% The following analysis is done on the <https:\/\/www.cdc.gov\/nchs\/nvss\/vsrr\/covid19\/excess_deaths.htm \r\n% data available from the CDC> where the total number of deaths are recorded and \r\n% can be compared to the expected number of deaths in that week from prior years \r\n% data. We can use the MATLAB |webread| function to simply import all the data \r\n% directly from the CDC website (<https:\/\/data.cdc.gov\/api\/views\/xkkf-xrst\/rows.csv?accessType=DOWNLOAD&bom=true&format=true%20target= \r\n% National and State Estimates of Excess Deaths>) and start working with the data. \r\n% Since the underlying data is tabular in a CSV file the result in MATLAB is going \r\n% to be a <https:\/\/www.mathworks.com\/help\/matlab\/tables.html table> \r\n\r\nt = webread('https:\/\/data.cdc.gov\/api\/views\/xkkf-xrst\/rows.csv');\r\n%% \r\n% To help plot the data on a <https:\/\/www.mathworks.com\/help\/matlab\/ref\/geobubble.html \r\n% geobubble> chart it is useful to have another table that has the names of states \r\n% (and other locations like \"New York City\" found in the above data) along with \r\n% a Latitude and Longitude for that location\r\n\r\nlocations = webread(\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/stateLocation.csv\");\r\n%% Initial Data Scrubbing\r\n% We ONLY want to look at the |\"All Causes\"| data (as there are other partitions \r\n% of the data in this CSV file) and we want all the data to be weighted to predicted \r\n% values (since the recent data often lacks reported deaths since it takes time \r\n% to get that data to the CDC - see <https:\/\/www.cdc.gov\/nchs\/nvss\/vsrr\/covid19\/excess_deaths.htm \r\n% technical note> on the CDC site).  \r\n% \r\n% Convert |Type|, |State| and |Outcome| to categorical as this simplifies much \r\n% of the following code which sub-selects and joins on these variables. Convert \r\n% |ExceedsThreshold| to a logical as this is easier to deal with that a set of \r\n% different strings.\r\n\r\nt.Type      = categorical(t.Type);\r\nt.Outcome   = categorical(t.Outcome);\r\nt.State     = categorical(t.State);\r\n\r\nt.ExceedsThreshold = strcmp(t.ExceedsThreshold, 'true');\r\n%% \r\n% Also convert |State| in the |locations| variable to a categorical as we are \r\n% going to use it in conjunction with |State| in the main table later on.\r\n\r\nlocations.State = categorical(locations.State);\r\n%% \r\n% Select the data we want (|All causes| and |Predicted (weighted)|) and remove \r\n% the general |State == \"United States\"| which is the amalgamated data for all \r\n% the states\r\n\r\nt = t(t.Type == \"Predicted (weighted)\" & t.Outcome == \"All causes\" & t.State ~= \"United States\", :);\r\n%% \r\n% Get a time range covering all the data for later use\r\n\r\ntimeRange = unique(t.WeekEndingDate);\r\n%% What's Happening in Massachusetts?\r\n% Sub-select the data using the |State| variable (you can change this to any \r\n% state you prefer to see what is happening in that state)\r\n\r\nm = t(t.State == \"Massachusetts\", :);\r\n%% \r\n% Plot both the observed number of deaths over the time period, along with the \r\n% upper bound threshold which represents the expected number of deaths plus several \r\n% standard deviations of the distribution. Where the number of deaths is significantly \r\n% more than the upper bound there is significantly more excess mortality than \r\n% expected.\r\n\r\nplot(timeRange, m.ObservedNumber, 'b');\r\nhold on\r\nplot(timeRange, m.UpperBoundThreshold, 'r')\r\ngrid on\r\ntitle 'Recorded Deaths per week'\r\nylabel 'Total number of recorded deaths per week'\r\nhold off\r\n%% \r\n% [ _Mid Sept 2020 comment_ ] Good to see that for the last several weeks the \r\n% excess mortality seems to be returning under the upper bound which suggests \r\n% that MA has returned its mortality rates to close to normal even though there \r\n% is still a pandemic across the US and the world.\r\n%% How About the Whole of the USA?\r\n% What does the same data look like if plotted across the whole of the USA? \r\n% Whilst we removed the aggregate numbers from the table earlier we can easily \r\n% reproduce them by grouping the data by its date and summing across all the states. \r\n% Use <https:\/\/uk.mathworks.com\/help\/matlab\/ref\/findgroups.html findgroups> based \r\n% on the |WeekEndingDate| to define the groups of data that all have the same \r\n% week ending date so that <https:\/\/uk.mathworks.com\/help\/matlab\/ref\/splitapply.html \r\n% splitapply> on these groups can sum all the numbers for all states in each group \r\n% to get the correct aggregate numbers.\r\n\r\nG = findgroups(t.WeekEndingDate);\r\nplot(timeRange, splitapply(@sum, t.ObservedNumber, G), 'b')\r\nhold on\r\nplot(timeRange, splitapply(@sum, t.UpperBoundThreshold, G), 'r')\r\ngrid on\r\ntitle 'Recorded Deaths per week'\r\nylabel 'Total number of recorded deaths per week'\r\nhold off\r\n%% \r\n% [ _Mid Sept 2020 comment_ ] Note the significant spike during <https:\/\/en.wikipedia.org\/wiki\/2017%E2%80%932018_United_States_flu_season \r\n% winter 2017 - 2018 flu season>. In addition it appears that across the whole \r\n% of the USA excess mortality doesn't seem to returned under the upper threshold, \r\n% is running about 8,000 per week higher than expected for this time of year, \r\n% and appears to be on an upward trend.\r\n%% How Many States are Exceeding their Upper Bound?\r\n% Given that we know at any given time if a State is exceeding its threshold \r\n% we can easily plot the number of states that exceed their upper bound for the \r\n% time range.\r\n\r\nplot(timeRange, splitapply(@sum, t.ExceedsThreshold, G));\r\ngrid on\r\n%% \r\n% Again note the significant jump in the flu season for 2017-2018, and that \r\n% currently about 30 states are still above their upper bound. But this is just \r\n% an aggregate number, it might be interesting to see which states are currently \r\n% above their bound and to see the evolution of the pandemic across the continental \r\n% USA over time by tracking which states exceeded their bound at a given point \r\n% in time.\r\n%% Which States are Exceeding their Upper Bound?\r\n% To look at which states are exceeding their upper bound (and by how much) \r\n% we need to look at a point in time. We will measure this as a number of weeks \r\n% since the latest data were recorded. We can then sub-select all the data so \r\n% that we only have a particular week to analyse. Change the value below to see \r\n% what happens week by week across the US\r\n\r\nweeksAgo = 1;\r\nX = t(t.WeekEndingDate == timeRange(end-weeksAgo), :);\r\n%% \r\n% Use the |location| table joined with the data on |State| so that we now have \r\n% the Latitude and Longitude of the States on our geobubble chart\r\n\r\n% Join tables\r\nX = outerjoin(X, locations, 'Type', 'left' ,'Keys', {'State'});\r\n%% \r\n% The bubble size should be normalized by the expected number of deaths in a \r\n% given state. So compute the excess mortality (Observed - upper bound) and divide \r\n% by something like the standard deviation (upper bound minus expected). This \r\n% number is essentially the number of standard deviations from the expected for \r\n% that State.\r\n\r\nX.BubbleSize = max((X.ObservedNumber - X.UpperBoundThreshold).\/(X.UpperBoundThreshold-X.AverageExpectedCount), 0);\r\n%% \r\n% Convert |ExceedsThreshold| to a categorical so that it can label the geobubble \r\n% chart and set the plot limits to roughly the edges of the continental USA.\r\n\r\nX.ExceedsThreshold = categorical(X.ExceedsThreshold);\r\nX = X(X.ExceedsThreshold == \"true\", :);\r\ngeobubble(X, \"Latitude\", \"Longtitude\", ...\r\n    \"Basemap\",\"darkwater\", ...\r\n    \"SizeVariable\", \"BubbleSize\", ...\r\n    \"ColorVariable\", \"ExceedsThreshold\", ...\r\n    \"MapLayout\", \"normal\", ...\r\n    \"SizeLimits\", [0 10]);\r\n\r\ngeolimits([20 51], [-126 -65])\r\n%% \r\n% This plot shows which states are exceeding their expected threshold and the \r\n% size of the bubble indicates (in a population normalized way) by how much they \r\n% are over. By changing the slider above (i.e. changing the week you are looking \r\n% at) you can investigate the evolution of the pandemic across the states, watching \r\n% its evolution mostly from the east coast at the beginning of the pandemic to \r\n% the southern states more recently.\r\n% \r\n% Please download the <https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysis.mlx MLX file for this example>, and investigate the \r\n% data and its implications for yourself.  Let us know what you find\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=3831#respond here>.\r\n##### SOURCE END ##### 5a014a9244234d7981f0bb7dfe3a18c6\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2020\/covidAnalysisInM_04.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Today's guest blogger is <a href=\"https:\/\/www.mathworks.com\/matlabcentral\/profile\/authors\/1141841\">Jos Martin<\/a>, from the Parallel Computing team at MathWorks. Whilst normally focussing on making parallel computing both easy and fast in MATLAB, occasionally he likes to use MATLAB to explore other problem spaces. Here, he writes about using some CDC provided datasets in MATLAB to analyse and visualize excess death statistics across the US.... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2020\/09\/16\/excess-mortality-analysis-for-the-usa\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[25,39,40],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/3831"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=3831"}],"version-history":[{"count":2,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/3831\/revisions"}],"predecessor-version":[{"id":3835,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/3831\/revisions\/3835"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=3831"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=3831"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=3831"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}