Loren on the Art of MATLAB

Turn ideas into MATLAB

Note

Loren on the Art of MATLAB has been archived and will not be updated.

Can Contact Tracing Help the End COVID-19 Lockdown?

Since the pandemic became a global problem in March, we have been gratified how MATLAB users have been actively contributing to the research and development related to COVID-19. Today's guest blogger, Toshi Takeuchi, would like to focus on one interesting area of interest in the global fight against COVID-19.

Contents

Contact Tracing

As authorities are exploring strategies to gradually lifting the lockdown, contact tracing is emerging as one possible piece of a comprehensive solution. Contact tracing has over 150 years of history, but it has evolved over time in response to different types of outbreaks and with availability of technology. Are we also able to make a technology-based contribution here? Let's take a look.

John Snow

Dr. John Snow was perhaps the first person to develop a form of contact tracing. During the 1854 Cholera outbreak in London, he used data to fight against the mainstream "miasma" (or "bad air") theory of the disease to identify the true cause and helped end the outbreak. We can consider him an early pioneer of data science and epidemiology.

Here is the data Dr. Snow collected all by himself with pen and paper (source: Robin's Blog), visualized with geoscatter.

pumps = choleraKML2Tbl("pumps.kml");
cholera = choleraKML2Tbl("cholera_deaths.kml");
figure
geoscatter(cholera.Lat,cholera.Lon,cholera.Deaths*10,"filled")
hold on
geoscatter(pumps.Lat,pumps.Lon,"filled")
geoscatter(mean(cholera.Lat),mean(cholera.Lon),"magenta","filled")
geoscatter(mean(cholera.Lat),mean(cholera.Lon),40000,"magenta")
text(pumps.Lat(1),pumps.Lon(1),"The Broad Street Pump")
title("John Snow's Cholera Map, London 1854")
legend("Deaths","Pumps","Centroid of Deaths","Location","southeast")

Cholera deaths were counted and mapped to their locations. The size of a blue dot indicates the number of deaths at that location. Deaths occurred along certain routes, rather than enveloping a whole area as you would expect to see if they were caused by miasma. Dr. Snow found the Broad Street pump at the center of his map, and that led him to suspect the water supply as the possible source of the disease. Eventually, he was able to convince the local heath board to remove the handle from the pump so that people couldn't drink from that pump, and that's how he helped end the outbreak. Watch this fun YouTube video the Broad Street Pump for more details.

Note: using a circle to model the spread of the disease is more appropriate for the miasma theory. Perhaps this should be modeled as a network problem, but I kept it simple.

COVID-19 and Contact Tracing

Since location data is a critical component of contact tracing, some countries, especially in East Asia, use mobile apps to fight COVID-19. This YouTube video talks about how South Korea, thanks to lessons from MERS outbreak, was able to ease restrictions with aggressive testing and contact tracing. While there is criticism towards the use of technology in contact tracing due to privacy concerns or the premature easing of restrictions, we will eventually have to find a way out and we should explore all options.

In the US, researchers at the University of Texas at Austin use mobile data to understand how well social distancing is working, by tracking visits to retail locations such as grocery stores and pharmacies, and linking these to mortality rates. Apple and Google are also working on Bluetooth-based technology with better privacy protection. They use Bluetooth LE, which typically works in a 10-meter range, to exchange encrypted identifiers between devices that have come in contact with one another. Those keys are kept for 14 days. If a owner of such a device gets infected, then owners of the other devices are notified. This is very interesting, but is there a way for us to get a better feel for how this works with some data?

Gowalla Dataset

Mobile dataset is not widely available due to privacy concerns, but I found the Gowalla Check-in dataset that contains anonymized user activity of a now defunct mobile app service. The service let users check into a location. This is closer in nature to the retail dataset used by the researchers at the University of Texas at Austin, not the Apple/Google approach, but we may still learn something from it.

Since this is just an example, we will just use the subset of data from New York City in September 2010. Let's pretend, for the sake of this example, that the data represents what people do after the lockdown is eased, and we pick a user, 578, as the index case with diagnosis date of September 24, 2010.

nyc = loadNYCgowallaData("Gowalla_totalCheckins.txt");
contacts = 578; % index case
diagnosisDate = datetime("2010-09-24","TimeZone","America/New_York");
incubationPeriod = days(14);
contactDuration = minutes(15); % based on check-in time
maxDegreesOfSeparation = 2; % includes secondary contacts

I just picked other parameters arbitrarily, but they should be determined based on some well-established protocols set by the medical authorities.

moves = cell(maxDegreesOfSeparation + 1,1);
for ii = 1:maxDegreesOfSeparation + 1
    moves{ii} = getMovements(nyc,contacts,diagnosisDate,incubationPeriod,ii-1);
    contacts = findContacts(nyc,moves{ii},contactDuration);
end
moves = vertcat(moves{:});
moves = sortrows(moves,"Type","descend");
moves.Type = renamecats(moves.Type,"Degree0","Index Case");

Now let's visualize the data with a geodensityplot, highlighting the areas people need to avoid.

figure
colormap hot
geodensityplot(moves.Lat,moves.Lon,"FaceColor","interp","Radius",500)
hold on
alphamap(normalize((1:64).^0.2,'range'))
title(["Gowalla Check-in Dataset in NYC";"User 578 + Direct + Secondary Contacts"])

This dataset is rather limited, but it still gives us enough to appreciate some practical technical challenges.

  • Given the population density of Manhattan, the number of contacts is not very large, because Gowalla wasn't widely used. To make it effective for contact tracing, the app must be widely adopted. What is the minimum adoption rate needed to ensure it provides adequate protection?
  • How many days should we go back to for contact tracing? I used 14 days because that's the standard used for an incubation period. That's also what Apple and Google plan to use.
  • Typically, only people who come directly into contact with the index case (the first degree of separation) are traced but we could locate their contacts as well. When an alert goes out, who should be included?
  • People move around and visit many places in 14 days (this map only shows where they checked in). Is that because they were Gowalla users? Should we limit how much people can move around as we ease restrictions?
  • How do we define "contact"? I used the proximity of check-in time to keep it simple. Bluetooth LE typically works in a 10-meter range. CDC guidelines is 6 feet or 2 meters. Is 10 meters too much? It seems Apple is also planning to use the RSSI level to estimate distance.

Measuring Distance is Hard

When it comes to using RSSI to measure distance, the MathWorks UK team successfully built and demoed RFID-based people tracking system at a MATLAB Expo several years ago. Here is the floor plan that shows the positions of the base stations (blue dots) that detected nearby RFID tags and transmitted data to a central server, stationary RFID tags around catering areas (green dots), and stationary RFID tags at respective demo areas (orange dots). Attendees also walked around with mobile RFID tags.

When the time came to analyze the data collected from the event, we ran into a problem – "we had originally assumed we could use signal strength to estimate distances. However, in reality there was so much fluctuation in signal strength that you could not make a confident estimate this way". We eventually found a way around this problem as described in the linked post, but it was very hard.

We had the distance measurements for 8 base stations x 25 stationary RFID tags and the RSSI data collected from those combinations during the event. Here is the plot of RSSI against the distance. The red line represents an imaginary down-sloping line we hoped to see. As you can see, at any given distance the RSSI fluctuated so much that there was no way to derive a clear correlation between the RSSI and distance. I can't share the data, but you can read about how it was used in this post.

% distance ranges 0-35 meters
disX = 0:35;
% linear model with noise just for simplicity
RSSI = -1.6 * disX + 90 + randn(1,length(disX));
% plot the linear model against the actual data
plotExpoData(disX,RSSI);

How About GPS?

Perhaps it is easier to determine the distance with GPS? I captured GPS data using MATLAB Mobile in my neighborhood park. Rather than sharing my data, I would like to encourage you to use your own GPS data from MATLAB Mobile because it is very easy.

In order to get a distance, we need to know which points we are measuring – there is no point getting the distance between people who never actually came in contact because they were at the same point at different times. This is quite easy with timetable arrays and using the function synchronize.

load jpond.mat
synched = synchronize(jpond.A(:,1:2),jpond.B(:,1:2),"union","linear");

Then you can use the distance and deg2km functions from Mapping Toolbox. to obtain the distance.

deg = distance(synched.latitude_1,synched.longitude_1, ...
    synched.latitude_2,synched.longitude_2);
gpsDist = deg2km(deg)*1000; % convert from km to meters
[minDist,minIdx] = min(gpsDist);

In the following plot, you can see that two users walked around the pond in opposite directions, when they came in contact, and how far apart they were. There were two points where they got close, but the latter was the closest. Much easier than RSSI, but that's precisely why we have privacy concerns about using GPS data.

t = tiledlayout("flow");
nexttile([2 1])
geoplot(synched.latitude_1,synched.longitude_1,".-")
hold on
geoplot(synched.latitude_2,synched.longitude_2,".-")
geoplot(synched.latitude_1(minIdx),synched.longitude_1(minIdx),"om")
text(synched.latitude_1(minIdx),synched.longitude_1(minIdx),"Minimum", ...
    "HorizontalAlignment","right")
title("Routes")
legend("A","B","Location","best")
nexttile
plot(synched.Timestamp,synched.latitude_1)
hold on
plot(synched.Timestamp,synched.latitude_2)
plot(synched.Timestamp(minIdx),synched.latitude_2(minIdx),"mo")
text(synched.Timestamp(minIdx),synched.latitude_2(minIdx),"Minimum", ...
    "HorizontalAlignment","right")
title ("Latitude x Time")
nexttile
plot(synched.Timestamp,gpsDist)
hold on
plot(synched.Timestamp(minIdx),gpsDist(minIdx),"mo")
text(synched.Timestamp(minIdx),gpsDist(minIdx)+80,"Minimum", ...
    "HorizontalAlignment","right")
title("Distance between A & B")
ylabel("Meters")
title(t,[compose("GPS-Based Distance Tracking, min distance %.2f meters", ...
    minDist);compose("Time at min distance %s",datestr(synched.Timestamp(minIdx)))])

Call to Action

Through this brief survey, I hope I highlighted how contact tracing is related to data science and where MATLAB users may be able to make a contribution. Do you see other opportunities that I didn't mention? Perhaps you are already working on some of those problems? Share your ideas here!

Local Functions

function tbl = choleraKML2Tbl(filename)
    fileID = fopen(filename);
    C = textscan(fileID,"%s");
    fclose(fileID);
    str = join(string(C{1}'));
    coords = extractBetween(str,"<coordinates>","</coordinates>");
    coords = split(coords,",");
    coords = str2double(coords);
    tbl = table;
    if matches(filename,"pumps.kml")
        type = "Pumps";
    elseif matches(filename, "cholera_deaths.kml")
        type = "Deaths";
    end
    tbl.Type = repmat(string(type),size(coords,1),1);
    tbl.Lat = coords(:,2);
    tbl.Lon = coords(:,1);
    values = extractBetween(str,"<value>","</value>");
    if ~isempty(values)
        values = str2double(values);
        tbl.Deaths = values;
    else
        tbl.Deaths = zeros(height(tbl),1);
    end
end
function data = loadNYCgowallaData(gowallaDataFile)

    if exist("nyc.mat","file")
        load nyc.mat nyc
        data = nyc;
    else
        opts = detectImportOptions(gowallaDataFile, ...
            "FileType","delimitedtext","TextType","string");
        opts.VariableNames = {'User','CheckinTime','Lat','Lon','LocationId'};
        gowalla = readtable(gowallaDataFile,opts);
        nycLim = [40.697,-74.079;40.819,-73.880];
        nyc = gowalla(gowalla.Lat >= nycLim(1,1) & ...
            gowalla.Lat <= nycLim(2,1) & gowalla.Lon >= nycLim(1,2) & ...
            gowalla.Lon <= nycLim(2,2),:);
        nyc.CheckinTime = datetime(nyc.CheckinTime, ...
            "InputFormat","uuuu-MM-dd'T'HH:mm:ssZ","TimeZone","UTC");
        nyc.CheckinTime.TimeZone = "America/New_York";
        data = nyc;
    end
end
function moves = getMovements(tbl,users,endDate,lookBackPeriod,degree)
    moves = cell(length(users),1);
    startDate = endDate - lookBackPeriod;
    for ii = 1:length(users)
        moves{ii} = tbl(tbl.User == users(ii) & ...
            tbl.CheckinTime >= startDate & ...
            tbl.CheckinTime <= endDate,:);
    end
    moves = vertcat(moves{:});
    moves.Type = repmat("Degree" + degree,height(moves),1);
    moves.Type = categorical(moves.Type);
end
function contacts = findContacts(tbl,movements,duration)
    spots = movements(:,{'LocationId','CheckinTime'});
    spots = sortrows(spots,"CheckinTime");
    users = unique(movements.User);
    contacts = cell(height(spots),1);
    for ii = 1:height(spots)
        startTime = spots.CheckinTime(ii) - duration;
        endTime = spots.CheckinTime(ii) + duration;
        contacts{ii} = tbl(tbl.LocationId == spots.LocationId(ii) & ...
            tbl.CheckinTime > startTime & ...
            tbl.CheckinTime < endTime,:);
    end
    contacts = vertcat(contacts{:});
    contacts = unique(contacts.User);
    contacts(ismember(contacts,users)) = [];
end
function plotExpoData(x,y)
    s = load("expo.mat");
    expo = s.expo;
    figure
    plot(expo.Demos.Distance,expo.Demos.RSSI,".")
    hold on
    plot(expo.Catering.Distance,expo.Catering.RSSI,".")
    plot(x,y,"mo")
    hold off
    xlabel("Distance")
    ylabel("RSSI")
    title("Base Stations vs. Stationary Tags (Demo + Catering)")
    legend("Demo stations","Catering stations","What we wanted to see")
    text(5,10,"RSSI  vs. Distance across 8 bases x 25 anchors")
end




Published with MATLAB® R2020a


  • print