{"id":980,"date":"2014-09-06T14:25:12","date_gmt":"2014-09-06T19:25:12","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=980"},"modified":"2014-10-03T01:56:40","modified_gmt":"2014-10-03T06:56:40","slug":"analyzing-uber-ride-sharing-gps-data","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2014\/09\/06\/analyzing-uber-ride-sharing-gps-data\/","title":{"rendered":"Analyzing Uber Ride Sharing GPS Data"},"content":{"rendered":"<div class=\"content\"><!--introduction--><p>Many of us carry around smartphones that can track our GPS positions and that's an interesting source of data. How can we analyze GPS data in MATLAB?<\/p><p>Today's guest blogger, <a\r\nhref=\"\" rel=\"author\"> Toshi\r\nTakeuchi<\/a>, would like to share an analysis of a public GPS dataset\r\nfrom a popular ride sharing service Uber.<\/p><!--\/introduction--><h3>Contents<\/h3><div><ul><li><a href=\"#38418630-821e-4d2e-a7b1-8bdaa7fad5ee\">Introduction<\/a><\/li><li><a href=\"#6c10eb7e-4347-44ea-80f8-45b642974749\">Uber anonymized GPS logs<\/a><\/li><li><a href=\"#0a6bee76-e7c4-4232-84b9-478acb1e3519\">Does the usage change over time?<\/a><\/li><li><a href=\"#7a03f9c0-335c-4268-b8bb-49d10b7cae46\">Where do they go during the weekend?<\/a><\/li><li><a href=\"#a72339b0-1ad2-4c8e-9c44-9b5aaed88367\">Visualizing the traffic patterns with Gephi<\/a><\/li><li><a href=\"#6771dcf0-ccb5-4ac7-88d7-90f24c1d7abc\">Summary<\/a><\/li><\/ul><\/div><h4>Introduction<a name=\"38418630-821e-4d2e-a7b1-8bdaa7fad5ee\"><\/a><\/h4><p><a href=\"http:\/\/uber.com\">Uber<\/a> is a ride sharing service that connects passengers with private drivers through a mobile app and takes care of payment. 
They are in fact so popular that you hear about them in the news due to their conflicts with local traffic regulations and taxi business interests.<\/p><p>Uber\u2019s ride sharing GPS data was publicly available on infochimps.com, so I used it for this analysis (unfortunately, it is no longer available). What can we learn from this dataset?<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/car.png\" alt=\"\"> <\/p><h4>Uber anonymized GPS logs<a name=\"6c10eb7e-4347-44ea-80f8-45b642974749\"><\/a><\/h4><p>Let's start with the dataset itself (a zipped TSV file), which contains the GPS logs taken from the mobile apps in Uber cars that were actively transporting passengers in San Francisco. The data have been anonymized by removing names and trip start and end points. The dates were also substituted, but weekdays and times of day are still intact.<\/p><p>For the purpose of this analysis, let's focus on the data captured in the city proper and visualize it with <a href=\"https:\/\/www.mathworks.com\/products\/mapping\/\">Mapping Toolbox<\/a>.<\/p><p>Run the script to load data. 
Check <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/loadData.m\"><tt>loadData.m<\/tt><\/a> to see the details.<\/p><pre class=\"codeinput\">loadData\r\n<\/pre><p>Overlay the GPS points on the map.<\/p><pre class=\"codeinput\">states = geoshape(shaperead(<span class=\"string\">'usastatehi'<\/span>, <span class=\"string\">'UseGeoCoords'<\/span>, true));\r\nlatlim = [min(T.Lat) max(T.Lat)];\r\nlonlim = [min(T.Lon) max(T.Lon)];\r\nocean = [0.7 0.8 1]; land = [0.9 0.9 0.8];\r\n\r\nfigure\r\nax = usamap(latlim, lonlim);\r\nsetm(ax, <span class=\"string\">'FFaceColor'<\/span>, ocean)\r\ngeoshow(states,<span class=\"string\">'FaceColor'<\/span>,land)\r\ngeoshow(T.Lat,T.Lon,<span class=\"string\">'DisplayType'<\/span>,<span class=\"string\">'Point'<\/span>,<span class=\"string\">'Marker'<\/span>,<span class=\"string\">'.'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'MarkerSize'<\/span>,4,<span class=\"string\">'MarkerEdgeColor'<\/span>,[0 0 1])\r\ntitle(<span class=\"string\">'Uber GPS Log Data'<\/span>)\r\nxlabel(<span class=\"string\">'San Francisco'<\/span>)\r\ntextm(37.802069,-122.446618,<span class=\"string\">'Marina'<\/span>)\r\ntextm(37.808376,-122.426105,<span class=\"string\">'Fishermans Wharf'<\/span>)\r\ntextm(37.797322,-122.482409,<span class=\"string\">'Presidio'<\/span>)\r\ntextm(37.774546,-122.412329,<span class=\"string\">'SOMA'<\/span>)\r\ntextm(37.770731,-122.440481,<span class=\"string\">'Haight'<\/span>)\r\ntextm(37.818276,-122.498546,<span class=\"string\">'Golden Gate Bridge'<\/span>)\r\ntextm(37.819632,-122.376065,<span class=\"string\">'Bay Bridge'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uberGPS_01.png\" alt=\"\"> <h4>Does the usage change over time?<a name=\"0a6bee76-e7c4-4232-84b9-478acb1e3519\"><\/a><\/h4><p>Let's start with a basic question - how does the use of Uber service change over time. 
We can use <a href=\"https:\/\/www.mathworks.com\/help\/stats\/grpstats.html\"><tt>grpstats<\/tt><\/a> to summarize data grouped by specific categorical values, such as <tt>DayName<\/tt> and <tt>TimeOfDay<\/tt>, which were added in the data loading process.<\/p><p>Get grouped summaries.<\/p><pre class=\"codeinput\">byDay = grpstats(T(:,{<span class=\"string\">'Lat'<\/span>,<span class=\"string\">'Lon'<\/span>,<span class=\"string\">'DayName'<\/span>}),<span class=\"string\">'DayName'<\/span>);\r\nbyDayTime = grpstats(T(:,{<span class=\"string\">'Lat'<\/span>,<span class=\"string\">'Lon'<\/span>,<span class=\"string\">'TimeOfDay'<\/span>,<span class=\"string\">'DayName'<\/span>}),<span class=\"keyword\">...<\/span>\r\n    {<span class=\"string\">'DayName'<\/span>,<span class=\"string\">'TimeOfDay'<\/span>});\r\n<\/pre><p>Reshape the count of entries into a 24x7 matrix (hours by days), then transpose it so that each row holds one day of the week.<\/p><pre class=\"codeinput\">byDayTimeCount = reshape(byDayTime.GroupCount,24,7)';\r\n<\/pre><p>Plot the counts by day of week, then by hour of day for each day of the week.<\/p><pre class=\"codeinput\">figure\r\nsubplot(2,1,1)\r\nbar(byDay.GroupCount); set(gca,<span class=\"string\">'XTick'<\/span>,1:7,<span class=\"string\">'XTickLabel'<\/span>,cellstr(byDay.DayName));\r\nsubplot(2,1,2)\r\nplot(byDayTimeCount'); set(gca,<span class=\"string\">'XTick'<\/span>,1:24); xlabel(<span class=\"string\">'Hours by Day of Week'<\/span>);\r\nlegend(<span class=\"string\">'Mon'<\/span>,<span class=\"string\">'Tue'<\/span>,<span class=\"string\">'Wed'<\/span>,<span class=\"string\">'Thu'<\/span>,<span class=\"string\">'Fri'<\/span>,<span class=\"string\">'Sat'<\/span>,<span class=\"string\">'Sun'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'Orientation'<\/span>,<span class=\"string\">'Horizontal'<\/span>,<span class=\"string\">'Location'<\/span>,<span class=\"string\">'SouthOutside'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" 
src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uberGPS_02.png\" alt=\"\"> <p>It looks like the usage goes up during the weekend (Friday through Sunday) and usage peaks in early hours of the day. San Francisco has a very active night life!<\/p><h4>Where do they go during the weekend?<a name=\"7a03f9c0-335c-4268-b8bb-49d10b7cae46\"><\/a><\/h4><p>Is there a way to figure out where people go during the weekend? Even though the dataset doesn't contain the actual starting and ending points of individual trips, we may still get a sense of how the traffic flows by looking at the first and last points of each record.<\/p><p>We can extract the starting and ending location data for weekend rides. Click <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/getStartEndPoints.m\"><tt>getStartEndPoints.m<\/tt><\/a> to see how it is done. If you would like to run this script, please download <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/districts.xlsx\"><tt>districts.xlsx<\/tt><\/a> as well.<\/p><pre class=\"codeinput\"><span class=\"comment\">% Here we load the preprocessed data |startEnd.mat| to save time and plot<\/span>\r\n<span class=\"comment\">% their starting points.<\/span>\r\n\r\n<span class=\"comment\">% getStartEndPoints % commented out to save time<\/span>\r\nload <span class=\"string\">startEnd.mat<\/span> <span class=\"comment\">% load the preprocessed data instead<\/span>\r\n\r\nfigure\r\nax = usamap(latlim, lonlim);\r\nsetm(ax, <span class=\"string\">'FFaceColor'<\/span>, ocean)\r\ngeoshow(states,<span class=\"string\">'FaceColor'<\/span>,land)\r\ngeoshow(startEnd.StartLat,startEnd.StartLon,<span class=\"string\">'DisplayType'<\/span>,<span class=\"string\">'Point'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'Marker'<\/span>,<span class=\"string\">'.'<\/span>,<span class=\"string\">'MarkerSize'<\/span>,5,<span class=\"string\">'MarkerEdgeColor'<\/span>,[0 0 1])\r\ntitle(<span class=\"string\">'Uber 
Weekend Rides - Starting Points'<\/span>)\r\nxlabel(<span class=\"string\">'San Francisco'<\/span>)\r\ntextm(37.802069,-122.446618,<span class=\"string\">'Marina'<\/span>)\r\ntextm(37.808376,-122.426105,<span class=\"string\">'Fishermans Wharf'<\/span>)\r\ntextm(37.797322,-122.482409,<span class=\"string\">'Presidio'<\/span>)\r\ntextm(37.774546,-122.412329,<span class=\"string\">'SOMA'<\/span>)\r\ntextm(37.770731,-122.440481,<span class=\"string\">'Haight'<\/span>)\r\ntextm(37.818276,-122.498546,<span class=\"string\">'Golden Gate Bridge'<\/span>)\r\ntextm(37.819632,-122.376065,<span class=\"string\">'Bay Bridge'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uberGPS_03.png\" alt=\"\"> <p>When you plot the longitude and latitude data, you just get messy point clouds and it is hard to see what's going on. Instead, I broke the map of San Francisco into rectangular blocks to approximate its districts. Here is the new plot of starting points by district.<\/p><pre class=\"codeinput\">dist = categories(startEnd.StartDist);\r\ncc = hsv(length(dist));\r\n\r\nfigure\r\nax = usamap(latlim, lonlim);\r\nsetm(ax, <span class=\"string\">'FFaceColor'<\/span>, ocean)\r\ngeoshow(states,<span class=\"string\">'FaceColor'<\/span>,land)\r\n<span class=\"keyword\">for<\/span> i = 1:length(dist)\r\n    inDist = startEnd.StartDist == dist(i);\r\n    geoshow(startEnd.StartLat(inDist),startEnd.StartLon(inDist),<span class=\"keyword\">...<\/span>\r\n        <span class=\"string\">'DisplayType'<\/span>,<span class=\"string\">'Point'<\/span>,<span class=\"string\">'Marker'<\/span>,<span class=\"string\">'.'<\/span>,<span class=\"string\">'MarkerSize'<\/span>,5,<span class=\"string\">'MarkerEdgeColor'<\/span>,cc(i,:))\r\n<span class=\"keyword\">end<\/span>\r\ntitle(<span class=\"string\">'Uber Weekend Rides - Starting Points by District'<\/span>)\r\nxlabel(<span class=\"string\">'San 
Francisco'<\/span>)\r\ntextm(37.802069,-122.446618,<span class=\"string\">'Marina'<\/span>)\r\ntextm(37.808376,-122.426105,<span class=\"string\">'Fishermans Wharf'<\/span>)\r\ntextm(37.797322,-122.482409,<span class=\"string\">'Presidio'<\/span>)\r\ntextm(37.774546,-122.412329,<span class=\"string\">'SOMA'<\/span>)\r\ntextm(37.770731,-122.440481,<span class=\"string\">'Haight'<\/span>)\r\ntextm(37.818276,-122.498546,<span class=\"string\">'Golden Gate Bridge'<\/span>)\r\ntextm(37.819632,-122.376065,<span class=\"string\">'Bay Bridge'<\/span>)\r\n<\/pre><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uberGPS_04.png\" alt=\"\"> <h4>Visualizing the traffic patterns with Gephi<a name=\"a72339b0-1ad2-4c8e-9c44-9b5aaed88367\"><\/a><\/h4><p>This is a step in the right direction. Now that we have the starting and ending points grouped by districts, we can represent the rides as connections among different districts - this is essentially a graph with districts as nodes and rides as edges. To visualize this graph, we can use a popular social networking analysis tool <a href=\"https:\/\/gephi.github.io\/\">Gephi<\/a>, which was also used in another post, <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=920\">Analyzing Twitter with MATLAB<\/a>.<\/p><p>You can export <tt>StartDist<\/tt> and <tt>EndDist<\/tt> as the edge list to Gephi in CSV format.<\/p><pre class=\"codeinput\">writetable(startEnd(:,{<span class=\"string\">'StartDist'<\/span>,<span class=\"string\">'EndDist'<\/span>}),<span class=\"string\">'edgelist.csv'<\/span>,<span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'WriteVariableNames'<\/span>,false)\r\n<\/pre><p>Once you export the edge list, you can plot the connections (edges) between districts (nodes) in Gephi. Now it is much easier to see where people went during the weekend! 
To see a bigger image, check out <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uber_graph.pdf\">the PDF version<\/a>.<\/p><p><img decoding=\"async\" vspace=\"5\" hspace=\"5\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uber_graph.png\" alt=\"\"> <\/p><div><ul><li>The size of each district node represents its <a href=\"http:\/\/en.wikipedia.org\/wiki\/Degree_(graph_theory)\">in-degree<\/a>, the number of incoming connections, which you can think of as a measure of the district's popularity as a destination. SOMA, Haight, Mission District, Downtown, and The Castro are the most popular destinations by this measure.<\/li><li>The districts are colored by <a href=\"http:\/\/en.wikipedia.org\/wiki\/Modularity_(networks)\">modularity<\/a> class, which indicates the cluster of nodes each district belongs to. It looks like people hang around a set of districts that are near one another: SOMA, Downtown, and Mission District are all located towards the south (green). The Castro, Haight, and Western Addition are in the center (purple), and this cluster is strongly connected to Richmond and Sunset District in the west. Since those are residential areas, it seems people from those areas hang out in the other districts in the same cluster.<\/li><li>The locals don't seem to go to Fisherman's Wharf or Chinatown in the north (red) very much. Perhaps those districts are considered uncool because of the tourists?<\/li><\/ul><\/div><h4>Summary<a name=\"6771dcf0-ccb5-4ac7-88d7-90f24c1d7abc\"><\/a><\/h4><p>Now you know where to go in San Francisco during the weekend if you want to experience an active night life there. We just looked at the overall weekend data, but you can explore further by slicing the data by time to see how the traffic pattern changes with the time of day or day of the week. You may also be able to find traffic congestion by calculating speeds from the timestamps. 
Try it yourself and share what you find <a href=\"https:\/\/blogs.mathworks.com\/loren\/?p=980#respond\">here!<\/a>.<\/p><script language=\"JavaScript\"> <!-- \r\n    function grabCode_c9e0cc11fe4445e3add9acae666346fc() {\r\n        \/\/ Remember the title so we can use it in the new page\r\n        title = document.title;\r\n\r\n        \/\/ Break up these strings so that their presence\r\n        \/\/ in the Javascript doesn't mess up the search for\r\n        \/\/ the MATLAB code.\r\n        t1='c9e0cc11fe4445e3add9acae666346fc ' + '##### ' + 'SOURCE BEGIN' + ' #####';\r\n        t2='##### ' + 'SOURCE END' + ' #####' + ' c9e0cc11fe4445e3add9acae666346fc';\r\n    \r\n        b=document.getElementsByTagName('body')[0];\r\n        i1=b.innerHTML.indexOf(t1)+t1.length;\r\n        i2=b.innerHTML.indexOf(t2);\r\n \r\n        code_string = b.innerHTML.substring(i1, i2);\r\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\r\n\r\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \r\n        \/\/ in the XML parser.\r\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\r\n        \/\/ doesn't go ahead and substitute the less-than character. 
\r\n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\r\n\r\n        copyright = 'Copyright 2014 The MathWorks, Inc.';\r\n\r\n        w = window.open();\r\n        d = w.document;\r\n        d.write('<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\\n');\r\n\r\n        d.title = title + ' (MATLAB code)';\r\n        d.close();\r\n    }   \r\n     --> <\/script><p style=\"text-align: right; font-size: xx-small; font-weight:lighter;   font-style: italic; color: gray\"><br><a href=\"javascript:grabCode_c9e0cc11fe4445e3add9acae666346fc()\"><span style=\"font-size: x-small;        font-style: italic;\">Get \r\n      the MATLAB code <noscript>(requires JavaScript)<\/noscript><\/span><\/a><br><br>\r\n      Published with MATLAB&reg; R2014a<br><\/p><\/div><!--\r\nc9e0cc11fe4445e3add9acae666346fc ##### SOURCE BEGIN #####\r\n%% Analyzing Uber Ride Sharing GPS Data\r\n% Many of us carry around smartphones that can track our GPS positions and\r\n% that's an interesting source of data. How can we analyze GPS data in\r\n% MATLAB?\r\n% \r\n% <html>Today's guest blogger, <a\r\n% href=\"\" rel=\"author\"> Toshi\r\n% Takeuchi<\/a>, would like to share an analysis of a public GPS dataset\r\n% from a popular ride sharing service Uber.<\/html>\r\n%\r\n%% Introduction\r\n% <http:\/\/uber.com Uber> is a ride sharing service that connects passengers\r\n% with private drivers through a mobile app and takes care of payment. 
They\r\n% are in fact so popular that you hear about them in the news due to their\r\n% conflicts with local traffic regulations and taxi business interests.\r\n%\r\n% Uber\u2019s ride sharing GPS data was available publicly on \r\n% infochimps.com, so I used it for this analysis (Unfortunately \r\n% it is not available anymore).  What can we learn from\r\n% this dataset?\r\n%\r\n% <<car.png>>\r\n%\r\n%% Uber anonymized GPS logs\r\n% Let's start by downloading the dataset from the link above (a zipped TSV\r\n% file), which contains the GPS logs taken from the mobile apps in Uber\r\n% cars that were actively transporting passengers in San Francisco. The\r\n% data have been anonymized by removing names, trip start and end points.\r\n% The dates were also substituted. Weekdays and time of day are still\r\n% intact.\r\n% \r\n% For the purpose of this analysis, let's focus on the data captured in the\r\n% city proper and visualize it with\r\n% <https:\/\/www.mathworks.com\/products\/mapping\/ Mapping Toolbox>.\r\n%\r\n% Run the script to load data. Check\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/loadData.m |loadData.m|> to\r\n% see the details.\r\n\r\nloadData\r\n\r\n%%\r\n% Overlay the GPS points on the map. 
\r\nstates = geoshape(shaperead('usastatehi', 'UseGeoCoords', true));\r\nlatlim = [min(T.Lat) max(T.Lat)];\r\nlonlim = [min(T.Lon) max(T.Lon)];\r\nocean = [0.7 0.8 1]; land = [0.9 0.9 0.8];\r\n\r\nfigure\r\nax = usamap(latlim, lonlim);\r\nsetm(ax, 'FFaceColor', ocean)\r\ngeoshow(states,'FaceColor',land)\r\ngeoshow(T.Lat,T.Lon,'DisplayType','Point','Marker','.',...\r\n    'MarkerSize',4,'MarkerEdgeColor',[0 0 1])\r\ntitle('Uber GPS Log Data')\r\nxlabel('San Francisco')\r\ntextm(37.802069,-122.446618,'Marina')\r\ntextm(37.808376,-122.426105,'Fishermans Wharf')\r\ntextm(37.797322,-122.482409,'Presidio')\r\ntextm(37.774546,-122.412329,'SOMA')\r\ntextm(37.770731,-122.440481,'Haight')\r\ntextm(37.818276,-122.498546,'Golden Gate Bridge')\r\ntextm(37.819632,-122.376065,'Bay Bridge')\r\n\r\n%% Does the usage change over time?\r\n% Let's start with a basic question - how does the use of Uber service\r\n% change over time. We can use\r\n% <https:\/\/www.mathworks.com\/help\/stats\/grpstats.html |grpstats|> to\r\n% summarize data grouped by specific categorical values, such as |DayName|\r\n% and |TimeOfDay|, which were added in the data loading process.\r\n%\r\n% Get grouped summaries.\r\nbyDay = grpstats(T(:,{'Lat','Lon','DayName'}),'DayName');\r\nbyDayTime = grpstats(T(:,{'Lat','Lon','TimeOfDay','DayName'}),...\r\n    {'DayName','TimeOfDay'});\r\n%%\r\n% Reshape the count of entries into a 24x7 matrix.\r\nbyDayTimeCount = reshape(byDayTime.GroupCount,24,7)';\r\n\r\n%%\r\n% Plot the data by day of week and by hours per day of week.\r\nfigure\r\nsubplot(2,1,1)\r\nbar(byDay.GroupCount); set(gca,'XTick',1:7,'XTickLabel',cellstr(byDay.DayName));\r\nsubplot(2,1,2)\r\nplot(byDayTimeCount'); set(gca,'XTick',1:24); xlabel('Hours by Day of Week');\r\nlegend('Mon','Tue','Wed','Thu','Fri','Sat','Sun',...\r\n    'Orientation','Horizontal','Location','SouthOutside')\r\n\r\n%%\r\n% It looks like the usage goes up during the weekend (Friday through\r\n% Sunday) and usage peaks in early 
hours of the day. San Francisco\r\n% has a very active night life!\r\n%\r\n%% Where do they go during the weekend?\r\n% Is there a way to figure out where people go during the weekend? Even\r\n% though the dataset doesn't contain the actual starting and ending points\r\n% of individual trips, we may still get a sense of how the traffic flows by\r\n% looking at the first and last points of each record. \r\n%\r\n% We can extract the starting and ending location data for weekend rides.\r\n% Click <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/getStartEndPoints.m\r\n% |getStartEndPoints.m|> to see how it is done. If you would like to run\r\n% this script, please download\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/districts.xlsx\r\n% |districts.xlsx|> as well.\r\n\r\n% Here we load the preprocessed data |startEnd.mat| to save time and plot\r\n% their starting points.\r\n\r\n% getStartEndPoints % commented out to save time\r\nload startEnd.mat % load the preprocessed data instead\r\n\r\nfigure\r\nax = usamap(latlim, lonlim);\r\nsetm(ax, 'FFaceColor', ocean)\r\ngeoshow(states,'FaceColor',land)\r\ngeoshow(startEnd.StartLat,startEnd.StartLon,'DisplayType','Point',...\r\n    'Marker','.','MarkerSize',5,'MarkerEdgeColor',[0 0 1])\r\ntitle('Uber Weekend Rides - Starting Points')\r\nxlabel('San Francisco')\r\ntextm(37.802069,-122.446618,'Marina')\r\ntextm(37.808376,-122.426105,'Fishermans Wharf')\r\ntextm(37.797322,-122.482409,'Presidio')\r\ntextm(37.774546,-122.412329,'SOMA')\r\ntextm(37.770731,-122.440481,'Haight')\r\ntextm(37.818276,-122.498546,'Golden Gate Bridge')\r\ntextm(37.819632,-122.376065,'Bay Bridge')\r\n\r\n%%\r\n% When you plot the longitude and latitude data, you just get messy point\r\n% clouds and it is hard to see what's going on. Instead, I broke the map of\r\n% San Francisco into rectangular blocks to approximate its districts. 
Here\r\n% is the new plot of starting points by district.\r\n\r\ndist = categories(startEnd.StartDist);\r\ncc = hsv(length(dist));\r\n\r\nfigure\r\nax = usamap(latlim, lonlim);\r\nsetm(ax, 'FFaceColor', ocean)\r\ngeoshow(states,'FaceColor',land)\r\nfor i = 1:length(dist)\r\n    inDist = startEnd.StartDist == dist(i);\r\n    geoshow(startEnd.StartLat(inDist),startEnd.StartLon(inDist),...\r\n        'DisplayType','Point','Marker','.','MarkerSize',5,'MarkerEdgeColor',cc(i,:))\r\nend\r\ntitle('Uber Weekend Rides - Starting Points by District')\r\nxlabel('San Francisco')\r\ntextm(37.802069,-122.446618,'Marina')\r\ntextm(37.808376,-122.426105,'Fishermans Wharf')\r\ntextm(37.797322,-122.482409,'Presidio')\r\ntextm(37.774546,-122.412329,'SOMA')\r\ntextm(37.770731,-122.440481,'Haight')\r\ntextm(37.818276,-122.498546,'Golden Gate Bridge')\r\ntextm(37.819632,-122.376065,'Bay Bridge')\r\n\r\n%% Visualizing the traffic patterns with Gephi\r\n% This is a step in the right direction. Now that we have the starting and\r\n% ending points grouped by districts, we can represent the rides as\r\n% connections among different districts - this is essentially a graph with\r\n% districts as nodes and rides as edges. To visualize this graph, we can\r\n% use a popular social networking analysis tool <https:\/\/gephi.github.io\/\r\n% Gephi>, which was also used in another post,\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=920 Analyzing Twitter with MATLAB>.\r\n%\r\n% You can export |StartDist| and |EndDist| as the edge list to Gephi in CSV\r\n% format. \r\n\r\nwritetable(startEnd(:,{'StartDist','EndDist'}),'edgelist.csv',...\r\n    'WriteVariableNames',false)\r\n\r\n%%\r\n% Once you export the edge list, you can plot the connections (edges)\r\n% between districts (nodes) in Gephi. Now it is much easier to see where\r\n% people went during the weekend! 
To see a bigger image, check out\r\n% <https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uber_graph.pdf the PDF version>.\r\n% \r\n% <<uber_graph.png>>\r\n%\r\n% * The size of each district node represents its\r\n% <http:\/\/en.wikipedia.org\/wiki\/Degree_(graph_theory) in-degree>, the\r\n% number of incoming connections, which you can think of as a measure of\r\n% the district's popularity as a destination. SOMA, Haight, Mission\r\n% District, Downtown, and The Castro are the most popular destinations by\r\n% this measure.\r\n% * The districts are colored by\r\n% <http:\/\/en.wikipedia.org\/wiki\/Modularity_(networks) modularity> class,\r\n% which indicates the cluster of nodes each district belongs to. It looks\r\n% like people hang around a set of districts that are near one another:\r\n% SOMA, Downtown, and Mission District are all located towards the south\r\n% (green). The Castro, Haight, and Western Addition are in the center\r\n% (purple), and this cluster is strongly connected to Richmond and Sunset\r\n% District in the west. Since those are residential areas, it seems people\r\n% from those areas hang out in the other districts in the same cluster.\r\n% * The locals don't seem to go to Fisherman's Wharf or Chinatown in the\r\n% north (red) very much. Perhaps those districts are considered uncool\r\n% because of the tourists?\r\n%\r\n%% Summary\r\n% Now you know where to go in San Francisco during the weekend if you want\r\n% to experience an active night life there. We just looked at the overall\r\n% weekend data, but you can explore further by slicing the data by time to\r\n% see how the traffic pattern changes with the time of day or day of the\r\n% week. You may also be able to find traffic congestion by calculating\r\n% speeds from the timestamps. 
Try it yourself and share what you find\r\n% <https:\/\/blogs.mathworks.com\/loren\/?p=980#respond here!>.\r\n\r\n##### SOURCE END ##### c9e0cc11fe4445e3add9acae666346fc\r\n-->","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2014\/uber_graph.png\" onError=\"this.style.display ='none';\" \/><\/div><!--introduction--><p>Many of us carry around smartphones that can track our GPS positions and that's an interesting source of data. How can we analyze GPS data in MATLAB?... <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2014\/09\/06\/analyzing-uber-ride-sharing-gps-data\/\">read more >><\/a><\/p>","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[53,45,61],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/980"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=980"}],"version-history":[{"count":11,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/980\/revisions"}],"predecessor-version":[{"id":984,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/980\/revisions\/984"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-
json\/wp\/v2\/tags?post=980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}