{"id":1512,"date":"2016-05-27T06:47:13","date_gmt":"2016-05-27T11:47:13","guid":{"rendered":"https:\/\/blogs.mathworks.com\/loren\/?p=1512"},"modified":"2016-11-17T10:33:22","modified_gmt":"2016-11-17T15:33:22","slug":"fighting-crime-with-predictive-policing","status":"publish","type":"post","link":"https:\/\/blogs.mathworks.com\/loren\/2016\/05\/27\/fighting-crime-with-predictive-policing\/","title":{"rendered":"Fighting Crime with Predictive Policing"},"content":{"rendered":"<div class=\"content\"><!--introduction-->Today&#8217;s guest blogger, Toshi, is back again.<\/p>\n<p>I recently noticed that there is a Kaggle competition titled <a href=\"https:\/\/www.kaggle.com\/c\/sf-crime\">San Francisco Crime Classification<\/a> which asks you to predict the category of crimes that occurred in San Franciso from 1\/1\/2003 to 5\/13\/2015 in theSFPD Crime Incident Reporting system. The goal of the competition is to predict the category of crime that occurred based on time and location.<\/p>\n<p>This reminded me of the movie <a href=\"http:\/\/www.imdb.com\/title\/tt0181689\/\">Minority Report<\/a> in which a special unit of police arrests people before they commit crimes, but that&#8217;s <a href=\"https:\/\/en.wikipedia.org\/wiki\/Science_fiction\">SciFi<\/a>. A more realistic approach is to <b>deter crime<\/b> by analyzing the past data to predict when and where crimes take place and deploy law enforcement resources to such hot spots. This approach is known as <a href=\"https:\/\/en.wikipedia.org\/wiki\/Predictive_policing\">predictive policing<\/a>.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/484342181s.jpg\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p>Let&#8217;s take a look at the SFPD data to see what we can learn from it. This has nothing to do with the goal of the competition, but Kaggle is &#8221; <i>also encouraging you to explore the dataset visually<\/i> &#8220;. So why not?<\/p>\n<p><!--\/introduction--><\/p>\n<h3>Contents<\/h3>\n<div>\n<ul>\n<li><a href=\"#d237cf2a-08ec-423e-b597-8c66fadd89fb\">The SFPD Crime Incident Report Data<\/a><\/li>\n<li><a href=\"#5e115999-0997-488b-af59-e5faa28c67a1\">Crime Heat Map<\/a><\/li>\n<li><a href=\"#cdbd6e25-7242-4749-86b6-a84adf7048a0\">Principal Components Analysis<\/a><\/li>\n<li><a href=\"#5939c406-ea6d-406f-ae9b-6847a2bb5473\">Assault, Robbery and Vehicle Theft<\/a><\/li>\n<li><a href=\"#45ebe22b-153e-4ad5-9bbd-213939201da5\">Grand Theft Auto<\/a><\/li>\n<li><a href=\"#a9010c1d-cc89-4d0a-8eaa-4dbb76846839\">Eureka!<\/a><\/li>\n<li><a href=\"#37860ff2-e94f-409f-ac30-ef80f954bfd6\">Another Kind of Broken Windows<\/a><\/li>\n<li><a href=\"#1e3413d3-060a-4582-985b-922179434d61\">Summary<\/a><\/li>\n<\/ul>\n<\/div>\n<h4>The SFPD Crime Incident Report Data<a name=\"d237cf2a-08ec-423e-b597-8c66fadd89fb\"><\/a><\/h4>\n<p>You need to first download the zipped data files from Kaggle website, unzip and place them into your current folder. Let&#8217;s load the data see what attributes are included.<\/p>\n<pre class=\"codeinput\">T = readtable(<span class=\"string\">'train.csv'<\/span>,<span class=\"string\">'Format'<\/span>,<span class=\"string\">'%D%C%q%C%C%q%q%f%f'<\/span>);   <span class=\"comment\">% load data from csv<\/span>\r\nweek = {<span class=\"string\">'Sunday'<\/span>,<span class=\"string\">'Monday'<\/span>,<span class=\"string\">'Tuesday'<\/span>,<span class=\"string\">'Wednesday'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">        % define order<\/span>\r\n    <span class=\"string\">'Thursday'<\/span>,<span class=\"string\">'Friday'<\/span>,<span class=\"string\">'Saturday'<\/span>};\r\nT.(4) = reordercats(T.(4), week);                           <span class=\"comment\">% reorder categories<\/span>\r\nT.(6) = categorical(T.(6));                                 <span class=\"comment\">% convert to categorical<\/span>\r\nT.Properties.VariableNames                                  <span class=\"comment\">% show variable names<\/span>\r\n<\/pre>\n<pre class=\"codeoutput\">ans = \r\n  Columns 1 through 5\r\n    'Dates'    'Category'    'Descript'    'DayOfWeek'    'PdDistrict'\r\n  Columns 6 through 9\r\n    'Resolution'    'Address'    'X'    'Y'\r\n<\/pre>\n<p>Let&#8217;s also add a new column to assign dates to specific weekly internvals for time series analysis.<\/p>\n<pre class=\"codeinput\">t = datetime(<span class=\"string\">'2003-1-5'<\/span>) + days(0:7:4515);                  <span class=\"comment\">% weekly date intervals<\/span>\r\nweeks = NaT(size(T.Dates));                                 <span class=\"comment\">% empty datetime array<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:length(t) - 1                                     <span class=\"comment\">% loop over weekly intervals<\/span>\r\n    weeks(T.Dates &gt;= t(i) &amp; T.Dates &lt; t(i+1)) = t(i);       <span class=\"comment\">% dates to weekly intervals<\/span>\r\n<span class=\"keyword\">end<\/span>\r\nT.Week = weeks;                                             <span class=\"comment\">% add weekly intervals<\/span>\r\n<\/pre>\n<p>Now let&#8217;s see what is included in &#8216;Category&#8217;. There are 38 categories of crime and non-crime incidents, such as ARSON, ASSAULT, and BAD CHECKS, but which ones should we focus on?<\/p>\n<pre class=\"codeinput\">T.Category = mergecats(T.Category,{<span class=\"string\">'TRESPASS'<\/span>,<span class=\"string\">'TREA'<\/span>});     <span class=\"comment\">% merge mislabeled categories<\/span>\r\ntab = tabulate(T.Category);                                 <span class=\"comment\">% tabulate categories<\/span>\r\n[Count, idx] = sort(cell2mat(tab(:,2)),<span class=\"string\">'descend'<\/span>);          <span class=\"comment\">% sort by category total<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nbar(Count)                                                  <span class=\"comment\">% plot bar chart<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.XTick = 1:size(tab, 1);                                  <span class=\"comment\">% use categories as tick<\/span>\r\nax.XTickLabel = tab(idx,1);                                 <span class=\"comment\">% rename tick labels<\/span>\r\nax.XTickLabelRotation = -90;                                <span class=\"comment\">% rotate text vertically<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_01.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p><b>Broken Windows Theory<\/b><\/p>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Broken_windows_theory\">This theory<\/a>, embraced by New York City Police Commissioner (then NYPD Chief) <a href=\"https:\/\/en.wikipedia.org\/wiki\/William_Bratton\">William Bratton<\/a> in the 1990s, says that leaving broken windows unrepaired leads to more vandalism and greater social disorder in the neighborhood, because it signals that no one cares there. Up to that pointhe NYPD focused on solving serious crimes. However, the theory suggested that cracking down on petty crimes might lead to reduction in more serious ones. While New York saw a drop in the crime rate under Bratton, this theory is not without critics because it also led to the controversial <a href=\"https:\/\/en.wikipedia.org\/wiki\/Stop-and-frisk_in_New_York_City\">stop and frisk<\/a> practice.<\/p>\n<p>Perhaps this theory provides one starting point for our exploration. Does the SFPD data show a correlation between a petty crime like vandalism and other more serious crimes? Let&#8217;s start with bivariate histogram using <tt><a title=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/histogram2.html (link no longer works)\">histogram2<\/a><\/tt> to plot the vandalism incidents by location. Check <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/sfcrime_load_map.m\">sfcrime_load_map.m<\/a><\/tt> to see how you retrieve a raster map from <a title=\"https:\/\/www.mathworks.com\/help\/map\/web-map-service-1.html (link no longer works)\">Web Map Service<\/a>.<\/p>\n<pre class=\"codeinput\">sfcrime_load_map                                            <span class=\"comment\">% load map from WMS<\/span>\r\nvandalism = T(T.Category == <span class=\"string\">'VANDALISM'<\/span>, [1,3:5,8:10]);     <span class=\"comment\">% subset T by category<\/span>\r\nnbins = 100;                                                <span class=\"comment\">% number of bins<\/span>\r\nxbinedges = linspace(lim.lon(1),lim.lon(2),nbins);          <span class=\"comment\">% x bin edges<\/span>\r\nybinedges = linspace(lim.lat(1),lim.lat(2),nbins);          <span class=\"comment\">% y bin edges<\/span>\r\nmap = flipud(A);                                            <span class=\"comment\">% flip image<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\ncolormap <span class=\"string\">cool<\/span>                                               <span class=\"comment\">% set colormap<\/span>\r\nhistogram2(vandalism.X, vandalism.Y, <span class=\"keyword\">...<\/span><span class=\"comment\">                    % plot 3D bivariate<\/span>\r\n    xbinedges, ybinedges, <span class=\"keyword\">...<\/span><span class=\"comment\">                               % histogram<\/span>\r\n    <span class=\"string\">'FaceColor'<\/span>, <span class=\"string\">'flat'<\/span>, <span class=\"keyword\">...<\/span>\r\n    <span class=\"string\">'FaceAlpha'<\/span>, 0.5, <span class=\"string\">'EdgeAlpha'<\/span>, 0.5)\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nimage(lim.lon,lim.lat,map)                                  <span class=\"comment\">% add the map<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.CLim = [0 100];                                          <span class=\"comment\">% color axis scaling<\/span>\r\ntitle({<span class=\"string\">'San Francisco Crime Map'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">                       % add title<\/span>\r\n    <span class=\"string\">'Vandalism: Jan 2003 - May 2015'<\/span>})\r\nzlabel(<span class=\"string\">'Count of Police Reports'<\/span>)                           <span class=\"comment\">% add axis label<\/span>\r\ntext(-122.52,37.82, 300, <span class=\"string\">'Golden Gate Bridge'<\/span>)              <span class=\"comment\">% annotate landmark<\/span>\r\ntext(-122.4,37.82, 200, <span class=\"string\">'Bay Bridge'<\/span>)                       <span class=\"comment\">% annotate landmark<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_02.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p>You can see that those incidents are highly concentrated in several hot spots indicated by the tall magenta bars. There is one particularly tall bar that sticks high above the rest. Where is it? We can plot the top 50 locations of high vandalism incidents with the highest spot marked as &#8220;#1&#8221;. Check <tt><a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/sfcrime_draw_locs.m\">sfcrime_draw_locs.m<\/a><\/tt> to see the details.<\/p>\n<pre class=\"codeinput\">figure                                                      <span class=\"comment\">% new figure<\/span>\r\nusamap(lim.lat, lim.lon);                                   <span class=\"comment\">% set map coordinates<\/span>\r\ngeoshow(A, R)                                               <span class=\"comment\">% display map<\/span>\r\nsfcrime_draw_locs(vandalism,lim.lat,lim.lon,nbins,50,<span class=\"string\">'m'<\/span>)   <span class=\"comment\">% draw top 50 locations<\/span>\r\ntitle({<span class=\"string\">'Vandalism: Jan 2013 - May 2015'<\/span>;<span class=\"string\">'Top 50 Locations'<\/span>})<span class=\"comment\">% add title<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_03.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Crime Heat Map<a name=\"5e115999-0997-488b-af59-e5faa28c67a1\"><\/a><\/h4>\n<p>If we process other categories of crime with the same grid, we can create a matrix of crimes by location and we can plot it as a heat map with <tt><a href=\"https:\/\/www.mathworks.com\/help\/matlab\/ref\/imagesc.html\">imagesc<\/a><\/tt> to make comparison easier.<\/p>\n<p>Bright horizontal stripes indicate a location where different types of crimes are committed. There is one particularly bright stripe with ASSAULT, DRUG\/NARCOTIC, LARCENY\/THEFT, NON-CRIMINAL, OTHER OFFENSES, and WARRANTS. ROBBERY, SUSPICIOUS OCC, and VANDALISM also appear in a lighter shade.<\/p>\n<p>If you look at LARCENY\/THEFT, it doesn&#8217;t necessarily form rows of stripes everywhere it is bright. If you are going to steal, it may be worthwhile to go to places you find high value targets.<\/p>\n<pre class=\"codeinput\">cats = categories(T.Category);                              <span class=\"comment\">% extract categories<\/span>\r\nrawCounts = zeros((nbins-1)^2,length(cats));                <span class=\"comment\">% set up an accumulator<\/span>\r\n<span class=\"keyword\">for<\/span> i = 1:length(cats)                                      <span class=\"comment\">% loop over categories<\/span>\r\n    data = T(T.Category == cats{i}, 8:9);                   <span class=\"comment\">% subset T by category<\/span>\r\n    [N,~,~] = histcounts2(data.X, data.Y, <span class=\"keyword\">...<\/span><span class=\"comment\">               % get bivariate histogram<\/span>\r\n        xbinedges, ybinedges);                              <span class=\"comment\">% bin counts<\/span>\r\n    rawCounts(:,i) = N(:);                                  <span class=\"comment\">% add to accumulator<\/span>\r\n<span class=\"keyword\">end<\/span>                                                         <span class=\"comment\">% as a vector<\/span>\r\n\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nimagesc(rawCounts)                                          <span class=\"comment\">% plot matrix as an image<\/span>\r\nax = gca;                                                   <span class=\"comment\">% get current axes handle<\/span>\r\nax.CLim = [0 200];                                          <span class=\"comment\">% color axis scaling<\/span>\r\nax.XTick = 1:length(cats);                                  <span class=\"comment\">% use categories as tick<\/span>\r\nax.XTickLabel = cats;                                       <span class=\"comment\">% rename tick labels<\/span>\r\nax.XTickLabelRotation = -90;                                <span class=\"comment\">% rotate text vertically<\/span>\r\nylabel(<span class=\"string\">'Locations'<\/span>)                                         <span class=\"comment\">% add axis label<\/span>\r\ntitle(<span class=\"string\">'SF Crime Heat Map by Location'<\/span>)                      <span class=\"comment\">% add title<\/span>\r\ncolorbar                                                    <span class=\"comment\">% add colorbar<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_04.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Principal Components Analysis<a name=\"cdbd6e25-7242-4749-86b6-a84adf7048a0\"><\/a><\/h4>\n<p>We can use <a href=\"https:\/\/www.mathworks.com\/help\/stats\/principal-component-analysis-pca.html\">Principal Component Analysis<\/a> uisng <tt><a href=\"https:\/\/www.mathworks.com\/help\/stats\/pca.html\">pca<\/a><\/tt> to get a better sense of the relationship among different categories and visualize the result with <tt><a href=\"https:\/\/www.mathworks.com\/help\/stats\/biplot.html\">biplot<\/a><\/tt>. We need to use weighted PCA to address the big scale differences among categories. We also need to hide some categories as the output will be too cluttered if we try to display all 38 of them.<\/p>\n<pre class=\"codeinput\">w = 1 .\/ var(rawCounts);                                    <span class=\"comment\">% inverse variable variances<\/span>\r\n[wcoeff, score, latent, tsquared, explained] = <span class=\"keyword\">...<\/span><span class=\"comment\">          % weighted PCA with w<\/span>\r\n    pca(rawCounts, <span class=\"string\">'VariableWeights'<\/span>, w);\r\ncoefforth = diag(sqrt(w)) * wcoeff;                         <span class=\"comment\">% turn wcoeff to orthonormal<\/span>\r\nlabels = cats;                                              <span class=\"comment\">% categories as labels<\/span>\r\nlabels([4,9,10,12,13,15,18,20,21,23,25,27,28,31,32]) = {<span class=\"string\">''<\/span>};<span class=\"comment\">% drop some labels to avoid clutter<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nbiplot(coefforth(:,1:2), <span class=\"string\">'Scores'<\/span>, score(:,1:2), <span class=\"keyword\">...<\/span><span class=\"comment\">        % 2D biplot with the first two comps<\/span>\r\n    <span class=\"string\">'VarLabels'<\/span>, labels)\r\nxlabel(sprintf(<span class=\"string\">'Component 1 (%.2f%%)'<\/span>, explained(1)))       <span class=\"comment\">% add variance explained to x axis label<\/span>\r\nylabel(sprintf(<span class=\"string\">'Component 2 (%.2f%%)'<\/span>, explained(2)))       <span class=\"comment\">% add variance explained to y axis label<\/span>\r\naxis([-0.1 0.6 -0.3 0.4]);                                  <span class=\"comment\">% define axis limits<\/span>\r\ntitle({<span class=\"string\">'Principal Components Analysis'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">                 % add title<\/span>\r\n    sprintf(<span class=\"string\">'Variance Explained %.2f%%'<\/span>,sum(explained(1:2)))})\r\nhtext = findobj(gca,<span class=\"string\">'String'<\/span>,<span class=\"string\">'VEHICLE THEFT'<\/span>);              <span class=\"comment\">% find text object<\/span>\r\nhtext.HorizontalAlignment = <span class=\"string\">'right'<\/span>;                        <span class=\"comment\">% change text alignment<\/span>\r\nhtext = findobj(gca,<span class=\"string\">'String'<\/span>,<span class=\"string\">'ASSAULT'<\/span>);                    <span class=\"comment\">% find text object<\/span>\r\nhtext.Position = [0.2235 0.0909 0];                         <span class=\"comment\">% move label position<\/span>\r\nhtext = findobj(gca,<span class=\"string\">'String'<\/span>,<span class=\"string\">'ROBBERY'<\/span>);                    <span class=\"comment\">% find text object<\/span>\r\nhtext.Position = [0.2148 0.1268 0];                         <span class=\"comment\">% move label position<\/span>\r\nhtext = findobj(gca,<span class=\"string\">'String'<\/span>,<span class=\"string\">'ARSON'<\/span>);                      <span class=\"comment\">% find text object<\/span>\r\nhtext.HorizontalAlignment = <span class=\"string\">'right'<\/span>;                        <span class=\"comment\">% change text alignment<\/span>\r\nhtext = findobj(gca,<span class=\"string\">'String'<\/span>,<span class=\"string\">'EXTORTION'<\/span>);                  <span class=\"comment\">% find text object<\/span>\r\nhtext.HorizontalAlignment = <span class=\"string\">'right'<\/span>;                        <span class=\"comment\">% change text alignment<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_05.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Assault, Robbery and Vehicle Theft<a name=\"5939c406-ea6d-406f-ae9b-6847a2bb5473\"><\/a><\/h4>\n<p>The upper half of the <a href=\"https:\/\/www.mathworks.com\/help\/stats\/biplot.html\">biplot<\/a> seems to show less sophisticated crimes than the lower half, and VANDALISM is more closely related to ARSON, and SUSPICIOUS OCC, and it is also related to LARCENY\/THEFT but LARCENY\/THEFT itself is closer to BAD CHECKS, EXTORTION, FRAUD and SEX OFFENSES FORCIBLE.<\/p>\n<p>You can also see ASSAULT, ROBBERY, and VEHICLE THEFT are also very closely related. Among those, ASSAULT has the largest count of incidents. Maybe that&#8217;s the crime we need to focus on. Let&#8217;s check the top 50 locations for those crimes. As you would expect, you see good overlap of those locations.<\/p>\n<pre class=\"codeinput\">assault = T(T.Category == <span class=\"string\">'ASSAULT'<\/span>, [1,3:5,8:10]);         <span class=\"comment\">% subset T by category<\/span>\r\nvehicle = T(T.Category == <span class=\"string\">'VEHICLE THEFT'<\/span>, [1,3:5,8:10]);   <span class=\"comment\">% subset T by category<\/span>\r\nrobbery = T(T.Category == <span class=\"string\">'ROBBERY'<\/span>, [1,3:5,8:10]);         <span class=\"comment\">% subset T by category<\/span>\r\n\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nusamap(lim.lat, lim.lon);                                   <span class=\"comment\">% set map coordinates<\/span>\r\ngeoshow(A, R)                                               <span class=\"comment\">% display map<\/span>\r\ntopN = 50;                                                  <span class=\"comment\">% get top 50<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nsfcrime_draw_locs(assault,lim.lat,lim.lon,nbins,topN,<span class=\"string\">'r'<\/span>)   <span class=\"comment\">% draw locations in red<\/span>\r\nsfcrime_draw_locs(vehicle,lim.lat,lim.lon,nbins,topN,<span class=\"string\">'g'<\/span>)   <span class=\"comment\">% draw locations in green<\/span>\r\nsfcrime_draw_locs(robbery,lim.lat,lim.lon,nbins,topN,<span class=\"string\">'b'<\/span>)   <span class=\"comment\">% draw locations in blue<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'Assault, Robbery, and Vehicle Theft'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">           % add title<\/span>\r\n    sprintf(<span class=\"string\">'Top %d Locations'<\/span>,topN)})\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_06.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Grand Theft Auto<a name=\"45ebe22b-153e-4ad5-9bbd-213939201da5\"><\/a><\/h4>\n<p>To see if we can use the data for prediction, we need to check the time dimension as well. We can use the weekly interval column to plot the weekly trends. This is strange. VEHICLE THEFT suddenly dropped in 2006! It&#8217;s time to dig deeper.<\/p>\n<pre class=\"codeinput\">figure                                                      <span class=\"comment\">% new figure<\/span>\r\n[G, t] = findgroups(assault.Week);                          <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel, assault.Week, G);               <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\n[G, t] = findgroups(vehicle.Week);                          <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel, vehicle.Week, G);               <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\n[G, t] = findgroups(robbery.Week);                          <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel,robbery.Week, G);                <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'ASSAULT, ROBBERY, VEHICLE THEFT - Weekly'<\/span>)           <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count of Incidence Reports'<\/span>)                        <span class=\"comment\">% add axis label<\/span>\r\nlegend(<span class=\"string\">'ASSAULT'<\/span>,<span class=\"string\">'VEHICLE THEFT'<\/span>, <span class=\"string\">'ROBBERY'<\/span>)                <span class=\"comment\">% add legend<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_07.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Eureka!<a name=\"a9010c1d-cc89-4d0a-8eaa-4dbb76846839\"><\/a><\/h4>\n<p>Looking at descriptions, you notice that recovered vehicles are also reported as incidents. For some reason, it was the incidents of recovered vehicles that dropped off since 2006. Such a sudden change is usually caused by a change in reporting criteria, but it looks like half of the stolen cars were often recoveverd eventually in the good old days? Are they still recovered but not reported, or are they no longer recovered? <a href=\"https:\/\/www.bostonglobe.com\/metro\/2013\/03\/20\/stolen-car-rate-plunges\/X3E5DQwKP5p4JonryLxUnO\/story.html\">This Boston Globe article<\/a> mentions that &#8220;Of the 10 vehicles most frequently stolen, all were made before 2007,&#8221; and credit new antitheft devices for the decline, and say those still stolen are shipped overseas (not likely to be recovered).<\/p>\n<p>Anyhow, this time series analysis shows that there is a lot more going on than just time and location in crime. We could deal with the change in car theft reporting by omitting the data prior to 2006, but we would have to redo the heat map and run PCA again. Perhaps the Broken Windows Theory is not that useful as the basis of our analysis.<\/p>\n<pre class=\"codeinput\">isRecovered = strfind(vehicle.Descript,<span class=\"string\">'RECOVERED'<\/span>);        <span class=\"comment\">% find 'RECOVERED' in descrption<\/span>\r\nisRecovered = ~cellfun(@(x) isempty(x), isRecovered);       <span class=\"comment\">% is recovered if not empty<\/span>\r\n[G, t] = findgroups(vehicle.Week(~isRecovered));            <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel, vehicle.Week(~isRecovered), G); <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\n[G, t] = findgroups(vehicle.Week(isRecovered));             <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel, vehicle.Week(isRecovered), G);  <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle(<span class=\"string\">'VEHICLE THEFT - Weekly'<\/span>)                             <span class=\"comment\">% add title<\/span>\r\nylabel(<span class=\"string\">'Count of Incidence Reports'<\/span>)                        <span class=\"comment\">% add axis label<\/span>\r\nlegend(<span class=\"string\">'UNRECOVRED'<\/span>, <span class=\"string\">'RECOVERED'<\/span>)                           <span class=\"comment\">% add legend<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_08.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Another Kind of Broken Windows<a name=\"37860ff2-e94f-409f-ac30-ef80f954bfd6\"><\/a><\/h4>\n<p>Loren shared a NY Times article <a href=\"http:\/\/www.nytimes.com\/2016\/04\/25\/us\/san-francisco-torn-as-some-see-street-behavior-worsen.html?_r=0\">San Francisco Torn as Some See \u2018Street Behavior\u2019 Worsen<\/a> with me. It is about the rise of smash-and-grab thefts from vehicles from the perspective of a local resident at Lombard Street, famous for its zigzags. The article says victims are often tourists and out-of-town visitors. LARCENY\/THEFT is clearly on the rise, and it indeed comes mainly from Auto-related thefts.<\/p>\n<pre class=\"codeinput\">larceny = T(T.Category == <span class=\"string\">'LARCENY\/THEFT'<\/span>, [1,3:5,8:10]);   <span class=\"comment\">% subset T by category<\/span>\r\nisAuto = strfind(larceny.Descript,<span class=\"string\">'LOCKED'<\/span>);                <span class=\"comment\">% find 'LOCKED' in descrption<\/span>\r\nisAuto = ~cellfun(@(x) isempty(x), isAuto);                 <span class=\"comment\">% is auto if not empty<\/span>\r\nfigure                                                      <span class=\"comment\">% new figure<\/span>\r\nsubplot(1,2,1)                                              <span class=\"comment\">% subplot 1<\/span>\r\n[G, t] = findgroups(larceny.Week(isAuto));                  <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel, larceny.Week(isAuto), G);       <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\ntitle(<span class=\"string\">'LARCENY\/THEFT, Auto'<\/span>)                                <span class=\"comment\">% add title<\/span>\r\nsubplot(1,2,2)                                              <span class=\"comment\">% subplot 2<\/span>\r\n[G, t] = findgroups(larceny.Week(~isAuto));                 <span class=\"comment\">% group by weekly intervals<\/span>\r\nweekly = splitapply(@numel, larceny.Week(~isAuto), G);      <span class=\"comment\">% get weekly count<\/span>\r\nplot(t, weekly)                                             <span class=\"comment\">% plot weekly count<\/span>\r\ntitle(<span class=\"string\">'LARCENY\/THEFT, Non-Auto'<\/span>)                            <span class=\"comment\">% add title<\/span>\r\nylim([0 500])                                               <span class=\"comment\">% adjust y-axis scale<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_09.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p>When you map the top 100 locations of car-related LARCENY\/THEFT, Lombard Street doesn&#8217;t make the cut, but the distribution indeed looks different from VEHICLE THEFT. You see several locations near famous tourist attractions like Fisherman&#8217;s Wharf and SoMa, known for the concentration of tech companies. It appears they do go after tourists and business vistors who are not familiar with the lay of the land. Now we found factors that affect one type of crime!<\/p>\n<pre class=\"codeinput\">figure                                                      <span class=\"comment\">% new figure<\/span>\r\nusamap(lim.lat, lim.lon);                                   <span class=\"comment\">% set map coordinates<\/span>\r\ngeoshow(A, R)                                               <span class=\"comment\">% display map<\/span>\r\ntopN = 100;                                                 <span class=\"comment\">% get top 100<\/span>\r\nhold <span class=\"string\">on<\/span>                                                     <span class=\"comment\">% don't overwrite<\/span>\r\nsfcrime_draw_locs(larceny(isAuto,:),lim.lat,lim.lon, <span class=\"keyword\">...<\/span><span class=\"comment\">    % draw locations in red<\/span>\r\n    nbins,topN,<span class=\"string\">'r'<\/span>)\r\nsfcrime_draw_locs(vehicle,lim.lat,lim.lon,nbins,topN,<span class=\"string\">'b'<\/span>)   <span class=\"comment\">% draw locations in blue<\/span>\r\nplotm(37.802139, -122.41874, <span class=\"string\">'+g'<\/span>)                          <span class=\"comment\">% add landmark<\/span>\r\nplotm(37.808119, -122.41790, <span class=\"string\">'+g'<\/span>)                          <span class=\"comment\">% add landmark<\/span>\r\nplotm(37.7808297, -122.4301075, <span class=\"string\">'+g'<\/span>)                       <span class=\"comment\">% add landmark<\/span>\r\nplotm(37.7842048, -122.3969652, <span class=\"string\">'+g'<\/span>)                       <span class=\"comment\">% add landmark<\/span>\r\nplotm(37.7786559, -122.5106296, <span class=\"string\">'+g'<\/span>)                       <span class=\"comment\">% add landmark<\/span>\r\nplotm(37.8038433, -122.4418518, <span class=\"string\">'+g'<\/span>)                       <span class=\"comment\">% add landmark<\/span>\r\nhold <span class=\"string\">off<\/span>                                                    <span class=\"comment\">% restore default<\/span>\r\ntitle({<span class=\"string\">'LARCENY\/THEFT - Auto vs. VEHICLE THEFT'<\/span>; <span class=\"keyword\">...<\/span><span class=\"comment\">        % add title<\/span>\r\n    sprintf(<span class=\"string\">'Top %d Locations'<\/span>,topN)})\r\ntextm(37.802139, -122.41574, <span class=\"string\">'Lombard Street'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">          % annotate landmark<\/span>\r\n    <span class=\"string\">'Color'<\/span>, <span class=\"string\">'g'<\/span>)\r\ntextm(37.8141187, -122.43450, <span class=\"string\">'Fisherman''s Wharf'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">     % annotate landmark<\/span>\r\n    <span class=\"string\">'Color'<\/span>, <span class=\"string\">'g'<\/span>)\r\ntextm(37.7808297, -122.4651075, <span class=\"string\">'Japan Town'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">           % annotate landmark<\/span>\r\n    <span class=\"string\">'Color'<\/span>, <span class=\"string\">'g'<\/span>)\r\ntextm(37.7842048, -122.3949652, <span class=\"string\">'SoMa'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">                 % annotate landmark<\/span>\r\n    <span class=\"string\">'Color'<\/span>, <span class=\"string\">'g'<\/span>)\r\ntextm(37.7786559, -122.5086296, <span class=\"string\">'Sutro Baths'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">          % annotate landmark<\/span>\r\n    <span class=\"string\">'Color'<\/span>, <span class=\"string\">'g'<\/span>)\r\ntextm(37.8088629, -122.4628518, <span class=\"string\">'Marina District'<\/span>, <span class=\"keyword\">...<\/span><span class=\"comment\">      % annotate landmark<\/span>\r\n    <span class=\"string\">'Color'<\/span>, <span class=\"string\">'g'<\/span>)\r\ntextm(37.715564, -122.475662, <span class=\"keyword\">...<\/span><span class=\"comment\">                           % add note<\/span>\r\n    {<span class=\"string\">'Red:  LARCENY\/THEFT - Auto'<\/span>,<span class=\"string\">'Blue: VEHICLE THEFT'<\/span>}, <span class=\"string\">'BackgroundColor'<\/span>,<span class=\"string\">'w'<\/span>)\r\n<\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/predictive_policingLS3_10.png\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<h4>Summary<a name=\"1e3413d3-060a-4582-985b-922179434d61\"><\/a><\/h4>\n<p>We looked at the Broken Windows Theory as a starting point of this exploration, but the SFPD data doesn&#8217;t provide easily detectable correlation among crimes that you would expect based on this theory, and time series analysis shows that there is a lot more going on than just time and location that affects crime. When we focused on specific types of crime that are in decline or on the rise, and cross referenced those with external data sources, we learned a lot more. This points to a potential for enriching this dataset with data from other sources like demographics to improve predictive capability, but it also creates <a href=\"http:\/\/www.nytimes.com\/roomfordebate\/2015\/11\/18\/can-predictive-policing-be-ethical-and-effective\">a dilemma of civil rights violation<\/a> if not done carefully. British Transit Police got creative and did an interesting experiment to <a href=\"https:\/\/www.minnpost.com\/second-opinion\/2013\/05\/brits-test-use-watching-eyes-posters-deter-bike-thieves\">place &#8220;watching eyes&#8221; posters to deter bike thefts<\/a>. This is a good creative use of insights from doing this type of analysis.<\/p>\n<p>By the way, I couldn&#8217;t help playing with the <a href=\"https:\/\/www.mathworks.com\/help\/matlab\/visualize\/defining-scenes-with-camera-graphics.html\">camera object in MATLAB Graphics<\/a>. Here is a fun flyover animation of the San Francisco crime scene! Check out <a href=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/sfcrime_flyover.m\">sfcrime_flyover.m<\/a> for more details.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/rotateVand.gif\" alt=\"\" hspace=\"5\" vspace=\"5\" \/><\/p>\n<p>Hopefully you now see how you can fight crime with data. Download this post in standard MATLAB script (click &#8220;_Get the MATLAB code_&#8221; below) and use it as the starting point for your exploration or even join Kaggle competition. If you find anything interesting, please let us know!<\/p>\n<p><script>\/\/ <![CDATA[\nfunction grabCode_a67d36817d9b4eab82b7e14c6a36016a() {\n        \/\/ Remember the title so we can use it in the new page\n        title = document.title;\n\n        \/\/ Break up these strings so that their presence\n        \/\/ in the Javascript doesn't mess up the search for\n        \/\/ the MATLAB code.\n        t1='a67d36817d9b4eab82b7e14c6a36016a ' + '##### ' + 'SOURCE BEGIN' + ' #####';\n        t2='##### ' + 'SOURCE END' + ' #####' + ' a67d36817d9b4eab82b7e14c6a36016a';\n    \n        b=document.getElementsByTagName('body')[0];\n        i1=b.innerHTML.indexOf(t1)+t1.length;\n        i2=b.innerHTML.indexOf(t2);\n \n        code_string = b.innerHTML.substring(i1, i2);\n        code_string = code_string.replace(\/REPLACE_WITH_DASH_DASH\/g,'--');\n\n        \/\/ Use \/x3C\/g instead of the less-than character to avoid errors \n        \/\/ in the XML parser.\n        \/\/ Use '\\x26#60;' instead of '<' so that the XML parser\n        \/\/ doesn't go ahead and substitute the less-than character. \n        code_string = code_string.replace(\/\\x3C\/g, '\\x26#60;');\n\n        copyright = 'Copyright 2016 The MathWorks, Inc.';\n\n        w = window.open();\n        d = w.document;\n        d.write('\n\n\n\n<pre>\\n');\r\n        d.write(code_string);\r\n\r\n        \/\/ Add copyright line at the bottom if specified.\r\n        if (copyright.length > 0) {\r\n            d.writeln('');\r\n            d.writeln('%%');\r\n            if (copyright.length > 0) {\r\n                d.writeln('% _' + copyright + '_');\r\n            }\r\n        }\r\n\r\n        d.write('<\/pre>\n\n\n\n\n\\n');\n\n        d.title = title + ' (MATLAB code)';\n        d.close();\n    }\n\/\/ ]]><\/script><\/p>\n<p style=\"text-align: right; font-size: xx-small; font-weight: lighter; font-style: italic; color: gray;\">\n<a><span style=\"font-size: x-small; font-style: italic;\">Get<br \/>\nthe MATLAB code<noscript>(requires JavaScript)<\/noscript><\/span><\/a><\/p>\n<p>Published with MATLAB\u00ae R2016a<\/p>\n<\/div>\n<p><!--\na67d36817d9b4eab82b7e14c6a36016a ##### SOURCE BEGIN #####\n%% Fighting Crime with Predictive Policing\n% I recently noticed that there is a Kaggle competition titled <https:\/\/www.kaggle.com\/c\/sf-crime % San Francisco Crime Classification> which asks you to predict the category of\n% crimes that occurred in San Franciso from 1\/1\/2003 to 5\/13\/2015 in theSFPD Crime\n% Incident Reporting system. The goal of the competition is to predict the category\n% of crime that occurred based on time and location.\n%\n% This reminded me of the movie <http:\/\/www.imdb.com\/title\/tt0181689\/ Minority % Report> in which a special unit of police arrests people before they commit\n% crimes, but that's <https:\/\/en.wikipedia.org\/wiki\/Science_fiction SciFi>. A\n% more realistic approach is to *deter crime* by analyzing the past data to predict\n% when and where crimes take place and deploy law enforcement resources to such\n% hot spots. This approach is known as <https:\/\/en.wikipedia.org\/wiki\/Predictive_policing % predictive policing>.\n%\n% <<484342181s.jpg>>\n%\n% Let's take a look at the SFPD data to see what we can learn from it. This\n% has nothing to do with the goal of the competition, but Kaggle is \" _also\n% encouraging you to explore the dataset visually_ \". So why not?\n%% The SFPD Crime Incident Report Data\n% You need to first download the zipped data files from Kaggle website, unzip\n% and place them into your current folder. Let's load the data see what attributes\n% are included.\n\nT = readtable('train.csv','Format','%D%C%q%C%C%q%q%f%f');   % load data from csv\nweek = {'Sunday','Monday','Tuesday','Wednesday', ...        % define order\n'Thursday','Friday','Saturday'};\nT.(4) = reordercats(T.(4), week);                           % reorder categories\nT.(6) = categorical(T.(6));                                 % convert to categorical\nT.Properties.VariableNames                                  % show variable names\n%%\n% Let's also add a new column to assign dates to specific weekly internvals\n% for time series analysis.\n\nt = datetime('2003-1-5') + days(0:7:4515);                  % weekly date intervals\nweeks = NaT(size(T.Dates));                                 % empty datetime array\nfor i = 1:length(t) - 1                                     % loop over weekly intervals\nweeks(T.Dates >= t(i) & T.Dates < t(i+1)) = t(i);       % dates to weekly intervals\nend\nT.Week = weeks;                                             % add weekly intervals\n%%\n% Now let's see what is included in 'Category'. There are 38 categories\n% of crime and non-crime incidents, such as ARSON, ASSAULT, and BAD CHECKS, but\n% which ones should we focus on?\n\nT.Category = mergecats(T.Category,{'TRESPASS','TREA'});     % merge mislabeled categories\ntab = tabulate(T.Category);                                 % tabulate categories\n[Count, idx] = sort(cell2mat(tab(:,2)),'descend');          % sort by category total\nfigure                                                      % new figure\nbar(Count)                                                  % plot bar chart\nax = gca;                                                   % get current axes handle\nax.XTick = 1:size(tab, 1);                                  % use categories as tick\nax.XTickLabel = tab(idx,1);                                 % rename tick labels\nax.XTickLabelRotation = -90;                                % rotate text vertically\n%%\n% *Broken Windows Theory*\n%\n% <https:\/\/en.wikipedia.org\/wiki\/Broken_windows_theory This theory>, embraced\n% by New York City Police Commissioner (then NYPD Chief) <https:\/\/en.wikipedia.org\/wiki\/William_Bratton % William Bratton> in the 1990s, says that leaving broken windows unrepaired leads\n% to more vandalism and greater social disorder in the neighborhood, because it\n% signals that no one cares there. Up to that pointhe NYPD focused on solving\n% serious crimes. However, the theory suggested that cracking down on petty crimes\n% might lead to reduction in more serious ones. While New York saw a drop in the\n% crime rate under Bratton, this theory is not without critics because it also\n% led to the controversial <https:\/\/en.wikipedia.org\/wiki\/Stop-and-frisk_in_New_York_City % stop and frisk> practice.\n%\n% Perhaps this theory provides one starting point for our exploration. Does\n% the SFPD data show a correlation between a petty crime like vandalism and other\n% more serious crimes? Let's start with bivariate histogram using |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/histogram2.html % histogram2>| to plot the vandalism incidents by location. Check |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/sfcrime_load_map.m % sfcrime_load_map.m>| to see how you retrieve a raster map from <https:\/\/www.mathworks.com\/help\/map\/web-map-service-1.html % Web Map Service>.\n\nsfcrime_load_map                                            % load map from WMS\nvandalism = T(T.Category == 'VANDALISM', [1,3:5,8:10]);     % subset T by category\nnbins = 100;                                                % number of bins\nxbinedges = linspace(lim.lon(1),lim.lon(2),nbins);          % x bin edges\nybinedges = linspace(lim.lat(1),lim.lat(2),nbins);          % y bin edges\nmap = flipud(A);                                            % flip image\nfigure                                                      % new figure\ncolormap cool                                               % set colormap\nhistogram2(vandalism.X, vandalism.Y, ...                    % plot 3D bivariate\nxbinedges, ybinedges, ...                               % histogram\n'FaceColor', 'flat', ...\n'FaceAlpha', 0.5, 'EdgeAlpha', 0.5)\nhold on                                                     % don't overwrite\nimage(lim.lon,lim.lat,map)                                  % add the map\nhold off                                                    % restore default\nax = gca;                                                   % get current axes handle\nax.CLim = [0 100];                                          % color axis scaling\ntitle({'San Francisco Crime Map'; ...                       % add title\n'Vandalism: Jan 2003 - May 2015'})\nzlabel('Count of Police Reports')                           % add axis label\ntext(-122.52,37.82, 300, 'Golden Gate Bridge')              % annotate landmark\ntext(-122.4,37.82, 200, 'Bay Bridge')                       % annotate landmark\n%%\n% You can see that those incidents are highly concentrated in several hot\n% spots indicated by the tall magenta bars. There is one particularly tall bar\n% that sticks high above the rest. Where is it? We can plot the top 50 locations\n% of high vandalism incidents with the highest spot marked as \"#1\". Check |<https:\/\/blogs.mathworks.com\/images\/loren\/2016\/sfcrime_draw_locs.m % sfcrime_draw_locs.m>| to see the details.\n\nfigure                                                      % new figure\nusamap(lim.lat, lim.lon);                                   % set map coordinates\ngeoshow(A, R)                                               % display map\nsfcrime_draw_locs(vandalism,lim.lat,lim.lon,nbins,50,'m')   % draw top 50 locations\ntitle({'Vandalism: Jan 2013 - May 2015';'Top 50 Locations'})% add title\n%% Crime Heat Map\n% If we process other categories of crime with the same grid, we can create\n% a matrix of crimes by location and we can plot it as a heat map with |<https:\/\/www.mathworks.com\/help\/matlab\/ref\/imagesc.html % imagesc>| to make comparison easier.\n%\n% Bright horizontal stripes indicate a location where different types of\n% crimes are committed. There is one particularly bright stripe with ASSAULT,\n% DRUG\/NARCOTIC, LARCENY\/THEFT, NON-CRIMINAL, OTHER OFFENSES, and WARRANTS. ROBBERY,\n% SUSPICIOUS OCC, and VANDALISM also appear in a lighter shade.\n%\n% If you look at LARCENY\/THEFT, it doesn't necessarily form rows of stripes\n% everywhere it is bright. If you are going to steal, it may be worthwhile to\n% go to places you find high value targets.\n\ncats = categories(T.Category);                              % extract categories\nrawCounts = zeros((nbins-1)^2,length(cats));                % set up an accumulator\nfor i = 1:length(cats)                                      % loop over categories\ndata = T(T.Category == cats{i}, 8:9);                   % subset T by category\n[N,~,~] = histcounts2(data.X, data.Y, ...               % get bivariate histogram\nxbinedges, ybinedges);                              % bin counts\nrawCounts(:,i) = N(:);                                  % add to accumulator\nend                                                         % as a vector\n\nfigure                                                      % new figure\nimagesc(rawCounts)                                          % plot matrix as an image\nax = gca;                                                   % get current axes handle\nax.CLim = [0 200];                                          % color axis scaling\nax.XTick = 1:length(cats);                                  % use categories as tick\nax.XTickLabel = cats;                                       % rename tick labels\nax.XTickLabelRotation = -90;                                % rotate text vertically\nylabel('Locations')                                         % add axis label\ntitle('SF Crime Heat Map by Location')                      % add title\ncolorbar                                                    % add colorbar\n%% Principal Components Analysis\n% We can use <https:\/\/www.mathworks.com\/help\/stats\/principal-component-analysis-pca.html % Principal Component Analysis> uisng |<https:\/\/www.mathworks.com\/help\/stats\/pca.html % pca>| to get a better sense of the relationship among different categories and\n% visualize the result with |<https:\/\/www.mathworks.com\/help\/stats\/biplot.html % biplot>|. We need to use weighted PCA to address the big scale differences among\n% categories. We also need to hide some categories as the output will be too cluttered\n% if we try to display all 38 of them.\n\nw = 1 .\/ var(rawCounts);                                    % inverse variable variances\n[wcoeff, score, latent, tsquared, explained] = ...          % weighted PCA with w\npca(rawCounts, 'VariableWeights', w);\ncoefforth = diag(sqrt(w)) * wcoeff;                         % turn wcoeff to orthonormal\nlabels = cats;                                              % categories as labels\nlabels([4,9,10,12,13,15,18,20,21,23,25,27,28,31,32]) = {''};% drop some labels to avoid clutter\nfigure                                                      % new figure\nbiplot(coefforth(:,1:2), 'Scores', score(:,1:2), ...        % 2D biplot with the first two comps\n'VarLabels', labels)\nxlabel(sprintf('Component 1 (%.2f%%)', explained(1)))       % add variance explained to x axis label\nylabel(sprintf('Component 2 (%.2f%%)', explained(2)))       % add variance explained to y axis label\naxis([-0.1 0.6 -0.3 0.4]);                                  % define axis limits\ntitle({'Principal Components Analysis'; ...                 % add title\nsprintf('Variance Explained %.2f%%',sum(explained(1:2)))})\nhtext = findobj(gca,'String','VEHICLE THEFT');              % find text object\nhtext.HorizontalAlignment = 'right';                        % change text alignment\nhtext = findobj(gca,'String','ASSAULT');                    % find text object\nhtext.Position = [0.2235 0.0909 0];                         % move label position\nhtext = findobj(gca,'String','ROBBERY');                    % find text object\nhtext.Position = [0.2148 0.1268 0];                         % move label position\nhtext = findobj(gca,'String','ARSON');                      % find text object\nhtext.HorizontalAlignment = 'right';                        % change text alignment\nhtext = findobj(gca,'String','EXTORTION');                  % find text object\nhtext.HorizontalAlignment = 'right';                        % change text alignment\n%% Assault, Robbery and Vehicle Theft\n% The upper half of the <https:\/\/www.mathworks.com\/help\/stats\/biplot.html biplot>\n% seems to show less sophisticated crimes than the lower half, and VANDALISM is\n% more closely related to ARSON, and SUSPICIOUS OCC, and it is also related to\n% LARCENY\/THEFT but LARCENY\/THEFT itself is closer to BAD CHECKS, EXTORTION, FRAUD\n% and SEX OFFENSES FORCIBLE.\n%\n% You can also see ASSAULT, ROBBERY, and VEHICLE THEFT are also very closely\n% related. Among those, ASSAULT has the largest count of incidents. Maybe that's\n% the crime we need to focus on. Let's check the top 50 locations for those crimes.\n% As you would expect, you see good overlap of those locations.\n\nassault = T(T.Category == 'ASSAULT', [1,3:5,8:10]);         % subset T by category\nvehicle = T(T.Category == 'VEHICLE THEFT', [1,3:5,8:10]);   % subset T by category\nrobbery = T(T.Category == 'ROBBERY', [1,3:5,8:10]);         % subset T by category\n\nfigure                                                      % new figure\nusamap(lim.lat, lim.lon);                                   % set map coordinates\ngeoshow(A, R)                                               % display map\ntopN = 50;                                                  % get top 50\nhold on                                                     % don't overwrite\nsfcrime_draw_locs(assault,lim.lat,lim.lon,nbins,topN,'r')   % draw locations in red\nsfcrime_draw_locs(vehicle,lim.lat,lim.lon,nbins,topN,'g')   % draw locations in green\nsfcrime_draw_locs(robbery,lim.lat,lim.lon,nbins,topN,'b')   % draw locations in blue\nhold off                                                    % restore default\ntitle({'Assault, Robbery, and Vehicle Theft'; ...           % add title\nsprintf('Top %d Locations',topN)})\n%% Grand Theft Auto\n% To see if we can use the data for prediction, we need to check the time dimension\n% as well. We can use the weekly interval column to plot the weekly trends. This\n% is strange. VEHICLE THEFT suddenly dropped in 2006! It's time to dig deeper.\n\nfigure                                                      % new figure\n[G, t] = findgroups(assault.Week);                          % group by weekly intervals\nweekly = splitapply(@numel, assault.Week, G);               % get weekly count\nplot(t, weekly)                                             % plot weekly count\nhold on                                                     % don't overwrite\n[G, t] = findgroups(vehicle.Week);                          % group by weekly intervals\nweekly = splitapply(@numel, vehicle.Week, G);               % get weekly count\nplot(t, weekly)                                             % plot weekly count\n[G, t] = findgroups(robbery.Week);                          % group by weekly intervals\nweekly = splitapply(@numel,robbery.Week, G);                % get weekly count\nplot(t, weekly)                                             % plot weekly count\nhold off                                                    % restore default\ntitle('ASSAULT, ROBBERY, VEHICLE THEFT - Weekly')           % add title\nylabel('Count of Incidence Reports')                        % add axis label\nlegend('ASSAULT','VEHICLE THEFT', 'ROBBERY')                % add legend\n%% Eureka!\n% Looking at descriptions, you notice that recovered vehicles are also reported\n% as incidents. For some reason, it was the incidents of recovered vehicles that\n% dropped off since 2006. Such a sudden change is usually caused by a change in\n% reporting criteria, but it looks like half of the stolen cars were often recoveverd\n% eventually in the good old days? Are they still recovered but not reported,\n% or are they no longer recovered? <https:\/\/www.bostonglobe.com\/metro\/2013\/03\/20\/stolen-car-rate-plunges\/X3E5DQwKP5p4JonryLxUnO\/story.html % This Boston Globe article> mentions that \"Of the 10 vehicles most frequently\n% stolen, all were made before 2007,\" and credit new antitheft devices for the\n% decline, and say those still stolen are shipped overseas (not likely to be recovered).\n%\n% Anyhow, this time series analysis shows that there is a lot more going\n% on than just time and location in crime. We could deal with the change in car\n% theft reporting by omitting the data prior to 2006, but we would have to redo\n% the heat map and run PCA again. Perhaps the Broken Windows Theory is not that\n% useful as the basis of our analysis.\n\nisRecovered = strfind(vehicle.Descript,'RECOVERED');        % find 'RECOVERED' in descrption\nisRecovered = ~cellfun(@(x) isempty(x), isRecovered);       % is recovered if not empty\n[G, t] = findgroups(vehicle.Week(~isRecovered));            % group by weekly intervals\nweekly = splitapply(@numel, vehicle.Week(~isRecovered), G); % get weekly count\nplot(t, weekly)                                             % plot weekly count\nhold on                                                     % don't overwrite\n[G, t] = findgroups(vehicle.Week(isRecovered));             % group by weekly intervals\nweekly = splitapply(@numel, vehicle.Week(isRecovered), G);  % get weekly count\nplot(t, weekly)                                             % plot weekly count\nhold off                                                    % restore default\ntitle('VEHICLE THEFT - Weekly')                             % add title\nylabel('Count of Incidence Reports')                        % add axis label\nlegend('UNRECOVRED', 'RECOVERED')                           % add legend\n%% Another Kind of Broken Windows\n% Loren shared a NY Times article <http:\/\/www.nytimes.com\/2016\/04\/25\/us\/san-francisco-torn-as-some-see-street-behavior-worsen.html % San Francisco Torn as Some See \u00e2\u20ac\u02dcStreet Behavior\u00e2\u20ac\u2122 Worsen> with me. It is about\n% the rise of smash-and-grab thefts from vehicles from the perspective of a local\n% resident at Lombard Street, famous for its zigzags. The article says victims\n% are often tourists and out-of-town visitors. LARCENY\/THEFT is clearly on the\n% rise, and it indeed comes mainly from Auto-related thefts.\n\nlarceny = T(T.Category == 'LARCENY\/THEFT', [1,3:5,8:10]);   % subset T by category\nisAuto = strfind(larceny.Descript,'LOCKED');                % find 'LOCKED' in descrption\nisAuto = ~cellfun(@(x) isempty(x), isAuto);                 % is auto if not empty\nfigure                                                      % new figure\nsubplot(1,2,1)                                              % subplot 1\n[G, t] = findgroups(larceny.Week(isAuto));                  % group by weekly intervals\nweekly = splitapply(@numel, larceny.Week(isAuto), G);       % get weekly count\nplot(t, weekly)                                             % plot weekly count\ntitle('LARCENY\/THEFT, Auto')                                % add title\nsubplot(1,2,2)                                              % subplot 2\n[G, t] = findgroups(larceny.Week(~isAuto));                 % group by weekly intervals\nweekly = splitapply(@numel, larceny.Week(~isAuto), G);      % get weekly count\nplot(t, weekly)                                             % plot weekly count\ntitle('LARCENY\/THEFT, Non-Auto')                            % add title\nylim([0 500])                                               % adjust y-axis scale\nhold off                                                    % restore default\n%%\n% When you map the top 100 locations of car-related LARCENY\/THEFT, Lombard\n% Street doesn't make the cut, but the distribution indeed looks different from\n% VEHICLE THEFT. You see several locations near famous tourist attractions like\n% Fisherman's Wharf and SoMa, known for the concentration of tech companies. It\n% appears they do go after tourists and business vistors who are not familiar\n% with the lay of the land. Now we found factors that affect one type of crime!\n\nfigure                                                      % new figure\nusamap(lim.lat, lim.lon);                                   % set map coordinates\ngeoshow(A, R)                                               % display map\ntopN = 100;                                                 % get top 100\nhold on                                                     % don't overwrite\nsfcrime_draw_locs(larceny(isAuto,:),lim.lat,lim.lon, ...    % draw locations in red\nnbins,topN,'r')\nsfcrime_draw_locs(vehicle,lim.lat,lim.lon,nbins,topN,'b')   % draw locations in blue\nplotm(37.802139, -122.41874, '+g')                          % add landmark\nplotm(37.808119, -122.41790, '+g')                          % add landmark\nplotm(37.7808297, -122.4301075, '+g')                       % add landmark\nplotm(37.7842048, -122.3969652, '+g')                       % add landmark\nplotm(37.7786559, -122.5106296, '+g')                       % add landmark\nplotm(37.8038433, -122.4418518, '+g')                       % add landmark\nhold off                                                    % restore default\ntitle({'LARCENY\/THEFT - Auto vs. VEHICLE THEFT'; ...        % add title\nsprintf('Top %d Locations',topN)})\ntextm(37.802139, -122.41574, 'Lombard Street', ...          % annotate landmark\n'Color', 'g')\ntextm(37.8141187, -122.43450, 'Fisherman''s Wharf', ...     % annotate landmark\n'Color', 'g')\ntextm(37.7808297, -122.4651075, 'Japan Town', ...           % annotate landmark\n'Color', 'g')\ntextm(37.7842048, -122.3949652, 'SoMa', ...                 % annotate landmark\n'Color', 'g')\ntextm(37.7786559, -122.5086296, 'Sutro Baths', ...          % annotate landmark\n'Color', 'g')\ntextm(37.8088629, -122.4628518, 'Marina District', ...      % annotate landmark\n'Color', 'g')\ntextm(37.715564, -122.475662, ...                           % add note\n{'Red:  LARCENY\/THEFT - Auto','Blue: VEHICLE THEFT'}, 'BackgroundColor','w')\n%% Summary\n% We looked at the Broken Windows Theory as a starting point of this exploration,\n% but the SFPD data doesn't provide easily detectable correlation among crimes\n% that you would expect based on this theory, and time series analysis shows that\n% there is a lot more going on than just time and location that affects crime.\n% When we focused on specific types of crime that are in decline or on the rise,\n% and cross referenced those with external data sources, we learned a lot more.\n% This points to a potential for enriching this dataset with data from other sources\n% like demographics to improve predictive capability, but it also creates <http:\/\/www.nytimes.com\/roomfordebate\/2015\/11\/18\/can-predictive-policing-be-ethical-and-effective % a dilemma of civil rights violation> if not done carefully. British Transit\n% Police got creative and did an interesting experiment to <https:\/\/www.minnpost.com\/second-opinion\/2013\/05\/brits-test-use-watching-eyes-posters-deter-bike-thieves % place \"watching eyes\" posters to deter bike thefts>. This is a good creative\n% use of insights from doing this type of analysis.\n%\n% By the way, I couldn't help playing with the <https:\/\/www.mathworks.com\/help\/matlab\/visualize\/defining-scenes-with-camera-graphics.html % camera object in MATLAB Graphics>. Here is a fun flyover animation of the San\n% Francisco crime scene! Check out <https:\/\/blogs.mathworks.com\/images\/loren\/2016\/sfcrime_flyover.m % sfcrime_flyover.m> for more details.\n%\n% <<rotateVand.gif>>\n%\n% Hopefully you now see how you can fight crime with data. Download this\n% post in standard MATLAB script (click \"_Get the MATLAB code_\" below) and use\n% it as the starting point for your exploration or even join Kaggle competition.\n% If you find anything interesting, please let us know!\n%\n%\n##### SOURCE END ##### a67d36817d9b4eab82b7e14c6a36016a\n--><\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"overview-image\"><img decoding=\"async\"  class=\"img-responsive\" src=\"https:\/\/blogs.mathworks.com\/images\/loren\/2016\/rotateVand.gif\" onError=\"this.style.display ='none';\" \/><\/div>\n<p>&#8230; <a class=\"read-more\" href=\"https:\/\/blogs.mathworks.com\/loren\/2016\/05\/27\/fighting-crime-with-predictive-policing\/\">read more >><\/a><\/p>\n","protected":false},"author":39,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[61],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1512"}],"collection":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/comments?post=1512"}],"version-history":[{"count":9,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1512\/revisions"}],"predecessor-version":[{"id":2117,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/posts\/1512\/revisions\/2117"}],"wp:attachment":[{"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/media?parent=1512"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/categories?post=1512"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.mathworks.com\/loren\/wp-json\/wp\/v2\/tags?post=1512"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}