Where do aggravated assaults occur in Chicago? Can we predict when and where an assault will happen? Figure 1 shows the assault occurrence density, computed with a 0.25-mile search radius.
In this project, I first overlay a set of candidate factors using weights that I assumed, and then construct a risk terrain model. The risk terrain model is an overlay of predictor variables, including proximity variables based on Euclidean distance and density variables based on kernel density. The weight of each significant factor is obtained by fitting a Poisson regression model to Chicago's 2014 assault crime data.
I. Decision Factors and Overlay Map
Based on personal experience, I pick the five decision factors that I expect to influence the assault rate most and assign a weight to each, as shown below:
|Decision Factor|Weight|
|Distance to Street Lights Out|0.3|
|Distance to Bars|0.25|
|Distance to Bus Stops|0.2|
|Distance to CBD Area|0.15|
|Distance to Abandoned Buildings|0.1|
I then create a weighted overlay map (figure 2) to show the estimated likelihood of assault. In the overlay, the likelihood decreases with distance from the city center. Comparing it with figure 1, which shows the actual assault occurrence pattern, makes it clear that this estimate is not accurate. To improve the estimate, I use a Poisson regression model.
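The weighted overlay can be sketched for a single raster cell as follows. This is a minimal illustration rather than the GIS workflow behind figure 2: it assumes each distance layer has already been rescaled to a 0 to 1 score (1 meaning closest to the feature), and the `weighted_overlay` helper and the cell values are hypothetical. Only the weights come from the table.

```python
# Weights from the decision-factor table; they sum to 1.
WEIGHTS = {
    "street_lights_out": 0.30,
    "bars": 0.25,
    "bus_stops": 0.20,
    "cbd": 0.15,
    "abandoned_buildings": 0.10,
}


def weighted_overlay(scores, weights=WEIGHTS):
    """Return the weighted sum of per-factor scores for one raster cell.

    `scores` maps each factor to a value in [0, 1], where 1 means the cell
    is as close as possible to that feature (an assumed rescaling).
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same factors")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical cell: close to a broken street light and a bar, far from the CBD.
cell = {
    "street_lights_out": 0.9,
    "bars": 0.8,
    "bus_stops": 0.5,
    "cbd": 0.2,
    "abandoned_buildings": 0.1,
}
overlay_score = weighted_overlay(cell)
```

Because the weights sum to 1, the overlay score stays in the same 0 to 1 range as the input scores, which is what allows it to be read as a percentage.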
II. Poisson Regression Model and Significant Variables
In the Poisson regression, I used proximity factors and density measures as predictor variables and assault count as the response variable. After selecting variables by minimizing the Akaike Information Criterion (AIC), I obtained a model with 13 significant variables.
The variable names are explained below:
- DISTSTLITE: distance to street lights out
- DISTABANB: distance to abandoned buildings
- DISTABANC: distance to abandoned cars
- DISTBARS: distance to bars
- DISTCBD: distance to the central business district (CBD)
- DISTFFOOD: distance to fast food
- DISTGAS: distance to gas stations
- DISTGRCRY: distance to grocery stores
- DISTLAUDR: distance to laundromats
- DISTSCHL: distance to schools
- BldgDens: building density
- SchlDens: school density
- LdryDens: laundromat density
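To make the modeling step concrete, here is a minimal sketch of a Poisson regression with a single predictor, fitted by Newton-Raphson, together with an AIC comparison against an intercept-only model. The write-up does not say which software was used, so this is a stand-in and not the original workflow. The data, the distance unit, and the function names `fit_poisson` and `poisson_aic` are all hypothetical.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system with Cramer's rule.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def poisson_aic(x, y, b0, b1, n_params):
    """Return AIC = 2k - 2 * log-likelihood for a Poisson model."""
    loglik = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        loglik += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return 2 * n_params - 2 * loglik


# Hypothetical data: distance to a risk feature and assault count per cell.
x = [0.1, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5]
y = [9, 7, 8, 5, 4, 3, 2, 1]

b0, b1 = fit_poisson(x, y)
aic_null = poisson_aic(x, y, math.log(sum(y) / len(y)), 0.0, n_params=1)
aic_dist = poisson_aic(x, y, b0, b1, n_params=2)
```

A lower AIC for the model that includes the distance term is the kind of evidence that would justify keeping a variable during selection. The full model repeats this comparison across many candidate variables.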
To visualize which variables influence the model most, I calculate each variable's standardized coefficient, take the absolute value, and plot the results as a bar chart (figure 4). Ranked from high to low, the chart shows that distance to schools, distance to the CBD, and distance to abandoned buildings play the most important roles in the model.
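One common way to standardize a coefficient is to multiply it by the standard deviation of its predictor, so that every coefficient describes the effect of a one-standard-deviation change. The write-up does not state which convention was used, so the sketch below is an assumption. The three coefficients are taken from the fitted model, but the sample distances are made up, so the ranking this code produces says nothing about figure 4.

```python
import statistics


def standardized_importance(coefficients, samples):
    """Rank predictors by |coefficient * standard deviation|, largest first."""
    scores = {
        name: abs(beta * statistics.pstdev(samples[name]))
        for name, beta in coefficients.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Coefficients from the fitted model (three of the 13 variables).
coefficients = {
    "distschl": -0.000203,
    "distcbd": 0.00000106,
    "distabanb": -0.000347,
}

# Hypothetical predictor values for five cells; not real Chicago data.
samples = {
    "distschl": [200, 900, 1500, 2600, 4000],
    "distcbd": [5000, 20000, 40000, 60000, 90000],
    "distabanb": [100, 400, 1200, 2500, 3300],
}

ranking = standardized_importance(coefficients, samples)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Taking the absolute value discards the sign, so the chart compares the strength of each effect and not its direction.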
III. Building a Risk Terrain Model (RTM)
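To visualize the assault prediction, I use the overlay approach. The risk terrain overlay is computed with the raster calculator: each variable is multiplied by its coefficient (the "Estimate" column of the regression output), the products are added to the intercept, and the sum is exponentiated.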
The detailed formula is shown below:
Exp(0.696 - "diststlite" * 0.000572 - "distabanb" * 0.000347 - "distabanc" * 0.0000626 + "distbars" * 0.0000269 + "distcbd" * 0.00000106 - "distffood" * 0.000118 - "distgas" * 0.0000415 - "distgrcry" * 0.000244 - "distlaudr" * 0.0000548 - "distschl" * 0.000203 + "bldgDens_r" * 0.000796 - "schlDens_r" * 0.00163 + "ldryDens_r" * 0.00162)
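The raster calculator applies this expression to every cell. The same arithmetic for a single cell can be sketched in Python as follows. The intercept and coefficients are copied from the formula, while the `risk_value` helper and the example cell values are hypothetical.

```python
import math

INTERCEPT = 0.696

# Coefficients copied from the raster calculator expression.
COEFFICIENTS = {
    "diststlite": -0.000572,
    "distabanb": -0.000347,
    "distabanc": -0.0000626,
    "distbars": 0.0000269,
    "distcbd": 0.00000106,
    "distffood": -0.000118,
    "distgas": -0.0000415,
    "distgrcry": -0.000244,
    "distlaudr": -0.0000548,
    "distschl": -0.000203,
    "bldgDens_r": 0.000796,
    "schlDens_r": -0.00163,
    "ldryDens_r": 0.00162,
}


def risk_value(cell):
    """Return exp(intercept + sum of coefficient * value) for one cell.

    `cell` must supply a value for every layer named in COEFFICIENTS.
    """
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * cell[name] for name in COEFFICIENTS
    )
    return math.exp(linear_predictor)


# Baseline cell with every layer at zero, then a hypothetical cell that
# differs only in its distance to the nearest street light outage.
zero_cell = dict.fromkeys(COEFFICIENTS, 0.0)
example_cell = dict(zero_cell, diststlite=500.0)

baseline_risk = risk_value(zero_cell)
example_risk = risk_value(example_cell)
```

With every layer at zero, the result is simply the exponentiated intercept. A negative coefficient such as the one on `diststlite` lowers the value as the distance grows.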
The outcome of this raster calculation is a risk terrain model showing the expected risk of assault across Chicago (figure 5).
As the RTM map shows, the redder an area is, the higher its expected risk of assault. More specifically, an area with a value of 2.5 has 2.5 times the expected assault count of an area with a value of 1.
The map shows that neighborhoods in the northern and southern parts of the city have relatively high expected rates of assault. Note that there are a few outliers: the values jump from 2.35 to 58.01.
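Because the model is exponential, each coefficient acts multiplicatively: changing a variable by some amount multiplies the expected count by exp(coefficient * change). The sketch below illustrates this with the `diststlite` coefficient. The `risk_multiplier` helper is hypothetical, and the write-up does not state the distance unit, so "1,000 units" is left unitless.

```python
import math


def risk_multiplier(coefficient, change):
    """Factor by which the expected count changes when a variable shifts."""
    return math.exp(coefficient * change)


# Coefficient on distance to street lights out, from the fitted model.
DISTSTLITE_COEF = -0.000572

# Moving 1,000 distance units farther from a street light outage,
# with every other variable held fixed.
multiplier = risk_multiplier(DISTSTLITE_COEF, 1000)
print(f"Expected count is multiplied by {multiplier:.3f}")

# Comparing two map cells works the same way: the ratio of their values.
relative_risk = 2.5 / 1.0
```

A multiplier below 1 means the expected count falls as the distance increases, which matches the negative sign of the coefficient.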
IV. Potential Application
Police departments can use this model to deploy officers more effectively, and citizens can use it to better protect themselves.