Title: Using Crime to predict crime
1 Using Crime to predict crime
2 Data Collection
- Obtained data from the Area Connect website, which is set up to let people compare crime rates across cities. The data is based on 2004 police reports.
- Used a list of the largest American urban areas.
- Collected data for 55 of them.
3 Variables
- Murder, forcible rape, robbery, assault, burglary, and theft are the predictor variables.
- Auto theft was used as the response.
4 Linear Analysis
- Ran a linear regression of auto theft on the six predictors to check whether each variable was significant.
- Only the robbery coefficient was significant on its own, but the p-value for the overall regression is 0.0004.
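A minimal pure-Python sketch of the kind of fit behind this slide, assuming (the slide does not say) that the reported regression p-value is the overall F-test that all six slopes are zero. The helper names `solve_linear` and `fit_ols` are hypothetical. The sketch returns the coefficients, R², and the F statistic; converting F to a p-value needs the F distribution with p and n − p − 1 degrees of freedom, which SAS or a similar package reports directly. Per-coefficient tests, like the one that flagged robbery, would additionally need the diagonal of the inverse of XᵀX.

```python
# Hypothetical helpers; assumes the slide's p-value is the overall F-test.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y on the predictor rows plus an intercept.

    Returns (coefficients, r_squared, f_statistic); coefficients[0] is the intercept.
    """
    n, p = len(rows), len(rows[0])
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = p + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(design[t][i] * design[t][j] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(design[t][i] * y[t] for t in range(n)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    # Overall F-test: H0 says every slope is zero; df = (p, n - p - 1)
    f_statistic = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))
    return beta, r_squared, f_statistic
```

Here `rows` would hold the six crime values for each of the 55 cities and `y` their auto theft values.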
5 Using Arc to Linearize the Data
- Arc is a tool designed for linear regression analysis of data.
- Transformed the data set using these different transformations.
- The scatterplot matrix shows clear correlation between the variables.
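The slide does not list which transformations were applied. As an assumption, a common way to make skewed, strictly positive data more linear is a Box-Cox power transformation; the sketch below (hypothetical helper names, not Arc's interface) applies it and picks λ from a small grid by the usual profile log-likelihood under normality.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (v**lam - 1) / lam, or log(v) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values (shift the data first)")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_log_likelihood(values, lam):
    """Profile log-likelihood of lam, assuming the transformed data are normal."""
    n = len(values)
    z = box_cox(values, lam)
    mean = sum(z) / n
    variance = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_lambda(values, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Pick the grid value of lam with the highest log-likelihood (hypothetical helper)."""
    return max(grid, key=lambda lam: box_cox_log_likelihood(values, lam))
```

A λ of 0 gives a log transform and 0.5 a shifted, scaled square root; a crime column containing zeros would need a small shift first.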
6 Experimenting with Joone
- The first networks tried were built using the Joone libraries.
- Achieved an accuracy of about 60% passing, using a test criterion that the network's output be within ±25% of the correct value.
- Determined that two-layer networks gave results with around the same accuracy, but the network would target the mean of all outputs rather than our target.
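Reading the garbled criterion as "within ±25% of the correct value" (an interpretation), the pass test and pass rate can be sketched as below; the function names are hypothetical. The example also illustrates the failure mode the slide describes: a predictor stuck at the mean of the targets passes only the targets close to that mean.

```python
def passes_relative(predicted, actual, tolerance=0.25):
    """True when predicted is within +/- tolerance * |actual| of actual."""
    return abs(predicted - actual) <= tolerance * abs(actual)


def pass_rate(predictions, actuals, tolerance=0.25):
    """Fraction of (prediction, actual) pairs that pass the relative test."""
    results = [passes_relative(p, a, tolerance) for p, a in zip(predictions, actuals)]
    return sum(results) / len(results)


# Made-up targets; a predictor that always outputs their mean.
actuals = [100.0, 200.0, 300.0]
mean_prediction = [sum(actuals) / len(actuals)] * len(actuals)
print(pass_rate(mean_prediction, actuals))
```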
7 Problems with Joone
- Joone proved too slow and would often run out of memory during a job. At times a job would take nearly 20 minutes.
- It was then decided that Lens would be the new choice for the NN.
8 Using Lens
[Figure: network diagram showing the input layer, hidden layer, and output node]
- Different combinations were tried to find the optimal network design.
- The best combination appeared to be 10 hidden nodes, a learning rate of 0.5, and a momentum of 0.15.
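Lens is configured through its own scripts, which the slides do not show. As an illustration of the technique only, the hypothetical `TinyNet` below is a 6-10-1 sigmoid network (six crime inputs, ten hidden units, one auto theft output) trained by online backpropagation with the slide's learning rate of 0.5 and momentum of 0.15. It is not Lens's implementation, and it assumes inputs and targets are already scaled into [0, 1].

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """Hypothetical 6-10-1 sigmoid network trained with backprop plus momentum."""

    def __init__(self, n_in=6, n_hidden=10, lr=0.5, momentum=0.15, seed=0):
        rng = random.Random(seed)
        self.lr, self.momentum = lr, momentum
        # w1[j][i]: input i -> hidden j; the last column is the hidden bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        # w2[j]: hidden j -> output; the last entry is the output bias.
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, reused by the momentum term.
        self.dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw2 = [0.0] * (n_hidden + 1)

    def forward(self, x):
        """Return (inputs with bias, hidden activations with bias, output)."""
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, hb)))
        return xb, hb, y

    def train_step(self, x, target):
        """One online update on squared error; returns the error before the update."""
        xb, hb, y = self.forward(x)
        d_out = (target - y) * y * (1.0 - y)
        # Hidden deltas use the output weights from before this update.
        d_hid = [d_out * self.w2[j] * hb[j] * (1.0 - hb[j]) for j in range(len(self.w1))]
        for j in range(len(self.w2)):
            self.dw2[j] = self.lr * d_out * hb[j] + self.momentum * self.dw2[j]
            self.w2[j] += self.dw2[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                self.dw1[j][i] = self.lr * d_hid[j] * xb[i] + self.momentum * self.dw1[j][i]
                row[i] += self.dw1[j][i]
        return 0.5 * (target - y) ** 2


if __name__ == "__main__":
    # Toy data: six inputs in [0, 1]; the made-up target is simply the first input.
    rng = random.Random(1)
    data = [[rng.random() for _ in range(6)] for _ in range(20)]
    net = TinyNet()
    for epoch in range(300):
        total = sum(net.train_step(x, x[0]) for x in data)
    print("final epoch error:", total)
```

The momentum term adds 0.15 of each weight's previous change to its new change, which smooths the updates made at the fairly large learning rate of 0.5.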
9 Other Optimizations Done
- To improve accuracy, a different method was used to normalize the data: every value for each crime type was divided by the largest value for that type. This raised the network's accuracy from 75% passing to 80% passing.
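The normalization described on this slide, dividing each crime type by its own largest value, can be sketched as follows; the `normalize_by_max` helper and the example column names and values are hypothetical. With non-negative counts this maps every column into [0, 1].

```python
def normalize_by_max(columns):
    """Divide every value in each column by that column's largest value.

    `columns` maps a crime type to its list of per-city values.
    """
    normalized = {}
    for crime_type, values in columns.items():
        largest = max(values)
        if largest <= 0:
            raise ValueError(f"{crime_type!r} has no positive values to scale by")
        normalized[crime_type] = [v / largest for v in values]
    return normalized


example = {"murder": [2, 4, 8], "auto_theft": [500, 1000, 250]}
print(normalize_by_max(example))
```

One design consideration: if the maxima are taken over all 55 cities before splitting, the test cities influence the scaling; computing them from the training cities only would avoid that.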
10 How the Network Was Run
- For each of 10 different training sets and 10 different testing sets, 38 of the 55 cities were chosen at random with no repeats.
- The network was trained until it reached the minimum error for the network.
- The corresponding test set was then run through the network.
- If the network's output was within one standard deviation (of the auto theft data) of the actual auto theft value, the test case passed.
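The sampling and pass rule can be sketched as below. The slide does not say how each testing set relates to its training set, so this sketch (hypothetical function names) simply draws each set of 38 independently without repeats using `random.sample`; taking the sample standard deviation of the auto theft column is also an assumption.

```python
import random
import statistics


def make_splits(cities, n_splits=10, set_size=38, seed=None):
    """Return n_splits (training, testing) pairs, each a random set of distinct cities.

    Assumption: training and testing sets are drawn independently of each other.
    """
    rng = random.Random(seed)
    return [(rng.sample(cities, set_size), rng.sample(cities, set_size)) for _ in range(n_splits)]


def passes_within_std(predicted, actual, auto_theft_values):
    """Pass when the prediction is within one (sample) standard deviation of the actual value."""
    std = statistics.stdev(auto_theft_values)
    return abs(predicted - actual) <= std


cities = [f"city_{i}" for i in range(55)]  # placeholder names for the 55 urban areas
splits = make_splits(cities, seed=42)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```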
11 The NN Performance
Original      Found         Result  Original - Found
...
1303.8003125  1091.99575    pass    211.8045625
959.89975     1106.674875   pass    -146.775125
963.501       1090.377875   pass    -126.876875
1150.4005     1330.19425    pass    -179.79375
843.600875    1207.3351875  pass    -363.7343125
1124.6999375  1302.0910625  pass    -177.391125
385.7986875   547.438375    pass    -161.6396875
483.8010625   1027.8585625  fail    -544.0575
1208.101125   2663.2506875  fail    -1455.1495625
742.6986875   1062.874      pass    -320.1753125
1502.199625   1382.2215625  pass    119.9780625
309.798875    291.4674375   pass    18.3314375
687.5995625   836.4763125   pass    -148.87675
838.4005625   547.5485625   pass    290.852
421.3005625   830.647125    pass    -409.3465625
1512.90125    1195.2898125  pass    317.6114375
716.1999375   962.5335      pass    -246.3335625
956.6989375   1104.6485     pass    -147.9495625
376.599375    1013.2063125  fail    -636.6069375

Percent pass: 0.805263157894737
Average error for passing entries: 138.226154967105
Average error for failing entries: 273.235211891447
posfail: 37   negfail: 37   pospass: 155   negpass: 151
Number of entries: 380
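The summary lines under the table can be reproduced in form, though not necessarily in value, by a helper like the hypothetical `summarize` below. The slide does not define "error" or the pos/neg labels, so this sketch assumes error means |original − found| and that "pos"/"neg" refer to the sign of original − found; because of those assumptions it is not claimed to reproduce the averages shown.

```python
def summarize(rows):
    """Summarize (original, found, passed) rows in the style of the results table.

    Assumptions: error = |original - found|; 'pos' means original - found >= 0.
    """
    counts = {"posfail": 0, "negfail": 0, "pospass": 0, "negpass": 0}
    pass_errors, fail_errors = [], []
    for original, found, passed in rows:
        diff = original - found
        sign = "pos" if diff >= 0 else "neg"
        counts[sign + ("pass" if passed else "fail")] += 1
        (pass_errors if passed else fail_errors).append(abs(diff))
    return {
        "percent_pass": len(pass_errors) / len(rows),
        "avg_error_pass": sum(pass_errors) / len(pass_errors) if pass_errors else 0.0,
        "avg_error_fail": sum(fail_errors) / len(fail_errors) if fail_errors else 0.0,
        "entries": len(rows),
        **counts,
    }


# Three of the rows shown on the slide.
rows = [
    (1303.8003125, 1091.99575, True),
    (959.89975, 1106.674875, True),
    (483.8010625, 1027.8585625, False),
]
print(summarize(rows))
```

As a consistency check on the slide's own numbers: 155 + 151 = 306 passing entries out of 380, and 306 / 380 ≈ 0.8053, matching the reported percent pass.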
12 Using SAS to Analyze the Data
- SAS is a statistical software package for analyzing data sets.
- Compared the results from SAS with those from the neural network.
13 Principal Component Analysis
Correlation Matrix

         A        B        C        D        E        F        G
A   1.0000   0.0362   0.6493   0.4389   0.2682  -0.0228   0.1588
B   0.0362   1.0000   0.3552   0.2232   0.5774   0.1841   0.1951
C   0.6493   0.3552   1.0000   0.4956   0.5969   0.2168   0.2621
D   0.4389   0.2232   0.4956   1.0000   0.3342   0.2278  -0.0147
E   0.2682   0.5774   0.5969   0.3342   1.0000   0.5472   0.3807
F  -0.0228   0.1841   0.2168   0.2278   0.5472   1.0000   0.1735
G   0.1588   0.1951   0.2621  -0.0147   0.3807   0.1735   1.0000

- A: Murder, B: Rape, C: Robbery, D: Assault, E: Burglary, F: Theft, G: Auto Theft
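- In the correlation matrix above, robbery has the largest single correlation (0.6493, with murder) and is moderately correlated with most of the other crimes. Note that burglary, not robbery, has the highest correlation with auto theft itself (0.3807, versus 0.2621 for robbery).
- This emphasis on robbery corresponds with the results we obtained from our network.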
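The slide lists the correlation matrix but not the principal components themselves. As a sketch, the first principal component of a correlation matrix is its leading eigenvector, which power iteration finds; the code below applies it to the matrix on this slide (values copied from it). The function names are hypothetical, and no particular loadings are claimed here.

```python
import math

LABELS = ["Murder", "Rape", "Robbery", "Assault", "Burglary", "Theft", "Auto Theft"]
CORR = [
    [1.0000, 0.0362, 0.6493, 0.4389, 0.2682, -0.0228, 0.1588],
    [0.0362, 1.0000, 0.3552, 0.2232, 0.5774, 0.1841, 0.1951],
    [0.6493, 0.3552, 1.0000, 0.4956, 0.5969, 0.2168, 0.2621],
    [0.4389, 0.2232, 0.4956, 1.0000, 0.3342, 0.2278, -0.0147],
    [0.2682, 0.5774, 0.5969, 0.3342, 1.0000, 0.5472, 0.3807],
    [-0.0228, 0.1841, 0.2168, 0.2278, 0.5472, 1.0000, 0.1735],
    [0.1588, 0.1951, 0.2621, -0.0147, 0.3807, 0.1735, 1.0000],
]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def first_principal_component(matrix, iterations=1000):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix via power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # start from an equal-weight unit vector
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))  # Rayleigh quotient
    return eigenvalue, v


eigenvalue, loadings = first_principal_component(CORR)
# For a correlation matrix the trace equals the number of variables.
print(f"share of variance: {eigenvalue / len(CORR):.1%}")
for label, loading in zip(LABELS, loadings):
    print(f"{label:>10}: {loading:+.3f}")
```

Whether auto theft (G) should be included when extracting components from the predictors is a modeling choice; dropping its row and column gives the components of the six predictors alone.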
14 Scatterplot Matrix of Auto Theft vs. Robbery
15 What Was Learned
- Auto theft can be predicted from other types of crime with a fair amount of accuracy.
- NNs are cool.
16 What Else Can Be Done
- Try different targets, possibly several at once.
- Do further statistical analysis.
- Expand the dataset.