A Report on the Relationship of Crime Data and Education Data in US Counties

Final Project Report
Data Science 1 with R (STAT 301-1)

Author

Divya Gupta

Published

January 5, 2024

Introduction

Crime occurs daily across the United States, and the level of education differs greatly from county to county. What relationships exist between crime and education levels within counties, and why might they be present or absent? This exploration compares county-level crime data from 2016 with educational attainment data for adults aged 25 and older from the surrounding years. The crime measures explored are an overall crime rate and eight specific crimes: aggravated assault, arson, burglary, larceny, murder, motor vehicle theft, rape, and robbery.

Data overview & quality

The joined data set contains the FIPS code, the identification code for each included US county, along with the crime counts and rates per county for 2016 and the education levels and their values for each county in the surrounding years. The education levels are less than a high school diploma, a high school diploma only, some college or an associate’s degree, and a bachelor’s degree or higher. The data set covers over 3,200 counties and includes 8 different crimes and 4 different education levels.

For links to original crime and education data sets, see the references section.

The quality of the original education data set was good. It contained information on multiple years, which were removed from the working data set if they were not related to the research question about 2016 and the surrounding years. Each education level was stored as a numerical variable, and one variable held the county and respective state for each observation. The crime data set had a matching variable, which is how the two were joined for this EDA.

The crime data set included 8 crimes and the crime rate as well as the population of the counties. The 8 crimes are aggravated assault, arson, burglary, larceny, murder, motor vehicle theft, rape, and robbery.

Table 1: Crime definitions. The source of the definitions is in the references section.

| Crime | Definition |
| --- | --- |
| Aggravated Assault | Intentional assault with the purpose of causing severe bodily harm |
| Arson | Setting fire to property with criminal intent |
| Burglary | Unlawful entry into property with criminal intent |
| Larceny | The unlawful taking of property |
| Murder | Killing another person knowingly |
| Motor Vehicle Theft | Theft of a motor vehicle |
| Rape | Sexual penetration without consent of victim |
| Robbery | The unlawful taking of property through threat or force |

The quality of the crime data set was fair, but it needed cleaning. The column names were not set, and there were many unknown values as a result. The crime data alone contained the 8 crimes, their rates, the population of each county, and the overall crime rate per 100,000 people in that county. After cleaning, the two sets were joined on their county and state values.

Through the joining of the data sets, missingness was visible.

Table 2: Some of the values that are missing in the crime set that are in the education set.
CountyState
Alabama, AL
Alaska, AK
Aleutian Islands, AK
Chugach Census Area, Alaska, AK
Copper River Census Area, Alaska, AK

Table 2 shows some of the counties that are in the education data set but not in the crime data set.

There are 150 county observations missing from the crime data set. Analysis of the data shows that this missingness arises largely because the education data set includes state-level rows. Observations such as Alabama, AL and Alaska, AK are therefore in the education data but not in the crime data. Most of the other missing counties are in Puerto Rico or Alaska. For the rest of the analysis of the joined data set, these observations are retained, but their crime values are shown as unknown or missing.

Table 3: The values that are missing in the education set that are in the crime set.
CountyState
LaSalle County, IL
Wade Hampton Census Area, AK
Shannon County, SD

Table 3 shows all of the counties that are in the crime data set but not in the education data set.

Overall, because of where the missing counties are located, the missingness does not greatly affect the exploratory data analysis.
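The mismatch between the two sources can be checked with an anti-join on the county and state key. The following is a minimal sketch in Python using only the standard library. The key sets are small illustrative stand-ins built from names that appear in the tables of this report, and the helper name `anti_join` is hypothetical. The report's own analysis code is not shown, so this is not necessarily how the join was performed.

```python
# Keys are "County, ST" strings, matching the CountyState column in the tables.
# These small sets are illustrative stand-ins, not the full data.
education_keys = {
    "Alabama, AL",         # state-level row, present only in the education data
    "Alaska, AK",          # state-level row, present only in the education data
    "Essex County, VT",
    "St. Louis city, MO",
}
crime_keys = {
    "LaSalle County, IL",  # present only in the crime data
    "Essex County, VT",
    "St. Louis city, MO",
}


def anti_join(left, right):
    """Return the keys in `left` that have no match in `right`, sorted."""
    return sorted(left - right)


missing_from_crime = anti_join(education_keys, crime_keys)
missing_from_education = anti_join(crime_keys, education_keys)
matched = sorted(education_keys & crime_keys)

print(missing_from_crime)      # ['Alabama, AL', 'Alaska, AK']
print(missing_from_education)  # ['LaSalle County, IL']
print(matched)                 # ['Essex County, VT', 'St. Louis city, MO']
```

Running the anti-join in both directions separates rows that only one source can supply, which is the distinction drawn between Table 2 and Table 3.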

Explorations

Research Questions

The research questions used to explore the data set were:

Do crime rate and education differ across the US? Does it depend on the region?

What relationship, if any, do crime rate and education level have?

Do specific crimes correlate or trend with specific education levels?

To explore these questions, the analysis begins with maps of the United States.

Overall Map Analysis

Figure 1: Map of crime rate in all US counties per 100,000 people

In Figure 1, most of the map is dark, indicating that a majority of counties fall outside the range of 500 to 1,000 crimes per 100,000 people.

Figure 2: Map of crime rate in US counties with a rate smaller than 500 per 100,000 people

In Figure 2, only counties with a crime rate below 500 are colored. This makes it evident that, according to the data, most counties had fewer than 500 crimes per 100,000 people in 2016. Among these counties, rates are higher in the South and in some parts of Southern California.

Figure 3: Map of crime rate in US counties with a rate larger than 500 per 100,000 people

In Figure 3, only the few counties with a crime rate above 500 are displayed.

From this overall map analysis of county data, the missing counties are visible in gray in Figure 1. All 3 maps reveal that the crime rate for the majority of counties is below 500 crimes per 100,000 people. Now we can explore how that rate changes with the counties’ population.
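The split at 500 crimes per 100,000 people that underlies Figures 2 and 3 can be expressed in a few lines. The following is a minimal Python sketch. The county names, counts, and populations are hypothetical, and placing a rate of exactly 500 in the upper group is an assumption, since the figures do not say where that boundary case falls.

```python
def rate_per_100k(crime_count, population):
    """Crimes per 100,000 residents."""
    return crime_count * 100_000 / population


# Hypothetical counties: (name, crime count, population).
counties = [
    ("County A", 12, 8_000),
    ("County B", 900, 150_000),
    ("County C", 40, 50_000),
]

THRESHOLD = 500  # cut-off used for Figures 2 and 3

rates = {name: rate_per_100k(count, pop) for name, count, pop in counties}

# Assumption: a rate of exactly 500 is grouped with the higher-rate counties.
below = [name for name, rate in rates.items() if rate < THRESHOLD]
at_or_above = [name for name, rate in rates.items() if rate >= THRESHOLD]
share_below = len(below) / len(rates)

print(rates)        # {'County A': 150.0, 'County B': 600.0, 'County C': 80.0}
print(below)        # ['County A', 'County C']
print(at_or_above)  # ['County B']
```

The value of `share_below` is the kind of number that would back the statement that a majority of counties sit below the threshold.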

Crime Rate and Population

Figure 4: Plot of crime rate and population

Figure 4 shows that as the population of a county increases from 0 to 250,000 people, there is a strong positive relationship between population and crime rate. The relationship flattens out and becomes less positive beyond 250,000 people. This change could be due to outliers that skew the data for larger populations.

Figure 5: Plot of crime rate and population

Figure 5 only contains the populations from 0 to 250,000 people, and it is clear that there is a positive relationship between county population and crime rate per 100,000 people.

With this information, we can explore, univariately, how the different crimes are distributed by their count.

Table 4: The County with the Highest Crime Rate in USA

| CountyState | crime_rate_per_100000 |
| --- | --- |
| St. Louis city, MO | 1792 |

Table 5: The County with the Lowest Crime Rate in USA

| CountyState | crime_rate_per_100000 |
| --- | --- |
| Essex County, VT | 0 |

Extremes can also be seen in this data. According to Table 4, the county with the highest crime rate is St. Louis City, Missouri, and according to Table 5, the county with the lowest is Essex County, Vermont.

Table 6: Populations of Highest and Lowest Crime Rate Counties in USA

| CountyState | crime_rate_per_100000 | population |
| --- | --- | --- |
| St. Louis city, MO | 1792 | 318416 |
| Essex County, VT | 0 | 6211 |

Table 6 compares the populations of St. Louis City and Essex County. The population of St. Louis City is about 51.3 times that of Essex County, in line with the finding from Figure 5 that there is a positive relationship between county population and crime rate.
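The population ratio quoted for Table 6 can be reproduced directly from the table. The following is a short Python sketch with the values copied from Table 6. The dictionary keys are illustrative names, not the column names of the original data set.

```python
# Values copied from Table 6.
counties = [
    {"county": "St. Louis city, MO", "rate": 1792, "population": 318416},
    {"county": "Essex County, VT", "rate": 0, "population": 6211},
]

# Pick the rows with the highest and lowest crime rate.
highest = max(counties, key=lambda row: row["rate"])
lowest = min(counties, key=lambda row: row["rate"])

# Ratio of the two populations.
ratio = highest["population"] / lowest["population"]

print(f"{highest['county']} has {ratio:.1f} times the population of {lowest['county']}")
# St. Louis city, MO has 51.3 times the population of Essex County, VT
```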

Count and Rate Distribution of Crimes

These 16 figures show the 8 crimes in the data set (arson, burglary, aggravated assault, larceny, murder, motor vehicle theft, rape, and robbery), each next to its respective rate.

For arson, the largest number of counties were below 50 crimes per 100,000 people.

For burglary, the largest number of counties were at 3,000 crimes per 100,000 people.

For aggravated assault, the largest number of counties were at 500 crimes per 100,000 people.

For larceny, the largest number of counties were at 15,000 crimes per 100,000 people.

For murder, the largest number of counties were below 10 crimes per 100,000 people.

For motor vehicle theft, the largest number of counties were at 700 crimes per 100,000 people.

For rape, the largest number of counties were at 250 crimes per 100,000 people.

For robbery, the largest number of counties were under 100 crimes per 100,000 people.

From this, the rate distributions for burglary, aggravated assault, larceny, and motor vehicle theft resemble a bell curve, and these crimes have the highest rates in the most counties. These will be the four main crime variables used in this exploratory data analysis with education data.
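Statements such as "the largest number of counties were at 500 crimes per 100,000 people" describe the fullest bin of a histogram. The following is a minimal Python sketch of how such a modal bin could be located. The function name `modal_bin`, the bin width, and the sample rates are all hypothetical, and the bin widths used for the report's figures are not known.

```python
from collections import Counter


def modal_bin(values, bin_width):
    """Return (bin_start, count) for the histogram bin holding the most values."""
    # Map each value to the lower edge of its bin, then count per bin.
    bins = Counter((value // bin_width) * bin_width for value in values)
    # Ties resolve to the bin that was encountered first.
    return bins.most_common(1)[0]


# Hypothetical aggravated assault rates per 100,000 people for seven counties.
assault_rates = [120, 480, 510, 530, 590, 760, 1040]

print(modal_bin(assault_rates, 100))  # (500, 3)
```

With a bin width of 100, three of the seven sample counties fall in the bin starting at 500, so that bin is reported as the peak.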

Distribution of the Rates of Education Levels

These figures show the 4 education level rates in the data set from 2017-21: less than a high school diploma, high school diploma only, some college or associate’s degree, and bachelor’s degree or higher.

For an education level of less than a high school diploma, the largest number of counties had a percentage of 8.

For an education level of only a high school diploma, the largest number of counties had a percentage of 32.

For an education level of some college or an associate’s degree, the largest number of counties had a percentage of 32 as well.

For an education level of a bachelor’s degree or higher, the largest number of counties had a percentage of 19.

Table 7: Counties with the Highest and Lowest Percentages of Less than a High School Diploma Level

| CountyState | Pct less than HSD, 2017-21 | population |
| --- | --- | --- |
| Kenedy County, TX | 81.5533981 | 412 |
| Petroleum County, MT | 0.6024096 | 506 |

Table 8: Counties with the Highest and Lowest Percentages of only a High School Diploma Education Level

| CountyState | Pct HSD only, 2017-21 | population |
| --- | --- | --- |
| Forest County, PA | 53.393443 | 7631 |
| Falls Church city, VA | 6.547131 | 13508 |

Table 9: Counties with the Highest and Lowest Percentages of Some College or Associates Degree Education Level

| CountyState | Pct some CAD, 2017-21 | population |
| --- | --- | --- |
| Loving County, TX | 76.92308 | 95 |
| Kenedy County, TX | 0.00000 | 412 |

Table 10: Counties with the Highest and Lowest Percentages of a Bachelor’s Degree or Higher Education Level

| CountyState | Pct BD or higher, 2017-21 | population |
| --- | --- | --- |
| Falls Church city, VA | 78.69877 | 13508 |
| Loving County, TX | 0.00000 | 95 |

Table 7 reveals that Kenedy County, Texas has the highest percentage of people with less than a high school diploma. Table 9 also reveals that this same county has the lowest percent of people with some college or associate degree.

Table 9 reveals that the county with the highest percentage of some college or associate degree is Loving County, Texas. According to Table 10, Loving County is also the county with the lowest percentage of people with a bachelor’s degree or higher.

According to Table 8, Falls Church city, Virginia is the county with the lowest percentage of people with only a high school diploma, and according to Table 10 it also has the highest percentage of people with a bachelor’s degree or higher.

From this, it can be inferred that Falls Church city seems to have the highest level of education from this data in the given US counties. Additionally, it can be drawn that Kenedy County is one of the lowest educated counties in the given US counties according to this data.

Figure 6: Map of Less than a High School Diploma

In Figure 6, the most yellow region, the one with the highest percentage of people with less than a high school diploma, is in Texas, which matches what Table 7 revealed for Kenedy County.

Figure 7: Map of Only a High School Diploma

In Figure 7, the highest percentages of people with only a high school diploma are concentrated in the Midwest and along the Appalachian Mountains. According to Table 8, the county with the highest percentage is Forest County, Pennsylvania, which is shown in yellow on the map.

Figure 8: Map of Some College or Associate’s Degree

In Figure 8, the percentages of some college or associate’s degree are not concentrated in one region. There is one yellow area in Texas with a percentage over 60%, which corresponds with Table 9, where Loving County, Texas has the largest percentage of some college or associate’s degree.

Figure 9: Map of Bachelor’s Degree or Higher

In Figure 9, high percentages of a bachelor’s degree or higher are concentrated in central Colorado and on the East Coast. According to Table 10, the county with the highest percentage is Falls Church city, VA, which is on the East Coast, as seen in this map.

Relationship Between Crime Rate and Education Levels

Figure 10: Education Levels vs Crime Rate

In Figure 10, each of the different education levels is a different color. The x-axis shows the percentage of people in a county with that education level. The y-axis shows the corresponding crime rate for each percentage of the respective education level.

Findings For Each Education Level

Less than High School Degree: As the percentage of people with less than a high school degree increases, the crime rate also increases after 30%.

High School Degree only: As the percentage of people with only a high school degree increases, there is an up and down trend, indicating no strong positive or negative relationship with crime rate.

Some College or Associate’s Degree: As the percentage of some college or associate’s degree increases, there is a strong negative trend with crime rate. In other words, the data reveals a decrease in overall crime rate with an increase in the percentage of people with some college or associate’s degree.

Bachelor’s Degree or Higher: As the percentage of a bachelor’s degree or higher increases, there is a slightly negative trend after 40%. This trend is not as strong as the one for some college or associate’s degree.

Research Question: What relationship, if any, do crime rate and education level have?

This multivariate plot shows that the strongest relationships between education level and crime rate are for some college or associate’s degree and for less than a high school degree. This could indicate that, according to this data, less education leads to higher crime rates. Since there isn’t a strong negative relationship between crime rate and a bachelor’s degree or higher, it cannot be said that a higher education leads to less crime.

It is also curious that there is a stronger negative relationship for some college or associates degree than there is for a bachelor’s degree or higher.
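The direction and strength of each trend could also be summarised with a single number. The following Python sketch computes a Pearson correlation coefficient between an education percentage and the crime rate. This is a different technique from the smoothed curves read off Figure 10, it only captures linear trends, and the sample values below are hypothetical rather than taken from the data set. Counties with missing values would need to be dropped before calling it.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: percent with some college or associate's degree
# and crime rate per 100,000 people for five counties.
pct_some_college = [20.0, 25.0, 30.0, 35.0, 40.0]
crime_rate = [400.0, 350.0, 330.0, 260.0, 240.0]

r = pearson_r(pct_some_college, crime_rate)
print(f"r = {r:.2f}")  # negative for this sample, matching a downward trend
```

A coefficient near -1 or 1 indicates a strong linear trend and a value near 0 indicates a flat one. A curved relationship, such as a rate that rises only after 30%, would be understated by this summary.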

Relationships of Specific Crimes with Education Levels

Even though two of the education levels have flatter curves and a strong statement about the relationship between higher education and crime rate cannot be made, exploring education levels alongside specific crimes can reveal whether different education backgrounds change the prevalence of different crimes in counties.

Do specific crimes correlate or trend with specific education levels?

For this question, a plot is made for each of the four education levels, with the rates of every crime visible.

Figure 11: Percent Less than High School Diploma vs. Crime Rates

In Figure 10, there is a strong positive relationship with the percentage of people with less than a high school diploma and the crime rate. In Figure 11 the crime rate is broken down into specific crimes.

Larceny and burglary are the most prevalent crimes at most percentages. As the percentage of people with less than a high school diploma increases, the larceny rate decreases after 10% and the burglary rate increases. Additionally, the rates of rape, motor vehicle theft, aggravated assault, and robbery all trend upward as the percentage of people with less than a high school diploma increases.

Figure 12: Percent High School Diploma vs. Crimes

In Figure 10, there is no clear relationship between the percentage of people with only a high school diploma and the crime rate. In Figure 12, larceny and burglary are again the most prevalent crimes. As the percentage of people with only a high school diploma increases, the larceny rate in the county decreases. The burglary rate increases slightly, while all other crimes remain at low levels.

Figure 13: Percent Some College or Associate’s Degree vs. Crimes

In Figure 10, there is a strong negative relationship between the percentage of people with some college or associate’s degree and the crime rate. In Figure 13, larceny is the most prevalent crime. It increases as the percentage of some college or associate’s degree increases, decreases between 30% and 50%, and then increases again. All other crimes decrease as the percentage of some college or associate’s degree increases.

Figure 14: Percent Bachelor’s Degree or Higher vs Crimes

In Figure 10, there is a slightly negative but mostly flat relationship between the percentage of people with a bachelor’s degree or higher and the crime rate. In Figure 14, larceny is again the most prevalent crime. Larceny increases as the percentage of bachelor’s degrees or higher increases until 40%, and then decreases as that percentage continues to rise. Burglary decreases with increasing bachelor’s rates, and all other crimes remain consistent.

Interesting Findings

Larceny is the most common crime, which is plausible because, by definition, it is the least violent. Although the data reveals that a higher percentage of people with less than a high school diploma coincides with a higher prevalence of motor vehicle theft, rape, robbery, and aggravated assault, these crimes did not decrease in prevalence with increased percentages of bachelor’s degree or higher.

In fact, having a higher percentage of people with a bachelor’s degree or higher did not affect the rates of those same crimes at all, according to the 2016 data set.

Conclusion

The three initial research questions were:

Do crime rate and education differ across the US? Does it depend on the region?

What relationship, if any, do crime rate and education level have?

Do specific crimes correlate or trend with specific education levels?

Through this exploratory data analysis, it was shown that crime rate differs greatly across the US and is concentrated in the South. The regions with the lowest crime rates were the Northeast and the northern Midwest. This is visible in Figure 2. Figure 7 revealed that the rate of having only a high school diploma depended on the region, with high percentages in the Midwest, the South, and the Appalachian region.

Additionally, Figure 5 demonstrated a positive relationship between population and crime rate. The data therefore indicates that a larger population tends to coincide with a higher crime rate.

The EDA also showed the highest and lowest crime rate counties in Table 4 and Table 5. These are St. Louis City, Missouri and Essex County, Vermont for the highest and lowest crime rates respectively. The positive relationship between population and crime rate was reinforced by these tables, as Essex County had a population 51.3 times smaller than that of St. Louis City.

According to Figure 10, the relationship between crime rate and education level was positive for less than a high school diploma and negative for some college or associate’s degree. For a bachelor’s degree or higher and for only a high school diploma, the relationship was mostly flat, indicating no clear relationship with crime rate.

Looking into specific crimes at each education level revealed that larceny was the most common crime and, according to the definitions table, the least violent. The figures revealed the interesting fact that motor vehicle theft, rape, robbery, and aggravated assault increased as the percentage of people with less than a high school diploma increased. However, these crimes did not decrease with a higher percentage of bachelor’s degrees.

A new research question from this interesting finding would be why the rates of those crimes do not decrease or increase with an increased percentage of bachelor’s degrees or higher education.

What effect, in general, does having a bachelor’s degree or higher have on motor vehicle theft, rape, robbery, and aggravated assault?

Does education have an impact on these crimes at all?

The next steps are to find another data set, collected around 2016, on motor vehicle theft, rape, robbery, and aggravated assault along with other variables, possibly political leaders. It would then be interesting to look for correlations between education levels and political leaders to see whether, through politics, these crime rates have a relationship with education or not.

References

Crime Data link

Johnson, M. (2016). United States crime rates by county. Kaggle. https://www.kaggle.com/datasets/mikejohnsonjr/united-states-crime-rates-by-county/data

Education Data link

U.S. Department of Agriculture (2023). Educational attainment for adults age 25 and older for the U.S., States, and counties, 1970–2021. USDA Economic Research Service. https://www.ers.usda.gov/data-products/county-level-data-sets/county-level-data-sets-download-data/

Definitions of Crimes link

University of Southern California (2023). Clery Crimes and Definitions. USC Department of Public Safety. https://dps.usc.edu/alerts/clery/crime-definitions/