This site makes it easy for people to analyze crime-related data. Select a page in the Data Tool dropdown to look at the type of data you're interested in. Most of the data on this site is also available to download in full, free to use (with citation), on openICPSR. Links to the download pages are on the Data tab of this site.

This site is run entirely by me, Jacob Kaplan. I earned both my Master's degree and PhD in the criminology department at Penn. My primary research focuses on using quasi-experimental designs to quantitatively analyze criminal justice policies. Lately, my research has focused on the criminology of place, specifically on how physical security devices, such as outdoor lights, can affect crime. I've also studied a range of other topics, including whether more police officers can reduce crime, whether firing "bad apples" would substantially reduce complaints against the police (studied through simulation), how the public perceives the accuracy of forensic evidence, and how decriminalizing marijuana affects serious domestic violence.

I also have significant experience with the programming language R and wrote the introductory R book Crime by the Numbers to teach others. I have written several R packages, including ones that create dummy rows and columns, predict race from surnames, handle the FBI's crime data API, and read fixed-width ASCII files, a common format for government data, into R.

If you have a question, you can email me at jkkaplan6@gmail.com. I get a lot of requests and unfortunately can't respond to all of them, especially those about data quality issues in the underlying data. I will not respond to questions sent to my Penn email. Questions that you would ask a co-author, such as how to properly handle missing data or how I would approach a research problem, are generally outside the scope of what I'll answer. If you need more detailed help than a brief email response, or are interested in working with me on data or research, email me to discuss the project, timeline, and other relevant information. Make sure your email specifies what the project is, what part I would do, who else is involved, and a timeline for project completion (or at least for my part of the project).

Yes. It is good practice to cite the data that you use in a project or paper. For the data that I have released on openICPSR, the terms of use you agree to when downloading the data require that you cite the data if you use it. Specifically, they state: "You agree to reference the recommended bibliographic citation in any publication that employs resources provided by ICPSR." To cite the data, use the citation under the "Project Citation:" header on the data's openICPSR page. For most of the data, you can also find BibTeX, EndNote, RefMan, or RefWorks citations on Google Scholar.

No.

No.

Yes. For a number of reasons (unrelated to the quality of the data) I have removed public access to these data.

Some of the available data is very detailed and includes a number of variables about individuals, such as the victim and offender of a crime. It may be possible to use these variables to identify individuals. Do not do this. It is irresponsible and unethical to try to identify people from this data. It is also against the data use agreement you accept on openICPSR to download the data. Specifically, it states that "by downloading these data, you hereby agree: To not use these datasets for investigation of specific RESEARCH SUBJECTS, except when identification is authorized in writing by ICPSR (help@icpsr.umich.edu) [and] To make no use of the identity of any RESEARCH SUBJECT discovered inadvertently, and to advise ICPSR of any such discovery (help@icpsr.umich.edu)." Again, do not even attempt to identify any individuals from these data. Identifying individuals may lead to interesting research and papers; that is no excuse. Don't do it.

Common data questions/issues

I have already posted my reasons why this data should not be used in research here. For more extensive documentation of the data's issues, please see this paper by Maltz and Targonski from 2002.

Each year of UCR/NIBRS data has different agencies reporting. The overall trend is that more agencies report over time, though this is not always true (e.g., Florida stopped reporting most data in the 1990s). So if your data covers more than one year, make sure you are analyzing the same agencies each year. Even within a single year, some agencies report for fewer than 12 months, meaning they are not comparable to agencies that report all year. In practice, this usually means removing agencies that don't report every month of every year of your study period.
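The filtering step above can be sketched in Python. This is a minimal illustration, not the procedure used to build the released files: it assumes the data has been reshaped into one (ori, year, month) record per agency-month that was actually reported, and both that record layout and the function name `full_reporters` are hypothetical. It keeps only agencies with all 12 months present in every year of the study period.

```python
from collections import defaultdict


def full_reporters(records, years):
    """Return the set of ORIs that report all 12 months in every year in `years`.

    `records` is an iterable of (ori, year, month) tuples, one per agency-month
    that was reported. This layout is hypothetical and chosen for illustration.
    """
    months_seen = defaultdict(set)  # (ori, year) -> set of months reported
    for ori, year, month in records:
        months_seen[(ori, year)].add(month)

    all_months = set(range(1, 13))
    oris = {ori for ori, _ in months_seen}
    # An agency qualifies only if every study year has all 12 months present;
    # a year with no records at all yields an empty set and fails the check.
    return {
        ori
        for ori in oris
        if all(months_seen.get((ori, year), set()) >= all_months for year in years)
    }
```

For example, with hypothetical agencies where "A" reports every month of 2017 and 2018 and "B" misses December 2018, `full_reporters(records, years=[2017, 2018])` returns `{"A"}`. The study years are passed in explicitly rather than inferred from the data, so an agency that skips an entire year is also dropped.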

Negative numbers can occur in UCR data and are correct, not a data issue. Police agencies report UCR data monthly. If an agency discovers that a previous month's report was incorrect, it does not alter that month to remove the reported crime; instead, it reports a negative crime in the current month so that the annual count is correct. For example, consider an agency that reports a burglary in January and then discovers in June that the burglary didn't actually occur. In June it will report -1 burglaries. In practice, negative monthly values are rare because the number of actual crimes in a month usually outweighs the number of corrections, but they are possible and are not errors. Converting negative numbers to missing values is not correct. See pages 82-83 of the FBI's manual for UCR data for more detail.
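A small Python sketch shows why negative values should be kept. The helper names `annual_total` and `annual_total_wrong` are hypothetical and used only for illustration. Summing the monthly counts as reported gives the correct annual count, while treating negatives as missing overcounts.

```python
def annual_total(monthly_counts):
    """Sum 12 monthly UCR counts into an annual count.

    Negative values are corrections to earlier months, so they are kept and
    added like any other month.
    """
    if len(monthly_counts) != 12:
        raise ValueError("expected exactly 12 monthly counts")
    return sum(monthly_counts)


def annual_total_wrong(monthly_counts):
    """Incorrect approach, shown only for contrast: drops negatives as if missing."""
    return sum(count for count in monthly_counts if count >= 0)
```

For example, with made-up monthly burglary counts of [5, 3, 4, 6, 2, -1, 4, 3, 5, 2, 3, 4], `annual_total` returns 40, while `annual_total_wrong` returns 41 because it silently drops the June correction.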


Like all data, the data on this site and available to download on openICPSR has flaws. This is especially true of the FBI's UCR and NIBRS data available here. The main problems are that the data only counts reported crime and that not all agencies report (and those that do may not report all year). These, though, are only the technical problems with the data; the bigger issue is how people use it. Each of these issues can be solved, or at least avoided when conducting research, and the nuances of each dataset can be handled. However, many published articles that I've read either do not address these issues or do not even acknowledge that they exist. There is excellent documentation on these datasets in academic articles and in the FBI's manual for the data. If you intend to use this data, you should carefully read these documents (please don't ask me for links to them beyond what I already provide on this page) and spend time exploring the data yourself. Simply running a regression on the data without fully understanding it is bad research.

If you believe that you have found an error in one of the datasets, please read the FBI's manual for that data to make sure it is not a known issue or not an error at all. Two things commonly mistaken for errors are negative numbers in UCR data and Florida's absence from UCR data for many years.

I release the data as .dta (Stata) files, which have a 32-character limit for column names. I therefore sometimes have to abbreviate column names to fit this limit. Most apparent misspellings in the column names are intentional abbreviations made to meet this requirement.

FIPS codes are US Census unique identifiers (within state) for geographic areas including states, counties, and "places" (i.e., cities). They are useful when merging with other datasets such as the US Census or other government data. These identifiers are not in the original UCR data, so I added them by merging the UCR data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) that NACJD produced (for more information on this dataset, see the Learning Guide I made for NACJD here). I merged the LEAIC and UCR data by matching on the ORI (unique agency identifier code) variable.
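A merge on ORI like the one described above can be sketched as a left join in plain Python. This is only an illustration: the field names (`ori`, `fips_state`, `fips_county`, `fips_place`) and the function `add_fips` are hypothetical and do not match the actual LEAIC or UCR column names. The sketch also assumes each ORI appears at most once in the crosswalk and raises an error otherwise, rather than silently picking one of several matches.

```python
FIPS_FIELDS = ("fips_state", "fips_county", "fips_place")


def add_fips(ucr_rows, crosswalk_rows):
    """Left-join FIPS codes from a crosswalk onto UCR rows, matching on ORI.

    Both inputs are lists of dicts. All field names here are hypothetical
    placeholders, not the real LEAIC or UCR column names. FIPS codes are
    expected as strings so leading zeros (e.g., "06" for California) survive.
    """
    # Build a lookup from ORI to its FIPS codes, refusing ambiguous matches.
    fips_by_ori = {}
    for row in crosswalk_rows:
        ori = row["ori"]
        if ori in fips_by_ori:
            raise ValueError(f"duplicate ORI in crosswalk: {ori}")
        fips_by_ori[ori] = {field: row[field] for field in FIPS_FIELDS}

    # Left join: every UCR row is kept; unmatched ORIs get None for each FIPS field.
    empty = dict.fromkeys(FIPS_FIELDS)
    return [{**row, **fips_by_ori.get(row["ori"], empty)} for row in ucr_rows]
```

A UCR row whose ORI is missing from the crosswalk comes back with every FIPS field set to None, which makes unmatched agencies easy to count and inspect before any further merging.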

Data Available for Download


The data available here are the various data sets I have collected and cleaned. Most are related to crime, though some are other government data or data I am simply interested in. If you use this data, you must cite it.


Most of the data sets used for this site have more variables than are available here. For a full description of what is available in each data set, click the link to go to its download page on openICPSR. On that page I include a description of the data and what I did to clean it.


Click on a data set's name to view its citation and a link to its openICPSR page.


Kaplan, Jacob. Annual Survey of Public Employment & Payroll (ASPEP) 1992-2016. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E101399V6

Kaplan, Jacob. Annual Survey of State Government Finances 1992-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E101880V4

Kaplan, Jacob. Apparent Per Capita Alcohol Consumption: National, State, and Regional Trends 1977-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E105583V5

Kaplan, Jacob. California Department of Corrections Data. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E108602V2

Kaplan, Jacob. California Jail Profile Survey 1995-2020. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E104560V7

Kaplan, Jacob, Hoyos-Torres, Sebastian, Gur, Oren, Concannon, Connor, and Jones, Nick. Coronavirus (COVID-19) in Prisons in the United States, April - June 2020. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-06-26. https://doi.org/10.3886/E119901V1

Kaplan, Jacob. Texas Commission on Jail Standards Data 1992-2017. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2018-07-14. https://doi.org/10.3886/E104643V1

Kaplan, Jacob. U.S. Customs and Border Protection Statistics and Summaries. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E109522V5

Kaplan, Jacob. Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1974-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E102263V11

Kaplan, Jacob. Uniform Crime Reporting (UCR) Program Data: Arson 1979-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E103540V9

Kaplan, Jacob. Uniform Crime Reporting (UCR) Program Data: Human Trafficking 2013-2019. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-21. https://doi.org/10.3886/E117974V3

Kaplan, Jacob. Uniform Crime Reporting Program Data: Law Enforcement Officers Killed and Assaulted (LEOKA) 1960-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E102180V10

Kaplan, Jacob. Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest, 1960-2019. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E100707V16

Kaplan, Jacob. Uniform Crime Reporting (UCR) Program Data: Property Stolen and Recovered (Supplement to Return A) 1960-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E105403V6

Kaplan, Jacob. Uniform Crime Reporting (UCR) Program Data: Supplementary Homicide Reports, 1976-2018. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E100699V10

Kaplan, Jacob. United States Governors 1775-2020. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2021-01-16. https://doi.org/10.3886/E102000V3