If you have a question you can email me at I have a lot of data sets available (and some overlap in variables) so please link the openICPSR URL of the data set you have questions about. That'll help me know which data you are asking about and which version of that data you are using (I recommend using the latest version available).

The short answer is no. The long answer is that it takes a very long time to clean data that I would like to add to the site and then make a page to display that data. I will likely keep this site updated but can't guarantee making any changes or adding any specific data just due to the time it takes and the lack of time I have now. For data already on the site I'll make sure to update to the latest year of data available (if I don't please email me to remind me!).


This data is from the Apparent Per Capita Alcohol Consumption: National, State, and Regional Trends 1977-2016 report produced by Sarah P. Haughwout and Dr. Megan E. Slater at the National Institute on Alcohol Abuse and Alcoholism. The Data page of this site has a link to download the data. For the original report, click here

For a complete methodology please see the actual report here. The authors determined the amount of alcohol consumed (originally measured in gallons of ethanol - pure alcohol) through sales data or tax information for each state-year. Population data for how many people age 14 and up live in each state was acquired from the CDC's WONDER data. Ethanol was divided by population to determine per capita consumption.

The original report provides an equation to convert the amount of total ethanol consumed into a "total drinks" variable so I used that to make that variable. For the individual drink categories (beer, shots of liquor, and glasses of wine), it provides an equation to convert the amount of ethanol consumed into the amount of alcohol. I then converted this to number of drinks for each category based on the National Institutes of Health's page saying how many ounces of alcohol make up a drink for those categories. Therefore, the sum of the categories is slightly different than the "total drinks" column.
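The arithmetic described above can be sketched roughly as follows. This is only an illustration, not the report's actual equation: it assumes a US gallon of 128 fluid ounces and a standard drink containing 0.6 fluid ounces of pure ethanol, and the state-year numbers and function names are made up for the example.

```python
FLUID_OUNCES_PER_GALLON = 128          # US liquid gallon
ETHANOL_OZ_PER_STANDARD_DRINK = 0.6    # assumption: ounces of pure alcohol in one standard drink


def per_capita_gallons(ethanol_gallons: float, population_14_plus: float) -> float:
    """Gallons of ethanol consumed per person age 14 and up."""
    return ethanol_gallons / population_14_plus


def gallons_to_standard_drinks(ethanol_gallons: float) -> float:
    """Convert gallons of pure ethanol into an approximate number of standard drinks."""
    return ethanol_gallons * FLUID_OUNCES_PER_GALLON / ETHANOL_OZ_PER_STANDARD_DRINK


# Hypothetical state-year: 2.4 million gallons of ethanol, 1 million residents age 14+.
per_capita = per_capita_gallons(2_400_000, 1_000_000)
drinks_per_person = gallons_to_standard_drinks(per_capita)
print(per_capita, drinks_per_person)
```

Because the per-category conversions on this site use category-specific drink sizes, the category totals will not exactly match a single conversion like this one.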


This data is from the FBI's Arrests by Age, Sex, and Race data which is part of the Uniform Crime Reporting (UCR) Program. This data provides the number of arrests that occurred in a city in any given year and breaks that down by age (adult or juvenile), race, and gender.

I try to keep data on this site consistent with the data I have published on openICPSR and I call it 'cannabis' there. I do so only because the programming language Stata limits column names to 32 characters and 'cannabis' is a shorter word than 'marijuana'.

This data does not show arrests broken down by if the arrestee is Hispanic or not.

Border Patrol

See here.


This data is from the FBI's Offenses Known and Clearances by Arrest data which is part of the Uniform Crime Reporting (UCR) Program. This data provides the number of crimes that occurred in a city in any given year and how many of those crimes were cleared.

Negative values reflect adjustments of previous returns. So if the agency reports, for example, a burglary in January then in February discovers that it wasn't actually a burglary, they will record that both as 1 unfounded burglary and -1 actual burglary in February. This is so it "deletes" the erroneously recorded burglary from January. See more on pages 82-83 of the FBI's Manual for UCR data.
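As a small illustration of why these negative values are harmless once months are added up, here is a sketch of the burglary example above. The dictionaries and function name are hypothetical and are not the layout of the actual data files.

```python
# Monthly counts for one agency, following the example in the text:
# a burglary recorded in January is found in February not to be a burglary.
actual_burglaries = {"January": 1, "February": -1}
unfounded_burglaries = {"January": 0, "February": 1}


def annual_total(monthly_counts: dict) -> int:
    """Sum monthly counts; negative adjustments cancel earlier erroneous reports."""
    return sum(monthly_counts.values())


print(annual_total(actual_burglaries))     # the erroneous January burglary is removed
print(annual_total(unfounded_burglaries))  # and it shows up once as unfounded
```

A negative value in a single month is therefore a correction, not an error, and only the sum over a longer period should be read as a count of crimes.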

Starting in 2013, rape has a new, broader definition in the UCR to include oral and anal penetration (by a body part or object) and allow men to be victims. The previous definition included only forcible intercourse against a woman. As this revised definition is broader than the original one, more rapes are reported (social changes may also be partly responsible as they could encourage victims to report more). This definitional change makes post-2013 rape data non-comparable to pre-2013 data.

This data doesn't differentiate between a "real zero" and a "not reported zero". If an agency doesn't report any crimes (even if crimes did occur), the data will say zero crimes occurred. Even though the data indicates how many months of the year that agency reported, that doesn't necessarily mean that they reported fully. An agency that reports all 12 months of the year may still report only incomplete data. Agencies can report partial data each month and still be considered to have reported that month. Chicago, for example, reports every month but until the last few years didn't report any rapes.

No, this data only includes the most serious crime in an incident (except for motor vehicle theft which is always included). For incidents where more than one crime happens (for example, a robbery and a murder), only the more serious (murder in this case) will be counted. This is called the Hierarchy Rule. See more on pages 10-12 of the FBI's Manual for UCR data which details the Hierarchy Rule.
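A rough sketch of how the Hierarchy Rule described above plays out for a single incident. The seriousness ordering used here is an assumption for illustration (see the FBI's manual for the official one), and the function name is hypothetical.

```python
# Assumed ordering from most to least serious, for illustration only.
HIERARCHY = ["murder", "rape", "robbery", "aggravated assault", "burglary", "theft"]


def counted_offenses(incident_offenses: list) -> list:
    """Return the offenses from one incident that would be counted."""
    counted = []
    ranked = [offense for offense in HIERARCHY if offense in incident_offenses]
    if ranked:
        counted.append(ranked[0])  # only the most serious offense is kept
    if "motor vehicle theft" in incident_offenses:
        counted.append("motor vehicle theft")  # the exception: always included
    return counted


print(counted_offenses(["robbery", "murder"]))
print(counted_offenses(["theft", "motor vehicle theft"]))
```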

Though the Hierarchy Rule does mean this data is an undercount, data from other sources indicate it isn't much of an undercount. The FBI's other data set, the National Incident-Based Reporting System (NIBRS), contains every crime that occurs in an incident (i.e. it doesn't use the Hierarchy Rule). Using this we can measure how many crimes the Hierarchy Rule excludes (most major cities do not report to NIBRS so what we find in NIBRS may not apply to them). In over 90% of incidents, only one crime is committed. Additionally, when people talk about "crime" they usually mean murder which, while incomplete to discuss crime, means the UCR data here is accurate on that measure.

A major limitation (in my opinion the most important limitation) to the data here is that it doesn't include crimes not reported to police. Based on victimization surveys that ask people both if they were victimized and if they reported that crime, we know that the majority of crimes are not reported. This probably won't matter when looking at a single city for a short period of time - the population won't change too much so even underreporting of crime will be consistent underreporting. The issue becomes serious when looking at a city with major population changes or comparing multiple cities as their population may have very different reporting practices. There's no easy solution here but it is an important aspect of understanding crime data that you should keep in mind. For a full breakdown of reporting rates broken down by crime and a number of characteristics about the crime and victim (and reasons for not reporting), see Tables 91-105 (pages 98-114) in this report on the National Crime Victimization Survey from 2008.

Using the rate helps deal with population changes that could lead to changes in crime merely because of that change but it isn't without its drawbacks. The main drawback with using a rate is that it assumes equal risk of victimization, which we know isn't correct. For example, rape is a crime that affects 6 times as many women as men (according to the 2016 National Crime Victimization Survey Table 6, page 9), yet the rate is based on total population in that city (the UCR does not differentiate victims by gender but other data sets, such as NIBRS, do, allowing for better rates). Other crimes require even more granular rates. Murder victims are predominantly young men, but this differs by type of murder - domestic violence victims are mostly women. Also, consider that population comes from those who live in the city and doesn't include people like tourists or people who work in that city but live elsewhere yet can still be victimized in the city. So while rates are probably better than counts as it lets you control for population, consider exactly who that population is, and how risk changes within that population.
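To make the denominator point concrete, here is a sketch of how the same crime count gives different rates depending on which population is used. The per-100,000 scaling, the city numbers, and the function name are all assumptions for illustration.

```python
def rate_per_100k(crime_count: int, population: int) -> float:
    """Crimes per 100,000 people in the chosen population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count * 100_000 / population


# Hypothetical city: 3,000 crimes, 500,000 residents, and another
# 250,000 commuters and tourists who are in the city on a typical day.
crimes = 3_000
residents = 500_000
people_present = residents + 250_000

resident_rate = rate_per_100k(crimes, residents)
present_rate = rate_per_100k(crimes, people_present)
print(resident_rate, present_rate)
```

The resident-based rate is the higher of the two because people who can be victimized but do not live in the city are missing from its denominator.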

Index crimes (sometimes called Part I crimes) are a collection of eight crimes often divided between Violent Index Crimes (murder, rape, robbery, and aggravated assault (assault with a weapon or causing serious bodily injury)) and Property Index Crimes (burglary, theft, motor vehicle theft, and arson (however arson is not available in this data set)). When people discuss "crime" they are often referring to this collection of crimes. One major drawback of this is that it gives equal weight to each crime. For example, consider if New York City has 100 fewer murders and 100 more thefts this year than last year (and all other crimes didn't change). Their total index crimes would be the same but this year would be far safer than last year. For complete definitions of each crime, please see the FBI's definitions page.
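The equal-weighting problem in the example above can be shown with a few lines of arithmetic. The counts below are invented purely for illustration and are not real figures for any city.

```python
# Hypothetical counts for last year (arson omitted, as in this data set).
last_year = {
    "murder": 300,
    "rape": 2_000,
    "robbery": 14_000,
    "aggravated assault": 20_000,
    "burglary": 12_000,
    "theft": 100_000,
    "motor vehicle theft": 6_000,
}

# This year: 100 fewer murders, 100 more thefts, everything else unchanged.
this_year = dict(last_year)
this_year["murder"] -= 100
this_year["theft"] += 100


def total_index_crimes(counts: dict) -> int:
    """Index crimes are a plain sum, so every crime gets the same weight."""
    return sum(counts.values())


print(total_index_crimes(last_year), total_index_crimes(this_year))
```

Both totals are identical even though the second year has a third fewer murders, which is why the total alone hides changes in the most serious crimes.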

The biggest problem with index crimes is that it is simply the sum of 8 (or 7 since arson data usually isn't available) crimes. Index crimes have a huge range in their seriousness - it includes both murder and theft. Counting them equally is clearly wrong as 100 murders is more serious than 100 thefts. This is especially a problem as less serious crimes (theft mostly) are far more common than more serious crimes (in 2017 there were 1.25 million violent index crimes in the United States while that same year had 5.5 million thefts). So index crimes undercount the seriousness of crimes. Looking at total index crimes is, in effect, mostly just looking at theft.

This is especially a problem because it hides trends in violent crimes. San Francisco, as an example, has had a huge increase in index crimes in the last several years. When looking closer, that increase is driven almost entirely by the near doubling of theft since 2011. During the same years, violent crime has stayed fairly steady. So the city isn't getting more dangerous but it appears like it is due to just looking at total index crimes.

Many researchers divide index crimes into violent and nonviolent categories, which helps, but even this isn't entirely sufficient. Take Chicago as an example. It is a city infamous for its large number of murders. But as a fraction of index crimes, Chicago has a rounding error worth of murders. Their 653 murders in 2017 is only 0.5% of total index crimes. For violent index crimes, murder makes up 2.2%. What this means is that changes in murder are very difficult to detect. If Chicago had no murders this year, but a less serious crime (such as theft) increased slightly, we couldn't tell from looking at the number of index crimes.


This data comes from the Centers for Disease Control and Prevention's (CDC) WONDER data and provides the number of deaths for several cause of death categories for each state.

The following is the CDC's definition of age-adjusted rates from this page.

The rates of almost all causes of disease, injury, and death vary by age. Age adjustment is a technique for "removing" the effects of age from crude rates so as to allow meaningful comparisons across populations with different underlying age structures. For example, comparing the crude rate of heart disease in Florida with that of California is misleading, because the relatively older population in Florida leads to a higher crude death rate, even if the age-specific rates of heart disease in Florida and California were the same. For such a comparison, age-adjusted rates are preferable.
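One common way to do this kind of adjustment is direct standardization: weight each age group's rate by that group's share of a fixed standard population. The sketch below is a simplified illustration with two made-up age groups, made-up states, and made-up weights. It is not the CDC's actual standard population or code.

```python
def crude_rate(deaths: list, populations: list) -> float:
    """Deaths per 100,000 people, ignoring age structure."""
    return sum(deaths) * 100_000 / sum(populations)


def age_adjusted_rate(deaths: list, populations: list, standard_weights: list) -> float:
    """Weighted sum of age-specific rates per 100,000 (direct standardization)."""
    return sum(
        weight * group_deaths * 100_000 / group_population
        for group_deaths, group_population, weight
        in zip(deaths, populations, standard_weights)
    )


# Two hypothetical states with identical age-specific rates
# (10 and 1,000 deaths per 100,000 for the younger and older group).
older_state = {"deaths": [60, 4_000], "population": [600_000, 400_000]}
younger_state = {"deaths": [80, 2_000], "population": [800_000, 200_000]}
weights = [0.7, 0.3]  # hypothetical standard population shares, summing to 1

for state in (older_state, younger_state):
    print(
        crude_rate(state["deaths"], state["population"]),
        age_adjusted_rate(state["deaths"], state["population"], weights),
    )
```

The crude rates differ only because the first state has more older residents, while the age-adjusted rates come out the same, which is the comparison the definition above describes.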

The CDC does not report death counts when there are fewer than 16 deaths in that category. They do this both for confidentiality of the deceased and to avoid the misuse of rates caused by such a small numerator.


This data is from the FBI's Law Enforcement Officers Killed and Assaulted (LEOKA) data which is part of the Uniform Crime Reporting (UCR) Program. This data provides information about how many employees (civilian and officers) are at a given agency. It also says how many officers were assaulted for a number of different categories of assault.

Prior to 1971 the data did not break down employees by gender. The years 1960-1970 put the number of total employees in the male employees column (and a value of 0 in the female employees column).


The three categories that say the inmate's Most Serious Charge come from the National Corrections Reporting Program (NCRP) which provides data on how many people are incarcerated, admitted, or released from prison that year. This is divided by the most serious crime they are convicted of, race/ethnicity, and gender. All other categories are from the National Prisoner Statistics (NPS) data which has different information than the NCRP and more years available. Unlike the NCRP, the NPS has totals for the federal prison system, the state prison system, and the combined US as a whole. Some states and some years do not have information for some variables so you will likely see many missing values in this data.

All of the population data comes from the United States Census. For the years 2001-2016, I use the annual American Community Survey which is a census data set that samples 1% of the population. For the other years I use the decennial census and linearly impute for the years between the censuses. As such, please be aware that these population values are only estimates.
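A minimal sketch of the linear approach for the years between two decennial censuses. The population figures and the function name are hypothetical, and the exact procedure used for this site may differ in its details.

```python
def interpolate_population(year: int, start_year: int, start_population: float,
                           end_year: int, end_population: float) -> float:
    """Estimate population in `year` on a straight line between two census counts."""
    if not start_year <= year <= end_year:
        raise ValueError("year must fall between the two census years")
    fraction_elapsed = (year - start_year) / (end_year - start_year)
    return start_population + fraction_elapsed * (end_population - start_population)


# Hypothetical state: 1,000,000 people in the 1990 census, 1,200,000 in 2000.
estimate_1995 = interpolate_population(1995, 1990, 1_000_000, 2000, 1_200_000)
print(estimate_1995)
```

The straight line assumes the population changed by the same amount every year, which is why the in-between values are only estimates.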

I included this because most people incarcerated in prison are between these ages. However, not all are in these age groups meaning that this is almost certainly an overestimate. As such you should use the rates as estimates, NOT precise rates.

As per the National Prisoner Statistics codebook, available to download here

As states and the Federal Bureau of Prisons increased their use of local jails and interstate compacts to house inmates, NPS began asking states to report a count of inmates under the jurisdiction or legal authority of state and federal adult correctional officials in addition to their custody counts. Since 1977, the jurisdiction count has been the preferred measure. This count includes all state and federal inmates held in a public or private prison (custody) and those held in jail facilities either physically located inside or outside of the state of legal responsibility, and other inmates who may be temporarily out to court or in transit from the jurisdiction of legal authority to the custody of a confinement facility outside that jurisdiction. The difference between the total custody count and the jurisdiction count was small (approximately 7,000) when both were first collected in 1977. As more states began to report jurisdiction counts and more states began to rely on local and privately operated facilities to house inmates, the difference increased. At yearend 2016 the jurisdiction population totaled 1,506,800 while the custody population totaled 1,293,887.


All of this data comes from the Department of Education Office of Postsecondary Education which collects crime data from colleges and releases them publicly. Their website is here. While their site does allow you to look at a single school's data, it only shows the prior three years and only as tables. For a comprehensive look at the data codebook, please see their PDF here

As per the Department of Education definitions, available here

Not on Campus: (1) Any building or property owned or controlled by a student organization that is officially recognized by the institution; or (2) Any building or property owned or controlled by an institution that is used in direct support of, or in relation to, the institution's educational purposes, is frequently used by students, and is not within the same reasonably contiguous geographic area of the institution.
On Campus - Total: (1) Any building or property owned or controlled by an institution within the same reasonably contiguous geographic area and used by the institution in direct support of, or in a manner related to, the institution's educational purposes, including residence halls; and (2) Any building or property that is within or reasonably contiguous to paragraph (1) of this definition, that is owned by the institution but controlled by another person, is frequently used by students, and supports institutional purposes (such as a food or other retail vendor).
On Campus - Student Housing: Any student housing facility that is owned or controlled by the institution, or is located on property that is owned or controlled by the institution, and is within the reasonably contiguous geographic area that makes up the campus is considered an on-campus student housing facility.
Public Property: All public property, including thoroughfares, streets, sidewalks, and parking facilities, that is within the campus, or immediately adjacent to and accessible from the campus.
Total: This is the sum of Not on Campus, On Campus - Total, and Public Property.

There are different rules for which offenses are included when an offense is committed and a person is arrested so these categories do not necessarily overlap.

There are different rules for which offenses are included when an offense is committed and when disciplinary actions are taken so these categories do not necessarily overlap. As sexual offenses are not included in the required categories for disciplinary action, they are not available in the data.

No, if a person is arrested and then given disciplinary actions by the school, only the arrest is counted.

This is when a person is referred to the school for a "disciplinary action" though the action does not need to actually take place and the data does not specify which action is referred or the outcome of that referral.

  • Sexual Offense - Forcible is the sum of rape and fondling.
  • Sexual Offense - Non-forcible is the sum of incest and statutory rape.
  • Sexual Offense - Total is the sum of Sexual Offense - Forcible and Sexual Offense - Non-forcible.
For definitions of each individual crime please see the Department of Education's codebook here

No, the hate crime data underwent a series of changes in how the data was collected. The crimes theft, intimidation, and vandalism/destruction of property only started being reported in 2009. Starting in 2014, "gender identity" was added as a possible bias motivation while in the same year the "ethnicity or national origin" bias motivation was split into either "ethnicity" or "national origin" bias motivations. This means that you should be cautious when looking at total hate crime changes as certain crimes/bias motivations were not included until recently.

This data set did not collect information on the number of rape, fondling, incest, or statutory rape crimes until 2014. Instead, it grouped rape and fondling as Sexual Offense - Forcible, and incest and statutory rape as Sexual Offense - Non-forcible.
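If you want a series that is comparable across the 2014 change, one option is to add the newer individual offenses back up into the older grouped categories. The sketch below assumes a simple dictionary per school-year with hypothetical key names. The real data files may name or arrange these columns differently.

```python
def grouped_sexual_offenses(row: dict) -> dict:
    """Combine the individual offenses into the older grouped categories."""
    forcible = row["rape"] + row["fondling"]
    non_forcible = row["incest"] + row["statutory_rape"]
    return {
        "sexual_offense_forcible": forcible,
        "sexual_offense_non_forcible": non_forcible,
        "sexual_offense_total": forcible + non_forcible,
    }


# Hypothetical school-year from 2014 or later.
example_row = {"rape": 5, "fondling": 3, "incest": 0, "statutory_rape": 1}
print(grouped_sexual_offenses(example_row))
```

The reverse is not possible: the grouped counts from earlier years cannot be split back into the individual offenses.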

As per the Department of Education definitions, available here

Dating Violence: Violence committed by a person who is or has been in a social relationship of a romantic or intimate nature with the victim. The existence of such a relationship shall be determined based on the reporting party’s statement and with consideration of the length of the relationship, the type of relationship, and the frequency of interaction between the persons involved in the relationship. For the purposes of this definition—
  • Dating violence includes, but is not limited to, sexual or physical abuse or the threat of such abuse.
  • Dating violence does not include acts covered under the definition of domestic violence.
Domestic Violence: A felony or misdemeanor crime of violence committed—
  • By a current or former spouse or intimate partner of the victim;
  • By a person with whom the victim shares a child in common;
  • By a person who is cohabitating with, or has cohabitated with, the victim as a spouse or intimate partner;
  • By a person similarly situated to a spouse of the victim under the domestic or family violence laws of the jurisdiction in which the crime of violence occurred, or
  • By any other person against an adult or youth victim who is protected from that person’s acts under the domestic or family violence laws of the jurisdiction in which the crime of violence occurred.
Stalking: Engaging in a course of conduct directed at a specific person that would cause a reasonable person to—
  • Fear for the person’s safety or the safety of others; or
  • Suffer substantial emotional distress.

Using rates is useful as it removes the important influence of the number of people at that school, but has its own serious limitations. Schools with a similar number of students may still be very different in their student population and risk of victimization. Consider, for example, two schools which each have 20,000 students. If these two schools are very similar in students, then the rate per 1,000 students could be useful in comparing the schools as the groups are similar. If, however, these schools differ on factors such as if the school is urban, whether students commute or live on campus, ages of students, etc., then knowing purely the number of students is not a very useful rate. Also consider that crimes can occur against victims other than students such as faculty or staff so a per 1,000 student rate would overestimate crime by using too small a denominator.

Like all crime data, this data has a limitation as it is reported offenses only. If likelihood of reporting changes, that will be reflected in changes of reported offenses but we will not be able to tell (based only on this data) whether it was the number of crimes or the likelihood of reporting that changed. This is especially a problem with sexual offenses as they are already unlikely to be reported and small changes in reporting likelihood can cause a seemingly large change in crimes reported. Also keep in mind that the population included (primarily college students) may have different reporting likelihoods than other populations.