As both a data junkie and someone with a perverse fascination with crime statistics, I was naturally drawn to the Uniform Crime Reports that the FBI and DoJ put out every year. Not only do they list arrests by type (with a sort of bizarre level of granularity, actually), but they provide them at the county level, and have done so since 1977. “County-level” may sound a little broad, but it’s actually a really solid geographic unit for crime data because of its geographical contiguity. One could map all of the counties in the US, from TIGER files provided by the Census Bureau, and every inch of the country would have relatively reliable offense counts going back 36 years.
But of course, it wouldn’t be government-provided data if it weren’t just a little obtuse. The data is freely available to the public, but you do have to jump through a ton of hoops to get all of it for every year. Also, the data comes in varying formats from year to year. Sometimes it comes in SAS, but it more consistently comes as a flat, non-delimited ASCII text file. And with every year you get a new codebook, which tells you what each column position in the text file means. I’m not really into spending money on SAS software, so I’m getting down to parsing out each year’s data into a more comprehensible data structure (e.g. JSON) by referencing each year’s codebook. You know, for posterity.
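For the curious, here is a rough sketch of what that kind of parsing can look like in Python. The field names, column positions, and sample rows below are made up for illustration (each year's real codebook defines its own layout), and the sketch assumes the codebook lists positions as 1-indexed, inclusive start and end columns.

```python
import json

# HYPOTHETICAL codebook: field name -> (start, end) column positions,
# 1-indexed and inclusive. The real layout differs from year to year.
CODEBOOK_EXAMPLE = {
    "fips_state": (1, 2),
    "fips_county": (3, 5),
    "year": (6, 9),
    "drug_arrests": (10, 15),
}

# Fields to convert to integers. FIPS codes stay as strings so that
# leading zeros survive.
NUMERIC_FIELDS = {"year", "drug_arrests"}


def parse_line(line, codebook, numeric_fields=()):
    """Slice one fixed-width line into a dict keyed by field name."""
    record = {}
    for name, (start, end) in codebook.items():
        # Convert the 1-indexed inclusive range to a Python slice.
        raw = line[start - 1:end].strip()
        if name in numeric_fields and raw.isdigit():
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


def parse_lines(lines, codebook, numeric_fields=()):
    """Parse every non-blank line of a fixed-width file."""
    return [
        parse_line(line.rstrip("\n"), codebook, numeric_fields)
        for line in lines
        if line.strip()
    ]


# Invented sample rows, not real UCR data.
sample = [
    "170311990000123\n",
    "060371991004567\n",
]

records = parse_lines(sample, CODEBOOK_EXAMPLE, NUMERIC_FIELDS)
print(json.dumps(records, indent=2))
```

With a real file you would pass an open file handle instead of `sample`, and swap in a codebook dictionary built from that year's documentation. Keeping the layout in a plain dictionary means each year only needs a new mapping, not new parsing code.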
It’s also worth noting that Joseph Targonski at the University of Chicago did a fantastic and comprehensive review of these reports in 2012, and “reexamined and recoded missing data in the Uniform Crime Reports (UCR) for the years 1977 to 2000 for all police agencies in the United States.” The only shortcoming of Targonski’s review, for my purposes, is that his recoding drops the granularity of offense types. I have in mind a project that tracks different types of drug offenses over time, so Targonski’s work, as heroic and exhaustive as it is, might not suit my purposes.
I’m currently working on this project over at the ucr-county GitHub repo. By all means, help me curate this. Oh, and if anyone knows what the hell happened to 1993, or better yet has the data, hit me up.