Procedure To Clean Fairfax County Police Department Ticket/Arrest Data

a sticker depicting a fog aerosol can has been placed on a faded out parking notice sign.

The weekly ticket/arrests data file for Fairfax County Police Department has changed quite a bit since 2015. For starters, it is no longer uploaded weekly, contains both tabs and spaces, and appears to be cumulative over the entire year to date. Since the file is no longer a fixed-width text file, it made more sense to come up with a clean up procedure which converted the hybrid file into a pure tab-separated file.

Note: This procedure was based published on July 25th, 2017. If new inconsistencies are introduced in subsequently published files, it may be necessary to revise this procedure.

1) Open the most recently downloaded arrest.txt file in Microsoft Notepad.

2) Replace all tabs with [IAMATAB].

(We’re replacing real tabs with virtual ones)

3) Replace all “[IAMATAB] [IAMATAB]” with “[IAMATAB][IAMATAB]”

(This removes tabs separated by a single space in addresses)

4) Replace all ” [IAMATAB]” with “[IAMATAB]”

(You repeat this step dozens of times until all blank spaces BEFORE virtual tabs have been removed).

5) Replace all “[IAMATAB] ” with “, ”

(A virtual tab with a single trailing space gets transformed into a comma with a trailing space)

6) Replace all “[IAMATAB]” with real tabs.

7) Use “Save As” feature to create a copy, such as “arrest-tabbed.txt”, which you should be able to open in your preferred spreadsheet program.

NOTE: This is almost perfect, but there are some inconsistencies which cannot be addressed.

1) Addresses with apartment numbers in them.

Some of the apartment numbers are flanked by tabs, while others are flanked with trailing spaces within the tabs. This inconsistency makes it difficult to deal with at the text processing level, but can be addressed with a spreadsheet formula which concatenates two columns simply enough.

2) Missing header row

This will have to be manually added to the file after it is converted. Again, this is easily done in a spreadsheet program.

3) Incorrect Charge Notation

Some charges were converted into date format (e.g. Drunk in Public, which is §5-1-1, became 5/1/2001) in the FCPD original document, and then subsequently exported. In other words, it’s broken before we even get our hands on it. Since it matches date formats, it’s tricky to modify it without accidentally changing actual dates in our file.

Secret User Agent Man!

torn up fortune cookie message

I was creating a PHP helper class to leverage the NPS Data API recently, and stumbled across an odd situation where I could successfully retrieve the API’s JSON result in Google Chrome with the ModHeader extension— but PHP’s file_get_contents function would fail with a Server 500 error.

After several  Google searches, I stumbled across this post on Stack Overflow which got to the heart of the problem. It turns out PHP’s default user agent string for file_get_results is blank, and some web servers are configured to ignore requests with empty user agent strings. Once I added an Agent String in my PHP’s header request, it started to return results as expected.