Procedure To Clean Fairfax County Police Department Ticket/Arrest Data

The weekly ticket/arrests data file for Fairfax County Police Department has changed quite a bit since 2015. For starters, it is no longer uploaded weekly, contains both tabs and spaces, and appears to be cumulative over the entire year to date. Since the file is no longer a fixed-width text file, it made more sense to come up with a clean up procedure which converted the hybrid file into a pure tab-separated file.

Note: This procedure was based http://www.fairfaxcounty.gov/police/crime/arrest.txt published on July 25th, 2017. If new inconsistencies are introduced in subsequently published files, it may be necessary to revise this procedure.

1) Open the most recently downloaded arrest.txt file in Microsoft Notepad.

2) Replace all tabs with [IAMATAB].

(We’re replacing real tabs with virtual ones)

3) Replace all “[IAMATAB] [IAMATAB]” with “[IAMATAB][IAMATAB]”

(This removes tabs separated by a single space in addresses)

4) Replace all ” [IAMATAB]” with “[IAMATAB]”

(You repeat this step dozens of times until all blank spaces BEFORE virtual tabs have been removed).

5) Replace all “[IAMATAB] ” with “, ”

(A virtual tab with a single trailing space gets transformed into a comma with a trailing space)

6) Replace all “[IAMATAB]” with real tabs.

7) Use “Save As” feature to create a copy, such as “arrest-tabbed.txt”, which you should be able to open in your preferred spreadsheet program.

NOTE: This is almost perfect, but there are some inconsistencies which cannot be addressed.

1) Addresses with apartment numbers in them.

Some of the apartment numbers are flanked by tabs, while others are flanked with trailing spaces within the tabs. This inconsistency makes it difficult to deal with at the text processing level, but can be addressed with a spreadsheet formula which concatenates two columns simply enough.

2) Missing header row

This will have to be manually added to the file after it is converted. Again, this is easily done in a spreadsheet program.

3) Incorrect Charge Notation

Some charges were converted into date format (e.g. Drunk in Public, which is §5-1-1, became 5/1/2001) in the FCPD original document, and then subsequently exported. In other words, it’s broken before we even get our hands on it. Since it matches date formats, it’s tricky to modify it without accidentally changing actual dates in our file.