How to properly modify uploadable file types in WordPress

A dark truck missing a front right tire has left gouge marks in an asphalt surface, and the shredded front right tire sits on the ground beside it.

If you work with WordPress long enough, you will upload a file that is not in WordPress’ default list of allowed file types. The file will upload, vanish suddenly, and you’ll be informed:

“Sorry, this file type is not permitted for security reasons.”

Then you’ll search Google for something like “wordpress file type not permitted for security reasons” and see several blog entries from WordPress “experts” urging you to copy and paste a code snippet that leverages the upload_mimes filter into your theme’s functions.php file.

Too bad that’s poor advice. Continue reading “How to properly modify uploadable file types in WordPress”

Fixed Width Text File Toolkit Update

GitHub contributions snapshot

Just a quick note to let you know I’ve made a minor update to my “Fixed Width Text File Toolkit” repo on GitHub.

If someone combined a bunch of fixed width text files (e.g. jan2010.txt, feb2010.txt, etc.) together into one big file (e.g. 2010.txt), that big file would have a series of header rows repeating throughout the data.  Granted, it’s an edge case, but it isn’t completely impossible.

This update makes the FWTF Toolkit “smart enough” to recognize a header row when it is repeated in the middle of the data, and ignore it automatically behind the scenes.  No configuration required, and it has surprisingly little impact on peformance.

Working With Fixed Width Text Files

"Someday" is the busiest day of the week.

Yesterday I attended GovLoop’s “Open Data Event” and was inspired to revisit a project I’d set aside a few months ago . . . to try a different approach.

For those who don’t know, the Fairfax County Police Department posts a weekly “data dump” of the tickets issued and arrests made.  I know Fairfax Underground has created a search engine based around a collection of this data, but it’s little more than a novelty item to see if anyone you know (e.g. friend, neighbor, blind date, etc.) has been arrested or ticketed.  I haven’t seen anyone plot the original data set on a Google Map, let alone mashup with other data sets such as real estate prices or school test scores. It’s become a curiosity for me, to see if something more significant can be done with that information.

One frustrating challenge with this particular data file is that the data is stored in a fixed width text file— a very generic format, but not as immediately useful as XML or JSON, for instance.  You have to write code that tells your application to use characters 1-40 for column 1 data, use characters 41-60 for column 2 data, and so on . . . plus you have to hard code the names and quantity of the data fields, and that just didn’t seem like a particularly robust situation for me.

I wanted to create a php function that could, for lack of a better term, “read and analyze” any fixed width text file and return the names of the fields as well as the start positions for each field dynamically. I’ve written before about the struggle to come up with reusable algorithms that we can apply to different variations of the same basic principles, but this one took longer than I thought.

I won’t rehash the different approaches and failures, and jump to the approach that finally worked for me.

 /* let's open our fixed width text file */
 $handle = @fopen("arrests.txt", "r");

 if ($handle) {
 // read our first line
 $headerRow = fgets($handle, 4096);
 
 /* parse for header row names, using TWO
  or more spaces for RegEx pattern */
 $headerRowNames = preg_split('/(?:\s\s+|\n|\t)/', $headerRow, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);

// display array contents for examination
 echo "<pre>";
 var_dump($headerRowNames);
 echo "</pre>";

Basically, we grab the first line of the file and use a regular expression of two or more consecutive spaces to figure out where one header row name ends and the next one begins. Since one of the header rows in my sample file has a space in it (i.e. Charge Description), we can’t use a single space as the delimiter. NOTE: There is an edge case where a header row name could be so long that only a single space separates it from the next header row name; so it’s not 100% “bulletproof”, but this was as close I could get to a solution with only one evening of work.

Here are the results when we display the array:

array(8) {
  [0]=>
  array(2) {
    [0]=>
    string(5) "LName"
    [1]=>
    int(0)
  }
  [1]=>
  array(2) {
    [0]=>
    string(5) "FName"
    [1]=>
    int(40)
  }
  [2]=>
  array(2) {
    [0]=>
    string(5) "MName"
    [1]=>
    int(60)
  }
  [3]=>
  array(2) {
    [0]=>
    string(3) "Age"
    [1]=>
    int(100)
  }
  [4]=>
  array(2) {
    [0]=>
    string(7) "DateArr"
    [1]=>
    int(105)
  }
  [5]=>
  array(2) {
    [0]=>
    string(6) "Charge"
    [1]=>
    int(135)
  }
  [6]=>
  array(2) {
    [0]=>
    string(15) "Charge Descript"
    [1]=>
    int(160)
  }
  [7]=>
  array(2) {
    [0]=>
    string(7) "Address"
    [1]=>
    int(210)
  }
}

This gives us a multidimensional array with both the field names contained in the header row and their offset positions, which will certainly come in handy as we process the remaining lines of the file. Time permitting, I’d like to drop this as a method into a php class, alongside methods to convert the extracted data into either a JSON or XML file. That way, I’ll have a utility class that can add value to other fixed width text files in the future.