CSV Isn’t Versioned — Risk Being Incompatible With Yourself


Consider a simple example:


Bloggins, Scott, "+1-212-555-1212", Engineer, 95060
Clark, Allan, "+1-424-242-2668", FAE, 98107

OK, so that’s fine. Obviously, we wanted a last name, first name, phone number, job code, and zipcode, and when we parse that, we simply say:
lastname = $1
firstname = $2
phone = $3
Job = $4
zipcode = $5
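
For the record, here is roughly what that positional parse looks like in Python. This is only a sketch: the name parse_v1 and the dictionary keys are made up for illustration, and skipinitialspace=True is there because the sample rows put a space after every comma.

```python
import csv

# The column order lives only in the parser; nothing in the file states it.
V1_FIELDS = ["lastname", "firstname", "phone", "job", "zipcode"]

def parse_v1(line):
    """Split one CSV line and label the columns purely by position."""
    # skipinitialspace lets the reader recognise the quoted phone number
    # even though a space follows each comma in the sample data.
    row = next(csv.reader([line], skipinitialspace=True))
    return dict(zip(V1_FIELDS, row))

record = parse_v1('Bloggins, Scott, "+1-212-555-1212", Engineer, 95060')
print(record["phone"], record["zipcode"])
```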

…but shoot, in version 2.0.2, we needed to add the street address (because we’ll use the zipcode to figure out the city/state). Simple enough, we’ll just stick that in:


Bloggins, Scott, "+1-212-555-1212", Engineer, 100 Enterprise Way, 95060
Clark, Allan, "+1-HA-HA-HA-Boot", FAE, 2237 Starbucks Street, 98107

That’s fine, but now the parsing is broken — for example, Scott Bloggins:
Job = Engineer
zipcode = “100 Enterprise Way”
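
To see the breakage for yourself, here is a small Python sketch that feeds the new six-column row to a parser that still assumes five columns. The helper name parse_v1 and the field names are hypothetical. Note that nothing raises an error; the wrong value just lands in zipcode.

```python
import csv

V1_FIELDS = ["lastname", "firstname", "phone", "job", "zipcode"]

def parse_v1(line):
    """Positional parser written for the original five-column layout."""
    row = next(csv.reader([line], skipinitialspace=True))
    # zip() stops at the shorter sequence, so the sixth column is
    # dropped silently instead of causing an error.
    return dict(zip(V1_FIELDS, row))

v2_line = 'Bloggins, Scott, "+1-212-555-1212", Engineer, 100 Enterprise Way, 95060'
record = parse_v1(v2_line)
print(record["zipcode"])  # prints: 100 Enterprise Way
```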

Wait a minute. That got all screwed up, and CSV has no way to indicate its version number. (Yes, a commented preamble has been discussed, and it has already broken parsers. Abandon all hope, ye who enter here.)

OK, now we’ll get around that by saying “well, if there are six entries, we’ll treat it like v2.0.2, but if there are five, v2.0.1”:

Job = $4
if (NF > 5) {
  address = $5
  zipcode = $6
} else {
  address = (null)
  zipcode = $5
}
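
The field-count workaround could be sketched in Python along these lines. The function name parse_by_field_count is invented for this example, and the whole approach is shown as the workaround under discussion, not as a recommendation.

```python
import csv

def parse_by_field_count(line):
    """Guess the layout from the number of fields (fragile by design)."""
    row = next(csv.reader([line], skipinitialspace=True))
    record = {
        "lastname": row[0],
        "firstname": row[1],
        "phone": row[2],
        "job": row[3],
    }
    if len(row) > 5:
        # Six or more fields: assume the v2.0.2 layout with an address.
        record["address"] = row[4]
        record["zipcode"] = row[5]
    else:
        # Five fields: assume the v2.0.1 layout, which had no address.
        record["address"] = None
        record["zipcode"] = row[4]
    return record

old = parse_by_field_count('Clark, Allan, "+1-424-242-2668", FAE, 98107')
new = parse_by_field_count(
    'Clark, Allan, "+1-HA-HA-HA-Boot", FAE, 2237 Starbucks Street, 98107'
)
print(old["zipcode"], new["address"], new["zipcode"])
```

It works for exactly these two layouts. Every further format change adds another branch, and two layouts with the same number of columns cannot be told apart at all.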

Tell me that doesn’t get cumbersome fast. Besides, it ignores optional content: once fields can be skipped, you eventually end up with:

Bloggins, Scott,,,,,,95060

Sounding a bit like Clint Eastwood: “Did I type 12, or 17 commas? Do you feel lucky, punk? Do ya?”
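
A quick Python check makes the point. The row below is shaped like the sparse one above, and the "six or more fields means address at position 5, zipcode at position 6" rule is the heuristic from earlier in this post.

```python
import csv

sparse = "Bloggins, Scott,,,,,,95060"
row = next(csv.reader([sparse], skipinitialspace=True))

field_count = len(row)   # 7 commas give 8 fields
guessed_address = row[4] # where the field-count heuristic looks for the address
guessed_zipcode = row[5] # where the field-count heuristic looks for the zipcode
actual_zipcode = row[-1] # the value is really in the last column

print(field_count, repr(guessed_zipcode), actual_zipcode)
```

The heuristic happily returns an empty zipcode, because the only thing tying a value to its meaning is how many commas came before it.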

This is the problem XML was invented to solve: it has version numbers, optional content, and rock-solid parsing. There are libraries for it, the schema is obvious when you read the file, and it still compresses nicely (plaintext with repeated syntactic sugar).
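
For comparison, here is a minimal sketch of the same records as XML, read with Python’s standard xml.etree.ElementTree module. The element names and the version attribute on the root are my own illustrative choices, not a published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names and the "version"
# attribute are illustrative only.
DOC = """\
<people version="2.0.2">
  <person>
    <lastname>Bloggins</lastname>
    <firstname>Scott</firstname>
    <phone>+1-212-555-1212</phone>
    <job>Engineer</job>
    <address>100 Enterprise Way</address>
    <zipcode>95060</zipcode>
  </person>
  <person>
    <lastname>Clark</lastname>
    <firstname>Allan</firstname>
    <zipcode>98107</zipcode>
  </person>
</people>
"""

root = ET.fromstring(DOC)
version = root.get("version")  # the file states its own format version

people = []
for person in root.findall("person"):
    # findtext returns None for a missing element, so optional
    # fields need no comma counting.
    people.append({
        "lastname": person.findtext("lastname"),
        "address": person.findtext("address"),
        "zipcode": person.findtext("zipcode"),
    })

print(version, people[1]["zipcode"], people[1]["address"])
```

The second person skips phone, job, and address entirely, and the zipcode is still found by name, not by position.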
