Historically, one of the nice things about Unix and Linux is that everything is a file, and files are just sequences of characters. Of course, in modern practice, not everything is a file, and there is a proliferation of files with some imposed structure. However, if you’ve ever worked on old systems where your file access was by the block, you’ll appreciate the Unix-like files. Classic tools like awk, sed, and grep work with this idea. Files are just characters. But this sometimes has its problems. That’s the motivation behind a tool called Miller, and I think it deserves more attention because, for certain tasks, it is a lifesaver.
The Problem
Consider trying to process a comma-delimited file, known as a CSV file. There are a lot of variations to this type of file. Here’s one that defines two “columns.” I’ve deliberately used different line formats as a test, but most often, you get one format for the entire file:
Slot,String
A,"Hello"
"B",Howdy
"C","Hello Hackaday"
"D","""Madam, I'm Adam,"" he said."
E 100,With some spaces!
X,"With a comma, or two, even"
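To see why a file like this trips up character-oriented tools, here is a minimal Python sketch. It is not Miller and not anything from the original workflow; it only contrasts a naive split on commas (roughly what a quick awk one-liner with a comma field separator would do) with a quote-aware parser, using Python's standard csv module. The helper names naive_split and csv_parse are made up for illustration, and the sample data is the seven lines above with one record per line.

```python
import csv
import io

# The sample from the text, one record per line (header first).
SAMPLE = '''Slot,String
A,"Hello"
"B",Howdy
"C","Hello Hackaday"
"D","""Madam, I'm Adam,"" he said."
E 100,With some spaces!
X,"With a comma, or two, even"
'''

def naive_split(text):
    """Split every line on commas, ignoring quoting entirely."""
    return [line.split(",") for line in text.splitlines()]

def csv_parse(text):
    """Parse with a quote-aware reader that understands "" escapes."""
    return list(csv.reader(io.StringIO(text)))

naive_rows = naive_split(SAMPLE)
parsed_rows = csv_parse(SAMPLE)

# Print the field count each approach sees, then the properly parsed fields.
for naive, parsed in zip(naive_rows, parsed_rows):
    print(len(naive), len(parsed), parsed)
```

The naive split sees four fields on the "D" and "X" lines because of the commas inside the quotes, and it leaves the quote characters in place everywhere else. The quote-aware reader returns two fields for every line, which is the kind of structure-aware handling that plain character tools lack.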