This version of the Cookbook (updated to 2021-03-17) and the first 150 posts in the companion blog BASHing data (to 2021-03-24) have been archived in Zenodo and can be downloaded for offline use. Links between the Cookbook and the blog are all local in the archived versions, so you can use both resources without needing to go online.

About this website

A Data Cleaner's Cookbook went online on 23 October 2016. I corrected and updated it frequently over the next three years. At the end of 2019 I began re-organising the site and adding new recipes and examples from the companion blog, BASHing data. The current version of the Cookbook first appeared on 13 January 2020.

If you find mistakes on this website or have suggestions for better recipes, please email me.

Robert Mesibov, West Ulverstone, Tasmania, Australia
Latest update: 2021-03-17

About the companion blog

On the BASHing data blog I write about

The blog posts have more examples and more background information than the Cookbook. If you like data work, keep up with the blog through its RSS feed.

About me

I'm a data auditor and retired scientist, and I've been working with data tables for nearly 50 years. I started with printed columns on paper (and a calculator) before moving to spreadsheets and relational databases (Microsoft Access, FileMaker Pro, MySQL, SQLite).

In 2012 I discovered the AWK language and realised that every processing job I had ever done with data tables could be done faster and more simply on the command line. Since then my data tables have been stored as plain text and managed with command-line tools, especially AWK.

In case you're wondering "Which Linux?", I run MX on my desktop and my work laptop. For years I ran Debian (stable) Xfce, then the antiX and MEPIS communities put together MX. It's Debian (stable) Xfce nicely supplemented, but not overloaded, with handy new utilities and a solid selection of apps, and still very fast. I highly recommend MX as an all-purpose Linux distro.

Contact me directly if you would like a quote on a data auditing or data cleaning job. Within Australia, I'm also happy to quote on in-person training of data workers in command-line methods.

About the banner image

The webpage banner shows a detail from a painting by the 17th-century Flemish artist David Rijckaert III. I like the look of concentration on the alchemist's face as he refers to a text. Working with the command line isn't alchemy, but sometimes it seems like magic.

Legal stuff

The text and images on this website are my own work and are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. You are welcome to use or copy the information and images on this website for non-commercial purposes, but please attribute that use to this source.

Please note that the software commands on this website are provided "as is", without warranty of any kind, express or implied, including fitness for particular purposes. In no event shall the website author be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software commands on this website.