This is the second series (2024 onward) of the BASHing data blog. The first series of 200 posts (2018-2022) and this one are companion websites to A Data Cleaner's Cookbook. Like the first series, the current blog is a place for demonstrations and trials of command-line data "ops". The operations might include analysing, archiving, auditing, cleaning, de-duplicating, encoding, entering, migrating, querying, reformatting, reporting and storing.

The first BASHing data series and A Data Cleaner's Cookbook are still online. They are also archived in Zenodo and can be downloaded for offline use.

I'm a data auditor and retired zoologist.

Robert Mesibov, West Ulverstone, Tasmania, Australia

The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Data auditing, cleaning and processing

Convert Microsoft serial day numbers to YYYY-MM-DD (2024-02-23)
     Easy, if you remember that 1900-02-29 didn't happen

Finding identifier codes with and without extra characters (2024-02-02)
     A command-line solution for finding near-duplicate values

Characters and encoding

Mojibake with 2 hearts and 52 bytes (2024-02-09)
     Encoding ping-pong between UTF-8 and Windows-1252

Useful programs for command-line data ops

GNU datamash and months (2024-02-16)
     How to help datamash over the month-sorting hurdle