About this blog
This is the second series (2024 onwards) of the BASHing data blog. The first series of 200 posts (2018-2022) and this one are companion websites to A Data Cleaner's Cookbook. Like the first series, the current blog is a place for demonstrations and trials of command-line data "ops". The operations might include analysing, archiving, auditing, cleaning, de-duplicating, encoding, entering, migrating, querying, reformatting, reporting and storing.
The first BASHing data series and A Data Cleaner's Cookbook are still online, but they are also archived on Zenodo and can be downloaded for offline use.
About me
I'm a data auditor and retired zoologist.
Robert Mesibov, West Ulverstone, Tasmania, Australia
robert.mesibov@gmail.com
The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Topic categories:
- Data auditing, cleaning and processing
- Characters and encoding
- Data formatting
- Data entry and display
- Useful programs for command-line data ops
- AWK tips and tricks
- Miscellaneous stuff
Posts by category (most recent post first):
Data auditing, cleaning and processing
Extract successive pairs from a list, and rapidly grow a list (2024-05-03)
How to do it, but be careful with the "yes" command
Post- and pre-incrementing (var++ and ++var) with AWK (2024-04-26)
Pre or post? Sometimes it doesn't matter
Finding near-duplicate spelling variants (2024-04-05)
How to search for ä/ae-type duplicates
Table in a PDF to a TSV, on the command line (2024-03-29)
Use the pdftotext utility and clean up with sed and AWK
Finding identifier codes with and without extra characters (2024-02-02)
A command-line solution for finding near-duplicate values
Characters and encoding
Print a character as a variable with BASH printf (2024-03-22)
There's a right way and a wrong way, but both work
Counterfeit spaces: the NBSP menace (2024-03-01)
How to visualise and replace (or delete) NBSPs
Mojibake with 2 hearts and 52 bytes (2024-02-09)
Encoding ping-pong between UTF-8 and Windows-1252
Data formatting
DataMatrix codes and data content (2024-04-19)
Squeezing lots of information into a tiny graphic
CSV to JSON to CSV, awkwardly (2024-04-12)
Recovering CSV data from an awful JSON file
Convert Microsoft serial day numbers to YYYY-MM-DD (2024-02-23)
Easy, if you remember that 1900-02-29 didn't happen
Data entry and display
Mapping with gnuplot, part 5 (2024-03-15)
Building a dialog for choosing data to be mapped
Mapping with gnuplot, part 4 (2024-03-08)
Showing a much-improved way to build a basemap
Useful programs for command-line data ops
GNU datamash and months (2024-02-16)
How to help datamash over the month-sorting hurdle
AWK tips and tricks
AWK one-liners to multi-liners (2024-05-10)
A little-known "pretty print" option
Miscellaneous stuff
⇝ The curious world of UUIDs (2024-05-17) ⇜ LATEST
What they are and how to tinker with them