For a full list of BASHing data blog posts see the index page.

A data checker's checklist

The BASHing data blog has been in recess while I worked on a new resource for digital data checkers and cleaners. I've now got something like an outline of topics for that resource, which I'm offering below. Comments from readers would be very welcome on things I've left out, and on things with which data workers would firmly disagree. BASHing data, meanwhile, will continue with occasional posts on miscellaneous topics (like next week's post on some spectacular mojibake).

The new resource will help data workers build data tables that cause the least trouble for downstream data users and processing applications. It explains what to look for in a data table but not how to look. There's no code in the new resource, and no software recommendations. The data-working community is very diverse and includes Excel, R, Python and AWK/BASH wizards. Different workers will have different preferred strategies for checking data tables and for cleaning them. Each to their own!

Basic layout

Formatting and characters



Data items

Cross-table and between-table relationships

Last update: 2021-05-12
The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License