Data auditing and cleaning

Why bother auditing and cleaning?

Incomplete, inconsistent or otherwise defective data are not really fit for anything but broad-brush summaries. Clean data are fit for any use.

What happens during an audit or a clean?

If you only need an audit, I point out problems in your data and report their nature and their exact locations in the data files.
If you need the data to be cleaned as well as audited, I clean the data and provide an annotated 'before-and-after' report for the particular data items that were edited.

What problems get found or fixed?

I look for duplicate, broken and truncated records; internal data disagreements and unintentionally missing data; character encoding issues and incorrect or inconsistent formatting. These are by far the most common problems in large data sets. I look for other problems on request, such as misuse of data fields, geospatial errors and broken or otherwise defective URLs.
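As a rough illustration of what a few of these checks involve, here is a minimal Python sketch. It is not the method used in my audits. It assumes tab-separated records with a header line and no tabs embedded inside fields, and the function name `audit_records` and the report keys are made up for the example. It flags records with the wrong number of fields (a common sign of broken or truncated records), exact duplicate records, and records containing the Unicode replacement character, which often marks an earlier encoding failure.

```python
def audit_records(lines, delimiter="\t"):
    """Report simple structural problems in delimited text records.

    `lines` is a list of strings with the header first. Reported line
    numbers are 1-based and count the header as line 1.
    Assumption: fields never contain the delimiter themselves.
    """
    expected = len(lines[0].split(delimiter))  # field count set by the header
    first_seen = {}          # record text -> line number of first occurrence
    wrong_field_count = []   # (line number, number of fields found)
    duplicates = []          # (line number, line number of first occurrence)
    replacement_chars = []   # line numbers containing U+FFFD

    for number, line in enumerate(lines[1:], start=2):
        fields = line.split(delimiter)
        if len(fields) != expected:
            wrong_field_count.append((number, len(fields)))
        if line in first_seen:
            duplicates.append((number, first_seen[line]))
        else:
            first_seen[line] = number
        if "\ufffd" in line:
            replacement_chars.append(number)

    return {
        "wrong_field_count": wrong_field_count,
        "duplicates": duplicates,
        "replacement_chars": replacement_chars,
    }


# Invented sample data, for illustration only.
sample = [
    "id\tname\tcity",
    "1\tAna\tHobart",
    "2\tBen",                 # truncated: only two fields
    "1\tAna\tHobart",         # exact duplicate of line 2
    "3\tCaf\ufffd\tPerth",    # replacement character from a bad conversion
]
report = audit_records(sample)
```

Each entry in the report gives a line number, in the same spirit as an audit that states the exact location of every problem. Real data need many more checks than this, including partial duplicates and disagreements between fields.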

What problems won't get found or fixed?

Please see an accountant for help with checking calculations or spreadsheet formulas, and a marketing specialist for checking customer addresses and other personal contact details.

In what form should data be supplied?

I work with text-only data in TSV or CSV format, exported from spreadsheets or databases. Cleaned data tables will be returned as TSV or CSV files, without formulas or formatting.
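If your export is CSV and you would rather supply TSV, the conversion can be sketched in a few lines of Python using the standard `csv` module. This is an illustrative sketch only: the function name `csv_to_tsv` is made up for the example, and it assumes a comma-delimited export with standard double-quote quoting. It refuses to convert any field that contains a tab or a line break, because such a field would silently corrupt a TSV file.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to TSV text, one record per line.

    Raises ValueError if a field contains a tab or line break, since
    plain TSV has no way to represent those characters inside a field.
    """
    # newline="" leaves line endings untouched so the csv module can
    # handle quoted line breaks itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out_lines = []
    for row_number, row in enumerate(reader, start=1):
        for field in row:
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(
                    f"row {row_number}: field contains a tab or line break"
                )
        out_lines.append("\t".join(row))
    return "\n".join(out_lines) + "\n"


# Invented sample data: the quoted comma survives as ordinary field text.
tsv_text = csv_to_tsv('id,name\n1,"Smith, J"\n')
```

Stopping on a problem field, rather than quietly replacing the offending character, keeps the exported table identical in content to the original.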

How are the data kept secure?

If confidentiality is needed, digital data can be supplied to me encrypted: by email, through a website secured by login, or on a DVD or thumb drive. Reports and cleaned data (and the disc or drive, if supplied) are returned to you by the same route or another agreed one.
I audit and clean confidential data on an air-gapped computer (not connected to the Internet or a local network) with encrypted working copies of the data. After completing an audit or clean, I repeatedly overwrite all working copies of the data on the computer with random numbers. I retain encrypted copies of audit and 'before-and-after' reports as commercial records.

How much does it cost?

I charge per job or per record, depending on client preference and the nature of the data. As a guideline, a 10,000-record table with 50 fields might cost AUD$150 for an audit and AUD$450 for a clean.

What are the legalities?

I prefer to work under an agreed contract for service and a non-disclosure agreement. Templates for both are available on request.

Who am I?

I'm a scientist with almost 50 years of experience in data auditing and improving data quality.
ABN 42 021 773 747 (individual sole trader, not registered for GST)


Want to audit and clean your own data?
Visit A Data Cleaner's Cookbook