Domain schizophrenia

I found this slightly awkward but useful term in

Chapman, A. D. (2005) Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. [Available here]

where Arthur Chapman credits it to

English, L.P. (1999) Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. New York: John Wiley & Sons, Inc., 518pp.

Chapman says domain schizophrenia occurs in a database when fields are used for purposes they weren’t designed for, and so end up holding data of more than one kind. Chapman gives some examples:


I often find domain schizophrenia when using tally. For example, in an Australian biodiversity database there's a text field for 'Location', with entries like 'Gympie' and '15 km S of Griffith'. These verbal locations are useful as checks of the accompanying latitude/longitude data. But one of the data items under 'Location' is:

In a 4m high heathland spp. not sure what it is. It has a weeping habit and tiny flat leaves. Every second year or so it flowers spectaularly - looks like a white waterfall. The beetles completely covered it and stayed for about two days (after which the little flowers were worse for wear. We called these bettles pumpkin beetles when we were kids because of their appearance.

That item topped the list for length in a maxchk on the 'Location' field!
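The idea behind a length check like this can be sketched in a few lines of Python. This is not the maxchk function itself, just a minimal illustration of ranking a field's entries by length. The function name longest_entries and the sample records are made up for the example, and the long entry is cut short here.

```python
def longest_entries(records, field, top=3):
    """Return (length, value) pairs for the `top` longest distinct values in `field`."""
    values = {rec.get(field, "") for rec in records}
    # Longest first; ties are broken alphabetically so the order is stable
    ranked = sorted(values, key=lambda v: (-len(v), v))
    return [(len(v), v) for v in ranked[:top]]


# Hypothetical sample records (the third value is a shortened version of the long entry)
records = [
    {"Location": "Gympie"},
    {"Location": "15 km S of Griffith"},
    {"Location": "In a 4m high heathland spp. not sure what it is."},
]

for length, value in longest_entries(records, "Location"):
    print(length, value)
```

An entry that is many times longer than its neighbours in a field like 'Location' is a good candidate for a closer look.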

In my humble opinion, it's not the job of a data cleaner to fix domain schizophrenia. If you find it, you could politely tell the data manager or compiler about the field in which it occurs, and recommend splitting the field appropriately. In the Eucalyptus example above, those 'Species' entries could be moved to a 'Species_comment' field and the 'Species' field left blank.
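For a data manager who takes up that recommendation, the split might look something like the sketch below. Everything in it is an assumption made for illustration: the function name split_field, the 'Location_comment' field name, and especially the rule that anything over 50 characters is a comment. In practice someone who knows the data would decide which entries to move.

```python
def split_field(records, field, is_comment, comment_field=None):
    """Return copies of `records` with comment-like values moved out of `field`."""
    comment_field = comment_field or f"{field}_comment"
    cleaned = []
    for rec in records:
        new = dict(rec)  # work on a copy so the original records are untouched
        value = new.get(field, "")
        if is_comment(value):
            new[comment_field] = value  # keep the text, but in its own field
            new[field] = ""             # leave the original field blank
        else:
            new.setdefault(comment_field, "")
        cleaned.append(new)
    return cleaned


# Hypothetical records and a hypothetical rule (over 50 characters = comment)
records = [
    {"Location": "Gympie"},
    {"Location": "In a 4m high heathland spp. not sure what it is. "
                 "It has a weeping habit and tiny flat leaves."},
]

for rec in split_field(records, "Location", lambda v: len(v) > 50):
    print(rec)
```

Nothing is deleted in this sketch: the out-of-place text is kept in the new field, and the original records are not modified.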