Special topics

Keeping track of changes
Hybrid occurrence tables

Keeping track of changes

The changes script records any edits made to Darwin Core data items. It lists the changes by record ID and field, and shows both the original data item and the edited one.

To use the script, first copy the Darwin Core table to be edited so that it's backed up. For example, an "event.txt" table might be copied to "event-old.txt", or "event-v1.txt" or "event-2024-10-06.txt". Next, make any needed edits to the original table. Run the script with the command

changes [original table] [edited table] [number of ID field]

In the example below, I've made edits in several records in an events table. I copied the original table as "event-old.txt" before editing, and the table after editing was saved as "event.txt". The unique ID field eventID was field 1 in this table. Note that the two edits in eventID "TAS-event-3076" are shown separately.

Besides displaying the changes in a terminal, the changes script also sends the report to a text file named for the edited table and a date/time stamp. In the example above, changes built the table "event.txt-edits-2024-08-15T15:43:58.txt" in the same directory as "event.txt". In a text editor, the changes table looks like this:

The changes script only shows edits within data items. If you are editing a table by adding, deleting or moving fields, or by adding, deleting or moving records, make these changes before doing data item edits and documenting them with changes.

Hybrid occurrence tables

Best practice with occurrence data is either to include all the event data items in each occurrence record, or to separate out the event data items in an event.txt table, and replace them in occurrence.txt with a corresponding eventID entry.

This doesn't always happen. Some Darwin Core datasets have both an event.txt table and an occurrence.txt table, with one or more event fields (such as eventDate) repeated in the occurrences table.

Are the repeated data items the same in both tables for the same eventID? There isn't an easy check to answer this question, but here I'll demonstrate one such check with a pair of fictitious data tables (below). In both tables there are the event data fields decimalLatitude, decimalLongitude, eventDate and recordedBy fields. To save space I've abbreviated some field names, e.g. "decLat".

event.txt:

id	eventID	eventDate	decLat	decLon	recBy
bos1	bos1	2025-03-16	-24.7273	135.6194	B. Tor
bos2	bos2	2025-03-16	-24.7268	135.6201	B. Tor
bos3	bos3	2025-03-17	-24.9113	135.6128	B. Tor \| C. Ham
bos4	bos4	2025-03-18	-24.9106	135.6121	B. Tor \| C. Ham
bos5	bos5	2025-03-19	-24.9097	135.6099	B. Tor \| C. Ham

occurrence.txt:

id	occID	sciName	eventID	decLat	decLon	eventDate	recBy	indCount
pog001	pog001	Pogona vitticeps	bos1	-24.7273	135.6194	2025-03-16	B. Tor	1
pog002	pog002	Varanus giganteus	bos1	-24.7273	135.6194	2025-03-16	B. Tor	1
pog003	pog003	Pogona vitticeps	bos2	-24.7268	135.6201	2025-03-16	B. Tor	2
pog004	pog004	Varanus giganteus	bos2	-24.7268	135.6201	2025-03-16	B. Tor	1
pog005	pog005	Pogona vitticeps	bos3	-24.9113	135.6128	2025-03-17	B. Tor \| C. Ham	1
pog006	pog006	Varanus giganteus	bos3	-24.9113	135.6128	2025-03-17	B. Tor \| C. Ham	1
pog007	pog007	Suta suta	bos3	-24.9113	135.6128	2025-03-17	B. Tor \| C. Ham	3
pog008	pog008	Pogona vitticeps	bos4	-24.9106	135.6121	2025-03-18	B. Tor \| C. Ham	2
pog009	pog009	Varanus giganteus	bos4	-24.9016	135.6121	2025-03-18	B. Tor \| C. Ham	1
pog010	pog010	Suta suta	bos4	-24.9106	135.6121	2025-03-18	B. Tor \| C. Ham	1
pog011	pog011	Aspidites ramsayi	bos4	-24.9106	135.6121	2025-03-18	B. Tor	1
pog012	pog012	Varanus giganteus	bos5	-24.9097	135.6099	2025-03-19	B. Tor \| C. Ham	1

The first step in the check is to build a concordance that gives field numbers for the repeated fields (including eventID), first in event.txt, then in occurrence.txt. Note that I've excluded the id field (field 1), since if it's present it will presumably always be different between the two tables:

awk -F"\t" 'ARGIND==1 && FNR==1 {for (i=2;i<=NF;i++) a[$i]=i} ARGIND==2 && FNR==1 {for (j=2;j<=NF;j++) if ($j in a) print $j FS a[$j] FS j}' event.txt occurrence.txt

Step 2 is to build a tab-separated, combined events+occurences table with just the shared fields plus occurrenceID. Using AWK, this table-building has to be done using the concordance results, and since these can vary from dataset to dataset, the following command is "ad hoc" for my demonstration tables:

awk -F"\t" 'FNR==NR {a[$2]=$2 FS $4 FS $5 FS $3 FS $6; next} $4 in a {print a[$4] FS $2 FS $5 FS $6 FS $7 FS $8}' event.txt occurrence.txt

The third and final step is to work through the combined table and identify data items that differ for the same field and the same eventID. The AWK command that does this is again "ad hoc" because it depends on the placement of fields in the combined table. Since this third command works on the combined table, the step 2 and 3 commands are chained together. Again, I've abbreviated to save space:

awk -F"\t" 'FNR==NR {a[$2]=$2 FS $4 FS $5 FS $3 FS $6; next} $4 in a {print a[$4] FS $2 FS $5 FS $6 FS $7 FS $8}' event.txt occurrence.txt | awk -F"\t" 'BEGIN {print "eventID\toccurID\tfield\tev.txt\toc.txt"} NR==1 {for (i=1;i<=NF;i++) b[i]=$i} NR>1 {for (j=2;j<=5;j++) if ($j != $(j+5)) print $1 FS $6 FS b[j] FS $j FS $(j+5)}'

This method successfully identifies the differing data items and reports their eventID, occurrenceID, fieldname and values in both event.txt and occurrence.txt. However, it's a very fiddly method and depends on careful attention to field numbers. There are probably better ways to do this with higher-level languages, such as Python.