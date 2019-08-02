

A bulk replacement GUI with YAD

I sometimes need to tidy up data tables containing pseudo-duplicate data items. The example below is from a real-world dataset and is part of a tally of a certain field. The tally function ignores the header and generates a sorted list of data items and their frequencies.
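The tally function itself isn't shown in this post. A minimal sketch of one that behaves as described, assuming a tab-separated table with a single header line, could look like the following. The name "tally" and its two arguments (file name, field number) are assumptions made for illustration.

```shell
#!/bin/bash
# tally FILE FIELD
# Count each distinct item in one field of a tab-separated file,
# ignoring the header line. Output is "count<TAB>item", sorted by item.
tally() {
    tab=$(printf '\t')
    tail -n +2 "$1" |                 # skip the header line
        cut -f"$2" |                  # keep only the chosen field
        sort |
        uniq -c |                     # prefix each distinct item with its count
        sed "s/^ *//; s/ /$tab/"      # trim padding, put a tab after the count
}
```

With a function like this loaded in the shell, "tally table 1" would list the items in field 1 of "table" with their frequencies.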

1 P. Fernández, I. Porras & J.A. varela

1 P. Fernández , I. Porras & J.A. Varela

1 P. Fernández & I. Porras & J.A. Varela

1 P. Fernández I. Porras & J.A. Varela

1 P. Fernández, I. Porras & J. A. Varela

7 P. Fernández, I. Porras & J.A Varela

923 P. Fernández, I. Porras & J.A. Varela

2 P. Fernandez, I. Porras & J. Varela

2 P. Férnandez, I. Porras & J. Varela

2 P. Fernández, I. Porras & J.Varela

35 P. Fernández, I. Porras & J. Varela

1 P.Fernández, I. Porras & J. Varela

6 P. Fernández, I. Porras y J.A. Varela

Tidying up (or "normalising") means that I pick one of the variants as the one to use, modifying it if necessary, and replace all instances of the other variants with it:

Choose "P. Fernández, I. Porras & J.A. Varela", replace others to get



983 P. Fernández, I. Porras & J.A. Varela

Doing this work on the command line, I found myself making tedium-caused errors, so I wrote a shell script (below) to do the job more visibly in a GUI. I'll demonstrate how the script works using the simple tab-separated file "table":

Doing a tally on field 1 gives this list:

Suppose I want to correct and normalise the Old Storys Creek Road entries. I enter brgy table in the terminal, that is, the script's name ("brgy") followed by the table's name as its argument. A YAD window opens on the right side of my desktop:

Using highlight/middle-click-paste, I copy the variants and their frequencies from the tally output in the terminal to the top entry box in the YAD dialog ("Items to be replaced"). I then write a new replacement text in the middle entry box ("Replace with") as I've done here, or highlight/middle-click-paste a replacement text from the top entry box to the middle one, and enter the field number in the bottom entry box:

When I click on the "Replace" button, all the top-box entries are replaced in the table, and a new, time-stamped file is created which backs up the original table. The frequencies in the top entry box are ignored.
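The replacement step can be tried without the GUI. What follows is a minimal sketch of the same idea, not the brgy script itself: the function name "replace_items" and its arguments are made up for illustration, and it assumes the variants arrive on standard input, one per line, with or without a leading frequency and tab as in the tally output.

```shell
#!/bin/bash
# replace_items TABLE FIELD REPLACEMENT   (variants are read from stdin)
replace_items() {
    local table=$1 fld=$2 repl=$3 variants out status
    variants=$(mktemp) && out=$(mktemp) || return 1

    # Drop a leading "count<TAB>" if present; lines without a tab pass through whole
    cut -f2 > "$variants"

    # Time-stamped copy of the table as it was before this replacement
    cp "$table" "$table.$(date +%Y-%m-%d_%T)" || return 1

    # While reading the variants file, store each line as an array key.
    # While reading the table, overwrite the field if its value is a stored key.
    awk -v REPL="$repl" -v FLD="$fld" '
        BEGIN {FS = OFS = "\t"}
        FILENAME == ARGV[1] {a[$0]; next}
        $FLD in a {$FLD = REPL}
        1' "$variants" "$table" > "$out" && cat "$out" > "$table"
    status=$?

    rm -f "$variants" "$out"
    return "$status"
}
```

A call might look like this, where the field number 1 is only an example:

printf '7\tP. Fernández, I. Porras & J.A Varela\n' | replace_items table 1 'P. Fernández, I. Porras & J.A. Varela'

Only exact, whole-field matches are replaced. Note that awk interprets backslash sequences in a value passed with -v, so a replacement text containing backslashes would need extra care.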

I can add other entries to the top entry box from elsewhere in the tally output, because YAD "form" entries are editable. I can also modify a pasted-in replacement text in the middle entry box before hitting the "Replace" button.

After a replacement, the YAD dialog disappears and reappears, blank and ready for more replacements in the selected field. To do replacements in a different field, I quit brgy, do a tally on the other field so I have copy-able text, then re-enter brgy filename.

This GUI method works well in my tidying-up and the progressive backups (time-stamped pre-replacement files) are good insurance. Tidying up is still a tedious job, but that's unavoidable.
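Because each backup's name ends in a date and time that sort chronologically, finding the most recent one is easy. Here is a small sketch of a helper for that; the function name "latest_backup" is hypothetical, and it assumes the backups sit in the same directory as the table and follow the table.YYYY-MM-DD_HH:MM:SS naming pattern.

```shell
#!/bin/bash
# latest_backup TABLE
# Print the name of the newest time-stamped copy of TABLE, if there is one.
latest_backup() {
    local latest="" f
    for f in "$1".[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_*; do
        # Glob matches come back in sorted order, so the last existing match is the newest
        [ -e "$f" ] && latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

Since a backup is made just before each replacement, cp "$(latest_backup table)" table would undo the most recent replacement.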

The brgy script:

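#!/bin/bash
# brgy: bulk replacement of data items in one field of a tab-separated table
# Usage: brgy table

while true; do

    # Form with three entries; yad prints them as "items|replacement|field|"
    choice=$(yad --geometry=400x600+1450+100 --title="" --align=center \
        --button="Quit":1 --button="Replace":0 \
        --form \
        --field="Items to be replaced:":TXT \
        --field="Replace with:" \
        --field="In field number..." \
        "" "" "")

    case $? in
        1) exit 0;;      # "Quit" button
        0) # "Replace" button: back up the table, then replace the listed items
           cp "$1" "$1.$(date +"%Y-%m-%d_%T")" && \
           awk -v REPL="$(echo "$choice" | cut -d"|" -f2)" \
               -v FLD="$(echo "$choice" | cut -d"|" -f3)" \
               'BEGIN {FS=OFS="\t"} FNR==NR {a[$0]; next} $FLD in a {$FLD=REPL} 1' \
               <(echo -e "$choice" | cut -d"|" -f1 | cut -f2) "$1" > temp && \
           mv temp "$1" && \
           continue;;
        252) exit 0;;    # dialog closed with Esc or the window's close button
    esac

done

exit 0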
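How the script unpacks YAD's output may not be obvious. A YAD form prints its entries on one line separated by "|", and the script's use of echo -e suggests that line breaks in the multi-line "TXT" entry come back as the two characters \n. The sketch below imitates that with a hard-coded string, so the parsing can be tried without opening a dialog. The sample value of "choice" is made up, and printf '%b' stands in for echo -e.

```shell
#!/bin/bash
# Stand-in for what the form returns: "items|replacement|field|"
# The tabs are real tabs; the line break between items is a literal backslash-n.
choice=$(printf '7\tP. Fernández, I. Porras & J.A Varela\\n6\tP. Fernández, I. Porras y J.A. Varela|P. Fernández, I. Porras & J.A. Varela|1|')

# Second and third pipe-separated parts: replacement text and field number
repl=$(printf '%s\n' "$choice" | cut -d'|' -f2)
fld=$(printf '%s\n' "$choice" | cut -d'|' -f3)

# First part: expand \n into real line breaks, keep what precedes the first "|"
# on each line, then drop the leading frequency and tab
items=$(printf '%b\n' "$choice" | cut -d'|' -f1 | cut -f2)

printf 'Replace in field %s with: %s\n' "$fld" "$repl"
printf '%s\n' "$items"
```

One consequence of this parsing is that an item or replacement text containing a "|" character would be split in the wrong place, so such items would need to be handled some other way.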

