For a full list of BASHing data blog posts see the index page.
Data validation on entry with YAD
The best way to clean a data table? Clean the data before it gets entered.
A straightforward method is to build and check a stand-alone lookup list, then enforce the use of that lookup list for data entry. It may not be possible to do this for every field in the data table, but for a field with only a limited number of potential entries, a lookup list can save a lot of data cleaning work in future.
I'm still gobsmacked by the 48 variations on "sea level" I found in an elevation field during a data audit.
For data entry I've mainly used YAD dialogs. The YAD "form" dialog offers two simple ways to bring the authority of a lookup list into data entry, as demonstrated below. For more on the form dialog and how to use it, see this 2014 article on the Linux Rain website.
Drop-down list. Here's a couple of lookup lists, "vegetables" and "fruits":
vegetables:
asparagus
aubergine
broccoli
brussel sprouts
cabbage
cauliflower
cucumber
fruits:
apple
apricot
banana
blackberry
blueberry
cherry
coconut
cranberry
This YAD command turns the items in each lookup list into a drop-down list of mandatory choices:
yad --form --columns=2 --item-separator="," \
--field="Veg":CB --field="Fruit":CB \
"$(paste -s -d"," < vegetables)" "$(paste -s -d"," < fruits)"
--form selects "form" from YAD's range of 17 dialogs. There are 2 fields, placed side-by-side with
--columns=2. Each field has a field label (e.g. --field="Veg" and the field property "CB", which creates a combo box with a drop-down list. Each list is populated from a comma-separated version of the lookup list (e.g. $(paste -s -d"," < vegetables)) and YAD is told that the items in that list are separated by commas (--item-separator=","). YAD automatically adjusts the field window widths and the overall window width to fit the fields and the listed items.
If the data enterer doesn't actually have to choose an item from the drop-down list, the default choice can be a blank line or something like "null" as the first item in the lookup list:
vegetables-null:
null
asparagus
aubergine
broccoli
brussel sprouts
cabbage
cauliflower
cucumber
fruits-blank:
#blank
apple
apricot
banana
blackberry
blueberry
cherry
coconut
cranberry
Completion. If the lookup list is large and the resulting drop-down would be awkwardly long, a risky alternative is to use the "completion" feature of the form dialog. As the user types an entry into the field box, YAD selects and presents the matching entries from the lookup list. Note that the matching is case-insensitive, and that a blank line or "null" isn't needed for the "no choice" case because the default entry for completion is an empty string:
yad --form --columns=2 --item-separator="," \
--field="Veg":CE --field="Fruit":CE \
"$(paste -s -d"," < vegetables)" "$(paste -s -d"," < fruits)"
The only change in the YAD command is the replacement of the field property "CB" with "CE", for entry completion.
The completion alternative is risky because whatever the user enters in the field box will be returned by YAD — there's no way to restrict the user's entries to the ones in the lookup list.
Contingent lookup lists. A limitation of YAD dialogs for data entry is that there's no easy way to make the lookup list for field B contingent on a selection made in field A. My own workaround has been to put each lookup list in a separate YAD dialog. The output from dialog A is used in a shell script to select the lookup list for dialog B. Here's a simple script to demonstrate:
Note: For demo purposes I put the "vegetables" and "fruits" lookup lists in my home directory
#!/bin/bash
choice=$(yad --form --text="Choose food category:" --item-separator="," \
--field="":CB "Vegetables,Fruits" | sed 's/|$//')
case $choice in
Vegetables) foodlist=$(paste -s -d"," < vegetables);;
Fruits) foodlist=$(paste -s -d"," < fruits);;
esac
result=$( yad --form --text="Choose $choice item:" --item-separator="," \
--field="":CB "$foodlist" | sed 's/|$//')
yad --text-info <<<"You chose $result"
exit
First dialog closes, second dialog opens...
Click "OK" or press Enter, and this second dialog closes. A YAD "text-info" window opens...
Last update: 2020-01-13
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License