For a full list of BASHing data blog posts see the index page.
How to flatten ("unpivot") a data table
Flattening a rectangular data table like this one ("demo"):
Product | Jan | Feb | Mar | Apr |
widget | 1519 | 1824 | 2048 | 1953 |
wodget | 68 | 101 | 92 | |
combo | 251,106 | 286,129 | 335,267 | 288,193 |
means converting the table into a list of its component triplets (row, column, value), here separated by pipes:
widget|Jan|1519
widget|Feb|1824
widget|Mar|2048
widget|Apr|1953
wodget|Jan|68
wodget|Feb|nodata
wodget|Mar|101
wodget|Apr|92
combo|Jan|251,106
combo|Feb|286,129
combo|Mar|335,267
combo|Apr|288,193
I gave an example of flattening in an earlier BASHing data post, but in that case I ignored blank data items, i.e. unfilled spaces in the data table. The following code replaces blanks with "nodata" and again uses pipes as separators in its output. It assumes the table has tab-separated fields, i.e. the table is a TSV. See that earlier BASHing data post for an explanation of how the AWK command works.
awk -F"\t" 'NR==1 {for (i=2;i<=NF;i++) col[i]=$i} \
NR>1 {for (j=2;j<=NF;j++) {if ($j=="") $j="nodata"; \
print $1 OFS col[j] OFS $j}}' OFS="|" demo
If you're working with spreadsheets, the table-flattening code can go into a "copa" script. That's my name for a script in which data is copied to a clipboard with the xclip utility, passed from xclip to the shell for processing, then returned to the clipboard for pasting. Since rectangular arrays of data items copied from a spreadsheet automatically have tab-separated fields, the table-flattening code doesn't need to be modified at all.
For example, I can launch the script below ("flatten") with a keyboard shortcut. It's just the AWK command above with stdin as its input ("-") and xclip fore and aft:
#!/bin/bash
xclip -o | awk -F"\t" 'NR==1 {for (i=2;i<=NF;i++) col[i]=$i}
NR>1 {for (j=2;j<=NF;j++) {if ($j=="") $j="nodata";
print $1 OFS col[j] OFS $j}}' OFS="\t" - | xclip -selection clipboard
exit
Notice that the output field separator is now a tab rather than a pipe, to allow easy pasting into a spreadsheet.
Here's "demo" in LibreOffice Calc:
I select the table and copy it to the secondary clipboard (with Ctrl + c):
I launch the "flatten" script, and I now have the flattened table on the clipboard. I can paste it back into the spreadsheet wherever I like, or into a new spreadsheet or text file:
Last update: 2022-03-16
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License