For a full list of BASHing data blog posts, see the index page.
How to delete, insert and replace whole fields
Much as I like using AWK, there are some jobs with data tables that are just as easy to do with simpler command-line tools. Examples would be deleting, inserting and replacing whole fields with cut and paste.
To demonstrate I'll use this tab-separated table, called "table":
fldA | fldB | fldC | fldD | fldE |
7zQF | QVahhp | fTzvhQz | T7T | YE7a |
5SVK | iCAoIm | l8D57IX | m9o | AoBm |
Ej11 | wNpk9M | tmODYMp | 9qc | rOdT |
FJnu | gcHTgJ | 9ZJaTGV | Hn8 | yjFN |
7aeZ | hruMH1 | y1SUsGP | Zf0 | Tu2F |
To delete fields and their associated field separators I use the --complement option for the cut command. For example, to delete fields 2 and 4 from "table":
cut -f2,4 --complement table
A tab character is the default field separator for cut. If the table is comma-separated, I specify that with the -d option:
"table_comma":
fldA,fldB,fldC,fldD,fldE
7zQF,QVahhp,fTzvhQz,T7T,YE7a
5SVK,iCAoIm,l8D57IX,m9o,AoBm
Ej11,wNpk9M,tmODYMp,9qc,rOdT
FJnu,gcHTgJ,9ZJaTGV,Hn8,yjFN
7aeZ,hruMH1,y1SUsGP,Zf0,Tu2F
cut -d',' -f2,4 --complement table_comma
Inserting fields is easy with paste, or a combination of cut and paste. Suppose the new field to be added is the file "insert":
fldNew
WKRr
zL2u
mZZq
tEVO
tokO
In the following commands, the new field is added at the beginning of "table", between fields 2 and 3, and at the end:
paste insert table
paste <(cut -f-2 table) insert <(cut -f3- table)
paste table insert
In that second command, the cut option -f-2 means "all fields up to and including field 2", and -f3- means "all fields starting with field 3"
To do the same with the comma-separated table, I just specify a comma as field separator for cut and paste:
cut and paste can also add a blank field, with a new field name:
paste <(cut -f-2 table) <(echo "fldnew") <(cut -f3- table)
paste -d',' <(cut -d',' -f-2 table_comma) <(echo "fldnew") <(cut -d',' -f3- table_comma)
Fields can be replaced with a combination of deleting and inserting. To replace field 2 with "insert", giving it the new field name "fldB" with the help of sed:
paste <(cut -f1 table) <(sed 's/fldNew/fldB/' insert) <(cut -f3- table)
cut and paste are great, but AWK is the right tool for inserting a field based on other fields in the same table, especially when doing arithmetical or other calculations. In the following two examples I again add a new field to "table" between fields 2 and 3, but this time the new field counts the characters in field 3, and concatenates fields 1 and 4:
awk 'BEGIN {FS=OFS="\t"} NR==1 {$2=$2 FS "fldNew"} NR>1 {$2=$2 FS length($3)} 1' table
awk 'BEGIN {FS=OFS="\t"} NR==1 {$2=$2 FS "fldNew"} NR>1 {$2=$2 FS $1$4} 1' table
The BEGIN statement tells AWK that the input and output field separators are both the tab character. Specifying the output field separator is necessary because AWK will be rebuilding records and needs to know how to do it.
In the header line (NR==1), field 2 is rebuilt as the string "fld2[tab]fldNew" (FS means "field separator").
In the lines after the header (NR>1), AWK rebuilds field 2 as field 2[tab] followed either by the character length of field 3 (length($3)) or a simple concatenation of fields 1 and 4 ($1$4).
The final "1" is an AWK shorthand for "print all lines".
Last update: 2020-01-13
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License