For a full list of BASHing data blog posts see the index page.

Search for (exact) strings; report line, column and context

GNU grep is a great utility but it can only report a search target's line number. Suppose I search for the string "64" in this tab-separated "demo" table with grep's "-n" option:

Fld1	Fld2	Fld3	Fld4	Fld5	Fld6	Fld7	Fld8
001	7b03	020d	71b7	43c4	8ffd	f9352b	102e2d
002	521a	1da1	f9eb	4268	9fa8	fc7357	0a31b8
003	e6c3	0e9b	dc9f	448b	b1c4	7705ca	772ab5
004	36cf	fd59	0c62	4eb6	82d1	e30076	ecedbd
005	15c5	7874	33dc	4b20	b1c4	7a1f3b	8465b0
006	b3fb	5bad	3361	4259	a5b0	30370c	953333
007	15c5	c3d5	33dc	4b20	b1c4	7a1f3b	8465b0
008	7b03	7686	d264	4c34	b0e4	364607	5af668
009	7ee1	8a53	5cc5	4f57	9cf5	ddc735	56eee8
010	bd75	3324	21mz	41b0	b1bc	22964a	b9f2a3
011	15d7	1fb2	7223	4e8f	8f1f	8e6b76	f60cd1
012	c3cc	ef6c	70fb	4a45	9428	f00f73	07e92d
013	9ab4	991c	0bd7	4f3c	badf	ee145b	5a6d17
014	6ad5	8395	19aa	43c4	9cea	3a3c90	e84150
015	607c	3753	8a69	44bf	b41f	ddb1eb	4a42ff
016	7b03	f067	71b7	43c4	8ffd	f9352b	102e2d
017	05f1	89f3	5067	6712	b1c4	3b5245	4c4e35
018	e20d	5346	71a8	4b26	a31d	ab914d	e39049
019	15c5	52bd	33dc	4b20	b1c4	7a1f3b	8465b0
020	e917	b879	08dd	4387	b520	814a8a	10717b

OK, "64" appears on lines 9 and 11, but grep has left it up to me to figure out which fields contain "64". Because field location is often important in my data work, I wrote a function ("fldgrep") that searches for an exact string and returns the string's line and field location (field number and field name) plus the data item containing the string, with the string coloured red:

The function "fldgrep" is actually a single AWK command, although fairly complicated, that works on tab-separated data tables:

fldgrep() { awk -F"\t" -v target="$1" -v blue="\x1b[1;34m" -v red="\x1b[1;31m" -v reset="\x1b[0m" 'NR==1 {for (i=1;i<=NF;i++) a[i]=$i} NR>1 {for (j=1;j<=NF;j++) if ($j ~ target) {n=split($j,m,target,sep); printf("%s","line " blue NR reset ", field " blue j reset " (" blue a[j] reset "): "); for (k=1;k<=n;k++) printf("%s", m[k] red sep[k] reset); print ""}}' "$2"; }

"fldgrep" is explained in the next section. Below are a couple of examples of "fldgrep" in use.

Multiple appearances on one line:

Multiple appearances in one field:

awk -F"\t"
Invokes AWK and tells it that the field separator is the tab character

-v target="$1"
Assigns the first argument of the function (the target string) to the AWK variable "target"

-v blue="\x1b[1;34m"
Assigns the ANSI color escape for bold blue to the AWK variable "blue"

-v red="\x1b[1;31m"
Assigns the ANSI color escape for bold red to the AWK variable "red"

-v reset="\x1b[0m"
Assigns the ANSI color escape for no coloring to the AWK variable "reset"

NR==1
Tells AWK to do a particular action with the table's header line

for (i=1;i<=NF;i++)
The action with the header line is to loop through each of the fields, and

a[i]=$i
add each entry in the header line to an array "a" with the field number as index string and the field contents as value string

NR>1
The remaining actions in the command get done line by line after the header line

for (j=1;j<=NF;j++)
Loop through each of the fields in the line

if ($j ~ target)
Check if the target string is part of the entry in that field, and if yes, do the following four actions

n=split($j,m,target,sep)
The first action is to split the field using the target string as the separator. Put the non-target strings in the array "m" and the adjacent separator (target) in the array "sep". Tally up the number of non-target strings in the variable "n"

printf("%s","line " blue NR reset ", field " blue j reset " (" blue a[j] reset "): ")
The second action (for each field containing the target string) begins with printfing some text (see screenshots above) with the line number, field number and field name highlighted in blue

for (k=1;k<=n;k++)
The third action begins by looping through the non-target strings found by split

printf("%s", m[k] red sep[k] reset)
For each of the non-target strings, printf the non-target string and the red-highlighted target string

print ""
The last action for each field containing the target string is to print nothing and move to the next line

"$2"
This is the second argument for the function, and is the name of the file on which AWK is operating

Last update: 2022-03-09
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License