


Find the first, last, nth and first+last occurrence of a string

There's more than one way to do the jobs listed in the title of this post, but I'll show here the ones I've found to be easiest. The demonstration file ("csv-recs") is shown below, and the search target is "b6f3":

TransID,Date,PersonID
0036,2024-05-05,a4e1
0037,2024-05-05,b6f3
0038,2024-05-05,g9h0
0039,2024-05-06,a4e1
0040,2024-05-06,g9h0
0041,2024-05-07,b6f3
0042,2024-05-07,a4e1
0043,2024-05-07,a4e1
0044,2024-05-07,b6f3
0045,2024-05-08,g9h0


First occurrence. GNU grep has an -m [n] option that makes it stop reading the file after the first n matching lines:

grep -m 1 "b6f3" csv-recs
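
Two small variations may be handy here. The sketch below rebuilds the demonstration file in a scratch directory, adds -n to show where in the file the first match sits, and uses AWK as a fallback for a grep that lacks -m. The variable names are only illustrative.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

# -n prefixes the matching line with its line number in the file
first_numbered=$(grep -n -m 1 "b6f3" csv-recs)
echo "$first_numbered"    # 3:0037,2024-05-05,b6f3

# Fallback without -m: print the first matching line, then quit
first_awk=$(awk '/b6f3/ {print; exit}' csv-recs)
echo "$first_awk"         # 0037,2024-05-05,b6f3
```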


Last occurrence. The easiest way to do this for small- and medium-sized files is to first reverse the file with tac, then grep for the first occurrence:

tac csv-recs | grep -m 1 "b6f3"
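
On a system where tac isn't installed (an assumption; it ships with GNU coreutils), the same result can be had by collecting every match and keeping only the last one. Both alternatives sketched below read the whole file, and the variable names are just for illustration.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

# Keep only the last of all matching lines
last_tail=$(grep "b6f3" csv-recs | tail -n 1)
echo "$last_tail"    # 0044,2024-05-07,b6f3

# AWK version: remember each match, print the remembered line at the end
last_awk=$(awk '/b6f3/ {hit=$0} END {if (hit != "") print hit}' csv-recs)
echo "$last_awk"     # 0044,2024-05-07,b6f3
```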


nth occurrence. I use this AWK command, explained in a previous BASHing data 2 post:

awk '/b6f3/ && ++c==[n]' csv-recs
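
The [n] above is a placeholder to be replaced with a number. One way to avoid editing the AWK program each time is to pass the number in with -v, as in this sketch. The helper function nth_match is hypothetical (it isn't from the earlier post), and it treats its first argument as a regular expression.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

# Second occurrence, with n supplied as an AWK variable
second=$(awk -v n=2 '/b6f3/ && ++c==n' csv-recs)
echo "$second"    # 0041,2024-05-07,b6f3

# Hypothetical helper: nth_match PATTERN N FILE
nth_match() {
    awk -v pat="$1" -v n="$2" '$0 ~ pat && ++c==n {print; exit}' "$3"
}
third=$(nth_match "b6f3" 3 csv-recs)
echo "$third"     # 0044,2024-05-07,b6f3
```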


A neat feature of this AWK construction is that it can be used for multiple targets, each with its own counter. Here I search simultaneously for the second occurrence of "b6f3" and the first occurrence of "g9h0":

awk '(/b6f3/ && ++c==2)||(/g9h0/ && ++d==1)' csv-recs
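
It may be worth noting that the matches come out in file order, not in the order the conditions are written. In the sketch below, swapping the two conditions gives the same output on the demonstration file. The variable names are illustrative only.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

as_written=$(awk '(/b6f3/ && ++c==2)||(/g9h0/ && ++d==1)' csv-recs)
swapped=$(awk '(/g9h0/ && ++d==1)||(/b6f3/ && ++c==2)' csv-recs)

# Both print the first "g9h0" line, then the second "b6f3" line:
# 0038,2024-05-05,g9h0
# 0041,2024-05-07,b6f3
echo "$as_written"
echo "$swapped"
```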


If the file is very large, it's a good idea to have AWK quit as soon as it has found the nth occurrence, rather than read the rest of the file:

awk '/[search pattern]/ && ++c==[n] {print; exit}' large-file
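
To see what the exit saves, here is a rough sketch on a generated file. The file contents (the numbers 1 to 100000 from seq), the pattern and the added END block are all assumptions for the demonstration. The END block only reports how many lines AWK had read when it finished.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 100000 > large-file

# Third line ending in 7, quitting as soon as it is found
with_exit=$(awk '/7$/ && ++c==3 {print; exit} END {print "lines read: " NR}' large-file)
echo "$with_exit"
# 27
# lines read: 27

# Same search without exit: AWK carries on to the end of the file
without_exit=$(awk '/7$/ && ++c==3 {print} END {print "lines read: " NR}' large-file)
echo "$without_exit"
# 27
# lines read: 100000
```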


First and last occurrence. You can string the two grep commands together to get first and last occurrences:

grep -m 1 "b6f3" csv-recs; tac csv-recs | grep -m 1 "b6f3"
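
Another possibility, sketched below, is to read the file once with grep and let sed keep only the first and last of the matching lines. This is not one of the post's methods. Unlike the two-grep command, it prints the line only once when the target occurs on a single line.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

# sed: on line 1 skip to the end of the script (the line is printed);
# on any other line, delete it unless it is the last line
first_last=$(grep "b6f3" csv-recs | sed -e 1b -e '$!d')
echo "$first_last"
# 0037,2024-05-05,b6f3
# 0044,2024-05-07,b6f3

# A target found on only one line is printed once
single=$(grep "0040" csv-recs | sed -e 1b -e '$!d')
echo "$single"    # 0040,2024-05-06,g9h0
```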


but there are fancier methods. One of the most cryptic I've seen appeared in a Stack Overflow answer. Here's the code adapted to my example file:

awk '/b6f3/ {x=$0} x && !i {print; i=1} END {print x}' csv-recs


The first time AWK finds "b6f3" it sets "x" equal to the whole line, as it will do every time it finds that target as it works through the file. While still on that first-occurrence line, AWK moves to the next condition/action, which in plain language says: if the variable "x" is set and "i" isn't, print the line and set the variable "i" equal to 1 (any non-zero number counts as true in AWK).
 
Because "x" was set in the preceding action but "i" hasn't yet been set, AWK prints the first-occurrence line. The next time AWK finds "b6f3", "i" has been set and so nothing is printed, but "x" stores the line. This processing continues right through the file, and in the END statement AWK prints the last line in which it found "b6f3".
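
Two edge cases follow from that logic: if the target occurs on only one line, the command prints that line twice, and if it doesn't occur at all, the END statement prints an empty line. A less cryptic variant that counts the matches is sketched below. The helper name first_last and the variables inside it are hypothetical, and this is one possible rewrite rather than the Stack Overflow author's code.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

# Hypothetical helper: first_last PATTERN FILE
# n counts matching lines; "first" is stored only on the first match,
# "last" is overwritten on every match
first_last() {
    awk -v pat="$1" '$0 ~ pat { if (!n++) first=$0; last=$0 }
        END { if (n) print first; if (n > 1) print last }' "$2"
}

first_last "b6f3" csv-recs
# 0037,2024-05-05,b6f3
# 0044,2024-05-07,b6f3

first_last "0040" csv-recs    # one match, printed once
# 0040,2024-05-06,g9h0

first_last "zzzz" csv-recs    # no match, no output
```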
 
Setting "i" equal to 1 is fairly arbitrary. You get exactly the same result with i=17 or i="baloney" in this command, because AWK treats any non-zero number and any non-empty string as true.
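
This is easy to check. The sketch below runs the command with three different "true" values for "i" and, for contrast, with i=0, which is false in AWK, so the flag never takes effect. The output variable names are illustrative.

```shell
# Recreate the post's demonstration file in a scratch directory
cd "$(mktemp -d)" || exit 1
printf '%s\n' TransID,Date,PersonID \
  0036,2024-05-05,a4e1 0037,2024-05-05,b6f3 0038,2024-05-05,g9h0 \
  0039,2024-05-06,a4e1 0040,2024-05-06,g9h0 0041,2024-05-07,b6f3 \
  0042,2024-05-07,a4e1 0043,2024-05-07,a4e1 0044,2024-05-07,b6f3 \
  0045,2024-05-08,g9h0 > csv-recs

# Three "true" values for i: all give the first and last "b6f3" lines
out_1=$(awk '/b6f3/ {x=$0} x && !i {print; i=1} END {print x}' csv-recs)
out_17=$(awk '/b6f3/ {x=$0} x && !i {print; i=17} END {print x}' csv-recs)
out_str=$(awk '/b6f3/ {x=$0} x && !i {print; i="baloney"} END {print x}' csv-recs)

# i=0 is false, so !i stays true: every line from the first match onward
# is printed, followed by the last match from the END statement
out_0=$(awk '/b6f3/ {x=$0} x && !i {print; i=0} END {print x}' csv-recs)

echo "$out_1"
# 0037,2024-05-05,b6f3
# 0044,2024-05-07,b6f3
```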


Many thanks to Sundeep Agarwal for helpful comments on a draft of this post!


Last update: 2024-06-21
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License