For a list of BASHing data 2 blog posts see the index page.
Replace the last N occurrences of a pattern in a string
This post was inspired by a question from a regular reader (more on that question below). Suppose I have the text file below ("democats") and I want to replace the last "cat" with "CAT" in each line:
I see 1 cat, 2 cats, 3 cats, 4 cats in the yard.
I see 1 cat, 2 cats, 3 cats in the yard.
I see 1 cat, 2 cats in the yard.
I see 1 cat in the yard.
I see no felines in the yard.
One method would be to reverse each line, use sed to replace the first "tac" with "TAC", then reverse the line again:
rev democats | sed 's/tac/TAC/' | rev
sed can also do the replacement without reversing the string:
sed -E 's/(.*)cat/\1CAT/' democats
The -E option turns on extended regular expressions, so plain parentheses capture a group ((.*)) that can be reused in the replacement as a backreference (\1). sed matches greedily by default, so (.*) grabs as many characters as it can while still leaving a "cat" to match, which means it captures everything in the line up to the last "cat". The replacement then writes back the captured text (\1) followed by "CAT".
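To see the greedy match at work, here is a small sketch that wraps the backreference in square brackets so the captured text is visible. The brackets and the variable names are only for illustration and aren't part of the method above.

```shell
# Bracket the captured group to show exactly what (.*) matched.
line='I see 1 cat, 2 cats, 3 cats in the yard.'
greedy_demo=$(printf '%s\n' "$line" | sed -E 's/(.*)cat/[\1]CAT/')
printf '%s\n' "$greedy_demo"
# prints: [I see 1 cat, 2 cats, 3 ]CATs in the yard.
```

Everything before the last "cat" ends up inside the brackets, including the two earlier "cat" strings.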
The question from my reader, though, was something like "How can I replace the last N occurrences in the string?", which is a harder problem.
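Suppose N is 3. I could chain together 3 sed commands so that each step replaces the last remaining lowercase "cat". This works because the match is case-sensitive: once a "cat" has become "CAT", the next command skips over it: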
sed -E 's/(.*)cat/\1CAT/;s/(.*)cat/\1CAT/;s/(.*)cat/\1CAT/' democats
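As a quick check, the sketch below runs the same three-command chain on a few of the sample lines fed in through a pipe rather than from the "democats" file (the variable names are my own). It suggests that lines with fewer than 3 occurrences aren't a problem: a substitution that finds no match simply leaves the line alone.

```shell
# The same chain of three commands, stored in a variable for readability.
chain='s/(.*)cat/\1CAT/;s/(.*)cat/\1CAT/;s/(.*)cat/\1CAT/'

chained=$(printf '%s\n' \
    'I see 1 cat, 2 cats, 3 cats, 4 cats in the yard.' \
    'I see 1 cat, 2 cats in the yard.' \
    'I see no felines in the yard.' | sed -E "$chain")

printf '%s\n' "$chained"
# prints:
# I see 1 cat, 2 CATs, 3 CATs, 4 CATs in the yard.
# I see 1 CAT, 2 CATs in the yard.
# I see no felines in the yard.
```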
But a more general solution would be to chain N sed commands in a shell function like this:
catcount [filename] [N number]
catcount() { sed -E "$(for i in $(seq 1 "$2"); do echo -n "s/(.*)cat/\1CAT/;"; done)" "$1"; }
The for loop uses echo to build a chain of sed replacement commands like the one shown above, repeated N times. I used echo rather than printf because printf reads "\1" in its format string as an octal escape for character 1, the control character "start of heading". To get a literal "\1" out of a double-quoted printf format, two more backslashes are needed:
$ printf "\1A\n"
A
$ printf "\\1A\n"
A
$ printf "\\\1A\n"
\1A
echo copies the backreference faithfully:
$ echo "\1A"
\1A
echo -n suppresses the newline that echo would otherwise add after each "s/(.*)cat/\1CAT/;". Leaving the newlines in wouldn't change the result, because sed accepts a newline between commands as well as a semicolon.
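If you would rather use printf anyway (echo's treatment of backslashes and of -n varies between shells), one possible workaround is to pass the sed command as a %s argument instead of putting it in the format string. printf only interprets escapes like "\1" in the format, so the argument is printed untouched. The sketch below is an alternative, not the function from this post; the name "lastn" and the throwaway test file are hypothetical.

```shell
# lastn is a hypothetical name, not the function from the post.
# Usage: lastn FILE N
lastn() {
    # Build N copies of the sed command. The command is a %s argument,
    # not the format string, so printf prints its backslash unchanged.
    script=$(for i in $(seq 1 "$2"); do printf '%s' 's/(.*)cat/\1CAT/;'; done)
    sed -E "$script" "$1"
}

# Try it on a throwaway one-line file.
tmpfile=$(mktemp)
printf '%s\n' 'I see 1 cat, 2 cats, 3 cats in the yard.' > "$tmpfile"
result=$(lastn "$tmpfile" 2)
printf '%s\n' "$result"
# prints: I see 1 cat, 2 CATs, 3 CATs in the yard.
rm -f "$tmpfile"
```

Single quotes around the sed command also keep the shell itself from touching the backslash.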
Last update: 2025-01-03
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License