
Another tricky formatting problem
The file below, "problem", is a much-simplified version of a big text file I was working with. I've kept the big file's structure, though: an index line (here, a number) followed by one or more pairs of related lines.
1
apple
is a red fruit
cabbage
is a green vegetable
2
carrot
is an orange vegetable
3
pear
is a pear-shaped fruit
banana
is a yellow fruit
celery
is another green vegetable
The goal is to reformat "problem" into this:
1 apple is a red fruit
1 cabbage is a green vegetable
2 carrot is an orange vegetable
3 pear is a pear-shaped fruit
3 banana is a yellow fruit
3 celery is another green vegetable
The simplest solution I came up with:
awk '/^[0-9]/ {num=$0; next} {getline fol; print num,$0,fol}' problem

The "awk" here is GNU AWK and it works line by line. It first looks to see if the line begins with a number (/^[0-9]/). If it does, AWK stores that whole line in the variable "num" and moves to the next line (num=$0; next).
If that next line doesn't begin with a number, AWK does something complicated (getline fol; print num,$0,fol). First it uses the "getline" function to read the following line and store it in the variable "fol". It then prints "num", the current line and the following line (as "fol"). The commas in the "print" statement put AWK's output field separator between the three items, and by default that separator is a single space.
The variable "fol" is reset with each "getline" action, and the variable "num" is reset every time AWK finds a line beginning with a number.
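The same logic can be spread over several lines with comments. This is only a sketch: the temporary directory and the "result" variable are my own scaffolding for a self-contained test, and the check on the return value of "getline" is an addition that isn't in the one-liner. It's there because "getline fol" leaves "fol" unchanged when there is no following line, so an unpaired last line would otherwise be printed with the previous pair's second line.

```shell
# Recreate the sample file "problem" in a temporary directory
tmpdir=$(mktemp -d)
printf '%s\n' 1 apple 'is a red fruit' cabbage 'is a green vegetable' \
    2 carrot 'is an orange vegetable' \
    3 pear 'is a pear-shaped fruit' banana 'is a yellow fruit' \
    celery 'is another green vegetable' > "$tmpdir/problem"

# Same logic as the one-liner, spread out and commented
result=$(awk '
    /^[0-9]/ { num = $0; next }   # index line: remember it, move to the next line
    {
        # $0 holds the first line of a pair; read its partner into "fol"
        if ((getline fol) > 0)
            print num, $0, fol
        else
            print num, $0         # no partner line left in the file
    }
' "$tmpdir/problem")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

With the sample data every line has a partner, so the output should be the same six lines as the goal shown earlier.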
Notice that once "getline" has read the following line, AWK doesn't process that line again in its normal line-by-line pass. But if you ask AWK to print the number of the line it's currently working on (FNR), it returns the number of that following line, not the number of the line on which "getline" acted. In other words, "getline fol" advances FNR (and NR) by one, even though $0 still holds the earlier line.
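A quick way to watch FNR change is to save its value just before the "getline" call and print both values. This is a sketch on the first five lines of the sample data only; the names "before" and "fnr_demo" are made up for the illustration.

```shell
# Print FNR as it was before getline, then FNR after getline,
# followed by the usual reformatted line
fnr_demo=$(printf '%s\n' 1 apple 'is a red fruit' cabbage 'is a green vegetable' |
    awk '/^[0-9]/ {num=$0; next}
         {before=FNR; getline fol; print before, FNR, num, $0, fol}')

printf '%s\n' "$fnr_demo"
# 2 3 1 apple is a red fruit
# 4 5 1 cabbage is a green vegetable
```

The first number on each output line is the line AWK was working on, and the second is the line "getline" read, which is what FNR reports afterwards.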

Last update: 2025-09-19
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License