For a list of BASHing data 2 blog posts see the index page.


Another tricky formatting problem

The file below, "problem", is a much-simplified version of a big text file I was working with. I've kept the big file's structure, though: an index line (here, a number) followed by one or more pairs of related lines.

1
apple
is a red fruit
cabbage
is a green vegetable
2
carrot
is an orange vegetable
3
pear
is a pear-shaped fruit
banana
is a yellow fruit
celery
is another green vegetable

The goal is to reformat "problem" into this:

1 apple is a red fruit
1 cabbage is a green vegetable
2 carrot is an orange vegetable
3 pear is a pear-shaped fruit
3 banana is a yellow fruit
3 celery is another green vegetable

The simplest solution I came up with:

awk '/^[0-9]/ {num=$0; next} {getline fol; print num,$0,fol}' problem

The "awk" here is GNU AWK, which works through the file line by line. It first checks whether the line begins with a digit (/^[0-9]/). If it does, AWK stores the whole line in the variable "num" and moves on to the next line (num=$0; next).

If a line doesn't begin with a digit, AWK does something more complicated (getline fol; print num,$0,fol). First it uses the "getline" command to read the following line and store it in the variable "fol". It then prints "num", the current line ($0) and the following line (as "fol"), all separated by a single space, which is AWK's default output field separator.
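To try this out, the sample file can be rebuilt and the same program spread over several lines with comments. This is only a sketch of the one-liner above: the scratch directory and the shell variable "reformatted" are there for illustration and aren't part of the original command.

```shell
# Work in a scratch directory (an assumption, just to avoid clobbering files)
cd "$(mktemp -d)" || exit 1

# Rebuild the sample file "problem", one argument per line
printf '%s\n' 1 apple 'is a red fruit' cabbage 'is a green vegetable' \
    2 carrot 'is an orange vegetable' \
    3 pear 'is a pear-shaped fruit' banana 'is a yellow fruit' \
    celery 'is another green vegetable' > problem

# Same logic as the one-liner, with comments
reformatted=$(awk '
    /^[0-9]/ {          # index line: begins with a digit
        num = $0        # remember it for the pairs that follow
        next            # nothing to print yet
    }
    {                   # first line of a pair
        getline fol     # read the second line of the pair into fol
        print num, $0, fol
    }
' problem)
printf '%s\n' "$reformatted"
```

The printed result should match the target layout shown earlier, one index-plus-pair per line.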

The variable "fol" is reset with each "getline" action, and the variable "num" is reset every time AWK finds a line beginning with a number.
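One thing the one-liner doesn't check is whether "getline" actually found a following line. "getline" returns 1 when it reads a line, 0 at the end of the file and -1 on a read error, and at the end of the file "fol" simply keeps its previous value. Below is a possible guard, assuming you'd want a warning on standard error rather than a silently wrong line. The file "truncated" and the wording of the warning are made up for this example.

```shell
cd "$(mktemp -d)" || exit 1

# "truncated" is a hypothetical file whose last pair lacks its second line
printf '%s\n' 1 apple 'is a red fruit' cabbage > truncated

checked=$(awk '
    /^[0-9]/ {num=$0; next}
    {
        if ((getline fol) > 0)      # 1 = a following line was read
            print num, $0, fol
        else                        # 0 = end of file, -1 = read error
            print "unpaired line " FNR ": " $0 > "/dev/stderr"
    }
' truncated)
printf '%s\n' "$checked"
```

Only the complete pair reaches standard output. The unguarded one-liner would instead have paired "cabbage" with the stale "is a red fruit".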

Notice that AWK skips the following line in its "line-by-line" processing once it's been read by "getline". But if you ask AWK to print the number of the line it's currently working on (FNR), it returns the number of that following line, not the number of the line on which "getline" acted (see below). In other words, "getline" advances the value of the FNR variable, even though $0 still holds the earlier line.

[Screenshot: reformat2]
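The FNR behaviour can be checked with a small test along these lines. It's a sketch, not the command in the screenshot: the file "short" and the variable "before" are invented here, and FNR is saved just before "getline" runs so the two values can be printed side by side.

```shell
cd "$(mktemp -d)" || exit 1

# "short" is a made-up six-line file with the same structure as "problem"
printf '%s\n' 1 apple 'is a red fruit' 2 carrot 'is an orange vegetable' > short

fnr_demo=$(awk '
    /^[0-9]/ {num=$0; next}
    {
        before = FNR            # number of the first line of the pair
        getline fol             # reads the next line and advances FNR
        print before, FNR, $0   # $0 is still the first line of the pair
    }
' short)
printf '%s\n' "$fnr_demo"
```

Each output line shows FNR before and after "getline", followed by the unchanged current line.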


Last update: 2025-09-19
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License