
AWK's view of existence

AWK — by which I mean GNU AWK, or "gawk" — sometimes doesn't distinguish between a zero and an empty string. I blogged about this peculiarity in 2020 but this post dives a bit deeper.

I'll start with "test1", a 5-line file with nothing at all after the pipe on line 3:

-2|-2
-1|-1
0|
1|1
2|2

[Image: test1,1]

When I ask AWK to print a line (the default action) if the first field ($1) exists, it doesn't print the third line, because "0" is synonymous with "false". The same happens when the condition is that the second field exists ($2), because there's nothing at all in that field on the third line. However, when the condition is that the whole line exists ($0), the third line gets printed.
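
The three tests just described can be sketched as follows. This is a minimal reconstruction, assuming the file is saved as "test1" in the working directory and that the pipe is set as field separator with -F"|"; the exact commands used for this post may have differed.

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# Print the line if field 1 "exists": line 3 is skipped because 0 is false
awk -F"|" '$1' test1

# Print the line if field 2 "exists": line 3 is skipped because field 2 is empty
awk -F"|" '$2' test1

# Print the line if the whole line "exists": all five lines are printed
awk -F"|" '$0' test1
```

The first two commands each print four lines (everything except "0|"), while the third prints all five.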

In clearer language, "exists" here means "is not the empty string and is not zero". But note what happened with that third command. The whole line exists (it has a field separator and a newline) and AWK will happily count the fields in that third line:

[Image: test1,2]
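
A small sketch of that field count, again assuming the sample file is named "test1" and the separator is set with -F"|" (the NR==3 condition is my choice for isolating the third line):

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# Line 3 is "0|": field 1 is "0" and field 2 is the empty string, so NF is 2
awk -F"|" 'NR==3 {print NF}' test1   # prints 2
```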

Inverting the conditions gives the opposite results, as you might expect:

[Image: test1,3]
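
The inverted tests might look like this, as a sketch under the same assumptions (file "test1", separator -F"|"):

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# Field 1 does NOT "exist": only line 3 qualifies, because its field 1 is 0
awk -F"|" '!$1' test1   # prints 0|

# Field 2 does NOT "exist": only line 3 qualifies, because its field 2 is empty
awk -F"|" '!$2' test1   # prints 0|

# The whole line does NOT "exist": no line qualifies
awk -F"|" '!$0' test1   # prints nothing
```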

Adding zero to a field will encourage AWK to regard the field's content as a number, but that doesn't apply to "existence-testing" because zero=false:

[Image: test1,4]
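
One way to see this, sketched with the same assumed file name and separator: adding zero gives a numeric result, but a numeric zero is still false when used as a bare condition.

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# "0" + 0 is the number 0, which is false, so line 3 is still skipped
awk -F"|" '$1+0' test1

# The empty string + 0 is also the number 0, so line 3 is skipped here too
awk -F"|" '$2+0' test1
```

Both commands print the same four lines as the plain $1 and $2 tests.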

However, when used with (for example) a comparison operator, a zero springs into existence as "true":

[Image: test1,5]
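
Two comparisons that illustrate the point; the choice of == and >= is mine and not necessarily what was used for this post:

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# The comparison itself is true on line 3, so the zero line is printed
awk -F"|" '$1 == 0' test1   # prints 0|

# Zero also takes part in ordinary numeric ordering
awk -F"|" '$1 >= 0' test1   # prints 0|, 1|1 and 2|2
```

In both cases the condition being tested is the result of the comparison (true or false), not the value of the field.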

Things get a little more unexpected when AWK counts, because in AWK shorthand a count can be a condition: it can exist or not. Below I'm pre-incrementing with the counter "c", and note the difference when the order of the conditions changes:

[Image: test1,6]

In the first case (++c && $1) the count always happens (there is always a line) but is only printed for lines with existing first fields. In the second case ($1 && ++c) the count only happens if field 1 exists.
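
A sketch of the two orderings, assuming the same "test1" file and printing the counter next to each line (the print statement is my own addition for illustration):

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# ++c is evaluated on every line, so the counter skips from 2 to 4
awk -F"|" '++c && $1 {print c, $0}' test1
# 1 -2|-2
# 2 -1|-1
# 4 1|1
# 5 2|2

# && short-circuits: ++c is only evaluated when field 1 "exists"
awk -F"|" '$1 && ++c {print c, $0}' test1
# 1 -2|-2
# 2 -1|-1
# 3 1|1
# 4 2|2
```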

AWK has a built-in function typeof, and the type of "0" is a string-representing-a-number, or strnum, while the type of nothing (the empty string) is "string":

[Image: test1,7]
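
A hedged sketch of a typeof check on the third line. Because typeof is a gawk extension that is missing from other AWKs and from older gawk releases, the example calls gawk explicitly and first tests that the function is available; that guard is my own addition.

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# Only run the check if this gawk knows typeof()
if gawk 'BEGIN { exit (typeof(1) != "number") }' 2>/dev/null; then
    # Report the type of each field on line 3 ("0|")
    gawk -F"|" 'NR==3 {print "field 1: " typeof($1); print "field 2: " typeof($2)}' test1
fi
```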

In summary, AWK sees zero as a string-representing-a-number — but only sometimes. I've been caught by this duality when writing code to detect or count genuinely empty fields. The correct way to do this is to define "empty" as "containing only the empty string":

[Image: test1,8]
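
A minimal sketch of the difference, using the same assumed file and separator; the counter name "n" is arbitrary:

```shell
# Recreate the 5-line sample file (nothing after the pipe on line 3)
printf '%s\n' '-2|-2' '-1|-1' '0|' '1|1' '2|2' > test1

# Too loose: !$1 also flags the zero in field 1 of line 3
awk -F"|" '!$1' test1          # prints 0|

# Strict test: only the empty string counts as empty
awk -F"|" '$1 == ""' test1     # prints nothing
awk -F"|" '$2 == ""' test1     # prints 0|

# Count lines with an empty field 2 (n+0 prints 0 if there are none)
awk -F"|" '$2 == "" {n++} END {print n+0}' test1   # prints 1
```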


Last update: 2025-02-07
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License