The easy-going syntax of AWK commands
An endearing feature of AWK is the flexibility of its syntax. Some other languages have very strict rules about how to write commands, and if you disobey the rules, you get error messages.
Cartoon by Jeff Keacher (2008), used with permission.
AWK also has syntactical rules, but they're not overly strict. Below are some examples of AWK's laid-back approach to formulating commands. The AWK being put through its paces is GNU AWK (gawk) version 4.
Designate an input field separator. In the screenshot below I'm feeding AWK a string containing spaces and tabs, both of which are default field separators for AWK. If I simply ask AWK to print the second field (print $2) I get "bbb" because the space between "aaa" and "bbb" is a field separator.
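Here is a minimal stand-in for that first command. The input string is hypothetical, built to match the description: a space between "aaa" and "bbb", and tabs around "ccc ddd". With the default field separator, AWK splits on any run of spaces or tabs, so "bbb" is the second field.

```shell
# Hypothetical input: "aaa bbb" [tab] "ccc ddd" [tab] "eee"
printf 'aaa bbb\tccc ddd\teee\n' | awk '{print $2}'
# → bbb
```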
I can designate a tab as the input field separator in any of 4 different ways, as shown, to get the tab-separated second field "ccc ddd". The shorthand -F "\t" and the variable assignment -v FS="\t" both go in front of the AWK command. BEGIN {FS="\t"} goes inside the command, and FS="\t" comes after the command as a sort of pseudo-argument.
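The four placements can be sketched like this, again using a hypothetical string in place of the one in the screenshot. In each case AWK turns the two characters \t into a real tab before using it as FS.

```shell
# Same hypothetical string as before, held in a shell variable
str=$(printf 'aaa bbb\tccc ddd\teee')

echo "$str" | awk -F "\t" '{print $2}'           # 1. -F shorthand
echo "$str" | awk -v FS="\t" '{print $2}'        # 2. -v variable assignment
echo "$str" | awk 'BEGIN {FS="\t"} {print $2}'   # 3. BEGIN block
echo "$str" | awk '{print $2}' FS="\t"           # 4. assignment after the command
# each line prints: ccc ddd
```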
Use spaces or not, and break where you like. The next screenshot shows that AWK can be pretty relaxed about spaces in and around a command, although it's important to have at least one space between the word "awk" and the single quote that starts the command proper; without it, the shell glues the two together into a single word. Individual commands and parts of a command can be finished off with a semicolon (see 4th command), but a semicolon is only required where it separates one statement from another on the same line.
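A few variations on the same command, as a rough sketch of that flexibility (the input string is made up for illustration). The last command shows the one place a semicolon is required: between two statements on the same line.

```shell
echo "aaa bbb ccc" | awk '{print $2}'
echo "aaa bbb ccc" | awk '{ print $2 }'
echo "aaa bbb ccc" | awk '   {   print   $2   }   '
echo "aaa bbb ccc" | awk '{print $2;}'
# each line prints: bbb

# Two statements on one line need the semicolon between them
echo "aaa bbb ccc" | awk '{x = $3; print x, $1}'
# → ccc aaa

# Without a space after "awk", as in awk'{print $2}', the shell builds the
# single word awk{print $2} and reports "command not found".
```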
AWK is also relaxed about line breaks in a command. As shown below, you can split an AWK command across lines using [space]backslash at any space or logical break.
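A small sketch of both kinds of break, with a hypothetical input string. Inside the single-quoted command the backslash-newline is passed through to AWK, which treats it as a continuation; outside the quotes it's the shell that joins the lines.

```shell
# Breaks inside the quoted command: AWK joins the lines
echo "aaa bbb ccc" | awk '{print \
  $1, \
  $3}'
# → aaa ccc

# Breaks outside the quotes: the shell joins the lines
echo "aaa bbb ccc" | awk \
  -v OFS="-" \
  '{print $1, $3}'
# → aaa-ccc
```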
Assign a value to a variable. In the next screenshot, AWK is adding the value of the variable "n" to the number (100) in the shell variable "num". There are 4 different ways to assign a value to "n", as shown: with the "-v" option before the command, in a BEGIN statement, in the action part of a command, and after the command but before the string (or file) which AWK is processing.
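The four assignments might look like this. The value 5 for "n" is an arbitrary choice for illustration; "num" holds 100 as in the text.

```shell
num=100    # shell variable from the text; n=5 below is an arbitrary example value

echo "$num" | awk -v n=5 '{print $1 + n}'        # 1. -v option before the command
echo "$num" | awk 'BEGIN {n=5} {print $1 + n}'   # 2. BEGIN statement
echo "$num" | awk '{n=5; print $1 + n}'          # 3. inside the action
echo "$num" | awk '{print $1 + n}' n=5           # 4. after the command
# each line prints: 105
```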
Putting the variable assignment after the command can be handy if you're processing more than one file. In the example shown below, I want to change the "ccc" string in fileA to "ggg", and the "ccc" string in fileB to "hhh". I can do the substitution with a single command by assigning the replacement string to a variable "delta", then defining "delta" differently before processing each file.
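A sketch of the "delta" trick, assuming two small made-up files. Each assignment takes effect just before AWK opens the file that follows it.

```shell
cd "$(mktemp -d)"                 # scratch directory for the demo files
printf 'aaa\nccc\n' > fileA       # hypothetical contents
printf 'bbb\nccc\n' > fileB

# delta is reassigned just before each file is opened
awk '{gsub(/ccc/, delta)} 1' delta="ggg" fileA delta="hhh" fileB
# → aaa
#   ggg
#   bbb
#   hhh
```

The lone 1 is a pattern that is always true and has no action, so AWK prints every line, changed or not.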
The same "different variable value for different files" trick applies to the AWK field separator variable, FS, as demonstrated in the next screenshot:
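For example, with one comma-separated and one semicolon-separated file (both hypothetical), FS can be switched between them on the command line:

```shell
cd "$(mktemp -d)"
printf 'a,b,c\n' > commas.txt     # hypothetical comma-separated file
printf 'x;y;z\n' > semis.txt      # hypothetical semicolon-separated file

# ";" is quoted so the shell doesn't treat it as a command separator
awk '{print $2}' FS="," commas.txt FS=";" semis.txt
# → b
#   y
```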
The situation is more complicated if a variable has already been assigned a value by the shell. AWK can't see shell variables from inside its single-quoted command, so the shell variable's value has to be copied into one of AWK's own variables before processing begins. The usual way to do this is shown below in the first command, where the conversion from zaz (shell) to ZAZ (AWK) is done with the "-v" option. It's also possible to do the conversion after the command but before the string (or file) is actually processed, as demonstrated in the second command.
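Both conversions can be sketched as follows; the values 7 and 10 are arbitrary.

```shell
zaz=7    # shell variable; the value is arbitrary

echo 10 | awk -v ZAZ="$zaz" '{print $1 + ZAZ}'   # → 17
echo 10 | awk '{print $1 + ZAZ}' ZAZ="$zaz"      # → 17

# Writing $zaz inside the single quotes doesn't work: the shell never
# expands it, and AWK reads it as "the field numbered by AWK's own
# (empty) variable zaz", which is $0.
```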
Designate an output field separator. I can print AWK's output either with the same field separator as in the input or with a different field separator. Again I have a variety of ways available to do this, as shown in the next screenshot. Several of those ways are based on putting the output field separator I want in the AWK variable "OFS", which like the input field separator "FS" can be specified in several different places in the command.
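A sketch of several of those ways, using a comma as the hypothetical output separator. The fourth command doesn't use OFS at all; it writes the commas out explicitly. The last pair shows a detail worth knowing: a bare print outputs the line as read, and AWK only rebuilds the line with OFS after a field has been assigned to.

```shell
echo "aaa bbb ccc" | awk -v OFS="," '{print $1, $2, $3}'
echo "aaa bbb ccc" | awk 'BEGIN {OFS=","} {print $1, $2, $3}'
echo "aaa bbb ccc" | awk '{print $1, $2, $3}' OFS=","
echo "aaa bbb ccc" | awk '{print $1 "," $2 "," $3}'   # explicit commas, no OFS
# each line prints: aaa,bbb,ccc

echo "aaa bbb ccc" | awk -v OFS="," '{print}'           # → aaa bbb ccc
echo "aaa bbb ccc" | awk -v OFS="," '{$1 = $1; print}'  # → aaa,bbb,ccc
```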
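Note that a comma between items to be printed (as in print $1,$3) tells AWK to separate them with OFS, which is a single space unless you've set it to something else. Without the comma (print $1 $3), the items are simply joined together.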
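The difference in a nutshell, with a made-up input line:

```shell
echo "aaa bbb ccc" | awk '{print $1,$3}'              # → aaa ccc  (default OFS)
echo "aaa bbb ccc" | awk '{print $1 $3}'              # → aaaccc   (no comma: joined)
echo "aaa bbb ccc" | awk -v OFS=":" '{print $1,$3}'   # → aaa:ccc  (OFS set)
```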
Addressing multiple files. The FNR==NR trick was explained in an earlier BASHing data post. AWK counts all the lines it processes in the variable NR, and the lines in the current file in the variable FNR. These two counts are equal only while AWK is reading the first file (provided that file isn't empty), so FNR==NR can be used as a condition for an action to be done with the first file only, like this:
awk 'FNR==NR {do something to file1; next} {do something else to file2}' file1 file2
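To see why the condition works, it helps to print NR and FNR side by side for two small hypothetical files. The second command illustrates the one trap: if the first file is empty, FNR==NR is true throughout the second file.

```shell
cd "$(mktemp -d)"
printf 'a\nb\n' > file1           # hypothetical contents
printf 'c\nd\ne\n' > file2

awk '{print FILENAME, NR, FNR}' file1 file2
# → file1 1 1
#   file1 2 2
#   file2 3 1
#   file2 4 2
#   file2 5 3

: > empty                         # an empty first file
awk 'FNR==NR {print "first file:", $0}' empty file2
# → first file: c
#   first file: d
#   first file: e
```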
In the example below, I'm changing "2" to "X" in fileE and "2" to "Y" in fileF:
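A sketch of that substitution, assuming one-line versions of fileE and fileF (their real contents aren't given here). The next statement stops AWK from also applying the second action to fileE's lines.

```shell
cd "$(mktemp -d)"
printf '1 2 3\n' > fileE          # hypothetical contents
printf '2 4 6\n' > fileF

awk 'FNR==NR {gsub(/2/, "X"); print; next} {gsub(/2/, "Y"); print}' fileE fileF
# → 1 X 3
#   Y 4 6
```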
There are 2 alternative ways to treat multiple files separately, and both of these will also work with more than 2 files. The first is to use the AWK variable FILENAME as a condition:
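Using FILENAME with the same hypothetical files plus a made-up third file, fileG, might look like this. FILENAME holds the name exactly as typed on the command line, so "./fileE" would not match "fileE".

```shell
cd "$(mktemp -d)"
printf '1 2 3\n' > fileE          # hypothetical contents
printf '2 4 6\n' > fileF
printf '2 2 2\n' > fileG          # hypothetical third file

awk 'FILENAME == "fileE" {gsub(/2/, "X")}
     FILENAME == "fileF" {gsub(/2/, "Y")}
     FILENAME == "fileG" {gsub(/2/, "Z")}
     1' fileE fileF fileG
# → 1 X 3
#   Y 4 6
#   Z Z Z
```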
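The second method is easier to type. It uses ARGIND, a GNU AWK variable that holds the index in ARGV (the list of command-line arguments) of the file currently being read. With a plain list of files after the AWK command, ARGIND is 1 for the first file, 2 for the second, and so on: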
Last update: 2020-02-26
The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.