Reverse or shuffle a string in a particular field
The idea for this post comes from a 2017 Stack Overflow question. The OP wanted to reverse the string in column 3 of a space-separated table (here called "table") without disturbing the other columns:
ABC DEF GATTAG GHK
ABC DEF GGCGTC GHK
ABC DEF AATTCC GHK
One way to do this is with the shell tools cut, paste and rev:
paste -d" " <(cut -d" " -f1,2 table) \
<(cut -d" " -f3 table | rev) \
<(cut -d" " -f4 table)
In other words, first divide the table "vertically" with cut into the parts that stay as they are (fields 1, 2 and 4) and the part to be modified (field 3). Reverse field 3 with rev, then re-assemble the table with paste -d" ", using a space as separator.
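As a quick check, the sketch below rebuilds the sample table in a temporary file and runs the same pipeline on it. The variable names "tbl" and "reversed" are just for illustration, and the sketch assumes bash (for process substitution) and a rev command such as the one in util-linux.

```shell
# Recreate the sample table in a throwaway file (tbl is an illustrative name).
tbl=$(mktemp)
printf '%s\n' 'ABC DEF GATTAG GHK' 'ABC DEF GGCGTC GHK' 'ABC DEF AATTCC GHK' > "$tbl"

# Slice vertically, reverse only field 3, then glue the slices back together.
reversed=$(paste -d" " <(cut -d" " -f1,2 "$tbl") \
                       <(cut -d" " -f3 "$tbl" | rev) \
                       <(cut -d" " -f4 "$tbl"))
printf '%s\n' "$reversed"
# ABC DEF GATTAG GHK   (GATTAG is a palindrome, so it looks unchanged)
# ABC DEF CTGCGG GHK
# ABC DEF CCTTAA GHK
rm -f "$tbl"
```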
AWK wizard Ed Morton suggested a "horizontal", line-by-line AWK method as a solution, but here's an AWK that's maybe a bit simpler and definitely more general:
awk '{printf("%s %s ",$1,$2); \
n=split($3,a,""); \
for (i=n;i>=1;i--) printf("%s",a[i]); \
printf (" %s\n",$4)}' table
AWK prints fields 1 and 2, each followed by a space, then splits field 3 into "n" pieces using the empty string as split-separator (n=split($3,a,"")), so that each character of the field 3 string becomes one element of the array "a". To print field 3 in reverse order, AWK works through a for loop that starts with the "nth" character (i=n), steps backwards (i--) and stops after the first character (i>=1). Finally, AWK prints field 4 with a leading space and a trailing newline. This method works for field 3 strings of any length and character composition, as long as they contain no spaces or tabs: AWK treats whitespace as the field separator here, so a string containing a space would be split across several fields.
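The same idea can be sketched for any column, without hard-coding the other fields. The names "reverse_field" and "rev_str" are hypothetical, not from the Stack Overflow thread. This version walks the string with substr() instead of split(), because splitting on the empty string is a common AWK extension rather than something POSIX guarantees. It also assumes it is acceptable for AWK to rebuild each line with single spaces between fields, which happens whenever a field is assigned to.

```shell
# reverse_field COL [FILE...]: print the input with the string in column COL reversed.
# (reverse_field and rev_str are illustrative names, not from the original post.)
reverse_field() {
    col=$1; shift
    awk -v col="$col" '
    # Build the reversed string one character at a time, last to first.
    function rev_str(s,    i, out) {
        out = ""
        for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
        return out
    }
    # Assigning to $col makes AWK rebuild the line with single-space separators.
    { $col = rev_str($col); print }' "$@"
}

printf '%s\n' 'ABC DEF GGCGTC GHK' | reverse_field 3
# ABC DEF CTGCGG GHK
```

With a file, the call would be something like "reverse_field 3 table"; with no file argument the function reads standard input.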
How about shuffling the field 3 strings? That's also possible with shell tools, though a little more complicated:
paste -d " " <(cut -d" " -f1,2 table) \
<(cut -d" " -f3 table | while read -r line; \
do fold -w1 <<<"$line" \
| shuf | tr -d "\n"; echo; done) \
<(cut -d" " -f4 table)
Same vertical slicing of the table as in the first problem, but this time field 3 is fed line by line to commands using a while read loop. The first command folds the string into a one-character-wide column (-w1 option), then shuffles the column, then rebuilds the column as a string (tr -d "\n"), then follows the re-built string with a newline using echo.
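To see the fold/shuf/tr step in isolation, here is a minimal sketch that shuffles a single string and then checks that the result contains the same characters as the original. The function names "shuffle_string" and "sorted_chars" are made up for this example, shuf is assumed to be the GNU coreutils version, and printf stands in for the here-string so the sketch does not depend on bash.

```shell
# shuffle_string STR: print the characters of STR in random order.
# (shuffle_string and sorted_chars are illustrative names.)
shuffle_string() {
    printf '%s\n' "$1" | fold -w1 | shuf | tr -d '\n'
    echo   # restore the newline that tr removed
}

# sorted_chars STR: the characters of STR in sorted order, for comparison.
sorted_chars() {
    printf '%s\n' "$1" | fold -w1 | sort | tr -d '\n'
}

orig=GGCGTC
shuffled=$(shuffle_string "$orig")
echo "$shuffled"
# A shuffle only reorders, so both strings sort to the same characters.
[ "$(sorted_chars "$orig")" = "$(sorted_chars "$shuffled")" ] && echo "same characters"
```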
There are ways to generate random character orders in AWK, but they're not easy to understand and have fairly tedious syntax. I actually like the simplicity of fold/shuf/tr, so I've popped that series of shell commands into an AWK command:
awk '{"echo "$3" | fold -w1 | shuf | tr -d \x22\n\x22" \
|& getline $3; print}' table
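This relies on a GNU AWK (gawk) extension, the "|&" operator, in a construction of the form "[shell command]" |& getline $n. AWK builds the quoted shell command as a string (here with the value of field 3 spliced into it), runs it as a coprocess, and reads one line of the command's output into "$n", replacing the field's original value. The GNU AWK manual describes this construction in its section on using getline with a coprocess.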
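The pattern may be easier to follow with a simpler command. The sketch below is not the post's method: it swaps in the plain "|" form of getline, which POSIX AWK also supports, and upper-cases field 2 with tr. The variable name "cmd" and the two input lines are invented for the example. It also calls close(), on the assumption that the same command string may come up more than once, and it is only safe for fields without shell metacharacters, because the field is pasted unquoted into a shell command.

```shell
# Upper-case field 2 by piping it through a shell command from inside AWK.
# (cmd is an illustrative variable name; the input lines are made up.)
out=$(printf '%s\n' 'abc def ghk' 'abc def ghk' |
awk '{
    cmd = "echo " $2 " | tr a-z A-Z"   # build the shell command as a string
    cmd | getline $2                   # run it; one line of output replaces $2
    close(cmd)                         # close the pipe so an identical command can run again
    print
}')
printf '%s\n' "$out"
# abc DEF ghk
# abc DEF ghk
```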
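Once field 3 has been shuffled, the line is printed with print, which joins the fields with the default AWK output field separator (a single space). The shell command sits inside a double-quoted AWK string, so a literal double quote around "\n" in the tr command would end that string early and cause an AWK syntax error. The trick I've used here is to replace those double quotes with their hexadecimal escape, \x22.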
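For comparison, the pure-AWK route mentioned earlier might look something like the sketch below, which does a Fisher-Yates shuffle on the characters of field 3. The function name "shuffle_str" and the temporary file are illustrative only. Note that srand() with no argument seeds from the clock, so two runs within the same second will produce the same "random" order.

```shell
tbl=$(mktemp)   # throwaway copy of the sample table (tbl is an illustrative name)
printf '%s\n' 'ABC DEF GATTAG GHK' 'ABC DEF GGCGTC GHK' 'ABC DEF AATTCC GHK' > "$tbl"

shuffled=$(awk '
# Fisher-Yates: walk from the last character down, swapping each one
# with a randomly chosen character at or before its position.
function shuffle_str(s,    n, i, j, tmp, c, out) {
    n = length(s)
    for (i = 1; i <= n; i++) c[i] = substr(s, i, 1)
    for (i = n; i > 1; i--) {
        j = int(rand() * i) + 1          # random index in 1..i
        tmp = c[i]; c[i] = c[j]; c[j] = tmp
    }
    out = ""
    for (i = 1; i <= n; i++) out = out c[i]
    return out
}
BEGIN { srand() }                        # seed from the current time
{ $3 = shuffle_str($3); print }' "$tbl")
printf '%s\n' "$shuffled"
rm -f "$tbl"
```

This avoids starting a shell pipeline for every line, at the cost of the longer syntax the fold/shuf/tr version was meant to sidestep.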
Last update: 2021-07-07
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License