For a full list of BASHing data blog posts see the index page.


Reverse or shuffle a string in a particular field

The idea for this post comes from a 2017 Stack Overflow question. The OP wanted to reverse the string in column 3 of a space-separated table (here called "table") without disturbing the other columns:

ABC DEF GATTAG GHK
ABC DEF GGCGTC GHK
ABC DEF AATTCC GHK

One way to do this is with the shell tools cut, paste and rev:

paste -d" " <(cut -d" " -f1,2 table) \
<(cut -d" " -f3 table | rev) \
<(cut -d" " -f4 table)


In other words, first divide the table "vertically" with cut into the parts not to be modified (fields 1, 2 and 4) and the part to be modified (field 3). Reverse field 3 with rev, then re-assemble the table with paste -d" ", using a space as separator.
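The sample table is small enough to check the result by hand. The sketch below writes the three lines to a scratch file (standing in for "table") and runs the same pipeline inside a function. The name rev_col3 and the scratch file are my own additions for illustration, and bash is assumed for the <( ) process substitutions.

```shell
# rev_col3 FILE  (hypothetical wrapper around the cut/rev/paste pipeline)
rev_col3() {
    paste -d" " <(cut -d" " -f1,2 "$1") \
    <(cut -d" " -f3 "$1" | rev) \
    <(cut -d" " -f4 "$1")
}

# Scratch copy of the sample table
table=$(mktemp)
printf '%s\n' "ABC DEF GATTAG GHK" "ABC DEF GGCGTC GHK" "ABC DEF AATTCC GHK" > "$table"

rev_col3 "$table"
# ABC DEF GATTAG GHK
# ABC DEF CTGCGG GHK
# ABC DEF CCTTAA GHK

rm -f "$table"
```

The first line looks unchanged only because GATTAG is a palindrome.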

AWK wizard Ed Morton suggested a "horizontal", line-by-line AWK method as a solution, but here's an AWK command that's maybe a bit simpler and definitely more general:

awk '{printf("%s %s ",$1,$2); \
n=split($3,a,""); \
for (i=n;i>=1;i--) printf("%s",a[i]); \
printf (" %s\n",$4)}' table


AWK prints fields 1 and 2 with internal and trailing space, then splits field 3 into "n" pieces using the empty string as split-separator (n=split($3,a,"")). Splitting on the empty string is a common AWK extension, supported by GNU AWK among others. Each character in the field 3 string is stored in the array "a". To print field 3 in reverse string order, AWK works through a for loop starting with the "nth" character (i=n), ending with the first character (i>=1) and working backwards (i--). Finally, AWK prints field 4 with leading space and trailing newline. This method will work for strings of any length and composition. The reversal also copes with strings containing spaces, as long as the table's field separator is something other than a space (a tab, say, set with -F"\t"), so that the whole string stays in field 3.
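The same backwards loop can be pointed at any field, rather than hard-coding the fields around it. What follows is a minimal sketch, not the method from the Stack Overflow thread: rev_field is a hypothetical function name, and it builds the reversed string with substr() (which avoids relying on the empty-separator split) and assigns it back to the field. One side effect to be aware of is that assigning to a field makes AWK rebuild the line with single spaces between fields, so runs of spaces or tabs in the input are not preserved.

```shell
# rev_field N [FILE...]  (hypothetical helper): reverse the string in field N
rev_field() {
    n=$1
    shift
    awk -v n="$n" '{
        out = ""
        # walk the field from its last character to its first
        for (i = length($n); i >= 1; i--)
            out = out substr($n, i, 1)
        $n = out    # assigning to a field rebuilds $0 with OFS (a space)
        print
    }' "$@"
}

printf '%s\n' "ABC DEF GGCGTC GHK" | rev_field 3
# ABC DEF CTGCGG GHK
printf '%s\n' "ABC DEF GGCGTC GHK" | rev_field 1
# CBA DEF GGCGTC GHK
```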

How about shuffling the field 3 strings? That's straightforward but a little more complicated with shell tools:

paste -d" " <(cut -d" " -f1,2 table) \
<(cut -d" " -f3 table | while IFS= read -r line; \
do fold -w1 <<<"$line" \
| shuf | tr -d "\n"; echo; done) \
<(cut -d" " -f4 table)


Same vertical slicing of the table as in the first problem, but this time field 3 is fed line by line to commands using a while read loop. The first command folds the string into a one-character-wide column (-w1 option), then shuffles the column, then rebuilds the column as a string (tr -d "\n"), then follows the re-built string with a newline using echo.
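The fold/shuf/tr step can be tried on a single string before wiring it into paste. In this sketch, shuffle_string and sorted_chars are hypothetical helper names. shuf comes from GNU coreutils, and fold -w1 is assumed to be working on single-byte (ASCII) characters. Because the output is random, the check only confirms that the shuffled string contains the same letters as the original.

```shell
# shuffle_string STRING  (hypothetical helper): print STRING with its characters shuffled
shuffle_string() {
    printf '%s\n' "$1" | fold -w1 | shuf | tr -d '\n'
    echo
}

# sorted_chars STRING  (hypothetical helper): print the characters of STRING in sorted order
sorted_chars() {
    printf '%s\n' "$1" | fold -w1 | sort | tr -d '\n'
    echo
}

s=$(shuffle_string GGCGTC)
echo "$s"    # some ordering of the six letters, different from run to run
[ "$(sorted_chars "$s")" = "$(sorted_chars GGCGTC)" ] && echo "same letters"
```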

There are ways to generate random character orders in AWK, but they're not easy to understand and have fairly tedious syntax. I actually like the simplicity of fold/shuf/tr, so I've popped that series of shell commands into an AWK command:

awk '{cmd = "echo "$3" | fold -w1 | shuf | tr -d \x22\n\x22"
cmd |& getline $3; close(cmd)   # close, so a repeated field 3 string is shuffled again
print}' table


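This is a GNU AWK (gawk) extension: "[shell command]" |& getline $n. AWK runs the quoted shell command (built here from the contents of field 3) as a coprocess, and getline stores the command's output in "$n", replacing what was there. The GNU AWK manual describes this construction in its section on using getline from a coprocess. The ordinary one-way pipe, "[shell command]" | getline $n, is standard AWK and should also work here.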
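To see the read-from-a-command idea in isolation, here is a small sketch with a deterministic command in place of the shuffle. It uses the ordinary one-way pipe rather than gawk's |&, and lower_field is a hypothetical name. The close(cmd) call matters when the same string turns up on more than one line: AWK identifies a running command by its text, so without the close a second getline from an identical command would find the stream already at its end and leave the field untouched.

```shell
# lower_field N [FILE...]  (hypothetical helper): lowercase field N via an external tr
lower_field() {
    n=$1
    shift
    awk -v n="$n" '{
        cmd = "echo " $n " | tr A-Z a-z"   # shell command built from the field
        cmd | getline $n                   # read its output back into the field
        close(cmd)                         # let an identical command run again later
        print
    }' "$@"
}

printf '%s\n' "ABC DEF GATTAG GHK" "ABC DEF GATTAG GHK" | lower_field 3
# ABC DEF gattag GHK
# ABC DEF gattag GHK
```

As with the original command, the field is handed to the shell unquoted, so this is only safe for plain strings like the ones in the sample table.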
 
Once field 3 has been shuffled, the line is printed with print, using the default AWK output separator (a single space). A trick I've used here to avoid an AWK syntax error is to replace the double quotes around "\n" in the tr command with their hexadecimal escape, \x22.
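For comparison, this is roughly what a shuffle done entirely inside AWK could look like. It is a sketch of a standard Fisher-Yates shuffle, not code from the Stack Overflow thread, and shuf_field is a hypothetical name. It relies on split() with an empty separator (a common extension, supported by gawk among others) and seeds rand() from the clock, so two runs within the same second may give the same shuffle.

```shell
# shuf_field N [FILE...]  (hypothetical helper): shuffle the characters of field N
shuf_field() {
    n=$1
    shift
    awk -v n="$n" 'BEGIN { srand() }    # seed from the current time
    {
        len = split($n, c, "")           # one character per array element
        # Fisher-Yates: swap position i with a random position in 1..i
        for (i = len; i > 1; i--) {
            j = int(rand() * i) + 1
            tmp = c[i]; c[i] = c[j]; c[j] = tmp
        }
        out = ""
        for (i = 1; i <= len; i++)
            out = out c[i]
        $n = out                         # rebuilds the line with single spaces
        print
    }' "$@"
}

printf '%s\n' "ABC DEF GATTAG GHK" | shuf_field 3
```

It runs without starting a shell for every line, at the cost of the extra syntax that the fold/shuf/tr version avoids.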


Last update: 2021-07-07
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License