
For a full list of BASHing data blog posts, see the index page.


Working around the BASH brace expansion rule

Brace expansion in BASH is a neat way to build a Cartesian product, like all the combinations of a set of first names and a set of last names. Just put each set inside curly braces as a comma-separated list. In the example below I'm separating the two parts of each combination with a quoted space, " ":

printf "%s\n" {Ann,Burt,Sal,Tom}" "{deJong,Jones,Schmidt}

[Screenshot: the output, 12 lines running from "Ann deJong", "Ann Jones", "Ann Schmidt" to "Tom Schmidt"]
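To confirm that this really is a Cartesian product, you can capture the expansion in a Bash array instead of printing it and count the elements. This is a minimal sketch; the array name "combos" is just an illustrative choice:

```bash
#!/usr/bin/env bash
# Store the brace expansion in an array instead of printing it.
# The quoted " " keeps each first-name/last-name pair together as one element.
combos=( {Ann,Burt,Sal,Tom}" "{deJong,Jones,Schmidt} )

echo "${#combos[@]}"              # 4 first names x 3 last names: prints 12

# The leftmost list changes slowest, so the first four elements are:
printf "%s\n" "${combos[@]:0:4}"
# Ann deJong
# Ann Jones
# Ann Schmidt
# Burt deJong
```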

Unfortunately, when you do brace expansion on strings, the only thing you can put inside those curly braces is a literal comma-separated list. You can't store the list in a variable or generate it with a command substitution, because BASH performs brace expansion before any other expansion, including variable expansion and command substitution. If there's a variable inside the curly braces, the brace expansion engine looks at it, bewildered, and passes it on unchanged.
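Here is a quick way to see the rule in action. The sketch assumes a hypothetical variable "names" that holds the comma-separated list:

```bash
#!/usr/bin/env bash
names="Ann,Burt,Sal,Tom"   # hypothetical variable holding the list

# Brace expansion runs first and sees the literal text {$names}. That text
# contains no comma, so it is left alone. Only afterwards is $names expanded,
# which leaves a single word with the braces still in it.
echo {$names}              # prints {Ann,Burt,Sal,Tom}

# The same list typed literally is expanded into four words.
echo {Ann,Burt,Sal,Tom}    # prints Ann Burt Sal Tom
```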

There are two fairly simple workarounds for the brace expansion rule if you want the Cartesian product of sets of strings. One is based on nested loops, the other on an interesting substitution. To show how these work, I'll first put newline-separated name lists into the files "firsts" and "lasts":

firsts:

Ann
Burt
Sal
Tom

lasts:

deJong
Jones
Schmidt
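If you want to follow along, one way to create the two files is with printf. It reuses its format string for every argument, so it writes one name per line:

```bash
#!/usr/bin/env bash
# Write the two sample lists, one name per line.
printf "%s\n" Ann Burt Sal Tom > firsts
printf "%s\n" deJong Jones Schmidt > lasts

cat firsts lasts   # the 4 first names, then the 3 last names
```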


Nested loops. This method abandons brace expansion and uses for loops. An outer loop goes through the first names, and for each first name an inner loop goes through all the last names:

for i in $(cat firsts); do for j in $(cat lasts); do printf "%s\n" "$i $j"; done; done

[Screenshot: the output, the same 12 combinations in the same order as with brace expansion]
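As a check, the sketch below captures the nested-loop output and the brace-expansion output and compares them as strings. It recreates the sample files so that it runs on its own. The variable names "loop_out" and "brace_out" are illustrative:

```bash
#!/usr/bin/env bash
# Recreate the sample files so this runs on its own.
printf "%s\n" Ann Burt Sal Tom > firsts
printf "%s\n" deJong Jones Schmidt > lasts

# Nested loops: outer loop over first names, inner loop over last names.
loop_out=$(for i in $(cat firsts); do
             for j in $(cat lasts); do
               printf "%s\n" "$i $j"
             done
           done)

# Brace expansion with the same names typed in literally.
brace_out=$(printf "%s\n" {Ann,Burt,Sal,Tom}" "{deJong,Jones,Schmidt})

# Same combinations in the same order?
if [ "$loop_out" = "$brace_out" ]; then echo "identical"; else echo "different"; fi
# prints identical
```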

The nested-loop alternative has a disadvantage. The unquoted command substitution $(cat ...) is split into words on spaces, tabs and newlines, so strings with spaces get split into their components. Suppose I modify "firsts" and "lasts" to include names with spaces, like this:

firstsA:

Ann Marie
Burt
Sal
Tom

lastsA:

de Jong
Jones
Schmidt

Running the nested-loop command on these files gives me this:

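[Screenshot: 20 lines of output beginning "Ann de", "Ann Jong", "Ann Jones", "Ann Schmidt", "Marie de"; the two-word names have been split apart]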
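You can see why there are 20 lines instead of 12 by splitting the files the same way the loops do. The sketch below stores the words in two arrays with illustrative names and counts them. It assumes the default IFS:

```bash
#!/usr/bin/env bash
# Recreate the names-with-spaces files (4 and 3 lines).
printf "%s\n" "Ann Marie" Burt Sal Tom > firstsA
printf "%s\n" "de Jong" Jones Schmidt > lastsA

# An unquoted $(cat ...) is word-split on spaces, tabs and newlines
# (the default IFS), so the loops get words, not lines.
first_words=( $(cat firstsA) )
last_words=( $(cat lastsA) )

echo "${#first_words[@]} x ${#last_words[@]}"   # prints 5 x 4, not 4 x 3
printf "[%s] " "${first_words[@]}"; echo        # prints [Ann] [Marie] [Burt] [Sal] [Tom]
```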

A workaround for this problem is to temporarily set the BASH built-in variable IFS to a newline:

[Screenshot: the nested-loop command run with IFS set to a newline, and its output]

and afterwards unset IFS, which makes Bash go back to its default splitting on spaces, tabs and newlines.
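Here is one way to write the set-IFS, loop, unset-IFS sequence. This is a sketch based on the description above; the exact command in the screenshot may differ. The output is captured in an illustrative variable "out" so that it can be checked:

```bash
#!/usr/bin/env bash
printf "%s\n" "Ann Marie" Burt Sal Tom > firstsA
printf "%s\n" "de Jong" Jones Schmidt > lastsA

IFS=$'\n'   # word-split only on newlines, so "Ann Marie" stays one item
out=$(for i in $(cat firstsA); do
        for j in $(cat lastsA); do
          printf "%s\n" "$i $j"
        done
      done)
unset IFS   # back to default splitting on space, tab and newline

printf "%s\n" "$out"   # 12 lines, "Ann Marie de Jong" through "Tom Schmidt"
```

An alternative design is to run the loops inside ( ... ). The IFS change then stays in that subshell, so no unset is needed afterwards.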


Substitution. This second alternative isn't subject to the strings-with-spaces problem. I'll start with the names-with-spaces versions, "firstsA" and "lastsA". If I number all the names consecutively with nl -w1, I get a list of names from the two files with a tab separating number and name. The -w1 option sets the width of the numbers to one character, so nl doesn't pad them with leading spaces:

nl -w1 firstsA lastsA

[Screenshot: the seven names numbered 1 to 7, each number separated from its name by a tab]
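The sketch below makes the numbering visible and compares it with an equivalent awk one-liner. It assumes GNU nl, which numbers several input files as one continuous stream; the BSD nl on macOS accepts only one file. Without -w1, nl pads numbers to a default width of 6. AWK's first field would then be "     1" rather than "1", and the later array lookups would not match:

```bash
#!/usr/bin/env bash
printf "%s\n" "Ann Marie" Burt Sal Tom > firstsA
printf "%s\n" "de Jong" Jones Schmidt > lastsA

# GNU nl: numbering runs 1-7 across both files; -w1 avoids padding,
# and nl's default separator between number and line is a tab.
numbered=$(nl -w1 firstsA lastsA)
printf "%s\n" "$numbered"

# For files without blank lines (nl leaves blank lines unnumbered by
# default), this awk one-liner produces the same listing.
awk '{print NR "\t" $0}' firstsA lastsA
```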

Next, I'll do the brace expansion with the matching number sequences instead of names, and a tab character as separator. The numbers 1-4 stand for the names from "firstsA", and 5-7 for those from "lastsA":

printf "%s\n" {1..4}$'\t'{5..7}

[Screenshot: 12 tab-separated number pairs, from 1 and 5 to 4 and 7]
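The tabs are invisible on screen, so here is a small sketch that captures the pairs and looks at them. It uses {1..4} and {5..7}, which are brace-expansion sequence expressions, and $'\t', which is Bash's ANSI-C quoting for a tab. The variable name "pairs" is illustrative:

```bash
#!/usr/bin/env bash
pairs=$(printf "%s\n" {1..4}$'\t'{5..7})

printf "%s\n" "$pairs" | head -n 4   # 1<tab>5, 1<tab>6, 1<tab>7, 2<tab>5
printf "%s\n" "$pairs" | wc -l       # 12 pairs (4 x 3)
```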

Combining these two operations, I can replace numbers with names using AWK:

awk -F"\t" 'FNR==NR {a[$1]=$2; next} {print a[$1]" "a[$2]}' \
<(nl -w1 firstsA lastsA) <(printf "%s\n" {1..4}$'\t'{5..7})

[Screenshot: the 12 full-name combinations, from "Ann Marie de Jong" to "Tom Schmidt"]

The AWK command reads from two process substitutions, the <(...) constructs, and treats each one as an input file. The first is the numbered listing of all the names from both files; the second is the output of the brace expansion command. Both use a tab as field separator, so AWK is told that at the start with -F"\t".
 
The first part of the command creates an array "a" from the numbered list. The array elements are indexed with the first field (a number) and the value of each element is the second field (the associated name).
 
When AWK finishes with the list, it moves on to the second file, which is the output from brace expansion of the numbers. For each line there, it prints the name stored in the array under the number in the first field, then a space, then the name stored under the number in the second field.
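Here is the same command spread over several lines, with comments on the two parts of the AWK program. This is a sketch that assumes GNU nl and recreates the sample files. The result goes into an illustrative variable "names":

```bash
#!/usr/bin/env bash
printf "%s\n" "Ann Marie" Burt Sal Tom > firstsA
printf "%s\n" "de Jong" Jones Schmidt > lastsA

names=$(awk -F"\t" '
  # First input (numbered names): store each name under its number,
  # then skip to the next line without running the block below.
  FNR==NR { a[$1] = $2; next }

  # Second input (number pairs): look up both numbers and print
  # the two names separated by a space.
  { print a[$1] " " a[$2] }
' <(nl -w1 firstsA lastsA) <(printf "%s\n" {1..4}$'\t'{5..7}))

printf "%s\n" "$names"   # 12 lines, "Ann Marie de Jong" through "Tom Schmidt"
```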
 
The FNR==NR condition is nicely explained in a Stack Overflow post by Tom Fenech:
 
In awk, FNR refers to the record number (typically the line number) in the current file and NR refers to the total record number. The operator == is a comparison operator, which returns true when the two surrounding operands are equal. This means that the condition NR==FNR is only true for the first file, as FNR resets back to 1 for the first line of each file but NR keeps on increasing. This pattern is typically used to perform actions on only the first file. The next inside the block means any further commands are skipped, so they are only run on files other than the first. The condition FNR==NR compares the same two operands as NR==FNR, so it behaves in the same way.
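To watch the two counters side by side, the sketch below prints FILENAME, NR and FNR for two small files. The files "one" and "two" are hypothetical, and the output goes into an illustrative variable "report":

```bash
#!/usr/bin/env bash
# Two small hypothetical input files.
printf "%s\n" a b c > one
printf "%s\n" x y > two

# Show the file name, NR, FNR and whether FNR==NR is true on each line.
report=$(awk '{ print FILENAME, NR, FNR, (FNR==NR ? "first" : "later") }' one two)
printf "%s\n" "$report"
# one 1 1 first
# one 2 2 first
# one 3 3 first
# two 4 1 later
# two 5 2 later
```

One caveat of the idiom: if the first file is empty, NR and FNR start counting together in the second file. FNR==NR is then true there as well.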

In this substitution alternative, is there a way to insert the numbers needed in the brace expansion (like 1-4 and 5-7) programmatically? Not that I know of. I can find the numbers with various BASH commands. However, I can't insert the output of those commands into the brace expansion, either as command substitutions or as variables, because BASH does brace expansion first. Grrr.
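Sequence expressions run into the same wall as comma lists. In this sketch, "n" is a hypothetical variable standing in for a count found by some other command:

```bash
#!/usr/bin/env bash
n=4   # hypothetical: the number of names in firstsA, however it was found

# When brace expansion runs, $n has not been expanded yet, so {1..$n}
# is not a valid sequence expression and the braces are left as they are.
echo {1..$n}   # prints {1..4}

# With literal numbers the sequence expands.
echo {1..4}    # prints 1 2 3 4
```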


Last update: 2019-06-14
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License