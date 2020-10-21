For a full list of BASHing data blog posts see the index page.

How to use flags in AWK (revisited)

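Flags in AWK are variables that are set to either true or false. AWK has no separate Boolean type, so a flag is usually set to 1 (true) or 0 (false), and a variable that has never been assigned counts as false. Flags are handy for defining ranges over which AWK can act, as shown below. The AWK used here is GNU AWK 4 (gawk 4).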

Sometimes flags aren't needed. I'll demonstrate with a simple text file called "demo", which has 6 lines with 3 comma-separated letters on each line:

a,b,c
b,d,b
c,j,k
x,e,d
s,r,x
m,n,o

Here are 3 operations on "demo" which don't require flags:

Print the line with "j" as second letter:

awk -F"," '$2=="j"' demo

Print all lines up to, but not including, the line with "j" as second letter:

awk -F"," '$2=="j" {exit} 1' demo

Print all lines up to and including the line with "j" as second letter:

awk -F"," '$2=="j" {print; exit} 1' demo
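To try the three commands above, the "demo" file can be recreated with printf. This is a minimal sketch assuming a POSIX shell and any POSIX AWK (the post itself used gawk 4); the printf setup line is an addition, not part of the original commands. Two AWK rules explain the results: a pattern with no action prints the lines it matches, and 1 is a pattern that is always true, so "... 1" prints every line until exit stops AWK.

```shell
# Recreate "demo": 6 lines, 3 comma-separated letters per line
printf '%s\n' a,b,c b,d,b c,j,k x,e,d s,r,x m,n,o > demo

# Pattern with no action: the default action prints the matching line
awk -F"," '$2=="j"' demo
# prints line 3 (c,j,k)

# "1" is always true, so lines print until the "j" line triggers exit
awk -F"," '$2=="j" {exit} 1' demo
# prints lines 1-2 (a,b,c and b,d,b)

# Same, but the "j" line is printed explicitly before exit
awk -F"," '$2=="j" {print; exit} 1' demo
# prints lines 1-3 (a,b,c to c,j,k)
```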

Flag on. Continuing with "demo", here are some simple uses for a flag:

Print all lines starting with the line that has "j" as second letter:

awk -F"," '$2=="j" {f=1} f' demo

Print all lines starting just after the line with "j" as second letter:

awk -F"," '$2=="j" {f=1; next} f' demo

awk -F"," 'f; $2=="j" {f=1}' demo
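A sketch of the flag-on commands, again recreating "demo" with printf (an addition for convenience). It shows why the two "just after" commands give the same result: with next, the "j" line skips the final f test; with f placed first, f is still unset (false) when the "j" line is tested, and it only turns on afterwards.

```shell
printf '%s\n' a,b,c b,d,b c,j,k x,e,d s,r,x m,n,o > demo

# f is unset (false) until the "j" line sets it; the bare pattern f then prints
awk -F"," '$2=="j" {f=1} f' demo
# prints lines 3-6 (c,j,k to m,n,o)

# next makes the "j" line skip the remaining pattern f
awk -F"," '$2=="j" {f=1; next} f' demo
# prints lines 4-6 (x,e,d to m,n,o)

# Same result: f is tested before the "j" line turns it on
awk -F"," 'f; $2=="j" {f=1}' demo
# prints lines 4-6 (x,e,d to m,n,o)
```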

Flag on, flag off. Still with "demo", some commands that involve unsetting a flag:

Print all lines from the first line with "c" as third letter to the first line with "s" as first letter, inclusive:

awk -F"," '$3=="c" {f=1} $1=="s" {print; f=0} f' demo

Print all lines from the first line with "c" as third letter up to, but not including, the first line with "s" as first letter:

awk -F"," '$3=="c" {f=1} $1=="s" {f=0} f' demo

Print all lines between the first line with "c" as third letter and the first line with "s" as first letter, excluding both of those lines:

awk -F"," '$3=="c" {f=1; next} $1=="s" {f=0} f' demo

awk -F"," '$1=="s" {f=0}; f; $3=="c" {f=1}' demo
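A sketch of the on/off commands with their output, assuming the same printf-recreated "demo" file. The difference between them is ordering: in the inclusive command the "s" line is printed by its own action before f is cleared, while in the others f is cleared (or next is used on the "c" line) before the bare f pattern is reached.

```shell
printf '%s\n' a,b,c b,d,b c,j,k x,e,d s,r,x m,n,o > demo

# Inclusive: the "s" line prints itself, then turns f off
awk -F"," '$3=="c" {f=1} $1=="s" {print; f=0} f' demo
# prints lines 1-5 (a,b,c to s,r,x)

# Stop before "s": f is turned off before the pattern f is tested
awk -F"," '$3=="c" {f=1} $1=="s" {f=0} f' demo
# prints lines 1-4 (a,b,c to x,e,d)

# Exclusive, two equivalent ways
awk -F"," '$3=="c" {f=1; next} $1=="s" {f=0} f' demo
awk -F"," '$1=="s" {f=0}; f; $3=="c" {f=1}' demo
# each prints lines 2-4 (b,d,b to x,e,d)
```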

On/off, on/off. Flags can be turned on and off repeatedly as AWK processes a file. For a demonstration, here's a list of fruit names in a file called "fruit":

pear
apple
cherry
orange
lemon
raspberry
apple
loquat
feijoa
orange
loquat

Print all lines from "apple" to "orange", inclusive:

awk '/apple/ {f=1} /orange/ {print; f=0} f' fruit

Print all lines from "apple" to "orange", but not including "apple" or "orange":

awk '/orange/ {f=0}; f; /apple/ {f=1}' fruit
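A sketch of the two "fruit" commands with their output, recreating "fruit" with printf (an addition for convenience). Because the flag goes on at every "apple" and off at every "orange", both apple-to-orange blocks in the file are selected.

```shell
printf '%s\n' pear apple cherry orange lemon raspberry apple loquat feijoa orange loquat > fruit

# Inclusive: f goes on at each "apple"; each "orange" is printed, then f goes off
awk '/apple/ {f=1} /orange/ {print; f=0} f' fruit
# prints apple, cherry, orange, apple, loquat, feijoa, orange (one per line)

# Exclusive: f is tested after the "orange" check but before the "apple" check
awk '/orange/ {f=0}; f; /apple/ {f=1}' fruit
# prints cherry, loquat and feijoa (one per line)
```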

Counting the on/off's. Still with "fruit", two ingenious commands based on a suggestion from developer waldner:

Print only the lines between the first "apple" and the next "orange" (first command), or only the lines between the second "apple" and the next "orange" (second command):

awk '/orange/ {f=0}; f && c==1; /apple/ {f=1; c++}' fruit

awk '/orange/ {f=0}; f && c==2; /apple/ {f=1; c++}' fruit
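In these commands the counter c records how many times "apple" has turned the flag on, so f && c==1 selects the first block and f && c==2 the second. As a hedged variation, the block number can be passed in with AWK's standard -v option; the variable name n is an assumption of this sketch, not part of waldner's suggestion.

```shell
printf '%s\n' pear apple cherry orange lemon raspberry apple loquat feijoa orange loquat > fruit

# c counts the "apple" lines, so c==1 selects the first apple-orange block
awk '/orange/ {f=0}; f && c==1; /apple/ {f=1; c++}' fruit
# prints cherry

# Hypothetical variation: choose the block number on the command line (n is an assumed name)
awk -v n=2 '/orange/ {f=0}; f && c==n; /apple/ {f=1; c++}' fruit
# prints loquat and feijoa (one per line)
```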

A two-flag trick. The flag commands shown above work well for finding lines between a first starting pattern and a first ending pattern. If the situation is more complicated, as in this list of fruit names (a file named "tricky"), things get tricky:

pear
apple
apple
cherry
orange
orange
lemon
raspberry
apple
strawberry
apple
loquat
feijoa
orange
loquat

The usual commands won't work for finding just the names between the closest-occurring "apple" and "orange". For example:
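Assuming the "usual command" is the exclusive apple-to-orange command used on "fruit", a sketch of what it does with "tricky" looks like this. Printing NR beside each selected line (an addition for illustration) shows which lines of the file come out; the printf setup line is also an addition.

```shell
printf '%s\n' pear apple apple cherry orange orange lemon raspberry \
  apple strawberry apple loquat feijoa orange loquat > tricky

# Assumed "usual" command: the exclusive apple-to-orange command used on "fruit"
awk '/orange/ {f=0}; f; /apple/ {f=1}' tricky
# prints apple, cherry, strawberry, apple, loquat, feijoa (one per line)

# Same logic, but prefixing each selected line with its line number NR
awk '/orange/ {f=0}; f {print NR ": " $0}; /apple/ {f=1}' tricky
# prints 3: apple, 4: cherry, 10: strawberry, 11: apple, 12: loquat, 13: feijoa
```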

AWK has followed its instructions and returned not only the wanted names but also the second "apple" in line 3 and the "strawberry" and "apple" in lines 10 and 11. To get just the names between the closest-occurring "apple" and "orange", two flags can be used:

awk '/orange/ {f=g=0}; f && g; /apple/ && !f {f=1; next}; /apple/ && f {g=1}' tricky
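The two-flag command can be checked with a sketch like the one below, which also runs it on a small hypothetical file, "single" (not from the post), where the wanted line follows just one "apple". There g is never turned on, so nothing is printed, which illustrates why the trick is specific to files like "tricky".

```shell
printf '%s\n' pear apple apple cherry orange orange lemon raspberry \
  apple strawberry apple loquat feijoa orange loquat > tricky

# f: an "apple" has been seen; g: another "apple" arrived while f was on.
# A line prints only when both are on, i.e. after that second "apple".
awk '/orange/ {f=g=0}; f && g; /apple/ && !f {f=1; next}; /apple/ && f {g=1}' tricky
# prints cherry, loquat and feijoa (one per line)

# Hypothetical file: one "apple" before the wanted line, so g never turns on
printf '%s\n' apple kiwi orange > single
awk '/orange/ {f=g=0}; f && g; /apple/ && !f {f=1; next}; /apple/ && f {g=1}' single
# prints nothing
```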

Here a line is printed only if both flags, f and g, are on. Note that this trick suits this particular file, where each wanted block comes after a second "apple" that turns on g, but it isn't a general solution. Two general solutions were offered by AWK wizards Ed Morton and "pk" when I posted the problem on the comp.lang.awk forum. As applied to "tricky", both solutions accumulate lines between "apple" and "orange" in a variable. Here's Morton's solution:

awk '/orange/ {if (f) printf "%s", buf; f=0}; f {buf=buf $0 ORS}; /apple/ {buf=""; f=1}' tricky

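If "apple" is matched, the flag f is turned on and the "buf" variable is emptied. After that, each following line is appended to "buf" because f is true, and each is followed by the output record separator (ORS, a newline by default); that is why the buffer is printed with printf "%s" rather than print, which would add an extra newline. A line matching "apple" is also appended, but the "apple" action then empties "buf" again, so only the lines after the closest "apple" are kept. If "orange" is matched and f is true (because it has been preceded by "apple"), the contents of "buf" are printed and the flag is turned off.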
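Morton's one-liner can also be written with one rule per line inside the single quotes, with an AWK comment on each rule. In this sketch the program is stored in a shell variable named morton (an assumed name) so it can be reused; the layout is an addition, and the logic is the same as the one-liner. The comments avoid apostrophes, which would end the shell quoting.

```shell
printf '%s\n' pear apple apple cherry orange orange lemon raspberry \
  apple strawberry apple loquat feijoa orange loquat > tricky

# Morton's program, one rule per line (same logic as the one-liner)
morton='
  /orange/ { if (f) printf "%s", buf; f = 0 }  # block closed: print buffer if open
  f        { buf = buf $0 ORS }                # inside a block: save line plus ORS
  /apple/  { buf = ""; f = 1 }                 # every "apple" restarts the buffer
'
awk "$morton" tricky
# prints cherry, loquat and feijoa (one per line)
```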

The general solution from "pk" looks like this as applied to "tricky" (split over two lines for clarity):

awk '/apple/ {f=1; b=s=""; next}; /orange/ && f {f=0; print b; b=s=""; next}; \
f {b=b s $0; s=RS}' tricky

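This works like Morton's solution, but uses a different order of instructions. Lines matching "apple" restart the buffer "b" and are skipped with next, and buffered lines are joined with the separator variable s, which is empty for the first line of a block and is then set to the record separator RS (a newline by default). As a result "b" has no trailing newline, and print adds one when the block is output.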
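A sketch of pk's solution in the same one-rule-per-line layout (an addition; the variable name pk is assumed), run on "tricky" and on a hypothetical two-line file, "empty", in which "apple" is immediately followed by "orange". The two general solutions agree on "tricky", but they differ on an empty block: pk's print b outputs an empty line there, whereas Morton's printf "%s", buf outputs nothing.

```shell
printf '%s\n' pear apple apple cherry orange orange lemon raspberry \
  apple strawberry apple loquat feijoa orange loquat > tricky
# Hypothetical file with nothing between "apple" and "orange"
printf '%s\n' apple orange > empty

# pk's program, one rule per line (same logic as the one-liner)
pk='
  /apple/       { f = 1; b = s = ""; next }          # start or restart a block
  /orange/ && f { f = 0; print b; b = s = ""; next } # close the block, print it
  f             { b = b s $0; s = RS }               # append, separated by RS
'
awk "$pk" tricky
# prints cherry, loquat and feijoa (one per line)
awk "$pk" empty
# prints one empty line
```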

