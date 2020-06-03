

Join consecutive lines if condition applies

I recently looked at a TSV that had hundreds (!) of embedded newlines. Fortunately each of the "real" lines began with a serial number, and the breaks between lines were "clean" — no characters or spaces were lost or added. Below is a simplified file of this kind, "fruits". Each of the broken lines ends in a single space:

1 apple 2 pear 3 cherry, grape 4 either banana or apple or apricot 5 grape 6 watermelon, mango
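A file with this layout can be built with printf for experimenting. This is only a sketch: the positions of the breaks are assumptions made for illustration, and all that matters is that each "real" line starts with a serial number and a tab, and that each broken line ends in a single space.

```shell
# Build a sample "fruits" file: serial number, tab, text.
# The break positions are assumed, not taken from the original file.
{
    printf '1\tapple\n'
    printf '2\tpear\n'
    printf '3\tcherry, \n'
    printf 'grape\n'
    printf '4\teither banana \n'
    printf 'or apple \n'
    printf 'or apricot\n'
    printf '5\tgrape\n'
    printf '6\twatermelon, \n'
    printf 'mango\n'
} > fruits

# List the file unambiguously: tabs appear as \t and each line end is marked with $
sed -n l fruits
```

The trailing spaces are what allow the pieces to be glued back together without losing or adding a character.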

The coding task is to join consecutive lines if the second (or later) line doesn't begin with a number, to get this:

1 apple
2 pear
3 cherry, grape
4 either banana or apple or apricot
5 grape
6 watermelon, mango

Two practical solutions are based either on AWK or on the all-on-one-line trick described in an earlier BASHing data post.

AWK

tail -n +2 <(awk '{if (/^[0-9]+\t/) {print m; m=$0} else {m=m$0}} END {print m}' fruits)
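The logic of that one-liner may be easier to follow spread over several lines. The sketch below is a variant, not the command above: it uses an NR > 1 test in place of tail -n +2 to avoid printing the empty buffer before the first record. The sample file and the output name "fruits_joined" are assumptions for illustration.

```shell
# Sample input (assumed layout: number, tab, text; broken lines end in a space)
printf '1\tapple\n2\tpear\n3\tcherry, \ngrape\n4\teither banana \nor apple \nor apricot\n5\tgrape\n6\twatermelon, \nmango\n' > fruits

awk '
    /^[0-9]+\t/ {            # a "real" line: begins with a serial number and a tab
        if (NR > 1) print m  # print the record built so far (nothing yet at line 1)
        m = $0               # start a new record with this line
        next
    }
    { m = m $0 }             # a continuation line: append it to the current record
    END { print m }          # print the last record
' fruits > fruits_joined

cat fruits_joined
```

The variable m acts as a buffer: a record is only printed once the next numbered line (or the end of the file) shows that it is complete.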

All-on-one-line

paste -s -d $'\x1b' fruits | sed -E 's/\x1b([0-9]+\t)/\n\1/g;s/\x1b//g'
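The all-on-one-line approach can be read as three steps, sketched below. The sketch assumes GNU sed (for \x1b and \t in the pattern and \n in the replacement) and that the escape character never occurs in the data. The sample file and the output name "joined" are assumptions, and "$(printf '\033')" is a stand-in for bash's $'\x1b' so that the sketch also runs in plain sh.

```shell
# Sample input (assumed layout: number, tab, text; broken lines end in a space)
printf '1\tapple\n2\tpear\n3\tcherry, \ngrape\n4\teither banana \nor apple \nor apricot\n5\tgrape\n6\twatermelon, \nmango\n' > fruits

# Step 1: paste -s joins every line into one, with ESC standing in for each newline
# Step 2: an ESC followed by "number + tab" marks a real line break, so restore the newline
# Step 3: any ESC still left was an embedded newline, so delete it
paste -s -d "$(printf '\033')" fruits |
    sed -E -e 's/\x1b([0-9]+\t)/\n\1/g' -e 's/\x1b//g' > joined

cat joined
```

Any character that cannot appear in the file would serve as the placeholder; the escape character is simply an unlikely one to meet in a TSV.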

