Broken records - 2
Back to too many fields...
Too few fields
A common cause of too few fields is the splitting of a record across 2 or more lines, with the split happening either between fields or within a field. The fields in these examples are tab-separated. In file1 the split is between fields:
$ cat file1
aaa bbb ccc ddd
eee
fff ggg hhh
iii jjj kkk lll
mmm nnn ooo ppp
qqq rrr sss ttt
$ broken file1
1 1
1 3
4 4
and in file2 the split is within a field:
$ cat file2
aaa bbb ccc ddd
eee ff
f ggg hhh
iii jjj kkk lll
mmm nnn ooo ppp
qqq rrr sss ttt
$ broken file2
1 2
1 3
4 4
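The broken command is not defined in this section. As a rough stand-in, a tally of records by field count can be sketched as below. This is an assumption about what such a helper does, not the original definition: the function name fieldtally is hypothetical, and the input is assumed to be tab-separated.

```shell
# Scratch directory so the demo does not touch real data
demo=$(mktemp -d)

# Rebuild file1 from the example above, with tabs between fields
printf 'aaa\tbbb\tccc\tddd\neee\nfff\tggg\thhh\niii\tjjj\tkkk\tlll\nmmm\tnnn\tooo\tppp\nqqq\trrr\tsss\tttt\n' > "$demo/file1"

# fieldtally (hypothetical name): count the records that have each number
# of tab-separated fields, printing "records<TAB>fields" sorted by fields
fieldtally() {
    awk -F"\t" '{n[NF]++} END {for (f in n) print n[f] "\t" f}' "$1" |
        sort -t "$(printf '\t')" -k2,2n
}

fieldtally "$demo/file1"
```

For file1 this should report 1 record with 1 field, 1 record with 3 fields and 4 records with 4 fields, which points to the split record on lines 2 and 3.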
If the split is across 2 successive lines, a search for lines with too few fields, using AWK's 'either/or' operator (||), will find the adjoining pair of lines:
$ awk -F"\t" 'NF==1 || NF==3 {print NR": "$0}' file1
2: eee
3: fff ggg hhh
$ awk -F"\t" 'NF==2 || NF==3 {print NR": "$0}' file2
2: eee ff
3: f ggg hhh
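The two searches above hard-code the short field counts. When the expected number of fields is known, a more general check is to flag every line with fewer fields than expected. A minimal sketch, assuming tab-separated input; the function name shortlines and the AWK variable want are hypothetical:

```shell
# Scratch directory so the demo does not touch real data
demo=$(mktemp -d)

# Rebuild file2 from the example above, with tabs between fields
printf 'aaa\tbbb\tccc\tddd\neee\tff\nf\tggg\thhh\niii\tjjj\tkkk\tlll\nmmm\tnnn\tooo\tppp\nqqq\trrr\tsss\tttt\n' > "$demo/file2"

# shortlines FILE N (hypothetical name): print the line number and content
# of every line that has fewer than N tab-separated fields
shortlines() {
    awk -F"\t" -v want="$2" 'NF < want {print NR": "$0}' "$1"
}

shortlines "$demo/file2" 4
```

Adjoining line numbers in the output (here 2 and 3) suggest the two pieces of one split record.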
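Joining 2 successive lines is most easily done with sed. The command 2N appends line 3 to line 2 with a newline between them, and the substitution then replaces that newline. For file1 the newline is replaced with a tab, and for file2 it is simply deleted: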
$ sed '2N;s/\n/\t/' file1
aaa bbb ccc ddd
eee fff ggg hhh
iii jjj kkk lll
mmm nnn ooo ppp
qqq rrr sss ttt
$ sed '2N;s/\n//' file2
aaa bbb ccc ddd
eee fff ggg hhh
iii jjj kkk lll
mmm nnn ooo ppp
qqq rrr sss ttt
Joins with sed can be 'ganged' by listing several line numbers in one command:
$ sed '311N;4065N;4067N;8339N;s/\n/\t/' table
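A small worked example of a ganged join, with made-up data in which records 2 and 4 are each split between fields. The line numbers given to N are line numbers in the original file, not in the joined output. The sketch puts a literal tab in the replacement through a shell variable (tab, a name chosen here) rather than relying on sed to interpret \t, which not every sed does:

```shell
# Scratch directory so the demo does not touch real data
demo=$(mktemp -d)

# Made-up table: lines 2-3 and lines 5-6 are split records
printf 'a1\tb1\tc1\na2\nb2\tc2\na3\tb3\tc3\na4\tb4\nc4\n' > "$demo/table"

# A literal tab character for the replacement side of s///
tab=$(printf '\t')

# On lines 2 and 5, append the next line, then turn the embedded
# newline into a tab; all other lines pass through unchanged
sed "2N;5N;s/\n/$tab/" "$demo/table"
```

The 6 input lines should come out as 4 complete records of 3 fields each.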
If the pieces of the split record have been shuffled after the split, some fancy AWK work may be required to rejoin them. In the command used here, the file is named twice as an argument so that AWK makes 2 passes through it. The condition FNR==NR is true only during the first pass. In the first pass AWK stores the trailing piece 'fff' on line 5 in a variable, and in the second pass it appends a tab and the variable to line 2, and doesn't print line 5:
$ cat file3
aaa bbb ccc
ddd eee
ggg hhh iii
jjj kkk lll
fff
mmm nnn ooo
$ awk 'FNR==NR {if (NR==5) var=$0; next} FNR==2 {print $0"\t"var; next} FNR==5 {next} 1' file3 file3
aaa bbb ccc
ddd eee fff
ggg hhh iii
jjj kkk lll
mmm nnn ooo
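The same two-pass idea can be wrapped up so that the two line numbers are passed in rather than hard-coded. A minimal sketch, assuming a between-fields split (so the pieces are rejoined with a tab); the function name rejoin and the AWK variables head, tail and piece are hypothetical:

```shell
# Scratch directory so the demo does not touch real data
demo=$(mktemp -d)

# Rebuild file3 from the example above, with tabs between fields
printf 'aaa\tbbb\tccc\nddd\teee\nggg\thhh\tiii\njjj\tkkk\tlll\nfff\nmmm\tnnn\tooo\n' > "$demo/file3"

# rejoin FILE HEAD TAIL (hypothetical name): append line TAIL to line HEAD
# with a tab, and drop line TAIL from its old position
rejoin() {
    awk -v head="$2" -v tail="$3" '
        FNR==NR {if (FNR==tail) piece=$0; next}   # pass 1: remember the trailing piece
        FNR==head {print $0 "\t" piece; next}     # pass 2: rebuild the split record
        FNR==tail {next}                          # pass 2: skip the stray piece
        1                                         # pass 2: print everything else
    ' "$1" "$1"
}

rejoin "$demo/file3" 2 5
```

For a within-field split, the "\t" in the second rule would be dropped so that the pieces are joined directly.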