banner

For a full list of BASHing data blog posts, see the index page.     RSS


How to delete, insert and replace whole fields

Much as I like using AWK, there are some jobs with data tables that are just as easy to do with simpler command-line tools. Examples would be deleting, inserting and replacing whole fields with cut and paste.

To demonstrate I'll use this tab-separated table, called "table":

fldAfldBfldCfldDfldE
7zQFQVahhpfTzvhQzT7TYE7a
5SVKiCAoIml8D57IXm9oAoBm
Ej11wNpk9MtmODYMp9qcrOdT
FJnugcHTgJ9ZJaTGVHn8yjFN
7aeZhruMH1y1SUsGPZf0Tu2F

To delete fields and their associated field separators I use the --complement option for the cut command. For example, to delete fields 2 and 4 from "table":

cut -f2,4 --complement table

delins1

A tab character is the default field separator for cut. If the table is comma-separated, I specify that with the -d option:

"table_comma":
 
fldA,fldB,fldC,fldD,fldE
7zQF,QVahhp,fTzvhQz,T7T,YE7a
5SVK,iCAoIm,l8D57IX,m9o,AoBm
Ej11,wNpk9M,tmODYMp,9qc,rOdT
FJnu,gcHTgJ,9ZJaTGV,Hn8,yjFN
7aeZ,hruMH1,y1SUsGP,Zf0,Tu2F
 
cut -d',' -f2,4 --complement table_comma

delins2

Inserting fields is easy with paste, or a combination of cut and paste. Suppose the new field to be added is the file "insert":

fldNew
WKRr
zL2u
mZZq
tEVO
tokO

In the following commands, the new field is added at the beginning of "table", between fields 2 and 3, and at the end:

paste insert table
paste <(cut -f-2 table) insert <(cut -f3- table)
paste table insert

delins3

In that second command, the cut option -f-2 means "all fields up to and including field 2", and -f3- means "all fields starting with field 3"

To do the same with the comma-separated table, I just specify a comma as field separator for cut and paste:

delins4

cut and paste can also add a blank field, with a new field name:

paste <(cut -f-2 table) <(echo "fldnew") <(cut -f3- table)
paste -d',' <(cut -d',' -f-2 table_comma) <(echo "fldnew") <(cut -d',' -f3- table_comma)

delins5

Fields can be replaced with a combination of deleting and inserting. To replace field 2 with "insert", giving it the new field name "fldB" with the help of sed:

paste <(cut -f1 table) <(sed 's/fldNew/fldB/' insert) <(cut -f3- table)

delins5a

cut and paste are great, but AWK is the right tool for inserting a field based on other fields in the same table, especially when doing arithmetical or other calculations. In the following two examples I again add a new field to "table" between fields 2 and 3, but this time the new field counts the characters in field 3, and concatenates fields 1 and 4:

awk 'BEGIN {FS=OFS="\t"} NR==1 {$2=$2 FS "fldNew"} NR>1 {$2=$2 FS length($3)} 1' table
awk 'BEGIN {FS=OFS="\t"} NR==1 {$2=$2 FS "fldNew"} NR>1 {$2=$2 FS $1$4} 1' table

delins6

The BEGIN statement tells AWK that the input and output field separators are both the tab character. Specifying the output field separator is necessary because AWK will be rebuilding records and needs to know how to do it.
 
In the header line (NR==1), field 2 is rebuilt as the string "fld2[tab]fldNew" (FS means "field separator").
 
In the lines after the header (NR>1), AWK rebuilds field 2 as field 2[tab] followed either by the character length of field 3 (length($3)) or a simple concatenation of fields 1 and 4 ($1$4).
 
The final "1" is an AWK shorthand for "print all lines".


Last update: 2020-01-13
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License