


Search for (exact) strings; report line, column and context

GNU grep is a great utility, but with its "-n" option it reports only the line number of a search target, not the field that contains it. Suppose I search for the string "64" in this tab-separated "demo" table with grep's "-n" option:

Fld1Fld2Fld3Fld4Fld5Fld6Fld7Fld8
0017b03020d71b743c48ffdf9352b102e2d
002521a1da1f9eb42689fa8fc73570a31b8
003e6c30e9bdc9f448bb1c47705ca772ab5
00436cffd590c624eb682d1e30076ecedbd
00515c5787433dc4b20b1c47a1f3b8465b0
006b3fb5bad33614259a5b030370c953333
00715c5c3d533dc4b20b1c47a1f3b8465b0
0087b037686d2644c34b0e43646075af668
0097ee18a535cc54f579cf5ddc73556eee8
010bd75332421mz41b0b1bc22964ab9f2a3
01115d71fb272234e8f8f1f8e6b76f60cd1
012c3ccef6c70fb4a459428f00f7307e92d
0139ab4991c0bd74f3cbadfee145b5a6d17
0146ad5839519aa43c49cea3a3c90e84150
015607c37538a6944bfb41fddb1eb4a42ff
0167b03f06771b743c48ffdf9352b102e2d
01705f189f350676712b1c43b52454c4e35
018e20d534671a84b26a31dab914de39049
01915c552bd33dc4b20b1c47a1f3b8465b0
020e917b87908dd4387b520814a8a10717b

[screenshot: fldgrep1]

OK, "64" appears on lines 9 and 11, but grep has left it up to me to figure out which fields contain "64". Because field location is often important in my data work, I wrote a function ("fldgrep") that searches for an exact string and returns the string's line and field location (field number and field name) plus the data item containing the string, with the string coloured red:

[screenshot: fldgrep2]

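The function "fldgrep" is a single, fairly complicated AWK command that works on tab-separated data tables. It relies on GNU AWK (gawk) for the four-argument form of "split", and it uses the target as a regular expression, so metacharacters such as "." in the target are not matched literally. Here is the function: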

fldgrep() { awk -F"\t" -v target="$1" -v blue="\x1b[1;34m" -v red="\x1b[1;31m" -v reset="\x1b[0m" 'NR==1 {for (i=1;i<=NF;i++) a[i]=$i} NR>1 {for (j=1;j<=NF;j++) if ($j ~ target) {n=split($j,m,target,sep); printf("%s","line " blue NR reset ", field " blue j reset " (" blue a[j] reset "): "); for (k=1;k<=n;k++) printf("%s", m[k] red sep[k] reset); print ""}}' "$2"; }

"fldgrep" is explained in the next section. Below are a couple of examples of "fldgrep" in use.

Multiple appearances on one line:

[screenshot: fldgrep3]

Multiple appearances in one field:

[screenshot: fldgrep4]
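A call on a small made-up table might look like the minimal sketch below. The table contents, the temporary file and the "awk --version" check are assumptions added for illustration and are not part of the original post. The check is there because "fldgrep" uses the four-argument form of "split", which is a GNU AWK extension.

```shell
# fldgrep exactly as given above; it needs GNU AWK for four-argument split().
fldgrep() { awk -F"\t" -v target="$1" -v blue="\x1b[1;34m" -v red="\x1b[1;31m" -v reset="\x1b[0m" 'NR==1 {for (i=1;i<=NF;i++) a[i]=$i} NR>1 {for (j=1;j<=NF;j++) if ($j ~ target) {n=split($j,m,target,sep); printf("%s","line " blue NR reset ", field " blue j reset " (" blue a[j] reset "): "); for (k=1;k<=n;k++) printf("%s", m[k] red sep[k] reset); print ""}}' "$2"; }

# Hypothetical three-field, tab-separated table, made up for this sketch.
demo=$(mktemp)
printf 'ID\tCode\tNote\n001\ta64b\tnone\n002\tc12d\tx64y64z\n003\te99f\tplain\n' > "$demo"

# Run the search only where "awk" is GNU AWK; other AWKs reject the
# four-argument split().
case "$(awk --version 2>/dev/null </dev/null)" in
    *"GNU Awk"*) fldgrep 64 "$demo" ;;
    *) echo "fldgrep needs GNU AWK (gawk)" >&2 ;;
esac
```

With GNU AWK, and leaving the colours aside, this should report "line 2, field 2 (Code): a64b" and "line 3, field 3 (Note): x64y64z".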

awk -F"\t"
Invokes AWK and tells it that the field separator is the tab character
 
-v target="$1"
Assigns the first argument of the function (the target string) to the AWK variable "target"
 
-v blue="\x1b[1;34m"
Assigns the ANSI color escape for bold blue to the AWK variable "blue"
 
-v red="\x1b[1;31m"
Assigns the ANSI color escape for bold red to the AWK variable "red"
 
-v reset="\x1b[0m"
Assigns the ANSI color escape for no coloring to the AWK variable "reset"
 
NR==1
Tells AWK to do a particular action with the table's header line
 
for (i=1;i<=NF;i++)
The action with the header line is to loop through each of the fields, and
 
a[i]=$i
add each entry in the header line to an array "a" with the field number as index string and the field contents as value string
 
NR>1
The remaining actions in the command get done line by line after the header line
 
for (j=1;j<=NF;j++)
Loop through each of the fields in the line
 
if ($j ~ target)
Check whether the entry in that field matches the target (the "~" operator treats the target as a regular expression), and if so, do the following four actions
 
n=split($j,m,target,sep)
The first action is to split the field using the target string as the separator. Put the non-target strings in the array "m" and the adjacent separator (target) in the array "sep". Tally up the number of non-target strings in the variable "n"
 
printf("%s","line " blue NR reset ", field " blue j reset " (" blue a[j] reset "): ")
The second action (for each field containing the target string) is to printf a prefix (see screenshots above) with the line number, field number and field name highlighted in blue
 
for (k=1;k<=n;k++)
The third action begins by looping through the non-target strings found by split
 
printf("%s", m[k] red sep[k] reset)
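For each of the non-target strings, printf the non-target string followed by the red-highlighted target string that came after it (nothing follows the last non-target string, so its "sep" entry is empty)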
 
print ""
The last action for each field containing the target string is to print an empty string, which ends the output line with a newline
 
"$2"
This is the second argument for the function, and is the name of the file on which AWK is operating
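
Because "$j ~ target" is a regular-expression match, a target such as "a.b" also matches "axb". The following is a hedged sketch of a literal-string variant, not the author's function: the name "fldgrep_literal", the environment variable "FLDGREP_TARGET" and the sample table are all hypothetical. It swaps "~" and "split" for "index" and "substr", which should also keep it within POSIX AWK.

```shell
# Hypothetical literal-string variant of fldgrep (not from the original post).
# The target travels through the environment so AWK does not interpret
# backslashes in it, and index()/substr() treat it as plain text.
fldgrep_literal() {
    FLDGREP_TARGET="$1" awk -F"\t" -v blue="\033[1;34m" -v red="\033[1;31m" -v reset="\033[0m" '
    BEGIN { target = ENVIRON["FLDGREP_TARGET"] }
    NR==1 { for (i=1; i<=NF; i++) a[i] = $i }    # remember the field names
    NR>1 && length(target) > 0 {
        for (j=1; j<=NF; j++) {
            if (index($j, target) == 0) continue  # target not in this field
            printf("%s", "line " blue NR reset ", field " blue j reset " (" blue a[j] reset "): ")
            rest = $j
            while ((p = index(rest, target)) > 0) {
                # text before the match, then the match itself in red
                printf("%s", substr(rest, 1, p-1) red target reset)
                rest = substr(rest, p + length(target))
            }
            print rest                            # text after the last match
        }
    }' "$2"
}

# Hypothetical table: "axb" matches the regex a.b but not the literal string.
demo=$(mktemp)
printf 'ID\tCode\tNote\n001\ta.b\tnone\n002\taxb\ta.b-a.b\n' > "$demo"
fldgrep_literal 'a.b' "$demo"
```

On the sample table this reports line 2, field 2 and line 3, field 3 (with both copies of "a.b" in red), and skips the "axb" entry that a regex match would have caught.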


Last update: 2022-03-09
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License