
For a list of BASHing data 2 blog posts, see the index page.


Similar to "most distant pair of points", sort of

I had a long, sorted, tab-separated list of spatially clustered sampling sites and their latitudes and longitudes, and I wanted to check that none of the distances within a cluster was larger than a few hundred metres. Each cluster had three sites, and the coordinates were latitude first:

[Image: sample data]

My solution used everyday shell tools and AWK. To show its structure more clearly, I'll use a simplified data file, "demo" (below). The aim is to get the sum for each of the three pairs within a letter group, such as "A1+A2", "A1+A3" and "A2+A3", and then see which sum is largest.

A1 87
A2 71
A3 32
B1 7
B2 55
B3 55
C1 23
C2 39
C3 30

I'll first put each of the letter groups on a single line with paste:

[Image: demo1]
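The screenshot isn't reproduced here, so this is a sketch that rebuilds "demo" and runs the paste step. It assumes the file is tab-separated, which the post doesn't say; a space-separated file would behave the same way in the later AWK step.

```shell
# Rebuild the simplified "demo" file (tab separators are an assumption)
printf 'A1\t87\nA2\t71\nA3\t32\nB1\t7\nB2\t55\nB3\t55\nC1\t23\nC2\t39\nC3\t30\n' > demo

# "paste - - -" reads three lines at a time from stdin and joins them with tabs
paste - - - < demo
# Three lines, fields separated by tabs:
# A1 87 A2 71 A3 32
# B1 7 B2 55 B3 55
# C1 23 C2 39 C3 30
```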

Now I can use a line-based processor like AWK to get the sums:

[Image: demo2]
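To show what the AWK step prints without the screenshot, here is the post's AWK command run on just the first pasted line (group A). The comma in AWK's print inserts the output field separator, a single space by default.

```shell
# First line of paste's output for the demo data: A1, 87, A2, 71, A3, 32
printf 'A1\t87\tA2\t71\tA3\t32\n' |
  awk '{print $1"+"$3,($2+$4); print $1"+"$5,($2+$6); print $3"+"$5,($4+$6)}'
# A1+A2 158
# A1+A3 119
# A2+A3 103
```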

I finish by sorting on the second field (the sum) in decreasing numerical order:

[Image: demo3]

paste - - - < demo | awk '{print $1"+"$3,($2+$4); print $1"+"$5,($2+$6); print $3"+"$5,($4+$6)}' | sort -nrk2

AWK's default field separator splits a line on runs of blanks (spaces and tabs), so it handles the tabs inserted by paste as well as any spaces in the original lines.
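A quick way to see this (a sketch, not from the post): a line that mixes single spaces, a double space and a tab still splits into four fields with the default separator, whereas setting the field separator to a tab gives only two.

```shell
# Default FS: runs of spaces and/or tabs separate fields
printf 'A1 87\tA2  71\n' | awk '{print NF, $2, $4}'
# 4 87 71

# Tab-only FS: "A1 87" and "A2  71" each stay as a single field
printf 'A1 87\tA2  71\n' | awk -F'\t' '{print NF}'
# 2
```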

I did something similar for the sites check, but the AWK command was considerably more complicated. It included my user-defined AWK function "distance", which prints the number of kilometres between two latitude/longitude pairs to one decimal place:

function distance(lat1,lat2,lon1,lon2) {printf("%0.1f\n",sqrt(((lat2-lat1)*111.32)^2 + ((lon2-lon1)*111.32*cos(lat2*(pi/180)))^2))}
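Written out, the function computes the following, where φ and λ are latitude and longitude in degrees, 111.32 is taken as the number of kilometres in one degree, and the east-west difference is scaled by the cosine of the second point's latitude (AWK's cos works in radians, hence the pi/180 factor):

```latex
d \approx \sqrt{\bigl(111.32\,(\varphi_2-\varphi_1)\bigr)^2 + \bigl(111.32\,(\lambda_2-\lambda_1)\cos\varphi_2\bigr)^2}
```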

The Euclidean distance calculation I use is described in this BASHing data post.
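To try the function on its own, here is a sketch with a hypothetical shell wrapper, km, which isn't part of the post. It passes four numbers into AWK and calls distance once, with pi set in a BEGIN block as in the full one-liner below.

```shell
# Hypothetical wrapper (not from the post): km LAT1 LAT2 LON1 LON2
# Runs the post's distance function once; pi is set in BEGIN as in the one-liner.
km() {
  awk -v la1="$1" -v la2="$2" -v lo1="$3" -v lo2="$4" '
    function distance(lat1,lat2,lon1,lon2) {printf("%0.1f\n",sqrt(((lat2-lat1)*111.32)^2 + ((lon2-lon1)*111.32*cos(lat2*(pi/180)))^2))}
    BEGIN {pi = 3.14159; distance(la1, la2, lo1, lo2)}'
}

km 36 37 146 146   # 1 degree of latitude: 111.3
km 0 0 146 147     # 1 degree of longitude at the equator: 111.3
km 60 60 10 11     # 1 degree of longitude at latitude 60: 55.7
```

The BEGIN assignment matters: an uninitialised AWK variable is 0, so without it cos(lat2*(pi/180)) would always be cos(0) = 1 and longitude differences would not shrink with latitude. In AWK, atan2(0, -1) is another way to get pi.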

The full one-liner, working on the sites list ("sitelist"), was:

awk 'function distance(lat1,lat2,lon1,lon2) {printf("%0.1f\n",sqrt(((lat2-lat1)*111.32)^2 + ((lon2-lon1)*111.32*cos(lat2*(pi/180)))^2))} BEGIN {pi=3.14159} {printf("%s/%s\t",$1,$4); distance($2,$5,$3,$6); printf("%s/%s\t",$1,$7); distance($2,$8,$3,$9); printf("%s/%s\t",$4,$7); distance($5,$8,$6,$9)}' <(paste - - - < sitelist) | sort -nrk2 | less
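As an aside, the sort-and-page ending could also be swapped for an automatic check. This is a sketch with a hypothetical shell function, flag_far, and an assumed 0.5 km limit standing in for "a few hundred metres". It prints only the pair lines whose distance exceeds the limit; the sample lines are made up for illustration.

```shell
# flag_far LIMIT: print "siteA/siteB<TAB>km" lines whose km value exceeds LIMIT
# (hypothetical helper; the 0.5 km default is an assumption)
flag_far() {
  awk -F'\t' -v limit="${1:-0.5}" '$2 + 0 > limit + 0'
}

# Made-up input in the one-liner's output format
printf 'X1/X2\t0.3\nX1/X3\t12.4\nX2/X3\t0.2\n' | flag_far 0.5
# X1/X3	12.4
```

In the one-liner, "| sort -nrk2 | less" would become "| flag_far 0.5", and no output would mean every within-cluster distance is under the limit.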

Here's a screenshot of the top results:

[Image: sites check]

There's something seriously wrong here, and the cause turned out to be one-digit-off data entry errors. The HS18-C-L3 and HS18-C-P3 sites had longitude 147.3839 instead of 146.3839, and HS01-C-L1 and HS01-C-P1 had latitude 36.9923 instead of 36.8823.
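A rough worked check with the same formula shows how far those errors move a point. The HS18 sites' latitude isn't given above, so 37°, close to HS01's, is an assumption:

```latex
\begin{aligned}
\text{latitude error: } & (36.9923 - 36.8823) \times 111.32 = 0.11 \times 111.32 \approx 12.2\ \text{km} \\
\text{longitude error: } & (147.3839 - 146.3839) \times 111.32 \times \cos 37^\circ \approx 111.32 \times 0.7986 \approx 88.9\ \text{km}
\end{aligned}
```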




Last update: 2025-11-28
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License