For a full list of *BASHing data* blog posts see the index page.

# Making a transect into a point and circle

In a *BASHing data* blog post last month I featured the WKT format, which can be used to describe a straight-line transect from latitude/longitude point 1 to latitude/longitude point 2:

LINESTRING (longitude1 latitude1, longitude2 latitude2)

Another way to describe a straight-line transect is with its midway point plus the radius of a circle which includes the whole of the transect. In the Darwin Core scheme for recording biological data, that midway point is at *decimalLatitude* and *decimalLongitude*, and the circle's radius, or half the length of the transect, is the *coordinateUncertaintyInMeters*.

Given the LINESTRING WKT for a transect, you can calculate the midway point and half the transect length in a single AWK command, as explained below.

As a demonstration file ("wkts") I'll use 10 marine sampling transects from a GBIF dataset:

LINESTRING (-42.98106 -60.74496, -42.98813 -60.74046)

LINESTRING (-44.99019 -62.15421, -44.98773 -62.14981)

LINESTRING (-44.99531 -62.28379, -45.00127 -62.28109)

LINESTRING (-45.64496 -62.66805, -45.64785 -62.66019)

LINESTRING (-46.52195 -60.36142, -46.53308 -60.35873)

LINESTRING (-46.53158 -60.36383, -46.53119 -60.35797)

LINESTRING (-46.68411 -60.35385, -46.68745 -60.35828)

LINESTRING (-46.68423 -60.35437, -46.68761 -60.35889)

LINESTRING (-46.68984 -60.22065, -46.69883 -60.21816)

LINESTRING (-46.76719 -60.32237, -46.77263 -60.32624)

The command I'll use prints *decimalLatitude*, *decimalLongitude* and *coordinateUncertaintyInMeters* as tab-separated items. For display purposes I'll put the results on a new line and slightly indent them. Here's that command and its output with "wkts":

awk -F"\t" 'BEGIN {pi=3.14159} \

{printf("%s\n",$1); \

sub(","," ",$1); \

split($1,a,"[()]"); \

split(a[2],b," "); \

$2=sprintf("%.5f",(b[2]+b[4])/2); \

$3=sprintf("%.5f",(b[1]+b[3])/2); \

$4=sprintf("%.0f",0.5*sqrt(((b[2]-b[4])*111320)^2 + ((b[1]-b[3])*111320*cos(b[2]*(pi/180)))^2)); \

print " " $2 FS $3 FS $4}' wkts

**-F"\t"** AWK is told that the input field separator is a tab. Since there are no tabs in "wkts", the whole WKT becomes field 1.

**BEGIN {pi=3.14159}** AWK doesn't have *pi* as an internal constant, so it's added here as a variable in a BEGIN statement.

**{printf("%s\n",$1);**AWK's first action is to **printf** the WKT, followed by a newline. In a tab-separated data table, it would instead be followed by a tab.

**sub(","," ",$1);** Here AWK replaces the comma in the LINESTRING with a plain space, modifying the field.

**split($1,a,"[()]");**The modified field 1 is **split** using either "(" or ")" as string separators, with the pieces going into array "a". The first piece (a[1]) is "LINESTRING ", the second (a[2]) for the first line of "wkts" is "-44.99019 -62.15421 -44.98773 -62.14981" and the third (a[3]) is empty.

**split(a[2],b," "); **The string of longitudes and latitudes (a[2]) is **split** using one or more spaces as separator and the pieces stored in array "b". For the first line of "wkts", b[1] is "-44.99019", b[2] is "-62.15421", b[3] is "-44.98773" and b[4] is "-62.14981".

**$2=sprintf("%.5f",(b[2]+b[4])/2);** A second field is defined as the average of the two latitudes, **printf**-formatted to 5 decimal places.

**$3=sprintf("%.5f",(b[1]+b[3])/2); ** A third field is defined as the average of the two longitudes, again **printf**-formatted to 5 decimal places.

**$4=sprintf("%.0f",0.5*sqrt(((b[2]-b[4])*111320)^2 + ((b[1]-b[3])*111320*cos(b[2]*(pi/180)))^2)); ** A fourth field is defined by a complex calculation whose result is **printf**-formatted to be rounded with no decimal places. The calculation of the Euclidean distance between two latitude/longitude points is explained in this *BASHing data* post from 2018. The "111320" figure for meters per degree is based on this Wikipedia article on decimal degrees.

**print " " $2 FS $3 FS $4}**The three new fields are printed for each WKT with a leading space and tab separation.

There are a couple of caveats that come with this command. The first is that you should allow for the uncertainty in the WKT's two latitude/longitude points by extending the calculated uncertainty for the transect. This might be only 10 m for a good GPS reading, but I would be conservative and add 30 m to the half-length of the transect (and round up to the nearest 10 m).

The second warning is that you can't do "exact" calculations with decimal numbers on a computer, because computers calculate in binary. The calculated midpoint is an approximation and is probably good to a rounded-off 4 decimal places. How you would do that rounding is another choice; I always "round to even", but you might prefer another method. In any case, the location's uncertainty will probably be much larger than a small rounding bias or error.

Last update: 2021-12-22

The blog posts on this website are licensed under a

Creative Commons Attribution-NonCommercial 4.0 International License