

Converting lat/lon from DMS to DD without screaming

Some of the datasets I audit have coordinates in two formats, namely degree-minute-second (DMS) and decimal degrees (DD). The DMS data are the originals in a single data field, and the DD data are split in the dataset between decimal latitude and decimal longitude fields.

Checking that DMS-to-DD was done correctly could be easy but usually isn't. The problem is that the DMS might be in any one of many formats:

Working with all the possible variations is enough to give you the screaming meemies. What's needed on the command line is a magic program that picks out just the numbers and the directions in the DMS and ignores punctuation and spaces, like this:

[Screenshot: numbers-only]

One such magic program is GNU AWK. Using GNU AWK's FPAT variable, you can break up a string into fields, where each field matches a regex pattern. Looking at the DMS above, the patterns to match are one or more digits, a decimal number (digits, a ".", then more digits), and the direction letters N, E, W and S. Those three patterns can be matched with a single regex, although the "." needs some escaping: the shell's double quotes reduce "\\\." to "\\.", and AWK's handling of -v assignments reduces that to "\.", a literal dot:

awk -v FPAT="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]"

In the next screenshot I get AWK to print each of the eight fields on a new line:

[Screenshot: fields]

Next I'll munge the number fields to make DD coordinates to four decimal places, ignoring direction for the moment:

awk -v FPAT="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]" '{printf("%0.4f %0.4f\n",($1 +($2/60)+($3/3600)),($5+($6/60)+($7/3600)))}'

[Screenshot: DD-start]
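The arithmetic in that command is degrees + minutes/60 + seconds/3600, since a minute is 1/60 of a degree and a second is 1/3600 of a degree. Here is a small check with made-up values, using AWK purely as a calculator; dms_part_to_dd is a hypothetical helper name.

```shell
# Convert one degree/minute/second triple to decimal degrees (4 places).
# dms_part_to_dd is a hypothetical helper, not part of the original post.
dms_part_to_dd() {
    awk -v d="$1" -v m="$2" -v s="$3" 'BEGIN {printf("%0.4f\n", d + m/60 + s/3600)}'
}

# Hypothetical values: 153 degrees, 1 minute, 46.0 seconds.
dms_part_to_dd 153 1 46.0
# prints 153.0294
```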

Adding a direction means changing the command to allow for negative latitudes (S) and negative longitudes (W). I do that by first storing the raw DD latitude in "lat" and raw DD longitude in "lon". I then test to see if field 4 is "S" or field 8 is "W", and prepend a hyphen-minus (as the hexadecimal byte 2D) if that's true. Here's the command stored as a function, and in the screenshot the function is tested on a couple of DMS strings.

DMStoDD4() { awk -v FPAT="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]" '{lat=sprintf("%0.4f",($1+($2/60)+($3/3600))); lon=sprintf("%0.4f",($5+($6/60)+($7/3600))); {if ($4=="S") printf("\x2d%0.4f ",lat); else printf("%0.4f ",lat)}; {if ($8=="W") printf("\x2d%0.4f\n",lon); else printf("%0.4f\n",lon)}}'; }

[Screenshot: DMStoDD4]

And here's a couple of ugly strings to challenge DMStoDD4:

dms3=$(printf "27\u00ba 28\u2019 30.9\u201d S - 153\u1d52 1\u0027 46.000001\u0027\u0027 E")
dms4=$(printf "27\u02da 28 min 30.9\u2019\u2019 S, 153 d 01 m 46.0 s E")

[Screenshot: DMStoDD4-test]

DMStoDD4 works well for the DMS coordinates I usually see, but it will fail if the decimal separator is a comma rather than a ".", or if the direction letters are other than NEWS, as in the Spanish "Oeste" for "west".
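One possible workaround for the decimal-comma case is a pre-processing step that turns a comma into a "." only where it sits directly between two digits, with the result then piped to DMStoDD4. This is a sketch under an assumption: that commas never separate whole numbers without a space (as in "27,28,30"). The name comma_to_point is hypothetical.

```shell
# Replace a comma with "." only when it has a digit on each side,
# so a comma separating latitude from longitude is left alone.
# comma_to_point is a hypothetical helper, not part of the original post.
comma_to_point() {
    sed -E 's/([0-9]),([0-9])/\1.\2/g'
}

# Hypothetical sample string with decimal commas.
echo '27° 28′ 30,9″ S, 153° 1′ 46,0″ E' | comma_to_point
# prints 27° 28′ 30.9″ S, 153° 1′ 46.0″ E
```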




Last update: 2025-08-22
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License