
For a list of BASHing data 2 blog posts see the index page.
Converting lat/lon from DMS to DD without screaming
Some of the datasets I audit have coordinates in two formats, namely degree-minute-second (DMS) and decimal degrees (DD). The DMS data are the originals and sit in a single field, while the DD data are split between separate decimal latitude and decimal longitude fields.
Checking that the DMS-to-DD conversion was done correctly could be easy, but usually isn't. The problem is that the DMS might be in any one of many formats:
- the degree number might be followed by the degree symbol (°), the masculine ordinal indicator (º), the modifier letter small o (ᵒ), the ring above character (˚), the letter "d" or even an asterisk (*)
- minutes might be followed by an apostrophe ('), a right single quote (’) or the letter "m"
- seconds might be followed by a quote ("), a right double quotation mark (”), two apostrophes (''), two right single quotes (’’) or the letter "s"
- the seconds item might be a whole number (like "31") or a decimal (like "30.975")
- spacing between DMS items can be variable, from no space to one or more spaces
- there may or may not be a comma between the latitude and longitude elements of the DMS
Working with all the possible variations is enough to give you the screaming meemies. What's needed on the command line is a magic program that picks out just the numbers and the directions in the DMS and ignores punctuation and spaces, like this:

One such magic program is GNU AWK. Using GNU AWK's FPAT variable, you can break up a string into fields defined by their content, where each field is a piece of text matching a regular expression. Looking at the DMS above, the patterns to match are one or more digits, two runs of digits separated by a "." (a decimal number), and the direction letters N, E, W and S. Those three patterns can be matched with a single regex, although the "." needs some escaping:
awk -v FPAT="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]"
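The escaping passes through two layers. Inside double quotes the shell reduces \\\. to \\., and awk then processes escape sequences in the -v assignment, reducing \\. to \., which is the regex for a literal dot. A quick way to see what awk actually ends up with is to assign the same string to a throwaway variable ("re" here is just an illustrative name) and print it:

```shell
# Print the regex that awk receives after the shell and the -v escape
# processing have each removed one layer of backslashes.
# "re" is a throwaway variable name used only for this check.
awk -v re="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]" 'BEGIN { print re }'
# → [0-9]+|[0-9]+\.[0-9]+|[NEWS]
```

With single quotes around the value the shell layer disappears, so FPAT='[0-9]+|[0-9]+\\.[0-9]+|[NEWS]' should hand awk the same regex.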
In the next screenshot I get AWK to print each of the eight fields on a new line:

Next I'll munge the number fields to make DD coordinates to four decimal places, ignoring direction for the moment:
awk -v FPAT="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]" '{printf("%0.4f %0.4f\n",($1 +($2/60)+($3/3600)),($5+($6/60)+($7/3600)))}'
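The arithmetic inside that printf is the standard conversion, with D, M and S standing for the degree, minute and second numbers. Worked through for a longitude of 153° 01' 46.0":

```latex
\[
\mathrm{DD} = D + \frac{M}{60} + \frac{S}{3600}
\]
\[
153^\circ\,01'\,46.0'' \;\to\; 153 + \frac{1}{60} + \frac{46.0}{3600}
  = 153 + \frac{106}{3600} \approx 153.0294
\]
```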

Adding a direction means changing the command to allow for negative latitudes (S) and negative longitudes (W). I do that by first storing the raw DD latitude in "lat" and the raw DD longitude in "lon". I then test whether field 4 is "S" and whether field 8 is "W", and prepend a hyphen-minus (written as the hexadecimal escape \x2d) to the matching value when the test is true. Here's the command stored as a function, and in the screenshot the function is tested on a couple of DMS strings.
DMStoDD4() { awk -v FPAT="[0-9]+|[0-9]+\\\.[0-9]+|[NEWS]" '{lat=sprintf("%0.4f",($1+($2/60)+($3/3600))); lon=sprintf("%0.4f",($5+($6/60)+($7/3600))); {if ($4=="S") printf("\x2d%0.4f ",lat); else printf("%0.4f ",lat)}; {if ($8=="W") printf("\x2d%0.4f\n",lon); else printf("%0.4f\n",lon)}}'; }

And here are a couple of ugly strings to challenge DMStoDD4:
dms3=$(printf "27\u00ba 28\u2019 30.9\u201d S - 153\u1d52 1\u0027 46.000001\u0027\u0027 E")
dms4=$(printf "27\u02da 28 min 30.9\u2019\u2019 S, 153 d 01 m 46.0 s E")

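DMStoDD4 works well for the DMS coordinates I usually see, but it will fail if the decimal separator is a comma rather than a ".", or if the direction letters are other than NEWS, as in the Spanish "Oeste" for "west". A decimal comma splits the seconds into two fields and shifts every later field by one, and an unrecognised direction letter simply goes missing, so a western longitude comes out positive.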
Last update: 2025-08-22
The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.