BASHing data 2

https://www.datafix.com.au/BASHing2/index.html
Data ops on the Linux command line

Replace the last N occurrences of a pattern in a string
https://www.datafix.com.au/BASHing2/2025-01-03.html
Fri, 03 Jan 2025
Drive sed with a for loop

A Unicode normalisation problem
https://www.datafix.com.au/BASHing2/2025-01-10.html
Fri, 10 Jan 2025
How to get rid of full-width characters

MAD about the median
https://www.datafix.com.au/BASHing2/2025-01-17.html
Fri, 17 Jan 2025
That's Median Absolute Deviation, a useful statistic

Adding the missing keys and values in a key-value series
https://www.datafix.com.au/BASHing2/2025-01-24.html
Fri, 24 Jan 2025
Easily done with an AWK array

The ìèñëèâñüêå mystery
https://www.datafix.com.au/BASHing2/2025-01-31.html
Fri, 31 Jan 2025
The killer was... the Microsoft Corporation

AWK's view of existence
https://www.datafix.com.au/BASHing2/2025-02-07.html
Fri, 07 Feb 2025
AWK (by which I mean GNU AWK, or "gawk") sometimes doesn't distinguish between a zero and an empty string. I blogged about this peculiarity in 2020, but this post dives a bit deeper.

Four exercises with data art
https://www.datafix.com.au/BASHing2/2025-02-14.html
Fri, 14 Feb 2025
Colorful fun with PPMs

Extract the year from a date string without using the date command
https://www.datafix.com.au/BASHing2/2025-02-21.html
Fri, 21 Feb 2025
Demonstrating 2 methods and 4 variations for each

Does string A contain string B? Ask AWK's index
https://www.datafix.com.au/BASHing2/2025-02-28.html
Fri, 28 Feb 2025
Demonstrating a handy use for the index function

Permutations and combinations of pairs with AWK
https://www.datafix.com.au/BASHing2/2025-03-07.html
Fri, 07 Mar 2025
Easy ways to get results with and without repetition

Find all data points "X" km or less from a given point
https://www.datafix.com.au/BASHing2/2025-03-14.html
Fri, 14 Mar 2025
I have a table with about 10000 records of millipede occurrences in Tasmania. Almost every record has the latitude and longitude of the point where a particular species of millipede was recorded, although in a few cases the lat/lon entries are blank (inadequately georeferenced occurrences, like locality = "Tasmania"). Occasionally I need to identify any records that are 5 (or 10) km or less from a particular point in the Tasmanian landscape.

Previously, finding those records meant importing the latest occurrences table into a GIS program, adding the point of interest, buffering the point with a 5 (or 10) km circle, then intersecting the records layer with the circle, i.e. "cookie-cutting" out the millipede records found within the circle. As a final step I'd export the cookie-cut records to a new text file.

To save time and avoid opening a GUI GIS program, I wrote a simple shell script ("circle") with a YAD entry dialog. This post explains how the script works.

New code for my translation box
https://www.datafix.com.au/BASHing2/2025-03-21.html
Fri, 21 Mar 2025
Back in 2017 I blogged about a script for quick-and-easy translation. I highlighted a block of non-English text in an application window (for example, on a webpage in my browser) and launched the script with a keyboard shortcut. A YAD dialog popped up next to the application, and a second or two later an English translation appeared in the YAD window. The YAD dialog had automatically gained focus and I could dismiss it by pressing Esc.

I've used the script a lot over the past 8 years, but YAD has changed a little, and I wanted to have both the original and the translated text together in the YAD window for copying. The revised script shown here does that and has another useful feature.

Text processing with xargs and jot
https://www.datafix.com.au/BASHing2/2025-03-28.html
Fri, 28 Mar 2025
The xargs and jot commands are mainly useful for jobs other than manipulating text. I'll feed a list of files to xargs to get another command to do something with each of the files, and I'll call on jot to generate some random numbers. Nevertheless, just as GNU datamash can do text processing as well as descriptive statistics, both xargs and jot have niche uses for the handling of plain text, as shown in this post.

Rename time-series files for chronological sorting
https://www.datafix.com.au/BASHing2/2025-04-04.html
Fri, 04 Apr 2025
What I call "time-series files" are documents like monthly financial statements, monthly newsletters and quarterly reports. I don't modify them after I receive them, and they get filed and archived in an appropriate folder, like "[Organisation name]-newsletters".

Often the time-series file doesn't have an informative name. My last quarterly water bill from TasWater, for example, arrived as "document15176075861143268989.pdf" (I am not making this up). Even when the filename is informative, the date component might be less than obvious: "Smalltown Tennis Club Dec 24 news.pdf".

I've recently gotten into the habit of renaming time-series files to make them chronologically sortable by filename, as in "2024-12-18-TasWater-bill". I wanted to do the same for folderfuls of legacy documents, so I turned to the command line and "date -r", which returns the last-modified date of a file or folder.
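The renaming idea in the item above can be sketched in a few lines of bash. This is a minimal illustration under stated assumptions, not the script from the post. It assumes GNU date, where "date -r FILE" reports the file's last-modification time (on BSD and macOS, "date -r" expects a number of seconds instead). The function name date_prefix_rename is hypothetical. Each regular file in a folder gets its last-modified date prepended as YYYY-MM-DD, files that already carry that prefix are skipped, and "mv -n" refuses to overwrite an existing file.

```bash
#!/usr/bin/env bash
# date_prefix_rename DIR  (hypothetical helper, a sketch only)
# Prefixes each regular file in DIR with its last-modified date (YYYY-MM-DD)
# so that filenames sort chronologically. Assumes GNU date, where
# "date -r FILE" reports FILE's last-modification time.
# Note: the glob below does not match dotfiles.
date_prefix_rename() {
    local dir="$1" f base stamp
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip folders (and the literal glob if DIR is empty)
        base="${f##*/}"                      # filename without the folder path
        stamp="$(date -r "$f" +%F)"          # %F = YYYY-MM-DD, e.g. 2024-12-18
        case "$base" in
            "$stamp"-*) continue ;;          # already prefixed: don't add a second date
        esac
        mv -n -- "$f" "$dir/$stamp-$base"    # -n: never overwrite an existing file
    done
}
```

For example, running date_prefix_rename on a hypothetical folder ~/newsletters would rename "Smalltown Tennis Club Dec 24 news.pdf" to a name starting with that file's last-modified date. The last-modified date is only a proxy for the document's date: a file copied without preserving timestamps (for instance with a plain cp) carries the date it was copied.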

A launcher for occasionally used applications
https://www.datafix.com.au/BASHing2/2025-04-11.html
Fri, 11 Apr 2025
There are no launch icons on my desktop because I prefer using keyboard shortcuts (memorised) to launch frequently used applications and scripts. To start applications that I use rarely, I turn to the Xfce applications menu, which I open with the Super (Windows) key or by right-clicking on the desktop. There I navigate through lists and sub-lists to find the application I want.

For a half-dozen occasionally used applications I have a mouse-less alternative: the script "occapps" (shown here), which is launched from the keyboard. Before writing the script I looked at the many existing application launchers for Linux. Most of them have a search box in which you type the first few letters of the application name. That doesn't seem efficient in my case, because I'm only interested in a few applications that aren't already tied to keyboard shortcuts. The "occapps" script is very simple and easily edited if I need to change an application or command.

Extreme reformatting: a vertical calendar
https://www.datafix.com.au/BASHing2/2025-04-18.html
Fri, 18 Apr 2025
I've used the "cal" program for many years, especially with the "-3" option, which shows the previous, current and next months. I now use a homemade vertical calendar built from "ncal", with the current date highlighted.

How to add trailing spaces and zeroes
https://www.datafix.com.au/BASHing2/2025-04-25.html
Fri, 25 Apr 2025
It's pretty easy to add leading spaces or zeroes with printf, since padding with leading spaces or zeroes is a kind of right-justifying. How about left-justifying? Can you add trailing spaces and zeroes just as easily? Spaces yes, zeroes no.
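The "spaces yes, zeroes no" result follows from a printf rule inherited from C: the "-" flag left-justifies and pads on the right with spaces, and if the "0" flag is also given, it is ignored. The sketch below shows this in bash and adds one possible workaround, which may differ from the method in the post: pad with spaces first, then convert only the padding to the fill character. The helper name pad_right is hypothetical, and the sketch is meant for ASCII (single-byte) input, because printf may count bytes rather than characters when padding.

```bash
#!/usr/bin/env bash
# Right-justifying: the "0" flag pads on the left with zeroes.
printf '[%08d]\n' 42      # [00000042]
# Left-justifying: the "-" flag pads on the right with spaces.
printf '[%-8s]\n' 42      # [42      ]
# Combining "-" and "0" does not give trailing zeroes: "0" is ignored.
printf '[%-08d]\n' 42     # [42      ]

# pad_right VALUE WIDTH CHAR  (hypothetical helper, a sketch only)
# Right-pads VALUE to WIDTH with CHAR. Only the added padding is converted,
# so spaces inside VALUE are left alone. Values already WIDTH characters
# or longer are printed unchanged.
pad_right() {
    local value="$1" width="$2" char="$3" padded fill
    printf -v padded '%-*s' "$width" "$value"   # pad on the right with spaces
    fill="${padded:${#value}}"                   # keep only the added spaces
    printf '%s%s\n' "$value" "${fill// /$char}"  # swap each padding space for CHAR
}

pad_right 42 8 0          # 42000000
pad_right 3.1 6 0         # 3.1000
```

If the value never contains spaces, a pipeline such as printf '%-8s\n' 42 | tr ' ' 0 gives the same result, but tr would also convert any spaces inside the value, which is why the helper converts only the padding.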