
Add leading zeroes that aren't really leading

A leading zero can be a useful addition to a number string, and there are several ways to add one or more leading zeroes on the command line. The addition is a little less straightforward if the leading zero sits inside a non-numeric string. This post deals with a couple of such cases.


Image from here

Fixed-length strings like motor vehicle registration codes often contain leading zeroes. Suppose I want to generate a list of the first 11 codes in the series shown in the image above, namely TUS000 to TUS010. A simple method is to use BASH brace expansion:

for i in {000..10}; do printf "%s\n" "TUS$i"; done


Notice that BASH doesn't need a special option for leading zeroes. I write "000" as the starting number in the expansion, and BASH (version 4.0 or later) takes the hint and pads every number in the sequence to the same 3-character width.
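
As a possible variation on the loop above, the padding can be left to printf instead of to the brace expansion. This is only a sketch: the variable name "codes" is made up for illustration, and it relies on printf reusing its format string for every argument it is given. Because the expansion here is the unpadded {0..10}, no argument has a leading zero that printf's %d could misread as octal.

```shell
# printf applies "TUS%03d\n" once per argument,
# so a single call formats the whole sequence 0..10
codes=$(printf "TUS%03d\n" {0..10})

# Show the 11 codes, TUS000 to TUS010
printf "%s\n" "$codes"
```

Since the zero-padding is done by %03d rather than by the brace expansion, this form should also work in BASH versions older than 4.0.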

The seq command can also do leading zeroes with its -w option. Adding a string before each number is a bit trickier:

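echo -e "TUS$(seq -s "\nTUS" -w 000 10)"


The first "TUS" comes from the echo string itself. "TUS" is added to the second and subsequent numbers by making a newline and "TUS" the separator (-s) between the numbers generated by seq. seq prints the separator as a literal backslash and "n", and the -e option tells echo to turn that escape into a real newline.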

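Another way to get the same list, sketched here on the assumption that GNU seq is available, is the -f option, which takes a printf-style floating-point format. The prefix goes directly in the format string, so no separator trick and no echo -e are needed. The variable name "codes" is only for illustration.

```shell
# %03g pads each number to 3 characters with leading zeroes;
# the literal "TUS" in front is copied into every output line
codes=$(seq -f "TUS%03g" 0 10)

# Show the 11 codes, TUS000 to TUS010
printf "%s\n" "$codes"
```
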
AWK is usually my first choice for reformatting existing strings so that internal numbers have leading zeroes, for example converting YYYY-M(M)-D(D) dates to ISO 8601 format YYYY-MM-DD:


awk -F"\t" 'NR==1 {print} \
NR>1 {split($2,a,"-"); split($3,b,"-"); \
printf("%s\t%s-%02d-%02d\t%s-%02d-%02d\n", \
$1,a[1],a[2],a[3],b[1],b[2],b[3])}' file

From the second line onwards in "file" (NR>1), AWK first splits the date strings on the "-" character and stores each of the pieces in an array ("a" for field 2, "b" for field 3). printf then formats each date string piece by piece, with the month and day pieces formatted as 2-character numbers with a zero in front where needed.
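
To try the command out, here is a small self-contained sketch. The sample data are invented for illustration (a header line, then an ID and two dates per line, tab-separated), and the temporary file stands in for "file". The AWK program is the same as the one above, written without line-continuation backslashes.

```shell
# Hypothetical test data: header, then ID, start date, end date
tmp=$(mktemp)
printf "ID\tStart\tEnd\n" > "$tmp"
printf "a1\t2019-3-7\t2019-11-2\n" >> "$tmp"
printf "a2\t2018-12-25\t2019-1-1\n" >> "$tmp"

# Print the header unchanged; pad month and day in fields 2 and 3
fixed=$(awk -F"\t" 'NR==1 {print}
NR>1 {split($2,a,"-"); split($3,b,"-")
printf("%s\t%s-%02d-%02d\t%s-%02d-%02d\n",
$1,a[1],a[2],a[3],b[1],b[2],b[3])}' "$tmp")

printf "%s\n" "$fixed"
rm -f "$tmp"
```

With these made-up lines, "2019-3-7" becomes "2019-03-07" and "2019-1-1" becomes "2019-01-01", while dates that are already in YYYY-MM-DD form pass through unchanged.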

In the example below, a suitable choice of field separators makes the reformatting job easier:


awk -F"[X:]" '{printf("%sX%02d:%s\n",$1,$2,$3)}' list

The field separators are "X" and ":", which means the model numbers (1, 5, 10) are isolated as field 2 in each line. The field-separating characters "X" and ":" are put back in the line using the printf formatting instruction.
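
The same idea can be tested on a few invented lines. The item names and descriptions below are hypothetical and only stand in for the contents of "list"; the sketch assumes that "X" and ":" each appear exactly once per line, since any extra "X" or ":" would shift the field numbers.

```shell
# Hypothetical lines: name, "X", model number, ":", description
list=$(printf "%s\n" "WidgetX1:blue" "WidgetX5:green" "WidgetX10:red")

# Split on "X" or ":", pad field 2 to 2 digits, put the separators back
padded=$(printf "%s\n" "$list" | awk -F"[X:]" '{printf("%sX%02d:%s\n",$1,$2,$3)}')

printf "%s\n" "$padded"
```

Here "WidgetX1:blue" becomes "WidgetX01:blue" and "WidgetX10:red" is left as it is.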

Last update: 2019-09-13
The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License