

Making pictures with data

A little-appreciated feature of the command-line program ImageMagick is that IM can display data bytes as image bytes.

To demonstrate, I'll build the plain-text file "demo1" with 50 consecutive repetitions of a 36-character string:

for i in {1..50}; do printf "A picture is worth a thousand words."; done > demo1

IMdata1

Each of the 36 characters in that string, including the blanks and the final ".", is a single byte in the ASCII and UTF-8 character encodings. For example, the capital "A" is 01000001, or 41 in hexadecimal notation:

IMdata2
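The byte values can also be checked directly in the shell. This is a minimal sketch, assuming only printf and the coreutils od command; byte_to_binary is a hypothetical helper name used for illustration:

```shell
# Hex value of each byte in "A p" (capital A, blank, lowercase p)
printf 'A p' | od -An -tx1

# byte_to_binary CHAR (hypothetical helper): print the byte value of a
# single ASCII character as 8 binary digits
byte_to_binary() {
    n=$(printf '%d' "'$1")    # numeric code of the character
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        bits="$bits$(( (n >> i) & 1 ))"
        i=$((i - 1))
    done
    printf '%s\n' "$bits"
}

byte_to_binary A
```

The od line prints the hex pairs 41 20 70, and the helper prints 01000001 for "A".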

In the RGB colour system used for digital displays, each of the three colour channels — R, G and B — contains one byte of information when building a pixel. If I assign the hex value 41 to each of the three channels, I get a rather somber gray colour. That's the colour of a capital "A" in a grayscale image, as explained below:

IMdata3

The file "demo1" contains exactly 36 x 50, or 1800 single-byte characters, or 1800 bytes. I'll use the IM command convert -depth 8 -size 36x50 gray:demo1 out1.png to build an image 36x50 pixels, or 1800 pixels in all. The "depth" option says that each of the 3 channels will be specified with 8 bits of information, or 1 byte. "gray:demo1" means that IM will treat "demo1" as raw bytes for pixel-building, and use the same byte 3 times for the R, G and B values. IM will then export the image as "out1.png". I'll have trouble seeing such a tiny image, so I'll pass that file to another IM command to scale the image up 500%, to 180x250 pixels. Finally, I'll display the scaled-up image in an IM window with the "display" command. Here's the full command chain:

convert -depth 8 -size 36x50 gray:demo1 out1.png; convert -scale 500% out1.png 500out1.png; display 500out1.png

and here's the result:

IMdata4

You're seeing 36 vertical lines, one for each character, because the 36 characters are repeated 50 times in a 36x50 frame. The leftmost line is the capital "A", with RGB colour #414141. I can break up the vertical lines by choosing another frame size whose width and height multiply to 1800 pixels, for example 45x40:

IMdata5
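One way to see why the frame width matters, without drawing anything, is to wrap the same bytes as rows of text. This is a rough sketch, assuming the coreutils fold command and a temporary file standing in for "demo1" (built here with no trailing newline, so it holds exactly 1800 bytes):

```shell
# Build 1800 bytes: the 36-character string repeated 50 times
demo=$(mktemp)
for i in $(seq 50); do printf 'A picture is worth a thousand words.'; done > "$demo"

# 36 bytes per row: every row is identical, so each column holds one
# byte value all the way down (the vertical lines in the 36x50 image)
fold -w 36 "$demo" | head -n 3

# 45 bytes per row: each row starts 9 characters further into the
# string, so the columns no longer line up
fold -w 45 "$demo" | head -n 5

# Number of distinct rows at each width
rows36=$(fold -w 36 "$demo" | sort -u | wc -l)
rows45=$(fold -w 45 "$demo" | sort -u | wc -l)
echo "distinct rows: $rows36 at width 36, $rows45 at width 45"
```

At width 36 there is only 1 distinct row; at width 45 there are 4, because the starting point shifts by 9 characters per row and returns to the beginning of the string every fourth row.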

I can do this in colour, too, but this time IM will use a different byte for each RGB channel. The first pixel, then, will be built using "A", "[blank]" and "p", which have hex values 41, 20 and 70. The next pixel will be built with "i", "c" and "t", and so on. Here's the first pixel's colour:

IMdata6
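The grouping of bytes into pixels can be previewed in the shell. The sketch below assumes only od, tr, fold and sed, and prints the first two three-byte groups of the string as hex colour codes:

```shell
# Take the first six bytes of the string (two RGB pixels), print them as
# hex pairs, join the pairs into one run of hex digits, cut the run into
# groups of six digits (three bytes per pixel) and prefix each with "#"
pixels=$(printf 'A pict' | od -An -v -tx1 | tr -d ' \n' | fold -w 6 | sed 's/^/#/')
printf '%s\n' "$pixels"
```

This prints #412070 for "A[blank]p" and #696374 for "ict".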

Because I'm using three characters per pixel, the image will have 1/3 as many pixels as the grayscale one (600 instead of 1800), for example 12x50. "gray" in the command needs to be changed to "rgb", and I'll boost the scale-up, too. In the screenshot below, only the top part of the IM window is shown. Note the "A[blank]p" colour in the leftmost column:

convert -depth 8 -size 12x50 rgb:demo1 out3.png; convert -scale 800% out3.png 500out3.png; display 500out3.png

IMdata7

Here's a prettier version, with the pre-scale-up frame set to 40x15 pixels:

IMdata8

IM will return an error message unless the number of data bytes it's given agrees with the number of pixels. For the "gray" option, that means 1 byte per pixel, and for "rgb", 3 bytes per pixel. When picturing data this way, you may need to add a byte or two. For example, I have a text file "synspe" with a total of 366181 characters, and 366181 isn't evenly divisible by 3. If I add a couple of "a" characters at the end of the file with sed, the new character total is 366183, which is 3 times 122061:

IMdata9
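The padding arithmetic can be scripted. The following is a sketch under stated assumptions, not the sed command shown above: pad_for_rgb is a hypothetical helper, it appends the extra "a" bytes with printf, and the demonstration runs on a small temporary file rather than on "synspe":

```shell
# pad_for_rgb SRC DST (hypothetical helper): copy SRC to DST, then append
# "a" bytes until the size of DST is a multiple of 3, since rgb: needs
# 3 bytes per pixel. Prints the final size in bytes.
pad_for_rgb() {
    src=$1
    dst=$2
    size=$(wc -c < "$src")
    pad=$(( (3 - size % 3) % 3 ))
    cp "$src" "$dst"
    while [ "$pad" -gt 0 ]; do
        printf 'a' >> "$dst"
        pad=$((pad - 1))
    done
    wc -c < "$dst"
}

# The arithmetic for a 366181-byte file: 2 extra bytes are needed
echo $(( (3 - 366181 % 3) % 3 ))

# Small demonstration on a 4-byte temporary file, padded to 6 bytes
tmp=$(mktemp -d)
printf 'abcd' > "$tmp/in"
pad_for_rgb "$tmp/in" "$tmp/out"
```

For the file in this post the call would look like pad_for_rgb synspe synspe-2. Note that the bytes are appended after any final newline in the file.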

The 122061 total pixels can be framed by finding suitable whole-number factors. I've picked 667x183, and please note in the following command that the filename is followed by "[0]". This stops IM from throwing up an error message and building two images instead of one:

convert -depth 8 -size 667x183 rgb:synspe-2[0] out4.png; display out4.png

IMdata10
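Suitable whole-number factors for a given pixel count can be listed with a short loop. This is a minimal sketch in plain shell arithmetic; frame_sizes is a hypothetical helper name:

```shell
# frame_sizes N (hypothetical helper): print every WIDTHxHEIGHT pair of
# whole numbers with WIDTH <= HEIGHT and WIDTH * HEIGHT = N
frame_sizes() {
    n=$1
    w=1
    while [ $((w * w)) -le "$n" ]; do
        if [ $((n % w)) -eq 0 ]; then
            echo "${w}x$((n / w))"
        fi
        w=$((w + 1))
    done
}

frame_sizes 122061
```

The list for 122061 includes 183x667; swapping the two numbers gives the landscape frame 667x183 used in the command above.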

That filename[0] workaround is curious. I understand it's meant to specify the first layer in a multi-layered image, but with my raw bytes as input (and no image format declared in a header) there aren't multiple layers. I get an error message and multiple outputs with some input files but not others of the same byte size. Puzzling...

So, what good is "data imaging"?

In the "demo1" screenshots above, the regularity of the pattern in the data is reflected in the regularity of the image. A dataset with completely random characters shouldn't show any obvious regularities, and data imaging has been used as a visual check for randomness. It's also possible to store text data in an image (as above), but getting the original bytes back from a compressed file format plus a header, like PNG, isn't easy.
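For comparison with the "demo1" images, a file of random bytes can be pictured in the same way. The sketch below assumes /dev/urandom as the source of random bytes and reuses the convert options from earlier in this post; the file names are made up for the example, and the conversion is skipped if convert isn't installed:

```shell
# 1800 random bytes in a temporary directory
workdir=$(mktemp -d)
head -c 1800 /dev/urandom > "$workdir/random1"
wc -c < "$workdir/random1"

# Same conversion and scale-up as for "demo1", if convert is available
if command -v convert >/dev/null 2>&1; then
    convert -depth 8 -size 36x50 gray:"$workdir/random1" "$workdir/random1.png"
    convert -scale 500% "$workdir/random1.png" "$workdir/500random1.png"
fi
```

The scaled-up PNG can then be opened with display, as before. It should show no vertical lines or other obvious pattern.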

"Data imaging", as I see it, is a simple kind of data art, and the IM tricks shown above can be fun to tinker with. Below are variations on the "synspe" text-data image, made with GIMP tools.

IMdata11

Last update: 2019-04-14