Archiving images: TIFF vs PPM

[Image: Lascaux cave painting]

The best way to archive images is to preserve them in an analog format. The Lascaux cave painting shown above is ca. 17,000 years old, and some cave paintings in Borneo may be more than twice that age. Paintings on papyrus and silk can be, and have been, preserved for many centuries.

Preserving either born-digital images or digital representations of analog images isn't as simple as slipping a USB stick into a cave and sealing the entrance. The technology for recovering the digital data on that USB drive may not be around in the future. For this reason, digital archivists see archiving as an endless task. Every X years the archived digital content needs to be transferred to the most durable and most easily readable storage medium of that time.

In other words, digital archiving isn't "store and forget". It's "store and hope that future archivists can pass on the data, without loss or corruption, to archivists in their future".

Archiving images in this way is harder than archiving plain text, because the information content is so much bigger. Even compressed as a JPEG, the image on the left is 42 KB, while the text block on the right, about the same size to our eyes as the image, is only 0.96 KB as characters in UTF-8 encoding.

[Image: JPEG of a cat (left) beside a text block of about the same size (right)]

One method for archiving digital images is to store them as a pixel map in TIFF files. For an excellent overview of what a TIFF file is and how it's structured, see this page at fileformat.info. The command-line tool exifprobe will report the metadata in the TIFF as well as the beginning of the image data proper. Here's part of the very detailed exifprobe output on the uncompressed TIFF version of the 325x325 pixel cat image shown above (the cat's name is Jez):

[Image: part of the exifprobe output for "Jez.tif"]

The TIFF image file "Jez.tif" is 321450 bytes, of which 316875 bytes are image data (325 x 325 pixels at 3 bytes per pixel); the remaining 4575 bytes are mostly readable metadata. Note that you can't easily read the image data as text, as it follows the somewhat complicated structuring specified for the TIFF format.
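
The byte counts above can be checked with a little arithmetic. This is only a sketch: the file size and pixel dimensions come from the figures quoted in this post, and the 3 bytes per pixel is an assumption (8-bit R, G and B samples, no alpha channel, no compression) that happens to agree with the image-data total.

```python
# Figures quoted in the post for the uncompressed "Jez.tif".
width, height = 325, 325
file_size = 321450

# Assumption: one byte each for R, G and B, with no alpha channel.
bytes_per_pixel = 3

image_data = width * height * bytes_per_pixel
overhead = file_size - image_data

print(image_data)  # 316875
print(overhead)    # 4575
```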

An alternative is the PPM format, which has a much smaller header with metadata but a much larger image data section. "Jez.ppm" is 1204560 bytes as an ASCII file, and here's the entire header:

P3
325 325
255

"P3" identifies the file as a PPM ASCII file with RGB color. The width is "325" pixels, the height is "325" pixels and "255" is the maximum color value (i.e., R, G and B values each in the range 0-255).

Following this header is the image data, with each pixel mapped as 3 color values, from the top left pixel to the bottom right pixel. In "Jez.ppm" the top left pixel has the RGB values 74 56 36, the next pixel to the right has 78 60 40, and so on. Keeping to the recommended maximum of 70 characters per line for PPM text, the beginning of "Jez.ppm" could look like this:

P3
325 325
255
74 56 36  78 60 40  78 60 40  75 57 37  74 56 34  73 55 33  71 53 31

but the layout of the header and the RGB values isn't critical for most PPM-reading programs, like ImageMagick:

images4 images5
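
To illustrate why the layout doesn't matter, here is a minimal Python sketch of a whitespace-tolerant ASCII PPM reader, together with a helper that packs pixels into lines of at most 70 characters. It isn't how ImageMagick or any other program actually parses PPM. The function names (format_pixels, parse_ppm) and the tiny 7x1 image are made up for the example; the 7 pixels are the first seven from "Jez.ppm" as given above. The reader doesn't handle "#" comments, which the PPM format also allows.

```python
# First seven pixels of "Jez.ppm" as quoted in the post.
PIXELS = [(74, 56, 36), (78, 60, 40), (78, 60, 40), (75, 57, 37),
          (74, 56, 34), (73, 55, 33), (71, 53, 31)]


def format_pixels(pixels, max_len=70):
    """Pack RGB triples into text lines no longer than max_len characters."""
    lines, current = [], ""
    for r, g, b in pixels:
        triple = f"{r} {g} {b}"
        candidate = triple if not current else current + "  " + triple
        if current and len(candidate) > max_len:
            lines.append(current)   # line is full, start a new one
            current = triple
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def parse_ppm(text):
    """Read ASCII PPM text by splitting on any whitespace (no comment support)."""
    tokens = text.split()
    if tokens[0] != "P3":
        raise ValueError("not an ASCII PPM (expected P3)")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    pixels = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return (width, height, maxval), pixels


# Hypothetical 7x1 image written two ways: tidy lines, and one value per line.
lines = format_pixels(PIXELS)
tidy = "P3\n7 1\n255\n" + "\n".join(lines) + "\n"
messy = "P3 7 1 255\n" + "\n".join(str(v) for p in PIXELS for v in p) + "\n"

print(lines[0])
print(parse_ppm(tidy) == parse_ppm(messy))  # True
```

Because the reader treats any run of spaces or newlines as a separator, both layouts give the same header values and the same list of pixels.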

I wonder which format, TIFF or PPM, is better for the long term (centuries)? I suspect that an ASCII PPM would be easier to decode by a future archivist, but that's unknowable. A digital "Rosetta stone" with TIFF, PPM and other digital representations of the same image would help the archivist understand the different image formats. It would be even more helpful if the "Rosetta stone" image files were accompanied by an analog version printed on a durable material.


Last update: 2024-07-04
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License