For a list of BASHing data 2 blog posts see the index page.
DataMatrix codes and data content
The collection manager at a US museum recently shared her experience with very small stick-on labels. Each label needed a barcode for machine scanning, plus the same code printed as text for eyeball scanning. The code would represent a unique ID, and the scheme needed room for millions of IDs.
Her solution was a 5-character ASCII code, base36 encoding of decimal numbers, and DataMatrix barcoding:
If you save the image above as a local file, you can read the barcode with the CLI tool dmtxread, from the dmtx-utils package:
There are two easy ways on the command line to convert that base36 code to decimal:
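One way that needs nothing beyond bash itself is arithmetic expansion: `$(( base#number ))` accepts any base from 2 to 64, and for bases up to 36 upper- and lowercase letters are interchangeable. A minimal sketch; the function name `base36_to_decimal` and the sample code `K3P9Z` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a base36 code (0-9, A-Z) to decimal
# using bash's base#number syntax inside arithmetic expansion.
base36_to_decimal() {
    local code=$1
    # Reject characters outside 0-9/A-Z/a-z so bash doesn't raise an arithmetic error
    if [[ ! $code =~ ^[0-9A-Za-z]+$ ]]; then
        echo "not a base36 code: $code" >&2
        return 1
    fi
    echo $(( 36#$code ))
}

base36_to_decimal K3P9Z
base36_to_decimal ZZZZZ   # largest 5-character code; prints 60466175
```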
And a simple decimal-to-base36 function was suggested by Nicholas Dunbar:
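function decimal_to_base36(){
    # Digit symbols for base36: 0-9, then A-Z
    local BASE36=($(echo {0..9} {A..Z}))
    local arg1=$1 i
    # With obase above 16, bc prints each base36 digit as a zero-padded decimal
    # number, separated by spaces (100 -> " 02 28"), so map each one to a symbol;
    # 10# stops a leading zero (as in "08") from being read as octal
    for i in $(bc <<< "obase=36; $arg1"); do
        echo -n "${BASE36[$(( 10#$i ))]}"
    done && echo
}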
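The museum's IDs are all 5 characters long, so small numbers need left-padding. As an alternative sketch (not Nicholas Dunbar's method), the conversion can also be done without bc, by repeated division in bash arithmetic, then padded with zeros. The name `to_base36_padded` and its default width of 5 are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper, an alternative to the bc-based function above:
# convert a non-negative decimal integer to base36 by repeated division,
# then left-pad with zeros to a fixed width (default 5 characters).
to_base36_padded() {
    local n=$1 width=${2:-5}
    local digits=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
    local out=""
    if [[ ! $n =~ ^[0-9]+$ ]]; then
        echo "not a non-negative integer: $n" >&2
        return 1
    fi
    n=$(( 10#$n ))   # force base 10 so a leading zero isn't read as octal
    # Each remainder mod 36 picks the next base36 digit, right to left
    while (( n > 0 )); do
        out=${digits:n%36:1}$out
        n=$(( n / 36 ))
    done
    # Pad with '0' up to the width; longer results are left as they are
    while (( ${#out} < width )); do
        out=0$out
    done
    printf '%s\n' "$out"
}

to_base36_padded 100        # prints 0002S
to_base36_padded 60466175   # prints ZZZZZ
```

Numbers above 60,466,175 simply come out longer than 5 characters, which is a cue that the 5-character ID space has run out.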
But getting back to the dmtxread output, what are "data codewords" and "error codewords" in that barcode, and could the barcode be made even smaller if only 5 characters are being coded?
Two excellent resources on DataMatrix barcoding are its Wikipedia page and the GS1 guide. The museum's barcode is 12 rows x 12 columns. The data region in the barcode is the central 10 rows x 10 columns, which (according to the standard for square DataMatrix barcodes) can hold 5 data codewords and 7 error-correcting codewords. Each data or error-correcting codeword is 1 byte.
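Those numbers can be checked against the size of the data region, treating each module (black or white square) as 1 bit and each codeword as 8 bits. A quick arithmetic sketch; `codeword_fit` is a hypothetical helper, and the 10x10 figures (8x8 region, 3 data + 5 error-correcting codewords) come from Table 1-1 of the GS1 guide:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare data-region modules with codeword bits.
# Arguments: symbol size, data-region side length, data codewords, error codewords
codeword_fit() {
    local symbol=$1 region=$2 data=$3 ecc=$4
    local modules=$(( region * region ))   # 1 bit per module
    local bits=$(( (data + ecc) * 8 ))     # 8 bits per codeword
    echo "$symbol: $modules modules, $bits codeword bits, $(( modules - bits )) left over"
}

codeword_fit 12x12 10 5 7
codeword_fit 10x10 8 3 5
```

For 12x12, the 12 codewords use 96 of the 100 data-region modules, leaving 4 that carry no codeword bits; the 64 modules of the 10x10 data region are filled exactly.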
The "encoding" of bytes into black-and-white squares (1 bit each) in the barcode is complicated, and interested readers should check out the relevant section of the Wikipedia article.
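The smallest square DataMatrix barcode is 10x10, with an 8x8 data region that holds only 3 data codewords (plus 5 error-correcting codewords; see Table 1-1 in the GS1 guide). In ASCII encodation each character normally takes one data codeword (the exception is a pair of consecutive digits, which is packed into a single codeword), so the 12x12 barcode chosen by the museum is the smallest square size that can hold an arbitrary 5-character ASCII code.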
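A rough sketch of that codeword counting, under simplifying assumptions: printable ASCII only, plain ASCII encodation (one codeword per character, two consecutive digits sharing one), and the data capacities of the square sizes from 10x10 to 26x26. Real encoders can also switch to other encodation modes, so treat this as an approximation. `ascii_codewords` and `smallest_square` are hypothetical helpers, and `K3P9Z` and `12345` are made-up IDs:

```shell
#!/usr/bin/env bash
# Sketch under simplifying assumptions: count data codewords for printable
# ASCII in ASCII encodation only (no C40/Text/Base 256 modes), then find
# the smallest square symbol with enough data capacity.

# One codeword per character, except two consecutive digits share one
ascii_codewords() {
    local s=$1 i=0 count=0
    while (( i < ${#s} )); do
        if [[ ${s:i:2} =~ ^[0-9][0-9]$ ]]; then
            i=$(( i + 2 ))
        else
            i=$(( i + 1 ))
        fi
        count=$(( count + 1 ))
    done
    echo "$count"
}

# size:data-codeword capacity for square symbols up to 26x26
smallest_square() {
    local need=$1 entry
    for entry in 10:3 12:5 14:8 16:12 18:18 20:22 22:30 24:36 26:44; do
        if (( need <= ${entry#*:} )); then
            echo "${entry%%:*}x${entry%%:*}"
            return 0
        fi
    done
    echo "needs more than 26x26" >&2
    return 1
}

for id in W6x K3P9Z 12345; do
    cw=$(ascii_codewords "$id")
    echo "$id: $cw data codewords, smallest square: $(smallest_square "$cw")"
done
```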
Could you get millions of ID codes out of 3 characters? No. The ASCII set used for DataMatrix barcoding is the one specified in the ISO/IEC 646 standard, whose code page is shown in this Wikipedia article. The page has 94 printable characters, not counting the space (plus the invisible control characters), so the upper limit is 94 x 94 x 94 = 830,584 3-character ID codes. Here's what the 3-character code "W6x" looks like when barcoded as a minimal 10x10 matrix and displayed with feh:
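To check the count of 94, a short sketch that builds the printable US-ASCII (ISO/IEC 646 IRV) characters from codes 33 to 126, leaving out the space (code 32), and multiplies out the 3-character combinations:

```shell
#!/usr/bin/env bash
# Build the printable characters from codes 33-126 (space excluded).
chars=""
for (( c = 33; c <= 126; c++ )); do
    # %b expands a \0nnn octal escape into the character with that code
    chars+=$(printf '%b' "\\0$(printf '%03o' "$c")")
done
echo "$chars"
echo "printable: ${#chars}"
echo "3-character IDs: $(( ${#chars} ** 3 ))"
```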
Restricting 3-character ID codes to the 36 base36 characters (0-9, A-Z) means even fewer possibilities: 36 x 36 x 36 = 46,656. Going to 5 characters raises the number of possible codes to 36 x 36 x 36 x 36 x 36 = 60,466,176.
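The same arithmetic for every code length from 1 to 5 characters, as a one-loop sketch using bash's `**` exponent operator:

```shell
#!/usr/bin/env bash
# Number of distinct base36 codes (0-9, A-Z) of each length: 36^n
for n in 1 2 3 4 5; do
    ids=$(( 36 ** n ))
    printf '%d characters: %d possible IDs\n' "$n" "$ids"
done
```

The 4-character line (1,679,616) already passes a million, but assuming one codeword per character, 4 characters would still need the 12x12 size, since a 10x10 barcode holds only 3 data codewords.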
Last update: 2024-04-19
The blog posts on this website are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.