If you've used Semaphor, you've probably noticed a different kind of optical code used for setting up accounts on mobile devices. It looks kind of like a QR code, but it's made out of the triangular shapes that make up the SpiderOak product branding design. It looks like this:
But how does that work? How does the receiver detect and decode the symbols? What does it actually contain? Let's dive in by first exploring the fundamentals of optical scanning. Then we'll get into all the gritty details of the Semaphor Code.
Before we look at 2D codes, let's talk about how one-dimensional codes work. Take the ubiquitous Universal Product Code, or UPC, which encodes a 12-digit number as a series of light and dark bars. When a UPC is scanned, what's actually happening is that a line is traced across the image and a sequence of light/dark segments is reported.
If we treat the black bars as a high level and white as low, the signal returned as we scan left-to-right across that line will look like this:
At the beginning and end is a marker pattern. It delimits the boundaries of the code and gives a reference point for calculating the pulse resolution. Once you have the boundaries and the resolution, you can turn the signal into a sequence of bits and then into the digits of the UPC. Here's how those regions look:
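The first step of that process can be sketched in a few lines. This is an illustration, not SpiderOak's actual code: it collapses a scan line of light/dark samples into run lengths, which is the representation decoders typically work with rather than raw pixels.

```python
def run_lengths(scan_line):
    """Collapse a sequence of 0/1 samples into (value, length) runs."""
    runs = []
    for sample in scan_line:
        if runs and runs[-1][0] == sample:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([sample, 1])  # start a new run
    return [tuple(run) for run in runs]

# A made-up scan line: quiet zone, a bar-space-bar marker, quiet zone.
signal = [0, 0, 0, 1, 0, 1, 0, 0, 0]
print(run_lengths(signal))  # [(0, 3), (1, 1), (0, 1), (1, 1), (0, 3)]
```

From here, classifying the code is a matter of comparing run lengths against the expected marker and digit patterns.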
The start/end markers are an important feature, but it's equally important that the marker pattern can't occur in the data. If it could, data could be mistaken for a start/end marker and you'd get a garbled result. Also, the 12th digit is actually a check digit used to detect a misread code. There are a lot more interesting details I won't go into, but if you're really interested, Wikipedia has a good explanation.
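The UPC-A check digit itself is simple enough to show: digits in odd positions count triple, and the check digit brings the total to a multiple of ten.

```python
def upc_check_digit(first_eleven):
    """Compute the 12th (check) digit of a UPC-A from the first eleven digits."""
    odd = sum(int(d) for d in first_eleven[0::2])   # positions 1, 3, ..., 11
    even = sum(int(d) for d in first_eleven[1::2])  # positions 2, 4, ..., 10
    return (10 - (odd * 3 + even) % 10) % 10

print(upc_check_digit("03600029145"))  # -> 2, Wikipedia's worked example
```

A scanner recomputes this from the first eleven digits and rejects the read if it doesn't match the twelfth.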
Let's move on to 2D codes and explore a QR code. Here's a QR code that encodes the same 12 digits as the above UPC. It has many of the same features.
The three boxes in the corners are locators, and they're kind of clever. The thing that surprised me when learning about QR (by reading the commonly used and cleverly named Zebra Crossing barcode scanning library) is that 2D codes are built on the fundamentals of 1D scanning. You start by simply scanning line by line down your image. Those square-in-a-square locators let you find their center points by looking for a 1:1:1:3:1:1:1 pattern of alternating black and white in both the horizontal and vertical directions. Once you know where the corners are, you can map a grid and read the data. And just like with UPC, those locator patterns can't appear in the data.
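Here's a hedged sketch of that ratio test (loosely in the spirit of what ZXing does, but not its actual code). Given seven alternating run lengths from a scan line, it checks whether they fit 1:1:1:3:1:1:1 within a tolerance. The full pattern spans nine modules, so the module size is the total width divided by nine.

```python
def looks_like_finder(runs, tolerance=0.5):
    """runs: seven alternating black/white run lengths from one scan line."""
    if len(runs) != 7:
        return False
    module = sum(runs) / 9.0  # 1+1+1+3+1+1+1 = 9 modules total
    expected = (1, 1, 1, 3, 1, 1, 1)
    return all(abs(run - e * module) <= module * tolerance
               for run, e in zip(runs, expected))

print(looks_like_finder([4, 4, 4, 12, 4, 4, 4]))  # exact ratio: True
print(looks_like_finder([4, 4, 4, 4, 4, 4, 4]))   # no wide center: False
```

When the same center point passes this test both horizontally and vertically, you've very likely found a locator.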
QR also has error correction (in the form of Reed-Solomon coding at configurable strength) that, unlike UPC's check digit, can actually recover from damaged data. This is why you see a lot of QR codes with logos in the middle: they exploit this self-correction. And there are a lot more details you can learn about on the QR Code Wikipedia page.
Now that you know what to look for, you might notice some of these same features in the image at the head of this article. Here it is in detail:
The image is a 9x9 grid of cells with three corner locators, an empty cell in the middle, and an empty cell next to the third locator. Each cell is divided corner-to-corner into four triangles. Since any of the four triangles can be on or off, that gives us 16 possible symbols:
The hourglass symbols (13 and 14) are used as locators because they have an interesting property - if you scan two perpendicular lines through the center, the vertical line will be one color and the horizontal line will be the opposite color.
Just like the squares of a QR code, we can use this to determine the location of the corners and thus the dimensions of the grid. Since this hourglass property could be true of other symbol combinations (like any of #5-8 corner-to-corner) we additionally check for an empty space in the center of the image to confirm. This isn't foolproof, so the scanner will try several different options if more than three locators are found. The scan is ultimately confirmed by successful data decoding, which is described below.
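The hourglass test can be sketched like this, under the assumption that each cell's four triangles are stored as booleans in (top, right, bottom, left) order; the real symbol numbering may differ. A locator candidate has its vertical pair one color and its horizontal pair the opposite color.

```python
def is_hourglass(top, right, bottom, left):
    """True if the cell is one of the two hourglass locator symbols."""
    vertical = top and bottom and not right and not left    # symbol 13 or 14
    horizontal = right and left and not top and not bottom  # the rotated twin
    return vertical or horizontal

print(is_hourglass(True, False, True, False))  # vertical hourglass: True
print(is_hourglass(False, True, False, True))  # horizontal hourglass: True
print(is_hourglass(True, True, False, False))  # some other symbol: False
```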
Once we have three locators, we can form a grid and sample the four triangles in each cell in a similar fashion to the QR code.
So how does this array of symbols become bytes? It's not as straightforward as a sequence of bars or squares, and involves a bit of math.
The data region contains 76 of the 81 cells. The data is encoded using only symbols 1-12, to avoid confusion with the locators. Symbol 15 (all black) is unused.
Twelve symbols is an awkward number for encoding binary data: it falls between three bits (eight symbols) and four bits (sixteen symbols). Using only eight of the symbols would waste a third of our symbol space. We know intuitively that those extra four symbols carry some information, but how do we encode a fractional number of bits?
The solution we've used is to combine symbols into "words" that encode something closer to a whole number of bits. Our encoding uses words of four symbols, which gives 12^4 = 20,736 possibilities, a little more than 14 bits (2^14 = 16,384). So rather than getting 12 bits out of 4 symbols, we're getting 14.
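The word packing can be sketched as base-12 arithmetic. The mapping from the twelve data symbols to digits 0-11 here is my assumption for illustration; the real assignment may differ.

```python
def encode_word(value):
    """Turn a 14-bit value into four base-12 digits, most significant first."""
    assert 0 <= value < 2 ** 14  # 16384 <= 12**4, so four digits always fit
    digits = []
    for _ in range(4):
        digits.append(value % 12)
        value //= 12
    return digits[::-1]

def decode_word(digits):
    """Inverse of encode_word: four base-12 digits back to an integer."""
    value = 0
    for d in digits:
        value = value * 12 + d
    return value

print(encode_word(16383))               # [9, 5, 9, 3]
print(decode_word(encode_word(16383)))  # 16383
```

Note that decoding can produce values of 16,384 or more from arbitrary symbol combinations; a valid code never contains such words, so they indicate a misread.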
Why not keep going and use even larger words? Well, we want the word length to divide evenly into 76 symbols, and we want to limit error propagation: one misread or damaged symbol corrupts every bit decoded from its word. If you extended the idea to the entire array of symbols, you'd essentially have arithmetic coding: near-perfect encoding efficiency, but it requires every symbol to be read correctly. That's just not feasible for an optical code.
So our 76 symbols form 19 words, which decode to 266 bits: 33 bytes and two bits. The extra two bits are discarded, then the 33 bytes are run through a Reed-Solomon error correcting code. 12 bytes are used for parity, leaving 21 data bytes.
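The arithmetic above, spelled out:

```python
symbols = 76
word_size = 4
bits_per_word = 14

words = symbols // word_size             # 19 words
total_bits = words * bits_per_word       # 266 bits
total_bytes = total_bits // 8            # 33 bytes...
leftover_bits = total_bits % 8           # ...plus the 2 discarded bits
parity_bytes = 12
data_bytes = total_bytes - parity_bytes  # 21 bytes of payload

print(total_bits, total_bytes, leftover_bits, data_bytes)  # 266 33 2 21
```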
What's actually stored in those bytes? The first byte is the type of code. We've only ever used 0 and 1 - login and peer fingerprint verification, respectively. The login code stores a "rendezvous" code for key sharing. The peer fingerprint is just an account key fingerprint used to verify a peer's keys, way back when Semaphor used to do that.
Oh, and what does the empty cell next to the third locator do? Nothing. It's reserved space intended to indicate a future version of the code.
As you can see, optical encoding of data has unique challenges, but it's not as hard as it seems. It requires only the methodical application of simple pattern recognition and a little bit of information theory. Thanks for reading. :)