Magic floating-point numbers: NaNs

After following Laurie Kirk down a rabbit hole on subnormal numbers in the IEEE 754 float specification, I stumbled upon other interesting properties of floating-point numbers, specifically how NaNs (Not a Number) are represented in binary. After more than 10 years of scientific computing and data science, I thought there was nothing about floats that could surprise me, but oh, was I wrong. Let’s see if I can surprise you. I’ve built the computer-science equivalent of a magic trick to showcase these properties.

The magic trick

The trick works in two stages:

You choose a phrase of your liking. With a special Python function, you convert it into a numpy array of NaNs. It's a normal array, full of normal NaNs. Your phrase is nowhere to be seen.
You send the numpy array to an API endpoint at https://magicfloat.sauerburger.io/unravel. Using advanced magic (knowledge of IEEE 754), I can unravel your secrets by looking at the array of NaNs.

Step one: Enchanting your phrase

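import numpy as np

def enchant(phrase: str) -> np.ndarray:
    # Append three fixed bytes to every UTF-8 byte of the phrase and read
    # each resulting 4-byte group as one float32 (native byte order).
    return np.frombuffer(b"".join([
        bytes([x]) + b"\xff\x80\x7f" for x in phrase.encode("utf-8")
    ]), dtype=np.float32)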

If you call that with "Computers are fun!", you get a numpy array of floats with no sign of the phrase. The message seems to be gone.

>>> box = enchant("Computers are fun!")
>>> box
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan], dtype=float32)

>>> box.shape
(18,)

>>> box.dtype
dtype('float32')
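To double-check the claim that these are ordinary NaNs, a short sketch can ask numpy directly. It repeats `enchant` from above so it runs on its own and, like the original, assumes a little-endian machine. The checks rely only on standard behavior: `np.isnan` flags every element, and, as IEEE 754 requires, a NaN never compares equal to anything, not even itself.

```python
import numpy as np

def enchant(phrase: str) -> np.ndarray:
    # Same function as above: each UTF-8 byte followed by b"\xff\x80\x7f",
    # read as float32 in native (assumed little-endian) byte order.
    return np.frombuffer(b"".join([
        bytes([x]) + b"\xff\x80\x7f" for x in phrase.encode("utf-8")
    ]), dtype=np.float32)

box = enchant("Computers are fun!")

# Every element is reported as NaN ...
all_nan = bool(np.isnan(box).all())
# ... and no element compares equal to itself, which is exactly how NaNs behave.
none_equal = not bool((box == box).any())
print(all_nan, none_equal)  # True True
```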

Step two: Open the magic NaN array

I'm providing an API endpoint at https://magicfloat.sauerburger.io/unravel that takes the raw bytes of the numpy array and responds with your original phrase. The following function does the necessary encoding and request handling.

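import numpy as np
import requests

def unravel(box: np.ndarray) -> str:
    # The endpoint expects the raw bytes of a float32 array.
    if box.dtype != np.float32:
        raise ValueError("Magic box must be float32.")
    # tobytes() copies the array's memory as-is, so every NaN bit pattern is sent unchanged.
    response = requests.post("https://magicfloat.sauerburger.io/unravel", data=box.tobytes(), timeout=10)
    if not response.ok:
        raise RuntimeError("The planets don't seem to align: %s" % response.text)
    return response.text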
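The request body is simply the array's memory: four bytes per float32. The local sketch below, which does not contact the server, checks that `tobytes()` followed by `np.frombuffer` round-trips the exact bit patterns, NaN payloads included, so a receiver can in principle see the same bits you sent. The two-element `box` is a hypothetical example built by hand with the byte layout `enchant` produces for "Hi". Because NaN never equals itself, the comparison is done on the raw bits via `view(np.uint32)` rather than on float values.

```python
import numpy as np

# Hypothetical 2-element box: the bytes enchant() would produce for "Hi".
box = np.frombuffer(b"H\xff\x80\x7fi\xff\x80\x7f", dtype=np.float32)

payload = box.tobytes()                              # what unravel() sends as the POST body
received = np.frombuffer(payload, dtype=np.float32)  # what a receiver could rebuild

# Four bytes per float32.
print(len(payload) == 4 * box.size)  # True
# Compare bit patterns, not float values: NaN != NaN would make a value comparison fail.
same_bits = np.array_equal(box.view(np.uint32), received.view(np.uint32))
print(same_bits)  # True
```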

Continuing the example from above, we get (drum roll):

>>> unravel(box)
'Computers are fun!'

How does it work?

Floating-point numbers are represented using three components:

the sign of the number,
the exponent, applied to base 2, and
the fractional part of the number, the mantissa.
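For a normal float32 (neither zero, subnormal, nor one of the special values), the three components combine as shown below. Here $s$ is the sign bit, $E$ the 8-bit biased exponent, and $M$ the 23-bit mantissa read as an unsigned integer; the bias for float32 is 127, and the leading 1 of the significand is implicit rather than stored.

```latex
x = (-1)^{s} \cdot 2^{E - 127} \cdot \left(1 + \frac{M}{2^{23}}\right), \qquad 1 \le E \le 254
```

The two remaining exponent values are reserved: $E = 0$ for zero and the subnormal numbers, and $E = 255$ (all ones) for the special values in the table below.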

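Read as a single 32-bit word, the fields are arranged as follows, from the most significant bit to the least significant. How the four bytes of that word are ordered in memory depends on the endianness of your platform.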

Sign (1 bit) | Biased exponent (8 bits) | Mantissa (23 bits)
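To see the three fields of a concrete number, one can reinterpret the float's 32 bits as an unsigned integer and mask them out. The sketch below uses only the standard library; `split_float32` is a hypothetical helper name. Packing with the big-endian format `'>f'` and unpacking with `'>I'` keeps the result independent of the platform's endianness.

```python
import struct

def split_float32(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of value rounded to float32."""
    # Reinterpret the 4 bytes of the float32 as one unsigned 32-bit integer.
    # Using '>' for both pack and unpack makes the bit order platform-independent.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa

print(split_float32(1.0))   # (0, 127, 0)
print(split_float32(-2.5))  # (1, 128, 2097152)
```

For example, -2.5 is -1.25 × 2^1, so the biased exponent is 1 + 127 = 128 and the mantissa stores the fraction 0.25 as 0.25 × 2^23 = 2097152.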

A few bit patterns have a special meaning, such as +inf, -inf, and NaN. Whenever the biased exponent is all ones, the number represents one of these three cases, as the following table shows.

Sign	Biased exponent	Mantissa	Special meaning
0	all ones: `1111 1111`	all zeros	`+inf`
1	all ones: `1111 1111`	all zeros	`-inf`
any	all ones: `1111 1111`	at least one bit set	`NaN`
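The table translates directly into a small classifier over a raw 32-bit pattern. In the sketch below, `classify_bits` and `from_bits` are hypothetical helper names; the rules come straight from the table, and each result is cross-checked against Python's own `math.isinf` and `math.isnan` after reinterpreting the bits with `struct`.

```python
import math
import struct

def classify_bits(bits: int) -> str:
    """Classify a float32 bit pattern using the rules from the table."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0xFF:
        return "finite"               # zero, subnormal or normal number
    if mantissa == 0:
        return "-inf" if sign else "+inf"
    return "NaN"                      # any sign, at least one mantissa bit set

def from_bits(bits: int) -> float:
    # Reinterpret the integer as a float32 (widened to a Python float).
    return struct.unpack(">f", struct.pack(">I", bits))[0]

for bits in (0x7F800000, 0xFF800000, 0x7FC00000, 0xFF800001, 0x3F800000):
    label = classify_bits(bits)
    value = from_bits(bits)
    # The table's rules must agree with the platform's own classification.
    assert (label == "NaN") == math.isnan(value)
    assert (label in ("+inf", "-inf")) == math.isinf(value)
    print(f"{bits:#010x}", label)
```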

We observe that +inf and -inf each have exactly one binary representation. NaN, however, has 2^24 - 2 possible binary representations: two choices for the sign times 2^23 - 1 non-zero mantissas. For my little magic trick, I pack one UTF-8-encoded byte into each 32-bit float. I invite you to discover the details yourself.
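For readers who want to check their answer, here is one way to undo the enchantment locally. It is a sketch that assumes the exact layout produced by `enchant` on a little-endian machine, where each phrase byte ends up in the lowest 8 bits of a NaN's mantissa; `local_unravel` is a hypothetical name and not necessarily how the API endpoint is implemented. It also spells out the NaN count from the paragraph above.

```python
import numpy as np

def enchant(phrase: str) -> np.ndarray:
    # Same as above: each UTF-8 byte followed by b"\xff\x80\x7f".
    return np.frombuffer(b"".join([
        bytes([x]) + b"\xff\x80\x7f" for x in phrase.encode("utf-8")
    ]), dtype=np.float32)

def local_unravel(box: np.ndarray) -> str:
    # Reinterpret each float32 as its raw 32-bit pattern (no value conversion).
    bits = box.view(np.uint32)
    # Assumption: the phrase byte lives in the lowest 8 bits of the mantissa.
    payload = (bits & 0xFF).astype(np.uint8)
    return payload.tobytes().decode("utf-8")

# Number of float32 NaN bit patterns: 2 signs x (2**23 - 1) non-zero mantissas.
nan_patterns = 2 * (2**23 - 1)
print(nan_patterns == 2**24 - 2)                     # True
print(local_unravel(enchant("Computers are fun!")))  # Computers are fun!
```

Multi-byte UTF-8 characters also survive, since each of their bytes simply occupies its own NaN.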