After following Laurie Kirk down a rabbit hole on subnormal numbers in the IEEE 754 float specification, I stumbled upon other interesting properties of floating-point numbers, specifically how NaNs (Not a Number) are represented in binary. After more than 10 years of scientific computing and data science, I thought there was nothing about floats that could surprise me, but oh, was I wrong. Let’s see if I can surprise you. I’ve built the computer-science equivalent of a magic trick to showcase these properties.
The magic trick
The trick works in two stages:
- You choose a phrase of your liking. With a special Python function, you can convert it into a numpy array of NaNs. It’s a normal array. It’s normal NaNs. Your phrase is nowhere to be seen.
- You send the numpy array to an API endpoint at https://magicfloat.sauerburger.io/unravel. Using advanced magic (knowledge of IEEE 754), I can unravel your secrets by looking at the array of NaNs.
Step one: Enchanting your phrase
import numpy as np

def enchant(phrase: str) -> np.ndarray:
    return np.frombuffer(b"".join([
        bytes([x]) + b"\xff\x80\x7f" for x in phrase.encode("utf-8")
    ]), dtype=np.float32)
If you call that with "Computers are fun!", you get a numpy array of floats with no sign of the phrase. It seems the message is gone.
>>> box = enchant("Computers are fun!")
>>> box
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan], dtype=float32)
>>> box.shape
(18,)
>>> box.dtype
dtype('float32')
Step two: Open the magic NaN array
I’m providing an API endpoint at https://magicfloat.sauerburger.io/unravel that takes the binary version of the numpy array and responds with your original phrase. The following function does the necessary encoding and request handling.
import requests

def unravel(box: np.ndarray) -> str:
    if box.dtype != np.float32:
        raise ValueError("Magic box must be float32.")
    response = requests.post("https://magicfloat.sauerburger.io/unravel", data=box.tobytes())
    if not response.ok:
        raise RuntimeError("The planets don't seem to align: %s" % response.text)
    return response.text
If we continue the example from above, we get (drum roll):
>>> unravel(box)
'Computers are fun!'
How does it work?
Floating-point numbers are represented using three components:
- the sign of the number,
- the exponent used with base 2, and
- the fractional part of the number, the mantissa.
In memory, they are arranged as follows. The order might be different depending on the endianness of your platform.
Sign (1 bit) | Biased exponent (8 bits) | Mantissa (23 bits) |
---|---|---|
x | 1111 1111 | xxx xxxx xxxx xxxx xxxx xxxx |
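To make the layout concrete, here is a minimal sketch that splits a 32-bit float into the three fields. The helper name `float32_fields` is made up for this illustration. It packs the value in big-endian order so that the sign bit comes first, regardless of the platform's endianness.

```python
import struct

def float32_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of value stored as float32."""
    # Pack as a big-endian float32, then reinterpret the same
    # four bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, mantissa

print(float32_fields(1.0))            # (0, 127, 0)
print(float32_fields(-2.5))           # (1, 128, 2097152)
print(float32_fields(float("inf")))   # (0, 255, 0)
```

For 1.0 the biased exponent is 127, which corresponds to 2^0, and the mantissa is zero. Infinity already shows the all-ones exponent (255).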
A few combinations of bits have a special meaning, such as +inf, -inf, and NaN.
When the exponent is all-ones (as shown in the chart), it represents one of the aforementioned three cases.
Sign | Biased exponent | Mantissa | Special meaning |
---|---|---|---|
0 | all ones: 1111 1111 | all zero | +inf |
1 | all ones: 1111 1111 | all zero | -inf |
any | all ones: 1111 1111 | at least one bit not zero | NaN |
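The rules in the table translate almost directly into code. The following sketch classifies a raw 32-bit pattern and, as a cross-check, reinterprets the same bits as a float. The names `classify` and `as_float` are hypothetical helpers for this illustration only.

```python
import struct

def classify(bits: int) -> str:
    """Classify a 32-bit pattern according to the table of special values."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0xFF:
        return "ordinary number"      # exponent is not all ones
    if mantissa == 0:
        return "-inf" if sign else "+inf"
    return "NaN"                      # all-ones exponent, non-zero mantissa

def as_float(bits: int) -> float:
    """Reinterpret the same 32 bits as a float32 value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

for bits in (0x7F800000, 0xFF800000, 0x7FC00000, 0xFF800001, 0x3F800000):
    print(f"{bits:#010x}  {classify(bits):15}  {as_float(bits)}")
```

Flipping a single mantissa bit of an infinity (0xFF800000 versus 0xFF800001) is enough to turn it into a NaN.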
We observe that +inf and -inf each have a unique binary representation.
However, for NaN, we have 2^24 - 2 possible binary representations: two sign values times 2^23 - 1 non-zero mantissa patterns.
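The count can be checked with a quick calculation: the sign bit is free (2 choices) and the 23-bit mantissa can be anything except all zeros (2^23 - 1 choices). The sketch below also builds a few arbitrarily chosen bit patterns and confirms that numpy treats every one of them as NaN.

```python
import numpy as np

MANTISSA_BITS = 23

# Two possible signs times every non-zero mantissa pattern.
nan_patterns = 2 * (2**MANTISSA_BITS - 1)
print(nan_patterns)  # 16777214

# Four different bit patterns with an all-ones exponent and a
# non-zero mantissa (example values picked for illustration).
patterns = np.array(
    [0x7FC00000, 0x7F800001, 0xFFFFFFFF, 0x7F80FF43], dtype=np.uint32
)
values = patterns.view(np.float32)  # same bytes, read as float32
print(values)
print(np.isnan(values).all())
```

All four print as plain nan, even though their underlying bits differ.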
For my little magic trick, I pack one UTF-8 encoded byte in each 32-bit float number.
I invite you to discover the details yourself.
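If you would rather check your findings offline, here is one way the phrase could be recovered locally. This is a sketch under assumptions: the implementation behind the API endpoint is not shown in this post, `unravel_locally` is a hypothetical name, and the approach relies on a little-endian platform, where the packed byte is the first of the four bytes of each float.

```python
import numpy as np

def enchant(phrase: str) -> np.ndarray:
    # Same construction as in step one of the post.
    return np.frombuffer(b"".join([
        bytes([x]) + b"\xff\x80\x7f" for x in phrase.encode("utf-8")
    ]), dtype=np.float32)

def unravel_locally(box: np.ndarray) -> str:
    """Recover the phrase without the API (assumes little-endian layout)."""
    if box.dtype != np.float32:
        raise ValueError("Magic box must be float32.")
    # Every float occupies 4 bytes; the first byte of each group
    # carries one UTF-8 byte of the original phrase.
    return box.tobytes()[::4].decode("utf-8")

box = enchant("Computers are fun!")
print(np.isnan(box).all())
print(unravel_locally(box))
```

Because the phrase is split byte by byte after UTF-8 encoding, characters that need several bytes simply occupy several NaNs.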