The InterPlanetary File System (IPFS) is an interesting take on a distributed web based on content addressing. In short, it's a peer-to-peer network in which objects are located by their hash value. Public gateways provide easy access via regular client-server HTTP. The system has a bold name, and some people even call it Web 3.0. When I first encountered it, I was a bit overwhelmed by the multitude of hash formats and their usage. This article summarizes and dissects the commonly encountered hash patterns.
For this article, I’m using Python’s string and binary string notation.
The following links provide additional help, in case you get stuck decoding an identifier yourself:
Content IDs
Type 0, starting with Qm: CID v0
Consider the following example:
$ echo "Hello Frank" | ipfs add -q --cid-version 0
QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW
"QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW"
= base58(b"\x12\x20\x2d\xab\x18\x99\x83\x24\xa8\xac...")
           ^   ^   '--------------------------------'
           |   |   = sha2-256(content)
           |   |
           |   *-- Length in bytes of following hash
           *------ Hash identifier: 0x12 = sha2-256
For CID v0, sha2-256 is the only allowed hash function.
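To check this layout yourself, here is a minimal Python sketch that uses only the standard library. The helpers `b58decode` and `dissect_cidv0` are hypothetical names written for this article, not part of any IPFS library; the decoder implements base58btc (the Bitcoin alphabet) by hand. Since both 0x12 and 0x20 are below 0x80, each fits into a single varint byte, so the sketch simply reads them as single bytes.

```python
# base58btc alphabet (Bitcoin): no 0, O, I or l
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58btc string into bytes."""
    value = 0
    for char in text:
        value = value * 58 + B58_ALPHABET.index(char)  # ValueError on invalid chars
    body = value.to_bytes((value.bit_length() + 7) // 8, "big")
    # Every leading '1' stands for one leading zero byte
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def dissect_cidv0(cid: str) -> tuple[int, int, bytes]:
    """Split a CIDv0 into (hash identifier, digest length, digest)."""
    multihash = b58decode(cid)
    # 0x12 and 0x20 are < 0x80, so each one is a single varint byte
    code, length, digest = multihash[0], multihash[1], multihash[2:]
    if code != 0x12 or length != len(digest):
        raise ValueError("not a sha2-256 multihash")
    return code, length, digest


code, length, digest = dissect_cidv0("QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW")
print(hex(code), length, digest.hex())
```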
The content here is not the string Hello Frank (plus the trailing newline added by echo), but the Protocol Buffer encoded node of the Directed Acyclic Graph (i.e. the Merkle tree) of the content, whose Data field in turn holds a Protocol Buffer encoded UnixFS node. In this case it's:
content
  = b"\x0a\x12\x08\x02\x12\x0c\x48\x65\x6c\x6c\x6f\x20\x46\x72\x61\x6e\x6b\x0a\x18\x0c"
  = PBNode(Data=b"\x08\x02\x12\x0cHello Frank\n\x18\x0c")
  = PBNode(
        Data=unixfs(
            Data=b"Hello Frank\n",
            Type=File,
            filesize=12
        )
    )
In this example, the functions PBNode and unixfs denote the Protocol Buffer serialization. For larger files, or for directories, the Merkle tree is significantly more complex.
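The PBNode and unixfs serializations above can be reproduced with a few lines of hand-written Protocol Buffer encoding. The following Python sketch assumes a file small enough to fit into a single chunk and a node without links. The field numbers are read off the bytes above (0x0a is field 1 of PBNode as bytes; 0x08, 0x12 and 0x18 are UnixFS fields 1, 2 and 3), File = 2 comes from the byte 0x02, and all helper names are hypothetical.

```python
def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned (LEB128) varint."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_varint(field: int, value: int) -> bytes:
    # Wire type 0 (varint): tag = field_number << 3 | 0
    return uvarint(field << 3) + uvarint(value)


def field_bytes(field: int, value: bytes) -> bytes:
    # Wire type 2 (length-delimited): tag = field_number << 3 | 2
    return uvarint(field << 3 | 2) + uvarint(len(value)) + value


UNIXFS_FILE = 2  # DataType value seen as byte 0x02 above


def unixfs_file(data: bytes) -> bytes:
    """UnixFS message: Type (field 1), Data (field 2), filesize (field 3)."""
    return field_varint(1, UNIXFS_FILE) + field_bytes(2, data) + field_varint(3, len(data))


def pbnode(data: bytes) -> bytes:
    """dag-pb PBNode with only Data (field 1) set and no links."""
    return field_bytes(1, data)


content = pbnode(unixfs_file(b"Hello Frank\n"))
print(content.hex())
```

With links present, the encoding is more involved; this sketch only covers the leaf case shown in the example.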
The ipfs command line tool allows you to extract the unhashed, binary content with:
ipfs dag get --output-codec=dag-pb QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW
A CIDv0 always starts with Qm and is 46 characters long.
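Going the other way, the following Python sketch hashes the dag-pb bytes from above, prepends the multihash header 0x12 0x20 and base58btc-encodes the result. Because every CIDv0 is a 34-byte multihash with the same two leading bytes, the encoded string always starts with Qm and has 46 characters. The encoder is hand-written with the standard library and the function names are hypothetical; comparing the printed CID with the ipfs add output is a check to run on your own machine.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc; leading zero bytes become '1'."""
    value = int.from_bytes(data, "big")
    chars = []
    while value:
        value, rem = divmod(value, 58)
        chars.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def cidv0(content: bytes) -> str:
    # Multihash: 0x12 (sha2-256), 0x20 (32-byte digest), then the digest
    multihash = b"\x12\x20" + hashlib.sha256(content).digest()
    return b58encode(multihash)


# The dag-pb/UnixFS bytes of "Hello Frank\n" shown above
content = b"\x0a\x12\x08\x02\x12\x0c\x48\x65\x6c\x6c\x6f\x20\x46\x72\x61\x6e\x6b\x0a\x18\x0c"
cid = cidv0(content)
print(cid, cid == "QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW")
```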
Type 1, starting with bafk: CID v1
$ echo "Hello Frank" | ipfs add -q --cid-version 1
bafkreiedi665akdjnucmzn4562yfdgducj3a2at4uryksgvmykfwponjnu
"bafkreiedi665akdjnucmzn4562yfdgducj3a2at4uryksgvmykfwponjnu"
= multibase("base32", b"\x01\x55\x12\x20\x83\x47\xbd\xd0\x28\x69\x6d\x04...")
= "b" + base32(b"\x01\x55\x12\x20\x83\x47\xbd\xd0\x28\x69\x6d\x04...")
   ^             ^   ^   ^   ^   '--------------------------------'
   |             |   |   |   |   = sha2-256(b"Hello Frank\n")
   |             |   |   |   |
   |             |   |   |   *-- Length in bytes of following hash
   |             |   |   *------ Hash identifier: 0x12 = sha2-256
   |             |   *---------- Content codec (multicodec): 0x55 = raw IPLD data
   |             *-------------- CID version: 1
   *---------------------------- Multibase prefix signifying base32 encoding
In this case, the format of the CID is a bit more complex, but the content passed to the hash function is just our data string b"Hello Frank\n".
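A short Python sketch can take such a CID apart: strip the multibase prefix b, restore the RFC 4648 padding that multibase omits, decode, and read the four varints. It uses only the standard library; `read_uvarint` and `dissect_cidv1_base32` are hypothetical helper names, and the sketch handles only the lower-case base32 (b) multibase.

```python
import base64


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned varint at pos and return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def dissect_cidv1_base32(cid: str) -> dict:
    if cid[0] != "b":
        raise ValueError("expected multibase prefix 'b' (lower-case base32)")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # multibase base32 omits the '=' padding
    raw = base64.b32decode(body)
    version, pos = read_uvarint(raw, 0)
    codec, pos = read_uvarint(raw, pos)      # 0x55 = raw
    hash_code, pos = read_uvarint(raw, pos)  # 0x12 = sha2-256
    length, pos = read_uvarint(raw, pos)
    return {"version": version, "codec": codec, "hash": hash_code,
            "length": length, "digest": raw[pos:pos + length]}


parts = dissect_cidv1_base32("bafkreiedi665akdjnucmzn4562yfdgducj3a2at4uryksgvmykfwponjnu")
print(parts["version"], hex(parts["codec"]), hex(parts["hash"]), parts["length"])
print(parts["digest"].hex())
```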
Have a look at alternative multibase prefixes and alternative multicodec identifiers.
Multibase base32 encoded CID v1 with raw IPLD data always starts with the prefix bafk.
Multibase base32 encoded CID v1 with raw IPLD data hashed with sha2-256 always starts with the prefix bafkrei; the eighth character already depends on the digest.
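These prefixes follow from the header bytes: 0x01 0x55 0x12 0x20 occupy the first 32 bits, which base32 turns into the six characters afkrei (30 bits) plus two bits of the seventh character, which already mixes in the digest. The Python sketch below builds such a CID for arbitrary data; it assumes the whole file is stored as a single raw block, as in the example above, and the function name is hypothetical.

```python
import base64
import hashlib


def cidv1_raw_sha256(data: bytes) -> str:
    """CIDv1 for a raw block: version 1, codec 0x55, sha2-256 multihash."""
    binary = b"\x01\x55\x12\x20" + hashlib.sha256(data).digest()
    # Multibase 'b' = lower-case RFC 4648 base32 without '=' padding
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


cid = cidv1_raw_sha256(b"Hello Frank\n")
print(cid)
```

Since 36 bytes encode to 58 base32 characters, every such CID is 59 characters long including the b.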
Peer IDs
All peer IDs are derived from the node’s public key.
Peer ID starting with Qm
"QmUEMvxS2e7iDrereVYc5SWPauXPyNwxcy9BXZrC1QTcHE"
= base58(b"\x12\x20\x57\x89\xaa\x4d\xcc\x6a\x9f\xd0...")
           ^   ^   '--------------------------------'
           |   |   = sha2-256(crypto_pb(public key))
           |   |
           |   *-- Length in bytes of following hash
           *------ Hash identifier: 0x12 = sha2-256
The crypto_pb function denotes the Protocol Buffer serialization of the raw public key together with its key algorithm identifier (RSA, Ed25519, …).
The entry in the Distributed Hash Table (DHT) of the Kademlia algorithm is given by
sha2-256(b"\x12\x20\x57\x89\xaa\x4d\xcc\x6a\x9f\xd0...")
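As a sketch of this derivation, here is a minimal standard-library Python version. It assumes the libp2p rule that a serialized key longer than 42 bytes (such as an RSA key) is hashed with sha2-256, while shorter ones are inlined. The key type values RSA = 0 and Ed25519 = 1 are those of libp2p's PublicKey message; the RSA key bytes are a placeholder, and all function names are hypothetical.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
KEY_TYPE_RSA = 0
KEY_TYPE_ED25519 = 1


def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by Protocol Buffers."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def b58encode(data: bytes) -> str:
    value = int.from_bytes(data, "big")
    chars = []
    while value:
        value, rem = divmod(value, 58)
        chars.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def crypto_pb(key_type: int, key_data: bytes) -> bytes:
    """PublicKey message: Type (field 1, varint), Data (field 2, bytes).

    Type is a required field, so it is written even when it is 0 (RSA).
    """
    return b"\x08" + uvarint(key_type) + b"\x12" + uvarint(len(key_data)) + key_data


def sha256_peer_id(public_key_pb: bytes) -> tuple[str, bytes]:
    """Return the base58 peer ID and its binary multihash (0x12 0x20 + digest)."""
    multihash = b"\x12\x20" + hashlib.sha256(public_key_pb).digest()
    return b58encode(multihash), multihash


def dht_key(peer_id_bytes: bytes) -> bytes:
    """Position in the Kademlia keyspace: sha2-256 over the binary peer ID."""
    return hashlib.sha256(peer_id_bytes).digest()


# Placeholder bytes standing in for an RSA public key (not a real key)
fake_rsa_key = bytes(range(256)) + bytes(14)
peer_id, multihash = sha256_peer_id(crypto_pb(KEY_TYPE_RSA, fake_rsa_key))
print(peer_id, dht_key(multihash).hex())
```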
Peer ID starting with 12D3KooW...
"12D3KooWRpLBJ9qhpztw7BsNHHSJYykk6KLU98XXTW2RH5VahVPk"
= base58(b"\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
           ^   ^   '----------------------------------------'
           |   |   = crypto_pb(public key)
           |   |
           |   *-- Length of payload: 0x24 = 36 bytes
           *------ Hash identifier: 0x00 = identity (payload is not hashed)
The crypto_pb function denotes the Protocol Buffer serialization of the raw public key together with its key algorithm identifier (RSA, Ed25519, …).
The entry in the Distributed Hash Table (DHT) of the Kademlia algorithm is given by the hash of the complete binary peer ID:
sha2-256(b"\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
Every peer ID that inlines a 256-bit Ed25519 public key (identity multihash) starts with 12D3KooW.
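Because the key is inlined rather than hashed, this kind of peer ID can be computed directly from the raw 32-byte Ed25519 public key. The Python sketch below (standard library only, hypothetical helper `ed25519_peer_id`, placeholder key bytes) shows why the prefix never changes: the six header bytes 0x00 0x24 0x08 0x01 0x12 0x20 are identical for every such key, and the leading 0x00 becomes the character 1 in base58btc.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc; each leading zero byte becomes '1'."""
    value = int.from_bytes(data, "big")
    chars = []
    while value:
        value, rem = divmod(value, 58)
        chars.append(B58_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def ed25519_peer_id(raw_public_key: bytes) -> tuple[str, bytes]:
    """Return the base58 peer ID and its binary form for a raw Ed25519 key."""
    if len(raw_public_key) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes long")
    # crypto_pb: Type = Ed25519 (0x08 0x01), Data = 32 bytes (0x12 0x20)
    public_key_pb = b"\x08\x01\x12\x20" + raw_public_key
    # Identity multihash: code 0x00, length 0x24 (36 < 0x80, one varint byte)
    multihash = b"\x00" + bytes([len(public_key_pb)]) + public_key_pb
    return b58encode(multihash), multihash


# Placeholder key bytes for illustration, not a real Ed25519 key
peer_id, multihash = ed25519_peer_id(bytes(range(32)))
print(peer_id)
```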
IPNS
Now let’s look at the last type of identifier, the encoded public keys used for IPNS.
"k51qzi5uqu5dm3vyitxyk83l6xzeu13y15329dsteml9ddloi9rffof2upca1z"
= multibase("base36", b"\x01\x72\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
= "k" + base36(b"\x01\x72\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
   ^             ^   ^   '------------------------------------------------'
   |             |   |   = identity multihash of crypto_pb(public key)
   |             |   |
   |             |   *-- Content codec (multicodec): 0x72 = libp2p-key
   |             *------ CID version: 1
   *-------------------- Multibase prefix signifying base36 encoding
b"\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e..."
  ^   ^   '----------------------------------------'
  |   |   = crypto_pb(public key)
  |   |
  |   *-- Length of payload: 0x24 = 36 bytes
  *------ Hash identifier: 0x00 = identity (payload is not hashed)
The crypto_pb function denotes the Protocol Buffer serialization of the raw public key together with its key algorithm identifier (RSA, Ed25519, …).
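The base36 name can be derived the same way as the peer ID, by wrapping the identity multihash in a CIDv1 with the libp2p-key codec. The Python sketch below also decodes the example name to expose its header bytes. The helper names are hypothetical, base36 is implemented by hand as a plain big-integer conversion with the alphabet 0-9a-z, and the key bytes are a placeholder.

```python
B36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def b36encode(data: bytes) -> str:
    """Lower-case base36 of a big-endian integer; leading zero bytes become '0'."""
    value = int.from_bytes(data, "big")
    chars = []
    while value:
        value, rem = divmod(value, 36)
        chars.append(B36_ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "0" * leading_zeros + "".join(reversed(chars))


def b36decode(text: str) -> bytes:
    value = 0
    for char in text:
        value = value * 36 + B36_ALPHABET.index(char)
    body = value.to_bytes((value.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("0"))
    return b"\x00" * leading_zeros + body


def ipns_name(raw_ed25519_key: bytes) -> str:
    """Base36 CIDv1 (codec libp2p-key) for a raw 32-byte Ed25519 public key."""
    public_key_pb = b"\x08\x01\x12\x20" + raw_ed25519_key  # crypto_pb
    multihash = b"\x00\x24" + public_key_pb  # identity multihash, 36-byte payload
    cid = b"\x01\x72" + multihash  # CID version 1, codec 0x72 = libp2p-key
    return "k" + b36encode(cid)


# Dissect the example: strip the multibase prefix 'k', then decode base36
decoded = b36decode("k51qzi5uqu5dm3vyitxyk83l6xzeu13y15329dsteml9ddloi9rffof2upca1z"[1:])
print(decoded[:8].hex(), decoded[8:].hex())

# Placeholder key bytes for illustration, not a real Ed25519 key
print(ipns_name(bytes(range(32))))
```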