The InterPlanetary File System (IPFS) is an interesting take on a distributed web based on content addressing. In short, it is a peer-to-peer network in which objects are located by their hash value. Public gateways provide easy access from regular client-server HTTP. The system has a bold name, and some people even call it Web 3.0. When I first encountered it, I was a bit overwhelmed by the multitude of hash formats and their uses. This article dissects the most commonly encountered hash patterns.

Throughout this article, I use Python's notation for strings ("...") and byte strings (b"...").

The following links provide additional help, in case you get stuck decoding an identifier yourself:

Content IDs

Type 0, starting with Qm: CID v0

Consider the following example:

$ echo "Hello Frank" | ipfs add -q --cid-version 0
QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW
  "QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW"
= base58(b"\x12\x20\x2d\xab\x18\x99\x83\x24\xa8\xac...")
             ^   ^  '--------------------------------'
             |   |         = sha2-256(content)
             |   |
             |   *-- Length in bytes of following hash
             *------ Hash identifier: 0x12 = sha2-256

For CID v0, sha2-256 is the only allowed hash function.
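To make the layout concrete, here is a minimal sketch that decodes the CID above and splits it into its multihash fields. Python's standard library has no Base58 codec, so the decoder is written by hand; the names `b58decode` and `split_multihash` are my own, and the parsing assumes that hash code and length each fit in a single byte, which holds for sha2-256.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58decode(text: str) -> bytes:
    """Decode a Base58 (Bitcoin alphabet) string into bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading "1" stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body

def split_multihash(raw: bytes):
    """Split a multihash into (hash code, digest length, digest)."""
    # Assumption: single-byte code and length, true for sha2-256.
    code, length, digest = raw[0], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    return code, length, digest

code, length, digest = split_multihash(
    b58decode("QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW"))
print(hex(code), length, digest.hex())
```

The 46 characters decode to 34 bytes: two header bytes followed by the 32-byte digest.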

The content here is not the string Hello Frank itself, but the Protocol Buffer encoded node of the Directed Acyclic Graph (DAG), i.e. the Merkle tree, whose data field holds the Protocol Buffer encoded UnixFS node of the file. In this case it's:

  content
= b"\x0a\x12\x08\x02\x12\x0c\x48\x65\x6c\x6c\x6f\x20\x46\x72\x61\x6e\x6b\x0a\x18\x0c"
= PBNode(Data=b"\x08\x02\x12\x0cHello Frank\n\x18\x0c")
= PBNode(
    Data=unixfs(
      Data=b"Hello Frank\n",
      Type=File,
      filesize=12
    )
  )

In this example, the functions PBNode(...) and unixfs(...) denote the Protocol Buffer serialization. For larger files or directories, the Merkle tree is significantly more complex.
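For this single-chunk example, the serialization can be reproduced by hand with a few lines of Protocol Buffer wire encoding. This is only a sketch: the helper names are hypothetical, and the field numbers (PBNode Data = 1; UnixFS Type = 1, Data = 2, filesize = 3, with File = 2) are simply read off the bytes shown above.

```python
def varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protocol Buffer varint."""
    out = bytearray()
    while True:
        low7, value = value & 0x7F, value >> 7
        # Set the continuation bit while more 7-bit groups follow.
        out.append(low7 | (0x80 if value else 0))
        if not value:
            return bytes(out)

def varint_field(number: int, value: int) -> bytes:
    """Encode a field with wire type 0 (varint)."""
    return varint(number << 3 | 0) + varint(value)

def bytes_field(number: int, payload: bytes) -> bytes:
    """Encode a field with wire type 2 (length-delimited)."""
    return varint(number << 3 | 2) + varint(len(payload)) + payload

def unixfs_file(data: bytes) -> bytes:
    """UnixFS node for a small file: Type, Data and filesize."""
    file_type = 2  # "File", as read off the example bytes
    return (varint_field(1, file_type)
            + bytes_field(2, data)
            + varint_field(3, len(data)))

def pb_node(data: bytes) -> bytes:
    """DAG node without links: only the Data field is set."""
    return bytes_field(1, data)

content = pb_node(unixfs_file(b"Hello Frank\n"))
print(content.hex())
```

The result is the 20-byte string listed above, which is what gets passed to sha2-256.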

The ipfs command line tool allows you to extract the unhashed, binary content with

ipfs dag get --output-codec=dag-pb QmRQvNf2KugVgH7EpX4WviAvaWeQ3pRLGD4ibXmZTk2tGW

CID v0 identifiers always start with Qm and are 46 characters long.

Type 1, starting with bafk: CID v1

$ echo "Hello Frank" | ipfs add -q --cid-version 1
bafkreiedi665akdjnucmzn4562yfdgducj3a2at4uryksgvmykfwponjnu
  "bafkreiedi665akdjnucmzn4562yfdgducj3a2at4uryksgvmykfwponjnu"
= multibase("base32", b"\x01\x55\x12\x20\x83\x47\xbd\xd0\x28\x69\x6d\x04...")
= "b" + base32(b"\x01\x55\x12\x20\x83\x47\xbd\xd0\x28\x69\x6d\x04...")
   ^               ^   ^   ^   ^  '--------------------------------'
   |               |   |   |   |     = sha2-256(b"Hello Frank\n")
   |               |   |   |   |
   |               |   |   |   *-- Length in bytes of following hash
   |               |   |   *------ Hash identifier: 0x12 = sha2-256
   |               |   *---------- Content type (multicodec): 0x55 = raw IPLD data
   |               *-------------- CID version: 1
   *------------------------------ Multibase prefix signifying base32 encoding

In this case, the format of the CID is a bit more complex, but the content passed to the hash function is just our data string b"Hello Frank\n".
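As a rough sketch of this layout, the following snippet builds a CID v1 for raw data and takes an existing one apart again, using only the standard library. The function names are my own, and the decoder assumes that every header field fits in one byte, so it does not handle multi-byte varints or multibase prefixes other than b.

```python
import base64
import hashlib

def cid_v1_raw(data: bytes) -> str:
    """Build a base32 CID v1 for raw data hashed with sha2-256."""
    digest = hashlib.sha256(data).digest()
    # CID version 1, codec 0x55 (raw), hash 0x12 (sha2-256), digest length
    binary = bytes([0x01, 0x55, 0x12, len(digest)]) + digest
    # Multibase "b" means lowercase base32 without padding.
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")

def decode_cid_v1(cid: str):
    """Split a base32 CID v1 into (version, codec, hash code, digest)."""
    if not cid.startswith("b"):
        raise ValueError("only multibase base32 ('b') is handled here")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the stripped padding
    binary = base64.b32decode(body)
    # Assumption: each header field is a single byte.
    version, codec, hash_code, length = binary[:4]
    digest = binary[4:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    return version, codec, hash_code, digest

cid = cid_v1_raw(b"Hello Frank\n")
print(cid)
print(decode_cid_v1(
    "bafkreiedi665akdjnucmzn4562yfdgducj3a2at4uryksgvmykfwponjnu"))
```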

Have a look at alternative multibase prefixes and alternative multicodec identifiers.

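Multibase base32 encoded CID v1 identifiers with raw IPLD data always start with bafk.

Multibase base32 encoded CID v1 identifiers with raw IPLD data hashed with sha2-256 always start with bafkrei. The four header bytes 01 55 12 20 fix these characters; the next character depends on the first bits of the hash and lies between a and h.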

Peer IDs

All peer IDs are derived from the node’s public key.

Peer ID starting with Qm

  "QmUEMvxS2e7iDrereVYc5SWPauXPyNwxcy9BXZrC1QTcHE"
= base58(b"\x12\x20\x57\x89\xaa\x4d\xcc\x6a\x9f\xd0...")
             ^   ^  '--------------------------------'
             |   |       = sha2-256(crypto_pb(public key))
             |   |
             |   *-- Length in bytes of following hash
             *------ Hash identifier: 0x12 = sha2-256

Here, crypto_pb denotes the Protocol Buffer serialization that encodes the raw public key together with an identifier of the key algorithm (RSA, Ed25519, …).

The key of the entry in the Kademlia Distributed Hash Table (DHT) is given by

sha2-256(b"\x12\x20\x57\x89\xaa\x4d\xcc\x6a\x9f\xd0...")
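The formula above can be sketched in a few lines: decode the peer ID from Base58 and hash the resulting multihash bytes once more. This merely follows the formula as stated and is not taken from any libp2p implementation; `b58decode` and `dht_key` are names I made up for illustration.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58decode(text: str) -> bytes:
    """Decode a Base58 (Bitcoin alphabet) string into bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading "1" stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body

def dht_key(peer_id: str) -> bytes:
    """sha2-256 over the binary peer ID, header bytes included."""
    return hashlib.sha256(b58decode(peer_id)).digest()

key = dht_key("QmUEMvxS2e7iDrereVYc5SWPauXPyNwxcy9BXZrC1QTcHE")
print(key.hex())
```

Note that the public key ends up hashed twice here: once to form the peer ID and once more to form the DHT key.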

Peer ID starting with 12D3KooW...

  "12D3KooWRpLBJ9qhpztw7BsNHHSJYykk6KLU98XXTW2RH5VahVPk"
= base58(b"\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
             ^   ^  '----------------------------------------'
             |   |         = crypto_pb(public key)
             |   |
             |   *-- Length of payload: 0x24 = 36 bytes
             *------ Hash identifier: 0x00 = identity, i.e. the payload is stored unhashed

Here, crypto_pb denotes the Protocol Buffer serialization that encodes the raw public key together with an identifier of the key algorithm (RSA, Ed25519, …).

The key of the entry in the Kademlia Distributed Hash Table (DHT) is given by

sha2-256(b"\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")

Every peer ID that embeds a 256-bit Ed25519 public key with this inline encoding starts with 12D3KooW.
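Because nothing is hashed, the public key can be read straight out of such a peer ID. The sketch below shows the round trip under a few assumptions: the helper names are hypothetical, the Base58 codec is hand-written, and the four-byte header 08 01 12 20 (key type 1 = Ed25519, followed by 32 bytes of key data) is taken from the bytes shown above.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(raw: bytes) -> str:
    """Encode bytes as Base58; leading zero bytes become leading '1's."""
    number = int.from_bytes(raw, "big")
    chars = []
    while number:
        number, remainder = divmod(number, 58)
        chars.append(B58_ALPHABET[remainder])
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))

def b58decode(text: str) -> bytes:
    """Decode a Base58 string into bytes."""
    number = 0
    for char in text:
        number = number * 58 + B58_ALPHABET.index(char)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * (len(text) - len(text.lstrip("1"))) + body

# crypto_pb header for Ed25519, as read off the example bytes.
ED25519_PB_HEADER = b"\x08\x01\x12\x20"

def peer_id_from_ed25519(public_key: bytes) -> str:
    """Wrap a 32-byte key in crypto_pb and an identity multihash."""
    if len(public_key) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes long")
    payload = ED25519_PB_HEADER + public_key
    return b58encode(b"\x00" + bytes([len(payload)]) + payload)

def ed25519_from_peer_id(peer_id: str) -> bytes:
    """Extract the raw public key from an inlined Ed25519 peer ID."""
    raw = b58decode(peer_id)
    if raw[:2] != b"\x00\x24" or raw[2:6] != ED25519_PB_HEADER:
        raise ValueError("not an inlined Ed25519 peer ID")
    return raw[6:]

public_key = ed25519_from_peer_id(
    "12D3KooWRpLBJ9qhpztw7BsNHHSJYykk6KLU98XXTW2RH5VahVPk")
print(public_key.hex())
```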

IPNS

Now let’s look at the last type of identifier, the encoded public keys used for IPNS.

  "k51qzi5uqu5dm3vyitxyk83l6xzeu13y15329dsteml9ddloi9rffof2upca1z"
= multibase("base36", b"\x01\x72\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
= "k" + base36(b"\x01\x72\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e...")
   ^               ^   ^  '------------------------------------------------'
   |               |   |       = multihash("identity", crypto_pb(public key))
   |               |   |
   |               |   *---------- Content type (multicodec): 0x72 = libp2p-key
   |               *-------------- CID version: 1
   *------------------------------ Multibase prefix signifying base36 encoding

The remaining bytes are the same multihash as in the 12D3KooW... peer ID:

b"\x00\x24\x08\x01\x12\x20\xed\xb8\xbd\x7c\x46\x2e..."
    ^   ^  '----------------------------------------'
    |   |           = crypto_pb(public key)
    |   |
    |   *-- Length of encoded data: 0x24 = 36 bytes
    *------ Hash identifier: 0x00 = identity, i.e. the payload is stored unhashed

Here, crypto_pb denotes the Protocol Buffer serialization that encodes the raw public key together with an identifier of the key algorithm (RSA, Ed25519, …).
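The relationship between an IPNS key and a peer ID can be sketched as follows: strip the multibase prefix, decode the base36 digits, drop the CID version and codec, and re-encode the remaining multihash as Base58. The function names are my own, and the code assumes that version and codec each occupy one byte, as they do in this example.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(raw: bytes) -> str:
    """Encode bytes as Base58; leading zero bytes become leading '1's."""
    number = int.from_bytes(raw, "big")
    chars = []
    while number:
        number, remainder = divmod(number, 58)
        chars.append(B58_ALPHABET[remainder])
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))

def split_ipns_key(name: str):
    """Split a base36 IPNS key into (CID version, codec, multihash)."""
    if not name.startswith("k"):
        raise ValueError("only multibase base36 ('k') is handled here")
    # int() understands the base36 digits 0-9 and a-z directly.
    number = int(name[1:], 36)
    binary = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Assumption: version and codec are single bytes; a CID v1 starts
    # with 0x01, so no leading zero bytes are lost in the conversion.
    return binary[0], binary[1], binary[2:]

version, codec, multihash = split_ipns_key(
    "k51qzi5uqu5dm3vyitxyk83l6xzeu13y15329dsteml9ddloi9rffof2upca1z")
peer_id = b58encode(multihash)  # same bytes, written as a Base58 peer ID
print(version, hex(codec), peer_id)
```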