ROOT.TString as Python dict key

Python dictionaries give access to their items via keys. If you store two values under the same key, the later one overwrites the earlier one. For a research project, I worked on a script that had a list of ROOT objects, possibly with duplicates. The objects are uniquely identified by their name, which is returned by a custom method called getName(). The task was to eliminate all duplicates. So, in a single line of Python code, I put all the objects into a dictionary with the objects' names as keys, thus removing all duplicates.

unique_objs = {obj.getName(): obj for obj in all_objs}
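
As a minimal sketch of the behavior I expected, assume for a moment that the names are plain Python strings. The class Obj below is a hypothetical stand-in for the ROOT objects; only the method name getName() is taken from the actual script.

```python
class Obj:
    """Hypothetical stand-in for a ROOT object with a getName() method."""

    def __init__(self, name):
        self._name = name

    def getName(self):
        # Assumption for this sketch: the name is a plain Python str
        return self._name


all_objs = [Obj("ObjectName1"), Obj("ObjectName2"), Obj("ObjectName1")]

# Later objects overwrite earlier ones stored under the same key
unique_objs = {obj.getName(): obj for obj in all_objs}

print(sorted(unique_objs))  # ['ObjectName1', 'ObjectName2']
```

With plain string keys, the third object replaces the first one, and two entries remain.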

The unit tests for this part of the script failed. It looked as if there were still duplicates in unique_objs. So I started pdb to see what was going on. The output was something like this:

(Pdb) print(unique_objs)
{'ObjectName1': <ROOT.TObject object ("TObject") at 0x40e70a0>,
 'ObjectName1': <ROOT.TObject object ("TObject") at 0x40f6860>}

How can this be? Two distinct objects with two different locations in memory stored with the same key? Which object does unique_objs["ObjectName1"] return? Well… neither:

(Pdb) print(unique_objs["ObjectName1"])
*** KeyError: 'ObjectName1'

The problem here is that the keys are not plain Python strings. The method getName() that I used to retrieve the key names returns them as TStrings, ROOT's own string class. To reproduce the issue, consider the following example.

import ROOT

# First, create two identical TStrings
s1 = ROOT.TString("some_string")
s2 = ROOT.TString("some_string")

# Check that they are identical
assert s1 == s2

# Use them as dict keys
d = {s1: 0, s2: 0}

# Inspect the result
print(d)

If you run this, you see a dictionary with two seemingly identical keys.

$ python3 tstring_key.py
{'some_string': 0, 'some_string': 0}

How does that work, even though s1 == s2?

Dictionary keys must be hashable objects. A dictionary looks up an item via the hash() of the key object and only then confirms the match with ==. Some Python objects, e.g., lists, are not hashable and therefore cannot be dictionary keys. The hash of a regular Python string is computed from its contents, so within one interpreter session hash("some_string") == hash("some_string") is always true, even for two distinct string objects. However, for TStrings, this is not true.

>>> hash(s1)
8777496775913
>>> hash(s2)
8777512141515

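For the dictionary, the two TStrings are different keys since their hash() values are different, regardless of the actual string content. This also explains the KeyError from above: the plain string 'ObjectName1' hashes to yet another value, so the lookup matches neither entry.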
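
The effect can be reproduced without ROOT. The following is only a sketch: IdentityHashedString is a hypothetical stand-in class, and it assumes that the hash is derived from the object itself rather than from its content, which the differing hash values above suggest but which I have not verified in the PyROOT sources.

```python
class IdentityHashedString:
    """Hypothetical stand-in for TString: equality by content, hash by identity."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Compare by content, also against plain str objects
        return self.text == getattr(other, "text", other)

    def __hash__(self):
        # Assumption: the hash depends on the object, not on its content
        return id(self)

    def __repr__(self):
        # Looks exactly like a plain str in printouts
        return repr(self.text)


s1 = IdentityHashedString("some_string")
s2 = IdentityHashedString("some_string")

d = {s1: 0, s2: 0}

print(s1 == s2)              # True
print(hash(s1) == hash(s2))  # False
print(d)                     # {'some_string': 0, 'some_string': 0}
print(s1 in d)               # True: same object, same hash
print("some_string" in d)    # False: the str hashes differently
```

The dictionary never gets as far as calling ==, because it only compares keys whose hash values already match.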

The Python documentation explains the term hashable as:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

So the fact that for the two TStrings from above

s1 == s2 is true,
but hash(s1) == hash(s2) is false

is a clear violation of the above statement. Therefore, this behavior is an actual bug in PyROOT (or ROOT itself).
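
For comparison, a class satisfies this contract when __hash__() is computed from the same data that __eq__() compares. A minimal sketch, in which the class Name is hypothetical and not part of ROOT:

```python
class Name:
    """Hypothetical string wrapper that honors the hashable contract."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Name):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        # Hash the same data that __eq__ compares
        return hash(self.text)


n1 = Name("some_string")
n2 = Name("some_string")

assert n1 == n2
assert hash(n1) == hash(n2)

# Equal keys now share one entry; the later value overwrites the earlier one
d = {n1: 0, n2: 1}
print(len(d))  # 1
```

Here two equal objects always land on the same dictionary entry, which is what the one-line deduplication relies on.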

Finally, the bizarre printout with two identical keys in the same dictionary is possible because TString's implementation of repr() makes TStrings indistinguishable from standard Python strings.

My conclusion is: Never use TStrings as dictionary keys.
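
One way to follow this rule is to convert each name to a plain Python string before using it as a key. The sketch below assumes that calling str() on the returned name yields its text; NameLike and Obj are hypothetical stand-ins so that the example runs without ROOT.

```python
class NameLike:
    """Hypothetical stand-in for a TString-like name with an identity-based hash."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __eq__(self, other):
        return str(self) == str(other)

    def __hash__(self):
        # Assumption: mimics a hash that ignores the string content
        return id(self)


class Obj:
    """Hypothetical stand-in for a ROOT object with a getName() method."""

    def __init__(self, name):
        self._name = NameLike(name)

    def getName(self):
        return self._name


all_objs = [Obj("ObjectName1"), Obj("ObjectName1"), Obj("ObjectName2")]

# Convert every key to a plain str so that equal names share one hash
unique_objs = {str(obj.getName()): obj for obj in all_objs}

print(sorted(unique_objs))  # ['ObjectName1', 'ObjectName2']
```

With plain str keys, a lookup such as unique_objs["ObjectName1"] works as expected.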