Python dictionaries allow access to their data items via keys. If you store two
values under the same key, the later assignment overwrites the earlier one. For
a research project, I worked on a script that had a list of ROOT objects, possibly
with duplicates. The objects are uniquely identified by their names. The name is
returned by a custom method called getName(). The task was to eliminate all
duplicates. So, in a single line of Python code, I put all the objects in a
dictionary with the objects’ names as keys, which should remove all duplicates.
unique_objs = {obj.getName(): obj for obj in all_objs}
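With plain Python strings as keys, this one-liner behaves as intended. Here is a minimal sketch that uses a hypothetical NamedObject class in place of the ROOT objects (the class and its payload attribute are made up for illustration):

```python
class NamedObject:
    """Hypothetical stand-in for a ROOT object; getName() returns a plain str here."""

    def __init__(self, name, payload):
        self._name = name
        self.payload = payload

    def getName(self):
        return self._name


all_objs = [
    NamedObject("ObjectName1", 1),
    NamedObject("ObjectName2", 2),
    NamedObject("ObjectName1", 3),  # same name as the first object
]

# A later entry overwrites an earlier one that shares its key.
unique_objs = {obj.getName(): obj for obj in all_objs}

print(sorted(unique_objs))                 # ['ObjectName1', 'ObjectName2']
print(unique_objs["ObjectName1"].payload)  # 3, the last object with that name
```

Note that the dictionary keeps the last object for each name, not the first.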
The unit tests for this part of the script failed. It looked as if there were
still duplicates in unique_objs. So I started pdb to have a look at what was
going on. The output was something like this:
(Pdb) print(unique_objs)
{'ObjectName1': <ROOT.TObject object ("TObject") at 0x40e70a0>,
'ObjectName1': <ROOT.TObject object ("TObject") at 0x40f6860>}
How can this be? Two distinct objects with two different locations in memory
stored with the same key? Which object does unique_objs["ObjectName1"]
return?
Well… neither:
(Pdb) print(unique_objs["ObjectName1"])
*** KeyError: 'ObjectName1'
The problem here is that the keys are not plain Python strings. The getName()
method that I used to retrieve the key names returns them as TStrings, ROOT’s own
string class. To reproduce the issue, consider the following example.
import ROOT
# First, create two identical TStrings
s1 = ROOT.TString("some_string")
s2 = ROOT.TString("some_string")
# Check that they are identical
assert s1 == s2
# Use them as dict keys
d = {s1: 0, s2: 0}
# Inspect the result
print(d)
If you run this, you see a dictionary with two seemingly identical keys.
$ python3 tstring_key.py
{'some_string': 0, 'some_string': 0}
How is that possible, given that s1 == s2?
Dictionary keys must be hashable objects. Dictionary items are looked up via the
hash() of the key object. Some Python objects, e.g. lists, are not hashable and
therefore cannot be dictionary keys.
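A small sketch of that rule, using only built-in types (the variable names are just for illustration): a tuple is accepted as a key, while a list with the same contents is rejected with a TypeError.

```python
point_as_tuple = (1, 2)
point_as_list = [1, 2]

# Tuples of hashable items are hashable, so this works.
lookup = {point_as_tuple: "tuple key works"}

try:
    lookup[point_as_list] = "list key"
    list_key_accepted = True
except TypeError as err:
    # Lists are mutable and define no hash, so the dict refuses them.
    list_key_accepted = False
    print(f"rejected: {err}")

# Equal tuples built separately still land on the same entry.
print(lookup[(1, 2)])
```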
Regular Python strings are hashed by their contents, so two equal strings always
have the same hash, and hash("some_string") == hash("some_string") is always
true. For TStrings, however, this is not true.
>>> hash(s1)
8777496775913
>>> hash(s2)
8777512141515
For the dictionary, the two TStrings are different keys since their hash()
values differ, regardless of the actual string content.
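The effect can be imitated without ROOT. The sketch below uses a hypothetical FakeTString class that compares by content, hashes by object identity, and prints like a plain string. It is not ROOT's implementation, only an assumption about what produces the same symptoms.

```python
class FakeTString:
    """Hypothetical stand-in that mimics the observed behavior; not ROOT code."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, FakeTString):
            return self.text == other.text
        return NotImplemented

    # Defining __eq__ alone would set __hash__ to None. Restoring the
    # identity-based default gives "equal, but different hashes".
    __hash__ = object.__hash__

    def __repr__(self):
        # Looks exactly like the repr of a plain str.
        return repr(self.text)


s1 = FakeTString("some_string")
s2 = FakeTString("some_string")
d = {s1: 0, s2: 0}

print(s1 == s2)              # True
print(hash(s1) == hash(s2))  # False
print(d)                     # {'some_string': 0, 'some_string': 0}
print("some_string" in d)    # False, so d["some_string"] raises KeyError
```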
The Python documentation explains the term hashable as:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
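For contrast, a class that satisfies this contract derives both __eq__() and __hash__() from the same data. A minimal sketch with a hypothetical Name class (not part of ROOT or the original script):

```python
class Name:
    """Hypothetical string wrapper whose hash agrees with its equality."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, Name):
            return self.text == other.text
        return NotImplemented

    def __hash__(self):
        # Same data as __eq__, so equal objects get equal hashes.
        return hash(self.text)


a = Name("some_string")
b = Name("some_string")
d = {a: 0, b: 1}

print(a == b)              # True
print(hash(a) == hash(b))  # True
print(len(d))              # 1, the second assignment overwrote the first
```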
So the fact that, for the two TStrings from above, s1 == s2 is true but
hash(s1) == hash(s2) is false is a clear violation of the above statement.
Therefore, this behavior is an actual bug in PyROOT (or ROOT itself).
Finally, the bizarre printout with two identical dictionary keys in the same
dictionary is possible because the TString implementation of repr() makes them
indistinguishable from standard Python strings.
My conclusion is: Never use TStrings as dictionary keys.
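One way to follow this rule is to convert each name to a plain Python string before using it as a key. The sketch below uses hypothetical stand-in classes (FakeTString and NamedObject) rather than ROOT, and it assumes that str() on the name returns its text. That holds for the stand-in, but should be checked for a real TString in your ROOT version.

```python
class FakeTString:
    """Hypothetical stand-in: equal by content, hashed by identity."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if isinstance(other, FakeTString):
            return self.text == other.text
        return NotImplemented

    __hash__ = object.__hash__  # identity-based, mimics the faulty hashing

    def __str__(self):
        return self.text


class NamedObject:
    """Hypothetical stand-in for a ROOT object with a getName() method."""

    def __init__(self, name):
        self._name = name

    def getName(self):
        return FakeTString(self._name)


all_objs = [NamedObject("ObjectName1"), NamedObject("ObjectName1"),
            NamedObject("ObjectName2")]

# Keys are FakeTString objects: duplicates survive.
broken = {obj.getName(): obj for obj in all_objs}

# Keys are plain str objects: duplicates collapse as intended.
fixed = {str(obj.getName()): obj for obj in all_objs}

print(len(broken))    # 3
print(sorted(fixed))  # ['ObjectName1', 'ObjectName2']
```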