The release of Python 3.10 is expected on October 4, 2021. As with previous releases, this release will introduce new language features and constructs. I’m looking forward to one special new feature in the upcoming release, namely Pattern Matching. If you have experience in a functional programming language, like Haskell, or a language with support for programming in a functional style like Scala (just to name two), pattern matching is one of the standard items in your toolbox. However, if you are not yet familiar with the concept or its implementation in Python, this article is for you. I will walk you through two examples in increasing complexity to whet your appetite until it’s finally released.
It’s not yet released?
How can I demonstrate this new language feature when it’s not yet released? Well, you can get a glimpse of the new feature by using an early version of the release. As of this writing, there are Python 3.10 alpha versions available on Docker Hub. Beta versions and release candidates should follow.
For example, to play with the code in this article interactively, run
$ docker run --rm -it python:3.10.0a7-buster
Simple matching
On a superficial level, pattern matching is a glorified match
statement,
that looks very similar to the switch ... case
statement in Java or C/C++.
You match a variable against several cases in order, and get the behaviour
defined in the first matching case clause. There is no fall-through in Python.
Consider the following function.
def speak(number):
"""Print a translation of number"""
match number:
# Indentation for visual clarity
case 0: print("nothing")
case 1: print("one")
case 2: print("two")
case 3: print("many")
case x if x > 0: print("lots")
case _: print("??")
The function uses a single match statement to print a verbal translation of
the given number. The first four cases (0
- 3
) look pretty unsurprising and do
exactly what you would expect for a switch
in Java or C/C++. If the input
is, let’s say, 1
, the function prints “one”. If the input is 2
, it prints
“two”.
For the lots
case, it gets more interesting. Every given number is matched
against the pattern x
with an additional condition. Any number matches the
sole name capture x
. This means the number is available as variable x
for
the remainder of the pattern and the case. The additional condition limits the
execution to positive numbers.
The order of the cases resolves the ambiguity with numbers 0, 1, 2, and 3. The
nothing
, one
, two
, and many
cases take precedence over lots
.
The lots
case matches any number greater or equal to 4.
The last case matches anything and therefore corresponds to a default
case
in some languages. If none of the above patterns match the given expression,
the default case is executed. In our case, speak(-1)
yields ??
.
Structural pattern matching
So far, we’ve only scratched the surface of pattern matching in Python. Pattern matching entails so much more than comparing integers with additional conditions. You can match an expression against composite patterns using built-in classes, like dictionaries, list or dataclasses (new in Python 3.7). In the following example, I will focus on dataclasses and built a simple system to represent and simplify symbolic, arithmetic expressions.
The example is borrowed from the book Programming in Scala by Martin Odersky et al. (Scala has excellent support for pattern matching even extending the feature set of Python, e.g., see Extractors).
Let’s start by defining a few dataclasses to represent numbers, variables,
binary operations (as in a + b
), and unary operations (as in -a
). Each class
is a subclass of the generic Expression
class.
from dataclasses import dataclass
class Expression:
"""Generic expression"""
@dataclass
class BinaryOp(Expression):
op: str
left: Expr
right: Expr
@dataclass
class UnaryOp(Expression):
op: str
arg: Expr
@dataclass
class Number(Expression):
value: float
@dataclass
class Variable(Expression):
name: str
With these classes set up, we are now able to define expressions.
a = Variable("a")
b = Variable("b")
c = Variable("c")
# c*c = a*a + b*b
pythagoras = BinaryOp("=", BinaryOp("*", c, c),
BinaryOp("+", BinaryOp("*", a, a),
BinaryOp("*", b, b)))
# (a + 1) * b
score = BinaryOp("*", BinaryOp("+", a, Number(1)), b)
Granted, this is not the nicest way to define the expressions. We will deal with this aspect later. For now, it is important to have a data structure to store and represent simple mathematical expressions.
Assume it is your task to implement the function simplify(expr)
that takes an
arbitrary expression as input, applies a set of simplification rules, and returns
the result. For example, any occurrence of
- \(+e\) should be replaced with just \(e\),
- \(e + 0\) should be replaced with just \(e\),
- \(0 + e\) should be replaced with just \(e\),
- \(e * 1\) should be replaced with just \(e\),
- \(1 * e\) should be replaced with just \(e\),
- \(-(-e)\) should be replaced with just \(e\),
- \(e + (-f)\) should be replaced with \(e - f\),
- and so on…
where \(e\) and \(f\) are arbitrary expressions. Take a minute and picture in your head how you would write such a function without pattern matching. It would probably involve recursive calls, and I bet it would involve a lot of code like this:
if isinstance(left, ...) and left.op == "+" and isinstance(right, ...) ...:
...
elif isinstance(...) and ...:
...
...
It would be difficult to code. It would be difficult to read. It would be
difficult to maintain. In short: it would be ugly code. Structural pattern
matching gives us a much cleaner way to implement simplify()
. Consider:
def simplify(expr):
match expr:
case UnaryOp("+", x): return simplify(x)
case BinaryOp("+", x, Number(0)): return simplify(x)
case BinaryOp("+", Number(0), x): return simplify(x)
case BinaryOp("-", x, Number(0)): return simplify(x)
case BinaryOp("-", Number(0), x): return simplify(x)
case BinaryOp("*", x, Number(1)): return simplify(x)
case BinaryOp("*", Number(1), x): return simplify(x)
case BinaryOp("*", x, Number(0)): return Number(0)
case BinaryOp("*", Number(0), x): return Number(0)
case BinaryOp("+", x, UnaryOp("-", y)):
return simplify(BinaryOp("-", x, y))
case BinaryOp("-", x, UnaryOp("-", y)):
return simplify(BinaryOp("+", x, y))
case UnaryOp("-", UnaryOp("-", x)):
return simplify(x)
case UnaryOp("-", UnaryOp("+", x)):
return simplify(UnaryOp("-", x))
case UnaryOp("+", UnaryOp("-", x)):
return simplify(UnaryOp("-", x))
case UnaryOp(op, x):
return UnaryOp(op, simplify(x))
case BinaryOp(op, x, y):
return BinaryOp(op, simplify(x), simplify(y))
case _: return expr
The above implementation is relatively concise and covers even more simplification rules than were initially required, but how does it work?
In the simple integer matching example above, we saw that a pattern could be a number (optionally with a condition). The pattern syntax, however, is much more expressive. We can use our dataclasses and just write what we would like to match:
The first case case UnaryOp("+", x):
matches any expression that is a unary
expression of the "+"
operator. The argument is arbitrary. The x
in the
pattern is a name capture that makes the operator’s argument available
within the case statement. For the expression \(+x\), we take the
argument x
and pass it recursively to simplify()
.
The way the patterns are specified closely resembles what we specified in the
initial set of requirements for the function simplify()
. Let’s take the
mathematical rule \(e + 0\), for example. This is translated to the pattern
BinaryOp("+", x, Number(0))
in Python. There is nothing magical about this
pattern. It is the formulation of \(e + 0\) using our dataclasses.
The name capture x
encodes the fact that \(e\) is an arbitrary expression.
We see in this example also that the structural pattern matching is not
limited to a single class and its arguments. The right
argument is yet
another dataclass instance, Number(0)
, representing the number \(0\). In this case it looks
trivial, but the concept can be applied to more complex patterns.
Consider the implementation for the \(e + (-f)\) rule. We want to match a sum
(i.e. BinaryOp("+", left, right)
) where the right operand is the negation of
an expression (i.e. UnaryOp("-", arg)
) the corresponding case statement with
it’s pattern is then
case BinaryOp("+", x, UnaryOp("-", y)):
... # x and y are available
Wrap up
You’ve seen two simple examples that showcased two specific features of pattern matching in Python:
- Simple patterns with conditions and
- Structural patterns with dataclasses.
Besides these two features, the specification is much more extensive. I recommend having a look at that document.
Earlier, I’ve promised that we will make defining expressions from the second
example more convenient. We could define overloaded operator methods in the
Expression
class. Additionally, the function parse()
converts
numbers to Number
instances and strings to Variables
.
class Expression:
def __add__(self, other):
return BinaryOp("+", self, parse(other))
def __sub__(self, other):
return BinaryOp("-", self, parse(other))
def __mul__(self, other):
return BinaryOp("*", self, parse(other))
def __neg__(self):
return UnaryOp("-", self)
def parse(expr):
match expr:
case int(a) | float(a): return Number(a)
case str(n): return Variable(n)
case _: return expr
The two expressions from above thus become:
a = Variable("a")
b = Variable("b")
c = Variable("c")
pythagoras = BinaryOp("=", c*c, a*a + b*b)
# (a + 1) * Δ
score = (a + 1) * "delta"
Much nicer, right?
We haven’t actually used the simplify()
function. This is
left as an exercise to the reader. What does simplify((Variable("a") + 1) * 0)
return?
This might also interest you