Pattern Matching in Python 3.10

The release of Python 3.10 is expected on October 4, 2021. As with previous releases, this release will introduce new language features and constructs. I’m looking forward to one special new feature in the upcoming release, namely Pattern Matching. If you have experience in a functional programming language, like Haskell, or a language with support for programming in a functional style like Scala (just to name two), pattern matching is one of the standard items in your toolbox. However, if you are not yet familiar with the concept or its implementation in Python, this article is for you. I will walk you through two examples in increasing complexity to whet your appetite until it’s finally released.

It’s not yet released?

How can I demonstrate this new language feature when it’s not yet released? Well, you can get a glimpse of the new feature by using an early version of the release. As of this writing, there are Python 3.10 alpha versions available on Docker Hub. Beta versions and release candidates should follow.

For example, to play with the code in this article interactively, run

$ docker run --rm -it  python:3.10.0a7-buster

Simple matching

On a superficial level, pattern matching is a glorified match statement, that looks very similar to the switch ... case statement in Java or C/C++. You match a variable against several cases in order, and get the behaviour defined in the first matching case clause. There is no fall-through in Python. Consider the following function.

def speak(number):
    """Print a translation of number"""
    match number:
                          # Indentation for visual clarity
        case 0:           print("nothing")
        case 1:           print("one")
        case 2:           print("two")
        case 3:           print("many")
        case x if x > 0:  print("lots")
        case _:           print("??")

The function uses a single match statement to print a verbal translation of the given number. The first four cases (0 - 3) look pretty unsurprising and do exactly what you would expect for a switch in Java or C/C++. If the input is, let’s say, 1, the function prints “one”. If the input is 2, it prints “two”.

For the lots case, it gets more interesting. Every given number is matched against the pattern x with an additional condition. Any number matches the sole name capture x. This means the number is available as variable x for the remainder of the pattern and the case. The additional condition limits the execution to positive numbers. The order of the cases resolves the ambiguity with numbers 0, 1, 2, and 3. The nothing, one, two, and many cases take precedence over lots. The lots case matches any number greater or equal to 4.

The last case matches anything and therefore corresponds to a default case in some languages. If none of the above patterns match the given expression, the default case is executed. In our case, speak(-1) yields ??.

Structural pattern matching

So far, we’ve only scratched the surface of pattern matching in Python. Pattern matching entails so much more than comparing integers with additional conditions. You can match an expression against composite patterns using built-in classes, like dictionaries, list or dataclasses (new in Python 3.7). In the following example, I will focus on dataclasses and built a simple system to represent and simplify symbolic, arithmetic expressions.

The example is borrowed from the book Programming in Scala by Martin Odersky et al. (Scala has excellent support for pattern matching even extending the feature set of Python, e.g., see Extractors).

Let’s start by defining a few dataclasses to represent numbers, variables, binary operations (as in a + b), and unary operations (as in -a). Each class is a subclass of the generic Expression class.

from dataclasses import dataclass

class Expression:
    """Generic expression""" 

@dataclass
class BinaryOp(Expression):
    op: str
    left: Expr
    right: Expr

@dataclass
class UnaryOp(Expression):
    op: str
    arg: Expr

@dataclass
class Number(Expression):
    value: float

@dataclass
class Variable(Expression):
    name: str

With these classes set up, we are now able to define expressions.

a = Variable("a")
b = Variable("b")
c = Variable("c")

# c*c = a*a + b*b
pythagoras = BinaryOp("=", BinaryOp("*", c, c),
                           BinaryOp("+", BinaryOp("*", a, a),
                                         BinaryOp("*", b, b)))

# (a + 1) * b
score = BinaryOp("*", BinaryOp("+", a, Number(1)), b)

Granted, this is not the nicest way to define the expressions. We will deal with this aspect later. For now, it is important to have a data structure to store and represent simple mathematical expressions.

Assume it is your task to implement the function simplify(expr) that takes an arbitrary expression as input, applies a set of simplification rules, and returns the result. For example, any occurrence of

\(+e\) should be replaced with just \(e\),
\(e + 0\) should be replaced with just \(e\),
\(0 + e\) should be replaced with just \(e\),
\(e * 1\) should be replaced with just \(e\),
\(1 * e\) should be replaced with just \(e\),
\(-(-e)\) should be replaced with just \(e\),
\(e + (-f)\) should be replaced with \(e - f\),
and so on…

where \(e\) and \(f\) are arbitrary expressions. Take a minute and picture in your head how you would write such a function without pattern matching. It would probably involve recursive calls, and I bet it would involve a lot of code like this:

if isinstance(left, ...) and left.op == "+" and isinstance(right, ...) ...:
    ...
elif isinstance(...) and ...:
    ...
...

It would be difficult to code. It would be difficult to read. It would be difficult to maintain. In short: it would be ugly code. Structural pattern matching gives us a much cleaner way to implement simplify(). Consider:

def simplify(expr):
    match expr:
        case UnaryOp("+", x):                return simplify(x)
        case BinaryOp("+", x, Number(0)):    return simplify(x)
        case BinaryOp("+", Number(0), x):    return simplify(x)
        case BinaryOp("-", x, Number(0)):    return simplify(x)
        case BinaryOp("-", Number(0), x):    return simplify(x)
        case BinaryOp("*", x, Number(1)):    return simplify(x)
        case BinaryOp("*", Number(1), x):    return simplify(x)
        case BinaryOp("*", x, Number(0)):    return Number(0)
        case BinaryOp("*", Number(0), x):    return Number(0)
        case BinaryOp("+", x, UnaryOp("-", y)):
            return simplify(BinaryOp("-", x, y))
        case BinaryOp("-", x, UnaryOp("-", y)):
            return simplify(BinaryOp("+", x, y))
        case UnaryOp("-", UnaryOp("-", x)):
            return simplify(x)
        case UnaryOp("-", UnaryOp("+", x)):
            return simplify(UnaryOp("-", x))
        case UnaryOp("+", UnaryOp("-", x)):
            return simplify(UnaryOp("-", x))
        case UnaryOp(op, x):
            return UnaryOp(op, simplify(x))
        case BinaryOp(op, x, y):
            return BinaryOp(op, simplify(x), simplify(y))
        case _: return expr

The above implementation is relatively concise and covers even more simplification rules than were initially required, but how does it work?

In the simple integer matching example above, we saw that a pattern could be a number (optionally with a condition). The pattern syntax, however, is much more expressive. We can use our dataclasses and just write what we would like to match:

The first case case UnaryOp("+", x): matches any expression that is a unary expression of the "+" operator. The argument is arbitrary. The x in the pattern is a name capture that makes the operator’s argument available within the case statement. For the expression \(+x\), we take the argument x and pass it recursively to simplify().

The way the patterns are specified closely resembles what we specified in the initial set of requirements for the function simplify(). Let’s take the mathematical rule \(e + 0\), for example. This is translated to the pattern BinaryOp("+", x, Number(0)) in Python. There is nothing magical about this pattern. It is the formulation of \(e + 0\) using our dataclasses. The name capture x encodes the fact that \(e\) is an arbitrary expression.

We see in this example also that the structural pattern matching is not limited to a single class and its arguments. The right argument is yet another dataclass instance, Number(0), representing the number \(0\). In this case it looks trivial, but the concept can be applied to more complex patterns.

Consider the implementation for the \(e + (-f)\) rule. We want to match a sum (i.e. BinaryOp("+", left, right)) where the right operand is the negation of an expression (i.e. UnaryOp("-", arg)) the corresponding case statement with it’s pattern is then

        case BinaryOp("+", x, UnaryOp("-", y)):
            ...  # x and y are available

Wrap up

You’ve seen two simple examples that showcased two specific features of pattern matching in Python:

Simple patterns with conditions and
Structural patterns with dataclasses.

Besides these two features, the specification is much more extensive. I recommend having a look at that document.

Earlier, I’ve promised that we will make defining expressions from the second example more convenient. We could define overloaded operator methods in the Expression class. Additionally, the function parse() converts numbers to Number instances and strings to Variables.

class Expression:
    def __add__(self, other):
        return BinaryOp("+", self, parse(other))
    def __sub__(self, other):
        return BinaryOp("-", self, parse(other))
    def __mul__(self, other):
        return BinaryOp("*", self, parse(other))
    def __neg__(self):
        return UnaryOp("-", self)

def parse(expr):
    match expr:
        case int(a) | float(a):  return Number(a)
        case str(n):             return Variable(n)
        case _:                  return expr

The two expressions from above thus become:

a = Variable("a")
b = Variable("b")
c = Variable("c")

pythagoras = BinaryOp("=", c*c, a*a + b*b)

# (a + 1) * Δ
score = (a + 1) * "delta"

Much nicer, right?

We haven’t actually used the simplify() function. This is left as an exercise to the reader. What does simplify((Variable("a") + 1) * 0) return?