What is 'Pattern Matching' in functional languages?

44,203

Solution 1

Understanding pattern matching requires explaining three parts:

  1. Algebraic data types.
  2. What pattern matching is
  3. Why its awesome.

Algebraic data types in a nutshell

ML-like functional languages allow you define simple data types called "disjoint unions" or "algebraic data types". These data structures are simple containers, and can be recursively defined. For example:

type 'a list =
    | Nil
    | Cons of 'a * 'a list

defines a stack-like data structure. Think of it as equivalent to this C#:

public abstract class List<T>
{
    public class Nil : List<T> { }
    public class Cons : List<T>
    {
        public readonly T Item1;
        public readonly List<T> Item2;
        public Cons(T item1, List<T> item2)
        {
            this.Item1 = item1;
            this.Item2 = item2;
        }
    }
}

So, the Cons and Nil identifiers define simple a simple class, where the of x * y * z * ... defines a constructor and some data types. The parameters to the constructor are unnamed, they're identified by position and data type.

You create instances of your a list class as such:

let x = Cons(1, Cons(2, Cons(3, Cons(4, Nil))))

Which is the same as:

Stack<int> x = new Cons(1, new Cons(2, new Cons(3, new Cons(4, new Nil()))));

Pattern matching in a nutshell

Pattern matching is a kind of type-testing. So let's say we created a stack object like the one above, we can implement methods to peek and pop the stack as follows:

let peek s =
    match s with
    | Cons(hd, tl) -> hd
    | Nil -> failwith "Empty stack"

let pop s =
    match s with
    | Cons(hd, tl) -> tl
    | Nil -> failwith "Empty stack"

The methods above are equivalent (although not implemented as such) to the following C#:

public static T Peek<T>(Stack<T> s)
{
    if (s is Stack<T>.Cons)
    {
        T hd = ((Stack<T>.Cons)s).Item1;
        Stack<T> tl = ((Stack<T>.Cons)s).Item2;
        return hd;
    }
    else if (s is Stack<T>.Nil)
        throw new Exception("Empty stack");
    else
        throw new MatchFailureException();
}

public static Stack<T> Pop<T>(Stack<T> s)
{
    if (s is Stack<T>.Cons)
    {
        T hd = ((Stack<T>.Cons)s).Item1;
        Stack<T> tl = ((Stack<T>.Cons)s).Item2;
        return tl;
    }
    else if (s is Stack<T>.Nil)
        throw new Exception("Empty stack");
    else
        throw new MatchFailureException();
}

(Almost always, ML languages implement pattern matching without run-time type-tests or casts, so the C# code is somewhat deceptive. Let's brush implementation details aside with some hand-waving please :) )

Data structure decomposition in a nutshell

Ok, let's go back to the peek method:

let peek s =
    match s with
    | Cons(hd, tl) -> hd
    | Nil -> failwith "Empty stack"

The trick is understanding that the hd and tl identifiers are variables (errm... since they're immutable, they're not really "variables", but "values" ;) ). If s has the type Cons, then we're going to pull out its values out of the constructor and bind them to variables named hd and tl.

Pattern matching is useful because it lets us decompose a data structure by its shape instead of its contents. So imagine if we define a binary tree as follows:

type 'a tree =
    | Node of 'a tree * 'a * 'a tree
    | Nil

We can define some tree rotations as follows:

let rotateLeft = function
    | Node(a, p, Node(b, q, c)) -> Node(Node(a, p, b), q, c)
    | x -> x

let rotateRight = function
    | Node(Node(a, p, b), q, c) -> Node(a, p, Node(b, q, c))
    | x -> x

(The let rotateRight = function constructor is syntax sugar for let rotateRight s = match s with ....)

So in addition to binding data structure to variables, we can also drill down into it. Let's say we have a node let x = Node(Nil, 1, Nil). If we call rotateLeft x, we test x against the first pattern, which fails to match because the right child has type Nil instead of Node. It'll move to the next pattern, x -> x, which will match any input and return it unmodified.

For comparison, we'd write the methods above in C# as:

public abstract class Tree<T>
{
    public abstract U Match<U>(Func<U> nilFunc, Func<Tree<T>, T, Tree<T>, U> nodeFunc);

    public class Nil : Tree<T>
    {
        public override U Match<U>(Func<U> nilFunc, Func<Tree<T>, T, Tree<T>, U> nodeFunc)
        {
            return nilFunc();
        }
    }

    public class Node : Tree<T>
    {
        readonly Tree<T> Left;
        readonly T Value;
        readonly Tree<T> Right;

        public Node(Tree<T> left, T value, Tree<T> right)
        {
            this.Left = left;
            this.Value = value;
            this.Right = right;
        }

        public override U Match<U>(Func<U> nilFunc, Func<Tree<T>, T, Tree<T>, U> nodeFunc)
        {
            return nodeFunc(Left, Value, Right);
        }
    }

    public static Tree<T> RotateLeft(Tree<T> t)
    {
        return t.Match(
            () => t,
            (l, x, r) => r.Match(
                () => t,
                (rl, rx, rr) => new Node(new Node(l, x, rl), rx, rr))));
    }

    public static Tree<T> RotateRight(Tree<T> t)
    {
        return t.Match(
            () => t,
            (l, x, r) => l.Match(
                () => t,
                (ll, lx, lr) => new Node(ll, lx, new Node(lr, x, r))));
    }
}

For seriously.

Pattern matching is awesome

You can implement something similar to pattern matching in C# using the visitor pattern, but its not nearly as flexible because you can't effectively decompose complex data structures. Moreover, if you are using pattern matching, the compiler will tell you if you left out a case. How awesome is that?

Think about how you'd implement similar functionality in C# or languages without pattern matching. Think about how you'd do it without test-tests and casts at runtime. Its certainly not hard, just cumbersome and bulky. And you don't have the compiler checking to make sure you've covered every case.

So pattern matching helps you decompose and navigate data structures in a very convenient, compact syntax, it enables the compiler to check the logic of your code, at least a little bit. It really is a killer feature.

Solution 2

Short answer: Pattern matching arises because functional languages treat the equals sign as an assertion of equivalence instead of assignment.

Long answer: Pattern matching is a form of dispatch based on the “shape” of the value that it's given. In a functional language, the datatypes that you define are usually what are known as discriminated unions or algebraic data types. For instance, what's a (linked) list? A linked list List of things of some type a is either the empty list Nil or some element of type a Consed onto a List a (a list of as). In Haskell (the functional language I'm most familiar with), we write this

data List a = Nil
            | Cons a (List a)

All discriminated unions are defined this way: a single type has a fixed number of different ways to create it; the creators, like Nil and Cons here, are called constructors. This means that a value of the type List a could have been created with two different constructors—it could have two different shapes. So suppose we want to write a head function to get the first element of the list. In Haskell, we would write this as

-- `head` is a function from a `List a` to an `a`.
head :: List a -> a
-- An empty list has no first item, so we raise an error.
head Nil        = error "empty list"
-- If we are given a `Cons`, we only want the first part; that's the list's head.
head (Cons h _) = h

Since List a values can be of two different kinds, we need to handle each one separately; this is the pattern matching. In head x, if x matches the pattern Nil, then we run the first case; if it matches the pattern Cons h _, we run the second.

Short answer, explained: I think one of the best ways to think about this behavior is by changing how you think of the equals sign. In the curly-bracket languages, by and large, = denotes assignment: a = b means “make a into b.” In a lot of functional languages, however, = denotes an assertion of equality: let Cons a (Cons b Nil) = frob x asserts that the thing on the left, Cons a (Cons b Nil), is equivalent to the thing on the right, frob x; in addition, all variables used on the left become visible. This is also what's happening with function arguments: we assert that the first argument looks like Nil, and if it doesn't, we keep checking.

Solution 3

It means that instead of writing

double f(int x, int y) {
  if (y == 0) {
    if (x == 0)
      return NaN;
    else if (x > 0)
      return Infinity;
    else
      return -Infinity;
  } else
     return (double)x / y;
}

You can write

f(0, 0) = NaN;
f(x, 0) | x > 0 = Infinity;
        | else  = -Infinity;
f(x, y) = (double)x / y;

Hey, C++ supports pattern matching too.

static const int PositiveInfinity = -1;
static const int NegativeInfinity = -2;
static const int NaN = -3;

template <int x, int y> struct Divide {
  enum { value = x / y };
};
template <bool x_gt_0> struct aux { enum { value = PositiveInfinity }; };
template <> struct aux<false> { enum { value = NegativeInfinity }; };
template <int x> struct Divide<x, 0> {
  enum { value = aux<(x>0)>::value };
};
template <> struct Divide<0, 0> {
  enum { value = NaN };
};

#include <cstdio>

int main () {
    printf("%d %d %d %d\n", Divide<7,2>::value, Divide<1,0>::value, Divide<0,0>::value, Divide<-1,0>::value);
    return 0;
};

Solution 4

Pattern matching is sort of like overloaded methods on steroids. The simplest case would be the same roughly the same as what you seen in java, arguments are a list of types with names. The correct method to call is based on the arguments passed in, and it doubles as an assignment of those arguments to the parameter name.

Patterns just go a step further, and can destructure the arguments passed in even further. It can also potentially use guards to actually match based on the value of the argument. To demonstrate, I'll pretend like JavaScript had pattern matching.

function foo(a,b,c){} //no pattern matching, just a list of arguments

function foo2([a],{prop1:d,prop2:e}, 35){} //invented pattern matching in JavaScript

In foo2, it expects a to be an array, it breaks apart the second argument, expecting an object with two props (prop1,prop2) and assigns the values of those properties to variables d and e, and then expects the third argument to be 35.

Unlike in JavaScript, languages with pattern matching usually allow multiple functions with the same name, but different patterns. In this way it is like method overloading. I'll give an example in erlang:

fibo(0) -> 0 ;
fibo(1) -> 1 ;
fibo(N) when N > 0 -> fibo(N-1) + fibo(N-2) .

Blur your eyes a little and you can imagine this in javascript. Something like this maybe:

function fibo(0){return 0;}
function fibo(1){return 1;}
function fibo(N) when N > 0 {return fibo(N-1) + fibo(N-2);}

Point being that when you call fibo, the implementation it uses is based on the arguments, but where Java is limited to types as the only means of overloading, pattern matching can do more.

Beyond function overloading as shown here, the same principle can be applied other places, such as case statements or destructuring assingments. JavaScript even has this in 1.7.

Solution 5

Pattern matching allows you to match a value (or an object) against some patterns to select a branch of the code. From the C++ point of view, it may sound a bit similar to the switch statement. In functional languages, pattern matching can be used for matching on standard primitive values such as integers. However, it is more useful for composed types.

First, let's demonstrate pattern matching on primitive values (using extended pseudo-C++ switch):

switch(num) {
  case 1: 
    // runs this when num == 1
  case n when n > 10: 
    // runs this when num > 10
  case _: 
    // runs this for all other cases (underscore means 'match all')
}

The second use deals with functional data types such as tuples (which allow you to store multiple objects in a single value) and discriminated unions which allow you to create a type that can contain one of several options. This sounds a bit like enum except that each label can also carry some values. In a pseudo-C++ syntax:

enum Shape { 
  Rectangle of { int left, int top, int width, int height }
  Circle of { int x, int y, int radius }
}

A value of type Shape can now contain either Rectangle with all the coordinates or a Circle with the center and the radius. Pattern matching allows you to write a function for working with the Shape type:

switch(shape) { 
  case Rectangle(l, t, w, h): 
    // declares variables l, t, w, h and assigns properties
    // of the rectangle value to the new variables
  case Circle(x, y, r):
    // this branch is run for circles (properties are assigned to variables)
}

Finally, you can also use nested patterns that combine both of the features. For example, you could use Circle(0, 0, radius) to match for all shapes that have the center in the point [0, 0] and have any radius (the value of the radius will be assigned to the new variable radius).

This may sound a bit unfamiliar from the C++ point of view, but I hope that my pseudo-C++ make the explanation clear. Functional programming is based on quite different concepts, so it makes better sense in a functional language!

Share:
44,203

Related videos on Youtube

Roman
Author by

Roman

Updated on September 23, 2020

Comments

  • Roman
    Roman over 3 years

    I'm reading about functional programming and I've noticed that Pattern Matching is mentioned in many articles as one of the core features of functional languages.

    Can someone explain for a Java/C++/JavaScript developer what does it mean?

  • Roman
    Roman about 14 years
    Next time I'll mention in question that I've already read wikipedia and it gives very bad explaination.
  • charlieb
    charlieb about 14 years
    Also as you can see from the example the interpreter can also break up a single argument into several variables automatically (e.g. [Head|Tail])
  • Totti
    Totti almost 14 years
    +1 but don't forget about other languages with pattern matching like Mathematica.
  • Doval
    Doval over 10 years
    "errm... since they're immutable, they're not really "variables", but "values" ;)" They are variables; it's the mutable variety that's mislabeled. Nevertheless, excellent answer!
  • prashant
    prashant about 10 years
    "Almost always, ML languages implement pattern matching without run-time type-tests or casts" <- How does this work? Can you point me to a primer?
  • fracca
    fracca about 10 years
    In Scala: import Double._ def divide = { values: (Double, Double) => values match { case (0,0) => NaN case (x,0) => if (x > 0) PositiveInfinity else NegativeInfinity case (x,y) => x / y } }
  • Totti
    Totti about 8 years
    @DavidMoles: The type system makes it possible to elide all run-time checks by proving pattern matches to be exhaustive and not redundant. If you try to feed a language like SML, OCaml or F# a pattern match that is not exhaustive or contains redundancy then the compiler will warn you at compile time. This is an extremely powerful feature because it allows you to eliminate run-time checks by rearranging your code, i.e. you can have aspects of your code proven correct. Furthermore, it is easy to understand!
  • prashant
    prashant about 8 years
    @JonHarrop I can see how that would work (effectively it's similar to dynamic message dispatch) but I can't see how at run-time you select a branch without a type test.
  • Totti
    Totti about 8 years
    @DavidMoles: I see what you mean. The difference is terminology: the parts of a union type are called type constructors and they are not regarded as types as they would be in other languages so that isn't technically testing types at runtime. The point here is that there is no implicit run time test for errors during dispatch. Static checking of errors but still dynamic dispatch.
  • jmrah
    jmrah almost 7 years
    What an interesting way of thinking about the equals sign. Thanks for sharing that!
  • Foobar
    Foobar over 5 years
    What does Cons mean?
  • Antal Spector-Zabusky
    Antal Spector-Zabusky over 5 years
    @Roymunson: Cons is the constructor that builds a (linked) list out of a head (the a) and a tail (the List a). The name comes from Lisp. In Haskell, for the built-in list type, it's the : operator (which is still pronounced "cons").
  • joel
    joel over 4 years
    I wouldn't say you absolutely need ADTs to talk about pattern matching - you can pattern match on primitive types and e.g. structs in rust, with no ADT in sight afaik
  • 0x49D1
    0x49D1 about 4 years
    Lets update this a little bit: C# now has pattern matching and C# 8 will expand it further: docs.microsoft.com/en-us/archive/msdn-magazine/2019/may/…