Dependent Pattern Matching in Coq
Introduction
I've always hated having to write match arms for cases that I know will never match. Having recently started my first non-trivial Coq development, I've learned that Coq offers some interesting ways to address this, albeit at the cost of several days being rabbit-holed on something called dependent pattern-matching. I've learned that others have been confused on this as well, so I thought I'd lay-out what I learned as simply as I could.
Background
Coq, like Rust & Haskell, provides a match
statement, where one can compare an expression (called the "scrutinee" in Rust & the "discriminee" in Coq) to one or more match patterns that will (generally) destructure the discriminee into smaller pieces. Each pattern typically has code associated with it, possibly referencing sub-terms in that pattern, that will be executed should the discriminee match. These are often referred to as the "arms" of the match statement.
In all these cases, the set of patterns in a match statement must be exhaustive; that is, any legal term of the discriminee's type must match at least one pattern in the match statement. In other words, it's not like a switch statement in C, where you're free to just do nothing if none of the provided cases apply. For instance, you can't say:
Definition silly (x: bool): nat := match x with | true => 1 end.
Toplevel input, characters 0-70: > Definition silly (x: bool): nat := > match x with > | true => 1 > end. Error: Non exhaustive pattern-matching: no clause found for pattern false
One is sometimes confronted with the need to write a match arm for a case that will never happen. Such cases may be syntactically valid but ruled out by the logic of the surrounding code. In Rust, the convention is to fill such a match arm with the unreachable!() macro, but that's merely a more-mnemonic invocation of panic!() should the impossible turn out to be possible.
Coq not only offers us the chance to prove, a priori, that such a match arm will never be invoked, but in practice confronts us with such match arms more frequently once we begin writing the sort of stronger specifications for our functions that Coq encourages.
I've encountered this now that I'm starting my first real Coq development. While I wouldn't consider this the sort of beginner-level Coq on which I should probably be focused, it comes up fairly quickly if you want to use dependent types, as well as some of the idioms described in the "Functions and Their Specifications" chapter in Interactive Theorem Proving and Program Development with Coq along with those popularized by Adam Chlipala in Certified Programming with Dependent Types.
The goal of this post is to lay-out this topic, hopefully from the perspective of the working programmer. Familiarity with Coq will be helpful, but I'll try to explain that as I go.
The following were all very useful to me over the past week:
- Interactive Theorem Proving and Program Development with Coq, by Yves Bertot & Pierre Castéran
- Certified Programming with Dependent Types, by Adam Chlipala
- Matthew Brecknell's talk Pattern matching dependent types in Coq
- the Coq documentation on Extended Pattern Matching
- my own question on StackOverflow
and for the Convoy pattern in particular:
- extracting evidence of equality from match (Stack Overflow)
- Coq: keeping information in a match statement (Stack Overflow)
My usual disclaimer applies: I'm new to this, so if readers notice any errors or omissions, corrections will be gratefully accepted.
The "Refine Trick"
Let's start with a commonly-used approach to writing such match arms. While there are a few stock idioms, at the end of the day we're going to be writing-down terms of certain types, and it pays to have the system derive those types for us. A common approach is to start by defining the desired function using the goal system and then using the refine tactic to have Coq display the required types to us.
Let's start with a trivial example to show there's no magic going on here. Let's define the function:
\begin{equation} \label{eq:1} f(x) = \begin{cases} 1&\text{if }x=0 \\ 2&\text{if }x\ne 0 \nonumber \end{cases} \end{equation}We could of course just write:
Definition f (x: nat) : nat := match x with | O => 1 | S n' => 2 end.
For readers unfamiliar with Coq, the naturals in Coq are typically represented using the Peano numbers: zero is O
and one is (S O)
("S" for successor). Two is then (S 1)
which is (S (S O))
and so on. So our match pattern here is exhaustive: every natural number is either zero or the successor of some other natural.
It is sometimes useful to develop our definition using the goal system:
Reset Initial. Definition f: forall n: nat, nat.
1 goal ============================ nat -> nat
We have defined a name f
and declared it to have type \(\mathbb{N}\rightarrow \mathbb{N}\). Per Top Level Definitions in the Coq manual, since we haven't provided a definition, Coq enters proof mode. In particular, the refine tactic can be used to sketch-out the term, leaving slots to be filled-in later. We next write:
refine ( fun n: nat => match n with | O => 1 | S n' => _ end ).
1 goal n, n' : nat ============================ nat
We see that we've left a "hole" (AKA a "joker" AKA a "wildcard") in the match arm corresponding to the pattern S n'
. Coq treats this as a goal. We have that n
& n'
are in \(\mathbb{N}\) in our context, and to dispatch the goal, we need to produce a term of type nat
(\(\mathbb{N}\)). This is just like being in a proof where we need to provide a proof term for some proposition. It's also an example of term finding.
In our case, the request is trivial: we can just say "2":
exact 2. Defined. Compute f 0. Compute f 1.
No more goals. = 1 : nat = 2 : nat
Alright: let's consider a more realistic example. We define a type Option
that can be Fail
or Ok
. In the latter case, it can carry a payload of true or false:
Reset Initial. Inductive Option: Set := | Fail: Option | Ok: bool -> Option.
Option is defined Option_rect is defined Option_ind is defined Option_rec is defined Option_sind is defined
We'd like to write a function that, given an Option
returns the boolean value within. A naive attempt comes to a halt fairly quickly:
Definition get (x: Option) : bool := match x with | Ok b => b | Fail => _ (* Oops! What do we write here? *) end.
There's nothing we can put into the empty slot (short of some hack like just returning "false"). There are a few approaches we could take. One way of dealing with partial functions such as get
is to demand from the caller evidence that the argument is actually in the function's domain. In our case, that means requiring our caller to prove that x
is not Fail
before they're allowed to invoke our function:
Definition get (x: Option): x <> Fail -> bool := match x with | Ok b => fun _ => b | Fail => _ end.
Well. It's still not clear what to put into the slot for the Fail
case. Let's use the goal system & see what Coq can tell us:
Definition get: forall x: Option, x <> Fail -> bool. refine (fun x: Option => match x return x<> Fail -> bool with | Fail => _ | Ok b => fun _ => b end).
1 goal ============================ forall x : Option, x <> Fail -> bool 1 goal x : Option ============================ Fail <> Fail -> bool
Alright, we have a goal of Fail <> Fail -> bool
. But this is trivial: Fail <> Fail
is clearly False
, from which anything follows. Let's just prove this:
intros; absurd (Fail = Fail); trivial; reflexivity. Show Proof.
No more goals. (fun x : Option => match x as x0 return (x0 <> Fail -> bool) with | Fail => fun H : Fail <> Fail => False_rec bool (H eq_refl) | Ok b => fun _ : Ok b <> Fail => b end)
Again, for readers new to Coq, we're again in proof mode. Coq is presenting us with both what is known and what is to be shown (proven). At the outset, it is known that x
is an Option
, and we need to prove that Fail <> Fail -> bool
.
Statements like intros
& reflexivity
are examples of tactics. Tactics transform the current goal (hopefully into something smaller, or simpler), generally using things that are known (either in the context of our current proof, or that have been previously proven). Here, I've applied several tactics in one shot, so the output shows no transformations until we reach "No more goals.", signalling that we're done.
The end result of all this is a "proof term": a term inhabiting the type corresponding to Option -> Fail <> Fail -> bool
(see here for more on how proving a proposition corresponds to finding an inhabitant of a type; tactics can be viewed as incremental, interactive term finding). We see that the term, unsurprisingly, is a function accepting an Option
named x
and returning a function that can turn evidence for the proposition x <> Fail
into a bool
.
Alright– let's close our proof by saying Defined
rather than the more typical Qed
:
Defined.
Qed
is appropriate for closing-out the definition of proof terms; terms that inhabit propositions. It marks the definition as opaque; you won't be able to unfold it later (more precisely, such definitions cannot have δ-reduction performed on them). This is suitable for proofs, since we generally don't care much about the precise form of the term, only that it exists. Defined
on the other hand will mark this definition as transparent.
Dependent Pattern Matching
The structure of the proof term we indirectly built is interesting, and suggestive of how we could code this function directly. It's a pure term (i.e. a lambda) that takes an Option
named x
and matches against it. The interesting thing here is that the match arms are not the same type. Coming from Rust, where that is required, this was surprising to me.
We see that the first match arm yields another lambda taking a proof term for the proposition Fail <> Fail
and yielding… something. The second arm yields a lambda taking a proof term for the proposition Ok b <> Fail
and yields a bool (specifically, the payload for the Ok
variant of the input Option
).
Let's start with the Fail
arm: what is False_rec
?
Print False_rec.
False_rec = fun P : Set => False_rect P : forall P : Set, False -> P Arguments False_rec P%type_scope f
Hmmm…. so False_rec
is a function that takes a type P
and returns False_rect P
. Alright, so what's False_rect
?
Print False_rect.
False_rect = fun (P : Type) (f : False) => match f return P with end : forall P : Type, False -> P Arguments False_rect P%type_scope f
Ahhhhhhh… this is the computational equivalent of the logical statement "ex falso quodlibet", or "from falsehood, anything follows." False_rect
is a function of two parameters: a type P
, and a proof term f
for False
(of course, no such proof term can exist). The body of the function is just:
match f return P with end.
Now, as the empty type, False
has no constructors, so the empty match statement is trivially exhaustive.
The "return" clause on the match statement, however, is new. Generally, Coq has no trouble inferring the type of a match statement: it is simply the (common) type of all the match arms. As we're seeing, first in our proof term for get
, now in False_rect
, this situation does not always attain in Coq. The return
clause is sometimes required to clarify things.
In the case of False_rect
, there are no match arms, so the return
clause here gives Coq the desired type. As we can see from the type of False_rect
, given proof of False
, we can use it to get a term of any type P
.
Jumping back "up the stack", False_rec P
is just False_rect P
; in other words, we're Currying the first parameter (which is a type), and so we have a function just waiting for a piece of evidence for False
to return a term of type P
. Such evidence can never be given, and so the particular element of P
need never be specified.
Popping the stack one more level, to the Fail
arm of our definition get
, we see that we have just such a thing: the match arm is:
fun H : Fail <> Fail => False_rec bool (H eq_refl)
Now, H
is of type Fail <> Fail
. De-sugaring & unfolding not
yields the equivalent formulation of the type of H
:
In Coq, eq
is just an inductive predicate
Print eq.
Inductive eq (A : Type) (x : A) : A -> Prop := eq_refl : x = x. Arguments eq {A}%type_scope x _ Arguments eq_refl {A}%type_scope {x}, [_] _
with a single constructor eq_refl
of type (eq x x)
. So, if I'm reading this correctly and lining-up all my types, eq_refl
(specialized to type Option
and Fail
) is a term of type \(Fail = Fail\), and hence we may apply H
to it to derive our required term of type False
.
Alright– there's the match arms explained. What about that match statement though?
match x as x0 return (x0 <> Fail -> bool) with
We've met the "return" clause, above: it helps clarify the type of the match statement. But it actually does more: it plays in something called dependent pattern matching. In dependent pattern matching, the return value of the match statement depends on the discriminee.
From Certified Programming with Dependent Types, a general dependent pattern match is of the form:
match E as y in (T x1 ... xn) return U with | C z1 ... zm: T x1'...xn' => B | ... end
The discriminee E
is a value in some inductive type family T
(which generally takes arguments, although Option
does not). The "as" clause binds the name y
to refer to the discriminee so that we can refer to it in U
, which is a formula, written in terms of the discriminee, for the match's overall type. In other words, U
will generally be some term incorporating y
and the xi
. To determine that type, the term E
will be substituted for y
in U
, and each xi
will be substituted in U
for that in the actual type of E
.
Now to the types of each match arm. Each match pattern will be a constructor application that looks like C z1 ... zm
and has some type T x1' ... xn'
. The body B
should be of type U
after replacing y
with C z1 ... zm
and the xi
with xi'
. This is key: the type of each match arm is specialized by what we now "know" about the value given the constuctor that matched. It's not that the match arms can have completely different types, it's that the type of each arm is a refinement of the type of the overall match.
Now there's a rather serious footgun here that might not be apparent at first blush (it certainly wasn't to me). The point to this is to incorporate knowledge gained from the match pattern into the type of the match arm. That type is obtained by substituting the variables y
and x1
through xn
in U
with their analogs based on the match pattern. That means that if you don't mention y
or x1
through xn
in the return clause this won't happen. I spent entirely too much time this week staring at my Coq session wondering why Coq couldn't "see" information from the match pattern in the match arm. In fact, I was looking to dependent pattern matching when I should have been leveraging the Convoy pattern (on which more below).
Let's look at the match statement in our proof-style definition and see what the types of the match arms should be according to these rules:
Figure 1: Dependently typing our match statement
Our expression E
in this case is just x
. We still need to bind it to a name in U
, so, like the proof system, let's just call it x0
. Our inductive type here is quite simple, so the entire epxression T x1 ... xn
is scratched-out & replaced with just Option
. Finally, our return type is now x0 <> Fail -> bool
.
Now for the arms: in each case, the match body shall have type U
after replacing x0
with the form of the matching constructor. So, for the Fail
match expression, our body will be of type Fail <> Fail -> bool
and for the Ok
match, it will be of type Ok b <> Fail -> bool
.
This is the power of dependent matching– in the "bad" match arm (i.e. Fail
), we can take advantage of the fact that, at that point, we "know" that x
is just Fail
.
Therefore, we can write our function directly:
Reset get. Definition get (x: Option): x <> Fail -> bool := match x as x0 return (x0 <> Fail -> bool) with | Fail => fun H : Fail <> Fail => False_rec bool (H eq_refl) | Ok b => fun _ : Ok b <> Fail => b end.
get is defined
We have an exhaustive pattern match, but since the impossible arm requires evidence that can never be produced, the term type checks even though it will never be invoked.
The Convoy Pattern
Flush with my newfound knowledge, I got to work on my Coq development, which entails carrying-out certain operations on strings. I decided to model strings in Coq by simple lists of characters. The Coq Standard Library defines lists in a way that will be familiar to functional programmers:
Print list.
Inductive list (A : Type) : Type := nil : list A | cons : A -> list A -> list A. Arguments list A%type_scope Arguments nil {A}%type_scope Arguments cons {A}%type_scope a l%list_scope (where some original arguments have been renamed)
It also has a package for ASCII characters called, naturally enough, Ascii
, so let's give ourselves a specialization of list (i.e. a type depending on a type) for ASCII characters:
Reset Initial. Require Import List. Require Import Ascii. Import ListNotations. Definition String := list ascii.
String is defined
Let's develop the "delete" operation: given a string, we'd like to delete the character at a given index. We'll take the same approach as we did above for Option
, and demand from our caller evidence that the index is valid before we return the resulting string. We'll also begin with the Refine Trick.
My scheme here is to recurse on the input string s
, decreasing the index i
at each recursive call of delete
on the tail (or cdr) of s
. When i
reaches zero, we know we want to delete the head of the current list.
Definition delete: forall s: String, forall i: nat, i < length s -> String. refine( (* We need to name our function, since we'll be calling ourselves recursively *) fix delete s i := match s with | [ ] => fun H => _ (* 👈 this should never happen... *) | h :: t => fun H => match i with | O => t (* Need evidence that i' < length t here 👇... *) | S i' => h :: delete t i' _ end end).
1 goal ============================ forall (s : String) (i : nat), i < length s -> String 2 goals delete : forall (s : String) (i : nat), i < length s -> String s : String i : nat H : i < length [] ============================ String goal 2 is: i' < length t
We see that we have two goals, one for each slot in our first implementation.
Let's look at the first goal: we have in our context the hypothesis H
that \(i\in \mathbb{N} < \text{length }()\), which is of course impossible. This feels like a job for False_rec
(see above). Let's bail & give ourselves a little helper lemma that we can use to produce the proof of False
required by False_rec
:
Abort. (* Bail out of our current proof *) Lemma delete_exfalso_helper: forall n: nat, n < length ([]: String) -> False. Admitted. (* Obvious-- don't want to clutter the post with this proof *) Definition delete: forall s: String, forall i: nat, i < length s -> String. refine( (* We need to name our lambda, since we'll be calling ourselves recursively *) fix delete s i := match s with | [ ] => fun H => _ (* Should never happen *) | h :: t => fun H => match i with | O => t (* Need evidence that i' < length t here 👇... *) | S i' => h :: delete t i' _ end end).
1 goal ============================ forall n : nat, n < length ([] : String) -> False delete_exfalso_helper is declared 1 goal ============================ forall (s : String) (i : nat), i < length s -> String 2 goals delete : forall (s : String) (i : nat), i < length s -> String s : String i : nat H : i < length [] ============================ String goal 2 is: i' < length t
We're confronted with the same goal, but now we're armed with the lemma delete_exfalso_helper
– we use it to dispatch the first goal:
exfalso; apply delete_exfalso_helper in H; apply H.
1 goal delete : forall (s : String) (i : nat), i < length s -> String s : String i : nat h : ascii t : list ascii H : i < length (h :: t) i' : nat ============================ i' < length t
This still leaves us with the second goal, but we'll get to thtat in a minute. Let's print out our proof term as it stands now:
Show Proof.
(fix delete (s : String) (i : nat) {struct s} : i < length s -> String := match s as s0 return (i < length s0 -> String) with | [] => fun H : i < length [] => False_rec String (let H0 : False := delete_exfalso_helper i H in H0) | h :: t => fun H : i < length (h :: t) => match i with | 0 => t | S i' => h :: delete t i' ?Goal end end)
Cool, cool… we've filled-in the impossible match arm in a way that I, at least, found far more intuititive than writing it directly. Note that Coq added the appropriate clauses to our match statement to take advantage of dependent pattern-matching: that's why we were able to use the fact that H
was a proof term for the proposition i < length []
in the []
case, rather than just i < length s
.
Now for that second slot. We here have a proof term H
for the proposition that i < length(h :: t)
, but we need a proof term for the proposition that i' < length t
(to hand to our recursive call of delete
). That's clearly true, so let's:
- bail on the current proof
- give ourselves another helper lemma to that effect
- and try refining a new definition, this time with the impossible arm handled
Abort. Lemma delete_tail_1: forall h: ascii, forall t: String, forall i: nat, S i < length (h :: t) -> i < length t. Admitted. (* Proof again omitted for clarity *) Definition delete: forall s: String, forall i: nat, i < length s -> String. refine( (* We need to name our lambda, since we'll be calling ourselves recursively *) fix delete s i := match s with | [ ] => (* This 👇 is the term we derived last time *) fun H => False_rec String (let H0 : False := delete_exfalso_helper i H in H0) | h :: t => fun H => match i with | O => t (* Need evidence that i' < length t here 👇... *) | S i' => h :: delete t i' _ end end).
1 goal ============================ forall (h : ascii) (t : String) (i : nat), S i < length (h :: t) -> i < length t delete_tail_1 is declared 1 goal ============================ forall (s : String) (i : nat), i < length s -> String 1 goal delete : forall (s : String) (i : nat), i < length s -> String s : String i : nat h : ascii t : list ascii H : i < length (h :: t) i' : nat ============================ i' < length t
We ought to be able to get somewhere with our helper lemma…
apply delete_tail_1 with (h := h).
1 goal delete : forall (s : String) (i : nat), i < length s -> String s : String i : nat h : ascii t : list ascii H : i < length (h :: t) i' : nat ============================ S i' < length (h :: t)
and we're tantalizingly close, but stuck. We have (in H
) that i < length (h :: t)
, and we ahve to show that S i' < length (h :: t)
… but nowhere in our context do we have that i = S i'
!
This comes down to something Brecknell points out: dependent pattern matching only refines the return type of the match: it doesn't convey any other knowledge about the match pattern. You can sometimes work around this by binding variables later in your term, but here we're going to incorporate the knowledge we need into the dependent pattern match. Instead of just matching on i
, we setup our match to return a function taking as argument a proof term for the equality we need:
Abort. (* Bail *) (* Try again... *) Definition delete: forall s: String, forall i: nat, i < length s -> String. refine( fix delete s i := match s with | [ ] => fun H => False_rec String (let H0 : False := delete_exfalso_helper i H in H0) | h :: t => fun H => (* this match is now evaluating to a function from (i = i0) to String 👇 *) match i as i0 return i = i0 -> String with | O => fun _ => t | S i' => fun I => h :: delete t i' _ end eq_refl (* 👈 so we need to apply the entire match to such a term *) end).
1 goal ============================ forall (s : String) (i : nat), i < length s -> String 1 goal delete : forall (s : String) (i : nat), i < length s -> String s : String i : nat h : ascii t : list ascii H : i < length (h :: t) i' : nat I : i = S i' ============================ i' < length t
We see that we now have just the required hypothesis I
in our context, so:
apply delete_tail_1 with (h:=h); rewrite <- I; apply H.
No more goals.
Let's take a look at that proof term:
Show Proof.
(fix delete (s : String) (i : nat) {struct s} : i < length s -> String := match s as s0 return (i < length s0 -> String) with | [] => fun H : i < length [] => False_rec String (let H0 : False := delete_exfalso_helper i H in H0) | h :: t => fun H : i < length (h :: t) => match i as i0 return (i = i0 -> String) with | 0 => fun _ : i = 0 => t | S i' => fun I : i = S i' => h :: delete t i' (delete_tail_1 h t i' (eq_ind i (fun n : nat => n < length (h :: t)) H (S i') I)) end eq_refl end)
That one's a big uglier– we could probably tidy it up, but it's still not too bad to just write-down our function directly:
Abort. Fixpoint delete (s: String) (i: nat): i < length s -> String := match s with | [ ] => fun H : i < length [] => let H0 : False := delete_exfalso_helper i H in False_rec String H0 | h :: t => fun H => match i as i0 return i = i0 -> String with | O => fun _ => t | S i' => fun I => h :: delete t i' (delete_tail_1 h t i' (eq_ind i (fun n => n < length (h :: t)) H (S i') I)) end (eq_refl i) end. Open Scope char_scope. Definition hi := ["H" ; "i"; "!"]. Compute delete hi 0. Compute delete hi 2.
delete is defined delete is recursively defined (guarded on 1st argument) hi is defined = fun _ : 0 < length hi => ["i"; "!"] : 0 < length hi -> String = fun _ : 2 < length hi => ["H"; "i"] : 2 < length hi -> String
As we see, what we get back after invoking delete
in this way is itself a function: we need to first provide a proof term for the proposition that 0 < length hi
and only then will we get our answer back:
Require Import Lia. Lemma zero_is_legit: 0 < length hi. simpl; lia. Qed. Compute delete hi 0 zero_is_legit.
1 goal ============================ 0 < length hi No more goals. = ["i"; "!"] : String
Summary
To summarize, "basic" (as in, non-dependent) pattern matching in Coq is just dispatching: once inside a match arm, the only way you can appeal to the fact that you've matched a particular pattern is by referencing the sub-terms in the pattern. In fact, looking back, I realize that one obstacle to understanding was that I was still regarding the match
statement as a C switch
statement with a pattern language. This is fine for simple cases, but once you begin using Coq in anger, you quickly find that's not enough.
Worse, from the perspective of the learner, Coq tries to help by adding annotations to the match clause when it decides you need them (we saw this above, in fact), meaning that you may slip into dependent pattern matching without even realizing it. I lost quite a bit of time trying to understand what the system was doing before I learned that Coq employs heuristics in deciding the match type. Chlipala says: "recent Coq versions are adding more and more heuristics to infer dependent match annotations in certain conditions. The general annotation inference problem is undecidable, so there will always be serious limitations on how much work these heuristics can do. When in doubt about why a particular dependent match is failing to type-check, add an explicit return annotation! At that point, the mechanical rule sketched [above] will provide a complete account of 'what the type checker is thinking.'"
Once you're clear on the fact that you're using dependent pattern matching, and what the (dependent) type of your match statement is going to be, Chlipala's formula gives the (refined) types expected of each match arm.
All that said, dependent pattern matching will only refine the type of the term expected in each match arm. If you need more than that (such as a proof term for the proposition that the match pattern is equal to to the match expression) you're going to need something else. The Convoy pattern lets you "thread" the knowledge that the discriminee takes the form of the given match pattern into your match arm.
Finally, employing both, along with the principle of ex falso quodlibet, lets us write an exhaustive set of match patterns where the impossible arms provably never match. This is generally because the match arm requires a proof of False
, which can never be presented.
This turned into a much longer post than I'd intended, but still: judging by a perusal of of Stack Overflow posts tagged [coq]
, I'm far from the only one to be confused on first brush with these techniques. Hopefully my rather meandering explanation will be of some help to the next would-be Coq programmer.
06/21/24 08:01