Lambda-P
Abstract
This post is part of a series on how to reason about code. It describes \(\lambda P\), a precursor to the formal language understood by Coq, an interactive proof assistant I'm learning. Prior posts:
- first, long-ago post here
- a concrete example of symbolic execution
- why I'm interested in type theory here
- the λ Calculus
- the Simply Typed Lambda Calculus
- the Polymorphic Lambda Calculus
- \(\lambda\underline{\omega}\)
This exposition follows that in "Type Theory & Formal Proof" by Nederpelt & Geuvers (abbreviated TTFP throughout), with help from "Lambda Calculi with Types" by Henk Barendregt. Also highly recommended: Philip Wadler's talk Propositions as Types.
Recap
At this point, we've been extending the Simply Typed Lambda Calculus (\(\lambda\rightarrow\)) in different ways, looking at more expressive type systems that preserve the strong normalization guarantee of \(\lambda\rightarrow\). \(\lambda 2\) gave us polymorphism AKA terms depending on types. \(\lambda\underline{\omega}\) gave us functions from types to types AKA types depending on types. Here we'll be looking at one more extension: types depending on terms, otherwise known as dependent types.
Unlike \(\lambda 2\) & \(\lambda \underline{\omega}\), dependent types are not something seen in familiar languages like C++ or Rust. Because of that, and, frankly, because I've found them challenging to understand, I'm going to proceed more slowly than in the earlier posts.
However, that work will, I hope, pay off in the second half of this post, when we finally get to the point toward which I've been working all this time: using a type system as a framework for formal proof, in which proving things becomes an exercise in term finding (in fact, the "P" in \(\lambda P\) stands for "Predicate").
\(\lambda P\)
Motivational Example
Let's use a standard motivational example for dependent types: length-indexed lists. Suppose we're working in Haskell, we're using lists of Double to represent vectors, and we'd like to implement the venerable SAXPY operation (Scalar Alpha X Plus Y):
saxpy :: Double -> [Double] -> [Double] -> [Double]
saxpy alpha x y = zipWith (\xi yi -> alpha*xi + yi) x y
The problem here, of course, is: what if the two vectors are of different size? This implementation will happily return an answer (it will stop the "zip" operation at the end of the shorter of the two vectors), but that answer will be meaningless.
Of course, we could add a check to our implementation, before the zipWith invocation, and perhaps abort execution. Better would be to return Maybe [Double], with the convention that we'll return Nothing on error, but this would require our callers to unpack this at every call site (I jest; of course our callers won't always check). Really, what we'd like to do is make the length of the vector part of its type, and arrange to have invocations of saxpy on arguments of different lengths just be illegal.
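In Haskell itself, this idea can be approximated with the DataKinds and GADTs extensions, which let the length live at the type level. Here's a rough sketch of my own (the names Nat, Vec and vzipWith are mine, not from the post):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Type-level naturals (promoted by DataKinds) index the vector type.
data Nat = Z | S Nat

data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- zipWith restricted to two vectors of the *same* length n.
vzipWith :: (a -> b -> c) -> Vec n a -> Vec n b -> Vec n c
vzipWith _ VNil         VNil         = VNil
vzipWith f (VCons x xs) (VCons y ys) = VCons (f x y) (vzipWith f xs ys)

-- saxpy now rejects mismatched lengths at compile time.
saxpy :: Double -> Vec n Double -> Vec n Double -> Vec n Double
saxpy alpha = vzipWith (\xi yi -> alpha*xi + yi)

With this sketch, calling saxpy on vectors of different (type-level) lengths is a type error rather than a silently meaningless answer.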
In order to do this, we'd need a term in our type system something like:
\begin{equation} \lambda n:\mathbb{N}.\mathbb{V}_n \end{equation}where \(\mathbb{V}_n\) is the set of all vectors of length \(n\). Such a term could be regarded as an indexed type family in that it describes not a single type, but a family of types, each corresponding to some \(n\in \mathbb{N}\). Notably, the type of such a term would have to be \(\mathbb{N}\rightarrow\ast\), something heretofore illegal.
The Π Type
We'll represent indexed type families with a "Π-type": a term of the form \(\Pi x:A.M\) where \(A:\ast\) and \(M\) is a type. We could have a formation rule that looks something like:
\begin{equation} \textbf{(form-v0) } \frac{\Gamma\vdash A:\ast\quad\Gamma,x:A\vdash B:\square}{\Gamma\vdash\Pi x:A.B:\square} \end{equation}Cool, cool: we can express families of types indexed by some other type, and that mapping is itself \(\square\).
Now suppose we want to write a function that, given \(n\in \mathbb{N}\), returns the zero vector of length \(n\) (i.e. the vector \(\langle 0,0,\dots,0\rangle\) of length \(n\)). Call it zero. This would be a function from \(n\) to \(\mathbb{V}_n\), because it takes as its argument \(n\in \mathbb{N}\) and returns an element of \(\mathbb{V}_n\). We might try picking out that category of functions with a Π-term of the form \(\Pi n:\mathbb{N}.\mathbb{V}_n\). What, then, is the type of that Π-term?
Since we've moved "down" in some sense, to functions yielding values instead of types, we might guess this type to be ∗, and write a formation rule like:
\begin{equation} \textbf{(form-v1) } \frac{\Gamma\vdash A:\ast\quad\Gamma,x:A\vdash B:\ast}{\Gamma\vdash\Pi x:A.B:\ast} \end{equation}Schematically, if I've wrapped my brain around this correctly, we have:
\begin{align} & & &\lambda n:\mathbb{N}.\mathbb{V}_n &: \quad& \mathbb{N}\rightarrow\ast &: \quad &\square \nonumber\\ &\text{zero} &: \quad &\Pi n:\mathbb{N}.\mathbb{V}_{n} &: \quad &\ast &: \quad &\square \end{align}Π-types are a generalization of the arrow type in that they both describe mappings, but in an arrow type the term on the right-hand side of the arrow just doesn't depend on the value on the left. I imagine \(\Pi x:A.\underline{ }\) as representing "a mapping that takes \(x\) into the thing in the blank, which is defined in terms of \(x\) (\(\mathbb{V}_n\), e.g.)". But if that thing has no dependency on \(x\), say it's just \(\mathbb{Z}\), then all we're expressing is the type of mappings that take \(x\in A\) to \(\mathbb{Z}\)… or \(A\rightarrow \mathbb{Z}\).
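To make the dependency concrete (and anticipating the (appl) rule given below), consider applying zero to a particular natural number, say 3; the body of the Π-type gets instantiated with the argument:

\begin{equation} \text{zero}\,3 \;:\; \mathbb{V}_n[n:=3] = \mathbb{V}_3 \end{equation}With an ordinary arrow type, the result type could never mention the argument in this way.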
Derivation Rules
The reader has no doubt noted that the formation rules (form-v0) and (form-v1) differ only in the sort of thing that is produced. \(\lambda P\), like \(\lambda \underline{\omega}\), parameterizes numerous rules over sort (i.e. ∗ or \(\square\)). In fact, the derivation rules are quite similar to those of \(\lambda \underline{\omega}\), except that we're now swapping types dependent on types for types dependent on terms:
- (sort), (var), (weak) & (conv) are unchanged
- (form) & (abst) now produce Π-types rather than arrow types
- (appl) now works with Π-types rather than arrow types
It should be noted, however, that we'll still write \(A\rightarrow B\) for \(\Pi x:A.B\) when \(x\) does not occur free in \(B\) (as discussed above).
The complete set of derivation rules for \(\lambda P\):
\begin{align} &\textbf{(sort) } &\emptyset\vdash\ast:\square \nonumber\\ \nonumber\\ &\textbf{(var) } &\frac{\Gamma\vdash A:s}{\Gamma,x:A\vdash x:A}\text{ if }x\not\in\Gamma \nonumber\\ \nonumber\\ &\textbf{(weak) } &\frac{\Gamma\vdash A:B \quad \Gamma\vdash C:s}{\Gamma,x:C\vdash A:B}\text{ if }x\not\in\Gamma \nonumber\\ \nonumber\\ &\textbf{(form) } &\frac{\Gamma\vdash A:\ast \quad \Gamma, x:A\vdash B:s}{\Gamma\vdash \Pi x:A.B:s} \nonumber\\ \nonumber\\ &\textbf{(appl) } &\frac{\Gamma\vdash M:\Pi x:A.B \quad \Gamma\vdash N:A}{\Gamma\vdash MN:B[x:=N]}\nonumber\\ \nonumber\\ &\textbf{(abst) } &\frac{\Gamma,x:A\vdash M:B \quad \Gamma\vdash \Pi x:A.B:s}{\Gamma\vdash\lambda x:A.M:\Pi x:A.B} \nonumber\\ \nonumber\\ &\textbf{(conv) } &\frac{\Gamma\vdash A:B \quad \Gamma\vdash B^{\prime}:s}{\Gamma\vdash A:B^{\prime}}\text{ if }B\equiv_{\beta}B^{\prime} \end{align}
A Derivation in \(\lambda P\)
Nederpelt & Geuvers then present a non-trivial derivation in \(\lambda P\). I found it tough going, so I'm going to re-cast it here, augmenting it in two ways in the hopes of making it clearer to the reader.
Firstly, I'm going to break out a few frequently-used "sub-derivations" that were not obvious to me at first blush, and show them in tree form. I touched on them in my last post, but I'm going to gather them all here for reference.
First, starting from nothing, we can get a type into the context thusly:
Figure 1: Pattern 1: Introducing a type and deriving it
While \(\ast:\square\) holds in the empty context (by (sort)), we need to derive it in a non-empty context:
Figure 2: Pattern 2: Deriving that star is box in a non-empty context
I think of this one as "adding (to the context) at the next level":
Figure 3: Adding a term at the next level
I should note that Nederpelt & Geuvers pick out these three judgements, along with \(\alpha:\ast,\beta:\ast\vdash\alpha:\ast\) and \(\alpha:\ast,\beta:\ast\vdash\beta:\ast\) (both of which were derived in the last post), as being "very obvious" and hence not needing explicit derivation in subsequent flag format derivations. For myself, I'm not quite ready to take off the training wheels, so to speak, so I'll be retaining them in derivations throughout this post.
Let's begin working through their derivation (I'll get to my second addition in a moment):
Figure 4: \(\lambda P\) Example in Flag Format
This derivation should be fairly comfortable, up until the last step. In fact, everything up until that last step is also derivable in \(\lambda \underline{\omega}\). A term of the form \(A\rightarrow\ast\) (where \(A\) is itself ∗) is something new: a mapping from a type into the universe of types. Note that this is an abuse of our notation for Π-terms. Strictly speaking, this should be written \(\Pi x:A.\ast\), but since the right-hand side has no dependency on the left, we decay to simply \(A\longrightarrow\ast\).
With that, let's add a term \(P\) of that type to our derivation:
Figure 5: \(\lambda P\) Example in Flag Format
Now, to help fix ideas, I'm going to add something else to this exposition: a very simple worked example. Let's let \(A:=bool=\{false, true\}\). So what we've derived here is that, in \(\lambda P\), we can have a mapping from booleans to types, and the type of that mapping is \(\square\). What might such a mapping look like? Well, a trivial example would be:
\begin{equation} P(x)= \begin{cases} \mathbb{Z}, & \text{if }x=false \\ \mathbb{N}, & \text{if }x=true \end{cases} \end{equation}If we pick an \(x\) in \(A\), and apply \(P\) to it:
Figure 6: \(\lambda P\) Example in Flag Format
Yup, yup, yup… if \(P:A\rightarrow\ast\), and \(x:A\) then indeed we should expect \(Px\) to be a type (i.e. \(Px:\ast\)). Let's continue thinking through our little example. Let \(x=true\), then \(Px=\mathbb{N}\), which is indeed ∗– good. Now… let's abstract over \(x\) to get a "real" dependent product. We have that \(A\) is ∗, and we have that given an \(x\in A\), \(Px\) is also ∗, so by (form), we're allowed to write \(\Pi x:A.Px:\ast\):
Figure 7: \(\lambda P\) Example in Flag Format
Now, this took me a few minutes when I first read it, but what we have is: \(\Pi x:A.Px\) is a well-formed type (given that \(A:\ast\) and \(P:A\rightarrow\ast\)). It is the type of functions that map an \(x\in A\) into the type \(Px\). What might such a function look like? Here's a trivial example:
\begin{equation} P^{\prime}(x)= \begin{cases} -1\in\mathbb{Z}, & \text{if }x=false \\ 1\in\mathbb{N}, & \text{if }x=true \end{cases} \end{equation}So… \(P^{\prime}\) is a function that takes an \(x\in \{false,true\}\) and returns a value in the type to which \(x\) maps.
Alright, let's keep going. We're now going to show that \(Px\rightarrow Px\) is also a type, and build an inhabitant of that type:
Figure 8: \(\lambda P\) Example in Flag Format
Concluding our example, suppose we let \(x=false\). Then the result of applying \((\lambda x:A.\lambda y:Px.y)x\) is \(\lambda y:\mathbb{Z}.y\)– the identity function over \(\mathbb{Z}\). Similarly, applying this term to \(x=true\) will just yield the identity function over \(\mathbb{N}\). More generally, as \(x\) ranges over \(A\), applying this term to \(x\) will build the identity function over the type to which \(P\) maps \(x\). Cool.
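As an aside, the whole worked example can be approximated in Haskell with enough extensions. This is my own sketch (the names P, SBool, p' and identityAt are mine): a closed type family plays the role of \(P\), and a singleton type lets a run-time boolean pick out the type-level index:

{-# LANGUAGE DataKinds, TypeFamilies, GADTs, KindSignatures #-}
import Data.Kind (Type)
import Numeric.Natural (Natural)

-- P maps each type-level boolean to a type: False to Integer, True to Natural.
type family P (b :: Bool) :: Type where
  P 'False = Integer
  P 'True  = Natural

-- A singleton for Bool: a run-time value whose type records which boolean it is.
data SBool (b :: Bool) where
  SFalse :: SBool 'False
  STrue  :: SBool 'True

-- p' plays the role of P': given x, it returns a value in the type P x.
p' :: SBool b -> P b
p' SFalse = -1   -- an Integer
p' STrue  = 1    -- a Natural

-- identityAt mirrors (λx:A. λy:Px. y) : Πx:A. Px → Px
identityAt :: SBool b -> P b -> P b
identityAt _ y = y

Here identityAt SFalse is the identity on Integer and identityAt STrue the identity on Natural, mirroring how the λP term builds the identity function over whichever type \(P\) maps \(x\) to.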
Propositions as Types
Alright, we're coming-up to the reason I'm interested in all of this: type systems as proof checkers. I'm going to show, or rather, explain what Nederpelt & Geuvers show: that \(\lambda P\) can express a certain logic, that propositions in that logic are types in \(\lambda P\), and that proving those propositions is an exercise in term finding.
That said, since I haven't said much about logic so far, set aside all the type theory for a few moments, and let's talk about the logic side of things.
A Very Simple Logic
Our logic is called minimal predicate logic. It provides two operations: implication & universal quantification. The former means we can express the fact that if some proposition \(A\) is true, then some other proposition \(B\) must be true (this is written \(A\Rightarrow B\)). The latter means that we can say, for some set \(A\), that a proposition indexed by \(A\) is true for all of its elements: \(\forall x\in A.P(x)\). We call such a family of propositions a predicate. If \(n\in \mathbb{N}\), then the statement "\(n\) is prime" is a predicate over \(\mathbb{N}\)– any given \(n\) produces a particular proposition which may be true or false.
In formal logic, each operation comes with one or more introduction rules and one or more elimination rules. A formal proof is very much like a derivation in that it begins with assumptions and/or axioms and ends with that which is to be shown. In between, the steps are connected by these rules. I'll introduce the rules for minimal predicate logic, then work a few examples to fix ideas.
For implication we have:
\begin{align} \label{eq:6} &(\Rightarrow\text{-intro }) &\frac{\text{Assume: A}\dots B}{A\Rightarrow B} \nonumber\\ &(\Rightarrow\text{-elim }) &\frac{A\Rightarrow B \quad A}{B} \end{align}In other words, if we assume \(A\), and under that assumption derive \(B\), we are entitled to conclude \(A\Rightarrow B\). Conversely, if \(A\Rightarrow B\) and \(A\) are known to be true, we may write \(B\).
For universal quantification we have:
\begin{align} \label{eq:7} &(\forall\text{-intro }) &\frac{\text{Let }x\in S\dots P(x)}{\forall x\in S.P(x)} \nonumber\\ &(\forall\text{-elim }) &\frac{\forall x\in S.P(x) \quad N\in S}{P(N)} \end{align}Here, if we assume \(x\) is in S, and under that assumption derive \(P(x)\) then we can conclude that for all \(x\in S\), \(P(x)\) is true. Conversely, if we know that \(\forall x\in S.P(x)\) then we may conclude \(P(N)\) for any \(N\in S\).
Let's work an example. We'll prove that \(A\Rightarrow((A\Rightarrow B)\Rightarrow B)\) is a tautology (i.e. holds with no external assumptions; is always true). We'll start with a derivation in tree form. The superscripts indicate where assumptions are made and discharged (via introduction rules):
Figure 9: Deduction in Tree Form
We begin by assuming both \(A\Rightarrow B\) and \(A\) (at (1) & (2), respectively); both assumptions are given superscripts. \((\Rightarrow\text{-elim})\) then allows us to conclude \(B\) at the point marked (3). With that, we can discharge assumption 1 (the assumption \(A\Rightarrow B\)) and conclude \((A\Rightarrow B)\Rightarrow B\) at the point marked (4). That, in turn, allows us to discharge assumption 2 (the assumption \(A\)) at the point marked (5) and conclude our proof.
Deduction can also be performed in flag format:
Figure 10: Deduction in Flag Format
While I found derivations in tree form a bit easier to follow than flag format, when it comes to deduction I actually found flag format easier to read. This is exercise 5.5 in TTFP, by the way.
An Encoding of Logic in \(\lambda P\)
Now: how can we represent minimal predicate logic in \(\lambda P\)? The fundamental idea is to represent propositions as types. A proposition will be said to be true if and only if the corresponding type is inhabited. The proposition \(False\) corresponds to the type with no elements; conversely, if a proposition is true, then an inhabitant of the corresponding type is called a proof term. In other words, we have:
\begin{equation} \label{eq:8} \frac{\text{propositions}}{\text{proofs}}\equiv\frac{\text{types}}{\text{programs}} \end{equation}This, crudely, is the famous Curry-Howard correspondence, but for now let's stay focused on encoding minimal predicate logic in \(\lambda P\).
We'll also use types to represent sets of objects, such as the booleans or the natural numbers. Elements of set \(S\) are terms of type \(S\); i.e. \(x\in S \leftrightarrow x:S\). Such types are ∗.
A predicate over a set \(S\) is a mapping from \(S\) to the set of all propositions. Therefore, for a predicate \(P\) over \(S\), we have \(P:S\rightarrow\ast\). If \(x\in S\), then \(Px:\ast\); \(Px\) is a type which may or may not be inhabited. In the former case, the proposition \(Px\) is true, and in the latter false.
We'll model implication with arrow types; this is consistent with our intuition so far, since if \(A\Rightarrow B\), then the type corresponding to \(B\) is inhabited whenever that corresponding to \(A\) is. Therefore, whenever \(A\Rightarrow B\), we can find a function with the type \(f:A\rightarrow B\); i.e. \(A\rightarrow B\) is inhabited. And vice versa.
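For instance, the tautology \(A\Rightarrow A\) corresponds to the arrow type \(A\rightarrow A\), which is inhabited by the identity function:

\begin{equation} \lambda x:A.x \;:\; A\rightarrow A \end{equation}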
Finally, we'll model universal quantification as a Π-type. This, too, makes sense: if \(P(x)\) is true \(\forall x \in S\), then for each \(x\) the type corresponding to \(P(x)\) is inhabited. Thus we can construct a function mapping each \(x\) to an inhabitant of the type \(Px\). Such a function would then have type \(\Pi x:S.Px\).
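Since the whole point of this series is to get to a proof assistant, here is the same dictionary written down in Lean 4 (a close cousin of Coq). This is my own sketch, with Lean's Prop standing in for ∗ on the proposition side:

variable (S : Type)       -- a set, modeled as a type
variable (A B : Prop)     -- propositions, modeled as types
variable (P : S → Prop)   -- a predicate over S: a mapping from S into propositions

-- Implication is the arrow type: a proof of A ⇒ B is a function from proofs of A to proofs of B.
example (h : A → B) (a : A) : B := h a

-- Universal quantification is the Π-type: a proof of ∀ x ∈ S. P(x) maps each x : S to a proof of P x.
example (h : ∀ x : S, P x) (n : S) : P n := h n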
Proof as Computation
Let's consider the following theorem in the minimal predicate logic:
\begin{equation} \label{eq:9} (\forall x\in S.\forall y\in S.Q(x,y))\Rightarrow\forall u\in S.Q(u,u) \end{equation}Let's start by proving this via natural deduction, using the flag format:
Figure 11: Proof of Theorem 12 Via Natural Deduction
Now, let's re-cast that theorem as a problem in type theory. At the top level, it's an implication, so we're going to have an arrow type:
\begin{align} \label{eq:11} &\text{type} &\rightarrow &\quad\text{type} \nonumber\\ &(\forall x\in S.\forall y\in S.Q(x,y)) &\rightarrow &\quad(\forall u\in S.Q(u,u)) \end{align}Now, within each pair of parentheses, we have a universal quantification, so they'll each map to a Π-type. The only thing not straightforward is the double-quantification on the left-hand side due to the fact that \(Q\) is a predicate over two variables. We'll model that by Currying:
\begin{align} \label{eq:12} &\text{type} &\rightarrow &\quad\text{type} \nonumber\\ &(\Pi x:S.\Pi y:S.Qxy) &\rightarrow &\quad(\Pi u:S.Quu) \end{align}and so our problem is to find the question marks in:
\begin{equation} \label{eq:13} ?\vdash?: (\Pi x:S.\Pi y:S.Qxy) \rightarrow (\Pi u:S.Quu) \end{equation}We can set up the skeleton of our derivation by noting that \(S\) & \(Q\) both occur free in that judgement, so they'll need to be in the context:
Figure 12: Skeleton Derivation of Theorem 12
We now proceed in the same way that we did in the last post: at each step, we examine the type of the term marked by the "?", deduce from that the derivation rule that must be applied to reach that conclusion, and infer from that the form of the term that will go in place of the question mark, along with any new variables to be introduced and/or the type of the term that must then appear on the previous line. For instance, in this case, we see that the type of the term marked "?" above is an arrow type. The only way we could arrive at this point is to introduce a variable of the type on the left-hand side of the arrow, derive a term of the type on the right-hand side, and apply (abst), winding up with a term of the form \(\lambda z:(\Pi x:S.\Pi y:S.Qxy).\underline{ }\). Here I'm going to change my convention slightly: since the form of the final term is now known up to the abstraction body, I'll replace that body with … to indicate that we'll fill it in with whatever term is found for the "?" now written on the line above:
Figure 13: Derivation of Theorem 12 – First Steps
(4) above is again a Π-type, and so again must be a product of an application of (abst) over a variable of type \(S\), and producing a term of type \(Quu\):
Figure 14: Derivation of Theorem 12, Continued
Now we need to find a term of type \(Quu\) without adding any new variables to the context (otherwise, we wouldn't be proving the given theorem). But we have such a term immediately: \(zuu\):
Figure 15: Derivation of Theorem 12, Completed
For myself, I was able to follow the explanation in TTFP, and even to reproduce it for myself. However, I found it quickly became an abstract game in which I applied the various derivation rules to get the symbols to work out, completely detached from any logical implications. And perhaps that's one of the appeals of this approach: to render natural deduction in a system in which a computer program can more readily operate. Still. I felt the need to go back over it all in a few ways.
Firstly, what is this term? Whelp… it's a function:
\begin{align} \label{eq:14} &\{\underline{\Pi x:S.\Pi y:S.Qxy}\} &\longrightarrow &\quad \{\underline{\Pi u:S.Quu}\} \nonumber\\ & \qquad \quad \mathscr{X} & & \qquad \quad \mathscr{Y} \end{align}The term marked \(\mathscr{X}\) is the set of functions mapping \(x\in S,y\in S\) into the type \(Qxy\). The term marked \(\mathscr{Y}\) is the set of functions mapping \(u\in S\) into the type \(Quu\). So, a member of this arrow type, given a function in \(\mathscr{X}\), must produce a function in \(\mathscr{Y}\). Therefore,
- given a function \(z\in \mathscr{X}\)
- and given a \(u\in S\)
we need a formula to produce a value in \(Quu\). The term \(zuu\) is just such a formula.
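As a sanity check of my own (this is not in TTFP), the proof term we just constructed can be handed to Lean 4, which happily type-checks it:

-- The λP term λz:(Πx:S.Πy:S.Qxy). λu:S. z u u, with Lean's ∀ playing the role of Π.
example (S : Type) (Q : S → S → Prop) :
    (∀ x : S, ∀ y : S, Q x y) → (∀ u : S, Q u u) :=
  fun z => fun u => z u u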
Secondly, I wanted to go back & look for parallels between the natural deduction proof of this theorem and the type theoretical proof. Here's the deductive proof:
and here's a version of the type theoretical proof (with the assumptions leading to the final context removed):
We see a number of parallels; in fact, the proofs correspond to one another line-for-line. In each line, the deductive proof either assumes or derives a logical statement, and in the corresponding line of the term-finding proof, we are either assuming or deriving a term in the type representing the corresponding proposition. We see, too, that there's a correspondence between \(\forall-elim\) and (appl), and between \(\forall-intro\) and (abst). That said, the type theoretical proof doesn't just prove the given theorem, but also produces the actual proof term, from which one could reconstruct the entire derivation.
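Summarizing the correspondences we've seen so far (my own tabulation of what TTFP spells out):

\begin{align} \text{proposition} &\quad\leftrightarrow\quad \text{type} \nonumber\\ \text{proof} &\quad\leftrightarrow\quad \text{inhabitant (proof term)} \nonumber\\ \text{assumption} &\quad\leftrightarrow\quad \text{variable in the context} \nonumber\\ \Rightarrow\text{-intro},\ \forall\text{-intro} &\quad\leftrightarrow\quad \textbf{(abst)} \nonumber\\ \Rightarrow\text{-elim},\ \forall\text{-elim} &\quad\leftrightarrow\quad \textbf{(appl)} \end{align}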
I worked out similar analyses for two of the exercises in TTFP, but this post is getting a bit long. Let's take a step back.
Where Are We?
We've augmented \(\lambda\rightarrow\) to be able to write more general function types: functions from a set in ∗ into ∗ itself (indexed type families), and functions from that set into the members of such a family (dependent functions). This is sometimes called "types depending on terms". This has led to a much more expressive system; expressive enough to begin, via Curry-Howard, to use it as a formal proof system.
The reader may well ask: why? Why not simply use natural deduction? I certainly don't know enough to give a general answer, though I gather the fact that we're actually constructing proof terms in the type-theoretical approach is part of it. According to this very informative discussion on MathOverflow, the competing approach for formalizing mathematical statements is set theory (presumably augmented with some kind of predicate logic), and to date the type theoretic approach has proved more practical (although it is admitted that it has also received far more effort, in terms of building practical proof assistants).
Regardless, this whole series of posts was kicked off by my wanting to understand Coq, and this is the approach used by Coq; the above-linked discussion notes "…Mizar is based on set theory, while HOL, Lean, Coq, and Agda use type theory in the kernel."
With this, we've explored the three dimensions of Barendregt's λ Cube. In the next post, we'll introduce that, briefly go over the type system \(\lambda C\), or the Calculus of Constructions, which resides at the far corner of the cube, and finally mention the Calculus of Inductive Constructions: the extension of \(\lambda C\) actually used by Coq.
05/28/24 18:25