The Second Order Typed Lambda Calculus


Terms depending on types

Abstract

This post is part of a series on how to reason about code. It describes the Second Order Typed Lambda Calculus, a precursor to the formal language understood by Coq, an interactive proof assistant I'm learning. Prior posts:

This exposition follows closely that in "Type Theory & Formal Proof" by Nederpelt & Geuvers (abbreviated TTFP throughout), with help from "Lambda Calculi with Types" by Henk Barendregt and a nice post I came across entitled "A Guide to Typed Lambda Calculus".

Recap

So far, we've introduced the λ Calculus as a very general formal model of computation. We saw that we could write down λ terms for which computation never terminated, and we saw that in his 1940 paper "A Formulation of the Simple Theory of Types" Church added a type system to the λ Calculus that ruled out such terms (this was the subject of the last post). The resulting system is called the Simply Typed λ Calculus, or \(\lambda\rightarrow\), and while it has the very nice property that all terms are strongly normalizing (in other words, their reduction to normal form is guaranteed to terminate), it is not terribly expressive.

While it's frequently noted that it is capable of encoding the extended polynomials over \(\mathbb{N}\), it was never clear to me whether that was exactly the set of things expressible in \(\lambda\rightarrow\), or whether it was a subset. "Lectures on the Curry-Howard Isomorphism" states that "The \(\lambda\rightarrow\)-definable functions are exactly the extended polynomials [over \(\mathbb{N}\)]" with the proviso that "If one does not insist that numbers be uniformly represented as terms of type int, more functions become \(\lambda\rightarrow\)-definable…" I'm not sure how to interpret that.

I found this very nice MathOverflow comment, referencing a result from the paper "On the Expressive Power of Simply Typed and Let-Polymorphic Lambda Calculi" (Hillebrand & Kanellakis, 1996) to the effect that \(\lambda\rightarrow\) expresses exactly the regular languages on \(\{0,1\}\). That is to say, the languages on the alphabet \(\{0,1\}\) that can be recognized by a finite-state machine, or, equivalently, regular expressions.

Regardless, there are many sorts of mathematical objects we would like to represent beyond this. This has led to efforts in the decades since to develop different type systems on top of the \(\lambda\) Calculus that, on the one hand, preserve \(\lambda\rightarrow\)'s strong normalization, but on the other expand the class of representable things. There have been many such excursions, but in this series we're going to limit ourselves to three that are particularly significant.

The subject of this post will be the first of the three; it was discovered in 1972 by Jean-Yves Girard in his PhD thesis & he called it "System F". John Reynolds independently developed it in 1974, though he called it the polymorphic lambda calculus. Today it is also known as second-order lambda calculus, or \(\lambda 2\).

\(\lambda 2\)

We'll start with a motivational example: the identity function. We saw this function in the untyped \(\lambda\) Calculus: \(\lambda x.x\). With the advent of types in \(\lambda\rightarrow\), this became an infinite set of identity functions, one for each type: \(\lambda x:\alpha.x:\alpha\rightarrow\alpha\), \(\lambda x:\beta.x:\beta\rightarrow\beta\), \(\lambda x:\gamma.x:\gamma\rightarrow\gamma\), and so on.

It's unsatisfying to have all these variants of what are structurally the same function floating around as independent things; it would be nice to be able to write something like \(\Lambda\alpha:\ast.\lambda x:\alpha.x\). We can think of this as the "polymorphic identity function". We should note that polymorphism is an overloaded word in type theory and that this is, strictly speaking, parametric polymorphism; it corresponds to the following in C++:

template <typename T>
T id(T x) {
    return x;
}

in that we're writing a general-purpose function and the type of the thing to which it is being applied is encoded as a type variable in the implementation. We're not doing:

int id(int x) {
    return x;
}
float id(float x) {
    return x;
}
// ...

and so on for some ad-hoc collection of types.

This, of course, raises the question of what the type of \(\Lambda\alpha:\ast.\lambda x:\alpha.x\) should be. Following Nederpelt & Geuvers, we might naively reason that since it accepts arguments of type ∗ and returns functions of type \(\alpha\rightarrow\alpha\), we should type it as \(\ast\rightarrow(\alpha\rightarrow\alpha)\). However, a little pressure-testing will show that this can't be right. This would yield a different type for the term \(\Lambda\beta:\ast.\lambda x:\beta.x\) (namely \(\ast\rightarrow(\beta\rightarrow\beta)\)). Going all the way back to α-conversion, we have identified terms that differ only in their binders and corresponding bound variables. Since the terms

\begin{align} &\Lambda\alpha:\ast.\lambda x:\alpha.x \nonumber\\ &\Lambda\beta:\ast.\lambda x:\beta.x \nonumber \end{align}

differ only in their bound variables (α & β respectively), we'd like these two terms to be α-equivalent and to have the same type.

To solve this, we need to extend our language for expressing types beyond simple types & arrow types. In order to express "a thing that takes an arbitrary type α and returns a thing of type \(\alpha\rightarrow\alpha\)", we'll write \(\Pi\alpha:\ast.\alpha\rightarrow\alpha\).
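With this notation in hand, both of the α-variant identity functions above receive the same type (up to renaming of the bound type variable):

\begin{align} &\Lambda\alpha:\ast.\lambda x:\alpha.x\ :\ \Pi\alpha:\ast.\alpha\rightarrow\alpha \nonumber\\ &\Lambda\beta:\ast.\lambda x:\beta.x\ :\ \Pi\beta:\ast.\beta\rightarrow\beta \nonumber \end{align}

Since \(\Pi\alpha:\ast.\alpha\rightarrow\alpha\) and \(\Pi\beta:\ast.\beta\rightarrow\beta\) differ only in their bound type variables, they are α-equivalent, which is exactly what we wanted.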

That's it– we're now ready to proceed along the lines of our \(\lambda\rightarrow\) development. We once again give ourselves an infinite set of ordinary variables \(\mathscr{V} = \{a,b,c,\dots\}\) and a second infinite set of type variables \(\mathbb{V} = \{\alpha,\beta,\gamma,\dots\}\).

Now we define a larger set of types \(\mathbb{T}2\) inductively as:

  1. if \(\alpha\in \mathbb{V}\) then \(\alpha\in \mathbb{T}2\) (i.e. every type variable is itself a type)
  2. if \(\sigma,\tau\in \mathbb{T}2\) then \(\sigma\rightarrow\tau\in \mathbb{T}2\) (so we still have arrow types; together with rule 1, this means every simple type is in \(\mathbb{T}2\))
  3. if \(\alpha\in \mathbb{V}\) and \(T\in \mathbb{T}2\), then the Π-type \(\Pi\alpha:\ast.T\) is in \(\mathbb{T}2\) (a few examples follow below)
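To make these rules concrete, here are a few types they admit into \(\mathbb{T}2\) (assuming \(\alpha,\beta\in\mathbb{V}\)):

  1. \((\alpha\rightarrow\beta)\rightarrow\alpha\) (a simple type, by rules 1 & 2)
  2. \(\Pi\alpha:\ast.\alpha\rightarrow\alpha\) (the type we wrote down for the polymorphic identity, by rule 3)
  3. \(\Pi\alpha:\ast.\Pi\beta:\ast.(\alpha\rightarrow\beta)\rightarrow\alpha\rightarrow\beta\) (rule 3 may be applied repeatedly)
  4. \((\Pi\alpha:\ast.\alpha\rightarrow\alpha)\rightarrow\beta\) (a Π-type may itself sit to the left of an arrow, by rule 2)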

Define the set of pre-typed \(\lambda 2\) terms as:

\begin{equation} \Lambda_{\mathbb{T}2} = \mathscr{V}|(\Lambda_{\mathbb{T}2}\Lambda_{\mathbb{T}2})| (\Lambda_{\mathbb{T}2} \mathbb{T}2)|(\lambda \mathscr{V}:\mathbb{T}2.\Lambda_{\mathbb{T}2})| (\Lambda \mathbb{V}:\ast.\Lambda_{\mathbb{T}2}) \end{equation}

This is an extension of \(\lambda\rightarrow\) in that we have all the terms we defined before, but we've added application & abstraction over types. We'll also need to get a little stricter regarding how we define contexts. In \(\lambda\rightarrow\) a context was just a list of declarations of the form \(x:\sigma\) where \(x\) was an ordinary variable and \(\sigma\) a simple type. Now we require that type variables be declared as well (as \(\alpha:\ast\)), and that every type variable be declared before it is used.
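For example, \(\alpha:\ast,\ x:\alpha\rightarrow\alpha,\ \beta:\ast,\ y:\beta\) is a well-formed \(\lambda 2\) context, whereas \(x:\alpha\rightarrow\alpha,\ \alpha:\ast\) is not, since the declaration of \(x\) uses \(\alpha\) before \(\alpha\) itself has been declared.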

This brings us to derivation rules. We'll need a rule for abstracting over types instead of terms:

\begin{equation} \textbf{(abst2) } \frac{\Gamma,\alpha:\ast\vdash M:A}{\Gamma\vdash\Lambda\alpha:\ast.M:\Pi\alpha:\ast.A} \end{equation}

In other words, given a term such as \(\lambda x:\alpha.x\), we can abstract over the type α to get a new term whose type is a Π-type. I've tried to distinguish abstraction over types from abstraction over terms by using \(\Lambda\) instead of \(\lambda\).
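To see (abst2) in action, here is a sketch of a small derivation showing that the polymorphic identity function gets the Π-type we wanted for it; the (var) and (abst) steps are the familiar \(\lambda\rightarrow\) rules, repeated in the full list below:

\begin{align} &\alpha:\ast,\ x:\alpha\vdash x:\alpha &&\textbf{(var)}\nonumber\\ &\alpha:\ast\vdash\lambda x:\alpha.x:\alpha\rightarrow\alpha &&\textbf{(abst)}\nonumber\\ &\vdash\Lambda\alpha:\ast.\lambda x:\alpha.x:\Pi\alpha:\ast.\alpha\rightarrow\alpha &&\textbf{(abst2)} \nonumber \end{align}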

We'll also need a derivation rule for the application of Π-types to other types:

\begin{equation} \textbf{(appl2) } \frac{\Gamma\vdash M:\Pi\alpha:\ast.A \quad \Gamma\vdash B:\ast} {\Gamma\vdash M[B]:A[\alpha := B]} \end{equation}

Again, I've tried to distinguish application to types from application to terms by writing \(M[B]\) instead of simply \(MB\).

There's only one small complication here. The second premise to (appl2) is \(\Gamma\vdash B:\ast\) and so far we have no derivation rule enabling us to conclude that a given expression \(B\) has type ∗, i.e. that it is a well-formed type. For this, we introduce one more (new) rule, the "formation rule" (so named, I gather, because it enables us to prove terms are well-formed):

\begin{aligned} \textbf{(form) } &\Gamma\vdash B:\ast\enspace\text{ if }\Gamma\text{ is a }\lambda 2\text{ context, }B\in\mathbb{T}2\text{,}\nonumber\\ &\text{and all free type variables in }B\text{ are declared in }\Gamma \end{aligned}
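With (form) in place, here is a sketch of (appl2) and (form) working together to instantiate the polymorphic identity at a specific type, say in the context \(\Gamma = \{\beta:\ast\}\):

\begin{align} &\Gamma\vdash\Lambda\alpha:\ast.\lambda x:\alpha.x:\Pi\alpha:\ast.\alpha\rightarrow\alpha &&\text{(derived as above, now under }\Gamma\text{)}\nonumber\\ &\Gamma\vdash\beta:\ast &&\textbf{(form)}\nonumber\\ &\Gamma\vdash(\Lambda\alpha:\ast.\lambda x:\alpha.x)[\beta]:\beta\rightarrow\beta &&\textbf{(appl2)} \nonumber \end{align}

The type in the conclusion is just \((\alpha\rightarrow\alpha)[\alpha:=\beta]\), i.e. \(\beta\rightarrow\beta\).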

Here is the complete set of derivation rules for \(\lambda 2\):

\begin{align} \textbf{(var) } &\Gamma\vdash x:\sigma\text{ if }\Gamma\text{ is a }\lambda 2\text{ context and }x:\sigma\in\Gamma\nonumber\\ \nonumber\\ \textbf{(appl) } &\frac{\Gamma\vdash M:\sigma\rightarrow\tau \quad \Gamma\vdash N:\sigma}{\Gamma\vdash M N:\tau}\nonumber\\ \nonumber\\ \textbf{(abst) } &\frac{\Gamma,x:\sigma\vdash M:\tau}{\Gamma\vdash\lambda x:\sigma.M:\sigma\rightarrow\tau}\nonumber\\ \nonumber\\ \textbf{(form) } &\Gamma\vdash B:\ast\text{ if }\Gamma\text{ is a }\lambda 2\text{ context, }B\in\mathbb{T}2\nonumber\\ &\text{ and all free type variables in }B\text{ are declared in }\Gamma \nonumber\\ \nonumber\\ \textbf{(appl2) } &\frac{\Gamma\vdash M:(\Pi\alpha:\ast.A) \quad \Gamma\vdash B:\ast} {\Gamma\vdash M[B]:A[\alpha:=B]}\nonumber\\ \nonumber\\ \textbf{(abst2) } &\frac{\Gamma,\alpha:\ast\vdash M:A} {\Gamma\vdash\Lambda\alpha:\ast.M:\Pi\alpha:\ast.A} \end{align}

In order to solidify our understanding, let's work through an example in detail. Let's find a term in \(\Lambda_{\mathbb{T}2}\) that inhabits the type:

\begin{equation} \Pi\delta:\ast.((\alpha\rightarrow\gamma)\rightarrow\delta)\rightarrow(\alpha\rightarrow\beta)\rightarrow(\beta\rightarrow\gamma)\rightarrow\delta \end{equation}

assuming a context \(\Gamma:=\{\alpha:\ast,\beta:\ast,\gamma:\ast\}\) (this is exercise 3.6(b) in TTFP). Because this is a long one, I'm going to try a different exposition: I've typeset my solution pretty much just the way I worked it out with paper & pen. The numbering on the right-hand side indicates the order in which I wrote down the lines. If the reader really wants to understand how these derivations work, however, I've found there's no substitute for taking paper & pen and working it out for oneself. Here's my work:

[Figure: lambda2-example-derivation.svg]

Figure 1: Term-Finding in \(\lambda 2\)

I began by writing down line (1) (the given context), and since the target type is a Π-type, the only rule that could possibly yield a term of that type is (abst2), so I gave myself a type δ on line (2) & a Λ-abstraction for the line marked (14).

You'll see that I left the term over which we are abstracting on (14) empty (well, marked with a "?"). I could see that whatever it was, it had an arrow type; I knew this from reading the target type after the Π binder on line (14). Therefore, I added a variable, named \(x\) at random, of type equal to the first component of the arrow type at line (3), and on line (13) wrote down what I knew about what I needed for the "?" on line (14): it had to be an abstraction binding \(x:(\alpha\rightarrow\gamma)\rightarrow\delta\). I just marked the function body "?" and noted the necessary type once again. In other words, the term on line (13) should fit the "?" on line (14).

The same game is played out in the pairs (4) & (12) and (5) & (11). In each case, the unknown term on the second line in each pair had an arrow type, so I concluded that, whatever else it might be, it was going to be an abstraction with a binder of whatever type appeared on the left-hand side of the arrow.

By line (11), I had the problem reduced to finding some term, any term, that had type δ. Now, one might think I could just throw \(v:\delta\) into my context, but that would mean I'd wind up with a new variable in Γ at the end (since \(v\) wouldn't be bound anywhere), which violates the problem statement. What I needed to do was somehow combine the terms available to me at that point to produce a term of type \(\delta\).

Of course, if you stare at it long enough, you might "just see" a solution, but I'd like to describe a reproducible process. Bear in mind that I had not yet written down line (6) at this point. Examining my flags (i.e. the state of my context at this point), I could only see one possible way to produce a thing of type δ: apply \(x\) to something of type \(\alpha\rightarrow\gamma\). I wrote down line (10) to indicate that.

Now I needed a term of type \(\alpha\rightarrow\gamma\)– a term abstraction. This meant I needed a flag, giving me lines (6) & (9).

Now I needed a way to produce something of type \(\gamma\) (the body of my abstraction on line (9)). Again, the only way available to me was to apply \(z:\beta\rightarrow\gamma\), but in order to use it, I needed a term of type β– this is line (8). Finally, we have \(u:\alpha\) and \(y:\alpha\rightarrow\beta\) in scope, so \(yu\) is the term of type β we need, written down on line (7).

I'll leave it as an exercise to the reader to back-fill the question marks to show that

\begin{align} \Lambda\delta:\ast.\lambda x:(\alpha\rightarrow\gamma)\rightarrow\delta. \lambda y:\alpha\rightarrow\beta.\lambda z:\beta\rightarrow\gamma. x(\lambda u:\alpha.z(yu)):\nonumber\\ \Pi\delta:\ast.((\alpha\rightarrow\gamma)\rightarrow\delta)\rightarrow(\alpha\rightarrow\beta)\rightarrow(\beta\rightarrow\gamma)\rightarrow\delta \end{align}

Where Are We?

We've extended \(\lambda\rightarrow\) by allowing abstraction over types as well as terms, along with the corresponding application to types as well as terms. This is sometimes referred to by the shorthand "terms depending on types" in that the \(\lambda 2\)-term \(\Lambda\alpha:\ast.\lambda x:\alpha.x\) must first be applied to a type before it can be used for further computations.
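To make "terms depending on types" concrete: the reduction rule for type application (which I haven't spelled out formally here, but which works just like ordinary β-reduction, with the substitution happening in the type) is \((\Lambda\alpha:\ast.M)[B]\rightarrow_\beta M[\alpha:=B]\). So for any type \(\sigma:\ast\) and term \(N:\sigma\):

\begin{align} (\Lambda\alpha:\ast.\lambda x:\alpha.x)[\sigma]\ N &\rightarrow_\beta (\lambda x:\sigma.x)\ N \nonumber\\ &\rightarrow_\beta N \nonumber \end{align}

In the C++ analogy from earlier, this is roughly the difference between the template id itself and a particular instantiation like id<int>, which is what actually gets called.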

Something about which I'm still confused is the expressive power of \(\lambda 2\); in fact, it's not obvious to me that it's any more expressive than \(\lambda\rightarrow\) (i.e. it could simply be a nicer way of writing down the same things). I know that Girard originally developed it (under the name System F) to study second-order arithmetic, but I can't say whether what he was expressing could also have been expressed in \(\lambda\rightarrow\), albeit less conveniently. Barendregt, in "Lambda Calculi with Types", states that \(\lambda\rightarrow\) corresponds to intuitionistic propositional logic (with only → as a connective) and \(\lambda 2\) to second-order propositional logic with only → and ∀ as primitives (although ¬, ∧, ∨, and ∃ are definable from these), so I gather that \(\lambda 2\) is more expressive; I just can't say exactly how.

In the next post, we'll cover a different extension to \(\lambda\rightarrow\), namely \(\lambda\underline{\omega}\).

05/23/24 13:51