Saturday, January 14, 2017

Intersection Types as Denotations

In my previous post I described a simple denotational semantics for the CBV lambda calculus in which the meaning of a \(\lambda\) function is a set of tables. For example, here is a glimpse at some of the tables in the meaning of \(\lambda x. x+2\).

\[ E[\!| (\lambda x. x+2) |\!](\emptyset) = \left\{ \begin{array}{l} \emptyset, \\ \{ 5\mapsto 7 \},\\ \{ 0\mapsto 2, 1 \mapsto 3 \},\\ \{ 0\mapsto 2, 1\mapsto 3, 5 \mapsto 7 \}, \\ \vdots \end{array} \right\} \]

Since then I've been reading the literature starting from an observation by Alan Jeffrey that this semantics seems similar to the domain logic in Abramsky's Ph.D. thesis (1987). That in turn pointed me to the early literature on intersection types, which were invented in the late 1970's by Coppo, Dezani-Ciancaglini, Salle, and Pottinger. It turns out that one of the motivations for intersection types was to create a denotational semantics for the lambda calculus. Furthermore, it seems that intersection types are closely related to my simple denotational semantics!

The intersection types for the pure lambda calculus included function types, intersections, and a top type: \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \] For our purposes we shall also add singleton types for numbers. \[ A,B,C ::= A \to B \mid A \wedge B \mid \top \mid n \] So the number \(2\) has the singleton type \(2\) and any function that maps \(0\) to \(2\) will have the type \(0 \to 2\). Any function that maps \(0\) to \(2\) and also maps \(1\) to \(3\) has the intersection type \[ (0 \to 2) \wedge (1 \to 3) \] These types are starting to look a lot like the tables above! Indeed, even the empty table \(\emptyset\) corresponds to the top type \(\top\), they both can be associated with any \(\lambda\) function.

The addition of the singleton number types introduces a choice regarding the top type \(\top\). Does it include the numbers and functions or just functions? We shall go with the later, which corresponds to the \(\nu\) type in the literature (Egidi, Honsell, Rocca 1992).

Now that we have glimpsed the correspondence between tables and intersection types, let's review the typing rules for the implicitly typed lambda calculus with singletons, intersections, and \(\top\).

\begin{gather*} \frac{}{\Gamma \vdash n : n} \\[2ex] \frac{}{\Gamma \vdash \lambda x.\,e : \top}(\top\,\mathrm{intro}) \quad \frac{\Gamma \vdash e : A \quad \Gamma \vdash e : B} {\Gamma \vdash e : A \wedge B}(\wedge\,\mathrm{intro}) \\[2ex] \frac{\Gamma \vdash e : A \quad A <: B} {\Gamma \vdash e : B}(\mathrm{Sub}) \quad \frac{x:A \in \Gamma}{\Gamma \vdash x : A} \\[2ex] \frac{\Gamma,x:A \vdash B} {\Gamma \vdash \lambda x.\, e : A \to B} \quad \frac{\Gamma \vdash e_1: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B}(\to\mathrm{elim}) \end{gather*} where subtyping is defined as follows \begin{gather*} \frac{}{n <: n} \quad \frac{}{\top <: \top} \quad \frac{}{A \to B <: \top} \quad \frac{A' <: A \quad B <: B'} {A \to B <: A' \to B'} \\[2ex] \frac{C <: A \quad C <: B}{C <: A \wedge B} \quad \frac{}{A \wedge B <: A} \quad \frac{}{A \wedge B <: B} \\[2ex] \frac{}{(C\to A) \wedge (C \to B) <: C \to (A \wedge B)} \end{gather*}

With intersection types, one can write the same type in many different ways. For example, the type \(5\) is the same as \(5 \wedge 5\). One common way to define such equalities is in terms of subtyping: \(A = B\) iff \(A <: B\) and \(B <: A\).

So how does one define a semantics using intersection types? Barendregt, Coppo, Dezani-Ciancaglini (1983) (BCD) define the meaning of an expression \(e\) to be the set of types for which it is typable, something like \[ [\!| e |\!](\Gamma) = \{ A \mid \Gamma \vdash e : A \} \] For a simple type system (without intersection), such as semantics would not be useful. Any term with self application (needed for recursion) would not type check and therefore its meaning would be the empty set. But with intersection types, the semantics gives a non-empty meaning to all terminating programs!

The next question is, how does the BCD semantics relate to my simple table-based semantics? One difference is that the intersection type system has two rules that are not syntax directed: \((\wedge\,\mathrm{intro})\) and (Sub). However, we can get rid of these rules. The \((\wedge\,\mathrm{intro})\) rule is not needed for numbers, only for functions. So one should be able to move all uses of the \((\wedge\,\mathrm{intro})\) rules to \(\lambda\)'s. \[ \frac{\Gamma \vdash \lambda x.\, e : A \quad \Gamma \vdash \lambda x.\; e : B} {\Gamma \vdash \lambda x.\, e : A \wedge B} \] To get rid of (Sub), we need to modify \((\to\mathrm{elim})\) to allow for the possibility that \(e_1\) is not literally of function type. \[ \frac{\Gamma \vdash e_1 : C \quad C <: A \to B \quad \Gamma \vdash e_2 : A} {\Gamma \vdash e_1 \; e_2 : B} \]

All of the rules are now syntax directed, though we now have three rules for \(\lambda\), but those rules handle the three different possible types for a \(\lambda\) function: \(A \to B\), \(A \wedge B\), and \(\top\). Next we observe that a relation is isomorphic to a function that produces a set. So we change from \(\Gamma \vdash e : A\) to \(E[\!| e |\!](\Gamma) = \mathcal{A}\) where \(\mathcal{A}\) ranges over sets of types, i.e., \(\mathcal{A} \in \mathcal{P}(A)\). We make use of an auxiliary function \(F\) to define the meaning of \(\lambda\) functions. \begin{align*} E[\!| n |\!](\Gamma) & = \{ n \} \\ E[\!| x |\!](\Gamma) & = \{ \Gamma(x) \} \\ E[\!| \lambda x.\, e |\!](\Gamma) & = \{ A \mid F(A,x,e,\Gamma) \} \\ E[\!| e_1 \; e_2 |\!](\Gamma) & = \left\{ B \middle| \begin{array}{l} C \in E[\!| e_1 |\!](\Gamma) \\ \land\; A \in E[\!| e_2 |\!](\Gamma) \\ \land\; C <: A \to B \end{array} \right\} \\ \\ F(A \to B, x,e,\Gamma) &= B \in E[\!| e |\!](\Gamma(x:=A)) \\ F(A \wedge B, x,e,\Gamma) &= F(A, x,e,\Gamma) \land F(B, x,e,\Gamma) \\ F(\top, x,e,\Gamma) &= \mathrm{true} \end{align*}

I conjecture that this semantics is equivalent to the ``take 3'' semantics. There are a couple remaining differences and here's why I don't think they matter. Regarding the case for \(\lambda\) in \(E\), the type \(A\) can be viewed as an alternative representation for a table. The function \(F\) essentially checks that all entries in the table jive with the meaning of the \(\lambda\)'s body, which is what the clause for \(\lambda\) does in the ``take 3'' semantics. Regarding the case for application in \(E\), the \(C\) is a table and \(C <: A \to B\) means that there is some entry \(A' \to B'\) in the table \(C\) such that \(A' \to B' <: A \to B\), which means \(A <: A'\) and \(B' <: B\). The \(A <: A'\) corresponds to our use of \(\sqsubseteq\) in the ``take 3'' semantics. The \(B' <: B\) doesn't matter.

There's an interesting duality and change of viewpoint going on here between the table-based semantics and the intersection types. The table-based semantics is concerned with what values are produced by a program whereas the intersection type system is concerned with specifying what kind of values are allowed, but the types are so precise that it becomes dual in a strong sense to the values themselves. To make this precise, we can talk about tables in terms of their finite graphs (sets of pairs), and create them using \(\emptyset\), union, and a singleton input-output pair \(\{(v_1,v_2)\}\). With this formulation, tables are literally dual to types, with \(\{(v_1,v_2)\}\) corresponding to \(v_1 \to v_2\), union corresponding to intersection, empty set corresponding to \(\top\), and \(T_1 \subseteq T_2\) corresponding to \(T_2 <: T_1\).

Wednesday, December 21, 2016

Take 3: Application with Subsumption for Den. Semantics of Lambda Calculus

Alan Jeffrey tweeted the following in reply to the previous post:

@jeremysiek wouldn't it be easier to change the defn of application to be
⟦MN⟧σ = { W | T ∈ ⟦M⟧σ, V ∈ ⟦N⟧σ, (V′,W) ∈ T, V′ ⊆ V }?

The idea is that, for higher order functions, if the function \(M\) is expecting to ask all the questions in the table \(V'\), then it is OK to apply \(M\) to a table \(V\) that answers more questions than \(V'\). This idea is quite natural, it is like Liskov's subsumption principle but for functions instead of objects. If this change can help us with the self application problem, then it will be preferable to the graph-o-tables approach described in the previous post because it retains the simple inductive definition of values. So let's see where this takes us!

We have the original definition of values

\[ \begin{array}{lrcl} \text{values} & v & ::= & n \mid \{ (v_1,v'_1),\ldots,(v_n,v'_n) \} \end{array} \]

and here is the denotational semantics, updated with Alan's suggestion to include the clause \(v'_2 \sqsubseteq v_2\) in the case for application.

\begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \{ T \mid \forall v v'. (v,v') \in T \Rightarrow v' \in E[\!|e|\!](\rho(x:=v)) \} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ v \middle| \begin{array}{l} \exists T v_2 v'_2. T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land v'_2 \sqsubseteq v_2 \land (v'_2,v) {\in} T \end{array} \right\} \end{align*}

The ordering on values \(\sqsubseteq\) used above is just equality on numbers and subset on function tables.

The first thing to check is whether this semantics can handle self application at all, such as \[ (\lambda f. f \; f) \; (\lambda g. \; 42) \]

Example 1. \( 42 \in E[\!| (\lambda f. f \; f) \; (\lambda g. \; 42) |\!](\emptyset) \)
The main work is figuring out witnesses for the function tables. We're going to need the following tables: \begin{align*} T_0 &= \emptyset \\ T_1 &= \{ (\emptyset, 42)\} \\ T_2 &= \{ (T_1, 42) \} \end{align*} Here's the proof, working top-down, or goal driven. The important use of subsumption is the \( \emptyset \sqsubseteq T_1 \) below.

  • \( T_2 \in E[\!| (\lambda f. f \; f)|\!](\emptyset)\)
    So we need to show: \( 42 \in E[\!| f \; f|\!](f:=T_1) \)
    • \( T_1 \in E[\!| f |\!](f:=T_1) \)
    • \( T_1 \in E[\!| f |\!](f:=T_1) \)
    • \( \emptyset \sqsubseteq T_1 \)
    • \( (\emptyset, 42) \in T_1 \)
  • \( T_1 \in E[\!| (\lambda g. \; 42) |\!](\emptyset)\)
    So we need to show \( 42 \in E[\!| 42 |\!](g:=\emptyset)\), which is immediate.
  • \( T_1 \sqsubseteq T_1 \)
  • \( (T_1,42) \in T_2 \)

Good, so this semantics can handle a simple use of self application. How about factorial? Instead of considering factorial of 3, as in the previous post, we'll go further this time and consider factorial of an arbitrary number \(n\).

Example 2. We shall compute the factorial of \(n\) using the strict version of the Y combinator, that is, the Z combinator. \begin{align*} M & \equiv \lambda x. f \; (\lambda v. (x\; x) \; v) \\ Z & \equiv \lambda f. M \; M \\ F & \equiv \lambda n. \mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)\\ H & \equiv \lambda r. F \\ \mathit{fact} & \equiv Z\, H \end{align*} We shall show that \[ n! \in E[\!|\mathit{fact}\;n|\!](\emptyset) \] For this example we need very many tables, but fortunately there are just a few patterns. To capture these patterns, be define the following table-producing functions. \begin{align*} T_F(n) &= \{ (n,n!) \} \\ T_H(n) &= \{ (\emptyset,T_F(0)), (T_F(0), T_F(1)), \ldots ,(T_F(n-1), T_F(n)) \} \\ T_M(n) &= \begin{cases} \emptyset & \text{if } n = 0 \\ \{ (T_M(n'), T_F(n')) \} \cup T_M(n') & \text{if } n = 1+n' \end{cases} \\ T_Z(n) &= \{ (T_H(n), T_F(n) )\} \end{align*} \(T_F(n)\) is a fragment of the factorial function, for the one input \(n\). \(T_H(n)\) maps each \(T_F(i)\) to \(T_F(i+1) \) for up to \(i+1 = n\). \(T_M(n)\) is the heart of the matter, and what makes the self application work. It maps successively larger versions of itself to fragments of the factorial function, that is \[ T_M(n) = \left\{ \begin{array}{l} T_M(0) \mapsto T_F(0) \\ T_M(1) \mapsto T_F(1) \\ \vdots & \\ T_M(n-1) \mapsto T_F(n-1) \end{array} \right\} \] For example, here is \(T_M(4)\):

The tables \( T_M \) enable self application because we have the following two crucial properties:
  1. \( T_M(n) \sqsubseteq T_M(1+n) \)
  2. \( (T_M(n), T_F(n)) \in T_M(1+n) \)
The main lemma's that we prove are

Lemma 1. If \(n \le k\), then \(T_M(1+n) \in E[\!| M |\!](f:=T_H(k)) \).

Lemma 2. \( T_Z(n) \in E[\!| Z |\!](\emptyset) \)

If you're curious about the details for the complete proof of \( n! \in E[\!|\mathit{fact}\;n|\!](\emptyset) \) you can take a look at the proof in Isabelle that I've written here.

This is all quite promising! Next we look at the proof of soundness with respect to the big step semantics.

Soundness with Respect to the Big-Step Semantics

The proof of soundness is quite similar to that of the first version, as the relation \(\approx\) between the denotational and big-step values remains the same. However, the following two technical lemmas are needed to handle subsumption.

Lemma (Related Table Subsumption) If \(T' \subseteq T\) and \(T \approx \langle \lambda x.e, \rho \rangle\), then \(T' \approx \langle \lambda x.e, \rho \rangle\).
The proof is by induction on \(T'\).

Lemma (Related Value Subsumption) If \(v_1 \approx v'\) and \(v_2 \sqsubseteq v'\), then \(v_2 \approx v'\).
The proof is by case analysis, using the previous lemma when the values are function tables.

Theorem (Soundness).
If \(v \in E[\!| e |\!](\rho) \) and \( \rho \approx \rho' \), then \( \rho' \vdash e \Downarrow v' \) and \(v \approx v'\) for some \(v'\).

The mechanization of soundness in Isabelle is here.

Monday, December 19, 2016

Take 2: Graph of Tables for the Denotational Semantics of the Lambda Calculus

In the previous post, the denotational semantics I gave for the lambda calculus could not deal with self application, such as the program \[ (\lambda f. f\;f)\;(\lambda g. 42) \] whose result should be \(42\). The problem was that I defined function values to be tables of pairs of values, using a datatype definition, which rules out the possibility of cycles. In the above program, the table for \( \lambda g. 42 \) needs to include itself so that the application \(f \; f \) makes sense.

A straightforward way to solve this problem is to allow cycles by representing all the functions created by the program as a graph of tables. A function value will just contain an index, i.e. it will have the form \(\mathsf{fun}[i]\) with index \(i\), and the graph will map the index to a table. So we define values as follows. \[ \begin{array}{lccl} \text{numbers} & n \in \mathbb{N} \\ \text{values} & v \in \mathbb{V} & ::= & n \mid \mathsf{fun}[i] \\ & \mathit{graph} & = & \mathbb{N} \to (\mathbb{V} \times \mathbb{V})\,\mathit{list} \end{array} \] Given a function value \(v\) and graph \(G\), we write \(\mathit{tab}(v,G)\) for the table \(t\) when \(v= \mathsf{fun}[i]\) and \(G(i)=t\) for some index \(i\). So we modify the semantic function \(E\) to be defined as follows.

\begin{align*} E & : \mathit{exp} \to \mathit{env} \to (\mathbb{V} \times \mathit{graph})\,\mathit{set} \\ E[\!| n |\!](\rho) &= \{ (v,G) \mid v = n \} \\ E[\!| x |\!](\rho) &= \{ (v,G) \mid v = \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \left\{ (v,G) \middle| \begin{array}{l} \forall v_1 v_2. (v_1,v_2) \in \mathit{tab}(v,G) \\ \Rightarrow v_2 \in E[\!|e|\!](\rho(x:=v_1)) \end{array} \right\} \\ E[\!| e_1\;e_2 |\!](\rho) &= \left\{ (v,G) \middle| \begin{array}{l}\exists v_1 v_2. v_1 {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \\ \land (v_2,v) {\in} \mathit{tab}(v_1,G) \end{array} \right\} \end{align*}

Example 1. Let's consider again the program \[ (\lambda f. f\;f)\;(\lambda g. 42) \] We'll define a graph \(G\) as follows. \[ G(0) = [(\mathsf{fun}[1], 42)] \qquad G(1) = [(\mathsf{fun}[1], 42)] \] To show \[ (42,G) \in E[\!|(\lambda f. f\;f)\;(\lambda g. 42)|\!]\emptyset \] we need to show

  1. \( (\mathsf{fun}[0],G) \in E[\!|(\lambda f. f\;f)|\!]\emptyset \). So we need to show \[ (42,G) \in E[\!|f\;f|\!](f:=\mathsf{fun}[1]) \] which we have because \((\mathsf{fun}[1], 42) \in \mathsf{fun}[1]\).
  2. \( (\mathsf{fun}[1],G) \in E[\!|(\lambda g. 42)|\!]\emptyset \). We need \[ (42,G) \in E[\!|42|\!](f:=\mathsf{fun}[1]) \] which is immediate.
  3. \( (\mathsf{fun}[1], 42) \in \mathit{tab}(\mathsf{fun}[0]) \). This is immediate.

Example 2. We shall compute the factorial of 3 using the strict version of the Y combinator, that is, the Z combinator. \begin{align*} R & \equiv \lambda v. (x\; x) \; v \\ M & \equiv \lambda x. f \; R \\ Z & \equiv \lambda f. M \; M \\ F & \equiv \lambda n. \mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)\\ H & \equiv \lambda r. F \\ \mathit{fact} & \equiv Z\, H \end{align*} We shall show that \[ (6,G) \in E[\!|\mathit{fact}\;3|\!]\emptyset \] for some graph \(G\) that we need to construct. By instrumenting an interpreter for the lambda calculus and running \(\mathit{fact}\,3\), we observe the following graph. \begin{align*} G(0) &= [(\mathsf{fun}[1],\mathsf{fun}[4])] & \text{for } Z \\ G(1) &= [(\mathsf{fun}[3],\mathsf{fun}[4])] & \text{for } H \\ G(2) &= [(\mathsf{fun}[2],\mathsf{fun}[4])]& \text{for } M \\ G(3) &= [(0,1),(1,1),(2,2)] & \text{for } R \\ G(4) &= [(0,1),(1,1),(2,2),(3,6)] & \text{for } F \end{align*} We check all of the following (Tedious! I'm going to write a program to do this next time.):

  • \((\mathsf{fun}[4],G) \in E[Z\,H]\emptyset \)
  • \((3,6) \in G(4) \)
  • \((\mathsf{fun}[0],G) \in E[\!|Z|\!]\emptyset\)
  • \( (\mathsf{fun}[4],G) \in E[\!|M\;M|\!](f:=\mathsf{fun}[1])\)
  • \( (\mathsf{fun}[2],G) \in E[\!|M|\!](f:=\mathsf{fun}[1]) \)
  • \( (\mathsf{fun}[4],G) \in E[\!|f \; R|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2]) \)
  • \( (\mathsf{fun}[3],G) \in E[\!|R|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2]) \)
  • \( (1,G) \in E[\!|(x\;x)\;v|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2],v:=0) \) and \( (0,1) \in G(4) \)
  • \( (1,G) \in E[\!|(x\;x)\;v|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2],v:=1) \) and \( (1,1) \in G(4) \)
  • \( (2,G) \in E[\!|(x\;x)\;v|\!](f:=\mathsf{fun}[1],x:=\mathsf{fun}[2],v:=2) \) and \( (2,2) \in G(4) \)
  • \( (\mathsf{fun}[2], \mathsf{fun}[4]) \in G(2) \)
  • \((\mathsf{fun}[1],G) \in E[\!|H|\!]\emptyset\)
  • \((\mathsf{fun}[4],G) \in E[\!|F|\!](r:=\mathsf{fun}[3])\)
  • \((1,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=0)\)
  • \((1,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=1)\)
  • \((2,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=2)\)
  • \((6,G) \in E[\!|\mathbf{if}\,n=0\,\mathbf{then}\, 1\,\mathbf{else}\, n \times r \,(n-1)|\!](r:=\mathsf{fun}[3],n:=3)\)

The next step is to update the proof of soundness wrt. the big-step semantics. The graphs will make it a bit more challenging. But hopefully they will make it possible to also prove completeness!

Thursday, December 15, 2016

Simple Denotational Semantics for the Lambda Calculus, Pω Revisited?

I've been trying to understand Dana Scott's \(P_{\omega}\) and \(D_{\infty}\) models of the lambda calculus, as well as a couple large Coq formalizations of domain theory, and in the process have come up with an extraordinarily simple denotational semantics for the call-by-value lambda calculus. It borrows some of the main ideas from \(P_{\omega}\) but doesn't encode everything into numbers and puts infinity in a different place. (More discussion about this near the bottom of the post.) That being said, I still don't 100% understand \(P_{\omega}\), so there may be other subtle differences. In any event, what follows is so simple that it's either wrong or amazing that it's not already a well-known semantics.

UPDATE: It's wrong. See the section titled Counterexample to Completeness below. Now I need to go and read more literature.
UPDATE: It seems that there is an easy fix to the problem! See the subsequent post.

To get started, here's the syntax of the lambda calculus.

\[ \begin{array}{lrcl} \text{variables} & x \\ \text{numbers} & n & \in & \mathbb{N} \\ \text{expressions} & e & ::= & n \mid x \mid \lambda x.e \mid e\;e \end{array} \]

So we've got a language with numbers and first-class functions. I'm going to represent functions as data in the most simple way possible, as a lookup table, i.e., a list of pairs, each pair is an input value \(v_i\) and it's corresponding output \(v'_i\). Of course, functions can have an infinite number of inputs, so how will a finite-length list suffice?

\[ \begin{array}{lrcl} \text{values} & v & ::= & n \mid [(v_1,v'_1),\ldots,(v_n,v'_n)] \end{array} \]

The answer is that we're going to write the denotational semantics as a logic program, that is, as a relation instead of a function. (We'll turn it back into a function later.) Instead of producing the result value, the semantics will ask for a result value and then say whether it is correct or not. So when it comes to a program that produces a function, the semantics will ask for a list of input-output values and say whether they agree with the function. A finite list suffices because, after all, you can't actually build an infinite list to pass into the semantics.

So here we go, our first version of the semantics, in relational style, or equivalently, as a function with three parameters that returns a Boolean. We name the function denoto, with an o at the end as a nod to The Reasoned Schemer. The meaning of a \(\lambda x.e\) is any table \(T\) that agrees with the semantics of the body \(e\). (This allows the table to be empty!) The meaning of an application \(e_1 e_2\) is simply to do a lookup in a table for \(e_1\). (Requiring a non-empty table.) We write \((v,v') \in T\) when \((v,v')\) is one of the pairs in the table \(T\).

\begin{align*} \mathit{denoto}(n, \rho, n) &= \mathit{true} \\ \mathit{denoto}(x, \rho, \rho(x)) &= \mathit{true} \\ \mathit{denoto}(\lambda x. e, \rho, T) &= \forall v v'.\, (v,v') \in T \Rightarrow \mathit{denoto}(e,\rho(x:=v),v') \\ \mathit{denoto}(e_1 \; e_2, \rho, v) &= \left(\begin{array}{l} \exists T v_2.\; \mathit{denoto}(e_1, \rho, T) \land \mathit{denoto}(e_2, \rho, v_2) \\ \qquad \land (v_2,v) \in T \end{array} \right) \end{align*}

The existential quantifier in the line for application is powerful. It enables the semantics to guess a sufficiently large table \(T\), so long as the table agrees with the semantics of \(e_1\) and the later uses of the result value. Because the execution of a terminating program can only call the function with a finite number of arguments, and the execution can only use the results in a finite number of ways, there is a sufficiently large table \(T\) to cover all its uses in the execution of the whole program. Also, note that \(T\) can be large in a couple dimensions: it may handle a large number of inputs, but also, it can specify large outputs (in the case when the outputs are functions).

Denotational semantics are usually written as functions, not relations, so we remove the third parameter and instead return a set of values. This will bring the semantics more in line with \(P_{\omega}\). Also, for this version we'll use the name \(E\) for the semantic function instead of denoto.

\begin{align*} E[\!| n |\!](\rho) &= \{ n \} \\ E[\!| x |\!](\rho) &= \{ \rho(x) \} \\ E[\!| \lambda x.\; e |\!](\rho) &= \{ T \mid \forall v v'. (v,v') \in T \Rightarrow v' \in E[\!|e|\!](\rho(x:=v)) \} \\ E[\!| e_1\;e_2 |\!](\rho) &= \{ v \mid \exists T v_2. T {\in} E[\!| e_1 |\!](\rho) \land v_2 {\in} E[\!| e_2 |\!](\rho) \land (v_2,v) {\in} T \} \end{align*}

With the semantics written this way, it is clear that the meaning of a \(\lambda\) is not just a finite table, but instead it is typically an infinite set of tables, each of which is an approximation of the actual infinite graph of the function.

Is this semantics correct? I'm not entirely sure yet, but I have proved that it is sound with respect to the big-step semantics.

Theorem (Soundness).
If \(v \in E[\!| e |\!](\rho) \) and \( \rho \approx \rho' \), then \( \rho' \vdash e \Downarrow v' \) and \(v \approx v'\) for some \(v'\).

The two semantics have different notions of values, so the relation \(\approx\) is defined to bridge the two worlds.

\begin{gather*} n \approx n \\[2ex] \frac{\forall v_1 v_2 v'_1. (v_1,v_2) \in T \land v_1 {\approx} v'_1 \Rightarrow \exists v'_2. \rho(x:=v'_1) \vdash e \Downarrow v'_2 \land v_2 {\approx} v'_2} {T \approx \langle \lambda x.e, \rho \rangle} \end{gather*} The definition of \(\approx\) extends to environments in the natural way.

The semantics and proof of soundness in Isabelle is here.

Counterexample to Completeness

This semantics does not handle self application properly. Consider the program \[ (\lambda f. f \; f) \; (\lambda g. 1) \] The operational semantics says the answer is \(1\). The denotational semantics requires us to find some tables \(T \in [\!|\lambda f. f\;f|\!]\) and \(T' \in [\!|\lambda g.1|\!]\). We need \( (T',1) \in T\), so we need \(1 \in [\!|f\; f|\!](f:=T') \). That requires \((T', 1) \in T'\), but that's impossible given that we've defined the values and tables in a way that does not allow cycles.

Relationship with \(P_{\omega}\)

The idea of representing functions as data, and as a lookup table, comes from \(P_{\omega}\), as does having the denotation's result be a set of values. As mentioned above, one (minor) difference is that \(P_{\omega}\) encodes everything into numbers, whereas here we've used a datatype definition for the values. However, the most important difference (if I understand \(P_{\omega}\) correctly) is that its functions are infinite in a first-class sense. That is, \(P_{\omega}\) is a solution to \[ D = \mathbb{N} + (D \to D) \] and the things in \(D \to D\) are functions with potentially infinite graphs. In contrast, I've taken a stratified approach in which I've defined the values \(V\) to include only finite representations of functions \[ V = \mathbb{N} + (V \times V) \, \mathit{list} \] and then, only at the top level, I've allowed for infinity by making the denotation of an expression be a (potentially infinite) set of values. \[ E : \mathit{exp} \to \mathit{env} \to V\, \mathit{set} \]

Thursday, December 08, 2016

Denotational Semantics of IMP without the Least Fixed Point

It has been too long since I wrote a blog post! Needless to say, parenthood, teaching, research, service, and my wonderful graduate students and post-docs have been keeping me busy. But with the Fall 2016 semester winding down I can sneak away for a bit and think.

But first, I have something to admit. My education in the theory of programming languages has a glaring hole in it. I only have a very basic understanding of Denotational Semantics despite having taking a course on Domain Theory in graduate school. I figured it would be fun to remedy this situation, and it might even be useful in the future. So my first step was to understand the denotational semantics of the IMP language, which is the starter language for most textbooks on denotational semantics. IMP is simply an imperative language with assignment statements, while loops, if statements, and arithmetic on integers. The IMP language has two syntactic categories, expressions and commands. The following is the syntax of IMP.

\[ \begin{array}{lrcl} \text{variables} & x \\ \text{numbers} & n & \in & \mathbb{N} \\ \text{unary operators}&u & ::= & \neg \\ \text{binary operators}&b & ::= & + \mid - \mid \times \mid \,=\, \\ \text{expressions} & e & ::= & n \mid x \mid u(e) \mid b(e,e)\\ \text{commands} & c & ::= & \mathtt{skip} \mid x := e \mid c ; c \mid \mathtt{if}\;e\;\mathtt{then}\;c\;\mathtt{else}\;c \\ & & \mid & \mathtt{while}\;e\;\mathtt{do}\;c \end{array} \]

As far as I can tell, for a semantics to be a denotational semantics it has to satisfy two properties.

  • It is a mapping from abstract syntax (the program) to a mathematical object, which is just to say some precisely defined entity, that describes the observable behavior of the program. For example, the mathematical object could be a relation between a program's inputs and outputs.
  • It is compositional, which means that the denotation of a particular language construct is defined in terms of the denotation of the syntactic sub-parts of the construct. For example, the meaning of a while loop is defined in terms of the meaning of its conditional and the meaning of its body.
For the expressions of IMP, it is straightforward to write down a denotational semantics, the following function \(E\). This \(E\) is no different from a recursively defined interpreter. In the following, we map expressions to natural numbers. Following custom, we use the brackets \([\!| e |\!]\) ask a kind of quote to distinguish abstract syntax from the surrounding mathematics. To handle variables, we also pass in a function \(\sigma\) from variables to numbers, which we call a state.

\begin{align*} E[\!| n |\!](\sigma) &= n \\ E[\!| x_i |\!](\sigma) &= \sigma(x) \\ E[\!| u(e) |\!](\sigma) &= [\!|u|\!]( E[\!|e|\!] ) \\ E[\!| b(e_1,e_2) |\!](\sigma) &= [\!|b|\!]( E[\!|e_1|\!], E[\!|e_2|\!]) \\ \\ E[\!| \neg |\!](n) &= \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n \neq 0\end{cases} \\ E[\!| + |\!](n_1,n_2) &= n_1 + n_2 \\ E[\!| - |\!](n_1,n_2) &= n_1 - n_2 \\ E[\!| \times |\!](n_1,n_2) &= n_1 \times n_2 \\ E[\!| = |\!](n_1,n_2) &= \begin{cases} 1 & \text{if } n_1 = n_2 \\ 0 & \text{if } n_1 \neq n_2 \end{cases} \end{align*}

What do the commands of IMP have in common regarding what they do? They change a state to a new state. For example, if \(\sigma\) is the incoming state, then the assignment command \( x := 42 \) outputs a new state \(\sigma'\) which is the same as \(\sigma\) except that \(\sigma'(x) = 42\). In general, the denotation of a command, \(C[\!|c|\!]\), will be a relation on states, that is, a set of pairs that match up input states with their corresponding output states. We shall give the denotational semantics of commands, one construct as a time.

The meaning of the skip command is that it it doesn't change the state, so it relates each state to itself.

\[ C[\!| \mathtt{skip} |\!] = \{ (\sigma,\sigma) \mid \sigma \in \mathit{state} \} \]

The meaning of the assignment statement is to update the state to map the left-hand side variable to the result of the right-hand side expression. So the new state is a function that takes in a variable named \(y\) and returns \( [\!|e|\!](\sigma) \) if \(y=x\) and otherwise returns the same thing as \(\sigma\).

\begin{align*} C[\!| x := e |\!] &= \{ (\sigma, \sigma') \mid \sigma \in \mathit{state} \} \\ & \text{where } \sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases} \end{align*}

The meaning of two commands in sequence is just the meaning of the first followed by the meaning of the second.

\[ C[\!| c_1; c_2 |\!] = \{ (\sigma,\sigma'') \mid \exists \sigma'. (\sigma,\sigma') \in C[\!| c_1 |\!] \land (\sigma',\sigma'') \in C[\!| c_2 |\!] \} \]

The meaning of an if command depends on the conditional expression \(e\). If the \(e\) evaluates to 0, then the meaning of if is given by the else branch \(c_2\). Otherwise, the meaning of if is given by the then branch \(c_1\).

\[ C[\!| \mathtt{if}\, e \,\mathtt{then}\, c_1 \,\mathtt{else}\, c_2 |\!] = \left\{ (\sigma,\sigma') \middle| \begin{array}{l} (\sigma,\sigma') \in C[\!| c_2 |\!] & \text{if}\, E[\!|e|\!](\sigma) = 0 \\ (\sigma,\sigma') \in C[\!| c_1 |\!] & \text{if}\, E[\!|e|\!](\sigma) \neq 0 \end{array} \right\} \]

The meaning of the while command is the crux of the matter. This is normally where a textbook includes several pages of beautiful mathematics about monotone and continuous functions, complete partial orders, and least fixed points. We're going to bypass all of that.

The meaning of an while command is to map each starting state \(\sigma_0\) to an ending state \(\sigma_n\) obtained by iterating it's body command so long as the condition is non-zero. Pictorially, we have the following:

We define an auxiliary function named \(L\) to express the iterating of the loop. It takes as input the number of iterations of the loop, the denotation of the condition expression \(de\), and the denotation of the command \(dc\).

\begin{align*} L(0, de, dc) &= \{ (\sigma,\sigma) \mid de(\sigma) = 0 \} \\ L(1+n, de, dc) &= \{ (\sigma,\sigma'') \mid de(\sigma) \neq 0 \land \exists \sigma'. (\sigma,\sigma') \in dc \land (\sigma',\sigma'') \in L(n, de, dc) \} \end{align*}

The meaning of the while command is to relate any state \(\sigma\) to state \(\sigma'\) if \(L(n,[\!|e|\!],[\!|c|\!])\) relates \(\sigma\) to \(\sigma'\) for some \(n\).

\[ C[\!| \mathtt{while}\, e \,\mathtt{do}\, c |\!] = \{ (\sigma,\sigma') \mid \exists n.\, (\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!]) \} \]

At this point I'm worried that this is so simple that it couldn't possibly be correct. A good way to check is to prove that it is equivalent to the standard big-step semantics for IMP, which we shall do now.

Equivalence to the Standard Big-Step Semantics

The big-step semantics for the IMP language is a three-place relation on a command, a starting state, and the final state, which we shall write \(c\mid\sigma\Downarrow\sigma'\). It is defined inductively by the following rules.

\begin{gather*} \frac{}{\mathtt{skip} \mid \sigma \Downarrow \sigma} \\[2ex] \frac{\sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases}}{ x := e \mid \sigma \Downarrow \sigma'} \qquad \frac{c_1 \mid \sigma \Downarrow \sigma' \quad c_2 \mid \sigma' \Downarrow \sigma''} {c_1 ; c_2 \mid \sigma \Downarrow \sigma''} \\[2ex] \frac{E[\!|e|\!](\sigma) = 0 \quad c_2 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\qquad \frac{E[\!|e|\!](\sigma) \neq 0 \quad c_1 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\\[2ex] \frac{E[\!|e|\!](\sigma) = 0} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma} \qquad \frac{E[\!|e|\!](\sigma) \neq 0 \quad c \mid \sigma \Downarrow \sigma' \quad \mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma' \Downarrow \sigma''} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma''} \end{gather*}

(The big-step semantics is not denotational because the second rule for while is not compositional: the recursion is not on a proper sub-part but instead on the entire while command.)

We shall prove that the denotational semantics is equivalent to the big-step semantics in two steps.

  1. The big-step semantics implies the denotational semantics. (completeness)
  2. The denotation semantics implies the big-step semantics. (soundness)

Theorem (Completeness). If \(c \mid \sigma \Downarrow \sigma'\), then \((\sigma,\sigma') \in [\!|c|\!]\).
Proof. We proceed by induction on the derivation of \(c \mid \sigma \Downarrow \sigma'\). We have one case to consider per rule in the big-step semantics. (For the reader in a hurry: the case for while at the end is the only interesting one.)
Case: \[ \frac{}{\mathtt{skip} \mid \sigma \Downarrow \sigma} \] We need to show that \((\sigma,\sigma) \in [\!|\mathtt{skip}|\!]\), which is immediate from the definition of the denotational semantics.
Case: \[ \frac{\sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases}}{ x := e \mid \sigma \Downarrow \sigma'} \] We need to show that \((\sigma,\sigma') \in [\!|x := e|\!]\). Again this is immediate.
Case: \[ \frac{c_1 \mid \sigma \Downarrow \sigma' \quad c_2 \mid \sigma' \Downarrow \sigma''} {c_1 ; c_2 \mid \sigma \Downarrow \sigma''} \] We have two induction hypotheses: \((\sigma,\sigma') \in C[\!|c_1|\!]\) and \((\sigma',\sigma'') \in C[\!|c_2|\!]\). It follows (by definition) that \((\sigma,\sigma'') \in C[\!|c_1 ; c_2|\!]\).
Case: \[ \frac{E[\!|e|\!](\sigma) = 0 \quad c_2 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\qquad \] We have the induction hypothesis \((\sigma,\sigma') \in [\!|c_2|\!]\). Together with the condition expression evaluating to 0, we have \((\sigma,\sigma') \in C[\!|\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2|\!]\).
Case: \[ \frac{E[\!|e|\!](\sigma) \neq 0 \quad c_1 \mid \sigma \Downarrow \sigma' } {\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2 \mid \sigma \Downarrow \sigma'}\\[2ex] \] We have the induction hypothesis \((\sigma,\sigma') \in C[\!|c_1|\!]\). Together with the condition is non-zero, we have \((\sigma,\sigma') \in C[\!|\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,c_2|\!]\).
Case: \[ \frac{E[\!|e|\!](\sigma) = 0} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma} \qquad \] From \(E[\!|e|\!](\sigma) = 0\) we have \((\sigma,\sigma) \in L(0,E[\!|e|\!],E[\!|c|\!]) \). Therefore \((\sigma,\sigma) \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\).
Case: \[ \frac{E[\!|e|\!](\sigma) \neq 0 \quad c \mid \sigma \Downarrow \sigma' \quad \mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma' \Downarrow \sigma''} {\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma''} \] We have the induction hypotheses \((\sigma,\sigma') \in C[\!|c|\!]\) and \((\sigma',\sigma'') \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\). Unpacking the definition of the later, we have \( (\sigma',\sigma'') \in L(n,E[\!|e|\!],C[\!|c|\!]) \) for some \(n\). Therefore we have \( (\sigma,\sigma'') \in L(1+n,E[\!|e|\!],C[\!|c|\!]) \). So we conclude that \((\sigma,\sigma'') \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\).
QED.

The other direction, that if the denotation of \(c\) relates \(\sigma\) to \(\sigma'\), then so does the big-step semantics, takes a bit more work. The proof will be by induction on the structure of \(c\). In the case for while we need to reason about the \(L\) function. We get to assume that \((\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!])\) for some \(n\) and we have the induction hypothesis that \(\forall \sigma \sigma'.\, (\sigma,\sigma') \in C[\!|c|\!] \to c \mid \sigma \Downarrow \sigma' \). Because \(L\) is recursive, we going to need a lemma about \(L\) and prove it by induction on the number of iterations.

Lemma If \((\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!])\) and \(\forall \sigma \sigma'.\, (\sigma,\sigma') \in [\!|c|\!] \to c \mid \sigma \Downarrow \sigma' \), then \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma' \).
Proof. The proof is by induction on \(n\).
Case \(n=0\). We have \((\sigma,\sigma') \in L(0,E[\!|e|\!],C[\!|c|\!])\), so \(\sigma = \sigma'\) and \(E[\!|e|\!](\sigma) = 0\). Therefore we can conclude that \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma'\).
Case \(n=1 + n'\). We have \((\sigma,\sigma') \in L(1+n',E[\!|e|\!],C[\!|c|\!])\), so \(E[\!|e|\!](\sigma) \neq 0\) and \( (\sigma,\sigma_1) \in C[\!|c|\!] \) and \( (\sigma_1,\sigma') \in L(n',E[\!|e|\!],C[\!|c|\!]) \) for some \(\sigma_1\). From the premise about \(c\), we have \(c \mid \sigma \Downarrow \sigma_1\). From the induction hypothesis, we have \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma_1 \Downarrow \sigma'\). Putting all of these pieces together, we conclude that \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma'\).
QED.

Theorem (Soundness). For any \(\sigma\) and \(\sigma'\), if \((\sigma,\sigma') \in C[\!|c|\!]\), then \(c \mid \sigma \Downarrow \sigma'\).
Proof. The proof is by induction on the structure of \(c\).
Case \(\mathtt{skip}\). From \((\sigma,\sigma') \in C[\!|\mathtt{skip}|\!]\) we have \(\sigma = \sigma'\) and therefore \(\mathtt{skip} \mid \sigma \Downarrow \sigma'\).
Case \(x:=e\). We have \[ \sigma'(y) = \begin{cases} E[\!|e|\!](\sigma) & \text{if}\, y = x\\ \sigma(y) & \text{if}\, y \neq x \end{cases} \] and therefore \(x := e \mid \sigma \Downarrow \sigma'\).
Case \(c_1 ; c_2\). We have \( (\sigma, \sigma_1) \in C[\!|c_1|\!]\) and \( (\sigma_1, \sigma') \in C[\!|c_2|\!]\) for some \(\sigma_1\). So by the induction hypothesis, we have \(c_1 \mid \sigma \Downarrow \sigma_1\) and \(c_2 \mid \sigma_1 \Downarrow \sigma'\), from which we conclude that \( c_1 ; c_2 \mid \sigma \Downarrow \sigma'\).
Case \(\mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,e_2\). We have two cases to consider, whether \(E[\!|e|\!](\sigma) = 0\) or not.

  • Suppose \(E[\!|e|\!](\sigma) = 0\). Then \( (\sigma,\sigma') \in C[\!|c_2|\!] \) and by the induction hypothesis, \( c_2 \mid \sigma \Downarrow \sigma' \). We conclude that \( \mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,e_2 \mid \sigma \Downarrow \sigma' \).
  • Suppose \(E[\!|e|\!](\sigma) \neq 0\). Then \( (\sigma,\sigma') \in C[\!|c_1|\!] \) and by the induction hypothesis, \( c_1 \mid \sigma \Downarrow \sigma' \). We conclude that \( \mathtt{if}\,e\,\mathtt{then}\,c_1\,\mathtt{else}\,e_2 \mid \sigma \Downarrow \sigma' \).

Case \(\mathtt{while}\, e \,\mathtt{do}\, c\). From \((\sigma,\sigma') \in C[\!|\mathtt{while}\, e \,\mathtt{do}\, c|\!]\) we have \( (\sigma,\sigma') \in L(n, E[\!|e|\!], C[\!|c|\!]) \). Also, by the induction hypothesis, we have that \( \forall \sigma \sigma'. \; (\sigma,\sigma') \in C[\!|c|\!] \to c \mid \sigma \Downarrow \sigma' \). By the Lemma about \(L\), we conclude that \(\mathtt{while}\, e \,\mathtt{do}\, c \mid \sigma \Downarrow \sigma'\).
QED.

Wow, the simple denotational semantics of IMP is correct!

The mechanization of all this in Coq is available here.

What about infinite loops?

Does this denotational semantics give meaning to programs with infinite loops, such as \[ \mathtt{while}\, 1 \,\mathtt{do}\, \mathtt{skip} \] The answer is yes, the semantics defines a total function from commands to relations, so every program gets a meaning. So the next question is which relation is the denotation of an infinite loop? Just like the fixed-point semantics, the answer is the empty relation. \[ C[\!|\mathtt{while}\, 1 \,\mathtt{do}\, \mathtt{skip} |\!] = \{ (\sigma,\sigma') \mid \exists n.\; (\sigma,\sigma') \in L(n, E[\!|1|\!], C[\!|\mathtt{skip}|\!]) \} = \emptyset \]

Comparison to using the least fixed point semantics

The standard denotational semantics for IMP defines the meaning of while in terms of the least fixed point of the following functional.

\[ W(de, dc)(dw) = \left\{ (\sigma,\sigma') \middle| \begin{array}{l} \sigma = \sigma' & \text{if } de(\sigma) = 0 \\ \exists \sigma_1, (\sigma,\sigma_1) \in dc \land (\sigma_1,\sigma') \in dw & \text{if } de(\sigma) \neq 0 \end{array} \right\} \]

One of the standard ways to compute the least fixed point of a functional \(F\) is from Kleene's fixpoint theorem, which says that the least fixed point of \(F\) is \[ \bigcup_{k=0}^{\infty} F^k(\emptyset) \] where \begin{align*} F^0(x) & = x \\ F^{k+1}(x) & = F (F^k(x)) \end{align*} So the traditional denotation of while is: \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \bigcup_{k=0}^{\infty} W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \] Applying the definition of infinitary union, we have \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \{ (\sigma,\sigma') \mid \exists k.\; (\sigma,\sigma') \in W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \} \] which starts to look similar to our definition. But they are not trivially equivalent.

Consider the following loop that counts down to zero. \[ \mathtt{while}\,\neg (x=0)\, \mathtt{do}\, x := x - 1 \] To talk about the semantics of this loop, we create the following abbreviations for some relations on states. \begin{align*} R_0 &= \{ (\sigma, \sigma) \mid \sigma(x) = 0 \} \\ R_1 &= \{ (\sigma, \sigma') \mid \sigma(x) = 1 \land \sigma'(x) = 0 \} \\ R_2 &= \{ (\sigma, \sigma') \mid \sigma(x) = 2 \land \sigma'(x) = 0 \} \\ R_3 &= \{ (\sigma, \sigma') \mid \sigma(x) = 3 \land \sigma'(x) = 0 \} \\ & \vdots \end{align*}

  • If \(x=0\) in the initial state, the loop immediately terminates, so the final state is the same as the input state. This is \(R_0\).
  • If \(x=1\) in the initial state, the loop executes one iteration, so the final state has \(x=0\). This is \(R_1\).
  • If \(x=2\) in the initial state, the loop executes one iteration, so the final state has \(x=0\). This is \(R_2\).
  • and so on.
The \(L\) function computes exactly these \(R\)'s. \begin{align*} L(0, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_0 \\ L(1, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_1 \\ L(2, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_2 \\ L(3, E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!]) &= R_3\\ & \vdots \end{align*} The semantics of while given by \(L\) says that an initial state is related to a final state if it is possible to guess the iteration count \(n\) to pick out the appropriate line of \(L\) that relates the two states. \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \{ (\sigma,\sigma') \mid \exists n.\, (\sigma,\sigma') \in L(n, [\!|e|\!],[\!|c|\!]) \} \] In contrast, Kleene's iteration incrementally builds up the union of all the \(R\)'s: \begin{align*} W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^0(\emptyset) &= \emptyset \\ W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^1(\emptyset) &= R_0\\ W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^2(\emptyset) &= R_0 \cup R_1 \\ W(E[\!| \neg (x = 0) |\!], C[\!| x := x - 1 |\!])^3(\emptyset) &= R_0 \cup R_1 \cup R_2 \\ & \vdots \end{align*} The semantics of while given by the least fixed point of \(W\) says that an initial state is related to a final state if, after a sufficient number of applications of \(W\), say \(k\), the two states are in the resulting union \(R_0 \cup \cdots \cup R_{k-1}\). \[ C[\!| \mathtt{while}\,e\,\mathtt{do}\,c |\!] = \{ (\sigma,\sigma') \mid \exists k.\; (\sigma,\sigma') \in W([\!|e|\!],[\!|c|\!])^k(\emptyset) \} \]

In general, \( L(n,E[\!|e|\!],C[\!|c|\!]) \) gives the denotation of the loop exactly when \(n\) is the number of iterations executed by the loop in a given initial state. In contrast, \( W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \) produces the \(k\)th approximation of the loop's meaning, providing the appropriate initial/final states for up to \(k-1\) iterations of the loop.

However, these two algorithms are equivalent in the following sense.

Theorem (Equivalence to LFP semantics)

  1. If \( (\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!]) \), then \( \exists k.\; (\sigma,\sigma') \in W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \).
  2. If \( (\sigma,\sigma') \in W(E[\!|e|\!],C[\!|c|\!])^k(\emptyset) \), then \( \exists n.\; (\sigma,\sigma') \in L(n,E[\!|e|\!],C[\!|c|\!]) \).
The first part is proved by induction on n. The second part is proved by induction on k. The full proofs are in the Coq development linked to above.

Parting Thoughts

  • The traditional least fixed-point semantics is overkill for IMP.
  • What makes this simple version work is the combination of choosing the denotation of commands to be relations on states (instead of functions on states) and the use of an existential for the number of loop iterations in the meaning of while.
  • The IMP language is just a toy. As we consider more powerful language constructs such as functions, pointers, etc., at what point will we be forced to use the heavier-duty math of least fixed-points, continuous functions, etc.? I'm looking forward to finding out!

Sunday, January 26, 2014

The Publication Process in Programming Languages

There was an interesting discussion at the ACM SIGPLAN Meeting at POPL 2014 regarding the problems and potential solutions with the publication process within programming languages. The discussion centered around the problem that we primarily publish in conferences and not journals, which makes the field of programming languages look bad compared to other disciplines. I agree that this is a problem, but it is not the only problem with our publication process. As we consider alternative publication models, we should aim to solve all of the problems (or as many as possible). The following are some of the other problems that I see with our current conference-oriented process. I conclude with a short outline of a possible solution inspired by code review processes in open-source software groups such as C++ Boost.

Soundness vs. Importance

When making the accept/reject decision for conference publications, we judge papers on both scientific soundness (are the claims in the paper true?) and some sort of feeling of importance. These two things should be decoupled. On the one hand, we don't give ourselves enough time to evaluate scientific soundness, and on the other, we don't need to, and should not try to evaluate importance at the time of publication because we are not very good at doing so. A somewhat amusing/annoying side-effect of this evaluation model is that many papers dedicate several paragraphs to inflating the importance of the results, sometimes obscuring the actual results. Another rumored side-effect is authors cherry-picking data to make their results seem more important. Similarly, when is the last time you read about a negative result in a programming languages conference or journal?

Publication vs. Presentation

The conference system conflates the decision to accept for publication with the decision to accept for presentation. Despite many conferences going to multi-track formats, the number of available presentation slots (and their cost) plays a strong role in limiting the number of papers accepted for publication. On the other hand, the cost of publishing a paper is very low (and should be much lower than what it is for the ACM). The only limit on publishing papers should be with regards to scientific soundness.

Publication Delay

One of the supposed benefits of the conference system is fast turn-around. However, I know of many papers that take over a year, sometimes multiple years, to be published because they receive multiple rejections before being accepted for publication, and not because the paper is scientifically unsound. This phenomenon slows the scientific process because the dissemination of scientific results is delayed. (Many researchers do not feel it is appropriate to post pre-prints on their web pages.) This phenomenon also increases the reviewing load on the community because some papers end up with nine reviews when three would have sufficed.

 Reviewer Expertise

Papers are not always reviewed by the reviewers with the most expertise. On a fairly routine basis, I find minor inaccuracies and problems in published papers in the areas that I'm expert. In such situations, I feel that I should have reviewed the paper but didn't get a chance to because I wasn't on that particular program committee. I was on other PC's that year. One of the reasons this happens is that there is large overlap in topics among the major programming language conferences.

Narrow Audience

Most papers in our area are written with a very narrow audience in mind: other experts in programming languages. The reason for this is obvious: program committees consist of experts in programming languages. This narrow writing style reduces the readability of our papers to people outside the area, including the people in industry who are in a position to apply the research.

Reviewer Load

Recently, program chairs have been discouraging PC members from getting help from their graduate students and friends. There are a couple problems with this. First, reviewing is an important part of  training graduate students. Second, with over 20 papers assigned to a PC member over a couple months time, doing high-quality reviews becomes a heroic effort. For academics whose job description includes reviewing, this kind of load is manageable in a macho sort of way. For those in industry (not counting a few research laboratories), I imagine this kind of load is a non-starter.

Post-Publication Reviews and Corrections

The current publication process does not provide a sufficiently low-overhead mechanism for reviews and/or corrections to come along after a paper has been published. In particular, it would be great to have an easy way for someone that is trying to replicate an experiment, or apply a technique in industry, to post a review providing further data.

The Outline of a Solution

Instead of a mixture of conferences and journals, we should just have one big continual reviewing process for all research on programming languages. Anyone who wants to participate could submit papers (anonymously or not) and anyone could review any submission (anonymously or not). Both the paper submissions and the reviews would be public. Authors would likely revise their papers as reviews come in. If it turns out that some people submit lots of papers without doing any reviews, then some simple rules regarding the ratio of paper submissions to reviews could be created. A group of editors would evaluate reviews and skim papers to decide whether a paper is scientifically sound and therefore deserving of publication, that is, deserving of their seal of approval. The editor's meta-review and decision would also be public. Of course, reviews of papers that have already been published would be allowed and could lead to a revision or even retraction of a paper. Regarding presentations, conferences and workshops could select from the recently published papers as they see fit.

This outline of a solution has much in common with the Open Review model already used by several conferences (ICLR, ICML, and AKBC at openreview.net), though it is directly inspired by my positive experiences with the code review process in the C++ Boost open-source software group.

Wednesday, July 10, 2013

Indiana Univ. beats Univ. of Colorado, 14 to 12

As of August 1, 2013 I am starting a new job as an Associate Professor in the Computer Science Division of the School of Informatics (SOIC) at Indiana University (IU). If you're unfamiliar with Indiana University, you might wonder why I'm moving there from the University of Colorado (CU). After all, CU is on the doorstep of the Rocky Mountains and some of the best skiing in the world. (And I love skiing.)

There are three reasons why I was open to interviewing elsewhere this year. The first reason is that at CU, I was in the Electrical, Computer, and Energy Engineering (ECEE) Department, not the Computer Science Department. I was happy to start out as an Assistant Prof. in CU's ECEE department because it solved our two-body problem (my wife Katie is a prof. in the CS dept.) and got my foot in the door of academia. However, I am a Computer Scientist, so in the long term that was just not going to work and an offer from another university would make the switch to CS possible. The second reason is that I come up for tenure at CU in AY 2013-2014, and although my case looks good, one can never be too careful. If the chances are 99% of getting tenure, then I'm looking for how to change that to 100%. I figured that an outside offer would do just that. Finally, the third reason I was open to interviewing was the low or nonexistent raises at CU over the past six years. I realize that this is pretty common around the country, but then again CU was giving enormous raises to some of the people in administration and raising tuition.

In the Fall of 2012, I found out that IU had several faculty positions open in the SOIC and that they wanted Katie and I to apply. Katie and I both got our Ph.D. from IU and have fond memories of IU and Bloomington, IN. IU has a strong tradition in Programming Languages and the current PL group is excellent. So for me, IU is a better fit with respect to research collaboration than CU. Similarly for Katie, IU has a growing group in Health Informatics, so IU is also a great fit for Katie.
Bloomington is a wonderful little college town in the rolling hills of southern Indiana, surrounded by state forest and lakes. It's a good place for bicycling but not downhill skiing.

To make a long story short (the academic hiring process only has one gear, first gear), we interviewed at IU and received an attractive offer that included tenure, a sizable startup package, and a significant raise. CU made a counter-offer that included a moderate package of research funds, a moderate raise,  and me moving to the CS dept. CU elected not to rush our tenure process.

On April 24, 2013, Katie and I had a complex and difficult decision to make. The below picture is the pros/cons list that we created that night to help us make the decision. We divided the items into things related to family (the top three), things related to career (next 6), and then things related to lifestyle. We gave a 3x weighting to the family items, a 2x weighting to the career items, and a 1x weighting to the lifestyle items. To keep things simple, we just gave a + (for +1), - (for -1), or 0 to each item. As you can see written in red at the bottom, IU won out over CU by a score of 14 to 12. So we're off to Bloomington!