## Thursday, October 18, 2012

### Interp. of the GTLC, Part 5: Eager Cast Checking

Back in Part 1 of this series, I mentioned that there is a design choice between eager and lazy cast checking. Recall the following example. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} With eager cast checking, the cast labeled $$\ell_1$$ fails at the moment when it is applied to a value. Whereas with lazy cast checking, the $$\ell_1$$ cast initially succeeds, but then later, when the function is applied at $$f\,\mathsf{true}$$, the cast fails. I like eager cast checking because it tells the programmer as soon as possible that something is amiss. Further, it turns out that when using the space-efficient implementations, eager and lazy checking are about the same regarding run-time overhead. (Lazy can be faster if you don't care about space efficiency.)

We saw the specification for lazy cast checking in Part 1, but the specification for eager checking was postponed. The reason for the postponement was that specifying the semantics of eager cast checking requires more machinery than for lazy cast checking. (I'll expand on this claim in the next paragraph.) Thankfully, in the meantime we've acquired the necessary machinery: the Coercion Calculus. The Eager Coercion Calculus was first discussed in the paper Space-Efficient Gradual Typing and was extended to include blame labels in Exploring the Design Space of Higher-Order Casts. Here we'll discuss the version with blame labels and flesh out more of the theory, such as characterizing the coercion normal forms and defining an efficient method of composing coercions in normal form. This is based on an ongoing collaboration with Ronald Garcia.

Before getting into the eager coercion calculus, let me take some time to explain why the eager coercion calculus is needed for the semantics, not just for an efficient implementation. After all, there was no mention of coercions in the semantics of the lazy variants of the Gradually-Typed Lambda Calculus. Instead, those semantics just talked about casts, which consisted of a pair of types (source and target) and a blame label. The heart of those semantics was a $$\mathsf{cast}$$ function that applies a cast to a value.

It's instructive to see where naive definitions of a $$\mathsf{cast}$$ function for eager checking break down. The most obvious thing to try is to modify the $$\mathsf{cast}$$ function to check for (deep) type consistency instead of only looking at the head of the type. So we change the first line of the $$\mathsf{cast}$$ function from $\mathsf{cast}(v,T_1,\ell,T_2) = \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2)$ to $\mathsf{cast}(v,T_1,\ell,T_2) = \mathbf{blame}\,\ell \qquad \text{if } T_1 \not\sim T_2$ Let's see what happens on an example just a tad different from the previous example. In this example we go through $$\star \to \star$$ instead of $$\star$$. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \to \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} With the naive cast function, both casts initially succeed, producing the value $(\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \to \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}$ However, that's not what an eager cast checking should do. The above should not be a value, it should have already failed and blamed $$\ell_0$$.

So the $$\mathsf{cast}$$ function not only needs to check whether the target type is consistent with the source type, it also needs to check whether the target type is consistent with all of the casts that are wrapping the value. One way we could try to do this is to compute the greatest lower bound (with respect to naive subtyping) of all the types in the casts on the value, and then compare the target type to the greatest lower bound. The meet operator on types is defined as follows: \begin{align*} B \sqcap B &= B \\ \star \sqcap T &= T \\ T \sqcap \star &= T \\ (T_1 \to T_2) \sqcap (T_3 \to T_4) &= (T_1 \sqcap T_3) \to (T_2 \sqcap T_4) \\ & \text{if } (T_1 \to T_2) \sim (T_3 \to T_4)\\ T_1 \sqcap T_2 &= \bot & \text{if } T_1 \not\sim T_2 \end{align*} We introduce the bottom type $$\bot$$ so that the meet operator can be a total function. Next we define a function that computes the meet of all the casts wrapping a value. \begin{align*} \sqcap (s : T_1 \Rightarrow^{\ell} T_2) &= T_1 \sqcap T_2 \\ \sqcap (v : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_1} T_3) & = (\sqcap (v : T_1 \Rightarrow^{\ell_1} T_2)) \sqcap T_3 \end{align*} Now we can replace the first line of the $$\mathsf{cast}$$ function to use this meet operator. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \left(\sqcap v\right) \not\sim T_2 \end{align*} How does this version fare on our example? An error is now triggered when the value flows into the cast labeled $$\ell_1$$, so that's good, but the blame goes to $$\ell_1$$. Unfortunately, the prior work on eager checking based on coercions says that $$\ell_0$$ should be blamed instead! The problem with this version of $$\mathsf{cast}$$ is that the $$\sqcap$$ operator forgets about all the blame labels that are in the casts wrapping the value. In this example, it's dropping the label $$\ell_0$$ which really ought to be blamed.

### The Eager Coercion Calculus

In the context of the Coercion Calculus, one needs to add the following two reduction rules to obtain eager cast checking. What these rules do is make sure that failure coercions immediately bubble up to the top of the coercion where they can trigger a cast failure. \begin{align*} (\mathsf{Fail}^\ell \to c) &\longrightarrow \mathsf{Fail}^\ell \\ (\hat{c} \to \mathsf{Fail}^\ell) & \longrightarrow \mathsf{Fail}^\ell \end{align*} In the second rule, we require that the domain coercion be in normal form, thereby imposing a left-to-right ordering for coercion failures.

To ensure confluence, we also need to make two changes to existing reduction rules. In the rule for composing function coercions, we need to require that the two coercions be in normal form. (The notation $$\tilde{c}$$ is new and will be explained shortly.) \begin{align*} (\tilde{c}_{11} \to \tilde{c}_{12}); (\tilde{c}_{21} \to \tilde{c}_{22}) & \longrightarrow (\tilde{c}_{21};\tilde{c}_{11}) \to (\tilde{c}_{12}; \tilde{c}_{22}) \end{align*} Here's the counter-example to confluence, thanks to Ron, if the above restriction is not made. \begin{align*} (\mathsf{Fail}^{\ell_1} \to c_1); (\mathsf{Fail}^{\ell_2} \to c_2) & \longrightarrow \mathsf{Fail}^{\ell_1}; (\mathsf{Fail}^{\ell_2}\to c_2) \longrightarrow \mathsf{Fail}^{\ell_1} \\ (\mathsf{Fail}^{\ell_1} \to c_1); (\mathsf{Fail}^{\ell_2} \to c_2) & \longrightarrow (\mathsf{Fail}^{\ell_2};\mathsf{Fail}^{\ell_1}) \to (c_1; c_2) \\ & \longrightarrow \mathsf{Fail}^{\ell_2} \to (c_1; c_2)\\ & \longrightarrow \mathsf{Fail}^{\ell_2} \end{align*} There is also a confluence problem regarding the following rule. \begin{align*} \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*} The counter-example, again thanks to Ron, is \begin{align*} (\iota \to \mathsf{Bool}!); (\iota \to \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} & \longrightarrow (\iota;\iota) \to (\mathsf{Bool}!; \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow^{*} \iota \to (\mathsf{Fail}^{\ell_2}); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow^{*} \mathsf{Fail}^{\ell_2} \\ (\iota \to \mathsf{Bool}!); (\iota \to \mathsf{Int}?^{\ell_2}); \mathsf{Fail}^{\ell_1} & \longrightarrow (\iota \to \mathsf{Bool}!); \mathsf{Fail}^{\ell_1} \\ & \longrightarrow \mathsf{Fail}^{\ell_1} \end{align*} We fix this problem by making the reduction rule more specific, by only allowing injections to be consumed on the left of a failure. \begin{align*} I! ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \end{align*}

Here's the complete set of reduction rules for the Eager Coercion Calculus. \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (\tilde{c}_{11} \to \tilde{c}_{12}); (\tilde{c}_{21} \to \tilde{c}_{22}) & \longrightarrow (\tilde{c}_{21};\tilde{c}_{11}) \to (\tilde{c}_{12}; \tilde{c}_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ I! ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \\ (\mathsf{Fail}^\ell \to c) &\longrightarrow \mathsf{Fail}^\ell \\ (\tilde{c} \to \mathsf{Fail}^\ell) & \longrightarrow \mathsf{Fail}^\ell \end{align*}

These additions and changes to the reduction rules cause changes in the normal forms for coercions. First, $$\mathsf{Fail}^\ell$$ cannot appear under a function coercion We therefore introduce another category, called normal parts'' and written $$\tilde{c}$$, that excludes $$\mathsf{Fail}^\ell$$ (but still includes $$I?^{\ell_1}; \mathsf{Fail}^{\ell_2}$$ because the $$\ell_1$$ projection could still fail and take precedence over $$\ell_2$$). Also, $$(\tilde{c}_1 \to \tilde{c}_2); \mathsf{Fail}^\ell$$ is now a normal form. Further, to regularize the form that coercions can take, we always write them as having three parts. The following grammar defines the normal coercions for eager cast checking. $\begin{array}{llcl} \text{optional injections} & i & ::= & \iota \mid I! \\ & i_\bot & ::= & i \mid \mathsf{Fail}^\ell \\ \text{optional functions} & f & ::= & \iota \mid \tilde{c} \to \tilde{c} \\ & f_\bot & ::= & f \mid \mathsf{Fail}^\ell \\ \text{optional projections} & j & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \iota; f; i \qquad \dagger\\ \text{normal parts} & \tilde{c} & ::= & j ; f; i_\bot \qquad \ddagger \\ \text{normal coercions} & \hat{c} & ::= & \tilde{c} \mid \iota; \iota; \mathsf{Fail}^\ell \end{array}$ $$\dagger$$ The coercion $$(\iota ;\iota; \iota)$$ is not a wrapper coercion.
$$\ddagger$$ The coercion $$(\iota; \iota; \mathsf{Fail}^\ell)$$ is not a normal part.

### The Eager Gradually-Typed Lambda Calculus

Taking a step back, recall that we gave the semantics of the Lazy Gradually-Typed Lambda Calculus in terms of a denotational semantics, based on an evaluation function $$\mathcal{E}$$. We can do the same for the Eager variant but using coercions to give the meaning of casts. The following is the definition of values and results for the Eager variant. $\begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : \overline{c} \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array}$

Most of the action in the $$\mathcal{E}$$ function is in the $$\mathsf{cast}$$ auxiliary function. We will give an alternative version of $$\mathsf{cast}$$ for eager checking. To make $$\mathsf{cast}$$ more succinct we make use of the following helper function regarding cast failure. $\mathsf{isfail}(c,\ell) \equiv (c = \mathsf{Fail}^\ell \text{ or } c = \mathsf{Fail}^\ell \circ (\tilde{c}_1 \to \tilde{c}_2) \text{ for some } \tilde{c}_1 \text{ and } \tilde{c}_2)$ Here's the updated definition of $$\mathsf{cast}$$ for eager checking. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota \\ \mathbf{blame}\,\ell & \text{if } \mathsf{isfail}(\hat{c},\ell) \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c}_2) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c}_2)= \iota \\ \mathbf{blame}\,\ell & \text{if } (\overline{c_1}; \hat{c}_2) \longrightarrow^{*} \hat{c}_3 \text{ and } \mathsf{isfail}(\hat{c}_3,\ell) \\ \tilde{v} : \overline{c_3} & \text{if } (\overline{c_1}; \hat{c}_2) \longrightarrow^{*} \overline{c}_3 \end{cases} \end{align*}

We can now give the definition of $$\mathcal{E}$$, making use of the above $$\mathsf{cast}$$ function as well as a function $$\mathcal{C}$$ for compiling casts to coercions. (Use $$\mathcal{C}_{\mathit{D}}$$ or $$\mathcal{C}_{\mathit{UD}}$$ for $$\mathcal{C}$$ to obtain the D or UD blame tracking strategy.) \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e)) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X, \mathcal{C}(T_1 \Rightarrow^\ell T_2)) \\ \mathcal{E}(e_1\,e_2) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad\; \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad\; \mathsf{apply}(X_1,X_2) \end{align*}

The semantics for the Eager Gradually-Typed Lambda Calculus is defined by the following $$\mathit{eval}$$ partial function. $\mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases}$ where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : \iota \circ (\hat{c}_1 \to \hat{c}_2) \circ \iota) &= \mathit{function} \\ \mathit{observe}(v : I! \circ \iota \circ \iota) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*}

### An Eager Space-Efficient Machine

To obtain a space efficient machine for the Eager variant, we just plug the eager version of $$\mathsf{cast}$$ into the lazy space-efficient machine.

### An Eager Time-Efficient Machine

Recall that the lazy time-efficient machine used threesomes instead of coercions because we could define an efficient function for composing threesomes, whereas reducing coercions is a complex process. The natural thing to do here is to try and come up with an eager variant of threesomes and the composition function. The lazy threesomes were isomorphic to lazy coercions in normal form, and we already have the normal forms for eager coercions, so it should be straightforward to come up with eager threesomes. It is straightforward, but in this case nothing is gained; we just end up with a slightly different notation. The reason is that the normal forms for eager coercions are more complex. So we might as well stick with using the eager coercions.

However, the essential lesson from the threesomes is that we don't need to implement reduction on coercions, instead we just need to define a composition function that takes coercions in normal form. After thinking about this for a long time, trying lots of variants, we've come up with the definition shown below. (Here we use $$\rhd$$ for composition. I'd prefer to use the fatsemi latex symbol, but it seems that is not available in MathJax.)

Composition of Normal Coercions: $$\hat{c} \rhd \hat{c}$$
\begin{align*} (j; f; i_\bot) \rhd (j'; f'; i'_\bot) &= \mathbf{case}\;i_\bot \rhd j'\;\mathbf{of}\\ & \qquad I! \Rightarrow j; f; (I! \rhd i'_\bot) \\ & \quad \mid I?^\ell \Rightarrow I?^\ell; f'; i'_\bot \\ & \quad \mid \mathsf{Fail}^\ell \Rightarrow j; f; \mathsf{Fail}^\ell \\ & \quad \mid c \Rightarrow \mathbf{case}\;(f \rhd c) \rhd f' \;\mathbf{of}\\ & \qquad\qquad\quad \mathsf{Fail}^\ell \Rightarrow j; \iota; \mathsf{Fail}^\ell\\ & \qquad\quad\quad \mid c' \Rightarrow j; c'; i'_\bot \end{align*} \begin{align*} \iota \rhd c &= c \\ c \rhd \iota &= c \\ I_1! \rhd I_2?^\ell &= \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (\tilde{c}_1 \to \tilde{c}_2) \rhd (\tilde{c}_3 \to \tilde{c}_4) &= (\tilde{c_3}\rhd \tilde{c_1}) \overset{\bullet}{\to} (\tilde{c}_2 \rhd \tilde{c}_4) \\ \mathsf{Fail}^\ell \rhd c &= \mathsf{Fail}^\ell \\ I! \rhd \mathsf{Fail}^\ell &= \mathsf{Fail}^\ell \\ \\ \tilde{c}_1 \overset{\bullet}{\to} \tilde{c}_2 &= \tilde{c}_1 \to \tilde{c}_2 \\ \mathsf{Fail}^\ell \overset{\bullet}{\to}\hat{c}_2 &= \mathsf{Fail}^\ell \\ \tilde{c}_1 \overset{\bullet}{\to}\mathsf{Fail}^\ell &= \mathsf{Fail}^\ell \end{align*}

To obtain an eager, time-efficient machine, we just replace coercion reduction with coercion composition. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota \\ \mathbf{blame}\,\ell & \text{if } \mathsf{isfail}(\hat{c},\ell) \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c}_2) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c}_2)= \iota \\ \mathbf{blame}\,\ell & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \hat{c}_3 \text{ and } \mathsf{isfail}(\hat{c}_3,\ell) \\ \tilde{v} : \overline{c_3} & \text{if } (\overline{c_1} \rhd \hat{c}_2) = \overline{c}_3 \end{cases} \end{align*}

## Monday, October 08, 2012

### Is TypeScript gradually typed? Part 2

Consider the following TypeScript program, in which a number is stored in variable x of type any and then passed to the display function that expects a string. As we saw in the previous post, a gradual type system allows the implicit down-cast from any to string, so this is a well-typed program.

function display(y : string) {
document.body.innerHTML = y.charAt(0);
}

var x : any = 3;
display(x);

But what happens at run-time? The answer for TypeScript is that this program hits an error at the y.charAt(0) method call because charAt is not supported by numbers like 3. But isn't y guaranteed to be a string? No, not in TypeScript. TypeScript does not guarantee that the run-time value in a variable is consistent with the static type of the variable. The reason for this is simple, TypeScript does not perform any run-time checks at down-casts to ensure that the incoming value is of the target type. In the above program, the call display(x) causes an implicit cast to string, but there's no run-time check to make sure that the value is in fact a string. TypeScript is implemented as a compiler to JavaScript, and the compiler simply ignores the implicit casts. Let's refer to this no-checking semantics as level 1 gradual typing. I briefly describe this approach in the paper Gradual Typing for Objects.

A second alternative semantics for gradual typing is to perform run-time checking to ensure that values are consistent with the static types. For implicit casts concerning simple types like number and string, this run-time checking is straightforward. In the above example program, an error would be signaled just prior to the call to display, saying that display expects a string, not a number.

For implicit casts concerning complex types, such as function and object types, run-time checking is more subtle. Consider the following program that defines a deriv function that takes another function as a parameter.

function deriv(d:number, f:(number)=>number, x:number) {
return (f(x + d) - f(x - d)) / (2.0 * d);
}

function fun(y):any {
if (y > 0)
return Math.pow(y,3) - y - 1;
else
return "yikes";
}

deriv(0.01, fun, 3.0);
deriv(0.01, fun, -3.0);

The function fun has type (any)=>any, and at each call to deriv, this function is implicitly cast to (number)=>number. The fundamental challenge in casting functions is that it's impossible to tell in general how a function will behave, and in particular, what the return value will be. Here we don't know whether fun will return a number or a string until we've actually called it.

The standard way to deal with function casts is to delay the checking until subsequent calls. One way to visualize this semantics is to imagine the compiler generating the following wrapper function, casted_fun, that applies casts to the argument and return value.

function casted_fun(z:number):number {
return <number>fun(<any>z);
}

deriv(0.01, casted_fun, 3.0);
deriv(0.01, casted_fun, -3.0);


My first two papers on gradual typing, Gradual Typing for Functional Languages and Gradual Typing for Objects, both used level 2 gradual typing.

The down-side of delayed checking of function casts is that when an error is finally caught, the location of the error can be far away from the cast that failed. In the above example, the error would occur during the call f(x + d), not at the call to deriv. Findler and Felleisen solved this problem by introducing the notion of blame tracking in their paper Contracts for higher-order functions. The idea is to associate source location information with each cast and then to carry along this information at run-time, in the wrapper functions, so that when the cast in a wrapper fails, it can emit an error that mentions the source location of the original cast, in this example, the call to deriv.

Implementing casts in a way that supports blame tracking while also keeping space overheads to a constant factor is challenging. My paper Threesomes, With and Without Blame shows how to do this.

### Discussion

Each of the three levels comes with some advantages and disadvantages. Level 1 gradual typing is the easiest to implement, an important engineering concern, and it comes with no run-time overhead, as there is no run-time checking. On the other hand, level 1 gradual typing does not provide run-time support for catching broken invariants, such as deriv's expectation that its arguments have type string. Thus, a TypeScript programmer that really wants to enforce such an invariant would need to add code to check the type of the argument, a common practice today in JavaScript.

Level 2 gradual typing ensures that the value stored in a variable is consistent with the variable's static type and it provides the run-time checking to catch when this invariant is about to be broken. Thus, level 2 gradual typing removes the need for hand-written type tests. Also, level 2 gradual typing opens up the possibility of compiling statically-typed regions of a program in a more efficient, type-specific manner. (This is an active area of my current research.) The disadvantages of level 2 gradual typing are the run-time overhead from cast checking and the increased implementation complexity.

Level 3 gradual typing improves on level 2 by adding blame tracking, thereby improving the diagnostic errors reported when a cast fails. The extra cost of blame tracking is not very significant, so I would always suggest level 3 over level 2.

## Thursday, October 04, 2012

### Is TypeScript gradually typed? Part 1

If you haven't heard already, there's a new language named TypeScript from Microsoft, designed by Anders Hejlsberg and several others, including a recent alumni from my research group named Jonathan Turner. The TypeScript language extends JavaScript with features that are intended to help with large-scale programming such as optional static type checking, classes, interfaces, and modules. In this post I'll try to characterize what optional static typing means for TypeScript. There are a large number of possible design decisions regarding optional static typing, so the characterization is non-trivial. When discussing types, it's often easy to fixate on the static semantics, that is, how the type checker should behave, but we'll also need to look at the dynamic semantics of TypeScript in Part 2 of this post. The punch line will be that TypeScript is a gradually-typed language, but only to level 1. (I'll define levels that go from 1 to 3 and discuss their pros and cons.)

### Static Semantics (Type System)

TypeScript has an any type. Variables and fields of this type can store any type of value. TypeScript has function types that describe the types of the parameters and return types of a function and the way in which the any type and function types interact is closely related to the design I wrote about in Gradual Typing for Functional Languages, SFP 2006. Further, TypeScript has object types to describe the types of fields and methods within an object. The way in which the any type and the object types behave in TypeScript is closely related to the system I described in Gradual Typing for Objects, ECOOP 2007.

The basic feature of any is that you can implicitly convert from any type to any and you can implicitly convert from any to any other type. For example, the following is a well-typed program in TypeScript that demonstrates converting from type string to type any and back to string.

var answer : string = "42";
var a : any = answer;
var the_answer : string = a;

On the other hand, a gradual type system acts like a static type system when the any type is not involved. For example, the following program tries to implicitly convert from a number to a string, so the type system rejects this program.
var answer : number = 42;


Next, let's look at how any and function types interact. TypeScript uses structural typing for function types, which means that whether you can convert from one function type to another depends on the parts of the function type. The parts are the parameter and the return types. Consider the following example, in which a function of type (string)=>string is implicitly converted to (any)=>string and then to (any)=>any.

function f(x:string):string { return x; }
var g : (any)=>string = f;
var h : any = g;
document.body.innerHTML = h("42");

The first conversion is interesting because, if g is called with an argument of type any, then the argument needs to be implicitly converted to the string that f expects. This is an implicit down-cast, and doesn't follow the contra-variance rule for functions that one sees in the subtyping rules for object-oriented languages. Indeed, in a gradually typed system, assignment compatibility is co-variant in the parameter type of a function, at least, with respect to the any type. The second conversion, from (any)=>string to any is not so surprising, it's just up-casting from (any)=>string to any. Interestingly, there is a third implicit conversion in this program. Can you see it? It's in the call to h. The fact that we're calling h implies that h needs to be a function (or something callable), so there's essentially an implicit conversion here from any to (string)=>any.

Next let's look at implicit conversions involving object types. Like function types, object types are also structural. Consider the following well-typed program in TypeScript, in which an object of type {x:number; y:any} is implicitly converted to {x:any; y:string}, then {x:number}, and finally to any.

var o : {x:number; y:any;} = {x:1, y:"42"};
var p : {x:any; y:string;} = o;
var q : {x:number;} = p;
var r : any = p;
document.body.innerHTML = r.y;

The assignment of o to p shows structural changes within an object type, both to and from any. The next conversion, to {x:number}, shows that the type system allows implicit narrowing of object types. Thus, the rules governing implicit conversion are quite close to the consistent-subtyping relation described in Gradual Typing for Objects. This relation combines the consistency relation that governs the static behavior of any (sometimes called compatibility) with the traditional subtyping relation of structural type systems that allows the implicit narrowing of object types. Getting back to the above example, similar to the function call at type any, TypeScript allows member access on things of type any.

The next example is not well-typed in TypeScript.

var o : {x:number; y:any; } = {x:1, y:"42"};
var q : {x:number;} = o;
var r : {x:number; y:any;} = q;
document.body.innerHTML = r.y;

The tsc compiler complains that
example.ts(3,29): Cannot convert '{ x: number; }'
to '{ x: number; y: any; }':
Type '{ x: number; }' is missing property 'y'
from type '{ x: number; y: any; }'

which shows the TypeScript doesn't allow implicit widening (again in line with the consistent-subtyping relation).

To wrap up the discussion of the static semantics, let's take a look at the interaction between function types (arrows) and object types. To quote John Reynolds by way of Olivier Danvy, "As usual, something funny happens at the left of the arrow". I'm curious to see whether object narrowing is contra-variant in the parameters of function types, which is what I'd expect based on traditional subtyping relations and based on wanting a consistent design with respect to not allowing implicit widening. Consider the following example.

function f(o: {x:number;}):string { return "42"; };
var g : (o: {x:number; y:number;})=>string = f;
var h : (o: {x:number;})=>string = g;
document.body.innerHTML = h({x:1,y:2});

The conversion from f to g should be OK, because it only requires an argument of type {x:number; y:number;} to be up-cast (narrowed) to {x:number;}. However, the conversion from g to h should not be OK because it requires an argument of type {x:number;} to be implicitly down-cast (widened) to {x:number; y:number;}. Surprisingly, the tsc compiler does not give a type error for the above example! So what I said above about TypeScript disallowing implicit widening is not quite true. In many cases it disallows widening, but here we see an exception to the rule. I don't like exceptions in language design because they increase the complexity of the language. So on this one point, TypeScript differs from the design in Gradual Typing for Objects. Perhaps Jonathan can comment on whether this difference was intentional or accidental.

## Thursday, September 20, 2012

### Interpretations of the GTLC, Part 4: Even Faster

Consider the following statically-typed function. (The type $$\star$$ does not occur anywhere in this function.) $\lambda f: \mathsf{Int}{\to}\mathsf{Int}. \; x{:}\mathsf{Int} = f(42); \mathbf{return}\,x$ We'd like the execution speed of this function to be the same as if the entire language were statically typed. That is, we don't want statically-typed parts of a gradually-typed program to pay overhead because other parts may be dynamically typed. Unfortunately, in the abstract machines that we've defined so far, there is an overhead. At the point of a function call, such as $$f(42)$$ above, the machine needs to check whether $$f$$ has evaluated to a closure or to a closure wrapped in a threesome. This act of checking constitutes some run-time overhead.

Taking a step back, there are two approaches that one sees in the literature regarding how a cast is applied to a function. One approach is to build a new function that casts the argument, applies the old function, and then casts the result. The reduction rule looks like this: $v : T_1 \to T_2 \Rightarrow T_3 \to T_4 \longrightarrow \lambda x{:}T_3. (v\,(x : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4$ The nice thing about this approach is that there's only one kind of value of function type, functions! So when it comes to function application, we only need one reduction rule, good old beta: $(\lambda x{:}T.\,e)\, v \longrightarrow [x{:=}v]e$ The other approach is to leave the cast around the function and then add a second reduction rule for applications. $(v_1 : T_1 \to T_2 \Rightarrow T_3 \to T_4) \, v_2 \longrightarrow (v_1\, (v_2 : T_3 \Rightarrow T_1)) : T_2 \Rightarrow T_4$ The nice thing about this approach is that the cast around the function is easy to access and change, which we took advantage of to compress sequences of such casts. But as we've already pointed out, having two kinds of values at function type induces some run-time overhead, even in parts of the program that are statically typed.

Our solution to this conundrum is to use a hybrid representation and to take advantage of the indirection that is already present in a function call. Instead of having two kinds of values at function type, we have only one: a closure that includes an optional threesome: $\langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle$ When a closure is first created, there is no threesome. Later, when a closure is cast, the threesome is added. $V(\lambda x{:}T.\, s,\rho) = \langle \lambda x{:}T.\, s, \rho, \bot \rangle$ The one transition rule for function application passes the optional threesome as a special parameter, here named $$c$$, to the function. In the case of an un-casted closure, the function ignores the $$c$$ parameter. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \end{align*}

When an un-casted closure is cast, we build a wrapper function, similar to the first approach discussed above, but using the special variable $$c$$ to refer to the threesome instead of hard-coding the cast into the wrapper function. We add $$\mathit{dom}$$ and $$\mathit{cod}$$ operations for accessing the parts of a function threesome. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s', \rho', \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ & \text{and } \rho' = \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \} \end{align*} When a closure is cast for the second time, the casts are combined to save space. \begin{align*} \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*}

That's it. We now have a machine that doesn't perform extra dispatching at function calls. There is still a tiny bit of overhead in the form of passing the $$c$$ argument. This overhead can be removed by passing the entire closure to itself (instead of passing the array of free variables and the threesome separately), and from inside the function, access the threesome from the closure.

In the following I give the complete definitions for the new abstraction machine. In addition to $$\mathit{dom}$$ and $$\mathit{cod}$$, we add a tail call without a cast to avoid overhead when there is no cast. $\begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \mid \mathit{dom}(e) \mid \mathit{cod}(e) \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) \mid \mathbf{return}\,e(e) : \tau \\ \text{optional threesomes} & \tau_\bot & ::= & \bot \mid \tau \\ \text{values}& v & ::= & k \mid k : \tau \mid \langle \lambda x{:}T.\, s, \rho, \tau_\bot \rangle \end{array}$ Here's the complete definition of the cast function. \begin{align*} \mathsf{cast}(v, \bot) &= v \\ \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \bot\rangle ,\tau) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } \tau = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x_1.\,s' , \{ f{:=}\langle \lambda x{:}T.\,s, \rho, \bot\rangle \}, \tau \rangle & \text{otherwise} \end{cases} \\ & \text{where } s' = (x_2 = x_1 {:} \mathit{dom}(c); \mathbf{return}\, f(x_2) : \mathit{cod}(c)) \\ \mathsf{cast}(\langle \lambda x{:}T.\,s, \rho, \tau_1\rangle ,\tau_2) &= \begin{cases} \mathbf{blame}\,\ell & \text{if } (\tau_1; \tau_2) = (T_1 \overset{I^p;\bot^\ell}{\Longrightarrow} T_2) \\ \langle \lambda x{:}T.\,s, \rho, (\tau_1; \tau_2)\rangle & \text{otherwise} \end{cases} \end{align*} Here are the updated evaluation rules. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho, \bot \rangle \\ V(\mathit{dom}(e),\rho) &= \tau_1 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \\ V(\mathit{cod}(e),\rho) &= \tau_2 & \text{if } V(e,\rho) = \tau_1 \to \tau_2 \end{align*} Lastly, here are the transition rules for the machine. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2,c{:=}\tau_\bot ], (T_1 \overset{T_1}{\Longrightarrow} T_1, (x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho', \tau_\bot \rangle \\ \text{and } & V(e_2,\rho) = v_2 \\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau, (x,s,\rho')::\kappa)) & \longmapsto (s, \rho'[x{:=}v'], \kappa) \\ \text{where }& V(e,\rho) = v \text{ and } \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2), \rho,\kappa) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot],\kappa) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho,(\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2,c{:=}\tau_\bot], ((\tau_1; \tau_2), \sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho',\tau_\bot\rangle\\ \text{and } & V(e_2,\rho) = v_2 \2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*} I like how there are fewer rules and the rules are somewhat simpler compared to the previous machine. There is one last bit of overhead in statically typed code: in a normal return we have to apply the pending threesome that's on the stack. If one doesn't care about making tail-calls space efficient in the presence of casts, then this wouldn't be necessary. But I care. ## Wednesday, September 19, 2012 ### Interpretations of the GTLC: Part 3, Going Faster The intuition for an efficient coercion composition function came from thinking about types, not coercions. We'll start with the UD blame tracking strategy and then later consider D. Also, for now we'll stick with lazy cast checking. Consider the following sequence of casts: \[ e : T_1 \Rightarrow^{\ell_1} T_2 \Rightarrow^{\ell_2} \cdots \Rightarrow^{\ell_{n-1}} T_n We'd like some way to summarize the sequence of types without loosing any important information. That is, we'd like to come up with something that can catch the same cast errors as the entire sequence, blaming the appropriate label, but using less space. Imagine the $$n$$ types as a line of differently colored trees on the side of a road. If you're next to the road, staring down the line of trees, you see what looks like one tree with branches of many colors. Some of the branches from further-away trees are hidden from view by closer trees, but some are visible. Now, suppose we wanted to maintain the same view from your standpoint, but save on water. We could replace the line of trees with a single multi-colored tree that includes all the branches visible to you. The figure below depicts three differently-colored trees getting merged into a single multi-color tree. The nodes without color should be considered transparent.

Mapping this idea back to types, the colors are blame labels and transparent nodes are the type $$\star$$. Because we need to color individual branches, we need blame labels on every internal node of a type. In particular, we need the notion of a labeled type: $\begin{array}{lrcl} \text{optional labels} & p,q & ::= & \epsilon \mid \ell \\ \text{labeled types} & P,Q & ::= & B^p \mid P \to^p Q \mid \star \mid (I^p; \bot^\ell) \end{array}$ (These labeled types are for the UD strategy. We'll discuss the labeled types for D later. Also, the labeled type $$I^p; \bot^\ell$$ deserves some explanation, which we'll get to soon.) The $$\mathit{label}$$ function returns the top-most label of a labeled type: \begin{align*} \mathit{label}(B^p) &= p \\ \mathit{label}(P \to^p Q) &= p \\ \mathit{label}(\star) &= \epsilon \\ \mathit{label}(I^p; \bot^\ell) &= p \end{align*}

We'll define a function for composing two labeled types $$P$$ and $$Q$$ to produce a new labeled type $$P'$$, using semicolon as the syntax for this composition function: $P ; Q = P'$ We replace each cast with a threesome, that is, a cast annotated with a labeled type. The labeled type is computed by a simple function $$\mathcal{L}$$ that we define below. \begin{align*} & e : T_1 \Rightarrow^\ell T_2 \text{ becomes } e : T_1 \overset{P}{\Longrightarrow} T_2 \\ & \text{ where } P = \mathcal{L}(T_1 \Rightarrow^\ell T_2) \end{align*} \begin{align*} \mathcal{L}(B \Rightarrow^\ell B) &= B \\ \mathcal{L}(\star \Rightarrow^\ell \star) &= \star \\ \mathcal{L}(B \Rightarrow^\ell \star) &= B \\ \mathcal{L}(\star \Rightarrow^\ell B) &= B^\ell \\ \mathcal{L}(T_1 \Rightarrow^\ell T_2) &= I ; \bot^\ell \qquad \text{where } I \sim T_1 \\ \mathcal{L}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= \mathcal{L}(T_3 \Rightarrow^\ell T_1) \to \mathcal{L}(T_2 \Rightarrow^\ell T_4) \\ \mathcal{L}(T_1 \to T_2 \Rightarrow^\ell \star) &= \mathcal{L}(\star \Rightarrow^\ell T_1) \to \mathcal{L}(T_2 \Rightarrow^\ell \star) \\ \mathcal{L}(\star \Rightarrow^\ell T_3 \to T_4) &= \mathcal{L}(T_3 \Rightarrow^\ell \star) \to^\ell \mathcal{L}(\star \Rightarrow^\ell T_4) \end{align*} A sequence of threesomes is compressed to a single threesome using the composition function: \begin{gather*} e : T_1 \overset{P_1}{\Longrightarrow} T_2 \overset{P_2}{\Longrightarrow} \cdots \overset{P_{n-1}}{\Longrightarrow} T_n \\ \text{becomes} \\ e : T_1 \overset{P}{\Longrightarrow} T_n \\ \text{where } P = P_1; P_2; \cdots; P_{n-1} \end{gather*}

Before we go into the details of the composition function, it helps to see how (well-formed) threesomes correspond to coercions in normal form. With this correspondence in place, we can use the coercion reduction rules to help guide the definition of threesome composition. The function $$\mathit{TC}$$ defined below maps threesomes to coercions in normal form. This function is an isomorphism, so it's inverse maps normal coercions back to threesomes. \begin{align*} \mathit{TC}(B \overset{B}{\Longrightarrow} B) &= \iota_B \\ \mathit{TC}(\star \overset{\star}{\Longrightarrow}\star) &=\iota_\star \\ \mathit{TC}(\star \overset{B^\ell}{\Longrightarrow} B) &= B?^\ell \\ \mathit{TC}(B\overset{B}{\Longrightarrow} \star) &= B! \\ \mathit{TC}(\star \overset{B^\ell}{\Longrightarrow} \star) &= B?^\ell; B! \\ \mathit{TC}(T_1 \overset{I; \bot^\ell}{\Longrightarrow} T_2) &= \mathsf{Fail}^\ell \\ \mathit{TC}(T_1 \overset{I^{\ell_1}; \bot^{\ell_2}}{\Longrightarrow} T_2) &= I?^{\ell_1} ; \mathsf{Fail}^{\ell_2} \\ \mathit{TC} (T_1 \to T_2 \overset{P_1 \to P_2}{\Longrightarrow} T_3 \to T_4)&= \mathit{TC}(T_3 \overset{P_1}{\Longrightarrow} T_1) \to \mathit{TC}(T_2 \overset{P_2}{\Longrightarrow} T_4) \\ \mathit{TC} (\star \overset{P_1 \to^\ell P_2}{\Longrightarrow} T_3 \to T_4)&= (\star \to \star)?^\ell ; \mathit{TC}(T_3 \overset{P_1}{\Longrightarrow} \star) \to \mathit{TC}(\star \overset{P_2}{\Longrightarrow} T_4) \\ \mathit{TC} (T_1 \to T_2 \overset{P_1 \to P_2}{\Longrightarrow} \star)&= \mathit{TC}(\star \overset{P_1}{\Longrightarrow} T_1) \to \mathit{TC}(T_2 \overset{P_2}{\Longrightarrow} \star); (\star \to \star)! \\ \mathit{TC} (\star \overset{P_1 \to^\ell P_2}{\Longrightarrow} \star)&= (\star \to \star)?^\ell ; \mathit{TC}(\star \overset{P_1}{\Longrightarrow} \star) \to \mathit{TC}(\star \overset{P_2}{\Longrightarrow} \star); (\star \to \star)! \end{align*}

We're ready to make precise how two labeled types can be composed to form a single labeled type. The two cases in which one of the labeled types is $$\star$$ are easy: just return the other type: \begin{align*} \star; Q &= Q \\ P; \star &= P \end{align*} Next, suppose we have $$\mathsf{Int}^{\ell_1}$$ followed by $$\mathsf{Int}^{\ell_2}$$. These should compose to $$\mathsf{Int}^{\ell_1}$$ because if the first cast succeeds, so will the second, making the blame label $$\ell_2$$ redundant. In general, for labeled basic types we have the following rule. $$\label{eq:1} B^p; B^q = B^p$$ Suppose instead that the basic types don't match. The right-hand side of the following rule is a bit tricky, so let's think about this in terms of coercions. $$\label{eq:2} B_1^p ; B_2^q = B_1^p ; \bot^q \qquad \text{if } B_1 \neq B_2$$ Suppose $$p=\ell_1, q = \ell_2$$ and these two labeled types come from the threesomes $\star \overset{B_1^{\ell_1}}{\Longrightarrow} \star \overset{B_2^{\ell_2}}{\Longrightarrow} B_2$ The corresponding coercion sequence is $B_1?^{\ell_1} ; B_1! ; B_2?^{\ell_2}$ which reduces to $B_1?^{\ell_1} ; \mathsf{Fail}^{\ell_2}$ and that corresponds to the labeled type for errors, $$B_1^{\ell_1}; \bot^{\ell_2}$$. We also need to consider a mismatch between basic types and function types: \begin{align} B^p; (P \to^q Q) &= B^q; \bot^q \\ (P \to^p Q); B^q &= (\star \to \star)^p; \bot^q \end{align} The rule for labeled function types takes the label $$p$$ for the label in the result and it recursively composes the domain and codomain types. The contra-variance in the parameter type is important for getting the right blame and coincides with the contra-variance in the reduction rule for composing function coercions. $$\label{eq:4} (P_1 \to^p P_2) ; (Q_1 \to^q Q_2) = (Q_1; P_1) \to^p (P_2; Q_2)$$ The following figure shows an example similar to the previous figure, but with function types instead of pair types. The analogy with real trees and line-of-sight breaks down because you have to flip to looking at the trees from back-to-front instead of front-to-back for negative positions within the type.

Lastly we need several rules to handle when the error type is on the left or right. \begin{align} (I^p; \bot^\ell); Q &= (I^p; \bot^\ell) \\ P; (I^q; \bot^\ell) &= I^p ; \bot^\ell \qquad \text{if } I \sim P \text{ and } \mathit{label}(P) = p \\ P; (I^q; \bot^\ell) &= I^p ; \bot^q \qquad \text{if } I \not\sim P \text{ and } \mathit{label}(P) = p \end{align}

What's extra cool about labeled types and their composition function is that each rule covers many different rules if you were to formulate them in terms of coercions. For example, the single rule $$B^p; B^q = B^p$$ covers four situations when viewed as threesomes or coercions: \begin{align*} (B \overset{B}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} B) &= B \overset{B}{\Longrightarrow} B \\ \iota_B; \iota_B &\longrightarrow \iota_B \\ (\star \overset{B^\ell}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} B) &= \star \overset{B^\ell}{\Longrightarrow} B \\ B?^\ell ; \iota_B &\longrightarrow B?^\ell \\ (B \overset{B}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} \star) &= B \overset{B}{\Longrightarrow} \star \\ \iota_B; B! &\longrightarrow B! \\ (\star \overset{B^\ell}{\Longrightarrow} B; B \overset{B}{\Longrightarrow} \star) &= \star \overset{B^\ell}{\Longrightarrow} \star \\ B?^\ell ; B! & \text{ is already in normal form} \end{align*}

We define a threesomes as a source type, middle labeled typed, and a target type. $\begin{array}{llcl} \text{threesomes} & \tau & ::= & T \overset{P}{\Longrightarrow} T \\ \end{array}$ We define the sequencing of threesomes as follows $(T_1 \overset{P}{\Longrightarrow} T_2); (T_2 \overset{Q}{\Longrightarrow}T_3) = T_1 \overset{P;Q}{\Longrightarrow} T_3$ Similarly, we define the notation $$\tau_1 \to \tau_2$$ as $(T_3 \overset{P}{\Longrightarrow} T_1) \to (T_2 \overset{Q}{\Longrightarrow} T_4) = T_1\to T_2 \overset{P\to Q}{\Longrightarrow} T_3 \to T_4$

We can now go back to the ECD machine and replace the coercions with threesomes. Here's the syntax in A-normal form. $\begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x : T = e(e) \mid x = e : \tau \\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \tau \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \tau \end{array}$ The cast function, of course, needs to change. \begin{align*} \mathsf{cast}(k, \tau) &= \begin{cases} k & \text{if } \tau = (B \overset{B}{\Longrightarrow} B) \\ \mathbf{blame}\,\ell & \text{if } \tau = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : \tau & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.s,\rho \rangle, \tau) &= \langle \lambda x{:}T.s,\rho \rangle : \tau \\ \mathsf{cast}(k : \tau_1, \tau_2) &= \begin{cases} k & \text{if } (\tau_1;\tau_2) = B \overset{B}{\Longrightarrow} B \\ \mathbf{blame}\,\ell & \text{if } (\tau_1;\tau_2) = B \overset{B^p;\bot^\ell}{\Longrightarrow} T\\ k : (\tau_1;\tau_2) & \text{otherwise} \end{cases} \\ \mathsf{cast}(\langle \lambda x{:}T.s,\rho \rangle : \tau_1, \tau_2) &= \begin{cases} \langle \lambda x{:}T.s,\rho \rangle : (\tau_1;\tau_2)& \text{if } \mathit{middle}(\tau_1;\tau_2) \neq (I^p;\bot^\ell) \\ \mathbf{blame}\,\ell & \text{if } \mathit{middle}(\tau_1;\tau_2) = (I^p;\bot^\ell) \end{cases} \end{align*} And last but not least, here's the transitions for the ECD machine, but with threesomes instead of coercions. \begin{align*} (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (T_1 \overset{T_1}{\Longrightarrow}T_1,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (\tau_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle : \tau_1 \to \tau_2, \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2, \tau_1) = v'_2\\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : \tau; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto (s, [x{:=}v']\rho', \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = v' \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho, (\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], ((\tau_1; \tau_2),\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle, V(e_2,\rho) = v_2 \\ (\mathbf{return}\,e_1(e_2) : \tau_1, \rho, (\tau_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\tau_5,\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_1. s, \rho' \rangle : \tau_3 \to \tau_4,\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, \tau_3) = v'_2, \text{ and} \\ & (\tau_4; \tau_1; \tau_2) = \tau_5 \2ex] (x = e : \tau; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (\tau,(x,s,\rho')::\kappa)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,\tau) = \mathbf{blame}\,\ell \end{align*} We now have an implementation of Lazy UD that is space efficient and relatively efficient in time as well. However, there is one nagging issue regarding the speed of statically-typed code. Notice how there are two transition rules for each kind of function call. The source of the problem is that there are two kinds of values that have function type, closures and closures wrapped in a threesome. In the next post I'll define a unified representation for closures and wrapped closures so that we don't need to dispatch at runtime. ## Tuesday, September 18, 2012 ### Interpretations of the GTLC, Part 2: Space-Efficient Machines I briefly mentioned in my previous post that there are implementation challenges regarding gradual typing. One of those challenges regards space efficiency. In their paper Space-Efficient Gradual Typing, Herman, Tomb, and Flanagan observed two circumstances in which function casts can lead to unbounded space consumption. First, some programs repeatedly apply casts to the same function, resulting in a build-up of casts around the function. In the following example, each time the function bound to k is passed between even and odd a cast is added, causing a space leak proportional to n. let rec even(n : Int, k : Dyn->Bool) : Bool = if (n = 0) then k(True : Bool => *) else odd(n - 1, k : *->Bool => Bool->Bool) and odd(n : Int, k : Bool->Bool) : Bool = if (n = 0) then k(False) else even(n - 1, k : Bool->Bool => *->Bool)  Second, some casts break proper tail recursion. Consider the following example in which the return type of even is $$\star$$ and odd is $$\mathsf{Bool}$$. let rec even(n : Int) : Dyn = if (n = 0) then True else odd(n - 1) : Bool => * and odd(n : Int) : Bool = if (n = 0) then False else even(n - 1) : * => Bool  Assuming tail call optimization, cast-free versions of the even and odd functions require only constant space, but because the call to even is no longer a tail call, the run-time stack grows with each call and space consumption is proportional to n. Herman et al.'s solution to this space-efficiency problem relies on Henglein's Coercion Calculus. This calculus defines a set of combinators, called coercions, that can be used to express casts. The key advantage of coercions is that an arbitrarily long sequence of coercions reduces to a sequence of at most three coercions. The following defines the syntax and typing rules for coercions. The two most important coercions are the injection coercion $$I!$$ from $$I$$ to $$\star$$ and the projection coercion $$I?^\ell$$ from $$\star$$ to $$I$$. The definition of the injectable types $$I$$ depends on the blame tracking strategy: \begin{align*} I & ::= B \mid \star \to \star & (UD) \\ I & ::= B \mid T \to T & (D) \end{align*} \begin{gather*} \begin{array}{llcl} \text{coercions} & c & ::= & \iota_T \mid I! \mid I?^\ell \mid c\to c \mid c;c \mid \mathsf{Fail}^\ell \end{array} \\[2ex] \frac{}{\vdash \iota_T : T \Rightarrow T} \qquad \frac{}{\vdash T! : T \Rightarrow \star} \qquad \frac{}{\vdash T?^\ell : \star \Rightarrow T} \\[2ex] \frac{\vdash c_1 : T_{21} \Rightarrow T_{11} \quad \vdash c_2 : T_{12} \Rightarrow T_{22}} {\vdash c_1 \to c_2 : T_{11} \to T_{12} \Rightarrow T_{21} \to T_{22}} \\[2ex] \frac{\vdash c_1 : T_1 \Rightarrow T_2 \quad \vdash c_2 : T_2 \Rightarrow T_3} {\vdash c_1 ; c_2 : T_1 \Rightarrow T_3} \qquad \frac{}{\vdash \mathsf{Fail}^\ell : T_1 \Rightarrow T_2} \end{gather*} We sometimes drop the subscript on the $$\iota_T$$ when the type $$T$$ doesn't matter. The way in which casts are compiled to coercions depends on whether you're using the D or UD blame tracking strategy. Here's the compilation function for D. \begin{align*} \mathcal{C}_D(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_D(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_D(\star \Rightarrow^\ell I) &= I?^\ell \\ \mathcal{C}_D(I \Rightarrow^\ell \star) &= I! \\ \mathcal{C}_D(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_D(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*} The compilation function for UD is a bit more complicated because the definition of injectable type is more restrictive. \begin{align*} \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell \star) &= \iota \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell B) &= \iota \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell B) &= B?^\ell \\ \mathcal{C}_{\mathit{UD}}(\star \Rightarrow^\ell T_3 \to T_4) &= (\star \to \star)?^\ell; (c_1 \to c_2) \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell \star) \\ \text{and } & c_2 = \mathcal{C}_D(\star \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(B \Rightarrow^\ell \star) &= B! \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell \star) &= (c_1 \to c_2) ; (\star \to \star)! \\ \text{where } & c_1 = \mathcal{C}_D(\star \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell \star) \\ \mathcal{C}_{\mathit{UD}}(T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4) &= c_1 \to c_2 \\ \text{where } & c_1 = \mathcal{C}_D(T_3 \Rightarrow^\ell T_1) \\ \text{and } & c_2 = \mathcal{C}_D(T_2 \Rightarrow^\ell T_4) \\ \mathcal{C}_{\mathit{UD}}(T_1 \Rightarrow^\ell T_2) &= \mathsf{Fail}^\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \end{align*} We identify coercion sequencing up to associativity and identity: \begin{align*} c_1 ; (c_2; c_3) &= (c_1 ; c_2); c_3 \\ (c ; \iota) &= (\iota ; c) = c \end{align*} The following are the reduction rules for coercions. The compilation function $$\mathcal{C}$$ depends on the choice of blame tracking strategy (D or UD). \begin{align*} I_1!; I_2?^\ell & \longrightarrow \mathcal{C}(I_1 \Rightarrow^\ell I_2) \\ (c_{11} \to c_{12}); (c_{21} \to c_{22}) & \longrightarrow (c_{21};c_{11}) \to (c_{12}; c_{22}) \\ \mathsf{Fail}^\ell; c & \longrightarrow \mathsf{Fail}^\ell \\ \overline{c} ; \mathsf{Fail}^\ell & \longrightarrow \mathsf{Fail}^\ell \\ \end{align*} The last reduction rule refers to $$\overline{c}$$, a subset of the coercions that we refer to as wrapper coercions. We define wrapper coercions and coercions in normal form $$\hat{c}$$ as follows. \[ \begin{array}{llcl} \text{optional injections} & \mathit{inj} & ::= & \iota \mid I! \\ \text{optional function coercions} & \mathit{fun} & ::= & \iota \mid \hat{c} \to \hat{c} \\ \text{optional projections} & \mathit{proj} & ::= & \iota \mid I?^\ell \\ \text{wrapper coercions} & \overline{c} & ::= & \mathit{fun}; \mathit{inj} \\ \text{normal coercions} & \hat{c} & ::= & \mathit{proj} ; \mathit{fun}; \mathit{inj} \mid \mathit{proj}; \mathsf{Fail}^\ell \end{array} Here we can easily see that coercions in normal form can always be represented by a coercion with a length of at most three.

Theorem (Strong Normalization for Coercions) For any coercion $$c$$, there exists a $$\hat{c}$$ such that $$c \longrightarrow^{*} \hat{c}$$.

With the Coercion Calculus in hand, we can define a space-efficient abstract machine. This machine is a variant of my favorite abstract machine for the lambda calculus, the ECD machine on terms in A-normal form. Herman et al. define an efficient reduction semantics that relies on mutually-recursive evaluation contexts to enable the simplification of coercions in tail position. Our choice of the ECD machine allows us to deal with coercions in tail position in a more straightforward way. Here's the syntax for our coercion-based calculus in A-normal form. The two main additions are the cast definition and the tail-call statement. $\begin{array}{llcl} \text{expressions} & e & ::= & k \mid x \mid \lambda x{:}T.\, s \\ \text{definitions} & d & ::= & x=\mathit{op}(e) \mid x:T = e(e) \mid x = e : \hat{c}\\ \text{statements} & s & ::= & d; s \mid \mathbf{return}\,e \mid \mathbf{return}\,e(e) : \hat{c} \\ \text{simple values} & \tilde{v} & ::= & k \mid \langle \lambda x{:}T.\, s, \rho \rangle \\ \text{values}& v & ::= & \tilde{v} \mid \tilde{v} : \overline{c} \end{array}$

The evaluation function $$V$$ maps expressions and environments to values in the usual way. \begin{align*} V(k,\rho) &= k \\ V(x,\rho) &= \rho(x) \\ V(\lambda x{:}T.\, s,\rho) &= \langle \lambda x{:}T.\, s, \rho \rangle \end{align*} The following auxiliary function applies a coercion to a value. \begin{align*} \mathsf{cast}(\tilde{v}, \hat{c}) &= \begin{cases} \tilde{v} & \text{if } \hat{c} = \iota_B \\ \mathbf{blame}\,\ell & \text{if } \hat{c} = \mathsf{Fail}^\ell \\ \tilde{v} : \hat{c} & \text{otherwise} \end{cases} \\ \mathsf{cast}(\tilde{v} : \overline{c_1}, \hat{c_2}) &= \begin{cases} \tilde{v} & \text{if } (\overline{c_1}; \hat{c_2})= \iota_B \\ \mathbf{blame}\,\ell & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \mathsf{Fail}^\ell \\ \tilde{v} : \overline{c_3} & \text{if } \overline{c_1}; \hat{c_2} \longrightarrow^{*} \overline{c_3} \text{ and } \overline{c_3} \neq \mathsf{Fail}^\ell \end{cases} \end{align*}

The machine state has the form $$(s,\rho,\kappa)$$, where the stack $$\kappa$$ is essentially a list of frames. The frames are somewhat unusual in that they include a pending coercion. We also need a pending coercion for the empty stack, so we define stacks as follows. $\begin{array}{llcl} & \sigma & ::= & [] \mid (x,s,\rho)::\kappa \\ \text{stacks} & \kappa & ::= & (c,\sigma) \end{array}$ The machine has seven transition rules to handle normal execution and two rules to handle cast errors. \begin{align*} (x:T = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v_2], (\iota_T,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s', \rho' \rangle, V(e_2,\rho) = v_2 \\ (x : T_1 = e_1(e_2); s, \rho, \kappa) & \longmapsto (s', \rho'[y{:=}v'_2], (c_2,(x,s,\rho)::\kappa)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T_2. s', \rho' \rangle : (c_1 \to c_2), \\ & V(e_2,\rho) = v_2, \text{ and } \mathsf{cast}(v_2,c_1) = v'_2\\ (x = \mathit{op}(\overline{e}); s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v], \kappa) \\ \text{where }& v = \delta(\mathit{op},V(e,\rho)) \\ (x = e : c; s, \rho, \kappa) & \longmapsto (s, \rho[x{:=}v'], \kappa) \\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,e, \rho, (c,(x,s,\rho')::\kappa)) & \longmapsto (s, [x{:=}v']\rho', \kappa) \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = v' \\ (\mathbf{return}\,(e_1\,e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v_2], (\hat{c_3},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle, V(e_2,\rho) = v_2\\ & (c_1; c_2) \longrightarrow^{*} \hat{c_3} \\ (\mathbf{return}\,(e_1\,e_2) : c_1, \rho, (c_2,\sigma)) & \longmapsto (s, \rho'[y{:=}v'_2], (\hat{c_5},\sigma)) \\ \text{where } & V(e_1,\rho) = \langle \lambda y{:}T. s, \rho' \rangle : (c_3 \to c_4),\\ & V(e_2,\rho) = v_2, \mathsf{cast}(v_2, c_3) = v'_2, \text{ and} \\ & (c_4; c_1; c_2) \longrightarrow^{*} \hat{c_5} \2ex] (x = e : c; s, \rho, \kappa) & \longmapsto \mathbf{blame}\,\ell\\ \text{where } & V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \\ (\mathbf{return}\,e, \rho, (c,\sigma)) & \longmapsto \mathbf{blame}\,\ell \\ \text{where }& V(e,\rho) = v, \mathsf{cast}(v,c) = \mathbf{blame}\,\ell \end{align*} The space-efficiency of this machine comes from two places. First, the values only ever include a single cast wrapped around a simple value. The cast function maintains this invariant by normalizing whenever a cast is applied to an already-casted value. The second place is the transition rule for tail calls. The coercion in tail position $$c_1$$ is sequenced with the pending coercion $$c_2$$ and normalized to $$c_3$$, which becomes the new pending coercion. While this machine is space efficient, it is not efficient with respect to time. The reason is that naive coercion reduction is an expensive process: one must search for a redex, reduce it, and plug the contractum back in. Furthermore, the associativity of coercions makes searching for a redex more challenging. In the next post I'll discuss a recursive function that directly maps a sequence of two coercions in normal form to its normal form, using ideas from the paper Threesomes, With and Without Blame I coauthored with Philip Wadler. ## Monday, September 17, 2012 ### Interpretations of the Gradually-Typed Lambda Calculus, Part 1 I just got back from Copenhagen, where I gave a tutorial on gradual typing at the Workshop on Scheme and Functional Programming. I very much enjoyed the workshop and giving the tutorial. Thank you for the invitation Olivier! For those of you who couldn't be there, this series of blog posts will include the material from my tutorial. For those of you who were there, this series will include some bonus material: an efficient machine for "Eager D" based on recent work by Ronald Garcia and myself. When I first began working on gradual typing in 2005 and 2006, my focus was on the type system. The main pieces of the type system fell into place that first year, and ever since then I've been thinking about the dynamic semantics. It turns out there are many design choices and implementation challenges regarding the dynamic semantics. In this post I'll restrict my attention to the gradually-typed lambda calculus, as many issues already arise in that setting. I'll quickly review the syntax and type system, then move on to discuss the dynamic semantics. The following defines the syntax for the gradually-typed lambda calculus. Here I'm writing the dynamic type as $\star$. Also, note that a lambda without a type annotation on its parameter is shorthand for a lambda whose parameter is annotated with $\star$. ### The Gradual Type System and Cast Insertion The type system of the gradually-typed lambda calculus is quite similar to the simply-typed lambda calculus. The only differences are in the rules for application. Instead of requiring the argument's type to be identical to the function's parameter type, we only require that the types be consistent, written $\sim$ and defined below. We also allow the application of expressions of type $$\star$$. The lack of contra-variance in how function parameters are handled in the consistency relation not a mistake. Unlike subtyping, the consistency relation is symmetric, so it wouldn't matter if we wrote $T_3 \sim T_1$ instead of $T_1 \sim T_3$. Also, consistency is not transitive, which is why we don't use a separate subsumption rule, but instead use consistency in the rules for application. The dynamic semantics of the gradually-typed lambda calculus is not defined in terms of the surface syntax, but instead it is defined on an intermediate language that extends the simply-typed lambda calculus with explicit casts. We use a non-standard notation for casts so that they are easier to read, so that they go left to right. The casts are annotated with blame labels, which we treat here as symbols, but in a real implementation would include the static location (line and character position) of the cast. We use a single blame label without any notion of polarity because these casts are really just casts, not contracts between two parties. Cast insertion is a type-directed translation, so it is the same as the type system with the addition of an output term. We often abbreviate a pair of casts to remove the duplication of the middle type as follows ### Design Choices Regarding the Dynamics Consider the following example in which a function is cast to the dynamic type and then cast to a type that is inconsistent with the type of the function. \begin{align*} & \mathsf{let}\, f = (\lambda x:\mathsf{Int}. \,\mathsf{inc}\,x) : \mathsf{Int}\to\mathsf{Int} \Rightarrow^{\ell_0} \star \Rightarrow^{\ell_1} \mathsf{Bool}\to \mathsf{Bool}\\ & \mathsf{in} \, f\, \mathsf{true} \end{align*} A few questions immediately arise: • Should a runtime cast error occur during the evaluation of the right-hand side of the let? Or should the runtime error occur later, when f is applied to \textsf{true}? • When the runtime cast error occurs, which cast should be blamed, $\ell_0$ or $\ell_1$? More generally, we want to define a subtyping relation to characterize safe casts (casts that never fail), and the specifics of subtyping relation depend on the blame tracking strategy. Ron, Walid, and I wrote a paper, Exploring the Design Space of Higher-Order Casts (ESOP 2009), that characterized the different answers to the above questions in terms of Henglein's Coercion Calculus. One can choose to check higher-order casts in either a lazy or eager fashion and one can assign blame to only downcasts (D) or the one can share blame between upcasts and downcasts (UD). The semantics of casts with lazy checking is straightforward whereas eager checking is not, so we'll first discuss lazy checking. Also, the semantics of the D approach is slightly simpler than UD, so we'll start with lazy D. We'll delay discussing the Coercion Calculus until we really need it. ### The Lazy D Semantics We'll define an evaluation (partial) function $\mathcal{E}$ that maps a term and environment to a result. That is, we'll give Lazy D a denotational semantics. The values and results are defined as follows: \[ \begin{array}{lrcl} & F & \in & V \to_c R \\ \text{values} & v \in V & ::= & k \mid F \mid v : T \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \\ \text{results}& r \in R & ::= &v \mid \mathbf{blame}\,\ell \end{array}

To handle the short-circuiting of evaluation in the case of a cast error (signaled by $\mathbf{blame}\,\ell$), we use the following monadic operators: \begin{align*} \mathbf{return}\,v &= v \\ \mathbf{letB}\,X = M\,\mathbf{in}\, N &= \mathbf{case}\,M\,\mathbf{of}\,\\ & \quad\;\; \mathbf{blame}\,\ell \Rightarrow \mathbf{blame}\,\ell \\ & \quad \mid v \Rightarrow [X{:=}v]N \end{align*}

The primitive operators are given their semantics by the $\delta$ function. \begin{align*} \delta(\mathsf{inc},n) &= n + 1 \\ \delta(\mathsf{dec},n) &= n - 1 \\ \delta(\mathsf{zero?},n) &= (n = 0) \end{align*}

In lazy cast checking, when determining whether to signal a cast error, we only compare the heads of the types: \begin{align*} \mathit{hd}(B) &= B \\ \mathit{hd}(T_1 \to T_2) &= \star \to \star \end{align*}

The following auxiliary function, named cast, is the main event. It is defined by cases on the source and target types $T_1$ and $T_2$. The line for projecting from $\star$ to $T_2$ picks the $\ell$ blame label from the projection (the down-cast), which is what gives this semantics its "D". \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : T_3 \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',T_3,\ell,T_2) \\ \mathsf{cast}(v,T_1,\ell,\star) &= v : T_1 \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*}

The apply auxiliary function performs function application, and is defined by induction on the first parameter. \begin{align*} \mathsf{apply}(F,v_2) &=F(v_2) \\ \mathsf{apply}(v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4,v_2) &= \mathbf{letB}\,X_3 = \mathsf{cast}(v_2,T_3,\ell,T_1)\,\mathbf{in} \\ & \quad \mathbf{letB}\,X_4 = \mathsf{apply}(v,X_3)\,\mathbf{in} \\ & \quad \mathsf{cast}(X_4, T_2, \ell, T_4) \end{align*}

With these auxiliary functions and monadic operators in hand, the definition of the evaluation function $\mathcal{E}$ is straightforward. \begin{align*} \mathcal{E}(k,\rho) &= \mathbf{return}\, k \\ \mathcal{E}(x,\rho) &= \mathbf{return}\, \rho(x) \\ \mathcal{E}(\lambda x{:}T.\,e, \rho) &= \mathbf{return}\, (\lambda v.\, \mathcal{E}(e, \rho[x\mapsto v])) \\ \mathcal{E}(\mathit{op}(e)) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \delta(\mathit{op},X) \\ \mathcal{E}(e : T_1 \Rightarrow^\ell T_2) &= \mathbf{letB}\, X = \mathcal{E}(e,\rho) \,\mathbf{in}\, \mathsf{cast}(X,T_1 ,\ell, T_2) \\ \mathcal{E}(e_1\,e_2) &= \mathbf{letB}\,X_1 = \mathcal{E}(e_1,\rho)\,\mathbf{in}\\ & \quad \mathbf{letB}\,X_2 = \mathcal{E}(e_2,\rho)\,\mathbf{in}\\ & \quad \mathsf{apply}(X_1,X_2) \end{align*} The semantics for the Lazy D Gradually-Typed Lambda Calculus is defined by the following $\mathit{eval}$ partial function. $\mathit{eval}(e) = \begin{cases} \mathit{observe(r)} & \text{if }\emptyset \vdash e \leadsto e' : T \text{ and } \mathcal{E}(e',\emptyset) = r \\ \bot & \text{otherwise} \end{cases}$ where \begin{align*} \mathit{observe}(k) &= k \\ \mathit{observe}(F) &= \mathit{function} \\ \mathit{observe}(v : T_1\to T_2\Rightarrow^\ell T_3\to T_4) &= \mathit{function} \\ \mathit{observe}(v : T \Rightarrow \star) &= \mathit{dynamic} \\ \mathit{observe}(\mathbf{blame}\,\ell) &= \mathbf{blame}\,\ell \end{align*}

Exercise: Calculate the output of eval for the example program at the beginning of this post.

Similar to object-oriented languages, we can define a subtyping relation that characterizes when a cast is safe, that is, when a cast will never fail. The following is the subtyping relation for the D semantics. \begin{gather*} \frac{}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*} This subtyping relation is what I expected to see. The dynamic type plays the role of the top element of this ordering and the rule for function types has the usual contra-variance in the parameter type. The Subtyping Theorem connects the dynamic semantics with the subtyping relation.

Theorem (Subtyping) If the cast labeled with $$\ell$$ in program $$e$$ respects subtyping, then $$\mathit{eval}(e) \neq \mathbf{blame}\,\ell$$.

### The Lazy UD Semantics

One interpretation of the dynamic type $\star$ is to view it as the following recursive type: $\star \equiv \mu \, d.\, \mathsf{Int} + \mathsf{Bool} + (d \to d)$ (See, for example, the chapter on Dynamic Typing in Robert Harper's textbook Practical Foundations for Programming Languages.) In such an interpretation, one can directly convert from $\star \to \star$ to $\star$, but not, for example, from $\mathsf{Int}\to\mathsf{Int}$ to $\star$. Instead, one must first convert from $\mathsf{Int}\to\mathsf{Int}$ to $\star \to \star$ and then to $\star$.

Let I be the subset of types that can be directly injected into $\star$: $I ::= B \mid \star \to \star$ The definition of values for Lazy UD changes to use I instead of T for the values of type $\star$. $\begin{array}{lrcl} \text{values} & v \in V & ::= & k \mid F \mid v : I \Rightarrow^\ell \star \mid v : T_1 \to T_2 \Rightarrow^\ell T_3 \to T_4 \end{array}$ This change in the definition of value necessitates some changes in the cast function. The second and third-to-last lines below contain most of the changes. \begin{align*} \mathsf{cast}(v,T_1,\ell,T_2) &= \mathbf{blame}\,\ell \qquad \text{if } \mathit{hd}(T_1) \not\sim \mathit{hd}(T_2) \\ \mathsf{cast}(v,B,\ell,B) &= v \\ \mathsf{cast}(v,\star,\ell,\star) &= v \\ \mathsf{cast}(v,\star,\ell,T_2) &= \mathbf{case}\,v\,\mathbf{of}\, (v' : I \Rightarrow^{\ell'} \star) \Rightarrow \\ & \qquad \mathsf{cast}(v',I,\ell,T_2) \\ \mathsf{cast}(v,I,\ell,\star) &= v : I \Rightarrow^\ell \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,\star) &= v : T_{11}\to T_{12} \Rightarrow^\ell \star \to \star \Rightarrow^\ell \star \\ & \text{if } T_{11} \neq \star, T_{12} \neq \star \\ \mathsf{cast}(v,T_{11}\to T_{12},\ell,T_{21}\to T_{22}) &= v : T_{11}\to T_{12} \Rightarrow^\ell T_{21}\to T_{22} \end{align*}

The rest of the definitions for Lazy UD are the same as those for Lazy D. The following is the subtyping relation for Lazy UD. With this subtyping relation, the type $$\star$$ does not play the role of the top element. Instead, a type $T$ is a subtype of $$\star$$ if it is a subtype of some injectable type $$I$$. \begin{gather*} \frac{}{\star <: \star} \qquad \frac{T <: I}{T <: \star} \qquad \frac{}{B <: B} \qquad \frac{T_3 <: T_1 \quad T_2 <: T_4}{T_1 \to T_2 <: T_3 \to T_4} \end{gather*}

Theorem (Subtyping) If the cast labeled with $$\ell$$ in program $$e$$ respects subtyping, then $$\mathit{eval}(e) \neq \mathbf{blame}\,\ell$$.

Exercise: Calculate the output of the Lazy UD eval for the example program at the beginning of this post.

In the next post I'll turn to the efficient implementation of Lazy D and UD.

## Wednesday, August 29, 2012

### Rationale for "Type Safety in Five"

In my last post Type Safety in Five Easy Lemmas, I made a claim that the formulation of the operational semantics of a language makes a big difference regarding how many lemmas, and how tedious, the proof of type safety becomes. While I showed that the particular semantics that I used led to a simple proof, with just five easy lemmas, I didn't compare it to the alternatives. Perhaps there's an even better alternative! In this post I discuss the alternatives that I know about and why they lead to more lemmas and more tedium. This post is organized by design decision.

### Why A-normal form?

The syntax of the little language is in A-normal form, that is, it's flattened out so that expressions don't have sub-expressions, but instead, more complex computations have to be built up from several variable assignments. It's much more common to see type-safety proofs on languages that are not in A-normal form, that is, languages with arbitrary nesting of expressions. This nesting of expressions can be handled in one of two ways in the operational semantics: either with extra so-called congruence reduction rules or by using evaluation contexts. The congruence rule approach adds many more cases to the "$\longmapsto$ is safe" lemma (or equivalently, the progress and preservation lemmas). The evaluation contexts approach requires an extra lemma often called the unique decomposition lemma.

You might be thinking, real languages aren't in A-normal form, so one would really need to compile to A-normal form and prove that the compilation is type preserving (as we did in the Work Horse post). This is true, but I'd argue that it's nice to have a separation of concerns and take care of intra-expression control flow in one function and lemma instead of dealing with it throughout the operational semantics and proof of type safety.

### Why environments instead of substitution?

The operational semantics passed around environments that associated values with variables and used the $\mathit{lookup}$ function to find the value for a variable. The more common alternative is to perform substitution, that is, during a function call, to go through the body of the function replacing occurrences of the parameters with their arguments. I'll probably get flamed for saying this, but substitution has a lot of disadvantages compared to environments. First, the substitution function is difficult to define properly. Second, it requires that you prove that substitution is type preserving, which is rather tedious because it touches every kind of expression (or statement) in the language. In comparison, the lemma that we proved about the $\mathit{lookup}$ function didn't mention any expressions. Third, substitution is really slow. Technically, that doesn't effect the complexity of the proof, but it does affect your ability to test your theorems. As I mentioned in the Work Horse post, it's a good idea to implement an interpreter and test whether your theorems hold on lots of example programs before trying to prove your theorems. If your interpreter uses substitution, it will be very very slow. You might instead use environments, but keep substitution in your formal semantics, but then your interpreter and your formal semantics differ so that properties of one may not imply properties of the other.

### Why functions with names (for recursion) instead of fix?

The only way that I know of to implement fix, the fixpoint combinator, is with substitution, and we don't want substitution.

Also, I should highlight how we handle recursion in the operational semantics. First, the $\mathcal{V}$ function doesn't do much, it just captures the current environment. An alternative used by Milner and Tofte is to extend the environment with a binding for the function itself. This alternative makes the proof of type safety more challenging because, instead of using induction, you have to use coinduction. (See their paper Co-induction in relational semantics.) While coinduction is a beautiful concept, it's nice to be able to stick to good old induction. Now, because $\mathcal{V}$ doesn't do much, we need to make up for it in the transition relation. Notice that the transition for function call extends the environment with the function itself. I forget where I learned this trick. If you know of any references for this, please let me know!

### Conclusion

I think that's it for the important design decisions, but I may be forgetting something. If you have any questions regarding the rationale for anything else, please post a comment! So to sum up, if you want your type safety proof to be less tedious, I recommend using A-normal form and formulating your semantics as an environment-passing abstract machine. Oh, and if you want recursive functions, than extend your lambdas with a name and be lazy about extending the environment: wait until the function call to add the function itself to the environment.

## Friday, August 24, 2012

### Type Safety in Five Easy Lemmas

A language is type safe if running a program in the language cannot result in an untrapped error. A trapped error is something like an exception or a message-not-found error, that is, it's an error that is part of the defined behavior of the language. An untrapped error is something like a segmentation fault, that is, the program has run into an undefined state in which the language doesn't define what should happen. A segmentation fault is the underlying operating system catching the error, not the language itself. It is the untrapped errors that hackers take advantage of to break into computer systems. If you want to run untrusted code without getting into trouble, then it's a good idea to only run code that is in a type safe language!

Wright and Felleisen pioneered what has become the most flexible approach to proving that a language is type safe in their 1992 paper A Syntactic Approach to Type Soundness. The general idea is to define a small-step operational semantics for the language and show that if a program is well typed, then it is either done reducing or it can reduce to another well-typed program. The safety of an entire sequence of reductions can then be proved by induction. It is now common practice for language designers to prove type safety for new language designs, or at least for interesting subsets of the languages of interest. Proving type safety does not require advanced mathematics and isn't particularly challenging, except that it is often rather tedious, requiring many technical lemmas in which it is easy to make a mistake.

It turns out that the choice in formulation of the operational semantics can make a significant difference regarding how many lemmas, and how tedious, the proof of type safety becomes. In this blog post, I present an operational semantics that removes the need for many of the standard lemmas. The semantics is based on an abstract machine. The language will be the simply-typed lambda calculus in A-normal form, extended with recursive functions and a few primitive operators. Despite its small size, this language is Turing complete. This language is roughly equivalent to the target language discussed in my previous posts about Structural Induction and the ECD Machine. In the following I present the syntax, operational semantics, and type system of this language. Then I give the proof of type safety. The proof will rely on five lemmas, one lemma for each function or relation used in the operational semantics: the variable $\mathit{lookup}$ partial function, the $\delta$ partial function that defines the behavior of the primitive operators, the $\mathcal{V}$ partial function for evaluating expression, the transition relation $\longmapsto$ for the abstract machine, and the multi-transition relation $\longmapsto^{*}$. The lemma for the transition relation is the main event.

### Operational Semantics

We use the notation $\epsilon$ for the empty list. Given a list $L$, the notation $a \cdot L$ is a larger list with $a$ as the first element and the rest of the elements are the same as $L$. We use lists of key-value pairs (association lists) to represent mapping from variables to types (type environments) and variables to values (environments). The following lookup (partial) function finds the thing associated with a given key in an association list. Next we define the $\delta$ function, which gives meaning to the primitive operators. The function $\mathcal{V}$ maps expressions to values, using environment $\rho$. The values of this language are constants and closures, as defined below. The definition of $\mathcal{V}$ uses the $\mathit{lookup}$ function for variables, the $\delta$ function for primitive operations, and turns functions to closures. A stack $\kappa$ is a list of statement-environment pairs. The state $\sigma$ of the machine is simply a stack. The top of the stack contains the actively-executing statement and its environment. The relation $\longmapsto$ defines transitions between states. There are only three transition rules, for primitive operators, calling functions, and returning from functions. We define $\longmapsto^{*}$ in the usual way, as follows. The semantics of this language is given by the following $\mathit{eval}$ function. A state is final if it is of the form $(\mathbf{return}\,e,\rho) \cdot \epsilon$ and $\mathcal{V}(e,\rho) = v$ for some $v$.

### Type System

The types for the constants is given by the $\mathit{typeof}$ function. The $\Delta$ function maps a primitive operator and argument types to the return type. The following presents the type rules for expressions, definitions, and statements. Our proof of type safety will require that we define notions of well-typed values, well-typed environments, well-typed stacks, and well-typed states.

### Proof of Type Safety

The first lemma proves that when an operator is applied to values of the expected type, the result is a value whose type matches the return type of the operator.

Lemma ($\delta$ is safe)
If $\Delta(o,\overline{T}) = T$ and $\vdash v_i : T_i$ for $i \in \{1,\ldots,n\}$, then $\delta(o,\overline{v}) = v$ and $\vdash v : T$, for some $v$.

Proof. We proceed by cases on the operator $o$.

1. If the operator $o$ is $+$, then we have $T_1=T_2=\mathsf{Int}$ and $T=\mathsf{Int}$. Then because $\vdash v_i : \mathsf{Int}$, we know that $v_i = n_i$ for $i \in \{1,2\}$. Then $\delta(+,n_1,n_2) = n_1 + n_2$ and we have $\vdash (n_1 + n_2) : \mathsf{Int}$.
2. If the operator $o$ is $-$, then we have $T_1=\mathsf{Int}$ and $T=\mathsf{Int}$. Then because $\vdash v_1 : \mathsf{Int}$, we know that $v_1 = n$ for some $n$. Then $\delta(-,n) = -n$ and we have $\vdash -n : \mathsf{Int}$.
3. If the operator $o$ is $=$, then we have $T_1=T_2=\mathsf{Int}$ and $T=\mathsf{Bool}$. Then because $\vdash v_i : \mathsf{Int}$, we know that $v_i = n_i$ for $i \in \{1,2\}$. Then $\delta(=,n_1,n_2) = n_1 = n_2$ and we have $\vdash (n_1 = n_2) : \mathsf{Bool}$.
QED.

The second lemma says that if you have an environment that is well-typed with respect to the type environment, and if a variable x is associated with type T in the type environment, then looking up x in the environment produces a value that has type T.

Lemma ($\mathit{lookup}$ is safe)
If $\Gamma \vdash \rho$ and $\mathit{lookup}(x,\Gamma) = T$, then $\mathit{lookup}(x,\rho) = v$ and $\vdash v : T$ for some $v$.
Proof. We proceed by induction on $\Gamma \vdash \rho$.
1. Case $\epsilon \vdash \epsilon: \qquad (\Gamma=\epsilon, \rho = \epsilon)$
But then we have a contradition with the premise $\mathit{lookup}(x,\Gamma) = T$, so this case is vacuously true.
2. Case $\begin{array}{c}\vdash v : T' \quad \Gamma' \vdash \rho' \\ \hline (x',T') \cdot \Gamma' \vdash (x',v) \cdot \rho' \end{array}$:
Next we consider two cases, whether $x = x'$ or not.
1. Case $x = x'$: Then $\mathit{lookup}(x, \rho) = v$ and $T = T'$, so we conclude that $\vdash v : T$.
2. Case $x \neq x'$: Then $\mathit{lookup}(x,\rho) = \mathit{lookup}(x,\rho')$ and $\mathit{lookup}(x,\Gamma) = \mathit{lookup}(x,\Gamma') = T$. By the induction hypothesis, we have $\mathit{lookup}(x,\rho') = v$ and $\vdash v : T$ for some $v$, which completes this case.
QED.

The next lemma proves that a well-typed expression evaluates to a value of the expected type.

Lemma ($\mathcal{V}$ is safe)
If $\Gamma \vdash e : T$ and $\Gamma \vdash \rho$, then $\mathcal{V}(e,\rho) = v$ and $\vdash v : T$ for some $v$.
Proof. We proceed by cases on the expression $e$.
1. Case $e = c$:
From $\Gamma \vdash c : T$ we have $\mathit{typeof}(c) = T$ and therefore $\vdash c : T$. Also, we have $\mathcal{V}(c,\rho) = c$, which completes this case.
2. Case $e = x$:
From $\Gamma \vdash x : T$ we have $\mathit{lookup}(x,\Gamma) = T$. We then apply the lookup is safe lemma to obtain $\mathit{lookup}(x,\rho) = v$ and $\vdash v : T$ for some $v$. Thus, we have $\mathcal{V}(x,\rho) = v$ and this case is complete.
3. Case $e = \mathsf{fun}\,f(x{:}T_1) s$:
We have $\mathcal{V}(\mathsf{fun}\,f(x{:}T_1) s,\rho) = \langle\mathsf{fun}\,f(x{:}T_1) s, \rho \rangle$. From $\Gamma \vdash \mathsf{fun}\,f(x{:}T_1) s : T$ we have $(x,T_1) \cdot (f,T_1 \to T_2) \cdot \Gamma \vdash s : T_2$, with $T = T_1 \to T_2$. Together with $\Gamma \vdash \rho$, we conclude that $\vdash \langle\mathsf{fun}\,f(x{:}T_1) s, \rho \rangle : T$.
QED.

Now for the fourth and most important lemma. This lemma states that if a state is well typed, then either the state is a final state or the state can transition to a new state of the same type. In the literature, this lemma is often split into two lemmas called progress and preservation. In the setting of an abstract machine, it's convenient to merge these two lemmas into one lemma. Note that the conclusion of this lemmas includes two alternatives: the state is final or it can take a step. The power of this lemma is that it rules out the third alternative, that the state is not final and can't take a step. Such a situation is referred to as "stuck" and corresponds to untrapped errors.

Lemma ($\longmapsto$ is safe)
If $\vdash \sigma : T$, then either $\sigma$ is a final state or $\sigma \longmapsto \sigma'$ and $\vdash \sigma' : T$.
Proof. Because $\vdash \sigma : T$ we know that $\sigma = (s,\rho) \cdot \kappa$, $\Gamma \vdash \rho$, $\Gamma \vdash s : T_1$, and $\vdash \kappa : T_1 \Rightarrow T$. We proceed by cases on $s$ because the transition rule that will apply depends primarily on $s$.
1. Case $s = (x = o(\overline{e}); s')$:
We have $\Gamma \vdash x = o(\overline{e}) : (x,T_1)$ and $(x,T_1) \cdot \Gamma \vdash s' : T_1$. So $\Delta(o,\overline{T}') = T_1$ and $\Gamma \vdash e_i : T'_i$ for $i \in \{ 1,\ldots,n\}$. Because $\mathcal{V}$ is safe, we have $\mathcal{V}(e_i,\rho) = v_i$ and $\vdash v_i : T'_i$ for $i \in \{ 1,\ldots,n\}$. Because $\delta$ is safe, we have $\delta(o,\overline{v}) = v$ and $\vdash v : T_1$ for some $v$. Thus, the current state takes the following transition. We have $(x,T_1) \cdot \Gamma \vdash (x,v) \cdot \rho$
and therefore $\vdash (s', (x,v) \cdot \rho) \cdot \kappa : T$
2. Case $s = (x = e_1(e_2); s')$:
From $\Gamma \vdash x = e_1(e_2); s': T_1$ we have $\Gamma \vdash x = e_1(e_2) : (x,T_2)$ and $(x,T_2) \cdot \Gamma \vdash s' : T_1$. Thus, we also have $\Gamma \vdash e_1 : T_3 \to T_2$ and $\Gamma \vdash e_2 : T_3$. Because $\mathcal{V}$ is safe, there exist $v_1$ and $v_2$ such that $\mathcal{V}(e_1,\rho) = v_1$, $\mathcal{V}(e_2,\rho) = v_2$, $\vdash v_1 : T_3 \to T_2$, and $\vdash v_2 : T_3$. The only way for $\vdash v_1 : T_3 \to T_2$ to be true is for $v_1$ to be a closure. That is, $v_1 = \langle \mathsf{fun}\,f(y{:}T_3) s'',\rho' \rangle$. This closure is well typed, so we have $\Gamma' \vdash \rho'$ and $(y:T_3) \cdot (f,T_3 \to T_2) \cdot \Gamma' \vdash s'' : T_2$. We have what we need to know that the current state $\sigma = (x = e_1(e_2); s', \rho) \cdot \kappa$ transitions as follows. We can deduce $\vdash \sigma : T_2 \Rightarrow T$
and $(y,T_3) \cdot (f,T_3\to T_2) \cdot \Gamma' \vdash (y,v_2)\cdot (f,v_1) \cdot \rho'$
so we have everything necessary to conclude
3. Case $s = (\mathbf{return}\,e)$:
If the stack $\kappa$ is empty, then $\sigma$ is a final state and this case is complete. If the stack is not empty, we have $\kappa = (s',\rho') \cdot \kappa'$. Then, because $\vdash \kappa : T_1 \Rightarrow T$, we have $s' = (x = e_1(e_2); s'')$, $\Gamma' \vdash \rho'$, $(x,T_1) \cdot \Gamma' \vdash s'' : T_2$, and $\vdash \kappa' : T_2 \Rightarrow T$. Because $\Gamma \vdash \mathbf{return}\,e : T_1$, we have $\Gamma \vdash e : T_1$ and therefore $\mathcal{V}(e,\rho) = v$ and $\vdash v : T_1$ for some $v$ (because $\mathcal{V}$ is safe). So the current state $\sigma$ takes the following transition: We have $(x,T_1) \cdot \Gamma' \vdash (x,v) \cdot \rho'$, which is the last thing we needed to conclude that $\vdash (s'', (x,v) \cdot \rho') \cdot \kappa' : T$.
QED.

Lemma ($\longmapsto^{*}$ is safe)
If $\vdash \sigma : T$ and $\sigma \longmapsto^{*} \sigma'$, then $\vdash \sigma' : T$.
Proof. The proof is by induction on $\sigma \longmapsto^{*} \sigma'$.
1. Case $\sigma \longmapsto^{*} \sigma$: We already have $\vdash \sigma : T$.
2. Case $\begin{array}{c} \sigma \longmapsto \sigma_1 \quad \sigma_1 \longmapsto^{*} \sigma' \\\hline \sigma \longmapsto^{*} \sigma'\end{array}$: Because $\longmapsto$ is safe and deterministic (the three transition rules obviously do not overlap), we have $\vdash \sigma_1 : T$. Then by the induction hypothesis, we conclude that $\vdash \sigma' : T$.
QED.

Theorem (Type Safety)
If $\epsilon \vdash s : T$, then either
1. $\mathit{eval}(s) = c$ and $\vdash c : T$, or
2. $\mathit{eval}(s) = \mathtt{fun}$ and $T = T_1 \to T_2$ for some $T_1,T_2$, or
3. $\mathit{eval}(s) = \bot$ (the program does not terminate).

Proof. Suppose that the program terminates. We have $(s,\epsilon) \cdot \epsilon \longmapsto^{*} \sigma$ for some $\sigma$ and $\neg \exists \sigma'.\, \sigma \longmapsto \sigma'$. Because $\longmapsto^{*}$ is safe, we have $\vdash \sigma : T$. Then because $\longmapsto$ is safe, we know that either $\sigma$ is final or it can take a step. But we know it can't take a step, so it must be final. So $\sigma = (\mathbf{return}\,e, \rho) \cdot \epsilon$ and $\Gamma \vdash e : T$. Then because $\mathcal{V}$ is safe, we have $\mathcal{V}(e,\rho) = v$ and $\vdash v : T$. If $v = c$, then we have $\mathit{eval}(s) = c$ and $\vdash c : T$. If $v = \langle \mathsf{fun}\,f(x{:}T) s',\rho'\rangle$, then $T = T_1 \to T_2$, $\mathit{eval}(s) = \mathtt{fun}$, and $\vdash \mathtt{fun}: T_1 \to T_2$.
QED.

That's it! Type safety in just five easy lemmas.