Channel: mkoconnor – XOR’s Hammer

A language which does term inference

Many statically typed languages, like OCaml, do type inference. That is, even though they’re statically typed, you don’t have to explicitly say what the type of everything is, since a lot of the time the compiler can figure it out by itself. For example, if you define a function which takes an x and adds it to 3, the compiler will figure out that x is an int. (It couldn’t be a float, since it was added to 3 and not 3.0.)
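
As a small illustration of the inference described above, here is roughly what that function looks like in plain OCaml. The names add_three and add_three_f are made up for this sketch.

```ocaml
(* OCaml infers int -> int here: (+) is integer addition and 3 is an
   int literal, so x can only be an int. *)
let add_three x = x + 3

(* The float version needs a different operator as well as a different
   literal; OCaml infers float -> float. *)
let add_three_f x = x +. 3.0
```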

But it often seems like the compiler should be able to infer not just the types of expressions, but the expressions themselves! For example, if the compiler infers that the type of some function f is (int -> int) -> (int list) -> int list (i.e., f is a higher-order function which takes a function from int to int, a list of ints, and produces a list of ints), then f is very probably the map function, defined informally by

map g [x_1;...;x_n] = [g x_1;...;g x_n].

Therefore, if the compiler determines that some expression has that type, and the user has somehow omitted the actual function definition, why not allow the compiler to infer what the expression is?
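
To make the "very probably" concrete, here is a sketch in plain OCaml of map at exactly that type, next to a second function with the same type. The name ignore_fn is hypothetical; it is only there to show that the type by itself does not single out map, which is why some notion of uniqueness is needed.

```ocaml
(* map, specialised to (int -> int) -> int list -> int list. *)
let rec map (g : int -> int) (l : int list) : int list =
  match l with
  | [] -> []
  | x :: rest -> g x :: map g rest

(* Another function of the same type: it ignores g and returns the
   list unchanged. *)
let ignore_fn (_ : int -> int) (l : int list) : int list = l
```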

I made a stab at implementing this type of idea in a toy language I call TermInf (apologies for the weird hosting: I don’t have another hosting service at the moment). It’s a modification of the toy language Poly from Andrej Bauer’s Programming Language Zoo. You’ll need OCaml to compile it. Please feel free to alert me to any bugs or to tell me that my code is horrible.

More details below.

The basic idea is really simple: For any expression e, the expression {e} is also an expression. The compiler will infer the type t1 of e and the type t2 that {e} has to be. It will search for a sequence of coercions taking t1 to t2 and if there is a unique one, it will replace {e} with that sequence of coercions applied to e.

Which functions are coercions is determined by the user; functions can be declared to be coercions or removed from the list of coercions at any point.
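
The "search, then require uniqueness" step can be sketched in OCaml as follows. This is a deliberately simplified model and not TermInf's actual code: types are plain names, every coercion is monomorphic, the identity coercion is represented by the empty sequence, and the search is cut off at a fixed depth. The names coercion, paths and infer are all made up for the sketch.

```ocaml
(* Simplified model: a coercion goes from one named type to another. *)
type coercion = { name : string; src : string; dst : string }

(* All coercion sequences of length <= depth leading from ty to goal.
   Each sequence lists coercion names in order of application; the
   empty sequence plays the role of id. *)
let rec paths coercions depth ty goal =
  let here = if ty = goal then [ [] ] else [] in
  if depth = 0 then here
  else
    let step c =
      if c.src = ty then
        List.map (fun p -> c.name :: p)
          (paths coercions (depth - 1) c.dst goal)
      else []
    in
    here @ List.concat (List.map step coercions)

(* Succeed only when exactly one sequence exists. *)
let infer coercions depth ty goal =
  match paths coercions depth ty goal with
  | [ p ] -> Some p
  | _ -> None
```

With a single coercion from bool to int registered, infer finds that one sequence; adding a second, competing coercion between the same two types makes the result ambiguous, so infer returns None.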

I can think of at least three ways this would be useful.

1. Automatically coercing from one base type to another

This is actually the least interesting of the three, but it serves to illustrate how TermInf works.

You can use $show_coercions to show all the current coercions. The identity function id is always a coercion.

TermInf. Press Ctrl-D to exit.
TermInf> $show_coercions
id

Let’s define a new coercion from bool to int.

TermInf> let_coercion bool_to_int = fun x -> if x then 1 else 0
val bool_to_int : bool -> int
TermInf> $show_coercions
bool_to_int
id

Now we can use the coercion.

TermInf> {true} + 7
- : int = 8

In that instance, the interpreter could determine that the type of {true} had to be int, since it was added to 7. In the following instance, the interpreter can’t determine the type of {true}.

TermInf> {true}
Problem with term inference.

But we can always explicitly give a type to any expression, so we can use that to tell the type-inferer what the type of {true} is.

TermInf> {true} : bool
- : bool = true
TermInf> {true} : int
- : int = 1
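
In plain OCaml terms, the inferred expressions above correspond to writing the coercion by hand. The following is only a sketch of that correspondence; the names with_coercion, as_bool and as_int are made up.

```ocaml
let bool_to_int x = if x then 1 else 0
let id x = x

(* {true} + 7 with the coercion written out explicitly. *)
let with_coercion = bool_to_int true + 7

(* {true} : bool needs nothing beyond the identity coercion. *)
let as_bool : bool = id true

(* {true} : int needs bool_to_int. *)
let as_int : int = bool_to_int true
```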

2. Lifting functions

We can view the function List.map as a coercion, taking a function 'a -> 'b to a function 'a list -> 'b list.

TermInf> let_coercion map = rec map is fun f -> fun l -> match l with [] -> [] | x::ll -> (f x)::(map f ll)
val map : ('a -> 'b) -> 'a list -> 'b list
TermInf> $show_coercions
map
bool_to_int
id

Now we can try it out.

TermInf> let square = fun x -> x * x
val square : int -> int
TermInf> ({square} 3) : int
- : int = 9
TermInf> ({square} [1;2;3]) : int list
- : int list = 1 :: 4 :: 9 :: []
TermInf> ({square} [[1;2];[5;6;7]]) : int list list
- : (int list) list = (1 :: 4 :: []) :: (25 :: 36 :: 49 :: []) :: []

Note that in our case, we had to explicitly tell the interpreter what the return type was, although presumably in practice the interpreter or compiler would usually be able to infer it.

The idea is that we can change the basic structure of the thing passed to {square}, and the term inferer will adapt. Note that in the third case, the term inferer iterated map to produce the required (int list list -> int list list) type.
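
Written out by hand in plain OCaml, the three {square} calls above correspond to the following expressions, with the standard library's List.map standing in for the map coercion. The names on_int, on_list and on_nested are made up for this sketch.

```ocaml
let square x = x * x

(* No coercion beyond the identity. *)
let on_int = square 3

(* One application of map. *)
let on_list = List.map square [1; 2; 3]

(* map applied twice, giving int list list -> int list list. *)
let on_nested = List.map (List.map square) [[1; 2]; [5; 6; 7]]
```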

We can similarly look inside the structure of pairs.

TermInf> let_coercion map_pair = fun f -> fun x -> (f (fst x), f (snd x))
val map_pair : ('a -> 'b) -> 'a * 'a -> 'b * 'b
TermInf> ({square} [(1,2);(3,4)]) : (int * int) list
- : (int * int) list = (1, 4) :: (9, 16) :: []
TermInf> ({square} ([1;2],[3;4])) : (int list) * (int list)
- : int list * int list = (1 :: 4 :: [], 9 :: 16 :: [])
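
The same two results can be reproduced in plain OCaml by composing the coercions by hand, which shows that the order of map and map_pair follows the shape of the argument. This is a sketch; pairs_in_list and lists_in_pair are made-up names.

```ocaml
let square x = x * x

(* Same type as in the transcript: ('a -> 'b) -> 'a * 'a -> 'b * 'b. *)
let map_pair f (a, b) = (f a, f b)

(* A list of pairs: map on the outside, map_pair on the inside. *)
let pairs_in_list = List.map (map_pair square) [(1, 2); (3, 4)]

(* A pair of lists: map_pair on the outside, map on the inside. *)
let lists_in_pair = map_pair (List.map square) ([1; 2], [3; 4])
```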

Essentially all variants of map can be added. For example, we can add the function mapi : ((int * 'a) -> 'b) -> 'a list -> 'b list, whose function argument also receives the index of each list element. The term inferer will then determine which version of map (or sequence of versions of map) is needed based on the function given to it.
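
For reference, a function with that mapi type can be written in plain OCaml on top of the standard library's List.mapi, which takes the index and the element as two separate arguments rather than as a pair. This is a sketch of one possible definition, not the one used in TermInf.

```ocaml
(* mapi : (int * 'a -> 'b) -> 'a list -> 'b list
   The function argument receives (index, element) as a pair. *)
let mapi (f : int * 'a -> 'b) (l : 'a list) : 'b list =
  List.mapi (fun i x -> f (i, x)) l
```

For example, mapi (fun (i, x) -> i * x) multiplies each element by its position in the list.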

3. Term inference in conjunction with phantom types

I put just enough type aliasing in TermInf to allow you to use phantom types. (For a great introduction to phantom types, see this blog post.)

Here’s an example of how type aliasing works in TermInf:

TermInf> type hidden = int
TermInf> let f = (fun x -> x + 7) : hidden -> hidden
val f : hidden -> hidden
TermInf> let x = 3 : hidden
val x : hidden
TermInf> f x
- : hidden = 10
TermInf> f 3
The types hidden and int are incompatible

Something we might like to do with phantom types is have the type system do a static dimensional analysis on our program. Here’s an attempt to do that:

TermInf> type meters
TermInf> type gallons
TermInf> type 'a units = int
TermInf> let add = (fun x -> fun y -> x + y) : 'a units -> 'a units -> 'a units
val add : 'a units -> 'a units -> 'a units
TermInf> let times = (fun x -> fun y -> x * y) : 'a units -> 'b units -> ('a * 'b) units
val times : 'a units -> 'b units -> ('a * 'b) units
TermInf> let one_gal = 1 : gallons units
val one_gal : gallons units
TermInf> let one_m = 1 : meters units
val one_m : meters units

Then we have the following correct behavior:

TermInf> add one_gal one_gal
- : gallons units = 2
TermInf> times one_gal one_m
- : (gallons * meters) units = 1
TermInf> add one_gal one_m
The types gallons and meters are incompatible

But the following is not correct:

TermInf> let x = times one_gal one_m
val x : (gallons * meters) units
TermInf> let y = times one_m one_gal
val y : (meters * gallons) units
TermInf> add x y
The types gallons and meters are incompatible

Of course, the problem is that the interpreter doesn’t know that units commute. But we can fix this with coercions.

TermInf> let_id_coercion commute = id : ('a * 'b) units -> ('b * 'a) units
val commute : ('a * 'b) units -> ('b * 'a) units
TermInf> add x {y}
- : (gallons * meters) units = 2

We’ve declared commute to be an identity coercion (by using let_id_coercion instead of let_coercion) to help the interpreter when it’s deciding if a term inference is unique or not.

Note that we don’t use term inference on both x and y, because then the interpreter couldn’t determine what type to give either of them.

TermInf> add {x} y
- : (meters * gallons) units = 2
TermInf> add {x} {y}
Problem with term inference.

This version of commute will just commute the two units at the top level, but there are a finite number of identity coercions that you can define that will give you associativity and commutativity (and inverses, if you want). Thus, the type system will be able to perform a static dimensional analysis on your program.
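
For comparison, here is a sketch of the same units example in plain OCaml, where there is no term inference and commute has to be applied by hand. It assumes an abstract type inside a module (the module name Units and the helpers of_int and to_int are made up), since in OCaml a transparent alias like type 'a units = int would not keep gallons and meters apart.

```ocaml
(* Uninhabited tag types; only their names matter. *)
type meters
type gallons

module Units : sig
  type 'a t
  val of_int : int -> 'a t
  val to_int : 'a t -> int
  val add : 'a t -> 'a t -> 'a t
  val times : 'a t -> 'b t -> ('a * 'b) t
  val commute : ('a * 'b) t -> ('b * 'a) t
end = struct
  type 'a t = int
  let of_int n = n
  let to_int n = n
  let add x y = x + y
  let times x y = x * y
  (* The identity at run time; only the phantom parameter changes. *)
  let commute x = x
end

let one_gal : gallons Units.t = Units.of_int 1
let one_m : meters Units.t = Units.of_int 1
let x = Units.times one_gal one_m
let y = Units.times one_m one_gal

(* Units.add x y would be rejected by the type checker; the explicit
   commute plays the role of {y} in the transcript above. *)
let total = Units.add x (Units.commute y)
```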

Edit: I should note that I left out several details about how this actually works. For example, the interpreter doesn’t search through all sequences of coercions, since there are infinitely many (and the problem of deciding if there is a unique one between any two given types is undecidable in general). Instead it limits itself to sequences of coercions whose type is never “bigger” than the starting type or the goal type, where “bigger” is defined by a straightforward length function.
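
One way such a bound might look is sketched below in OCaml. The post does not say which length function is used, so this sketch assumes the simplest candidate, the number of nodes in the type's syntax tree. The type ty and the names size and within_bound are hypothetical.

```ocaml
(* A small syntax tree for types. *)
type ty =
  | Base of string
  | Var of string
  | ListOf of ty
  | Prod of ty * ty
  | Arrow of ty * ty

(* Assumed length function: count the nodes of the tree. *)
let rec size = function
  | Base _ | Var _ -> 1
  | ListOf t -> 1 + size t
  | Prod (a, b) | Arrow (a, b) -> 1 + size a + size b

(* An intermediate type is allowed only if it is no bigger than the
   larger of the starting type and the goal type. *)
let within_bound ~start ~goal t =
  size t <= max (size start) (size goal)
```

Under this assumption, going from int -> int to int list list -> int list list may pass through int list -> int list, but never through a type with three nested lists, which keeps the search finite.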

