The Easiest Way to Build a Type Checker

(jimmyhmiller.com)

70 points | by surprisetalk 3 days ago

7 comments

  • thomasikzelf 4 hours ago
    I recently implemented a Hindley-Milner type checker. The algorithm itself is not necessarily super difficult (once you get your head around it, of course), but the way it is explained is. It seems like HM is mostly explained by people with a mathematics background who already know the proper notation. I wonder how many people both know the notation and do not already know how HM works; probably not many.

    Anyway nice demo. Bi-directional is already quite powerful!
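The heart of HM that the notation usually obscures is just unification of type terms. A minimal sketch in TypeScript, assuming a toy representation (type variables plus named constructors) rather than any particular paper's or the article's code:

```typescript
// Toy type terms: a variable, or a constructor such as "number" or "->".
type Ty =
  | { kind: "var"; name: string }
  | { kind: "con"; name: string; args: Ty[] }; // e.g. number, or "->" for T -> U

type Subst = Map<string, Ty>;

// Apply a substitution, chasing bound variables to their representative.
function apply(s: Subst, t: Ty): Ty {
  if (t.kind === "var") {
    const bound = s.get(t.name);
    return bound ? apply(s, bound) : t;
  }
  return { kind: "con", name: t.name, args: t.args.map((a) => apply(s, a)) };
}

// Make two types equal by extending the substitution, or throw on a mismatch.
// (The occurs check is omitted for brevity.)
function unify(a: Ty, b: Ty, s: Subst): void {
  a = apply(s, a);
  b = apply(s, b);
  if (a.kind === "var") { s.set(a.name, b); return; }
  if (b.kind === "var") { s.set(b.name, a); return; }
  if (a.name !== b.name || a.args.length !== b.args.length) {
    throw new Error(`cannot unify ${a.name} with ${b.name}`);
  }
  const bArgs = b.args;
  a.args.forEach((arg, i) => unify(arg, bArgs[i], s));
}
```

Unifying `t0 -> number` with `string -> t1` binds `t0` to `string` and `t1` to `number`; the bidirectional checker in the article sidesteps this machinery by asking for annotations instead.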

  • azhenley 6 hours ago
    I recently made a toy type checker for Python and had a lot of fun.

    https://austinhenley.com/blog/babytypechecker.html

  • bbminner 3 hours ago
    I have not looked into the HM algorithm much, but is there an advantage (educational or performance-wise) over implementing a (dumb) SAT solver and expressing the typing problem as a SAT problem? It always seemed like the "natural representation" for this kind of problem to me. Does knowing that these are types _specifically_ help you somehow / give you some unique insights that won't hold in other similar SAT problems?
    • recursivecaveat 2 hours ago
      Keep in mind one of the most important attributes of a good compiler is clearly explaining to the user what caused compilation failure and why. If you try to solve in a very abstract and general space it could be challenging to give an actionable error message.
      • Quekid5 2 hours ago
        Yup, that's basically it. "SAT says no" isn't a very useful error message.
    • remexre 3 hours ago
      how would you encode a program like

          function f<T>(x: T) { return x; }
          function g(x: number) { return { a: f(x), b: f(x.toString()) }; }
      
      in SAT?

      if that's easy, how about length and f in:

          function append<T>(xs: list<T>, ys: list<T>) {
            return match xs {
              Nil() -> ys,
              Cons(hd, tl) -> Cons(hd, append(tl, ys)),
            };
          }
          function flatten<T>(xs: list<list<T>>) {
            return match xs {
              Nil() -> Nil(),
              Cons(hd, tl) -> append(hd, flatten(tl)),
            };
          }
          function map<T, U>(f: (T) => U, xs: list<T>) {
            return match xs {
              Nil() -> Nil(),
              Cons(hd, tl) -> Cons(f(hd), map(f, tl)),
            };
          }
          function sum(xs: list<number>) {
            return match xs {
              Nil() -> 0,
              Cons(hd, tl) -> hd + length(tl),
            };
          }
          function length<T>(xs: list<T>) { return sum(map((_) -> 1, xs)); }
          function f<T>(xs: list<list<T>>) {
            return length(flatten(xs)) === sum(map(length, xs));
          }
      
      HM-style inference handles polymorphism and type application without a complicated SAT encoding
    • octachron 2 hours ago
      For a bounded size of types of sub-expressions, HM inference is quasi-linear in the size of the program, because the constraints appearing in the HM algorithm are only equalities between meta-variables. An NP-complete SAT solver is not really a good fit for such simple constraints, even more so when type checking often represents a significant part of compilation time.

      (Of course, the tricky part of the definition above is that the size of types can theoretically be exponential in the size of the program, but that doesn't happen for programs with human-understandable types.)
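The near-linear behaviour comes from solving those equality constraints with union-find. A sketch, assuming plain integer metavariables (a hypothetical helper, not tied to any particular compiler):

```typescript
// Equality-only constraints are cheap: a union-find over integer
// metavariables handles n equalities in near-linear (inverse-Ackermann)
// time, with no search at all.
class UnionFind {
  private parent: number[] = [];

  // Allocate a fresh metavariable.
  fresh(): number {
    this.parent.push(this.parent.length);
    return this.parent.length - 1;
  }

  // Find the representative of x, compressing paths along the way.
  find(x: number): number {
    while (this.parent[x] !== x) {
      this.parent[x] = this.parent[this.parent[x]]; // path halving
      x = this.parent[x];
    }
    return x;
  }

  // Record the equality constraint a = b.
  union(a: number, b: number): void {
    this.parent[this.find(a)] = this.find(b);
  }
}
```

Each equality like `t0 = t2` arising during inference becomes a single `union` call, whereas a SAT encoding would have to search over assignments.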

  • mrkeen 7 hours ago
    I grabbed the code from the article and annotated it with the different cases from the famous picture*

      switch (expr.kind) {
        case "number":
        case "string":
        case "var":
          ... [Var]
        case "call":
          ... [App]
        case "function":
          throw new Error("...[Abs]")
        case "let":
          ... [Let]
      }
    
    Looks like most of the hard work's done, and probably wouldn't be too tricky to get [Abs] in there too!

    * https://wikimedia.org/api/rest_v1/media/math/render/svg/10d6...
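For what the [Abs] case could look like: a self-contained sketch (the tiny AST and helper names here are assumptions, not the article's actual code) that gives the parameter a fresh type variable, infers the body in an extended environment, and returns a function type:

```typescript
// Hedged sketch of HM's [Abs] rule; AST and names are illustrative only.
type Ty =
  | { kind: "tvar"; id: number }
  | { kind: "number" }
  | { kind: "function"; arg: Ty; returnType: Ty };

type Expr =
  | { kind: "number"; value: number }
  | { kind: "var"; name: string }
  | { kind: "function"; param: string; body: Expr };

let nextId = 0;
const freshVar = (): Ty => ({ kind: "tvar", id: nextId++ });

function infer(expr: Expr, env: Record<string, Ty>): Ty {
  switch (expr.kind) {
    case "number":
      return { kind: "number" };
    case "var":                                          // [Var]
      return env[expr.name];
    case "function": {                                   // [Abs]
      const paramType = freshVar();                      // fresh metavariable
      const bodyType = infer(expr.body, { ...env, [expr.param]: paramType });
      return { kind: "function", arg: paramType, returnType: bodyType };
    }
  }
}
```

Inferring the identity function this way yields `t0 -> t0`, with both sides sharing the same fresh variable, which is where unification would later constrain it.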

  • nkrisc 6 hours ago
    In the small example type checker given, would a function of type A -> B -> C be represented something like this?

        { kind: "function", arg: A, returnType: { kind: "function", arg: B, returnType: C}}
    
    Or is that case simply not covered by this bare-bones example? I can't parse the code well enough in my head to tell whether that would work, but I think it would?

    EDIT:

    I just noticed the working demo at the bottom that includes an example of a multi-argument function so that answers whether it works.
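For what it's worth, the guessed nested encoding does work: `A -> B -> C` is just `A -> (B -> C)` as nested single-argument function types. A small sketch mirroring the `{ kind: "function", arg, returnType }` shape from the comment, with a hypothetical pretty-printer to show the right-associativity:

```typescript
// Nested one-argument function types, as guessed in the comment above.
type Ty =
  | { kind: "named"; name: string }
  | { kind: "function"; arg: Ty; returnType: Ty };

// Render with "->" right-associative, parenthesizing function-typed arguments.
function show(t: Ty): string {
  if (t.kind === "named") return t.name;
  const arg = t.arg.kind === "function" ? `(${show(t.arg)})` : show(t.arg);
  return `${arg} -> ${show(t.returnType)}`;
}

// A -> B -> C, i.e. A -> (B -> C)
const curried: Ty = {
  kind: "function",
  arg: { kind: "named", name: "A" },
  returnType: {
    kind: "function",
    arg: { kind: "named", name: "B" },
    returnType: { kind: "named", name: "C" },
  },
};
```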

  • IshKebab 3 hours ago
    Nice! I think it's pretty widely agreed that requiring type annotations at the function level is a good thing anyway. Apparently it's considered good practice in Haskell even though Haskell doesn't require it.

    I've also worked with OCaml code that didn't do it and you lose a lot of the advantages of static typing. Definitely worse.

    Rust got it right.

    • dragonwriter 1 hour ago
      Type annotations on functions in Haskell (or similar languages) are useful for:

      1. leveraging the type checker to verify (aspects of) the correctness of your function, and

      2. communicating intent to humans

      I've found in my own explorations with Haskell that it's useful to write functions with annotations, verify that they work, and then remove the annotations to see what the inferred types would be. Since the code already compiled with the annotation, the inferred type will either be identical to, or more general than, the previously declared type. Because it is good practice to have a declared type, I then generally replace the old declared type with the inferred one (though sometimes at this point I also change the name).

    • toolslive 3 hours ago
      What if your IDE can show the type of any expression as a tooltip? Would you still think the same?
      • ufo 2 hours ago
        In Haskell, type error messages are always like "types A and B should be equal, but they are not". The problem is that, without type annotations, the compiler cannot know if it is A or B that is wrong, which can result in confusing error messages.

        For example, suppose that you have a bug in the body of a function, but did not provide a type annotation for it. The function might still compile but not with the type you want. The compiler will only notice something is amiss when you try to call the function and it turns out that the function's inferred type doesn't fit the call site.

        Basically, global type inference in the absence of type annotations means that changes in one part of the file can affect inferred types very far away. In practice it's best to use type annotations to limit inference to small sections, so that type errors are reported close to what caused them.

      • IshKebab 14 minutes ago
        Yes absolutely. OCaml's VSCode extension is very good at that so that's the only way I've experienced it.

        The problems are:

        1. Type errors become much less localised and much harder to understand. Instead of "the function you're editing is supposed to return a string but it doesn't" you get "this other function three stack frames away that indirectly calls the function you're editing can't pass some value into println".

        2. The inferred types are as generic as possible, which also means they are as hard to understand as possible.

      • Quekid5 2 hours ago
        I can't speak for the parent poster, but for global function declarations, yes, absolutely.

        It's infuriating when a type error can "jump" across global functions just because you weren't clear about what types those functions should have had, even if those types are very abstract. So early adopters learned to sprinkle in type annotations at certain points until they discovered that the top-level was a good place. In OCaml this pain is somewhat lessened when you use module interface files, but without that... it's pain.

    • Quekid5 2 hours ago
      > I think it's pretty widely agreed that requiring type annotations at the function level is a good thing anyway. Apparently it's considered good practice in Haskell even though Haskell doesn't require it.

      In Haskell-land: At the global scope, yes, that's considered good practice, especially if the function is exported from a module. When you just want a local helper function for some tail-recursive fun it's a bit of extra ceremony for little benefit.

      (... but for Rust specifically local functions are not really a big thing, so... In Scala it can be a bit annoying, but the ol' subtyping inference undecidability thing rears its ugly head there, so there's that...)

      • ufo 1 hour ago
        Languages with local type inference can sometimes omit type annotations from lambdas, if that lambda is being returned or passed as an argument to another function. In those situations we know what the expected type of the argument should be and can omit it.
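TypeScript's contextual typing is a concrete instance of this: when a lambda flows into a position whose function type is already known, the parameter types are read off the expected type and need no annotation.

```typescript
// The callback's parameter type is fixed by twice's signature.
function twice(f: (n: number) => number, x: number): number {
  return f(f(x));
}

// `n` needs no annotation: it is inferred as number from the expected
// argument type of `twice`.
const result = twice((n) => n + 1, 40); // result is 42

// By contrast, a lambda in an unconstrained position has no expected type
// to infer from, so its parameter must be annotated:
const inc = (n: number) => n + 1;
```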
        • Quekid5 1 hour ago
          Yeah, that's true and that's a good convenience even if it's not full inference. In the case of Scala, the parameter types may often be required, but at least the return type can be omitted, so there's that.
  • tayo42 7 hours ago
    I thought the implementation here was how Hindley-Milner worked? I guess not?
    • tekknolagi 5 hours ago
      No, HM is unification based and requires no annotations at all.
      • skybrian 5 hours ago
        Apparently it only gets away without annotations if the language doesn’t support subtyping? Here’s an explanation about why bidirectional type checking is better for that:

        https://www.haskellforall.com/2022/06/the-appeal-of-bidirect...

        It seems to me that type-checking that relies on global constraint-solving is usually a bad idea. Annotated function types result in less confusion about what a function does.

        • ufo 1 hour ago
          Indeed. Unification-based type inference doesn't work great when the type constraints are inequalities.