OR-types

  • last night I couldn’t fall asleep because I was thinking how sites like Anilist implement their tag-based search systems.

    2024-04-07
    • my mind was racing, and with enough thinking about scores and weights and sums and products… I arrived at the topic of sum types and product types.

      2024-04-07
    • I was thinking, “this is actually really interesting - why do we call them sum types and product types?” - I had some intuitive understanding as to where these names come from, but I wanted to write down my thoughts for future me, and future generations.

      2024-04-07
    • so I set down an alarm for today, and went to sleep.

      2024-04-07
  • recall that I was faced with the question of “why do we call product types product types, and why do we call sum types sum types?” my intuitive understanding was this:

    2024-04-07
    • so you know how the Boolean operations AND and OR have these truth tables that show you all the outcomes:

      A B AND
      0 0 0
      0 1 0
      1 0 0
      1 1 1
      2024-04-07
    • what if we did that, but with multiplication and addition?

      A B *
      0 0 0
      0 1 0
      1 0 0
      1 1 1
      2024-04-07
      • wait a minute. this looks quite familiar thinking

        2024-04-07
    • if you stick to arguments consisting of just zero and one, you’ll end up with “truth” tables that look very similar to those of AND and OR - with multiplication (product) resembling AND, and addition (sum) resembling OR, with the only difference being that 1 + 1 = 2, unlike 1 OR 1 which is equal to 1. but they’re quite similar!

      2024-04-07
  • now you might ask “what does this have to do with types mr. riki?”

    2024-04-07
    • I’ve always thought of product types as being very similar to a Boolean AND of two types, and sum types as being very similar to a Boolean OR of two types.

      2024-04-07
      • after all, having a struct/record { a: T, b: U }, or - how I’ll represent it in this post - a product T * U, is very similar to saying “I have both a T AND a U value here.”

        2024-04-07
        • we can create a subtype relation based on this Boolean algebra - we’ll say that V is a subtype of T * U if has(T) AND has(U) is true, where has(T) is true if the type V is a subtype of T

          2024-04-07
      • and having a type union T | U seems very similar to saying “I have either a T OR a U value here” - has(T) OR has(U)

        2024-04-07
    • but look again at the truth table for OR:

      A B OR
      0 0 0
      0 1 1
      1 0 1
      1 1 1
      2024-04-07
      • based on this table, we can infer that:

        2024-04-07
        • T is a subtype of T | U, because T is present (1) and U is not (0), therefore has(T) OR has(U) is true;

          2024-04-07
        • U is a subtype of T | U, because T is not present (0) and U is (1), therefore has(T) OR has(U) is true.

          2024-04-07
      • however, this also means T * U is a subtype of T | U, because in T * U, both T is present (1), and U is present (1), and therefore has(T) OR has(U) is true.

        2024-04-07
    • therefore we cannot use this definition for our familiar and beloved sum types T | U! we’ve invented something new here. I’ll call it OR-types, and represent it using T ++ U.

      2024-04-07
    • having OR-types is really frickin’ weird, because it yields a type system which would allow you to do this (in pseudo-Rust):

      fn get_string(or: i32 ++ &str) -> Option<&str> {
          match or {
              x: &str => Some(x),
              _ => None,
          }
      }
      
      2024-04-07
      • the existence of this OR-type would make match really weird, as now it could enter multiple match arms:

        fn print_variants(or: i32 ++ &str) {
            match or {
                x: i32 => println!("{x}"),
                x: &str => println!("{x}"),
            }
        }
        
        fn main() {
            let t = 123 ++ "many hugs";
            print_variants(t);
            // prints:
            // 123
            // many hugs
        }
        
        2024-04-07
    • outside type systems which have bit sets, which are like a more limited version of this, I’ve never seen something like this implemented in practice thinking

      2024-04-07
      • probably because the following is so much simpler and more familiar (in real Rust):

        pub struct IntOrStr<'a> {
            pub int: Option<i32>,
            pub str: Option<&'a str>,
        }
        
        2024-04-07
      • however, with Rust’s somewhat annoying lack of built-in support for bit sets, I wonder just how different programming would be in a language that supports this.

        2024-04-07
    • back from dreamland - in terms of Boolean algebra, sum types would instead be represented by has(T) XOR has(U), which has this truth table:

      A B XOR
      0 0 0
      0 1 1
      1 0 1
      1 1 0
      2024-04-07
      • and based on this table, we can infer that:

        2024-04-07
        • T is a subtype of T | U, because T is present (1) and U is not (0), therefore has(T) XOR has(U) is true;

          2024-04-07
        • U is a subtype of T | U, because T is not present (0) and U is (1), therefore has(T) XOR has(U) is true.

          2024-04-07
      • but unlike our previous hypothetical T + U, T * U is not a subtype of T | U, because if T is present (1) and U is also present (1), has(T) XOR has(U) will be false - which matches more typical sum types.

        2024-04-07
  • I had to write this down like 5 times until I managed to figure out the exact details, because it resembles set-theoretic types a lot, yet it is not the same.

    2024-04-07
    • in this type system, every type T acts like a function - you pass it another type U and it returns whether U is a subtype of T

      2024-04-07
    • this confused me a lot because, from my limited understanding of them, set-theoretic types represents types as sets of possible values - and mathematically any set S can be represented by a function S(x), which is true if x is present in the set

      2024-04-07
    • but again, it is not the same, because the type system I described here is defined based on subtype relations rather than sets of values.

      2024-04-07
    • in the world of set-theoretic types, we get sum types for free, because the type T | U represents the union of the set of all possible T values, and the set of all possible U values.

      2024-04-07
    • as far as I understand, product types have to be represented in a different way - they’d be represented by values themselves - one rank lower than types - because if we took the intersection of all values of T and all values of U, we could end up with an empty set.

      2024-04-07
  • as closing words, I’d just like to say I haven’t looked for any soundness holes in this theoretical type system with OR-types. if you find any, feel free to get in touch - I’ll add a note to this post summarizing what’s wrong with my reasoning.

    2024-04-07