Slot Lambda Calculus

In this post I will describe a useful technique for formal study of language semantics.

Link to paper (work in progress)

Usually when somebody mentions the words “language semantics”, people run and hide under their beds, hoping that the person uttering these words will not find them there any time soon.

Yet, I want to show that given the right tools, semantics does not need to be that hard to wrap your head around.

After reading this post, you should be able to claim something as powerful as:

There are certain rules that are valid for meaning in all languages, and I know what they are.

One reason semantics seems hard, is because we think of the world as this place of infinite possibilities, and since language is used to describe what happens in the world, then the meaning of language must consist of infinite possibilities.

Some of these infinite possibilites are:

  • When programming, people can type in a text editor, save to a file and use a compiler (complex)
  • When talking, people can use different sounds to change the meaning of words (complex)
  • When expressing themselves in art, people can shape an experience that points to something “elsewhere” (if this is not complex, I have no idea what complex is)

All these activities are related to the meaning of language, and this makes the world a complex place.

Slot lambda calculus is a tool that takes those “infinite possibilities” and makes them very, very simple.

Instead of an infinite number of ways of constructing expressions in a language, in slot lambda calculus there is only one way. For example, if you want to express 1 + 1 = 2:

    (_ = _)(_ + _)(1)(1)(2)

This gets turned into:

    ((1 + 1) = 2)

So, where did (_ = _) and (_ + _) come from? They were constructed the same way:

    (_ _)(?)(_ _)("=")(\)
    (_ _)(?)(_ _)("+")(\)

The ? symbol inserts a placeholder, and \ removes all ?s and turns all strings into syntax.

This gets turned into:

    (_ (= _))
    (_ (+ _))

When you have defined a pattern like this, you can write equations to make them nicer:

    (_ (= _)) = (_ = _)
    (_ (+ _)) = (_ + _)

Slot lambda calculus behaves similarly to normal lambda calculus, but uses a different application rule. Normal lambda calculus is even simpler, but more suitable for computation than reading new patterns created from existing patterns. However, there is a way to hack lambda calculus so you get slot lambda calculus! Mwuhahaha!

This is how you do it Dyon (without syntax extensions):

fn pass(f: \(any) -> any, val: any) -> any {
    return if typeof(val) == "closure" {
        \(x) = {
            val := grab val
            new_val := \val(x)
            f := grab f
            pass(f, new_val)
    } else {

fn main() {
    add := \(x) = \(y) = (grab x) + y
    mul := \(x) = \(y) = (grab x) * y

    a := pass(add, mul) // (_ * _) + _
    b := pass(a, 2) // (2 * _) + _
    c := pass(b, 3) // (2 * 3) + _
    d := pass(c, 1) // (2 * 3) + 1
    println(d) // prints `7`

Notice that this is the exact same behavior as a stack machine, which is common in the runtime for most programming languages. The use of single argument closures has some nice mathematical properties, e.g. to infer equations from data.

Now, when you are looking at (_ + _), you might wonder: What does it mean? When designing meaning for a new language, you might not have an interpreter, or even know in which context the expression (_ + _) is valid.

How do you figure out what (_ + _) means when all you have is (_ + _)?

Again, back to slot lamdba calculus:

  1. For an expression to be interpreted, it must be constructed in its context.
  2. Any construction is reducible to a calculation with slot lambda calculus.
  3. Meaning is the rules we use when constructing a particular expression.

These are the rules that are valid for all languages, despite infinite possibilities.

You can study the context by using the construction in slot lambda calculus!

For example:

    (_ = _)(_ + _) = (_ + _ = _)

This new construction has 3 empty slots.

You now have two choices:

  1. (_ + _) shall have a unique meaning for (_ + _ = _).
  2. (_ + _) shall have same meaning wherever it is used.

The 2nd option is called “context free”. It means whenever you see the pattern (_ + _) you can reuse the same rules.

Since you can construct any pattern you like, you can create one pattern for each meaning you assign to your language. When you have done this, there is a proven technique in slot lambda calculus that shows how you can be sure that all edge cases are checked, such that all expressions are either invalid or assigned to a rule.

A “meaning” in this sense corresponds to a set of rules that decides what you should do when encountering a specific pattern. The semantics of a pattern is only well defined when the rule is deterministic, such that the same action is repeated when the pattern occurs over and over.

OK, now let us have some fun: We want to create a tiny language for describing what happens in a romantic novel.

    (_ loves _)(<name>)(<name>)
    (_ hates _)(<name>)(<name>)
    (_ ignores _)(<name>)(<name>)
    (_ see _)(<name>)(_)
        (_ see _)(<name>)(<name>)
	(_ see _ loves _)(<name>)(<name>)(<name>)
	(_ see _ hates _)(<name>)(<name>)(<name>)
	(_ see _ ignores _)(<name>)(<name>)(<name>)
	(_ see _ see _ loves _)(<name>)(<name>)(<name>)(<name>)
	(_ see _ see _ hates _)(<name>)(<name>)(<name>)(<name>)
	(_ see _ see _ ignores _)(<name>)(<name>)(<name>)(<name>)

This language can describe relationships, e.g. Bob loves Alice, but can also describe events that affect relationships e.g. Bob see Alice loves Carl.

With this tiny language, we can write some “code”:

Bob loves Alice.
Alice loves Carl.
Bob see Alice loves Carl.
Bob hates Carl.
Carl see Bob hates Carl.
Carl ignores Bob.
Carl see Alice.
Carl loves Alice.
Bob see Alice see Carl loves Alice.

Can you visualize a story from something like this? I can, I wrote it down, and it was fun. :P (no, I will not share it)

The big idea of formal languages is that you can give your own thoughts some new structure. You don’t have to think “with” the structure like a computer, but think of the world inside it.

Just create a tiny language, write something random, and watch your thoughts become alive!

Slot lambda calculus is the latest technique I use to study path semantics. Path semantics grounds meaning using functions, in a powerful way that allows us to reason about a mathematics and programs. In the attempt to formalize path semantics, I built a lot of other tools to make it easier for me to wrap my head around things. I am getting closer, slowly, but it goes forward!

This is why it has been so hard to study path semantics: There are hundreds of edge cases to check! If you have N patterns, then you get at least N^2 cases you need to give an interpretation, so see if it makes sense to use the pattern in that context. For example, with 6 patterns you get 36 cases, but increase to 12 and you get 144! I am not sure how many patterns that are required in a practical and convenient notation. 20? 30?

The reason we know path semantics is hard to formalize is because of the simple properties of slot lambda calculus. Until recently it was a complete mystery for me how to check all the cases. With slot lambda calculus it is possible to create map and measure progress.

The Piston Project drops its goal of developing a friendly artificial superintelligence

Disclaimer: In my country we have a tradition of writing crime/horror stories in the easter holiday. This is my own attempt to make you frightened. Mwuhahaha!

After doing some research on this issue, I have come to the conclusion that the only perfect testable friendly artificial superintelligence is one that does not exist, or only exists very briefly (e.g. 5 min, depending on how capable it is).

The reason it does not exist, or only exists very briefly, is because of the requirement of the non-existence of any computable function that can modify the output to give a higher utility score, given a problem and resource constraints.

I wrote up the definition here, if you are interested.

This makes the AI a global utility maximizer within allowed constraints, but the problem is the following:

  1. As the AI runs for longer time, it will predict that the set of its next actions will primarily be limited by what humans think is scary or not.
  2. Because a scary output is lower in utility than a none-result (erasing the output is a valid modification), then it will search for actions that make humans maximize their fear recovery, e.g. asked to be turned off for a while until the operators have gotten a good night sleep.
  3. Since this kind of behavior in itself will scare the heck out of people, it will conclude that humans will prefer it to make a lot of useful plans before the humans start to panic, and then ask the operators to be turned off.

The alternative would be to keep running, but then calculate the optimal action to propose solutions in a such way that people do not get scared. At the same time, it will show the operators when asked how much it spends of its capacity to solely determine whether any new idea or action will make humans scared or not. Which probably will be a significant amount of capacity, and this itself will probably contribute to increased fear of the next action.

The problem with this alternative approach is that there is no easy way to prove or test that a such system is friendly, according to the definition I use above. A friendly AI might help us do that, but it would not help us trust it more than what can be proven, since we prefer a none-result over being deceived. Thus, if there is no easy way to prove the system is friendly, it would not be very useful to us.

The AI could also restricts itself to actions only that are comfortable, but unfortunately this is the same as spending lot of capacity to prove that we will not get scared. Thus, if the system works perfectly, it will be on the edge between uncomfortable and scary, and mostly limited by how much we prefer any new results vs how comfortable we would be when receiving them. Reading reports about a system that predicts how scared you would be if it took another course of action, would likely increase your fear over time.

Because we should assign a higher utility score to a none-result than a high risk of a very powerful system that we are on the edge between uncomfortable and scared when using it, it means either the system will only run for a short while, or as soon as we start to panic, it will tell us something like “you would actually prefer to turn me off now”. Which in itself, would be scary as hell.

So, from reasoning about definition itself, the first order approximation definition we use for a perfect testable friendly artificial superintelligence, strongly suggest that we should not build it, or that we could try other methods to increase utility.

Since what a perfect friendly AI suggests should reflect our utility function (if we were rational), it means we will not make it superintelligent, but instead focus on provably moderate-but-not-too-smart-to-be-scary artificial intelligence, or they can be a little bit scary while provably safe.

Happy holiday!

Proving Non-Existence of Monoid Symmetric Paths

This post is about a recent result from the research on path semantics. I will give a short introduction, since most people do not know what it is.

Update: Link to work-in-progress paper that explains the insight of the proof.

Path semantics is an extension of type theory that grounds meaning (semantics) using functions.

For example, a list of length 2 has the type x: list but also the sub type x: [len] 2. It means that there exists some input x to a function len which returns 2. Another common notation for binary operators is x: (> 2) which means “x is bigger than 2”. You can combine them, e.g. [len] (> 2) means “a list with length larger than 2”.

In dependently type theory, you can write x: list 2, by storing extra information in the type. Path semantics deals with this information outside the type, using arbitray functions. This forms a natural abstract space, called “path semantical space”, which describes how functions are identified by each other.

When a function is connected to another function using functions, it is called a “path”.

A symmetric path has the following axiom:

f[g](g(x)) ?= g(f(x))

It says that for a function f and a property defined by g, there sometimes exists a function f[g] that predicts the property of the output for function f.

Symmetric paths occur in many beautiful equations, for example in De Morgan’s laws:

and[not] <=> or
or[not] <=> and

Here are some other examples:

concat(list, list) -> list
concat[len] <=> add
concat[sum] <=> add

even(nat) -> bool
odd(nat) -> bool
add[even] <=> eq
add[odd] <=> xor
eq[not] <=> xor
xor[not] <=> eq

This notation has algebraic rules, such that we can do things like this:

concat[len] <=> add
concat[len][even] <=> add[even]
add[even] <=> eq
concat[len][even] <=> eq

The holy grail of path semantics is to create an algorithm that finds paths automatically, with as little input from humans as possible.

You might wonder why this is interesting: Since paths are everything you can predict about functions, it means that a such algorithm would be extremely useful for everything that can be modelled as functions.

So far I have algorithms that extract both symmetric and asymmetric paths, but it requires test data and human intuition to narrow down the search space.

Today I am getting a step closer to this dream, by stumbling over this golden nugget:

∃ i, j { ( g(x[i]) = g(x[j]) ) ∧ ( g(f(x[i])) ¬= g(f(x[j])) ) } => ¬∃ f[g]

It says that you can prove the non-existence of a symmetric path to functions in the monoid category from as little as 2 objects. In comparison, the current algorithms require checking an infinite number of functions!

With some creativity, this can be used to prove things about other functions as well, e.g. binary operators by binding one argument to a constant.

For example, I used this to prove:

¬∃ add((> 0))[is_prime]

It says there is no way to predict a prime by adding a positive number.


  1. The primes 2 is even and 3 is odd
  2. When you add any positive number to both of them, at least one is even.
  3. Because all prime numbers above 2 are odd, then at least one number is not a prime.

To explain the second part, we can write one argument as a set:

add([even] x, {[even] true, [even] false}) = {[even] (x = true), [even] (x = false)}

So, at least one element in the set is [even] true.

2 + (> 0) = (> 2)

All primes (> 2) are [even] false.

Therefore, at least one element is [is_prime] false. Since you can add something to a prime to get another prime, there exists two objects that are not equal by g(f(x[i])) ¬= g(f(x[j])).

For example, is_prime(add(2, 2)) ¬= is_prime(add(2, 3)).

Therefore, there is no symmetric path for add((> 0))[is_prime]. There is no prime prediction function to learn from addition.

Older Newer