Dyon is a rusty dynamically typed scripting language that uses a lifetime checker instead of garbage collection. It has an object model similar to JavaScript, but with option and result types instead of null. The language uses dynamic modules for organizing code, which makes it easy to use for interactive coding.

The 0.5 version adds a mutability check and a short For loop, and includes many bug fixes.

In this blog post, I also give a simple pre-training example for deep learning using Dyon.

Why Dyon?

Dyon was started by me (bvssvni, Sven Nilsen) while waiting for the new Gfx version, and it was so fun to work on that I have not managed to stop. It is the result of continuous experimentation and trying out new ideas.

For testing the language in “real” situations, I work on small toy projects that explore an idea. When I find something missing from the language, I write it down and start thinking about it.

Mutability check

Dyon now uses mut to declare mutability. It appears in front of an argument in a function declaration, and at the call site when passing an argument that the function mutates.

  • foo(mut a) mutates the argument a
  • foo(mut a) has function name foo(mut)
  • foo(mut a, b) has function name foo(mut,_)
  • foo(a, b) has function name foo

Local variables are mutable.

Mutability is not part of a type; instead, it is added to the function name inside parentheses, e.g. (mut,_). When adding an external function, the mutability information must be included as part of the function name.

Mutability propagates for arguments:

fn foo(mut a) { ... }

// `bar` requires `mut a` because it calls `foo(mut a)`.
fn bar(mut a) { foo(mut a) }

This is designed to:

  • Explicitly declare when a function mutates a variable
  • Improve readability and maintenance
  • Allow a function name to be reused for different mutability patterns

Example:

fn foo(mut a, b) {
    a[0] = clone(b)
}

fn bar(a, b) {
    foo(mut a, b)
}

fn main() {
    a := [4]
    b := 5
    bar(a, b)
}

The mutability check discovers that mut a inside bar has no corresponding mut a on the argument to bar.

 --- ERROR --- 
In `source/test.rs`:
Requires `mut a`
7,13:     foo(mut a, b)
7,13:             ^

To fix it:

fn bar(mut a, b) {
    foo(mut a, b)
}

You can declare multiple functions with the same name using different mutability patterns.

Example:

fn reset_x(pos) -> {
    return [0, clone(pos[1])]
}

fn reset_x(mut pos) {
    pos[0] = 0
}

fn main() {
    pos := [1, 2]
    println(pos)
    println(reset_x(pos))
    reset_x(mut pos)
    println(pos)
}
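
If I read the overloads right, this should print [1, 2], then [0, 2] from the non-mutating version (leaving pos untouched), and finally [0, 2] again after the mutating call.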

Short For loop

Dyon now supports a short For loop for a counter that starts at 0 and is incremented until it is greater than or equal to a pre-evaluated expression:

for i len(list) { println(list[i]) }

This For loop is approximately 2.9x faster when running on the AST than the equivalent traditional For loop:

n := len(list)
for i := 0; i < n; i += 1 { println(list[i]) }

This is designed to:

  • Reduce typing
  • Reduce bugs
  • Improve readability of code

Example: Pre-training for deep learning

The whole code can be found here.

Disclaimer: My education is in computer engineering with a specialty in artificial intelligence, but that was a decade ago, so it is a bit rusty. Most of it will probably be inaccurate, but it might serve as an example of the look and feel of Dyon. :-)

Deep learning revolutionized AI when researchers figured out how to pre-train a network on a lot of unlabeled data to produce better results when training on labeled data. The trick is to train the network to produce output similar to its input, so it learns high-level representations that explain the data, instead of the labels.

However, the methods and mathematics might be a bit overwhelming for somebody (like me), so I will make it simpler. Instead of doing all the sophisticated tricks that AI researchers use, we will just use the following trick for training, which works for almost any complex system and is one of the most useful and practical of all mathematical tricks:

Change something with a small number and see what happens!

This is no joke!

Imagine a great panel with many knobs in a complex system, for example an airplane. You turn a knob a little bit, see what happens, and then turn it back if nothing happens. Pick a knob at random, and repeat the process until the airplane goes where you want.

In technical terms:

A small change to a number in a deterministic system gives an output that, when the previous output is subtracted from it, approximates the partial derivative of the whole system. This can then be used to get closer to a local minimum with respect to some error function.
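
To make the trick concrete, here is a minimal sketch in Dyon of tuning a single “knob” this way; the one-knob system err(x) and the constants are made up for illustration:

fn err(x) -> {
    // A made-up error function with its minimum at x = 3.
    return (x - 3)^2
}

fn main() {
    eps := 0.0001
    learning_rate := 0.1
    x := 0
    for i 200 {
        // Nudge the knob and see what happens.
        d := (err(x + eps) - err(x)) / eps
        // Step against the slope of the error.
        x -= d * learning_rate
    }
    // `x` should end up close to 3.
    println(x)
}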

So, instead of using backpropagation, I just look at some random place in the network and see if a small change leads to better results! It is not the most efficient algorithm, but since it is so simple, it helps to understand how pre-training works, independently of which algorithm you use.

Getting started

In the Terminal, type:

> cargo new --bin deep_learning

This will set up a new Rust project with a “main.rs” for the program.

Add this to your Cargo.toml:

[dependencies]
dyon = "0.5.0

In main.rs, type:

extern crate dyon;

use dyon::{error, run};

fn main() {
    error(run("source/deep_learning.rs"));
}

We will use the .rs file extension to get syntax coloring, even though it actually is Dyon code.

Then, create a folder “source” at the top level of the project, and add an empty “deep_learning.rs” file, where the rest of the code will go.

Basic things we need

The standard logistic function, or sigmoid function, is used to “smooth out” the output from a neuron. It maps numbers from negative infinity to positive infinity into the range (0, 1).

fn sigmoid(x) -> {
    return 1 / (1 + exp(-x))
}
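
For example, sigmoid(0) returns 0.5, while large positive inputs approach 1 and large negative inputs approach 0.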

The weights of the neurons are organized in layers that map input signals to output signals:

fn layer(inputs, outputs) -> {
    return [[0; inputs]; outputs]
}
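
For example, layer(9, 7) creates 7 rows of 9 zeroed weights, one row per output node.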

The whole network consists of layers of different sizes, so we use a list of layers, which we call a “tensor”, to store all the weights:

fn tensor(sizes) -> {
    res := []
    for i len(sizes)-1 {
        push(mut res, layer(sizes[i], sizes[i + 1]))
    }
    return clone(res)
}
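
For example, tensor([9, 7, 9]) creates two layers, layer(9, 7) and layer(7, 9).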

A way to get the sizes back from a tensor:

fn sizes_tensor(tensor) -> {
    res := []
    push(mut res, len(tensor[0][0]))
    for i len(tensor) {
        push(mut res, len(tensor[i]))
    }
    return clone(res)
}
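
This inverts the construction above: sizes(tensor: tensor([9, 7, 9])) gives back [9, 7, 9], reading the input count from the first layer and the output count from each layer after that.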

Randomizing layers and tensors might be useful:

fn randomize_layer(mut layer) {
    for i len(layer) {
        for j len(layer[i]) {
            layer[i][j] = random()
        }
    }
}

fn randomize_tensor(mut tensor) {
    for i len(tensor) {
        for j len(tensor[i]) {
            for k len(tensor[i][j]) {
                tensor[i][j][k] = random()
            }
        }
    }
}

After we have trained a few layers, we expand the network by removing the output layer, adding a new hidden layer, and then putting a fresh output layer on top. This is repeated to create the deep network.

fn expand_tensor_size(mut tensor, size) {
    n := len(tensor)
    outputs := len(tensor[n-1])
    outputs_inputs := len(tensor[n-1][0])
    pop(mut tensor)
    hidden_layer := layer(outputs_inputs, size)
    randomize(layer: mut hidden_layer)
    push(mut tensor, clone(hidden_layer))
    output_layer := layer(size, outputs)
    randomize(layer: mut output_layer)
    push(mut tensor, clone(output_layer))
}
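
For example, expanding a [9, 7, 9] tensor with size: 5 changes the sizes to [9, 7, 5, 9]: the old 7-to-9 output layer is popped and replaced by a randomized 7-to-5 hidden layer plus a fresh randomized 5-to-9 output layer.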

Now you have the basics to start experimenting.

The rest of the functions, including running data through the network and computing arrays, are listed in the full source.
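
As an example, the forward pass could look something like this minimal sketch, which matches the run(tensor: ..., input: ...) calls used in the training code below; the details are my assumptions, not necessarily what the full source does:

fn run_tensor_input(tensor, input) -> {
    signal := clone(input)
    for i len(tensor) {
        next := []
        for j len(tensor[i]) {
            // Weighted sum of the incoming signals for output node `j`.
            sum := 0
            for k len(tensor[i][j]) {
                sum += tensor[i][j][k] * signal[k]
            }
            // Smooth out the node output.
            push(mut next, sigmoid(sum))
        }
        signal = clone(next)
    }
    return clone(signal)
}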

Training

As promised earlier, here is the training algorithm:

fn train_tensor_input_learning_rate(mut tensor, input, learning_rate) -> {
    eps := 0.0001
    for i 10 {
        w := pick_weight(tensor)
        val := get(tensor: tensor, weight: w)
        output := run(tensor: tensor, input: input)
        error := error_len(output, input)
        set(tensor: mut tensor, weight: w, value: val + eps)
        output2 := run(tensor: tensor, input: input)
        error2 := error_len(output2, input)

        diff_error := error2 - error
        abs_error := sqrt(diff_error^2)
        if abs_error > eps * eps {
            // Normalize error.
            diff_error /= eps
            set(tensor: mut tensor, weight: w, value: val - diff_error * learning_rate)
            return clone(error)
        } else {
            // Reset change.
            set(tensor: mut tensor, weight: w, value: val)
        }
    }
    return 0
}
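
Note that sqrt(diff_error^2) is just a way to take the absolute value, and the eps * eps threshold presumably filters out changes too small to be meaningful. If none of the 10 candidate weights passes the threshold, the function gives up and returns 0.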

The training algorithm runs many times, so we add a convenience function:

fn train_data_tensor_iterations(data, mut tensor, iterations) {
    for i := 0; i < iterations; i += 1 {
        random_input := data[floor(random() * len(data))]
        error := train(tensor: mut tensor, input: random_input, learning_rate: 10)
        if (i % 100) == 0 {
            output_data := run(tensor: tensor, data: data)
            print(data: data, output_data: output_data)
            println(error)
            println(sizes(tensor: tensor))
            println("==---== " + to_string(i))
            sleep(0)
        }
    }
}

Data

The most important thing about deep learning is to have some data to train on. I just made up some simple 3x3 patterns, like this:

[1, 0, 1,
 0, 1, 0,
 1, 0, 1]
[0, 0, 0,
 0, 1, 0,
 1, 0, 1]
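
In the full source, each such pattern is wrapped in a function, such as plus() or cross(), and these are gathered into the data list in main below.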

Since there are 3x3 input signals and output signals, this gives 3x3 = 9 nodes in each of the input and output layers. The input and output layers each hold a multiple of this number of weights, connecting into the next or previous layer.

The thing we want to do is to make the network recreate the same patterns using fewer than 9 nodes in between. This means there is not enough capacity to simply pass every signal through unchanged.

When training and expanding the network, it gravitates toward weights that describe some features of the data, and as the network grows deeper with fewer nodes, it has to “squeeze” the information into a compressed representation that captures as much as possible about the expected output.

The network gets expanded after the error has become sufficiently low; then we train it some more, and expand it again:

fn main() {
    tensor := tensor([9, 7, 9])
    println(tensor)
    randomize(tensor: mut tensor)

    data := [plus(), anti_plus(), center(), cross(), diag1(), diag2(),
             arrow1(), arrow2(), arrow3(), arrow4()]
    train(data: data, tensor: mut tensor, iterations: 20000)
    expand(tensor: mut tensor, size: 5)
    train(data: data, tensor: mut tensor, iterations: 40000)
    expand(tensor: mut tensor, size: 3)
    train(data: data, tensor: mut tensor, iterations: 80000)
    expand(tensor: mut tensor, size: 2)
    train(data: data, tensor: mut tensor, iterations: 80000)
    expand(tensor: mut tensor, size: 3)
    train(data: data, tensor: mut tensor, iterations: 80000)
    expand(tensor: mut tensor, size: 5)
    train(data: data, tensor: mut tensor, iterations: 80000)
    expand(tensor: mut tensor, size: 7)
    train(data: data, tensor: mut tensor, iterations: 80000)
}

Running this will take a while, but you will notice that even though all the information passes through only two nodes at the narrowest point, the network is able to recreate quite a bit of the original data.

Imagine that an artist, Carl, tries to copy the works of another artist, Alice, but can only get information by speaking with Bob, who is blind. Each time Carl makes a painting, somebody else takes a photo of it and compares it with the original painting. They subtract each pixel in the photo from the original and add the differences together, which results in a single number. This number is all Carl gets to learn about his mistakes, and he has to make the best of it.

Impressive? That is how our brains work! (kind of)

The resulting network after pre-training can be used as an ingredient in other algorithms.

For example, what happens in the 2 nodes?

Exercise for the reader: Create an image showing the output of the two nodes along the x and y axes. To solve this, you need to learn how to add external functions for image processing and exporting to a PNG image, and how to compute output from the layer with 2 nodes.

Did you notice?

There was no need for lifetimes in the whole deep learning example!

About Piston in general

The Piston project is a large collaboration between many programmers to build a modular game engine in Rust. Currently, there are 100 repositories and 177 people who have contributed to various projects.

Piston uses a modular core architecture, which makes it easy to swap out window backends and graphics backends, and to combine integrated libraries with external ones. There is a saying:

No Rust game project uses most of Piston, but most use some of it!

The past months we have worked on a very complex upgrade of the 2D graphics ecosystem, which required properly dealing with the sRGB color space, a draw state redesign, and feedback to the Gfx project. In addition, we have redesigned Piston-Window, started optimizing 2D graphics rendering, improved the Conrod UI framework, cleaned up the Glium-Graphics backend, landed several PRs to the Image and Imageproc libraries, added 2D+3D touch events to the core, and improved dependency control in the Eco tool for reasoning about breaking changes. I plan to write a blog post about this later, when the breaking changes start to settle down a bit.

You are welcome to join us!

How to contribute