# Working with and emulating References in Julia

## Using Ref, Array and Tuple types for reference-like behavior

Like me, you may have vaguely known about references in Julia but not bothered much with them.

However, when I started using the Flux machine learning library, it became apparent to me that I really needed to understand the difference between values and references in Julia.

Without having a firm grasp of this, I made a number of rookie mistakes.

In my case I had a simple function like this:

`f(x) = a*x + b`

I wanted to be able to write other functions that update the values of `a` and `b`. The reason is that in machine learning you typically adjust coefficients such as `a` and `b` until some function, `f(x)` in this case, gives a desired output.

Say we begin with this setup:

```julia
a = 3
b = 10
f(x) = a*x + b
```

At the top level we can change the variables and it changes the variables used by `f`:

```julia
julia> a = 1
julia> b = 10
julia> f(1)
11
```

# The Problem

However if we make an update function, we are going to run into problems.

```julia
function update!(x, y)
   x -= 1
   y -= 2
end
```

This is not going to work. If you write `update!(a, b)` in Julia, the values are in principle passed by reference: no copy is made. However, `x -= 1` is shorthand for `x = x - 1`, which computes a new value `x - 1`, and the assignment then binds the local name `x` to this new value.

## Binding

Keep in mind that Julia works through binding. Variable names are like labels stuck on values. When the function gets called, the values that have the `a` and `b` labels stuck on them also get an `x` and a `y` label stuck on them.

However we create two new values through arithmetic, and we move the `x` and `y` labels from the `a` and `b` objects and stick them on these new values. That is why this `update!` function does not work.
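The label picture can be checked with a small sketch. The function name `rebind` and the variable `result` are made up for this illustration and are not part of the original example.

```julia
# `rebind` is a hypothetical helper, used only to illustrate binding.
function rebind(x)
    x -= 1      # shorthand for x = x - 1: the local label x moves to a new value
    return x
end

a = 3
result = rebind(a)

println(a)       # the label a still sits on the original value
println(result)  # the new value is only reachable through the return value
```

The only way the new value leaves the function here is through `return`, which is why rebinding inside `update!` has no visible effect on the caller.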

However, when we set `a = 1` outside the scope of `f`, it works because the `a` inside `f` is the same label as the one outside. Hence when we move the `a` label to the value 1, that directly affects `f`, because `f` uses exactly the same label in its calculations.

This only works because the scope of `f` did not create a new `a` variable, which would have shadowed the `a` defined outside of `f`, as in the example below.

```julia
a = 3
b = 10

function f(x)
   a = 4
   b = 8
   a*x + b
end
```

In this case the `a` and `b` with the values 4 and 8 are entirely different variables from the ones defined outside with the values 3 and 10.
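Here is a minimal sketch contrasting the two cases. The names `f_shadowed`, `f_global` and `set_a!` are hypothetical, and the `global` keyword is brought in as one way to rebind the outer label from inside a function; the rest of this article uses references instead.

```julia
a = 3
b = 10

# Assigning to a and b inside the function creates new local variables.
function f_shadowed(x)
    a = 4
    b = 8
    return a * x + b
end

# Only reading a and b: the function uses the global labels.
f_global(x) = a * x + b

# `global` tells Julia that the assignment targets the outer label.
function set_a!(value)
    global a = value
    return nothing
end

shadowed = f_shadowed(1)   # uses the local 4 and 8
before = f_global(1)       # uses the global 3 and 10
set_a!(1)
after = f_global(1)        # sees the rebound global a
```

Calling `f_shadowed` leaves the outer `a` and `b` alone, while `set_a!` changes what `f_global` computes.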

# Using the Ref Type

Let us look at how working with references is different, starting with two variables which are not references.

```julia
julia> a = 4
julia> b = a
julia> b = 3
julia> a
4
```

What you can see is that although we wrote `b = a`, the `b = 3` statement did not also change `a`. `a` is still 4. Why is that? Because we stuck the label `b` on the value 4. Later, when we write `b = 3`, we are simply moving the `b` label over to the value 3.

Of course this would not really be any different if this worked by assignment. When variables work by assignment rather than binding, we are making little boxes called `a` and `b` that hold values. In this case `b = a` actually means copy the contents of box `a` into box `b`. That is what happens at assembly level and how we think of C/C++ code. But be aware that this is not how you should think of Julia code.

However the Julia `Ref` type gives us a way around this.

```julia
julia> a = Ref(3)
Base.RefValue{Int64}(3)

julia> b = Ref(10)
Base.RefValue{Int64}(10)
```

However, you cannot use `Ref` values directly in arithmetic:

```julia
julia> a + b
ERROR: MethodError: no method matching +(::Base.RefValue{Int64}, ::Base.RefValue{Int64})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:529
```

This is similar to using a pointer in C/C++. The equivalent in C++ would be to write something like this:

```cpp
int *a = new int(3);
int *b = new int(10);
```

As in Julia, a pointer cannot be used directly in C++ either. Instead, to get the value contained within, or pointed to by `a`, you have to write `*a`. This is called dereferencing. So in C++ adding `a` and `b` is done with:

`int answer = *a + *b;`

In Julia you dereference a variable with `[]`. So accessing the value of `a` is written as `a[]`. Maybe this seems like an odd choice of dereferencing operator, but it makes some logical sense. In C++ you could also get the value pointed to by `a` by writing `a[0]`. That is, you treat `a` as an array and access the first element. However, if there is only one element it seems superfluous to write `0`, and `[]` becomes a natural shorthand. So we can add our two values with:

```julia
julia> a[] + b[]
13
```

We will see later that there is a benefit to treating references like arrays with a single element.

Now we can write a working `f` and `update!` function.

```julia
f(x) = a[] * x + b[]

function update!(a, b)
   a[] -= 1
   b[] -= 2
end
```

Let us try this out in the REPL environment:

```julia
julia> f(1)
13
julia> update!(a, b)
julia> f(1)
10
```

## Is Ref built into the language?

Like many things integral to how you use Julia, `Ref` is not special syntax built into the language. It is defined in the library, just like `missing` and `nothing`, and you could easily create such a type yourself. What `Ref(3)` returns is an instance of a struct named `RefValue`, which is essentially:

```julia
mutable struct RefValue{T}
  x::T
end
```

This makes it very different from a pointer in C/C++ which is hardwired into the language.

What makes `Ref` useful is that a number of functions have been defined to work with `Ref` values. The value of using `Ref`, `nothing` or `missing` in Julia is that they are established conventions used by everybody, as should be expected since they are defined in the Julia standard library.
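To make the "you could create it yourself" point concrete, here is a minimal sketch of a hand-rolled reference type. The name `Box` is hypothetical, and this is not how `Base` defines `RefValue` in full; it only shows that the `[]` syntax comes from ordinary `getindex` and `setindex!` methods.

```julia
# `Box` is a hypothetical name; it mimics the idea behind Base.RefValue.
mutable struct Box{T}
    x::T
end

# Methods without index arguments are what make `box[]` work.
Base.getindex(box::Box) = box.x
Base.setindex!(box::Box, value) = (box.x = value)

counter = Box(3)
counter[] -= 1     # lowered to setindex!(counter, counter[] - 1)

println(counter[])
```

In practice you would use `Ref` rather than a custom type, precisely because other code already knows how to deal with it.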

# Using Arrays

It turns out there are several ways of getting reference-like behavior in Julia. A single-element array behaves much like a `Ref`. A single-element tuple can stand in for one when broadcasting, but tuples are immutable, so their contents cannot be updated in place.

```julia
a = [2]
b = [10]
f(x) = a[] * x + b[]

function update!(a, b)
   a[] -= 1
   b[] -= 2
end
```

In fact, using single-element arrays, both the `f` and `update!` functions can be implemented exactly the same way as before.

However, there is a better way of implementing these functions, which will mostly work for both types. We have to take a little detour first.

In Julia there is a function called `broadcast` which is a generalization of what `map` does. `map` exists in most modern programming languages. It applies a function to every element of some iterable collection and returns a collection with the results.

```julia
julia> map(sqrt, [4, 9, 16])
3-element Array{Float64,1}:
 2.0
 3.0
 4.0

julia> broadcast(sqrt, [4, 9, 16])
3-element Array{Float64,1}:
 2.0
 3.0
 4.0
```

Their behavior is quite similar at first glance. However, `broadcast` has more flexibility in that it can handle combinations of scalars (single values) and vectors (multiple values). But first let us look at what `map` can do and where it falls short.

```julia
julia> map(+, 2, 3)
5

julia> map(+, [2], [3])
1-element Array{Int64,1}:
 5

julia> map(+, [2, 1], [3, 2])
2-element Array{Int64,1}:
 5
 3

julia> map(sqrt, 9)
3.0
```

So you can see `map` can deal sensibly with both scalars and vectors. The problem starts when you mix them.

```julia
julia> map(+, [2, 1], 3)
1-element Array{Int64,1}:
 5
```

This is not really what we would want or expect. In this case `broadcast` does what you would expect.

```julia
julia> broadcast(+, [2, 1], 3)
2-element Array{Int64,1}:
 5
 4
```

In fact `broadcast` is so useful that Julia offers a shorthand for calling it, using dot (`.`) as a prefix to an operation you want to broadcast. So we can rewrite the previous broadcast as:

```julia
julia> [2, 1] .+ 3
2-element Array{Int64,1}:
 5
 4
```

You can use this with any function. So if I want to apply square root to every element of an array I can use a dot as a suffix.

```julia
julia> sqrt.([4, 9, 16])
3-element Array{Float64,1}:
 2.0
 3.0
 4.0
```
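The dot syntax is not limited to built-in functions. As a small sketch, the function `line` below is a hypothetical scalar function written without any thought of arrays; broadcasting lets it mix a vector with plain scalars.

```julia
# `line` is a hypothetical scalar function, used only for illustration.
line(x, slope, intercept) = slope * x + intercept

xs = [1, 2, 3]

# Broadcast over xs while the scalars 2 and 10 are reused for every element.
ys = line.(xs, 2, 10)

# The dot call is shorthand for an explicit broadcast call.
same = broadcast(line, xs, 2, 10)

println(ys)
```

Both forms produce the same vector, so which one you write is mostly a matter of readability.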

Why all this talk about broadcast? Because it gives us a way of dealing with arrays and refs in a unified way.

`f(x) = a*x .+ b`

With this change we can call `f` with different types of arguments transparently.

```julia
julia> f(1)
1-element Array{Int64,1}:
 12

julia> f(2)
1-element Array{Int64,1}:
 14

julia> f(3)
1-element Array{Int64,1}:
 16

julia> f([1 2 3])
1×3 Array{Int64,2}:
 12  14  16
```

This can also be used to represent functions of the form:

`f(x₁, x₂, x₃, ..., xₙ) = a₁x₁ + a₂x₂ + a₃x₃ + ... + aₙxₙ + b`

In this case you see that the number of `x` values needs to match the number of `a` values. Here is an example of doing that.

First we set `a₁ = 1`, `a₂ = 10` and `a₃ = 100`, stored as a 1×3 row matrix so that `a*x` is a valid matrix product. Note the spaces instead of commas.

```julia
julia> a = [1 10 100]
1×3 Array{Int64,2}:
 1  10  100
```

Just to make it easier to read the results, we set `b` to zero.

```julia
julia> b = [0]
1-element Array{Int64,1}:
 0
```

And this is how we simulate calling `f` with 3 arguments.

```julia
julia> x = [1, 2, 3]
3-element Array{Int64,1}:
 1
 2
 3

julia> f(x)
1-element Array{Int64,1}:
 321
```

So we called `f` with multiple arguments and got one result. This is where matrix multiplication becomes useful. As you may have noticed, the function definition uses `*` instead of `.*` because we are doing matrix multiplication, not element-wise multiplication.

So this is how we can call `f` twice with 3 arguments in one go.

```julia
julia> x = [1 1; 2 1; 3 1]
3×2 Array{Int64,2}:
 1  1
 2  1
 3  1

julia> f(x)
1×2 Array{Int64,2}:
 321  111
```

So you can think of the number of columns in the input as the number of times we call `f`, while the number of rows is the number of arguments provided to `f` on each call.
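The column-per-call reading can be checked with a short sketch. It assumes `a` is stored as a 1×3 row so that `a*x` is a valid matrix product; the names `batch`, `together` and `separate` are made up for the illustration.

```julia
a = [1 10 100]   # 1×3 row of coefficients
b = [0]
f(x) = a*x .+ b

batch = [1 1; 2 1; 3 1]   # each column is one set of arguments

# One matrix product handles every column at once.
together = f(batch)

# The same results, computed with one call per column.
separate = [f(batch[:, j])[1] for j in 1:size(batch, 2)]

println(vec(together) == separate)
```

The batched call returns a 1×2 matrix while the per-column version builds a plain vector, so `vec` is used to compare them.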

What is the point of being able to handle inputs as large matrices?

To speed up calculations, it helps to organize data as large matrices and apply simple operations to them. CPUs have special instructions for this kind of calculation, and GPUs on graphics cards are even better at it.

The interesting thing about broadcast is that it also works for the `Ref` type. So while this doesn't work:

```julia
julia> Ref(3) + Ref(4)
ERROR: MethodError: no method matching +(::Base.RefValue{Int64}, ::Base.RefValue{Int64})
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:529
```

the broadcast version does:

```julia
julia> Ref(3) .+ Ref(4)
7
```

However if we want `f` to work with references we cannot use matrix multiplication as that makes no sense for a reference.

```julia
a = Ref(2)
b = Ref(1)
f(x) = a .* x .+ b
```
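A quick sketch of how this definition behaves, using the same `a`, `b` and `f`. The variable names `single`, `many` and `updated` are only there to hold the results.

```julia
a = Ref(2)
b = Ref(1)
f(x) = a .* x .+ b

# Every argument is scalar-like, so the result is a plain number.
single = f(3)

# With a vector input, the refs are reused for each element.
many = f([1, 2, 3])

# Changing the coefficient through the reference changes what f computes.
a[] = 5
updated = f(3)

println((single, many, updated))
```

Because `f` only reads through the references, any code holding `a` or `b` can adjust the coefficients without touching `f` itself.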

In a machine learning context the most practical way to handle data which you need a reference to, is to treat the data like arrays. If it is just a single value, then an array with a single element will do.

It allows us to write the `update!` function like this.

```julia
function update!(a, b)
   a .-= 1
   b .-= 2
end
```

It will work regardless of how many elements there are in `a` and `b`.
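As a small sketch of that claim, the broadcast version of `update!` is applied below to arrays of different lengths. The names `weights`, `bias` and `alias` are hypothetical.

```julia
function update!(a, b)
    a .-= 1   # in-place: modifies the contents of the array a is bound to
    b .-= 2
    return nothing
end

weights = [1, 10, 100]
bias = [0]
alias = weights          # a second label on the same array

update!(weights, bias)

println(weights)
println(bias)
println(alias === weights)   # still the same object after the update
```

Since no new arrays are created, every label that points at `weights` sees the updated contents.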

One thing worth being aware of is the difference between using the broadcast dot and not using it. Often they seem to give similar results.

```julia
julia> [2, 1] + [3, 2]
2-element Array{Int64,1}:
 5
 3

julia> [2, 1] .+ [3, 2]
2-element Array{Int64,1}:
 5
 3
```

But there are some important differences, which matter when implementing `update!`. In this example we are not using broadcast.

```julia
julia> a = [3]
1-element Array{Int64,1}:
 3

julia> b = a
1-element Array{Int64,1}:
 3

julia> b += [10]
1-element Array{Int64,1}:
 13

julia> a
1-element Array{Int64,1}:
 3
```

Notice that while `b` originally pointed to the same array as `a`, they obviously don't do that at the end. That is because the normal assignment operator `=` and derivatives such as `+=` and `*=` cause a variable name to be bound to a new value. Hence `b` gets bound to the result of `b + [10]`. Thus `b` is no longer bound to the same object that `a` is bound to.

What we actually want is to replace the contents of the object both `a` and `b` are bound to. That is what the broadcast version of our assignment operator gives us.

```julia
julia> a = [3]
1-element Array{Int64,1}:
 3

julia> b = a
1-element Array{Int64,1}:
 3

julia> b .+= [10]
1-element Array{Int64,1}:
 13

julia> a
1-element Array{Int64,1}:
 13
```

Thus one way to think of broadcast assignment is that it is a way of manipulating the contents of a box rather than the box itself.

# Keeping track of values associated with references

Imagine you have a value associated with your `a` and `b`. If these were value types it would be hard to keep track of associated values.

```julia
a = 3
b = 4
dict = Dict(a => "foo", b => "bar")
```

Consider this failed attempt at storing the values `"foo"` and `"bar"` associated with the variables `a` and `b`. We want to look up the value for `a` and do:

```julia
julia> dict[a]
"foo"
```

So far, so good, but this approach hits the wall as soon as we change the value of `a`.

```julia
julia> a = 6
6

julia> dict[a]
ERROR: KeyError: key 6 not found
```

However, if we use reference-style values and an `IdDict` we avoid this problem.

```julia
julia> a = [3];

julia> b = [4];

julia> c = a;

julia> dict = IdDict(a => "foo", b => "bar")
IdDict{Array{Int64,1},String} with 2 entries:
  [4] => "bar"
  [3] => "foo"

julia> a[] = 8
8

julia> dict[a]
"foo"
```

Despite changing the contents of `a` we still get a correct lookup. That is because `IdDict` is based on object identity, which we compare with `===`.

```julia
julia> a === b
false

julia> a === c
true
```

In a machine learning library like Flux, `IdDict` is used to keep track of the gradients calculated for the different parameters of a function. Say I have the parameters `a` and `b` used in a function, and the gradient of this function has been returned in `gs`. Then I can get the derivative with respect to `a` with `gs[a]` and the one for `b` with `gs[b]`.

This makes it easy to look up related values, compared to having to remember some numerical order.
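The difference between content-based and identity-based lookup can be sketched without any machine learning library. The variable names below are hypothetical, and the example only uses `Dict` and `IdDict` from Base; it does not try to reproduce how Flux stores gradients.

```julia
a = [3]
b = [3]          # equal contents, but a different object

by_value = Dict(a => "foo")
by_identity = IdDict(a => "foo")

# Dict compares keys by content, so an equal copy also matches.
found_equal_copy = haskey(by_value, b)

# IdDict compares keys by identity, so only the very same object matches.
found_other_object = haskey(by_identity, b)

# Mutating the key in place does not disturb the identity-based lookup.
a[] = 8
still_found = get(by_identity, a, "missing")

println((found_equal_copy, found_other_object, still_found))
```

A plain `Dict` keyed on an array that is later mutated should not be relied on, which is one reason identity-based lookup suits parameters that are updated in place.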

Geek dad, living in Oslo, Norway with passion for UX, Julia programming, science, teaching, reading and writing.