Working with and emulating References in Julia
Using Ref, Array and Tuple types for reference like behavior
Like me you may have vaguely known about using references in Julia, but not bothered much about it.
However when using the Flux machine learning library it became apparent to me that I really needed to understand the difference between values and references in Julia.
Without having a firm grasp of this, I made a number of rookie mistakes.
In my case I had a simple function like this:
f(x) = a*x + b
Where I wanted to be able to write other functions that updated the values of the a
and the b
. The reason for this is that in machine learning you are typically adjusting some coefficients such as a
and b
until some function, f(x)
in this case gives a desired output.
Say we begin with this setup
a = 3
b = 10
f(x) = a*x + b
At the top level we can change the variables and it changes the variables used by f
:
julia> a = 1
julia> b = 10
julia> f(1)
11
The Problem
However if we make an update function, we are going to run into problems.
function update!(x, y)
x -= 1
y -= 2
end
This however is not going to work. If you write update!(a, b)
in Julia the values do in principle get passed as references. However x -= 1
is shorthand for x = x - 1
and what that does is compute a new value x - 1
and the assignment operator causes the x
to be bound to this new value.
Binding
Keep in mind Julia works through binding. Variable names are like labels stuck on values. When the function gets called, the values which have the a
and b
labels stuck on them also get the x
and y
labels stuck on them.
However we create two new values through arithmetic, and we move the x
and y
labels from the a
and b
objects and stick them on these new values. That is why this update!
function does not work.
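To see the rebinding in action, here is a minimal sketch. The name rebind_only is made up for this illustration; it has the same body as update! but returns the local values so they can be compared with the caller's variables.

```julia
# Hypothetical helper: same body as update!, plus an explicit return
# so we can inspect what the local labels ended up pointing at.
function rebind_only(x, y)
    x -= 1   # computes x - 1 and moves the local label x to the result
    y -= 2   # same for y; the caller's labels are never touched
    return x, y
end

a = 3
b = 10
new_a, new_b = rebind_only(a, b)

println((a, b))          # the caller's values: (3, 10)
println((new_a, new_b))  # the values the local labels ended up on: (2, 8)
```

The arithmetic happened, but only the labels inside the function moved.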
However when we set a = 1
outside the scope of f
it works because the a
inside f
is the same label as outside. Hence when we move the a
label to the 1 value, that directly affects the f
because it uses exactly the same label in its calculations.
But only because the scope of f
did not create a new a
variable which would have shadowed the a
variable defined outside of f
. Such as in the example below.
a = 3
b = 10
function f(x)
a = 4
b = 8
a*x + b
end
In this case a
and b
with the values 4 and 8 refer to entirely different variables from the ones defined outside with the values 3 and 10.
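A quick way to convince yourself of the shadowing is to call the function and then inspect the outer variables. This is only a sketch; the name f_shadowed is made up to keep it apart from the f used elsewhere.

```julia
a = 3
b = 10

# Hypothetical name, to keep it apart from the article's f.
function f_shadowed(x)
    a = 4   # assignment inside a function creates a new local a
    b = 8   # likewise a new local b
    return a * x + b
end

println(f_shadowed(1))  # uses the locals: 4*1 + 8 = 12
println((a, b))         # the outer variables are unchanged: (3, 10)
```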
Using the Ref Type
Let us look at how working with references is different. Let us look at two variables which are not references first.
julia> a = 4
julia> b = a
julia> b = 3
julia> a
4
What you can see is that although we wrote b = a
the b = 3
statement did not also change a
. a
is still 4. Why is that? Because we stuck the label b
on the value 4. However later, when we write b = 3
we are simply moving the b
label over to the 3 value.
Of course this would not really be any different if this worked by assignment. When variables work by assignment rather than binding then we are making little boxes called a
and b
holding values. In this case b = a
actually means copy the contents of box a
into box b
. That is what happens at assembly level and how we think of C/C++ code. But it is worth being aware that this is not how you should think of Julia code.
However the Julia Ref
type gives us a way around this.
julia> a = Ref(3)
Base.RefValue{Int64}(3)
julia> b = Ref(10)
Base.RefValue{Int64}(10)
However you cannot use Ref
types directly:
julia> a + b
ERROR: MethodError: no method matching +(::Base.RefValue{Int64}, ::Base.RefValue{Int64})
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:529
This is similar to using a pointer in C/C++. The equivalent in C++ would be to write something like this:
int *a = new int(3);
int *b = new int(10);
And like with Julia a pointer cannot be used directly in C++ either. Instead to get the value contained within, or pointed to by a
, you have to write *a
. This is called dereferencing. So in C++ adding a
and b
is done with:
int answer = *a + *b;
In Julia you dereference a variable with []
. So accessing the value of a
is written as a[]
. Maybe this seems like an odd choice for dereferencing operator, but it makes some logical sense. In C++ you could also get the value pointed to by a
, writing a[0]
. That is you treat a
as an array and access the first element. However if there is only one element it seems superfluous to write [0]
, and []
becomes a natural shorthand. So we can add our two values with:
julia> a[] + b[]
13
We will see later that there is a benefit to treating references like arrays with a single element.
Now we can write a working f
and update!
function.
f(x) = a[] * x + b[]
function update!(a, b)
a[] -= 1
b[] -= 2
end
Let us try this out in the REPL environment:
julia> f(1)
13
julia> update!(a, b)
julia> f(1)
10
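For contrast with the earlier b = a experiment on plain integers, here is a small sketch of what happens when two names are bound to the same Ref. The variable names r and s are arbitrary.

```julia
r = Ref(4)
s = r        # s is a second label on the same RefValue object
s[] = 3      # changes the contents of the object; no label is moved

println(r[])      # 3, because r and s share one object
println(r === s)  # true
```

Writing through s[] is visible through r, which is exactly the behavior update! relies on.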
Is Ref builtin to the language?
Like many things integral to how you use Julia, Ref
is not actually built into Julia. It is library defined just like missing
and nothing
. You could easily create a Ref
type yourself. The object returned by Ref(3)
is actually an instance of a struct named RefValue, defined in Base roughly like this:
mutable struct RefValue{T}
x::T
end
This makes it very different from a pointer in C/C++ which is hardwired into the language.
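As a rough illustration of how little is needed, here is a home-made reference type. MyRef is a made-up name and this is a sketch under my own assumptions, not how Base defines Ref; it only adds the two methods that make the empty-bracket syntax work.

```julia
# MyRef is a made-up name; this is a sketch, not how Base defines Ref.
mutable struct MyRef{T}
    x::T
end

# Make the empty-bracket syntax work: c[] reads, c[] = v writes.
Base.getindex(r::MyRef) = r.x
Base.setindex!(r::MyRef, v) = (r.x = v; r)

c = MyRef(3)
c[] -= 1       # lowers to c[] = c[] - 1
println(c[])   # 2
```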
What makes Ref
useful is that a number of functions have been defined to work with Ref
values. The value of using Ref
, nothing
or missing
in Julia is that they are established conventions used by everybody, as you would expect of things defined in the Julia standard library.
Using Arrays
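It turns out there are several ways of getting reference-like behavior in Julia. A single-element array gives behavior similar to using a Ref
type. A single-element tuple also acts like a single value when broadcasting, but tuples are immutable, so their contents cannot be updated in place.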
a = [3]
b = [10]
f(x) = a[] * x + b[]
function update!(a, b)
a[] -= 1
b[] -= 2
end
In fact using single element arrays both the f
and update!
functions can be implemented exactly the same way as before.
However there is a better way of implementing these functions which will mostly work for both types. We have to take a little detour first.
Broadcast
In Julia there is a function called broadcast
which is a generalization of what map
does. map
exists in most modern programming languages. It applies a function to every element of some iterable collection and returns a collection with the results.
julia> map(sqrt, [4, 9, 16])
3-element Array{Float64,1}:
2.0
3.0
4.0
julia> broadcast(sqrt, [4, 9, 16])
3-element Array{Float64,1}:
2.0
3.0
4.0
The behavior is quite similar at first glance. However, broadcast
has more flexibility in that it can handle combinations of scalars (single values) and vectors (multiple values). But first let us look at what map
can do and where it falls short.
julia> map(+, 2, 3)
5
julia> map(+, [2], [3])
1-element Array{Int64,1}:
5
julia> map(+, [2, 1], [3, 2])
2-element Array{Int64,1}:
5
3
julia> map(sqrt, 9)
3.0
So you can see map
can deal sensibly with both scalars and vectors. The problem starts when you mix them.
julia> map(+, [2, 1], 3)
1-element Array{Int64,1}:
5
This is not really what we would want or expect: map stops as soon as the shortest argument runs out, so only the first element gets processed. In this case broadcast
does what you would expect.
julia> broadcast(+, [2, 1], 3)
2-element Array{Int64,1}:
5
4
In fact broadcast
is so useful that Julia offers a shorthand for calling it, using dot (.
) as a prefix to an operation you want to broadcast. So we can rewrite the previous broadcast as:
julia> [2, 1] .+ 3
2-element Array{Int64,1}:
5
4
You can use this with any function. So if I want to apply square root to every element of an array I can use a dot as a suffix.
julia> sqrt.([4, 9, 16])
3-element Array{Float64,1}:
2.0
3.0
4.0
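The same dot syntax works for functions you write yourself. In this sketch scale_and_shift is a made-up scalar function; the dotted call and the explicit broadcast call should give the same result.

```julia
# Hypothetical scalar function; nothing about it is array-aware.
scale_and_shift(v) = 2v + 1

xs = [1, 2, 3]
dotted = scale_and_shift.(xs)              # dot-call syntax
explicit = broadcast(scale_and_shift, xs)  # the call it is shorthand for

println(dotted)              # [3, 5, 7]
println(dotted == explicit)  # true
```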
Using Broadcast with References
Why all this talk about broadcast? Because it gives us a way of dealing with arrays and refs in a unified way.
f(x) = a*x .+ b
With this change we can call f
with different types of arguments transparently. The outputs below correspond to a holding 2 and b holding 10, for example a = [2] and b = [10].
julia> f(1)
1-element Array{Int64,1}:
12
julia> f(2)
1-element Array{Int64,1}:
14
julia> f(3)
1-element Array{Int64,1}:
16
julia> f([1 2 3])
1×3 Array{Int64,2}:
12 14 16
This can also be used to represent functions of the form:
f(x₁, x₂, x₃, ..., xₙ) = a₁x₁ + a₂x₂ + a₃x₃ + ... + aₙxₙ + b
In this case you see that the number of x
values needs to match the number of a
values. Here is an example of doing that.
First we set a₁ = 1
, a₂ = 10
and a₃ = 100
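, as a 1×3 row matrix (note the spaces rather than commas between the elements; with a plain 3-element vector the multiplication in f would fail).
julia> a = [1 10 100]
1×3 Array{Int64,2}:
1 10 100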
Just to make it easier to read the results, we set b
to zero.
julia> b = [0]
1-element Array{Int64,1}:
0
And this is how we simulate calling f
with 3 arguments.
julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> f(x)
1-element Array{Int64,1}:
321
So we called f
with multiple arguments and got one result. This is where matrix multiplication becomes useful. As you may have noticed, in the function definition we use *
instead of .*
because we are doing matrix multiplication and not element-wise multiplication.
So this is how we can call f
twice with 3 arguments in one go.
julia> x = [1 1; 2 1; 3 1]
3×2 Array{Int64,2}:
1 1
2 1
3 1
julia> f(x)
1×2 Array{Int64,2}:
321 111
So you can think of the number of columns in the input as the number of times we call f
, while the number of rows is the number of arguments provided to f
on each call.
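To check the "columns are calls" reading, this sketch compares one batched call against calling f once per column. It assumes a is stored as a 1×3 row matrix so that the matrix product is defined; the names batched and one_by_one are made up.

```julia
a = [1 10 100]      # 1×3 row matrix of coefficients
b = [0]
f(x) = a * x .+ b

x = [1 1; 2 1; 3 1]  # each column is one set of arguments

batched = f(x)                                       # one matrix product
one_by_one = [f(x[:, j])[1] for j in 1:size(x, 2)]   # one call per column

println(vec(batched))   # [321, 111]
println(one_by_one)     # [321, 111]
```

Both routes give the same numbers; the batched version just does it in a single operation.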
What is the point of being able to handle inputs as large matrices?
To speed up calculations it helps to organize data as large matrices and apply simple operations to them. CPUs have special (SIMD) instructions for that kind of calculation, and GPUs are even better at it.
The interesting thing about broadcast is that it also works for the Ref
type. So while this doesn't work:
julia> Ref(3) + Ref(4)
ERROR: MethodError: no method matching +(::Base.RefValue{Int64}, ::Base.RefValue{Int64})
Closest candidates are:
+(::Any, ::Any, ::Any, ::Any...) at operators.jl:529
You can add the values held by a reference using broadcast.
julia> Ref(3) .+ Ref(4)
7
However if we want f
to work with references we cannot use matrix multiplication as that makes no sense for a reference.
a = Ref(2)
b = Ref(1)
f(x) = a .* x .+ b
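A short sketch of how this element-wise version behaves: in a broadcast each Ref is treated as a single value, so the same f should accept both a scalar and a vector.

```julia
a = Ref(2)
b = Ref(1)
f(x) = a .* x .+ b

println(f(3))          # 2*3 + 1 = 7
println(f([1, 2, 3]))  # [3, 5, 7]
```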
When to use broadcast
In a machine learning context the most practical way to handle data which you need a reference to, is to treat the data like arrays. If it is just a single value, then an array with a single element will do.
It allows us to write the update!
function like this.
function update!(a, b)
a .-= 1
b .-= 2
end
It will work regardless of how many elements there are in a
and b
.
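As a small check of that claim for arrays, here the same update! is applied to a three-element a and a one-element b. The starting values are arbitrary.

```julia
function update!(a, b)
    a .-= 1   # subtract 1 from every element, in place
    b .-= 2   # subtract 2 from every element, in place
end

a = [1, 10, 100]
b = [0]

update!(a, b)

println(a)  # [0, 9, 99]
println(b)  # [-2]
```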
One thing worth being aware of is the difference between using the broadcast dot and not using it. Often they seem to give similar results.
julia> [2, 1] + [3, 2]
2-element Array{Int64,1}:
5
3
julia> [2, 1] .+ [3, 2]
2-element Array{Int64,1}:
5
3
But there are some important differences, which matter when implementing update!
. In this example we are not using broadcast.
julia> a = [3]
1-element Array{Int64,1}:
3
julia> b = a
1-element Array{Int64,1}:
3
julia> b += [10]
1-element Array{Int64,1}:
13
julia> a
1-element Array{Int64,1}:
3
Notice that while b
originally pointed to the same array as a
they obviously don't do that at the end. That is because the normal assignment operator =
and derivatives such as +=
and *=
cause a variable name to be bound to a new value. Hence b
gets bound to the result of b + [10]
. Thus b
is no longer bound to the same object that a
is bound to.
What we actually want is a replacement of the contents of the object both a
and b
are bound to. That is what the broadcast version of our assignment operator gives us.
julia> a = [3]
1-element Array{Int64,1}:
3
julia> b = a
1-element Array{Int64,1}:
3
julia> b .+= [10]
1-element Array{Int64,1}:
13
julia> a
1-element Array{Int64,1}:
13
Thus one way to think of broadcast assignment is that it is a way of manipulating the contents of a box rather than the box itself.
Keeping track of values associated with references
Imagine you have a value associated with your a
and b
. If these were value types it would be hard to keep track of associated values.
a = 3
b = 4
dict = Dict(a => "foo", b => "bar")
Consider this failed attempt at storing values "foo"
and "bar"
associated with the variables a
and b
. We want to look up the value for a
and do:
julia> dict[a]
"foo"
So far, so good, but this approach hits the wall as soon as we change the value of a
.
julia> a = 6
6
julia> dict[a]
ERROR: KeyError: key 6 not found
However if we use reference style values and an IdDict
we avoid this problem.
julia> a = [3];
julia> b = [4];
julia> c = a;
julia> dict = IdDict(a => "foo", b => "bar")
IdDict{Array{Int64,1},String} with 2 entries:
[4] => "bar"
[3] => "foo"
julia> a[] = 8
8
julia> dict[a]
"foo"
Despite changing the contents of a
we still get the correct lookup. That is because IdDict
is based on object identity, which we compare with ===
.
julia> a === b
false
julia> a === c
true
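It may help to contrast identity with plain equality. In this sketch p, q and r are arbitrary names: q has the same contents as p but is a separate object, so an IdDict does not accept it as the same key.

```julia
p = [3]
q = [3]   # equal contents, but a separate array object
r = p     # another label on the same object as p

println(p == q)    # true: same contents
println(p === q)   # false: different objects
println(p === r)   # true: same object

d = IdDict(p => "foo")
println(haskey(d, r))  # true
println(haskey(d, q))  # false: an equal-looking array is still a different key
```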
In a machine learning library like Flux, IdDict
is used to keep track of gradients calculated for different parameters of a function. Say I have the parameters a
and b
used in a function and I have stored the gradient of this function in gs
. Then I can get the derivative with respect to a
with gs[a]
and the one for b
with gs[b]
.
This makes it easy to look up related values, compared to, say, having to remember some numerical order.
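Putting the pieces together, here is a rough sketch of that lookup pattern. It does not use Flux at all: the gradient values in gs and the learning_rate are made up, and the loop is just one simple way an update step could use them.

```julia
a = [2.0]
b = [1.0]

# Made-up gradient values, standing in for what a library would compute.
gs = IdDict(a => [0.5], b => [-0.1])

learning_rate = 0.1   # hypothetical step size
for p in (a, b)
    p .-= learning_rate .* gs[p]   # update in place, so p stays a valid key
end

println(a)   # roughly [1.95]
println(b)   # roughly [1.01]
```

Because the update uses broadcast assignment, a and b remain the same objects, so gs[a] and gs[b] can still be looked up afterwards.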