“Algebra” roughly translates to “relationships” and linear means “line-like,” thus “Linear Algebra” means “line-like relationships”.
So why does algebra from school seem to be just about letters rather than numbers.
In my head algebra is just about arranging and rearranging letters according to specific rules. However here is a way of looking at it.
Consider the equation below. You don’t know what value of the
x and the
(x + y)² = x² + 2xy + y²
However we can still workout the relationship between
(x + y)² and
x² + 2xy + y². We know that both expressions are equal, and we know the relation between the value of
y and each side of the equation. We could write this as a function:
f(x, y) = (x + y)²
Linear relations are relation similar to what you can describe on a line. Linear relations are predictable. You double the input and you get double the output.
This is not a linear relation
f(x) = x²
If I double the input, the output is quadrupled. This is an example of a linear relation:
f(x) = 4x
However this is not a linear relation, even if you make think so.
f(x) = 2x + 4
f(1) is 6, while
f(2) is 8. We doubled the input but did not get double the output.
We can make it linear by adding another parameter.
f(x, y) = 2x + 4y
If we double the inputs, we will now be doubling the output as well.
f(2x, 2y) = 2*2x + 2*4y
2(2x + 4y) = 2f(x, y)
Thus we can make linear functions with some flexibility by just adding a number of arguments.
The Matrix and Vector Connection
A problem with these functions is when they have a lot of arguments it becomes repetitive and a lot to write.
That is why matrices and vectors are used. We can express the previous function
f as dot product.
[2, 4] ⋅ [x, y] = 2x + 4y
f = [2, 4]
f ⋅ [x, y] = 2x + 4y
Thus you can think of the
f vector as the operation being done on the input vector
But where does matrices get into the picture? The problem with the trusted old function notation is that you get writing cramps spelling it all out if you are working with a lot of data.
Matrices are basically the spreadsheet app of the 1700s. They needed a compact way to work with lots of numbers. Say you want to apply the function
f to a lot of numbers. We could write:
f(3, 4) = 2*3 + 4*4 = 22
f(2, 6) = 2*3 + 4*4 = 28
f(12, 1) = 2*12 + 4*1 = 28
Instead of this we could treat
f as a row vector and the input as a matrix. Each column in the matrix represents a collection of input parameters.
In this case we will use syntax from the Julia language, where
[x y z] represents a row vector and
[x, y, z] represents a column vector. Using Julia syntax I can present the previous calculations in a more compact form:
f = [2 4]
X = [3 2 12; 4 6 1]
f * X = [2*3 + 4*4 2*3 + 4*4 2*12 + 4*1]
f * X = [22 28 28]
Benefits of a Matrix
This also provides a hint to why the function or operation we perform on the input should not be a vector but a matrix. What happens if you want to perform multiple operations on the same input?
Say we want to calculate total cost of buying some items,
z. Say x costs 150 dollars, y costs 50 and z costs 300. We could express this as:
costx(x) = 150x
costy(y) = 50y
costz(z) = 300z
And we also need to pay taxes with the rates 20%, 10% and 30% respectively. We can express this as:
taxx(x) = 1.2x
taxy(y) = 1.1y
taxz(z) = 1.3z
And then to add it all up we use
sum defined as:
sum(x, y, z) = taxx(costx(x)) + taxy(costy(y)) + taxz(costz(z))
As you can probably see this quickly becomes cumbersome and unwieldy. In particular if we are not dealing with 3 items but with 100 different items. Now imagine we have to do this calculation for 50 different shipments of these same 100 different products.
There is a way around this. To get the gist of the idea, let us play with some alternative ways of defining our functions:
cost(x, y, z) = 150x + 50y + 300z
tax(x, y, z) = 1.2x + 1.1y + 1.3z
Now if you want just the cost of the
x items, you simply write
cost(x, 0, 0). If you want the cost of
y items, that is
cost(0, y, 0). We can utilize this in our
sum(x, y, z) = tax(cost(x, 0, 0), 0, 0) +
tax(0, cost(0, y, 0), 0) +
tax(0, 0, cost(0, 0, z))
We can express this more elegantly using a matrix. The idea here is that every row in the matrix is an operation you are performing on the input which gives one result. I can then define
cost as matrix instead:
cost = [150 0 0; 0 50 0; 0 0 300]
cost * [x, y, z] = [150x, 50x, 300z]
You can see that
cost can be used to produce 3 different results, one for x, y and z. Let us define
tax the same way and see how we can add up everything using matrices.
tax = [1.2 0 0; 0 1.1 0; 0 0 1.3]
tax * cost * [x, y, z] = [1.2 * 150 * x, 1.1*50*y, 1.3*300*z]
tax * cost * [x, y, z] = [180x, 55y, 390z]
Let us look at how this would look in the Julia programming language using the bundled REPL (read-evaluate-print-loop) environment.
julia> cost = [150 0 0; 0 50 0; 0 0 300]
150 0 0
0 50 0
0 0 300
julia> tax = [1.2 0 0; 0 1.1 0; 0 0 1.3]
1.2 0.0 0.0
0.0 1.1 0.0
0.0 0.0 1.3
We order items in two different batches. First we order one item of each and in the second batch we order 3, 2 and 1 item respectively.
julia> amounts = [1 3;1 2; 1 1]
This gives us the following prices for each item for each batch
julia> tax * cost * amounts
The next thing we need is a matrix that will add up the results. In this case we only one single result for each batch. That means there is only one row in this matrix. Each row in a matrix must be seen as an operation you want to perform on the input.
julia> sum = [1 1 1]
1 1 1
julia> sum * tax * cost * amounts
We can verify the correctness of this result:
julia> 180 + 55 + 390
The beauty of this approach is that we are able to combine multiple functions calculating cost, tax etc through matrix multiplication. This allows us a compact representation of the operations we perform. The functions defined for scalar operations became clunky to use as the number of items and operations grow.
Examples of Useful Matrix Operations
Getting used to thinking of a matrix as a set of operations performed on some input that which may also be a matrix takes some time getting used to. But here are some examples to hammer home the point.
julia> inputs = [3, 2, 1]
This will pick just the
julia> picky = [0 1 0]
0 1 0
julia> picky * inputs
Basically for each row, you put a 1 for the value you want to pick and 0 for the ones you don’t want. This way we can define a matrix that changes the order of elements, by first picking
y and finally
julia> swapper = [0 0 1; 0 1 0; 1 0 0]
0 0 1
0 1 0
1 0 0
julia> swapper * inputs
Or how about doubling every input element:
julia> doubler = [2 0 0; 0 2 0; 0 0 2]
2 0 0
0 2 0
0 0 2
julia> doubler * inputs
Thus far we have only represented functions of the form:
f(x) = ax
But we want the flexibility of representing functions on the form:
f(x) = ax + b
b is some constant. Turns out this is quite easy to do. You simply require all inputs to have a last element which always has the value 1. We can then represent
f like this:
f = [a b]
f * [x, 1] = ax + b
This works for any number of arguments.
f = [a b c d]
f * [x, y, z, 1] = ax + by + cz + d
With this background it is perhaps easier to understand how affine transformations used in computer graphics to translate, rotate and scale objects work.
If you are translating a point by
(dx, dy) in 2D, then you would use the following matrix:
1 0 dx
0 1 dy
0 0 1
Why does this work? You think of each row as an operation. The first operation produces the new x-coordinate and the second operation produce the y-coordinate.
We can look at it as two functions
y coordinates as inputs and producing new coordinate
x' = f(x, y, 1)
y' = g(x, y, 1)
f has to do is to pick the x-coordinate, because it is supposed to produce the first coordinate. Picking
x is done with
1 0 0.
g representing the second row picks
0 1 0.
Remember the trick with constants. By having
z = 1 always, we can use the third argument to produce offsets. So we define
1 0 dx to pick
x and add an offset
dx to it.
But why do we need the third row?
Notice how we need 3 columns to be able to offer an offset such as
dy. For that to work the input needs to have 3 rows where the last row is 1.
Remember in matrix multiplication the columns in the first matrix has to match the rows in the second argument. If this is not obvious to you, read my story on why matrix multiplication works the way it does.
Thus if our operation matrix did not have a third row, we would get the problem that a two-row column vector would be output. Remember each row in the affine matrix will produce a component. With only two rows, you would get just a two component output.
That means your output could not be multiplied with another affine matrix. Nor could you multiply two affine matrices with each other. You can multiply a 3x3 matrix with a 3x3 matrix because the rows and columns match each other.
However you cannot multiply a 2x3 matrix with a 2x3 matrix. Thus there would have been no way of combining multiple matrices into one matrix. Say you want to do a scaling, rotation and translation. You can multiple all these three matrices into on matrix. They you can reuse this matrix, multiplying it with a million points if you want to.