Flux vs TensorFlow Misconceptions
Don't get what the big deal is about the Flux machine learning library? Let us clear up some common misconceptions.
TensorFlow is the 800-pound gorilla of machine learning that almost everybody in the field has heard of and has some familiarity with. But there is a tiny upstart called Flux which is kicking ass and taking names, causing it to grab some attention. Yet it is still a very misunderstood machine learning library.
Emmett Boudreau, a fellow Julia fan here on Medium, has written a number of articles on the topic. Reading the comments on one of his latest, Is Flux Better Than TensorFlow?, made me realize that people really don't grasp what the big deal about Flux is.
If you live and breathe Julia, there are a lot of things you take for granted. So here are some great questions which deserve answers.
How is Julia Performance Relevant for Flux?
William Heymann asks:
Since tensorflow compiles to a graph and runs optimized code on a cpu, tpu or gpu. Does the language speed matter? From what I have seen tensorflow is the same performance in c++ and python
This is true. Once you have a graph set up, TensorFlow will run fast. There isn't much Julia and Flux can do to one-up TensorFlow there. But there are a number of assumptions baked into the framing of this question.
TensorFlow is based on having graph nodes (ops) pre-made in high performance C++. The Python interface is really just used to shuffle around and organize these pre-made nodes into a graph, which is then executed.
If you need a particular kind of functionality which does not exist as a node already, then you have a couple of choices:
- Try to mimic the needed functionality by composing existing nodes in Python.
- Write your own custom node (op) in C++, and do all the busy work of registering it with the TensorFlow system to make it available from Python.
Contrast this with Flux, where you don't assemble a graph of pre-made nodes. You basically just write regular Julia code, with the whole Julia language at your disposal. You can use if-statements and for-loops, call any number of other functions, and use custom data types. Almost anything is possible.
Then you specify which parts of this regular Julia function represent parameters that you want to tweak. You hand the function to Flux and tell it to train, feeding the function inputs and checking its outputs. Flux then uses your chosen training algorithm to update the parameters until your function produces the desired output (super simplified).
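As a rough sketch of what this looks like in practice (a minimal example in the classic Flux style; the names `predict` and `loss` are just illustrative):

```julia
using Flux

# Plain Julia values that we will designate as trainable parameters.
W = rand(2, 3)
b = rand(2)

# Ordinary Julia functions -- no graph building involved.
predict(x) = W*x .+ b
loss(x, y) = sum((predict(x) .- y).^2)

x, y = rand(3), rand(2)

# Tell Flux which values are parameters, then differentiate
# the loss with respect to them.
θ = Flux.params(W, b)
gs = gradient(() -> loss(x, y), θ)
gs[W]  # gradient of the loss with respect to W
```

From here a training loop is just repeatedly computing gradients and nudging `W` and `b` in the opposite direction.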
Of course, this Julia function could be made up of a bunch of classic sigmoid functions arranged into a conventional neural network. You can add some convolutions and so on. But that is just one option. The function you hand to Flux can be anything. It could be a whole ray-tracer if you wanted, where training tweaks some parameters of the ray-tracing algorithm to give the desired output.
The function could be some advanced scientific model you have already built. No problem, as long as you can tell Flux what the parameters are.
Basically, the graph Flux deals with is the abstract syntax tree of regular Julia code. You don't have to build up a graph manually, with all sorts of restrictions, like in TensorFlow.
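To make that concrete, here is a sketch: an ordinary Julia function containing a loop and a branch, differentiated directly with the `gradient` function Flux provides (via Zygote).

```julia
using Flux  # brings `gradient` into scope

function f(x)
    s = zero(x)
    for i in 1:3        # ordinary control flow, no graph ops
        s += i * x
    end
    return x > 0 ? s : -s
end

gradient(f, 2.0)  # for x > 0, f(x) = 6x, so the gradient is (6.0,)
```

No node types, no graph assembly: the loop and the if-expression are just Julia, and Flux differentiates right through them.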
Say you have some scientific model in Python which you want to tweak through machine learning: you cannot just feed it as-is to TensorFlow. You would have to go over the code and figure out a way to replace everything it does by building a complex graph in TensorFlow, using only the nodes TensorFlow supplies. You don't have the full power of Python available.
Let us expand on this by answering a question from Gershom Agim:
You mention Julia’s a faster language but show no examples of where/how that comes into play considering TF, numpy and the whole ML ecosystem are just monolithic wrappers around c++, fortran and cuda libraries
Emmett Boudreau does indeed state that Julia is faster than Python, and that may not seem to matter, as TensorFlow is just running C++ code anyway.
But as I elaborated above, this two-language split costs you a lot of flexibility. Every problem must be expressed in terms of nodes assembled into a graph. If some function does not exist, you have to find a way to build it from existing nodes, or go through all the hassle of writing it in C++.
Manually describing the solution to a machine learning problem by assembling a graph is not natural. Imagine writing regular code like that. Instead of writing, say:
a = 3
b = 4
c = a + b - 10
You would have to write something like this:
a = VariableNode("a")
b = VariableNode("b")
c = AddNode(a, SubNode(b, Constant(10)))
execute_graph(c, ["a" => 3, "b" => 4])
The latter is a TensorFlow-inspired way of approaching the problem; the former is how you write it in Julia. You just write regular code. Julia is very much like LISP under the hood: Julia code can easily be manipulated as data, so there is no need to invent a whole separate abstract syntax tree for machine learning purposes. You just use the Julia language itself.
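To see how much ceremony the graph style really involves, here is a toy sketch of such a graph interpreter written in Julia. The node types mirror the pseudocode above; they are purely hypothetical and not any real TensorFlow API.

```julia
# A toy "build a graph, then execute it" API, TensorFlow-style.
abstract type Node end
struct Constant     <: Node; value::Float64 end
struct VariableNode <: Node; name::String end
struct AddNode      <: Node; a::Node; b::Node end
struct SubNode      <: Node; a::Node; b::Node end

# Walking the graph to evaluate it -- work the language
# normally does for you.
execute_graph(n::Constant, env)     = n.value
execute_graph(n::VariableNode, env) = env[n.name]
execute_graph(n::AddNode, env) = execute_graph(n.a, env) + execute_graph(n.b, env)
execute_graph(n::SubNode, env) = execute_graph(n.a, env) - execute_graph(n.b, env)

c = AddNode(VariableNode("a"), SubNode(VariableNode("b"), Constant(10.0)))
execute_graph(c, Dict("a" => 3.0, "b" => 4.0))  # → -3.0
```

Four struct definitions and an interpreter, just to compute `a + b - 10`. Every new operation means a new node type and a new `execute_graph` method, which is exactly the busy work Flux lets you skip.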
Because Julia is a high-performance language, you can run your machine learning algorithms straight on pure Julia code. You don't need a bunch of C++ code snippets glued together into a graph arranged by Python code.
Flux Looks Small and Incomplete
On Twitter, Timothy Lau asked me this question:
I like flux but last I checked it seemed like it was stuck trying to catch up to the features and newer methods that keep getting added to pytorch and tensorflow.
I had to ask some follow-up questions to clarify what he meant. One of his examples was the list of activation functions. You can see the list of activation functions for TensorFlow here. These are functions such as softmax. In Flux, at the time, the list was very short, as were the lists of many other kinds of functions.
This made Timothy Lau conclude that Flux was incomplete and not ready for prime time: it lacked so many of the functions TensorFlow has.
The problem is that this is not an apples-to-apples comparison. If you look at the current list of activation functions in Flux, you will notice that they are not actually from Flux at all, but from another library called NNlib.
And this is where things get interesting: NNlib is a generic library containing activation functions. It was not made specifically for Flux. Here is how the relu function is defined in NNlib:
relu(x) = max(zero(x), x)
There is nothing Flux-specific about this code. It is just plain Julia code; in fact, it is so trivial you could have written it yourself. This is in significant contrast to TensorFlow activation functions, which must be part of the TensorFlow library. That is because they are nodes (or ops, as TF calls them) written in C++, which must adhere to a specific interface that TensorFlow expects. Otherwise these activation functions cannot be used in a TensorFlow graph.
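Because relu is just a generic Julia function, it works on any numeric type, and broadcasting extends it to arrays for free. A quick sketch:

```julia
relu(x) = max(zero(x), x)  # same one-liner as in NNlib

relu(-2.5)               # → 0.0
relu(3)                  # works on integers too → 3
relu.([-1.0, 0.0, 2.0])  # broadcast over an array → [0.0, 0.0, 2.0]
```

The `zero(x)` trick is what makes it generic: it produces the zero of whatever type `x` has, so the same definition serves Float32, Float64, integers, and even GPU array elements.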
Likewise, PyTorch activation functions have to be reimplemented to fit the interfaces PyTorch expects. The net effect is that in the Python world, one often ends up with massive libraries, because the same things get reimplemented over and over again.
In the Julia world, by contrast, the same activation functions can be implemented once, in one tiny library such as NNlib, and reused in any number of Julia machine learning libraries, including Flux. The net effect is that Julia libraries tend to be very small. They don't have to be big, because a lot of the functionality you need for any given workflow comes from other Julia libraries. Julia libraries are extremely composable: you can replicate what Python does with one massive library by combining a bunch of small, well-defined ones.
For instance, running on a GPU, using preallocated arrays, or running on a tensor processing unit are all handled by entirely separate libraries. None of this is built into Flux. These libraries aren't even made specifically to give that functionality to Flux; they are generic libraries.
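For example, moving computation to the GPU is just a matter of handing your plain functions GPU arrays from a separate, generic package. A sketch, assuming the CUDA.jl package and an NVIDIA GPU:

```julia
using CUDA  # separate, generic GPU package -- not part of Flux

W = cu(rand(Float32, 2, 3))  # copy the data to the GPU
x = cu(rand(Float32, 3))

predict(x) = W * x  # unchanged Julia code; now runs on the GPU
predict(x)          # returns a GPU array
```

Nothing in `predict` mentions the GPU; the generic code simply follows the data.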
Thus you cannot compare Flux alone to TensorFlow. You have to compare TensorFlow to basically the whole Julia ecosystem of packages. This may give you some sense of why Julia is rapidly gaining such enthusiastic adherents: highly composable micro-libraries are extremely powerful.