Source Code of Python and Julia Compared
How much effort is required to reproduce Julia functionality, such as high performance Just-in-Time compilation?
I was curious about how large the repositories for different programming languages are. In particular, I wanted to know how much effort it has taken to reproduce some of Julia’s JIT functionality in Python through projects such as Numba and PyPy.
I cloned the repos without full history like this:
❯ git clone --depth 1 https://github.com/JuliaLang/julia.git
❯ git clone --depth 1 https://github.com/python/cpython.git
❯ git clone --depth 1 https://github.com/numba/numba.git
❯ git clone --depth 1 https://github.com/mozillazg/pypy.git
You need a tool to count the number of lines. I use cloc:
❯ brew install cloc
Getting the Data
Let us look at the details before I summarize the findings. I have edited the output to remove languages with very few lines, which is why each SUM row is larger than the visible rows add up to.
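The trimming of small languages could also be scripted. Here is a minimal sketch, assuming the report has the trimmed shape shown in this post (language name, then file count, then code count as the last number on the line). The function name `filter_cloc_rows` and the 10,000-line threshold are my own illustrative choices, not what was actually used to edit the tables, and a language name that ends in a number would confuse this simple pattern.

```python
import re

# A data row is a language name followed by one or more integer columns.
ROW = re.compile(r"^(?P<lang>.+?)\s+(?P<nums>\d+(?:\s+\d+)*)\s*$")


def filter_cloc_rows(report, min_code=10000):
    """Return (language, files, code) rows with at least min_code lines.

    Hypothetical helper: assumes the first number is the file count and
    the last number is the code count. Separator, header, and SUM lines
    are skipped.
    """
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if not match or match.group("lang").startswith("SUM"):
            continue
        nums = [int(n) for n in match.group("nums").split()]
        if len(nums) < 2:
            continue
        files, code = nums[0], nums[-1]
        if code >= min_code:
            rows.append((match.group("lang"), files, code))
    return rows


# A few rows taken from the Julia table in this post.
sample = """\
--------------------------------
Language files code
--------------------------------
Julia 765 250611
C/C++ Header 63 12421
Scheme 11 7991
--------------------------------
SUM: 1553 414812
--------------------------------
"""

kept = filter_cloc_rows(sample)
for lang, files, code in kept:
    print(lang, files, code)
```

With this threshold the Scheme row is dropped while the Julia and header rows are kept.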
Julia
❯ cloc julia
--------------------------------
Language          files     code
--------------------------------
Julia               765   250611
C                    86    53281
Markdown            139    34755
C++                  42    31119
C/C++ Header         63    12421
Scheme               11     7991
--------------------------------
SUM:               1553   414812
--------------------------------
CPython
The standard implementation of Python, written in C.
❯ cloc cpython
--------------------------------
Language          files     code
--------------------------------
Python             1977   589058
C                   326   363973
C/C++ Header        424   184327
reStructuredText    746   116010
--------------------------------
SUM:               3941  1355086
--------------------------------
Numba
An LLVM-based JIT compiler that lets you decorate Python functions to have them JIT compiled. It is intended mainly for numerical work and is not general purpose.
❯ cloc numba
--------------------------------
Language          files     code
--------------------------------
Python              644   165495
reStructuredText     98    11970
C                    23     8433
C/C++ Header         18     6731
--------------------------------
SUM:                837   196747
--------------------------------
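To make the decorator approach concrete, here is a small sketch of what using Numba typically looks like. `njit` is Numba's real decorator; the function `sum_of_squares` is a made-up example, and the fallback branch is only there so the snippet still runs (without any speedup) on a machine where Numba is not installed.

```python
try:
    from numba import njit  # compiles the decorated function with LLVM
except ImportError:
    # Assumption for this sketch: fall back to plain Python if Numba
    # is missing, so the example still runs unchanged.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A tight numerical loop, the kind of code Numba is aimed at.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 0 + 1 + 4 + ... + 81 = 285
```

The function body is ordinary Python; only the decorator changes how it is executed, and compilation happens the first time the function is called.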
PyPy
A full Python implementation with a JIT compiler, itself written in Python (the RPython subset). No decorators needed.
❯ cloc pypy
--------------------------------
Language          files     code
--------------------------------
Python             4323  1460473
C                   144    37618
C/C++ Header        187    28981
reStructuredText    197    17405
--------------------------------
SUM:               4962  1575144
--------------------------------
MyPy
A static type checker for Python. It doesn’t actually run Python code; it only analyzes code that has type annotations.
❯ cloc mypy
--------------------------------
Language          files     code
--------------------------------
Python             1164   135679
C/C++ Header         28    15459
C++                  10     5886
reStructuredText     54     5262
C                    12     2708
--------------------------------
SUM:               1312   167610
--------------------------------
Conclusion
From these overviews we can see the following:
- Julia: 250k lines of Julia, about 100k of C/C++, and 8k of Scheme.
- CPython: 590k lines of Python and about 550k of C/C++.
- Numba: 165k lines of Python and 15k of C/C++.
- PyPy: 1,460k lines of Python and about 67k of C/C++.
- MyPy: 135k lines of Python and 24k of C/C++.
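The rounded figures in this summary come from adding up the C, C++, and C/C++ Header rows of each cloc table. The sketch below redoes that arithmetic with the exact counts from the tables; the names `COUNTS`, `C_FAMILY`, and `c_family_lines` are just labels I picked for illustration.

```python
# Code line counts copied from the cloc tables in this post.
COUNTS = {
    "Julia": {"Julia": 250611, "C": 53281, "C++": 31119,
              "C/C++ Header": 12421, "Scheme": 7991},
    "CPython": {"Python": 589058, "C": 363973, "C/C++ Header": 184327},
    "Numba": {"Python": 165495, "C": 8433, "C/C++ Header": 6731},
    "PyPy": {"Python": 1460473, "C": 37618, "C/C++ Header": 28981},
    "MyPy": {"Python": 135679, "C/C++ Header": 15459, "C++": 5886,
             "C": 2708},
}

C_FAMILY = ("C", "C++", "C/C++ Header")


def c_family_lines(project):
    """Sum the C, C++ and header lines for one project."""
    return sum(COUNTS[project].get(lang, 0) for lang in C_FAMILY)


for project in COUNTS:
    total = c_family_lines(project)
    print(f"{project}: {total} lines of C/C++ (about {round(total / 1000)}k)")

# How many times larger PyPy's Python code is than Julia's Julia code.
pypy_vs_julia = COUNTS["PyPy"]["Python"] / COUNTS["Julia"]["Julia"]
print(f"PyPy Python / Julia Julia: {pypy_vs_julia:.1f}x")
```

Keep in mind these are only the rows that survived my trimming of the cloc output, so they slightly undercount each repository.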
One of my intentions in looking at this was to show that reproducing in Python what you get in Julia is no easy feat. For example, PyPy tries to bring Just-in-Time compilation to Python but ends up much larger than Julia itself: about 1,460k lines of Python against Julia’s 250k lines of Julia. The comparison may not be fair, since Julia relies on LLVM.
However, we also have Numba, which uses LLVM as well but has a much smaller scope than Julia. Yet it is comparable in size: its 165k lines of Python are about two-thirds of Julia’s 250k lines of Julia. A proper comparison would require more digging and analysis, since part of the Python standard library has been reimplemented in Numba.
What motivated me to make this comparison? I remember that in the past people would often criticize the Julia creators for creating Julia. Many insisted that they should instead have put their efforts into making Python faster.
I hope this comparison helps explain why the world isn’t that simple. When you design a language from scratch, you can make design choices that make many optimizations easier to write.
We can see many of the challenges facing efforts such as Numba and PyPy. They have to duplicate a lot of work already done in CPython. For example, Numba has to reimplement Python standard library functions and make sure they stay in sync with the CPython implementations.
Imagine someone trying to bring Julia-style technology to Python. It would mean yet another duplication effort and more code to keep in sync. This is not in any way limited to Python. I am currently spending time with Swift, and there as well I can sense the weight of history.