Source Code of Python and Julia Compared

How much effort is required to reproduce Julia functionality, such as high-performance Just-in-Time compilation?

Jun 27, 2021

I was curious about how large the source repositories of different programming languages are. In particular, I wanted to know how much effort it has taken to reproduce some of Julia's JIT functionality in Python through projects such as Numba and PyPy.

I cloned the repos without full history like this:

❯ git clone --depth 1 https://github.com/JuliaLang/julia.git
❯ git clone --depth 1 https://github.com/python/cpython.git
❯ git clone --depth 1 https://github.com/numba/numba.git
❯ git clone --depth 1 https://github.com/mozillazg/pypy.git
❯ git clone --depth 1 https://github.com/python/mypy.git

You need a tool to count the number of lines. I use cloc:

❯ brew install cloc

Getting the Data

Let us look at the details before I summarize the findings. I have edited the output to remove languages with very few lines of code.
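The trimming described above can also be scripted. Below is a minimal sketch, assuming cloc's `--json` flag and its output layout (one entry per language with `nFiles` and `code` counts, plus `header` and `SUM` entries). The helper names `run_cloc` and `significant_languages` and the 5,000-line cutoff are hypothetical, chosen only for illustration.

```python
import json
import subprocess


def run_cloc(path):
    """Run cloc on a directory and return its parsed JSON report.

    Assumes cloc is installed and on PATH (e.g. via `brew install cloc`).
    """
    result = subprocess.run(
        ["cloc", "--json", "--quiet", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def significant_languages(report, min_code=5000):
    """Return (language, files, code) rows with at least min_code lines of code.

    The 'header' and 'SUM' entries are metadata rather than languages,
    so they are skipped. min_code is an arbitrary, hypothetical cutoff.
    """
    rows = [
        (lang, stats["nFiles"], stats["code"])
        for lang, stats in report.items()
        if lang not in ("header", "SUM") and stats["code"] >= min_code
    ]
    # Largest languages first, as in cloc's own table output.
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

For example, `significant_languages(run_cloc("julia"))` returns the rows for the Julia checkout, largest language first, with the small languages filtered out.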

Julia

❯ cloc julia
--------------------------------
Language       files        code
--------------------------------
Julia            765      250611
C                 86       53281
Markdown         139       34755
C++               42       31119
C/C++ Header      63       12421
Scheme            11        7991
--------------------------------
SUM:            1553      414812
--------------------------------

CPython

The standard implementation of Python, written in C.

❯ cloc cpython
-----------------------------------------
Language             files           code
-----------------------------------------
Python                1977         589058
C                      326         363973
C/C++ Header           424         184327
reStructuredText       746         116010
-----------------------------------------
SUM:                  3941        1355086
-----------------------------------------

Numba

An LLVM-based JIT compiler that lets you decorate Python functions to JIT compile them. It is mainly intended for numerical work, not general-purpose code.
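To illustrate the decorator approach, here is a minimal sketch of a numerical loop marked for compilation with Numba's `njit` decorator. The fallback to a no-op decorator when Numba is not installed is my own addition so the snippet runs anywhere; without Numba the function simply runs as ordinary interpreted Python.

```python
try:
    # numba.njit compiles the function to machine code via LLVM
    # the first time it is called.
    from numba import njit
except ImportError:
    # Illustrative fallback only: a no-op decorator, so the same
    # function runs as plain interpreted Python when Numba is absent.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in range(n): a tight numeric loop, Numba's sweet spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 285
```

The function body is ordinary Python; only the decorator changes how it is executed, which is why Numba works best on self-contained numeric code like this loop.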

❯ cloc numba
-----------------------------------
Language         files         code
-----------------------------------
Python             644       165495
reStructuredText    98        11970
C                   23         8433
C/C++ Header        18         6731
-----------------------------------
SUM:               837       196747
-----------------------------------

PyPy

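A complete Python implementation with a JIT compiler, itself written in Python (more precisely RPython, a restricted subset of Python). No decorators needed.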

❯ cloc pypy
------------------------------------
Language          files         code
------------------------------------
Python             4323      1460473
C                   144        37618
C/C++ Header        187        28981
reStructuredText    197        17405
------------------------------------
SUM:               4962      1575144
------------------------------------

MyPy

Static type checking for Python. MyPy doesn’t actually run Python; it just analyzes Python code that has type annotations.
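A small example of what analysis without running means: the Python interpreter does not enforce the annotations below at runtime, so the call with strings executes without complaint, whereas running `mypy` on this file would report the string arguments as incompatible with the declared `int` parameters.

```python
def add(a: int, b: int) -> int:
    """Annotated to take and return ints."""
    return a + b


# Fine for both CPython and mypy.
print(add(2, 3))  # 5

# CPython ignores the annotations, so this runs and prints "ab".
# mypy, which only reads the source, flags both str arguments
# where int is expected.
print(add("a", "b"))  # ab
```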

❯ cloc mypy
----------------------------------
Language        files         code
----------------------------------
Python           1164       135679
C/C++ Header       28        15459
C++                10         5886
reStructuredText   54         5262
C                  12         2708
----------------------------------
SUM:             1312       167610
----------------------------------

Conclusion

From these overviews we can see the following:

Julia: 250k lines of Julia, about 100k lines of C/C++ and 8k lines of Scheme.
CPython: 590k lines of Python and about 550k lines of C/C++.
Numba: 165k lines of Python and 15k lines of C/C++.
PyPy: 1460k lines of Python and about 67k lines of C/C++.
MyPy: 135k lines of Python and 24k lines of C/C++.
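The C/C++ figures above are the sums of the C, C++ and C/C++ Header rows in each table. The snippet below redoes that arithmetic with the numbers copied from the cloc output and also expresses each project's total relative to Julia's, which is the comparison the rest of this section relies on. The dictionary layout and the `c_cpp_lines` helper are just for illustration.

```python
# Lines of code per language, copied from the (trimmed) cloc tables above.
CLOC = {
    "Julia":   {"Julia": 250611, "C": 53281, "C++": 31119, "C/C++ Header": 12421,
                "Scheme": 7991, "Markdown": 34755},
    "CPython": {"Python": 589058, "C": 363973, "C/C++ Header": 184327,
                "reStructuredText": 116010},
    "Numba":   {"Python": 165495, "C": 8433, "C/C++ Header": 6731,
                "reStructuredText": 11970},
    "PyPy":    {"Python": 1460473, "C": 37618, "C/C++ Header": 28981,
                "reStructuredText": 17405},
    "MyPy":    {"Python": 135679, "C/C++ Header": 15459, "C++": 5886, "C": 2708,
                "reStructuredText": 5262},
}

# cloc's SUM rows, which also include the small languages trimmed from the tables.
TOTALS = {"Julia": 414812, "CPython": 1355086, "Numba": 196747,
          "PyPy": 1575144, "MyPy": 167610}

C_FAMILY = ("C", "C++", "C/C++ Header")


def c_cpp_lines(counts):
    """Total C, C++ and C/C++ header lines in one project's table."""
    return sum(counts.get(lang, 0) for lang in C_FAMILY)


for project, counts in CLOC.items():
    ratio = TOTALS[project] / TOTALS["Julia"]
    print(f"{project:<8} C/C++: {c_cpp_lines(counts):>7}  "
          f"total vs Julia: {ratio:.2f}x")
```

Running it gives 96,821 C/C++ lines for Julia and 548,300 for CPython, and shows PyPy's total at roughly 3.8 times Julia's and Numba's at a little under half.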

One of my intentions in looking at this was to show that reproducing what Julia gives you in Python is no easy feat. For example, PyPy tries to bring Just-in-Time compilation to Python but ends up being much larger than Julia itself. The comparison may not be entirely fair, since Julia builds on LLVM, whose source code is not part of the Julia repository.

However, we also have Numba, which uses LLVM too and has a much smaller scope than Julia, yet is comparable in size. A proper comparison would require more digging and analysis, since Numba reimplements part of the Python standard library.

What motivated me to make this comparison? I remember that in the past people would often criticize the Julia creators for creating Julia at all. Many insisted that they should instead have put their efforts into making Python faster.

I hope this comparison helps explain why the world isn’t that simple. When you design a language from scratch, you can make design choices that make many optimizations easier to write.

We can see many of the challenges facing efforts such as Numba and PyPy. They have to duplicate a lot of the work already done in CPython. For example, Numba has to reimplement Python standard library functions and make sure they stay in sync with the CPython implementations.

Imagine someone trying to bring Julia-style technology to Python. It would mean yet another duplicated effort and more code to keep in sync. This is not limited to Python in any way. I am currently spending time on Swift, and there too I can sense the weight of history.