Yet Another Math Programming Consultant: Julia vs Python

Wednesday, September 20, 2023

Julia vs Python

I gave a talk to economists (i.e., not professional programmers) about a Julia project we were working on. Julia is famous for its speed. It uses LLVM [1] as back-end for its JIT (Just In Time) compilation. As seeing is believing, here is an example algorithm which is used to compare performance between different languages and tools. This example was chosen as it is small, easy to explain, and easy to program while still showing meaningful time differences.

We have a square \([-1,+1]\times[-1,+1]\) and an inscribing circle with radius \(1\). See the dynamic figure below. Their areas are \(4\) and \(\pi\) respectively. The idea is to draw \(n\) points \[\begin{align}& x_i \sim U(-1,+1) \\ & y_i \sim U(-1,+1)\end{align}\]Let \(m\) be the number of points inside the circle, i.e. with \[x_i^2+y_i^2\lt 1\] Obviously, from the ratio of the areas, we have \[\frac{m}{n} \approx \frac{\pi}{4}\] It follows that an estimate of \(\pi\) is \[\hat{\pi}=4\frac{m}{n}\]

Simulation with n=1000

Here, I look at five different implementations of this simulation, in increasing speed:

A straightforward implementation in Python using a for loop. This is very slow.
A GAMS version of this algorithm. Just for the heck of it.
A set-driven GAMS version replaces the inner loop with a sum. This version was suggested by [4].
A Python version that uses the Numba JIT compiler [2]. Numba uses the same LLVM [1] back-end as Julia. This is 10 times as fast. Numba does not support f-strings, so the print statement must be changed. This version was not in my talk. I added it for completeness. And out of curiosity.
A Python implementation using NumPy [3]. Here, we use arrays instead of explicit loops. Of course, NumPy will still use loops behind the scenes, but that is done inside a more high-performance C library implementation. This formulation gives a big performance boost. Lesson: in Python, loops are very expensive. This variant is not very close to our original version. It is, however, significantly faster than the Numba version.
This is the same intuitive implementation as in 1., but now using Julia. This is the fastest of all, by a large margin. The looped version is very fast and it will require some esoteric code to improve on this. [5]

A single simulation uses 1 million points. We repeat this 10 times.

1. Python loop	Output: This is a scalar loop executed in pure Python. It is the slowest version.
2. GAMS scalar loop	Output: A bit faster than the Python loop.
3. Set-driven GAMS formulation	Output: This is twice as fast as the "for i" loop in the previous GAMS fragment.
4. Python/Numba loop	Output: Here we use the Numba JIT compiler to speed up the loop. We achieve a speed-up of about 10x. Nothing to sneeze at.
5. Python/NumPy without loops	Output: Here we use NumPy arrays so we don't need loops. This is faster than the Numba example. This gives us a 25x speed-up.
6. Julia loop	Output: Note the use of \(\in\) and \(4m\). This Julia version is very fast. We recorded here a 150x speed-up compared to the first example. The main expenditure here is the random number generation.

The performance is summarized in the following table:

Implementation	Time for 10 simulations (seconds)	Approximate speed-up
1. Pure Python with loops	4.66
2. GAMS scalar loop	2.27	2x
3. Set-driven GAMS	1.17	4x
4. Python/Numba with loops	0.45	10x
5. Python/NumPy without loops	0.18	25x
6. Julia with loops	0.03	150x

I find these numbers to be rather astounding.

It is a bit of a surprise to me that GAMS is about 2x as fast as Python, using the same loop-based algorithm. As GAMS is not a general-purpose programming language, including it here is a bit of a gimmick. When programming in Python, it is not unusual to write several different implementations: one that works and then some faster versions. It certainly is beneficial in this case. An argument for using Julia could be that our first, intuitive, implementation is already very fast.

Conclusion

Explicit loops in Python are expensive. We have to rewrite things considerably to get decent performance. This is not the case for Julia: loops are cheap. A first, straightforward implementation in Julia performs better than our Python attempts.

All versions here are serial. Sometimes, NumPy may do some parallelization under the hood.

References

The LLVM Compiler Infrastructure, https://llvm.org/
https://numba.pydata.org/
https://numpy.org/
Thomas Rutherford, University of Wisconsin, Private Communication
https://julialang.slack.com/archives/C67910KEH/p1696273686727619?thread_ts=1696266934.797769&cid=C67910KEH demonstrates some Julia code using VectorizedRNG.

7 comments:

AnonymousSeptember 26, 2023 at 9:27 PM
Hello, how did you create the animated graphs?
ReplyDelete
Replies
AnonymousOctober 24, 2023 at 11:43 AM
You're testing the speed of random number generation.
ReplyDelete
Replies
AnonymousJuly 23, 2024 at 9:44 PM
Its a small point, but I came across an article discussing adding together large arrays with numpy https://www.labri.fr/perso/nrougier/from-python-to-numpy/#temporary-copy. Thought you (and your audience) might find it interesting.
ReplyDelete
Replies

Add comment

Wednesday, September 20, 2023

Julia vs Python

1. Python loop

2. GAMS scalar loop

3. Set-driven GAMS formulation

4. Python/Numba loop

5. Python/NumPy without loops

6. Julia loop

Conclusion

References

7 comments: