Python in quant finance: How to speed it up to compete with C++?
Python is a popular coding language because it’s easy to use. But can it really compete with the speed of C++, which banks already use? Saeed Amen, Founder, Cuemacro, takes a closer look at various ways to speed up Python in quant finance… each of which comes with trade-offs.
Python is increasingly being used in quant finance. This is not surprising, given that it can be quicker to write code in Python than in the languages traditionally used in quant finance, such as C++. There is also a lot of support for machine learning in Python, with libraries such as scikit-learn, TensorFlow and PyTorch.
However, speed is important in quantitative finance, whether it’s generating a price, creating a risk report or fitting a model. Indeed, speed is one of the reasons why so many pricing libraries within banks are written in C++. If it takes you a week to come up with a risk report, it’s not going to be that useful for a trading desk, and equally, making traders wait a long time for a pricing model to compute likely means lost business.
So how can we speed up Python code, if we’ve already spent time optimising the “pure” Python? One option is to rewrite the parts of the code which are bottlenecks using languages such as C/C++ or Rust. Indeed, many common Python libraries adopt this approach. Underneath, TensorFlow uses a lot of C++, with Python wrapping around it. Meanwhile, Polars, a newer DataFrame library, uses Rust underneath a Python wrapper. The cost is clearly the additional time it takes to write code in a language such as C++ versus Python, which rather removes one of the main benefits of using Python in the first place (it’s quicker to write). However, if it’s something that’s used extensively in your organisation, you might decide it’s worth the effort.
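To give a flavour of what this looks like from the Python side, here is a minimal sketch using ctypes from the standard library. The shared library libpayoff.so and its sum_payoffs function are hypothetical stand-ins for a hot loop rewritten in C:

```python
import ctypes
import numpy as np

# Load a hypothetical shared library built from a C hot loop,
# e.g. compiled with: cc -O2 -shared -fPIC -o libpayoff.so payoff.c
lib = ctypes.CDLL("./libpayoff.so")

# Assumed C signature: double sum_payoffs(const double *spots, int n, double strike);
lib.sum_payoffs.restype = ctypes.c_double
lib.sum_payoffs.argtypes = [
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_int,
    ctypes.c_double,
]

spots = np.random.uniform(80.0, 120.0, 1_000_000)

# Hand a pointer to the NumPy buffer straight to the C function, avoiding a copy
ptr = spots.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
total = lib.sum_payoffs(ptr, spots.size, 100.0)
```

In practice, tools such as pybind11 (for C++) or PyO3 (for Rust, which is what Polars uses) make building and maintaining such bindings considerably more ergonomic than raw ctypes.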
What solutions exist where we can avoid having to rewrite large parts of our codebase in another language? Numba is potentially one solution. It uses a JIT (just-in-time) compiler to translate Python into optimised machine code at runtime via LLVM. We just need to add decorators to the functions we want to speed up in order to “jit” them. Numba can accelerate many NumPy functions, speed up for loops considerably, and can also run them in parallel. If the original code is sufficiently simple, a decorator alone will be enough to get the speed-up.
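As a minimal sketch, the Monte Carlo style loop below only needs Numba’s njit decorator (with parallel=True and prange) to be compiled and spread across CPU cores; the function name and parameters are illustrative:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def simulate_terminal_prices(s0, mu, sigma, t, n_paths):
    # Each path is independent, so prange lets Numba distribute
    # the loop iterations across CPU cores
    out = np.empty(n_paths)
    for i in prange(n_paths):
        z = np.random.standard_normal()
        out[i] = s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return out

# The first call triggers compilation; subsequent calls run the machine code
prices = simulate_terminal_prices(100.0, 0.02, 0.2, 1.0, 1_000_000)
```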
However, there are many caveats, and in many instances adding a “jit” decorator will not be enough for Numba to speed up a function. Numba doesn’t understand Pandas, so we would need to rewrite any parts of the code that use Pandas to use NumPy arrays instead. And whilst Numba does support a lot of NumPy functionality, there are some features within NumPy it won’t support, so again you might have to rewrite that code.
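In practice, a common workaround is to keep Pandas at the edges of the code and pass the underlying NumPy arrays into the jitted function. A minimal sketch, where max_drawdown is an illustrative function of my own, not part of the Numba API:

```python
import numpy as np
import pandas as pd
from numba import njit

@njit
def max_drawdown(returns):
    # Track the running peak of the cumulative return path
    peak = 1.0
    value = 1.0
    worst = 0.0
    for r in returns:
        value *= 1.0 + r
        if value > peak:
            peak = value
        drawdown = value / peak - 1.0
        if drawdown < worst:
            worst = drawdown
    return worst

df = pd.DataFrame({"ret": np.random.normal(0.0, 0.01, 10_000)})

# Numba cannot jit Pandas objects, so hand it the underlying NumPy array
worst = max_drawdown(df["ret"].to_numpy())
```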
If your code ends up calling SciPy, there is an additional package, numba-scipy, which allows some SciPy functions to be accelerated, but it is less mature than the core Numba package and its coverage is not deep. More broadly, you also need to make sure that whatever code is called by your own functions can itself be “jitted”. With all the buzz around GPUs, it’s good to know that Numba not only supports CPUs, but also allows you to target GPUs. However, if you want your code to run on GPUs, the rewriting can become more complicated.
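To illustrate why, targeting the GPU with Numba means writing an explicit CUDA kernel rather than just adding a decorator to existing code. A minimal sketch, assuming an NVIDIA GPU with the CUDA toolkit installed (the kernel itself is a toy example):

```python
import numpy as np
from numba import cuda

@cuda.jit
def discount_payoffs(payoffs, df, out):
    # Each GPU thread handles one element of the array
    i = cuda.grid(1)
    if i < out.size:
        out[i] = payoffs[i] * df

payoffs = np.random.uniform(0.0, 20.0, 1_000_000)
out = np.zeros_like(payoffs)

# Choose a 1D launch configuration covering every element
threads_per_block = 256
blocks = (payoffs.size + threads_per_block - 1) // threads_per_block
discount_payoffs[blocks, threads_per_block](payoffs, 0.98, out)
```

Here Numba copies the NumPy arrays to the device and back automatically; for real workloads you would typically manage those transfers explicitly, since they can easily dominate the runtime.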
When it comes to rewriting code to run in parallel, we could also stick to “pure” Python without Numba, using threading (for IO-bound operations) or multiprocessing (for CPU-bound computations). This can be an option if your code can’t be “jitted” to run under Numba. The downside is that it’s likely to be less efficient.
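A minimal sketch using the standard library’s concurrent.futures, where monte_carlo_call_price is an illustrative CPU-bound task: each seed is priced in a separate process, sidestepping the GIL:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def monte_carlo_call_price(seed):
    # Illustrative CPU-bound task: Monte Carlo price of a European call
    rng = np.random.default_rng(seed)
    s0, k, r, sigma, t, n = 100.0, 105.0, 0.02, 0.2, 1.0, 1_000_000
    z = rng.standard_normal(n)
    st = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    return np.exp(-r * t) * np.maximum(st - k, 0.0).mean()

if __name__ == "__main__":
    # Each seed runs in its own process; use threads instead for IO-bound work
    with ProcessPoolExecutor() as pool:
        prices = list(pool.map(monte_carlo_call_price, range(8)))
    print(np.mean(prices))
```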
There isn’t one foolproof way of speeding up your Python code. Instead, there is a menu of different approaches, each with their own trade-offs, some of which we have discussed above. The key point I would emphasise is that when it comes to speeding up code, you should focus your energy on those parts of the code which are particularly slow and become a bottleneck in your execution. Optimising code which isn’t really a bottleneck takes time which could be spent elsewhere, and it can also make the code more difficult to understand and hence maintain.
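The standard library’s profiler is one common way to find those bottlenecks before deciding which of the approaches above to reach for. A minimal sketch, where run_risk_report is a hypothetical entry point standing in for your own code:

```python
import cProfile
import pstats

from risk import run_risk_report  # hypothetical module and entry point

# Profile a full run and dump the statistics to a file
cProfile.run("run_risk_report()", "profile.out")

# Show the ten call paths that account for the most cumulative time
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)
```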