

A brief introduction to Automatic Adjoint Differentiation (AAD)

Posted on 09 January 2019

Antoine Savine, Quantitative Research at Danske Bank, gives us a 101 on Adjoint Differentiation and outlines the value of Automatic Adjoint Differentiation.

Virtually unknown to finance just over ten years ago (it was introduced in Giles and Glasserman’s pioneering paper ‘Smoking Adjoints’), AAD is now considered a key ingredient in every decent in-house or vendor derivatives risk management system. In recent years, AAD has been the subject of many talks and workshops at QuantMinds and other conferences, and of countless research papers, most notably by Giles and Glasserman, Luca Capriotti, and Uwe Naumann. And yet, it remains largely mysterious and generally misunderstood in the risk management community.

What AAD does in finance is compute risk sensitivities of complex transactions, large trading books, or netting sets with ‘magical’ speed compared to the conventional production of risks by ‘bumping’ market variables one by one.

AD computes many derivative sensitivities very quickly. Nothing more, nothing less.

So what is Adjoint Differentiation (AD, also called automatic differentiation or reverse differentiation) exactly? AD is an application of the chain rule for derivatives that computes differentials in constant time. This means that AD computes all the differentials of a scalar function of many variables in a time similar to one evaluation of that function, independently of the number of inputs.

Hence, AD is an algorithm to calculate derivative sensitivities very quickly. Nothing more, nothing less. Although AD computes derivatives analytically, its results remain virtually identical to bumping, and AD cannot magically differentiate discontinuous functions.

All AD does is quickly compute many differentials. And the difference it makes for financial risk management is massive. In the context of CVA (or more generally XVA or other regulatory amounts), an accurate evaluation may take several seconds and up to minutes, depending on the size of the netting set, even when implemented on parallel CPU or GPU with the smart algorithms explained by Jesper Andreasen in his famous talk ‘Calculate CVA on your iPad Mini’ at QuantMinds 2015 (called Global Derivatives back then). In addition, a CVA on a large netting set may easily depend on thousands of market variables: all the yield curves, spread curves, market prices, and volatility surfaces of all underlying assets and all currencies in the netting set. To compute its risk sensitivities by conventional means, we would have to repeat the evaluation thousands of times, bumping one market variable by a small amount at a time, something only imaginable on a large data centre overnight.

AD can compute all these sensitivities in around five times one evaluation, within seconds to a few minutes, in real time, on a trader workstation.

The same technology that powers computer vision allows banks to compute many complex risks in real time.

AD is also known in the field of Machine Learning under the name ‘back-propagation’, or simply backprop. Backprop is a key ingredient powering Deep Learning. Deep neural networks are trained by optimizing a loss over thousands to millions of parameters. They can only learn in reasonable time if the gradient of the loss with respect to all the parameters is produced very quickly, and backprop is what makes it possible. It is therefore the same technology that powers our phones to identify our friends in pictures, and allows investment banks to compute many complex risks in real time.

Backprop is, of course, implemented in all decent deep learning frameworks, including the popular TensorFlow.

Even if one manages to understand the ideas behind the method, there are often formidable challenges in actually implementing AAD.

In his preface to Modern Computational Finance, freely available on SSRN, Leif Andersen gives an entertaining and informative perspective:

“The history of AAD is an interesting one, marked by numerous discoveries and re-discoveries of the same basic idea which, despite its profoundness, has had a tendency of sliding into oblivion […] As one starts reading the literature, it soon becomes clear why AAD originally had a hard time getting a foothold: the technique is hard to comprehend; is often hidden behind thick computer science lingo or is buried inside applications that have little general interest. Besides, even if one manages to understand the ideas behind the method, there are often formidable challenges in actually implementing AAD in code, especially with management of memory or retro-fitting AAD into an existing code library.”

Although AD is based on the simple mathematics of the chain rule for derivatives, it appears that even supremely intelligent people, who manipulate fractional Brownian motions for a living, struggle to understand it.

The reason is that, to achieve such ‘magical’ speed, AD computes derivatives in the reverse order. Every calculation, even CVA on a book of millions of complex trades evaluated over thousands of simulated scenarios, can be split into a (long) sequence of elementary operations: add, subtract, multiply, log, exp, sqrt, etc., applied to one or two previously computed results. Denote by xi the result of operation number i and by xN the final result of the calculation, and suppose we want to compute all the ∂xN/∂xi. We note that ∂xN/∂xN = 1 and ∂xN/∂xi = ∑ ∂xN/∂xj ⋅ ∂xj/∂xi, where the sum is taken over all successors of xi, that is, all the operations xj, j > i, that take xi as an argument of an elementary operation, and whose derivatives ∂xj/∂xi are trivially known. These two remarks immediately lead to an algorithm to compute all the derivatives ∂xN/∂xi, traversing all the operations involved in the calculation exactly once, but in the reverse order. We refer to the numerous QuantMinds talks and workshops for details, and to a textbook like the recent Modern Computational Finance book for a complete explanation.
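To make the reverse sweep concrete, here is a hand-written sketch. The function f(a, b) = exp(a·b) + a and all variable names are illustrative choices, not from the article: each elementary operation is evaluated forward, then the adjoints ∂xN/∂xi are accumulated in the reverse order.

```python
import math

# Forward sweep: split f(a, b) = exp(a * b) + a into elementary
# operations and record each intermediate result.
a, b = 1.5, 0.5
x1 = a * b          # multiply
x2 = math.exp(x1)   # exp
x3 = x2 + a         # add: this is the final result xN

# Reverse sweep: the adjoint of the final result is 1; each adjoint
# is then accumulated from the operations that consumed the variable.
x3_bar = 1.0
x2_bar = x3_bar * 1.0           # from x3 = x2 + a
a_bar = x3_bar * 1.0            # direct contribution of a to x3
x1_bar = x2_bar * math.exp(x1)  # from x2 = exp(x1)
a_bar += x1_bar * b             # from x1 = a * b
b_bar = x1_bar * a

# a_bar and b_bar now hold df/da and df/db, obtained in a single
# backward traversal of the operations, whatever the number of inputs.
```

Note that each operation is visited exactly once in reverse, which is why the cost stays proportional to one evaluation rather than to the number of inputs.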

Even the most complex calculation is a (long) sequence of additions, multiplications, logs, exponentials, square roots, etc. To implement AAD, all those operations must be recorded.

So far, we have introduced Adjoint Differentiation (AD) and explained how it computes many differentials very quickly. In practice, implementing adjoint calculus by hand is tedious, prone to error, and a maintenance nightmare in a professional library in constant evolution. AAD (where the first A stands for ‘automatic’) is a computer programming technique that applies operator overloading and template meta-programming to implement AD automatically, behind the scenes, over calculation code.

It follows that AAD is both a (simple but somewhat hard to comprehend) mathematical algorithm and a (highly challenging) computer programming practice. For AD to be automatically applied in reverse order over the operations involved in a calculation, the calculation graph must be produced in memory, where all the operations are recorded. This requirement may seem unmanageable. It was, indeed, responsible for the late adoption of AAD in finance, and remains a major challenge for the implementation of AAD on GPU.
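As an illustration of the recording idea, here is a minimal, hypothetical Python sketch (not the book’s C++ library, which uses templates and far more careful memory management): operator overloading records every elementary operation on a tape, and a single reverse traversal of the tape produces all the derivatives.

```python
import math

tape = []  # one entry per operation: a list of (argument_index, local_derivative)

class Number:
    """Records every operation it participates in on the global tape."""
    def __init__(self, value, partials=()):
        self.value = value
        self.index = len(tape)
        tape.append(list(partials))  # leaves record an empty partials list

    def __add__(self, other):
        return Number(self.value + other.value,
                      [(self.index, 1.0), (other.index, 1.0)])

    def __mul__(self, other):
        return Number(self.value * other.value,
                      [(self.index, other.value), (other.index, self.value)])

def exp(x):
    v = math.exp(x.value)
    return Number(v, [(x.index, v)])  # d exp(x)/dx = exp(x)

def gradients(result):
    """Reverse sweep: traverse the tape exactly once, last operation first."""
    adjoints = [0.0] * len(tape)
    adjoints[result.index] = 1.0
    for i in reversed(range(len(tape))):
        for arg, partial in tape[i]:
            adjoints[arg] += adjoints[i] * partial
    return adjoints

# f(a, b) = exp(a * b) + a, differentiated automatically: the calculation
# code never mentions derivatives; the tape does the work behind the scenes.
a, b = Number(1.5), Number(0.5)
f = exp(a * b) + a
grad = gradients(f)
# grad[a.index] holds df/da and grad[b.index] holds df/db
```

A production implementation must also manage the tape’s lifetime and memory footprint (a single global list would grow without bound across calculations); this sketch only shows the mechanics.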

It turns out that, with adequate design patterns and programming techniques, especially check-pointing (splitting the calculation into pieces, differentiating one piece at a time, and stitching the derivatives back together) and expression templates (Sokol calls this ‘tape compression’ in his QuantMinds talks because it shrinks the graph stored in memory; an extreme application is the ‘tapeless AD’ presented by Uwe Naumann at QuantMinds 2017, suitable for GPU), the recording and traversal of the graph may be implemented with a reasonable RAM footprint and good cache efficiency. This is explained in deep detail in the Modern Computational Finance book, and reflected in the professional C++ companion library freely available on GitHub.
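The stitching idea behind check-pointing can be shown with a toy, hypothetical example (the functions g and h are made up for illustration): instead of taping the whole calculation f(x) = h(g(x)), only the checkpoint value is stored, each piece is differentiated on its own, and the chain rule combines the pieces.

```python
import math

# Two pieces of a calculation, each differentiated independently
# (here by hand; in practice each piece would get its own short tape).
def g(x):
    return math.log(x)

def dg(x):
    return 1.0 / x

def h(y):
    return y * y

def dh(y):
    return 2.0 * y

x = 3.0
y = g(x)       # checkpoint: store the intermediate result, not the whole graph
f = h(y)       # f(x) = log(x) ** 2
dfdx = dh(y) * dg(x)  # stitch: chain rule across the checkpoint
```

Only the checkpoint y must be kept in memory between the two differentiations, which is what bounds the RAM footprint when the pieces are long.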

The takeaway is that AAD is a two-faceted technology: a mathematical algorithm paired with a programming technique, which, combined, produce spectacular results, although it takes skill and hard work to overcome the many challenges on the road to a practical, professional implementation.

Antoine Savine is the author of the Modern Computational Finance books with John Wiley and Sons (2018), and a regular speaker and chairman at QuantMinds International conferences, where he has been explaining and promoting AAD for many years. He is one of the key contributors to Danske Bank’s front office and XVA system, which earned the In-House System of the Year 2015 Risk award.
