AI and LLMs are invaluable in finance for dealing with the wealth of natural-language documents that banks hold. Term sheets are one example. This data is unstructured and lacks proper representation in a database, but it is still data.
This year, CompatibL introduced an open-source package called TradeEntry AI to help deal with unstructured data for what-if analysis. The accuracy of this AI-based approach is expected to be similar to that of humans. AI makes mistakes, just as humans do. So, the goal is not to be 100% accurate, which is not possible, but for the AI to be as accurate as a human.
Term sheets tell the bank when to send a payment and how much to send. If this data is interpreted incorrectly, the operational consequences can be severe. Yet the data is stored in natural-language form in documents that are often over 100 pages long and lack a consistent format.
With AI, it is possible to go through all of these documents and extract the useful data in a rigorous format, one in which each step taken in reading and comprehending the documents and extracting their data into a database can then be reviewed. Because the format is auditable, it satisfies the bank's compliance requirements.
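To make the idea of an auditable format concrete, each extraction step can be logged as a structured record that a reviewer can trace back to the source document. Below is a minimal sketch in Python; the field names are illustrative assumptions rather than CompatibL's published schema.

```python
from dataclasses import dataclass

# A minimal sketch of an auditable extraction record, so each step the model
# takes can be reviewed later. Field names are illustrative assumptions.
@dataclass
class ExtractionStep:
    source_page: int        # where in the document the data was found
    source_text: str        # the verbatim passage the model relied on
    field_name: str         # the database field being populated
    extracted_value: str    # what the model extracted
    model_rationale: str    # the model's stated reasoning for the step
    reviewer_status: str    # "pending", "approved", or "corrected"
```

A reviewer can then approve or correct each record individually, which is what makes the process auditable end to end.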
How does TradeEntry AI work?
On the trading desk of any bank, there are experts whose job is to enter trades into the system and calculate their risk. This requires expertise, but it is a very repetitive job, and these experts would rather be doing something else. So this job, which is essentially converting natural language to data, is a perfect application for AI.
TradeEntry AI uses generative AI not to generate documents, but to understand them and extract their data. The job of entering trades is time-pressured, so using AI (which should be very reliable) means a human expert can simply review and validate the results, which is much faster than entering trade data manually from scratch.
To build reliable workflows for LLMs, you need to treat LLMs like humans. This may sound bizarre, but it is exactly what we found. When building a workflow with LLMs, the experts could ask themselves, “Would a human, e.g., a junior colleague, have enough context or background to do this, or would they need to have something explained?” Once we approached the build from this angle, the reliability of the model improved from 70-80% to 95-99%+.
For instance, sometimes you need to create a monetization schedule for a trade from natural-language text. LLMs can often misfire: they add a line, or they misread a number, and users get frustrated because they think the computer program is malfunctioning.
But that is because LLMs process data similarly to humans. If you ask a human to summarize a 300-page book, they can recall the general meaning of the book. But if you ask someone to read half a page of text, especially one full of numbers without any particular pattern, and then recall it word for word (including punctuation) without errors, they would not be able to do it.
So, when you ask an LLM to do this, it is not surprising that it too fails. This is because of the way it processes the data, converting it into meaning and then converting it back into text on retrieval. Neuroscientists believe that this is exactly how the human brain works, and it comes with the same limitation.
Even given a proper explanation of the details, if you ask an LLM to do everything at once, there is a ceiling to its accuracy. So, you need to provide all the necessary knowledge to the model step by step. By training LLMs in the same way you would train a human, you reach this 99%+ accuracy in trade entry.
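As a rough illustration of what step by step means in practice, the workflow below asks one focused question per field instead of requesting the whole trade at once. The call_llm helper and the field list are assumptions made for this sketch, not the actual TradeEntry AI interface.

```python
# A minimal sketch of step-by-step trade entry. call_llm(prompt) is assumed
# to return the model's text reply; prompts and fields are illustrative.

def enter_trade(term_sheet: str, call_llm) -> dict:
    # Step 1: establish context first, as you would brief a junior colleague.
    trade_type = call_llm(
        "What type of trade does this term sheet describe "
        "(e.g., interest rate swap, FX forward)? Answer with the type only.\n\n"
        + term_sheet
    )

    # Step 2: ask one focused question per field, never everything at once.
    fields = {}
    for field in ("notional", "effective date", "maturity date", "counterparty"):
        fields[field] = call_llm(
            f"This term sheet describes a {trade_type}. "
            f"What is the {field}? Answer with the value only.\n\n" + term_sheet
        )

    return {"trade_type": trade_type, **fields}
```

Each call carries only the context the model needs for that single decision, which is what keeps the accuracy of every step high.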
Moreover, once a document has been scanned, instead of asking the model to understand the monetization schedule and send it back as data, the expert can simply ask, “Please mark where the trade is in this document,” and then use normal Python code without AI to extract it. Once you know where the trade data is, it is very easy to parse it. The key is to recognize where it is within the document and what it is.
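A minimal sketch of this mark-then-parse pattern is shown below: the model only points at the schedule, and deterministic code does the reading. The call_llm helper, the JSON reply format, and the date/amount layout matched by the regular expression are all assumptions made for illustration.

```python
import json
import re

def locate_schedule(document_text: str, call_llm) -> tuple[int, int]:
    """Ask the model only to point at the schedule, not to transcribe it."""
    lines = document_text.splitlines()
    numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(lines))
    prompt = (
        "Below is a numbered document. Reply with JSON "
        '{"start": <line>, "end": <line>} marking the payment schedule.\n\n'
        + numbered
    )
    reply = json.loads(call_llm(prompt))
    return reply["start"], reply["end"]

def parse_schedule(document_text: str, start: int, end: int) -> list[dict]:
    """Deterministic parsing: no AI is involved, so numbers are never misread."""
    rows = []
    for line in document_text.splitlines()[start : end + 1]:
        match = re.match(r"(\d{4}-\d{2}-\d{2})\s+([\d,]+\.\d{2})", line)
        if match:
            rows.append({"date": match.group(1),
                         "amount": match.group(2).replace(",", "")})
    return rows
```

Because the extraction itself is ordinary code, a misread number can no longer slip in at the transcription step.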
What were the key challenges in adapting LLMs for automation of trade entry for what-if analysis?
Almost every bank has a safety committee whose first task is to validate each version of the AI that goes into the bank, checking for things such as bias and accuracy. So, CompatibL had to build its solution around a standard foundation model, e.g., Llama from Meta or GPT-4 from OpenAI, that can run on Azure. These models do not have to be reapproved until the vendor releases a new version.
CompatibL also invented something called reverse lookup, a technique where knowledge is put ‘next to’ the model. In other words, the knowledge is not inside the model, so it does not have to be approved; it is more like a way of applying the model. And the reverse-lookup database can be used dynamically. When the model is about to answer a question, it is provided with examples that can help and, more importantly, with examples where the model previously gave the wrong answer and was corrected.
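A minimal sketch of how such a store might work is given below, assuming an embed function that maps text to a vector (any sentence-embedding model would do). The class and prompt layout are illustrative assumptions; CompatibL's actual design is not described in detail here.

```python
import numpy as np

class ReverseLookupStore:
    """Knowledge kept next to the model: examples, including corrections."""

    def __init__(self, embed):
        self.embed = embed    # text -> vector, supplied by the caller
        self.entries = []     # (vector, question, answer, correction)

    def add(self, question, answer, correction=None):
        # Store every answered question; corrected mistakes matter most.
        self.entries.append((self.embed(question), question, answer, correction))

    def nearest(self, question, k=3):
        # Rank stored examples by cosine similarity to the new question.
        q = self.embed(question)
        def similarity(entry):
            v = entry[0]
            return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        return sorted(self.entries, key=similarity, reverse=True)[:k]

def build_prompt(store, question):
    # Dynamically place the most relevant examples next to the model.
    parts = []
    for _, q, a, correction in store.nearest(question):
        parts.append(f"Q: {q}\nA: {a}")
        if correction:
            parts.append(f"Reviewer correction: {correction}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```

Updating behavior then means adding or correcting entries in the store, not retraining the model, which is why changes can be rolled out without re-approval.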
With reverse lookup it is possible to get better results than with fine-tuning, while also being able to make changes and roll them out very rapidly without having to go through the approval process again.
This is probably the most important research breakthrough that CompatibL has accomplished in AI, and it has broader applicability in banking: not just for trade entry, but for how people can use these models in general.