Bigger is not always better: Forecasting commodities with NLP data

Posted on 04 November 2022

SESAMm has a large data lake of more than 20 billion articles (growing by 5–10 million a day) and 14 years of data in 100 languages. But its size alone is not what makes it good; it’s a refined process to find the exact data you want that makes it better.

Here’s an example to help explain the point. We’re sometimes asked for help researching data to forecast and monitor the commodities market, even by large companies with their own commodities desk of traders and quant researchers. Why would they seek help from outside their firm?

Simply put, traders want an edge. They want information advantages that others are likely to miss, so they look to alternative data from a variety of sources: anything that adds value and offers a different angle. And, as it turns out, commodities are a more challenging segment to analyze with alternative text data. Texts about commodities are scarcer than texts about companies, and it takes more domain knowledge to unravel their implications. A simple sentiment analysis doesn’t provide enough relevant information.

For a more in-depth view, join us as we discuss NLP-derived alternative data, its benefits, challenges for researchers, and why bigger isn’t always better in the world of data.

NLP-derived alternative data challenges

NLP-derived alternative data is not like conventional financial data, such as accounting data, where you know exactly what you’re manipulating. Sentiment on a given stock, for example, may be calculated from up to hundreds of thousands of texts. Hence, it can look like an abstract indicator compared with published earnings or a stock price.

Because NLP data is unstructured, it isn’t straightforward to exploit. Financial professionals may not be able to unlock its full potential without investing considerable time in technical topics far removed from financial analysis, such as NLP or data engineering. They start with simple but limited techniques like word counts. However, the more they work with NLP data and embrace its potential, the more specific and business-oriented their requirements become, and meeting those requirements takes advanced NLP algorithms.
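
To make the "simple but limited" starting point concrete, here is a minimal sketch of a keyword count in Python. The sample headlines and the keyword list are invented for illustration and do not come from any real dataset.

```python
import re
from collections import Counter


def keyword_counts(texts, keywords):
    """Count how often each keyword appears across a list of texts."""
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for text in texts:
        # Lowercase and split into alphabetic tokens
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t in wanted)
    return counts


# Hypothetical headlines, made up for this example
sample_texts = [
    "Copper supply tightens as strike halts mine output.",
    "Analysts see copper demand recovering; supply remains tight.",
]
counts = keyword_counts(sample_texts, ["copper", "supply", "demand"])
print(counts)
```

The counts show that "supply" is mentioned, but not whether supply is rising or falling, which is the kind of question that pushes researchers toward more advanced NLP.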

Machine learning at the heart of the process

The asset management industry is still figuring out how best to incorporate recent advances in artificial intelligence. Because it is readily available, quants have tended to focus almost exclusively on structured information like market and macroeconomic data, company financials, or analyst estimates. They have used machine learning algorithms to get the most out of that data. However, there isn’t much alpha left, because that data provides only an incomplete picture and because so much money is chasing the same few easy-to-use datasets.

In contrast, fundamental investors build insight by reading 10-Ks and earnings call transcripts and by tracking companies in the news and in online discussions. Quants, however, have struggled to leverage such unstructured data, whose volume is growing exponentially. In the end, information this large and unstructured can only be harnessed with machines and algorithms. For example, NLP techniques can be used to extract topics, controversies, virality, and more. All of a sudden, you have insight that goes beyond the “what” to include the “how” and sometimes the “why.” You can now see stocks through new dimensions inaccessible to traditional asset managers.

“Garbage in, garbage out” rings true

As the basic computing mantra “garbage in, garbage out” states, both your data source and the algorithms you apply matter. Getting both right is what makes SESAMm unique.

Not only do we have a massive data lake, but everything from our knowledge graphs to our machine-learning algorithms is also optimized to provide the most accurate NLP-derived alternative data possible for the financial market. In other words, better data is better than bigger data.

Size, speed, and coverage

And, of course, the qualities every quant desires are size, speed, and coverage. At SESAMm, we have the size, but we have the speed and coverage, too. We gather data on more than 40 thousand listed and four million private companies in near real time. And did we mention we do that in more than 100 languages?

Bigger is not always better…but it helps

We know, we know. We titled this article “Bigger is Not Always Better In The World of Data,” but there is one case where bigger is better: when you need to filter data. The more data you have, the more specific you can be. That is a considerable advantage, because when you start with less data, even a little filtering can leave you without enough data for analysis.

Because SESAMm’s data lake holds more than 20 billion articles, messages, and posts, we can be specific about the data we’re looking for. And this ability, specificity (technically not bigger), is exactly what enables us to provide unique insights to asset managers.
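
The arithmetic behind this point can be sketched in a few lines of Python. Only the 20 billion starting figure comes from the text above. The filter ratios and the smaller comparison corpus are assumptions chosen purely for illustration, not measured values.

```python
def remaining_after_filters(total_docs, keep_one_in):
    """Return the document count left after each successive filter.

    keep_one_in[i] = n means filter i keeps roughly one document in n.
    """
    remaining = [total_docs]
    for n in keep_one_in:
        remaining.append(remaining[-1] // n)
    return remaining


# Hypothetical filters: commodity-related, one specific commodity, supply events
keep_one_in = [100, 20, 10]

large = remaining_after_filters(20_000_000_000, keep_one_in)
small = remaining_after_filters(20_000_000, keep_one_in)  # assumed smaller corpus

print("large corpus:", large)
print("small corpus:", small)
```

Under these assumed ratios, the large corpus still leaves about a million documents after three filters, while the smaller one is down to about a thousand, which is thin once it is spread over many years of daily observations.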

The challenge

While SESAMm is great at extracting sentiment from web data, we can’t apply it to commodities consistently. The problem with sentiment when it comes to commodities is that it doesn’t indicate the direction of a price change. For example, a well-calibrated stock sentiment can often be correlated with idiosyncratic price changes: positive sentiment can mean a price increase, and negative sentiment can mean a price decrease. In comparison, texts about commodities mostly describe negatively perceived events like an embargo, the consequences of a social crisis, or the demand shock related to COVID-19. In this context, sentiment does not provide the dichotomy needed to explain price changes.

Most data providers perform sentiment analysis and keyword matching, but there’s no point in doing the same. Instead, SESAMm’s TextReveal® extracts content based on relevant themes. We also trained a dedicated machine-learning algorithm to detect the key events for any commodity. In doing so, we create a unique event database that gives our clients an information advantage that is difficult for their competitors to bridge. Then, we link those events to supply, demand, and other issues, and by connecting these issues, we can break down and infer a price movement.

It’s easy to see that a supply shortage creates a price hike and a demand shortage does the opposite, but detecting supply and demand shortages is the tricky part. The combination of domain expertise and machine learning allows us to complete such tasks and create a final signal with consistent results. Ultimately, this signal is the product of gathering and analyzing alternative data, and it can be used to monitor and forecast commodities.
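
The reasoning above, from detected event to supply or demand driver to price direction, can be illustrated with a toy rule table in Python. This is a sketch under stated assumptions and not SESAMm’s method: the event labels, the mapping, and the equal weighting are all hypothetical, and the hard part described above (detecting the events in text) is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    commodity: str
    kind: str  # hypothetical event label, e.g. "mine_strike"


# Hypothetical mapping: event kind -> (driver affected, direction of change)
EVENT_EFFECTS = {
    "mine_strike": ("supply", -1),
    "export_embargo": ("supply", -1),
    "new_production": ("supply", +1),
    "lockdown": ("demand", -1),
    "stimulus": ("demand", +1),
}


def price_pressure(event):
    """Return +1 (upward), -1 (downward), or 0 (unknown) price pressure."""
    effect = EVENT_EFFECTS.get(event.kind)
    if effect is None:
        return 0
    driver, change = effect
    # Less supply pushes prices up; less demand pushes prices down.
    return -change if driver == "supply" else change


def daily_signal(events):
    """Sum the price pressure of all events detected on one day."""
    return sum(price_pressure(e) for e in events)


events = [
    Event("copper", "mine_strike"),
    Event("copper", "export_embargo"),
    Event("copper", "lockdown"),
]
print(daily_signal(events))
```

A strike and a lockdown would both read as negative in a sentiment score, yet in this sketch they push the price in opposite directions, which is why the event-to-driver link matters more than tone.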

Machine learning can generate alpha

All in all, machine learning helps extract insights from data. And with specificity, meaning the right data and the right algorithm, NLP-derived alternative data helps generate alpha, which makes it a valuable tool for building a trading strategy.

But there is a caveat: because machine learning algorithms are easy to deploy today, it’s also easy to hide bad data and poorly specified problems. In other words, in the wrong hands, machine learning can give you good-looking past signals but bad live results. Artificial intelligence has a transformative effect on asset management. However, applying it to finance requires technical expertise and financial market knowledge, the latter being harder to acquire. So, it’s better to have an experienced team with domain knowledge and a financial background, like SESAMm.
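
The gap between good-looking past signals and bad live results can be reproduced with nothing but random numbers. The Python sketch below is an illustration under assumed settings (120 days, 200 candidate signals, a fixed seed), not a description of any real strategy: it picks the candidate with the best backtest and then checks it on data it has not seen.

```python
import random


def hit_rate(signal, returns):
    """Fraction of days on which the signal matches the return direction."""
    hits = sum(1 for s, r in zip(signal, returns) if s == r)
    return hits / len(returns)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_days, n_candidates, split = 120, 200, 60

# Purely random "market" directions and purely random candidate signals
returns = [rng.choice([-1, 1]) for _ in range(n_days)]
candidates = [
    [rng.choice([-1, 1]) for _ in range(n_days)] for _ in range(n_candidates)
]

# Keep the candidate that looks best on the first half (the "backtest")
best = max(candidates, key=lambda s: hit_rate(s[:split], returns[:split]))

in_sample = hit_rate(best[:split], returns[:split])
out_of_sample = hit_rate(best[split:], returns[split:])
print(f"in-sample: {in_sample:.2f}, out-of-sample: {out_of_sample:.2f}")
```

Because the winner is selected on the first half, its in-sample hit rate should look clearly better than a coin flip even though every signal is noise, while the out-of-sample figure should hover around chance. A sound problem specification and clean data are what keep a real signal from being this kind of artifact.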

About SESAMm

SESAMm is a leading NLP technology company serving global investment firms, corporations, and investors, such as private equity firms, hedge funds, and other asset management firms. It provides datasets or NLP capabilities that let clients generate their own alternative data for use cases such as ESG and SDG, sentiment, private equity due diligence, corporate studies, and more.

Email contact@sesamm.com with any questions or to request a demo.
