QuantMinds International
17 - 20 November 2025
InterContinental O2 London
Structured data: To use or not to use?

With the promise of order and organisation, structured data is an ideal that many don’t have the resources to achieve. But what can we do with unstructured data to derive actionable insights from them? Saeed Amen, Cofounder at Turnleaf Analytics, explores the scenarios where trade-offs can be made and how LLMs have changed the game.

What is structured and unstructured data?

Data comes in all sorts of varieties. One way to classify data is to categorise them into structured and unstructured data. Structured data essentially comes in a clear, common format. One of the most common forms of structured data in finance is a time series of prices, where every record has the same format: a timestamp and a price. By contrast, unstructured data is not formatted consistently from one record to the next. Most data tend to be unstructured, in particular if we think of data on the web.
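
To make the distinction concrete, here is a minimal Python sketch. The `PriceRecord` type, the prices and the example strings are illustrative placeholders, not real market data or any vendor's schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PriceRecord:
    """Structured: every record has the same fields, a timestamp and a price."""
    timestamp: datetime
    price: float


# Illustrative values only.
prices = [
    PriceRecord(datetime(2025, 1, 2, 16, 0), 101.25),
    PriceRecord(datetime(2025, 1, 3, 16, 0), 102.10),
]

# Unstructured: free text with no consistent fields. Any timestamp, topic or
# sentiment has to be extracted before it can be used like the records above.
raw_documents = [
    "Copper rallies as mine output disappoints",
    "<html><nav>Home | Markets</nav><p>Central bank holds rates</p></html>",
]

# Because the schema is fixed, structured records can be queried directly.
latest = max(prices, key=lambda r: r.timestamp)
print(latest.timestamp.date(), latest.price)  # 2025-01-03 102.1
```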

Examples of structuring datasets

We can think of web pages as unstructured data, since every web page has a different format. Structuring a web page into some sort of common form takes a lot of work. We need to exclude the parts of the page that are irrelevant from a content perspective (such as the navigation bar), focus on the body text and clean up the original source. Then we need to add metadata to describe the body text. What is the title of the article? What are the topic keywords? When was the article written? What is the source of the article? What is the sentiment and overall tone of the article? We could try to do all of this manually, but given the volume of text we are likely to ingest, it really isn’t feasible, so we need to use NLP models. We can, however, still verify the output of those models using our own judgement.
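
The first steps above can be sketched with Python's standard-library `html.parser`: drop navigation and footer elements, keep the title and body text, then attach metadata. The tiny word lists used for sentiment are a hypothetical stand-in for a proper NLP model, and `structure_page`, its fields and the sample page are illustrative assumptions rather than any particular pipeline.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect the <title> and body text, skipping non-content elements."""

    SKIP_TAGS = {"nav", "header", "footer", "script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            self.chunks.append(text)


# Hypothetical word lists standing in for a real sentiment model.
POSITIVE = {"rallies", "strong", "beat", "growth"}
NEGATIVE = {"falls", "weak", "miss", "disappoints"}


def structure_page(html, source, published):
    """Turn one raw HTML page into a record with a fixed set of fields."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    body = " ".join(parser.chunks)
    words = [w.strip(".,!?").lower() for w in body.split()]
    sentiment = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return {
        "title": parser.title,
        "source": source,
        "published": published,
        "body": body,
        "sentiment": sentiment,
    }


page = """<html><head><title>Copper rallies</title></head><body>
<nav>Home | Markets | Login</nav>
<p>Copper rallies on strong demand.</p>
<footer>Cookie policy</footer></body></html>"""

record = structure_page(page, source="example-news", published="2025-01-03")
print(record["body"], record["sentiment"])  # Copper rallies on strong demand. 2
```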

For our text example, we could then aggregate the articles about topics of interest and, for example, construct a sentiment time series for a particular topic. Satellite images are another form of unstructured data: we could geofence particular areas of interest, let’s say copper mines, and use computer vision to gauge the activity at each mine, for example by counting the trucks or trains exiting it daily. Repeated over a long period of time, these counts form a time series. Again, just like our NLP example above, we’ll need models to structure the images.
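
Aggregating tagged articles into a daily sentiment series for one topic might look like the following sketch. The `(date, topic, score)` tuples are hypothetical output of an earlier tagging step, and averaging scores per day is just one simple choice; the same grouping by date would turn daily truck counts from satellite images into a time series.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical article-level output of an NLP tagging step:
# (publication date, topic, sentiment score in [-1, 1]).
tagged_articles = [
    (date(2025, 1, 2), "copper", 0.5),
    (date(2025, 1, 2), "copper", -0.25),
    (date(2025, 1, 2), "rates", 0.25),
    (date(2025, 1, 3), "copper", 0.75),
]


def sentiment_series(articles, topic):
    """Average sentiment per day for one topic, sorted by date."""
    scores_by_day = defaultdict(list)
    for day, tag, score in articles:
        if tag == topic:
            scores_by_day[day].append(score)
    return [(day, mean(scores)) for day, scores in sorted(scores_by_day.items())]


copper = sentiment_series(tagged_articles, "copper")
for day, score in copper:
    print(day, score)
# 2025-01-02 0.125
# 2025-01-03 0.75
```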

We can then combine the time series extracted from the unstructured data with other time series of interest (e.g. based on market data) and construct forecasts. Whilst this is one specific example, it illustrates the general process of turning unstructured data into some sort of forecast or trading signal.

Back to the question: Structured or unstructured?

If you had the choice between unstructured and structured data, which would you pick? Ultimately, we want to work with structured data, so we might conjecture that we should always use structured data. But I would say: it really depends. It boils down to answering the question: could we structure the original unstructured data better ourselves, and to what degree?

If you have enough resources, you could argue that you can always do a better job, but no one has unlimited resources. A data vendor that is geared up for a particular task has economies of scale. The time spent structuring a large, complex dataset could instead be spent evaluating a totally different dataset. It boils down to the usual question: would you “build or buy”?

It can also be argued that the “rawest” form of data isn’t always necessary. Take web scraping, for example. Do we really need the original HTML, and to work from that? If a vendor has already cleaned up the HTML and extracted the body text satisfactorily, what additional value can we add by doing that particular task ourselves? We can end up with diminishing returns.

Where do LLMs fit into structuring?

The advent of LLMs has made structuring datasets, text-based ones in particular, easier… or at least, it has reduced the barrier to entry. Consequently, LLMs have driven much more demand for text data, as folks explore doing their own metadata tagging, for sentiment for example.
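
When an LLM is used for metadata tagging, it helps to request a fixed output schema and validate every response before it enters the dataset. The sketch below assumes a JSON reply with `topics` and `sentiment` keys; `fake_llm` is a hypothetical stand-in for whichever LLM API you call, and the prompt wording is illustrative only.

```python
import json

ALLOWED_SENTIMENT = {"positive", "neutral", "negative"}

PROMPT_TEMPLATE = (
    "Return only JSON with keys 'topics' (a list of short strings) and "
    "'sentiment' (one of: positive, neutral, negative) for this article:\n\n{text}"
)


def fake_llm(prompt):
    """Hypothetical stand-in for a real LLM API call; returns a canned reply."""
    return '{"topics": ["Copper", "mining"], "sentiment": "positive"}'


def tag_article(text, llm=fake_llm):
    """Ask the model for tags and keep them only if they match the schema."""
    raw = llm(PROMPT_TEMPLATE.format(text=text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed reply: discard rather than guess
    if not isinstance(data, dict):
        return None
    topics, sentiment = data.get("topics"), data.get("sentiment")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        return None
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENT:
        return None  # e.g. an invented label such as "bullish"
    return {"topics": [t.lower() for t in topics], "sentiment": sentiment}


print(tag_article("Copper rallies on strong demand."))
# {'topics': ['copper', 'mining'], 'sentiment': 'positive'}
```

Rejecting anything outside the schema, rather than trying to repair it, keeps malformed or hallucinated labels out of the structured dataset at the cost of dropping some records.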

I would argue, though, that there are lots of NLP models for specific tasks that could also be useful; it isn’t always necessary to use an LLM, even if they are all the fashion! Hugging Face hosts many NLP models, both for specific applications and LLMs.

Of course, there are still some caveats to using LLMs for trading applications, notably the difficulty of making sure there isn’t look-ahead bias, as well as questions such as computing costs, hallucinations, etc.

We can ask LLMs to do tasks such as tagging topics to help us structure text. However, it is another thing to ask an LLM “please can you forecast the Fed funds rate?”, when it might simply answer based on a news article in its training set that was written after the Fed meeting in question. Admittedly, look-ahead bias could affect task-specific NLP models too, depending on the use case.
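
One common safeguard is to build every input point-in-time: at each decision time, only use text that had actually been published by then, and treat an LLM's training cutoff as a further source of leakage when backtesting dates before it. A minimal sketch; the article records, timestamps and the model cutoff date are hypothetical, and a real pipeline would also need to consider when each item became available to you, not just when it was published.

```python
from datetime import date, datetime

# Hypothetical tagged articles with publication timestamps.
articles = [
    {"published": datetime(2025, 2, 10, 9, 0), "title": "Preview: central bank meeting"},
    {"published": datetime(2025, 2, 10, 20, 0), "title": "Central bank decision announced"},
]


def point_in_time(items, as_of):
    """Keep only items published at or before the decision time."""
    return [item for item in items if item["published"] <= as_of]


def llm_leakage_risk(backtest_date, model_cutoff):
    """True if the model's training data may include text written after
    backtest_date, so its answers could reflect the future."""
    return backtest_date < model_cutoff


decision_time = datetime(2025, 2, 10, 12, 0)
visible = point_in_time(articles, decision_time)
print([a["title"] for a in visible])  # ['Preview: central bank meeting']

# Hypothetical training cutoff for some LLM.
print(llm_leakage_risk(date(2024, 6, 1), model_cutoff=date(2024, 10, 1)))  # True
```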

Conclusion

Ultimately, if we are trying to extract a trading signal, it is often much easier to use structured data. Having access to raw data gives us more control over the process of structuring them. LLMs have made it easier to deal with “rawer” data and to structure them, but I would still argue there is a compromise to be made. If every dataset you use needs to be structured in-house, it will limit your ability to look at other datasets. It also depends on how much structuring needs to be done. If a vendor has already done some of the heavy lifting for you, then you can focus on the areas where you can add the most value.


