Extracting financial intelligence from multilingual text using advanced NLP techniques with TextReveal® Streams

SESAMm’s financial intelligence platform TextReveal® Streams is used by quantitative and fundamental asset managers to optimise trade timing and identify new investment opportunities.
Private equity deal and credit teams also use the data for deal sourcing, due diligence, and managing portfolio ESG risk and reporting.
The data is generated using Natural Language Processing (NLP) and Artificial Intelligence (AI) algorithms that scan over four million online sources of content daily, as well as an extensive library going back to 2008. This process is done in English and eight other major languages.
SESAMm’s text analysis engine tracks sentiment and volume metrics for a broad set of predefined and custom themes, covering thousands of public and private companies and their related products, brands, identifiers, and nicknames.
The system does not capture personal data (no PII) and respects global privacy laws. It also doesn't contain any Material Non-Public Information (MNPI).
TextReveal® Streams focuses on publicly listed companies in major global equity indices, a broad range of other asset classes, over 25,000 private companies, and ESG risks in 90 categories for the entire universe of companies.
From Data Extraction to Granular Insight Aggregation
At the heart of the text analytics process is SESAMm’s proprietary Knowledge Graph, a vast map connecting and integrating 70 million entities and their related keywords.
Entities within the Knowledge Graph are updated weekly to ensure that changes are properly tracked. For example, if the CEO of a company changes over time, the system is aware of it.
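One way to picture time-aware entity attributes is a small store that keeps every value together with the date it took effect. The sketch below is illustrative only: the class `TemporalAttribute`, its methods, and the placeholder names "Person A" and "Person B" are assumptions made for this example and do not describe SESAMm's actual Knowledge Graph.

```python
from bisect import bisect_right
from datetime import date


class TemporalAttribute:
    """Keeps (effective_date, value) pairs sorted by date (hypothetical helper)."""

    def __init__(self):
        self._dates = []
        self._values = []

    def set(self, effective: date, value: str) -> None:
        # Insert while keeping both lists ordered by effective date.
        i = bisect_right(self._dates, effective)
        self._dates.insert(i, effective)
        self._values.insert(i, value)

    def get(self, on: date):
        # Return the most recent value whose effective date is <= `on`.
        i = bisect_right(self._dates, on)
        return self._values[i - 1] if i else None


# Placeholder data: a company whose CEO changes in mid-2021.
ceo = TemporalAttribute()
ceo.set(date(2015, 1, 1), "Person A")
ceo.set(date(2021, 6, 1), "Person B")

print(ceo.get(date(2018, 3, 1)))  # value in force in 2018
print(ceo.get(date(2022, 1, 1)))  # value in force in 2022
```

Looking up an article's publication date in such a store lets a mention of "the CEO" be tied to whoever held the role at that time.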
SESAMm uses the knowledge graph of entities combined with various cutting-edge NLP techniques to interpret the text in its data lake.
These include Named Entity Recognition (NER), Named Entity Disambiguation (NED), Lemmatization, Embeddings, and Cosine Similarity.
Processing Language
NER identifies entities based on their context and usage, enabling distinctions to be made between, for example, “Elon”, the name of Tesla’s CEO, and the university of the same name in North Carolina.
NED can recognise words with more than one meaning to identify the correct sense in which a term is intended. The word “bank”, for example, could refer to either a large financial institution or the side of a river. If accompanied by terms such as “withdraw” and “money”, for example, it probably means the former; if accompanied by “river” or “water’s edge”, it means the latter.
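The “bank” example can be sketched as a simple context-overlap rule. This is a deliberately minimal illustration, not the method SESAMm uses: the cue-word lists in `SENSE_CUES` and the function `disambiguate` are assumptions introduced here, and production systems typically rely on learned models rather than hand-written word lists.

```python
# Hypothetical sense inventory: each sense of "bank" lists a few cue words.
SENSE_CUES = {
    "financial_institution": {"withdraw", "money", "deposit", "loan", "account"},
    "river_side": {"river", "water", "edge", "shore", "fishing"},
}


def disambiguate(sentence: str, sense_cues=SENSE_CUES) -> str:
    """Return the sense whose cue words overlap most with the sentence."""
    # Lowercase the tokens and strip surrounding punctuation.
    tokens = {tok.strip(".,;:!?\"'").lower() for tok in sentence.split()}
    scores = {sense: len(tokens & cues) for sense, cues in sense_cues.items()}
    best = max(scores, key=scores.get)
    # With no cue words present, the sense cannot be decided.
    return best if scores[best] > 0 else "unknown"


print(disambiguate("She went to the bank to withdraw money."))
print(disambiguate("They had a picnic on the bank of the river."))
print(disambiguate("The bank was closed."))
```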
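Lemmatization is an NLP process that standardizes the shape (morphology) of words by reducing inflected variants to a common base form, or lemma, so that, for example, “acquires,” “acquired,” and “acquiring” are all treated as “acquire.” A related step, usually called coreference resolution, links different expressions to a single entity: “Tesla,” “his firm,” “the company,” and “it” are all noun phrases that can refer to the same entity, even though they have different forms. Together, these steps allow all such variants to be understood as signifying the same thing.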
Embeddings are numerical representations of a word that enable its manifold contextual meanings to be calculated relationally. Embeddings typically arrange words in numbered vectors with hundreds of dimensions that encode the contexts in which words appear and thus also encode their meanings.
Vectors can be compared, scaled, added, and subtracted. The classic simple example of how this works is that the vector representations of King and Queen bear the same relation to each other as the representations of Man and Woman: subtracting Man from King and adding Woman yields a vector close to Queen.
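The King and Queen relation can be checked by hand with toy vectors. The three-dimensional values below are made up for illustration (the axes are assumed to stand for royalty, male, and female); real embeddings have hundreds of dimensions and the relation holds only approximately.

```python
# Hypothetical toy vectors; the axes are (royalty, male, female).
VECTORS = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
}


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def subtract(u, v):
    """Element-wise difference of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


# The offset between king and queen matches the offset between man and woman.
royal_offset = subtract(VECTORS["king"], VECTORS["queen"])
common_offset = subtract(VECTORS["man"], VECTORS["woman"])

# king - man + woman lands on queen in this toy space.
analogy = add(subtract(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])

print(royal_offset == common_offset)
print(analogy == VECTORS["queen"])
```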
Using embeddings to convert words to vector representations means the similarity of two words can quickly be gauged by comparing the angle between them, a process known as Cosine Similarity.
As with a correlation coefficient, the measurement ranges from -1 to 1: two vectors aligned in the same orientation have a similarity of 1, signifying they mean the same thing, while two orthogonal vectors have a similarity of 0. If two vectors are diametrically opposed, the similarity measurement is -1 (opposite meanings).
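Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch using only the Python standard library follows; the function name `cosine_similarity` and the sample vectors are chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])     # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])         # exactly 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])         # close to -1

print(same_direction, orthogonal, opposite)
```

Because the measure depends only on the angle, scaling a vector (as in the first pair) does not change the result.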
SESAMm uses two primary algorithms to generate embeddings: GloVe and BERT.
GloVe combines the global statistics of matrix factorisation techniques like Latent Semantic Analysis (LSA) with the local context-based learning of word2vec. BERT, by contrast, processes entire sentences rather than individual words in isolation. SESAMm uses BERT for multilingual NLP of its extensive foreign-language text because it was pre-trained on Wikipedia in more than 100 languages.
The virtue of using these NLP techniques is their capacity to identify new keywords not already in the Knowledge Graph, based on vector similarity.
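Keyword discovery by vector similarity can be sketched as a nearest-neighbour search: any term whose embedding lies close enough to a known keyword becomes a candidate. Everything below is hypothetical, including the toy embeddings, the 0.9 threshold, and the function `expand_keywords`; it is a sketch of the general idea rather than SESAMm's pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical 3-dimensional embeddings, invented for this example.
EMBEDDINGS = {
    "electric vehicle": [0.9, 0.1, 0.0],
    "ev": [0.85, 0.15, 0.05],
    "battery": [0.7, 0.3, 0.1],
    "river": [0.0, 0.1, 0.9],
}


def expand_keywords(known, embeddings, threshold=0.9):
    """Return terms not in `known` whose best similarity to a known keyword
    meets the threshold, ordered from most to least similar."""
    candidates = {}
    for word, vec in embeddings.items():
        if word in known:
            continue
        best = max(cosine(vec, embeddings[k]) for k in known)
        if best >= threshold:
            candidates[word] = best
    return sorted(candidates, key=candidates.get, reverse=True)


print(expand_keywords({"electric vehicle"}, EMBEDDINGS))
```

Raising the threshold trades recall for precision: at 0.99 only the closest term in this toy set would survive.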
Generating Actionable Insights
Once text data is analysed, it is aggregated and condensed into metrics providing a daily aggregated view for each entity, highlighting trends at a sentence, article, and entity level.
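The roll-up from sentences to articles to a daily entity view might look like the sketch below. The record layout, the sample scores, and the choice of a simple mean at each level are assumptions made for illustration; the source does not specify how SESAMm weights or combines scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sentence-level records:
# (date, entity, article_id, sentiment score in [-1, 1])
RECORDS = [
    ("2020-06-01", "CompanyA", "art1", -0.5),
    ("2020-06-01", "CompanyA", "art1", -0.25),
    ("2020-06-01", "CompanyA", "art2", 0.5),
    ("2020-06-02", "CompanyA", "art3", -0.75),
    ("2020-06-01", "CompanyB", "art2", 0.25),
]


def aggregate_daily(records):
    """Average sentences into articles, then articles into one row per (date, entity)."""
    # Step 1: group sentence scores by article.
    by_article = defaultdict(list)
    for day, entity, article_id, score in records:
        by_article[(day, entity, article_id)].append(score)

    # Step 2: collect one averaged score per article for each (date, entity).
    by_day = defaultdict(list)
    for (day, entity, _article_id), scores in by_article.items():
        by_day[(day, entity)].append(mean(scores))

    # Step 3: daily sentiment is the mean article score; volume is the article count.
    return {
        key: {"sentiment": mean(article_scores), "volume": len(article_scores)}
        for key, article_scores in by_day.items()
    }


daily = aggregate_daily(RECORDS)
for key in sorted(daily):
    print(key, daily[key])
```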
Below are some use cases of how SESAMm’s data can be applied in real-world financial situations.
Data in Action: SESAMm ESG Scores For Equity Trading
SESAMm’s ESG scores were incorporated into a strategy for trading the Stoxx600 Index.
The long/short strategy delivered a Sharpe ratio of approximately 1 with annualized returns of 6.1% and 5.9%, respectively.
Returns were particularly robust over the 3 years up to 2020: +6.0% in 2018, +7.3% in 2019, and +11.3% in 2020.
ESG Sentiment and Volume as Predictive Indicators: Wirecard
SESAMm’s ESG data can also be used in a discretionary approach.
An example of this is the Wirecard scandal, which broke on June 21, 2020, when newswires reported the major German payment processor had filed for bankruptcy after admitting €1.9 billion ($2.3 billion) of purported escrow deposits simply didn't exist. Not surprisingly, Wirecard’s share price plummeted after the revelation.
The ESG Scores (Sentiment) metrics for Wirecard from SESAMm’s TextReveal® Streams platform, shown in the chart below, show a steady increase in negative sentiment for 'Governance', the most relevant ESG factor in this case. Negative Governance sentiment rose in late March to early April, and again in early June, just before the scandal broke, when it clearly diverged higher than the other two factors.
The relatively high rate-of-change of negative Governance sentiment as it peaked in early June may also have provided an early warning signal.
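A rate-of-change signal of this kind can be sketched as the fractional change of a sentiment series over a fixed window, flagged when it crosses a threshold. The series below is invented for illustration and is not Wirecard data; the function names, the one-step window, and the 50% threshold are likewise assumptions.

```python
def rate_of_change(series, window=1):
    """Fractional change over `window` steps; None where it is undefined."""
    out = []
    for i, value in enumerate(series):
        if i < window or series[i - window] == 0:
            out.append(None)
        else:
            previous = series[i - window]
            out.append((value - previous) / abs(previous))
    return out


def flag_spikes(series, window=1, threshold=0.5):
    """Indices where the rate of change meets or exceeds the threshold."""
    return [
        i
        for i, roc in enumerate(rate_of_change(series, window))
        if roc is not None and roc >= threshold
    ]


# Hypothetical daily negative-sentiment scores (not real data).
negative_governance = [0.10, 0.11, 0.12, 0.20, 0.35, 0.36]

print(rate_of_change(negative_governance))
print(flag_spikes(negative_governance))
```

In this toy series the flags fire on the two days with the sharpest jumps, while the slow drift before them and the plateau after them stay below the threshold.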
SESAMm’s TextReveal® Streams platform can be used in a wide variety of investment use cases and custom projects.
Access the full NLP Best Practices report: https://www.sesamm.com/nlp-best-practices-for-financial-and-esg-insights/
To request a demo or for any other questions, please visit our website www.sesamm.com, contact info@sesamm.com, or call one of our representatives.
Disclaimer
The contents of this document do not constitute an offer or solicitation to buy services or shares in any fund.
The information in this document does not constitute investment advice or an offer to invest or to provide management services and is subject to correction, completion, and amendment. Past performance is not indicative of future results.