
The Mathematics In English

In this article, we speak to Evan Schnidman, CEO of Prattle, about his startup, NLP, and the adoption of technology in the finance sector in the U.S.

Evan Schnidman, CEO of Prattle 2019-02-22

What is Prattle?

Prattle is a research automation firm specializing in the analysis of complex, market-moving language. It started from my academic research. I began my academic career as a game theorist modeling small-group decision making during the financial crisis, and the most interesting small group of decision makers in the world at the time was the Federal Reserve. So, I set out to model how the Fed makes decisions and how those decisions affect financial markets. I ended up backing into a challenging problem. Financial market responses are by definition idiosyncratic; they don’t have a systematic response that is easily modeled, largely because the market response is based on language and linguistic cues. So, I set out to develop a more comprehensive, unbiased means of analyzing so-called Fed speak. I delved into the sentiment analysis literature and discovered that the state of that literature wasn’t advanced enough to analyze complex, nuanced language. I began working with my business partner, Bill MacMillan, to develop a method that could provide comprehensive, unbiased, quantitative analysis of the most nuanced, market-moving language. Since then, we’ve gone on to apply it not just to central bank communications, but to corporate communications, with our technology processing around 5 million documents a day.

What datasets are you looking at?

We only look at primary source content, specifically publicly available content. We’re processing everything in the corporate sphere: earnings calls, regulatory filings, speeches by corporate officers, investor days, press releases and everything on the corporate website. It’s all publicly available information because the goal is to identify what language is affecting markets and, specifically, how the market is likely to respond to each communication.

What started off as a dissertation ended up being a much broader toolkit for analyzing complex language. Methodologically, the real innovation here is that we are mathematically modeling the patterns in language. We are mapping how every word, phrase, sentence and paragraph interrelates, and how that language in turn relates to market response. Instead of identifying a few buzzwords that might be positive or negative, we’re actually screening the data in clusters of language, in much the same way the human brain processes language when we read it, write it or think it. We don’t think in terms of individual words or phrases; we tend to think in strings and clusters of language, so we built a system that does just that.
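To make the distinction concrete, here is a minimal sketch of scoring text over sliding windows of words ("clusters") rather than counting isolated buzzwords. The window size, the example phrases and the weights are illustrative assumptions, not Prattle's actual model, which the interview describes only at a high level.

```python
# Illustrative only: score overlapping word windows so that context,
# not individual buzzwords, drives the document score.
from typing import Dict, List

# Hypothetical phrase weights, as if learned from historical market responses.
PHRASE_WEIGHTS: Dict[str, float] = {
    "raise rates": 0.8,
    "downside risks": -0.6,
    "strong labor market": 0.5,
}

def score_windows(text: str, window: int = 8) -> List[float]:
    """Return one score per overlapping window of `window` words."""
    words = text.lower().split()
    scores = []
    for start in range(max(1, len(words) - window + 1)):
        chunk = " ".join(words[start:start + window])
        scores.append(sum(w for phrase, w in PHRASE_WEIGHTS.items() if phrase in chunk))
    return scores

if __name__ == "__main__":
    sample = "The committee sees a strong labor market but notes downside risks abroad."
    window_scores = score_windows(sample)
    print(f"document score: {sum(window_scores) / len(window_scores):+.2f}")
```

A real system would learn the weights from market reactions and operate on far richer features, but the window-based scoring is the basic idea behind analyzing "strings and clusters" of language.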

In fact, at Prattle we have built and maintain a unique lexicon for every publicly traded company in the U.S. Each company has its own specific products and services, as well as its own set of management personalities and institutional language. Human analysts know this and account for it when examining corporate communications, and so does our system.
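One simple way to picture a per-company lexicon is a small data structure keyed by issuer, so the same phrase can carry different weight for different companies. The class, field names and example tickers below are hypothetical, offered only to illustrate the idea.

```python
# Illustrative only: a per-company lexicon so identical language is
# interpreted in the context of that company's own history and vocabulary.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CompanyLexicon:
    ticker: str
    phrase_weights: Dict[str, float] = field(default_factory=dict)

    def score(self, sentence: str) -> float:
        s = sentence.lower()
        return sum(w for phrase, w in self.phrase_weights.items() if phrase in s)

# The same phrase can be routine for one issuer and unusual for another.
pharma = CompanyLexicon("PHRX", {"pipeline setback": -0.9, "regulatory approval": 0.7})
retail = CompanyLexicon("RTLC", {"pipeline setback": -0.2, "same-store sales growth": 0.6})

print(pharma.score("We experienced a pipeline setback this quarter."))  # -0.9
print(retail.score("We experienced a pipeline setback this quarter."))  # -0.2
```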

What was the financial landscape like in terms of technology when you first started?

We launched the company in late 2014, in large part running a research business where we also sold quantitative signals based on our central bank analysis. When I think about it, this was what the market needed at that point in order to become comfortable with the tools we have today. After speaking with prospective clients, we found that what they were really struggling with was the overwhelming amount of corporate communications as well as central bank communications. They simply could not keep up with the flow of earnings calls during peak earnings season. We ultimately decided to spin out the research business and raise money to scale as a technology company. We spent the bulk of 2017 building out the infrastructure to analyze every primary source corporate communication from every publicly traded company in the U.S.

One of the big challenges for Bill and me has been that when we first started Prattle we simply didn’t realize how far ahead of the curve we were back in 2014. Sentiment analysis was around, but it was terrible. We knew our system was more sophisticated, but I don’t think we realized how far ahead we were with both our NLP engine and our broader data analytics. Coming from academia, I had assumed that the top quant and hedge funds in the world would be at the forefront of technology, but as it turns out, there’s still a pretty steep learning curve for them. They have long-established infrastructures and legacy costs. For us, coming in as new entrants and building from scratch, we were able to act much faster and stay significantly ahead of the curve relative to where the market is technologically.

What were the challenges that you faced setting up Prattle?

The biggest challenge for us is that anybody who had ever looked at a sentiment analysis tool before was immediately predisposed to hate what we were doing. The truth is that the vast majority of sentiment analysis is terrible. We had to explain in every meeting how, technologically, we had really reinvented the wheel. So that was a huge challenge for us, having to explain to people that what we do is methodologically distinct. On top of that, we got a lot of people who would say, for example, "I as a trader might have been wrong, and you as an automated system might have been right, but you were right for the wrong reasons." You hear that a lot. But if our system keeps being right, and that person keeps being wrong, they’re missing out on an opportunity to make money in their chosen profession of trading. Eventually, who is right and who is wrong becomes a hard argument to make, and that’s when people start listening.

What I found is that skepticism doesn’t necessarily mean that they won’t take the meeting. We’ve spent a lot of time talking to prospective clients who had no interest in actually subscribing. A lot of the time is spent educating the market, getting people comfortable with the idea and proving that it’s credible: not just that it works, but that the underlying theory behind it is sound, the methodology is sound, and of course, that the output, both the data and the automated research tools we provide, is helpful, usable and tradable as a signal. I feel the hedge fund community has largely turned the corner and is now utilizing many of these tools. I suspect that a lot of asset management firms and banks will start following suit in the next couple of years.

What was the biggest inefficiency you saw in the market?

What we realized was that the way people consume information is wildly inefficient. There’s too much information out there to rely on search engines and other traditional means of aggregating information. On top of that, if you look at the way traditional equity analysis happens, 20 years ago an analyst covered maybe 10-12 stocks. Now it’s maybe 20 stocks, and it’s rising, in large part due to the European financial regulations that are forcing firms to pay for research with hard dollars. As a result, people are not doing the deep-dive analysis that they used to do; they’re not diving into every earnings call, they’re not reading every transcript, they’re not learning the financials of companies. Either that, or they’re wildly overworked.

As the number of stocks covered by an analyst continues to rise, the way that work has been conducted historically is just not sustainable. They cannot do good research with the same old methods. We saw an opportunity to provide a solution to this pain point: we offer the ability to not have to dial into every earnings call or read every transcript. You get an automated research report that tells you the highlights of the call, who spoke and for what percentage of the time, the sentiment of every speaker, and the most salient remarks (the couple of sentences that are most likely to move stock prices or to show up on traditional analyst reports). Getting that delivered directly to your inbox, or as an alert on your phone, really streamlines your workflow, and that’s where we see this being a real efficiency tool. Beyond that, it allows you to step back for a moment and think about how you make decisions: how are you analyzing this, are you being comprehensive, are you being unbiased? Mitigating human cognitive bias is imperative when conducting investment research, but it is very hard to do via traditional research methods.
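For a sense of what such an automated report might contain, here is a sketch of a data structure mirroring the fields described above. The class and field names are assumptions for illustration, not Prattle's actual schema.

```python
# Illustrative only: a possible shape for an automated earnings-call report
# (highlights, speakers and their share of the call, per-speaker sentiment,
# and the most salient remarks).
from dataclasses import dataclass
from typing import List

@dataclass
class SpeakerSummary:
    name: str
    role: str
    share_of_words: float  # fraction of the call's words spoken by this person
    sentiment: float       # standardized tone score; sign indicates direction

@dataclass
class CallReport:
    ticker: str
    call_date: str
    speakers: List[SpeakerSummary]
    salient_remarks: List[str]  # sentences most likely to move the stock

    def headline(self) -> str:
        avg = sum(s.sentiment for s in self.speakers) / len(self.speakers)
        return f"{self.ticker} {self.call_date}: average speaker sentiment {avg:+.2f}"

report = CallReport(
    ticker="XMPL",
    call_date="2019-02-01",
    speakers=[SpeakerSummary("Jane Doe", "CEO", 0.55, 0.4),
              SpeakerSummary("John Roe", "CFO", 0.30, -0.1)],
    salient_remarks=["We are raising full-year guidance."],
)
print(report.headline())
```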

Can you share a particular use case?

Our initial work on Fed communications sparked a lot of interest. It wasn’t just a novel way of interpreting transcripts; it allowed us to present information mathematically. Fed watchers were no longer left interpreting communications as seeming hawkish or feeling dovish; they could see that our system scored a communication as, say, a standard deviation more hawkish or dovish. It provided some level of precision when analyzing the Fed’s communication strategy. After analyzing hundreds of speeches, papers and interviews, we were able to gauge the Fed’s view on the future path of interest rates. By analyzing the data set of Fed communications from 1998 to 2005, coupled with subsequent market reactions, we were able to score certain strings and clusters of language as hawkish or dovish depending on how bonds, currencies and equity prices reacted.
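The phrasing "a standard deviation more hawkish" suggests raw scores being standardized against a historical distribution. Here is a minimal sketch of that step, with made-up numbers; the actual scoring model behind the raw values is not described in the interview.

```python
# Illustrative only: express a new communication's raw score in standard
# deviations relative to a history of past scores (a simple z-score).
from statistics import mean, stdev

historical_scores = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, -0.2, 0.1]  # hypothetical past scores
new_score = 0.55                                                  # today's statement

z = (new_score - mean(historical_scores)) / stdev(historical_scores)
print(f"{z:+.2f} standard deviations more hawkish than the historical average")
```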

When it comes to earnings calls, we’ve done a huge amount of work analyzing language around risk, taxes, even lawsuits. That said, the most interesting use case is probably the research that Bloomberg showcased in September 2018, where we were able to identify gender representation in U.S. earnings calls. We found that only 8% of the language is attributable to women. Even controlling for role, men tend to speak for longer than women when they hold the same position. This has all kinds of implications for corporate governance.
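The basic calculation behind a finding like "8% of the language is attributable to women" is a word-share tally over speaker turns in a transcript. The sketch below assumes speaker names and genders come from external metadata; the function and example data are illustrative, not Prattle's pipeline.

```python
# Illustrative only: share of transcript words by speaker gender.
from collections import Counter
from typing import Dict, List, Tuple

def word_share(turns: List[Tuple[str, str]],
               speaker_gender: Dict[str, str]) -> Dict[str, float]:
    """turns: (speaker_name, spoken_text) pairs; returns word share per gender."""
    counts: Counter = Counter()
    for speaker, text in turns:
        counts[speaker_gender.get(speaker, "unknown")] += len(text.split())
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}

turns = [
    ("Jane Doe", "Thank you. Revenue grew nine percent year over year."),
    ("John Roe", "Margins expanded on cost discipline and we raised full-year guidance."),
]
print(word_share(turns, {"Jane Doe": "female", "John Roe": "male"}))
```

Controlling for role, as the interview mentions, would mean comparing shares only among speakers holding the same position (for example, CEO to CEO).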

Over the last three years we’ve been analyzing central bank communications from the G10 currency central banks, and using our data we’ve been 98% accurate in projecting the next policy move from those banks. To put that into perspective, same-day futures pricing was only 92% accurate over the same period. That means we are more accurate than the market at predicting the next policy move from central banks. It’s because we can be comprehensive and unbiased, two things human beings struggle with. As a human analyst with a fair amount of experience analyzing central bank policy, I can tell you I haven’t always personally agreed with all the calls we made, and by and large I was personally wrong and our technology was right. Our technology is giving clients the ability to look at things with fresh eyes. The end result is that combining our comprehensive, unbiased technology with a human expert produces better analysis. Moreover, using automated tools to streamline the research process and provide checks on human cognitive biases results in more efficient, better research.

What do you think is in store for NLP in the next couple of years?

Historically, NLP, especially sentiment analysis, has been driven by identifying buzzwords. This caricature of language is only modestly useful, but it highlights a crucial step in teaching machines to understand context, specifically Named Entity Recognition (NER). NER is incredibly important because it allows us to know that an article is about Apple the company as opposed to apple the fruit. Neural network models have done a lot to improve NER, allowing for proper attribution, not just identification, of named entities. As this technology disseminates, people will be able to do higher order modeling of language much more efficiently. This higher order modeling of language patterns is what Prattle has been doing for several years, so it is great to see the market moving in our direction.
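For readers unfamiliar with NER, here is a short example using the open-source spaCy library, chosen purely as a generic illustration (the interview does not say what tooling Prattle uses). It requires installing spaCy and its small English model.

```python
# Illustrative only: off-the-shelf named entity recognition with spaCy.
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple reported record revenue, while apple growers faced a weak harvest.")

for ent in doc.ents:
    # Typically tags "Apple" as an organization; lowercase "apple" is not an entity.
    print(ent.text, ent.label_)
```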

That said, modeling language properly is hard and mathematically intensive, so the broader data science toolkit is increasingly important for NLP. In fact, at Prattle we have built our own backend data science platform that lets us go from static models to fully deployed, production-level code simply by navigating a series of dropdown menus. This has allowed us to iterate on model development and deployment more efficiently than ever before, thereby allowing us to continually improve our analysis. We think these types of data science tools will be vital across all data- and NLP-intensive industries going forward.

What do you think can be improved in terms of the rate of technology adoption in the US?

In my experience, the financial services sector struggles with the rate of tech adoption. It seems that most of the biggest players in the sector do a better job right now of marketing their AI capabilities than actually using basic data science tools. That said, I think this will change over the next 3-5 years. I already see major hedge funds in the US taking the lead, adopting and implementing AI architecture. It will take longer for the banks and asset managers to follow suit, but they will eventually get there.

What is in store for Prattle?

It’s an exciting time for Prattle. We already have comprehensive global coverage of central banks. We’re currently looking to expand our language capabilities and our analysis of equities globally, not just in the U.S. Apart from publicly traded companies, we’re also looking at analyzing privately held companies. There is a lot you can glean from the pre-IPO market, especially this year with the likes of Uber and Airbnb waiting in the wings.

We’re also broadening our scope to cover risk analysis, adapting our NLP capabilities to analyze regulatory filings, legal proceedings, subpoenas, shareholder lawsuits and so on.

And perhaps most interestingly, we are seeing growing interest from corporate clients, specifically investor relations teams that craft financial disclosure language. They’re particularly interested in seeing how something they’ve written would fare in the market based on historical data.