Photo by Sz. Marton on Unsplash

NLP News Cypher | 04.12.20

Down the Rabbit Hole

Ricky Costa · Published in Towards AI · Apr 13, 2020


I called it RABBIT. My demo is finito. We built an app for those who are interested in streaming APIs, online inference, and transformers in production.

Update 04.13.20: rabbit.quantumstat.com

The web app (of which I’ve shown a glimpse in the past) attempts a very difficult balancing act. One of the hardest bottlenecks in deep learning today is leveraging state-of-the-art NLP models (transformers, which are RAM-expensive) while still being able to deploy them in production without making your server or bank account explode. I think I may have figured it out, at least for this app 😎.

What is it?

RABBIT streams tweets from dozens of financial news sources (the usual suspects: Bloomberg, CNBC, WSJ, and more) and runs two classifiers over them in real time!

What am I classifying? The 1st model classifies 21 topics in finance:

declassified

The 2nd model classifies whether the tweet is bullish, bearish, or neutral in stance. What does this mean? If you are an investor/trader holding gold and a tweet mentions that the price of gold is up, it would be labeled bullish; the inverse, bearish; and if you don’t care either way, neutral. Ideally, this app would be personalized to an individual user, but because what you will see is a demo for a general audience, I tried to generalize as much as possible with this classification schema.

As a result, the classification assumes first-order logic only; it does not account for n-order effects. For example, if you hold oil and oil goes up in price, that is labeled bullish, even though the reason oil went up might be a geopolitical conflict that could weigh on the broader market (bearish) — that would be a hypothetical n-order effect.

What does it run on?

I’ve architected the back-end with the option of expanding both compute and connections if required. The transformers are distilled versions of RoBERTa fine-tuned on over 10K tweets from a custom dataset. Currently, I’m leveraging message queues and an asynchronous framework to push tweets out to the user. Shout-out to Adam King for sparking the idea during one of our digital fireside chats. (FYI, you can check out his well-known GPT-2 model here: talktotransformer.com)

RABBIT uses a WebSocket connection for its streaming capabilities and runs on only 4 CPU cores. While this compute may seem small, married with this architecture it’s actually lightning fast (even while doing online inference with two transformers!). Since the WebSockets are connected to the browser and data serving is uni-directional, scaling to the client side is fairly robust.
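To make that pattern concrete, here’s a minimal sketch (not RABBIT’s actual code): an asyncio queue feeding a WebSocket handler that runs two 🤗 pipelines per tweet. The checkpoints, port, and stub stream are all placeholders.

```python
import asyncio
import json

import websockets
from transformers import pipeline

# Placeholder checkpoints: RABBIT's fine-tuned topic/stance models aren't public,
# so a stock distilled RoBERTa stands in for both (its labels will be meaningless).
topic_clf = pipeline("text-classification", model="distilroberta-base")
stance_clf = pipeline("text-classification", model="distilroberta-base")

queue: asyncio.Queue = asyncio.Queue()

async def fake_stream():
    """Stub for the Twitter streaming client: drop a sample tweet on the queue every second."""
    while True:
        await queue.put("Gold prices climb as investors seek safe havens.")
        await asyncio.sleep(1)

async def handler(websocket, path=None):
    """Serve one browser client: classify each queued tweet and push the result as JSON."""
    while True:
        text = await queue.get()
        # Run blocking model inference off the event loop so other clients keep getting served.
        topic = await asyncio.to_thread(topic_clf, text)
        stance = await asyncio.to_thread(stance_clf, text)
        await websocket.send(json.dumps({
            "tweet": text,
            "topic": topic[0]["label"],
            "stance": stance[0]["label"],
        }))

async def main():
    asyncio.create_task(fake_stream())
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # serve forever

if __name__ == "__main__":
    asyncio.run(main())
```

A real deployment would fan each tweet out to every connected client (this single shared queue hands a tweet to only one), but the shape of the pipeline — queue in, classify, push over the socket — is the same.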

Errata

Very recently there’s been some domain shift: the coronavirus has altered the news cycle, which has decreased the accuracy of the models. I will continually add more data to mitigate this; for now, it performs reasonably well.

Fin

I will officially release it tomorrow, April 13th. Check my Twitter for the update. FYI, the app is best experienced during weekday trading hours when the stock market is open, so you can see it stream really fast (even though technically you can check it out anytime you want).

Proud of this work. It’s cheap, it’s powerful and it’s fast.

Possible future approaches include pretraining a language model from scratch and then fine-tuning it on the custom dataset mentioned above. It would also be nice to surface more data in a dashboard alongside a live stock-market stream.

How was your week? 😎

This Week:

Bare Metal

Colab of the Week, on Self-Attention

Hugging Electra

Colbert AI

A Very Large News Dataset

A Token of Appreciation

Dataset of the Week: X-Stance

Bare Metal

AI chip makers are betting that NLP models will keep getting bigger and bigger even as their chips get smarter. The metal peeps say they want to isolate NN inputs to individual cores as opposed to batching them. The consequence is that only the neurons in your network that “need” to fire will do so, since they are isolated:

“Companies are fixated on the concept of ‘sparsity,’ the notion that many neural networks can be processed more efficiently if redundant information is stripped away. Lie observed that there is ‘a large, untapped potential for sparsity’ and that ‘neural networks are naturally sparse.’”

With this knowledge, the new AI chips don’t need to train as long and can drop out of training earlier on. 🧐
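As a toy illustration of what “sparsity” means here (not tied to any particular chip), here’s the fraction of activations a ReLU layer zeroes out — the portion of work sparsity-aware hardware can in principle skip:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected layer: a batch of inputs, weights, and a ReLU nonlinearity.
x = rng.standard_normal((32, 512))            # batch of 32 input vectors
w = rng.standard_normal((512, 1024)) * 0.05   # layer weights
pre_act = x @ w
post_act = np.maximum(pre_act, 0.0)           # ReLU zeroes out roughly half the units

sparsity = np.mean(post_act == 0.0)
print(f"Fraction of inactive neurons: {sparsity:.2%}")
# A core that skips the zeroed activations does proportionally less work,
# which is the efficiency win the chip makers are chasing.
```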

Colab of the Week, on Self-Attention

I’ll let you explore this one:
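In the meantime, here’s a bare-bones NumPy version of scaled dot-product self-attention — a sketch of the core operation, not taken from the linked notebook:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence (no masking)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # token-to-token similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                               # each output mixes all value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
x = rng.standard_normal((seq_len, d_model))          # 5 token embeddings
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # -> (5, 16)
```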

Hugging Electra

The new method for training language models with relatively low compute is now in the Hugging Face library. You may remember ELECTRA’s eye-catching performance 👇

It didn’t take long for developers to leverage ELECTRA: the Simple Transformers library, which is built on top of 🤗's Transformers, already has it:
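If you just want to poke at a pretrained checkpoint directly in 🤗 Transformers, something like this works (the “small” discriminator and the toy sentence are arbitrary choices of mine):

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

# Public "small" discriminator checkpoint; larger ones exist if you have the compute.
name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name)

# ELECTRA's discriminator scores every token: original, or swapped in by the generator?
sentence = "the chef ate the meal"   # imagine "ate" replaced an original "cooked"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze()

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, score in zip(tokens, logits):
    print(f"{token:>8}  {'replaced' if score > 0 else 'original'}")
```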

Colbert AI

GPT-2 strikes back with a bit of humor. Developers Abbas Mohammed and Shubham Rao created this model by extracting monologues from the Late Show’s video captions on YouTube. They provided a nice Colab notebook with excellent documentation for anyone who wants to do something similar. (you may need to get your own dataset 😢)
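Their notebook walks through the full fine-tuning loop; as a flavor of the underlying approach, here’s a minimal sketch that generates text with stock GPT-2 through 🤗's pipeline (not their fine-tuned Colbert model):

```python
from transformers import pipeline, set_seed

# Stock GPT-2 via the text-generation pipeline; Colbert AI fine-tunes this same
# architecture on monologue transcripts before generating.
generator = pipeline("text-generation", model="gpt2")
set_seed(42)

prompt = "Ladies and gentlemen, welcome back to the show. Tonight,"
for out in generator(prompt, max_length=60, num_return_sequences=2):
    print(out["generated_text"])
    print("---")
```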

Colab:

A Very Large News Dataset

Found this gem on Reddit. With the average developer getting closer and closer to training their own language models from scratch, super big datasets will grow more popular among NLP developers in the long term. This dataset holds 2.7 million news articles from the past 4 years:

A Token of Appreciation

A relatively new tokenizer came to my attention this week, boasting its speed versus other well-known tokenizers (it was written in C++). If you want to compare how it fares against the others (Hugging Face, SentencePiece, and fastBPE), check out their benchmark results:

Main Repo:

Benchmarks:
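And if you want a rough, do-it-yourself comparison on your own text, a timing harness along these lines works with any Python-accessible tokenizer (here using 🤗's fast RoBERTa tokenizer as an example; the corpus and repeat count are arbitrary):

```python
import time
from transformers import AutoTokenizer

# Rough harness: swap in whichever tokenizer you want to time on your own corpus.
tokenizer = AutoTokenizer.from_pretrained("roberta-base", use_fast=True)
texts = ["The price of gold is up 2% on the day."] * 10_000

start = time.perf_counter()
tokenizer(texts)
elapsed = time.perf_counter() - start
print(f"Tokenized {len(texts):,} texts in {elapsed:.2f}s "
      f"({len(texts) / elapsed:,.0f} texts/sec)")
```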

Dataset of the Week: X-Stance

What is it?

“The x-stance dataset contains more than 150 political questions, and 67k comments written by candidates on those questions.” The comments are in German, French, and Italian.

Sample:

Where is it?

Every Sunday we do a weekly round-up of NLP news and code drops from researchers around the world.

If you enjoyed this article, help us out and share with friends!

For complete coverage, follow our Twitter: @Quantum_Stat

www.quantumstat.com


Subscribe to the NLP Cypher newsletter for the latest in NLP & ML code/research. 🤟