Towards AI

The leading AI community and content platform focused on making AI accessible to all. Check out our new course platform: https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev

Photo by Edward Ma on Unsplash

A Look at Data Augmentation | Towards AI

Unsupervised Data Augmentation

Edward Ma
Published in Towards AI
4 min read · Aug 5, 2019

The more data we have, the better the performance we can achieve. However, annotating a large amount of training data is expensive. Proper data augmentation is therefore a useful way to boost model performance. Xie et al. (2019) proposed Unsupervised Data Augmentation (UDA), which helps us build a better model by leveraging several data augmentation methods.

In natural language processing (NLP), augmenting text is hard because of the high complexity of language. Not every word can be replaced by another; a, an, and the are examples. Also, not every word has a synonym. Changing even a single word can make the context totally different. On the other hand, generating augmented images in computer vision is relatively easy. Even after introducing noise or cropping out a portion of an image, a model can still classify it.

Xie et al. conducted several data augmentation experiments on image classification (AutoAugment) and text classification (back translation and TF-IDF based word replacement). After generating a large enough data set for model training, the authors noticed that the model can easily overfit. Therefore, they introduced Training Signal Annealing (TSA) to overcome it.
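To make the TSA idea more concrete, here is a minimal sketch based on the schedules described in the UDA paper. At each training step, labeled examples that the model already predicts correctly with a probability above a threshold are left out of the supervised loss, and the threshold grows from 1/K (K is the number of classes) to 1 as training proceeds. The function names `tsa_threshold` and `tsa_supervised_loss` are made up for this illustration and are not the authors' code.

```python
import math


def tsa_threshold(step, total_steps, num_classes, schedule="linear"):
    """Return the TSA threshold for the current training step.

    The threshold starts at 1 / num_classes and is annealed towards 1.
    """
    progress = step / total_steps
    if schedule == "linear":
        alpha = progress
    elif schedule == "log":
        # Releases the training signal quickly at the start.
        alpha = 1.0 - math.exp(-progress * 5.0)
    elif schedule == "exp":
        # Releases the training signal mostly near the end.
        alpha = math.exp((progress - 1.0) * 5.0)
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return alpha * (1.0 - 1.0 / num_classes) + 1.0 / num_classes


def tsa_supervised_loss(correct_class_probs, threshold):
    """Average cross-entropy over the labeled examples that are kept.

    An example is kept only if the predicted probability of its correct
    class is still below the threshold.
    """
    kept = [p for p in correct_class_probs if p < threshold]
    if not kept:
        return 0.0
    return sum(-math.log(p) for p in kept) / len(kept)


# Illustrative values only: three labeled examples, binary classification.
probs = [0.95, 0.55, 0.40]
for step in (0, 50, 100):
    eta = tsa_threshold(step, total_steps=100, num_classes=2)
    print(step, round(eta, 3), round(tsa_supervised_loss(probs, eta), 3))
```

Early in training only the hardest examples contribute to the loss, so the model cannot simply memorize examples it already classifies confidently.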

Augmentation Strategies

This section introduces three data augmentation methods from the computer vision (CV) and natural language processing (NLP) fields.

AutoAugment for Image Classification

AutoAugment was introduced by Google in 2018. It is a way to augment images automatically. Unlike traditional image augmentation libraries, AutoAugment is designed to find the best policy for manipulating data automatically.

You may visit here for model and implementation.

Results generated by AutoAugment (Cubuk et al., 2018)
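The sketch below illustrates how an augmentation policy is applied once it has been found. It does not perform the policy search itself. In AutoAugment a policy is a set of sub-policies, each made of operations with a probability and a magnitude, and one sub-policy is picked at random for each image. The toy image (a nested list of pixel values), the three operations, and `TOY_POLICY` are all hand-written assumptions for illustration, not a learned policy.

```python
import random


def flip_horizontal(image, magnitude):
    """Mirror each row; magnitude is unused for this operation."""
    return [row[::-1] for row in image]


def brighten(image, magnitude):
    """Add magnitude to every pixel, clipped to the 0-255 range."""
    return [[min(255, px + magnitude) for px in row] for row in image]


def invert(image, magnitude):
    """Invert pixel intensities; magnitude is unused for this operation."""
    return [[255 - px for px in row] for row in image]


# Hypothetical policy: each sub-policy is a list of
# (operation, probability, magnitude) tuples.
TOY_POLICY = [
    [(flip_horizontal, 0.5, 0), (brighten, 0.8, 40)],
    [(invert, 0.3, 0), (flip_horizontal, 1.0, 0)],
]


def apply_policy(image, policy, rng):
    """Pick one sub-policy at random and apply its operations in order."""
    sub_policy = rng.choice(policy)
    for operation, probability, magnitude in sub_policy:
        if rng.random() < probability:
            image = operation(image, magnitude)
    return image


image = [[0, 64, 128], [32, 96, 255]]
augmented = apply_policy(image, TOY_POLICY, random.Random(0))
print(augmented)
```

Because each operation fires only with its own probability, the same image can yield different augmented versions across epochs.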

Back translation for Text Classification

Back translation is a method that leverages a translation system to generate data. Given that we have a model for translating English to Cantonese and…
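As a rough sketch of the round trip, the code below translates a sentence into a pivot language and back again. The two dictionaries are hypothetical stand-ins for real translation models, and the pivot tokens `P1` and `P2` are placeholders rather than real words. The only point is to show that the round trip can return a paraphrase of the input.

```python
# Hypothetical forward "model": several English words share one pivot token.
TOY_FORWARD = {"quick": "P1", "fast": "P1", "automobile": "P2", "car": "P2"}

# Hypothetical backward "model": each pivot token maps to one English word.
TOY_BACKWARD = {"P1": "fast", "P2": "car"}


def translate(text, table):
    """Word-by-word lookup; unknown words are passed through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


def back_translate(text, forward_table, backward_table):
    """Translate to the pivot language and back to the source language."""
    pivot_text = translate(text, forward_table)
    return translate(pivot_text, backward_table)


original = "the quick automobile"
paraphrase = back_translate(original, TOY_FORWARD, TOY_BACKWARD)
print(paraphrase)  # the fast car
```

With real translation models in place of the lookup tables, the paraphrase keeps the original label and can be used as an additional training example.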

Written by Edward Ma

Focus in Natural Language Processing, Data Science Platform Architecture. https://makcedward.github.io/
