Data Augmentation for Image Classification Applications Using Deep Learning

Posted by Johanna Pingel, August 22, 2019

This post is from Oge Marques, PhD and Professor of Engineering and Computer Science at FAU. Oge is an ACM Distinguished Speaker, book author, and 2019-20 AAAS Leshner Fellow. He also happens to be a MATLAB aficionado and has been using MATLAB in his classroom for more than 20 years. You can also follow him on Twitter (@ProfessorOge).

The popularization of deep learning for image classification and many other computer vision tasks can be attributed, in part, to the availability of very large volumes of training data. The general consensus in the machine learning and deep learning community is that, all other things being equal, the more training data you have, the better your model (and, consequently, its performance on the test set) will be.

There are many cases, however, in which you might not have a large enough dataset to train a model that learns the desired classes in a way that generalizes well to new examples, i.e., that does not overfit to the training set. This is often due to the difficulty and cost of acquiring and labeling images to build large training sets, whether cost is expressed in terms of dollars, human effort, computational resources, or the time consumed in the process.

In this blog post we will focus on a technique called data augmentation, which is used to augment the existing dataset in a way that is more cost-effective than further data collection. In the case of image classification applications, data augmentation is usually accomplished using simple geometric transformation techniques applied to the original images, such as cropping, rotating, resizing, translating, and flipping, which we'll discuss in more detail below.

The effectiveness and benefits of data augmentation have been extensively documented in the literature: it has been shown that data augmentation can act as a regularizer in preventing overfitting in neural networks [1, 2] and improve performance in imbalanced class problems [3]. Moreover, the winning approaches to highly visible challenges and competitions in image classification (such as ImageNet) throughout the years have used data augmentation. This has also motivated recent work on the development of methods to allow a neural network to learn augmentations that best improve the classifier [4].

Data Augmentation Implementation in MATLAB

Image data augmentation can be achieved in two ways [5]:

Offline augmentation: the transformations are applied to the images (potentially using MATLAB's batch image processing capabilities [6]) and the results are saved to disk, thereby increasing the size of the dataset by a factor equal to the number of transformations performed. This can be acceptable for smaller datasets.
Online augmentation, or augmentation on the fly: the transformations are applied to the mini-batches that are fed to the model during training. This method is preferred for larger datasets, to avoid a potentially explosive increase in storage requirements.
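
As a rough illustration of the offline route, the sketch below writes one left-right flipped copy of each image to disk and then builds a datastore over the new files. To keep it runnable, it assumes the digit images that ship with Deep Learning Toolbox as the source and a hypothetical output folder under tempdir; neither is part of the flower example used later in this post. The flip is there only to show the mechanics, so pick transformations that preserve the labels of your own data.

```matlab
% Offline augmentation sketch: write a flipped copy of each image to disk.
% Assumption: the source is a small subset of the digit images that ship
% with Deep Learning Toolbox, used only to keep the example runnable.
srcFolder = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
dstFolder = fullfile(tempdir,'DigitDataset_flipped');  % hypothetical output folder

imds = imageDatastore(srcFolder, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
imds = splitEachLabel(imds,2);      % keep 2 images per class for a quick demo

for k = 1:numel(imds.Files)
    img = readimage(imds,k);
    flipped = flip(img,2);          % left-right flip (one transformation)

    % Mirror the class subfolder so labels can still come from folder names
    classFolder = fullfile(dstFolder,char(imds.Labels(k)));
    if ~exist(classFolder,'dir')
        mkdir(classFolder);
    end

    [~,name,ext] = fileparts(imds.Files{k});
    imwrite(flipped,fullfile(classFolder,[name '_flipLR' ext]));
end

% Datastore over the newly written images
augmentedImds = imageDatastore(dstFolder, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
```

Each additional transformation applied this way adds another full copy of the images on disk, which is why the offline approach is usually reserved for smaller datasets.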

MATLAB provides an elegant and easy-to-use solution for online image data augmentation, which consists of two main components:

augmentedImageDatastore: generates batches of new images after preprocessing the original training images with operations such as rotation, translation, shearing, resizing, or reflection (flipping).
imageDataAugmenter: configures the selected preprocessing operations for image data augmentation.

The following image data augmentation options are available in MATLAB using the imageDataAugmenter object:

Rotation
Reflection in the X direction (left-right flip) or the Y direction (upside-down flip)
Horizontal and vertical scaling
Horizontal and vertical shearing
Horizontal and vertical translation
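
For reference, the options listed above correspond to name-value pairs of imageDataAugmenter. The sketch below sets all of them at once; the numeric ranges are illustrative assumptions rather than recommended values, and in practice you would normally enable only the transformations that make sense for your images.

```matlab
% One name-value pair per option listed above; all ranges are illustrative.
augmenter = imageDataAugmenter( ...
    'RandRotation',     [-20 20], ...   % rotation angle, in degrees
    'RandXReflection',  true, ...       % random left-right flip
    'RandYReflection',  true, ...       % random upside-down flip
    'RandXScale',       [0.8 1.2], ...  % horizontal scale factor
    'RandYScale',       [0.8 1.2], ...  % vertical scale factor
    'RandXShear',       [-10 10], ...   % horizontal shear angle, in degrees
    'RandYShear',       [-10 10], ...   % vertical shear angle, in degrees
    'RandXTranslation', [-5 5], ...     % horizontal shift, in pixels
    'RandYTranslation', [-5 5]);        % vertical shift, in pixels
```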

A few points are worth mentioning here:

When you initialize the imageDataAugmenter variable, you can choose one or more options, e.g., only X and Y reflection and horizontal and vertical scaling, as in the code snippet below.

imageAugmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...
    'RandXScale',[1,2], ...
    'RandYReflection', true, ...
    'RandYScale',[1,2]);

The values that you pass as parameters to some of the options (e.g., [1 2] for the X and Y scaling above) are meant to represent a range of values from which a random sample will be picked during the preprocessing step, if that transformation is applied to an image.
There is also an option in imageDataAugmenter to supply a function handle that returns the value of a particular parameter each time it is called, for example a random rotation angle between -5 and 5 degrees (see code snippet below).

imageAugmenter = imageDataAugmenter('RandRotation',@() -5 + 10 * rand);
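
The function handle does not have to sample from a continuous range. As a small variation on the snippet above, the sketch below restricts the rotation to multiples of 90 degrees. The handle name pickAngle and the variable sampledAngles are hypothetical, introduced only for this illustration.

```matlab
% Function handle that returns one rotation angle per call:
% 0, 90, 180, or 270 degrees, each equally likely
pickAngle = @() 90 * randi([0 3]);
imageAugmenter = imageDataAugmenter('RandRotation',pickAngle);

% Calling the handle directly shows the kind of values it produces
sampledAngles = arrayfun(@(k) pickAngle(), 1:8);
```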

There are two ways to access the actual preprocessed images (for inspection and display, for example):

Starting in R2018a, there are read/preview methods on augmentedImageDatastore that allow you to obtain an example batch of images (see code snippet below, which produces a tiled image such as the one in Fig. 1, using the Flower Classification example [8]).

imageAugmenter = imageDataAugmenter('RandRotation',@() -20+40*rand);

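% imageSize and imds must already exist in the workspace
% (e.g., from the Flower Classification example [8])
augImds = augmentedImageDatastore(imageSize,imds, ...
    'DataAugmentation',imageAugmenter);

% Preview augmentation results
batchedData = preview(augImds);
imshow(imtile(batchedData.input))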

Starting in R2018b, a new method (augment) was added to imageDataAugmenter, so that the object now serves two purposes: it can be used as a standalone function object as well as a configuration object for augmentedImageDatastore (see code snippet below, which might** produce a left-right flipped image such as the one in Fig. 2).

In = imread(which('peppers.png'));
Aug = imageDataAugmenter('RandXReflection',true);
Out = augment(Aug,In);
figure, montage({In, Out})

** Since the flip operation is randomly applied to the input image, you might have to run the snippet several times until an actual flip occurs.

Fig 1. Preview of augmented images processed with random rotation between -20 and 20 degrees.

Fig 2. Example of random reflection ('RandXReflection') around the vertical axis.

The augmentedImageDatastore and the imageDataAugmenter integrate nicely with the neural network training workflow, which consists of the following steps [7]:

Choose your training images, which you can store as an ImageDatastore, an object used to manage a collection of image files, where each individual image fits in memory, but the entire collection of images does not necessarily fit. This functionality, available since R2015b, was designed to read batches of images for faster processing in machine learning and computer vision applications.
Select and configure the desired image preprocessing options (for example, range of rotation angles, in degrees, or range of horizontal translation distances, in pixels, from which specific values will be picked randomly) and create an imageDataAugmenter object initialized with the proper syntax.
Create an augmentedImageDatastore, specifying the training images, the size of output images, and the imageDataAugmenter to be used. The size of output images must be compatible with the size expected by the input layer of the network.
Train the network, specifying the augmentedImageDatastore as the data source for the trainNetwork function. For each iteration of training, the augmented image datastore generates one mini-batch of training data by applying random transformations to the original images in the underlying data from which augmentedImageDatastore was constructed (see Fig. 3).

Fig 3. Typical workflow for training a network using an augmented image datastore (from [7]).
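
The four steps above can be sketched end to end as follows. To stay self-contained, this sketch assumes the digit images that ship with Deep Learning Toolbox instead of the flower dataset, and it uses a deliberately small network with illustrative augmentation ranges and training options; treat it as a minimal outline of the workflow, not as the code behind the File Exchange example.

```matlab
% Step 1: training images (digit images shipped with Deep Learning Toolbox)
digitFolder = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitFolder, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

% Step 2: configure the random transformations (illustrative ranges)
imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-10 10], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3]);

% Step 3: augmented datastore; the output size must match the input layer
imageSize = [28 28 1];
augImds = augmentedImageDatastore(imageSize,imds, ...
    'DataAugmentation',imageAugmenter);

% A deliberately small network, used only to make the sketch complete
layers = [
    imageInputLayer(imageSize)
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(numel(categories(imds.Labels)))
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MaxEpochs',2, ...
    'Shuffle','every-epoch', ...
    'Verbose',false);

% Step 4: train on the augmented datastore
net = trainNetwork(augImds,layers,options);
```

Because the transformations are applied as each mini-batch is read, the network sees differently perturbed versions of the same images in every epoch, while nothing extra is written to disk.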

For a complete example of an image classification problem using a small dataset of flower images, with and without image data augmentation, check my MATLAB File Exchange contribution [8].

To summarize, data augmentation can be a useful technique when dealing with less-than-ideal amounts of training data. The references below link to materials with more details.

Thanks again to Oge for going in-depth into data augmentation. Do you have any questions for Oge? Leave a comment below!

References

[1] P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in Seventh International Conference on Document Analysis and Recognition (ICDAR 2003), vol. 2. IEEE Computer Society, 2003, pp. 958-958.
[2] D. C. Ciresan, U. Meier, L. M. Gambardella, and J. Schmidhuber, "Deep, big, simple neural nets for handwritten digit recognition," Neural computation, vol. 22, no. 12, pp. 3207-3220, 2010.
[3] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, no. 1, pp. 321-357, 2002.
[4] J. Wang and L. Perez, "The Effectiveness of Data Augmentation in Image Classification using Deep Learning", 2017. https://arxiv.org/pdf/1712.04621.pdf
[5] B. Raj, Data Augmentation | How to use Deep Learning when you have Limited Data - Part 2. https://medium.com/nanonets/how-to-use-deep-learning-when-you-have-limited-data-part-2-data-augmentation-c26971dc8ced
[6] MathWorks. "Batch Processing Using the Image Batch Processor App". https://www.mathworks.com/help/images/batch-processing-using-the-image-batch-processor-app.html
[7] MathWorks. "Preprocess Images for Deep Learning". https://www.mathworks.com/help/nnet/ug/preprocess-images-for-deep-learning.html
[8] O. Marques, "Image classification using data augmentation version 1.1.0", MATLAB Central File Exchange, 2019. https://www.mathworks.com/matlabcentral/fileexchange/68728-image-classification-using-data-augmentation