Jumpstart your DCASE Challenge 2021 using MATLAB

Posted by Johanna Pingel, June 28, 2021


The following post is from Brian Hemmat, Audio Signal Processing Developer at MathWorks.

The Detection and Classification of Acoustic Scenes and Events (DCASE) community organizes a yearly workshop and series of events that advance the state of the art in computational scene and event analysis by bringing together researchers from both academic and industrial backgrounds.

Each year, new and updated datasets and competitions are released that explore different applications, requirements, and goals. This year, the DCASE 2021 Task 1a challenge is to perform low-complexity acoustic scene classification that is robust to various recording devices, such as studio-quality microphones and those on smartphones and video cameras. The goal is to classify audio into one of 10 acoustic scenes such as airport, metro station, traveling by a tram, or public square. Samples were collected from cities like Prague, Paris, and Barcelona.

Creating Baseline in MATLAB

The official baseline for Task 1a was released in Python. It uses TensorFlow for deep learning and performs preprocessing with a provided DCASE utility toolbox.

I reimplemented the baseline in MATLAB. The MATLAB implementation is contained within a single script, which makes it easy for non-experts to explore the data, understand the baseline implementation, and modify it for submission. Audio Toolbox in MATLAB offers functions and apps to extract audio features (audioFeatureExtractor) and augment data (audioDataAugmenter), making it easy to explore modifications to the system.
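As a rough illustration of the feature extraction step, here is a minimal sketch that computes a log-mel spectrogram with audioFeatureExtractor. The sample rate, window length, overlap, and the random placeholder signal are assumptions chosen for illustration, not the settings used in the baseline script.

```matlab
% Minimal sketch: log-mel features with audioFeatureExtractor (Audio Toolbox).
% All parameter values below are illustrative assumptions.
fs = 44100;                      % assumed sample rate in Hz
audioIn = randn(fs,1);           % placeholder: one second of noise instead of a real recording

afe = audioFeatureExtractor( ...
    'SampleRate',fs, ...
    'Window',hamming(2048,'periodic'), ...
    'OverlapLength',1024, ...
    'melSpectrum',true);         % enable the mel spectrum feature

features = extract(afe,audioIn); % rows are analysis hops, columns are mel bands
logMel = log10(features + eps);  % log compression; eps avoids log of zero

fprintf('Extracted %d hops x %d mel bands\n',size(logMel,1),size(logMel,2));
```

Swapping in a real recording only requires replacing audioIn with the output of audioread and setting fs to the file's sample rate.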

Part of the challenge is to develop a model with a 128 KB upper bound for non-zero parameters. This may be accomplished by developing a small model to begin with, by pruning a model, or by quantizing a model from the standard 32-bit floating point used for training to a smaller number of bits. This MATLAB baseline code leverages the dlquantizer object and quantizes the network to use 8-bit integers with the Deep Learning Toolbox Model Quantization Library.
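To see how the data type affects the size budget, the arithmetic can be sketched in a few lines. This assumes that 128 KB means 128 x 1024 bytes, and the parameter count used at the end is a hypothetical value, not the size of the actual baseline.

```matlab
% Minimal sketch: how many non-zero parameters fit in the size budget
% for different data types. Assumes 128 KB = 128*1024 bytes.
sizeLimitBytes = 128*1024;
typeNames    = ["float32","float16","int8"];
bitsPerParam = [32 16 8];

% Largest number of non-zero parameters allowed for each data type
maxParams = floor(sizeLimitBytes*8 ./ bitsPerParam);
for k = 1:numel(typeNames)
    fprintf('%-8s (%2d bits): up to %d non-zero parameters\n', ...
        typeNames(k),bitsPerParam(k),maxParams(k));
end

% Check a hypothetical model against the budget
numNonZeroParams = 46000;                            % hypothetical parameter count
modelBytes = numNonZeroParams .* bitsPerParam / 8;   % size in bytes per data type
fitsBudget = modelBytes <= sizeLimitBytes;           % logical flag per data type
```

Under these assumptions, moving from 32-bit floats to 8-bit integers quadruples the number of parameters that fit in the same budget.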

Note: if you don’t already have access to MATLAB, Deep Learning Toolbox, and Audio Toolbox, you can get a free 30-day trial.

Quantizing the Baseline

Applying quantization is a straightforward task using dlquantizer. To use it, you specify the network you want to quantize and the execution environment, and then calibrate the object with calibration data.

quantObj = dlquantizer(net,'ExecutionEnvironment','GPU');

The dlquantizer object requires image datastores to perform calibration. Wrap the features and labels in augmentedImageDatastore objects.

augimdsTrain = augmentedImageDatastore([numFeatures,numHops],trainFeatures,trainLabels);
augimdsTest = augmentedImageDatastore([numFeatures,numHops],testFeatures,testLabels);

Use the training set to calibrate the dlquantizer object.

calResults = calibrate(quantObj,augimdsTrain);

One tip to keep in mind: Currently, dlquantizer does not support audioDatastore input. To use it with audio-based data (in this case, mel spectrograms), you must place the training data in memory and then wrap it in an augmentedImageDatastore, as shown in the code above. You then specify the augmentedImageDatastore as the calibration data when calibrating the network.
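The in-memory wrapping described above can be tried in isolation with placeholder data. This is a minimal sketch: the array sizes, the number of classes, and the random features are assumptions for illustration, and it requires Deep Learning Toolbox.

```matlab
% Minimal sketch: wrap in-memory spectrogram-like features in an
% augmentedImageDatastore. Sizes and data are illustrative placeholders.
numFeatures = 40;    % assumed number of mel bands (image height)
numHops     = 50;    % assumed number of time frames (image width)
numObs      = 8;     % assumed number of examples

% Features as a height x width x channels x observations array
trainFeatures = rand(numFeatures,numHops,1,numObs,'single');
% One categorical label per observation (three hypothetical classes)
trainLabels = categorical(randi(3,numObs,1));

augimdsTrain = augmentedImageDatastore([numFeatures,numHops], ...
    trainFeatures,trainLabels);
augimdsTrain.MiniBatchSize = 4;

% Each read returns a table holding one mini-batch of inputs and responses
batch = read(augimdsTrain);
fprintf('Datastore holds %d observations; first batch has %d rows\n', ...
    augimdsTrain.NumObservations,height(batch));
```

Because the output size passed to augmentedImageDatastore matches the feature dimensions, the datastore passes the spectrograms through without resizing them.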

Mel spectrogram provides visualization for audio data

One advantage of using dlquantizer is that it quantizes to int8, a low-precision data type that can be deployed to many embedded systems. It does so with minimal loss of accuracy, producing a network that performs comparably to the Python baseline, which was quantized to float16.

The Deep Network Quantizer app in action. You can use the app version of dlquantizer to quickly see which layers are quantized, along with the dynamic range of the weights, biases, and activations, based on the dataset.
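To make the link between dynamic range and 8-bit quantization concrete, here is a minimal sketch of a generic symmetric int8 scheme applied to placeholder weights. This is a textbook illustration under my own assumptions, not the algorithm that dlquantizer uses internally, and the weights are random values rather than weights from the baseline network.

```matlab
% Minimal sketch: generic symmetric int8 quantization of a weight vector.
% Illustration only; this is NOT dlquantizer's internal scheme.
rng(0);                          % reproducible placeholder weights
w = 0.5*randn(1000,1);           % hypothetical float weights

% The dynamic range of the data sets the scale factor
scale = max(abs(w))/127;         % map the largest magnitude to 127

q    = int8(round(w/scale));     % quantize: 8-bit integer codes
wHat = double(q)*scale;          % dequantize: approximate float values

% Rounding error is bounded by half a quantization step
maxErr = max(abs(w - wHat));
fprintf('Scale: %.6f, max abs error: %.6f (bound %.6f)\n', ...
    scale,maxErr,scale/2);
```

A wider dynamic range forces a larger scale factor and therefore a coarser step, which is why inspecting the ranges of weights, biases, and activations helps explain where accuracy may be lost.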

Tools to Get Started

The goal of this baseline code is to inspire your solution to this interesting challenge in MATLAB. Extended capabilities in Audio Toolbox are available to jumpstart your design exploration. The result is a smaller, more contained baseline that is easier to improve upon. While the Python baseline quantizes to float16, int8 provides a smaller model with faster inference.

The baseline is on GitHub and you can download a free trial of MATLAB.

We hope that this baseline encourages new members to join the DCASE community, participate in the yearly competitions, and advance the state of the art.

Download or fork the code and get started! Let me know if you have any questions in the comments below.