For today’s blog post, Liping Wang joins us to talk about the IEEE SP Cup 2025 and how to kick off your project on deepfake face detection in MATLAB with the starter code. Over to you, Liping…
Hello, future innovators and AI enthusiasts! Are you ready to dive into the world of deepfakes and make your mark in the field of signal processing? We at MathWorks invite you to participate in the 2025 IEEE Signal Processing Cup challenge, “Deepfake Face Detection In The Wild” (DFWild-Cup). It is a great chance to tackle real-world problems using cutting-edge AI techniques.
Why Is This Challenge Important?
With the rise of synthetic data generation, deepfakes have become a significant threat, capable of manipulating public opinion and even leading to identity theft. This challenge is your opportunity to develop methods to identify whether facial images are real or fake, using data captured in diverse, real-world scenarios.
What’s in It for You?
Participating in this challenge not only lets you apply your skills to a pressing global issue but also gives you a chance to compete for a US$5,000 grand prize at IEEE ICASSP 2025, the world’s largest technical conference on signal processing. Imagine presenting your work on such a prestigious platform!
Ready to Get Started?
We’ve prepared MATLAB starter code to help you kick off your project. Here’s a quick guide on how to set up your environment and start experimenting with deepfake detection! To request your complimentary MATLAB license and access additional learning resources, please visit our website. You can also find our self-paced online courses on MATLAB and AI at MATLAB Academy.
Load Data
First things first: register your team, then follow the instructions to download the training and validation datasets. Store the archives in a subfolder named datasetArchives in your current directory.
The code below will help you unzip the archives and organize the datasets into ‘real’ and ‘fake’ categories:
datasetArchives = fullfile(pwd,"datasetArchives");
datasetsFolder = fullfile(pwd,"datasets");
% Extract only once: skip this step if the datasets folder already exists.
if ~exist(datasetsFolder,'dir')
    untar(fullfile(datasetArchives,"train_fake.tar"),fullfile(datasetsFolder,"train"));
    untar(fullfile(datasetArchives,"train_real.tar"),fullfile(datasetsFolder,"train"));
    untar(fullfile(datasetArchives,"valid_fake.tar"),fullfile(datasetsFolder,"valid"));
    untar(fullfile(datasetArchives,"valid_real.tar"),fullfile(datasetsFolder,"valid"));
end
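Before moving on, it can help to confirm that the extraction produced the layout you expect. The following is a minimal sketch, assuming each archive unpacks into a subfolder named real or fake under train and valid; those subfolder names are an assumption based on the description above, so adjust them if your archives unpack differently.

```matlab
% Sanity check of the extracted folder layout (a sketch; the subfolder
% names "fake" and "real" are assumed, not confirmed by the archives).
datasetsFolder = fullfile(pwd,"datasets");
splits = ["train","valid"];
labels = ["fake","real"];
counts = nan(numel(splits),numel(labels));   % NaN marks a missing folder
for s = 1:numel(splits)
    for c = 1:numel(labels)
        folder = fullfile(datasetsFolder,splits(s),labels(c));
        if isfolder(folder)
            listing = dir(folder);
            counts(s,c) = nnz(~[listing.isdir]);   % count files only
            fprintf("%s: %d files\n",folder,counts(s,c));
        else
            fprintf("%s: missing\n",folder);
        end
    end
end
```

A NaN entry in counts points to a folder that was not created, which usually means an archive is missing or unpacked under a different name.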
Create Image Datastores
Image datastores let you work with large collections of images efficiently, including collections that do not fit in memory, and read images in batches during neural network training.
Here’s how you can set them up for your training and validation datasets. In the imageDatastore function, specify the folder with the extracted images and indicate that the subfolder names correspond to the image labels. Then shuffle the images.
trainImdsFolder = fullfile(datasetsFolder,'train');
validImdsFolder = fullfile(datasetsFolder,'valid');
imdsTrain = shuffle(imageDatastore(trainImdsFolder, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames"));
imdsValid = shuffle(imageDatastore(validImdsFolder, ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames"));
By checking the number of files in each image datastore, you can see that the training datastore contains 262160 images and the validation datastore contains 3072 images. Since no test dataset is available yet for evaluating performance, we use the splitEachLabel function to partition the training image datastore into two new datastores: 10% of the images for training and 2% for testing.
[imdsTrain,imdsTest] = splitEachLabel(imdsTrain,0.1,0.02,"randomized");
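Note that with two proportions and two outputs, the rest of each label is simply not returned. The sketch below illustrates this on a tiny synthetic dataset written to a temporary folder; the folder, file names, and image contents are hypothetical stand-ins for the competition data.

```matlab
% Build a tiny synthetic dataset (hypothetical stand-in for the real data).
root = tempname;
labels = ["fake","real"];
for c = labels
    mkdir(fullfile(root,c));
    for k = 1:100
        img = uint8(randi([0 255],8,8,3));
        imwrite(img,fullfile(root,c,sprintf("%03d.png",k)));
    end
end

imds = imageDatastore(root,IncludeSubfolders=true,LabelSource="foldernames");

% Two proportions, two outputs: 10% and 2% of each label are kept.
[imdsA,imdsB] = splitEachLabel(imds,0.1,0.02,"randomized");
disp(countEachLabel(imdsA))
disp(countEachLabel(imdsB))

% Requesting a third output returns the remainder instead of dropping it.
[~,~,imdsRest] = splitEachLabel(imds,0.1,0.02,"randomized");
numRest = numel(imdsRest.Files);

rmdir(root,"s");   % clean up the temporary folder
```

If you later want to train on more of the data, raising the first proportion or keeping the third output are two straightforward options.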
Now let us get the class names and the number of classes, and then display some sample facial images as follows.
classNames = categories(imdsTrain.Labels);
numClasses = numel(classNames);
numImages = numel(imdsTrain.Labels);
% Pick 16 random training images and show them as a tiled montage.
idx = randperm(numImages,16);
I = imtile(imdsTrain,Frames=idx);
figure
imshow(I)
Load or Create a Network
Now that your data is ready, the next step is to load a pre-trained network or create a new one. Using a pre-trained network like ResNet or VGG can save time and improve performance, especially if you’re new to deep learning. MATLAB provides several pre-trained models you can use as a starting point.
Here’s a simple way to load a pre-trained network. As an example, we use the function imagePretrainedNetwork to load a pre-trained ResNet-50 neural network with a specified number of classes. Note that you need to install the add-on “Deep Learning Toolbox Model for ResNet-50 Network” before running the code.
net = imagePretrainedNetwork("resnet50",NumClasses=numClasses);
Prepare Data for Training
Preparing your data involves resizing images to match the input size of your network and augmenting them to improve model robustness. MATLAB makes it easy with built-in functions including imageDataAugmenter and augmentedImageDatastore.
Data augmentation techniques like rotation, scaling, and flipping can help your model generalize better. Here we randomly flip the training images along the vertical axis and randomly translate them up to 30 pixels horizontally and vertically.
inputSize = net.Layers(1).InputSize;
% Translate by up to 30 pixels in each direction.
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    DataAugmentation=imageAugmenter);
The validation and the testing images only need to be resized, so you can use an augmented image datastore without specifying any additional preprocessing operations to do the resizing automatically.
augimdsValid = augmentedImageDatastore(inputSize(1:2),imdsValid);
augimdsTest = augmentedImageDatastore(inputSize(1:2),imdsTest);
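To see what the augmentation actually does, you can preview a few augmented samples. The sketch below uses random synthetic images so that it runs without the competition data; the array X, the labels Y, and the 224-by-224 output size (chosen to match the ResNet-50 input) are assumptions made for illustration.

```matlab
% Synthetic stand-in data: eight random 64x64 RGB images with labels.
X = rand(64,64,3,8,"single");
Y = categorical(repmat(["fake";"real"],4,1));

pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    RandXReflection=true, ...
    RandXTranslation=pixelRange, ...
    RandYTranslation=pixelRange);

% Resize to 224x224 and apply the random augmentations on read.
augimds = augmentedImageDatastore([224 224],X,Y, ...
    DataAugmentation=imageAugmenter);

% preview returns a table; the "input" variable holds the augmented images.
minibatch = preview(augimds);
figure
imshow(imtile(minibatch.input))
```

Calling preview on your own augimdsTrain in the same way is a quick check that faces are not shifted out of the frame by the chosen translation range.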
Train Neural Network
With your data and network ready, it’s time to train your model.
To do transfer learning, the last layer with learnable parameters requires retraining. This is usually a fully connected layer or a convolutional layer with an output size that matches the number of classes. To increase the level of updates to this layer and speed up convergence, you can increase the learning rate factor of its learnable parameters using the setLearnRateFactor function. Here we set the learning rate factors of the learnable parameters to 10.
net = setLearnRateFactor(net,"res5c_branch2c/Weights",10);
net = setLearnRateFactor(net,"res5c_branch2c/Bias",10);
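If you switch to a different network, the layer and parameter names will differ. One way to look them up is the Learnables table of the network. The sketch below uses a small stand-in network rather than ResNet-50, and the layer names conv1 and fc are hypothetical.

```matlab
% Small stand-in network (hypothetical layer names "conv1" and "fc").
layers = [
    imageInputLayer([28 28 1],Normalization="none")
    convolution2dLayer(3,8,Name="conv1")
    reluLayer
    globalAveragePooling2dLayer
    fullyConnectedLayer(2,Name="fc")
    softmaxLayer];
net = dlnetwork(layers);

% Each row lists a layer name and a parameter name; joined with "/",
% they form the path that setLearnRateFactor expects.
disp(net.Learnables(:,["Layer","Parameter"]))

net = setLearnRateFactor(net,"fc/Weights",10);
net = setLearnRateFactor(net,"fc/Bias",10);

% Read the factor back to confirm that the change took effect.
factor = getLearnRateFactor(net,"fc/Weights");
```

The last rows of the table correspond to the final learnable layers, which is a convenient place to check which layer you are about to speed up.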
Next, define training options such as the optimizer, learning rate, and number of epochs. Your choices require empirical analysis. You can use the Experiment Manager app to explore different training options with experiments. As an example, we set the training options as follows:
- Train using the Adam optimizer.
- To reduce the level of updates to the pre-trained weights, use a smaller learning rate. Set the learning rate to 0.0001.
- Validate the network using the validation data every 5 iterations. For larger datasets, to prevent validation from slowing down training, increase this value.
- Display the training progress in a plot and monitor the accuracy metric.
- Disable the verbose output.
options = trainingOptions("adam", ...
    InitialLearnRate=0.0001, ...
    ValidationData=augimdsValid, ...
    ValidationFrequency=5, ...
    Plots="training-progress", ...
    Metrics="accuracy", ...
    Verbose=false);
Then train the neural network using the trainnet function. You can use cross-entropy loss for image classification.
To train a model with GPUs, you need a Parallel Computing Toolbox™ license and a supported GPU device. Please find more information on supported devices at GPU Computing Requirements.
By default, the trainnet function will use a GPU if one is available. Otherwise, it will use the CPU. You can also set the ExecutionEnvironment parameter in the training options to specify the execution environment.
net = trainnet(augimdsTrain,net,"crossentropy",options);
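If you prefer to choose the execution environment explicitly, for example to log which device was used, a minimal sketch might look like this. The variable name env is an assumption for illustration, and the remaining training options are left at their defaults for brevity.

```matlab
% Pick the execution environment explicitly (a sketch; by default
% trainnet already chooses a GPU when one is available).
if canUseGPU
    env = "gpu";
else
    env = "cpu";
end

options = trainingOptions("adam", ...
    InitialLearnRate=0.0001, ...
    ExecutionEnvironment=env);

fprintf("Training will run on: %s\n",env);
```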
Test Neural Network
Then evaluate your trained model on the test data set to see how well it performs on unseen data.
To make predictions with multiple observations, you can use the minibatchpredict function, which will also use a GPU automatically if one is available.
YTestScore = minibatchpredict(net,augimdsTest);
You can use the scores2label function to convert the prediction scores to labels.
YTest = scores2label(YTestScore,classNames);
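It is worth knowing which score column belongs to which class, because the submission step relies on a single column. The sketch below uses hypothetical scores for three images and assumes the class names are fake and real, in that order; check your own classNames variable rather than relying on this order.

```matlab
% Hypothetical scores for three images; columns follow classNames order.
classNames = ["fake";"real"];
scores = [0.9 0.1;
          0.2 0.8;
          0.6 0.4];

% Each row is assigned the class whose column holds the highest score.
labels = scores2label(scores,classNames);

% The second column pairs with classNames(2), which is "real" here.
probReal = scores(:,2);
```

With this ordering, a high value in the second column means the network considers the image likely to be real.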
Then let us evaluate the classification accuracy as the percentage of correct predictions for the test data and visualize the classification accuracy in a confusion chart.
TTest = imdsTest.Labels;
accuracy = mean(TTest==YTest);
confusionchart(TTest,YTest);
Create Submissions
When you have a model that you’re satisfied with, you can use it on the submission test dataset and create a submission!
The evaluation dataset will be released later. For now, we use the test datastore created from the training dataset to show how to create the required submission files.
testImgSize = size(augimdsTest.Files,1);
fileId = cell(testImgSize,1);
% Take the 7-character file ID that precedes the file extension.
for i = 1:testImgSize
    fileId{i,1} = augimdsTest.Files{i}(1,end-10:end-4);
end
resultsTable = table(fileId, YTestScore(:,2));
outPutFilename = 'mySubmission.txt';
writetable(resultsTable,outPutFilename,'Delimiter','\t','WriteVariableNames',false,'WriteRowNames',false)
zip([pwd '/mySubmission.zip'],outPutFilename)
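The fixed character offsets used above assume a seven-character file name followed by a three-letter extension. If your file names vary in length, one alternative is to take the name with fileparts. This is a sketch with hypothetical file paths standing in for augimdsTest.Files; whether the resulting IDs match what the competition expects is something to verify against the official instructions.

```matlab
% Hypothetical file paths standing in for augimdsTest.Files.
files = {fullfile('datasets','train','fake','0001234.png');
         fullfile('datasets','train','real','0005678.jpeg')};

fileId = cell(numel(files),1);
for i = 1:numel(files)
    [~,name] = fileparts(files{i});   % drop the folder and the extension
    fileId{i} = char(name);
end
disp(fileId)
```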
Conclusion
Congratulations on setting up your deepfake detection model! By participating in the IEEE SP Cup 2025, you’ll gain invaluable experience in AI and signal processing, all while contributing to a crucial area of research. This is your chance to learn, innovate, and showcase your skills on an international stage.
Don’t forget to request your MATLAB license and explore additional resources on our website. We’re excited to see how you tackle this challenge! Feel free to reach out to us via studentcompetitions@mathworks.com if you have any questions. We can’t wait to see what you create!