Tennis Analysis with AI: Interactive Ground Truth Labeling

Posted by Sivylla Paraskevopoulou, October 1, 2025


This blog post is from Cory Hoi, Engineer at MathWorks Engineering Development Group.

With the rapid advancement of artificial intelligence (AI), its power is now more accessible than ever. I imagine that the arrival of the personal computer was equally transformative. We are now seeing AI advances in areas like computer vision and natural language processing (NLP) applied in chatbots, healthcare research, transportation, education, and sports. AI is everywhere we look, so integrated into our daily lives that many of us hardly notice its presence. For example, just turn on your TV to your favorite sports broadcast.

For me, it’s tennis. At this past Wimbledon, line calls were fully automated, eliminating the need to argue with judges over close calls. Ball tracking and real-time data analysis have also become integral parts of the game, representing a leap forward from sports analytics just five years ago. Players now use AI to study playing patterns and refine their tactics. Yet, as an avid fan, I can’t help but wonder: how does all of this actually work?

In this two-part blog post series, I will show you how to build and use deep neural networks in MATLAB for object detection. This first post focuses on the initial step of labeling the data and covers the many tools available in MATLAB to ease the typical pains of data labeling.

Object Detection in Sports

As you may have guessed, my sport of choice is tennis. However, the object detection methods described here have many other applications, both in sports like basketball and football and beyond sports, for example in autonomous vehicles. Object detection in tennis has greatly improved my viewing experience in recent years. This is evident in instant replays of tennis points, where the ball is clearly marked and tracked.

Data preparation is a crucial, though often overlooked, step in any AI task. It involves labeling the dataset to create ground truth data for training, and converting the dataset into the correct format for training.

Interactive Video Labeling

In MATLAB, the Video Labeler app makes video labeling easy. The app allows you to interactively label shapes or regions of interest (ROIs) with rectangle, polyline, pixel, and polygon ROI labels. In this post, we use Video Labeler to label both the tennis ball and the tennis court in our dataset.

Open the Video Labeler app from the Apps tab, under Image Processing and Computer Vision. Create a new project and import the video from the trainingData folder.

After loading the video, the app will display the first frame in the middle of the screen. In the panel just below, you can navigate between frames with the left and right arrows.

Video Labeler

Manual vs Automatic Labeling

In MATLAB, there are numerous ways to label a dataset. For example, you can define bounding boxes around the tennis players and give them the label name “person”. You can also define lines to label the court lines as linear objects. For this project, however, let’s label pixels: create a pixel label for the tennis ball, give it the label name “ball”, and set its color to green.

After you add the label, it appears in the ROI Label Definitions pane on the left. This pane lets you easily switch between labels when a single image contains multiple objects. Both manual and automatic labeling algorithms are available.

Manual algorithms, such as Polygon and Brush, allow you to define exactly the area to label. Automation algorithms, by contrast, use a range of techniques to speed up the labeling process and ease its typical pains. Examples of automation algorithms include Superpixel, Segment Anything, and Assisted Freehand.

For example, in the following video, the ball and the person are labeled manually with the Brush and Polygon tools, respectively. The ball is easy to label because it takes only a single click with the Brush tool. Labeling the person, however, is more challenging: the Polygon tool draws straight lines that don't automatically snap to the person's edges. Using many shorter line segments would improve this label, though it would take more time.

Manually labeling the ball and person
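To make the difference between the two manual tools concrete, here is a minimal Python sketch of how each stroke might be written into a pixel label mask. It is an illustration of the general idea, not the Video Labeler implementation. A brush click labels every pixel inside a disc. A polygon labels every pixel whose center falls inside its straight edges. The label IDs (`BALL`, `PERSON`) and all function names are hypothetical, and the display color (green for “ball”) is assumed to be separate from the integer ID stored in the mask.

```python
# Illustrative sketch only: how Brush and Polygon strokes might be rasterized
# into a pixel label mask. Label IDs and function names are hypothetical.

UNLABELED, BALL, PERSON = 0, 1, 2  # hypothetical pixel label IDs


def new_mask(height, width):
    """Return a height x width mask (list of rows) with every pixel unlabeled."""
    return [[UNLABELED] * width for _ in range(height)]


def paint_brush(mask, cx, cy, radius, label):
    """Label every pixel within `radius` of pixel (cx, cy): one brush click."""
    for y in range(max(0, cy - radius), min(len(mask), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(mask[0]), cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                mask[y][x] = label


def point_in_polygon(px, py, vertices):
    """Even-odd ray casting: True if (px, py) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def fill_polygon(mask, vertices, label):
    """Label every pixel whose center lies inside the straight-edged polygon."""
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            if point_in_polygon(x + 0.5, y + 0.5, vertices):
                mask[y][x] = label


def count_label(mask, label):
    """Count how many pixels carry `label`."""
    return sum(row.count(label) for row in mask)


mask = new_mask(20, 20)
paint_brush(mask, cx=15, cy=4, radius=1, label=BALL)          # one click on the ball
fill_polygon(mask, [(2, 2), (6, 2), (6, 6), (2, 6)], PERSON)  # coarse 4-vertex polygon
print(count_label(mask, BALL), count_label(mask, PERSON))     # 5 16
```

The polygon only ever connects the clicked vertices with straight edges, so any curve in the outline between two vertices is either cut off or over-filled. Adding vertices reduces that error at the cost of more clicks, which is the trade-off described above.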

One downside of labeling frames manually is mis-clicks. When one happens, you can delete the labels by right-clicking the frame and selecting Delete All Pixel Labels, or by using the keyboard shortcut Ctrl+Shift+Delete. Alternatively, you can use one of the semi-automated algorithms to speed up the process. For example, in the following video, the Assisted Freehand algorithm automatically detects edges along the person. Each left-click creates a vertex that becomes the pivot point for the next line segment, so clicking more frequently results in higher accuracy.

Automatic object labeling with SAM

SAM (Segment Anything Model) automatically detects edges and segments in the frame with just a few clicks. In the example above, the singles court is labeled around the tennis player, while the doubles court, background, and out-of-bounds regions are left unlabeled.

You can also consider other automation algorithms for labeling:

The Flood Fill algorithm efficiently labels a group of connected pixels that have similar colors. However, it is not recommended when the object's color is close to the colors of neighboring regions, because the fill can spread beyond the object.
The Smart Polygon algorithm might be a good alternative for discriminating between similarly colored objects. It estimates the shape of an object of interest within the polygon that you draw. This is useful when the object is not a simple polygon.
The Superpixel algorithm overlays a grid of superpixels whose size you can adjust. For example, in the following GIF, the superpixels are initially too large. After the grid is refined, the court can be labeled more accurately.

Automatic labeling with Assisted Freehand
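The reason Flood Fill struggles with similarly colored regions shows up clearly in a minimal Python sketch of a tolerance-based flood fill. This is a generic breadth-first version, not necessarily how the app implements it. To keep it short, the sketch makes three assumptions: the image is grayscale rather than color, neighbors are 4-connected, and a pixel joins the region when its intensity is within a hypothetical `tolerance` of the seed pixel's value. The pixel values below are invented for illustration.

```python
from collections import deque


def flood_fill(image, seed, tolerance):
    """Return the set of (row, col) pixels that are 4-connected to `seed` and
    whose intensity is within `tolerance` of the seed pixel's intensity."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:  # breadth-first expansion from the seed
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Hypothetical 4x5 grayscale patch: court (90), ball (198-205), and an
# adjacent court line (185-190) that is almost as bright as the ball.
image = [
    [90,  90,  90,  90,  90],
    [90, 200, 205,  90,  90],
    [90, 198, 202, 190, 185],
    [90,  90,  90,  90,  90],
]

print(len(flood_fill(image, (1, 1), tolerance=5)))   # 4: only the ball
print(len(flood_fill(image, (1, 1), tolerance=20)))  # 6: leaks into the line
```

With a tight tolerance, the fill captures only the ball. With a looser one, it spills into the neighboring line because the two regions are nearly the same brightness. In this sketch, comparing each pixel to the seed value rather than to its immediate neighbor is a deliberate choice: it stops the region from drifting gradually across a smooth color gradient.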

Saving Ground Truth

Labeling the remaining frames is straightforward: click the next frame button and repeat. After all the frames have been labeled, you can save the project and export the data. Saving the project allows you to reopen it in the same state you left it.

To export the data, click the green checkmark in the toolbar and export the data to a file. This will create a gTruth MAT-file that can be loaded into the workspace when training the neural network later.

Conclusion

This workflow illustrated how the Video Labeler app can be used to label different shapes. It offers a variety of labeling algorithms, each with advantages and disadvantages in different labeling scenarios.

Stay tuned for the next blog post, where the labeled dataset is used to train and test a deep neural network that tracks the tennis ball.