Student Lounge

Sharing technical and real-life examples of how students can use MATLAB and Simulink in their everyday projects #studentsuccess

Hacking a YouTube Creator Assistant with MATLAB

Are you a content creator who is looking to make your thumbnails more eye-catching? Joining us today are Nathan Fong and Stuart Fong from Queen’s University in Canada! Read on to learn more about how their hack can help you! Over to you guys...
Hi everybody, Nathan and Stuart Fong here! We are second-year computer science students at Queen’s University located in Ontario, Canada. We are hackathon lovers who enjoy learning about data science and machine learning. On July 15-17, we participated in the SelfieHacks II hackathon hosted by Major League Hacking (MLH) and created a project called YouTube Creator Assistant, which won the prize for best use of MATLAB.
YCA Team Photo.jpeg
Nathan (left) and Stuart (right)

Inspiration:

Going into SelfieHacks II, we had no idea what to make, but we knew that we wanted to create a project that empowered content creators. During our brainstorming session, we asked ourselves, “What is something that all content creators struggle with?” This led us to the idea of helping content creators grow their communities. We then narrowed the scope to helping YouTube creators get their content in front of a wider audience.
To track how many people are actively engaging with a channel, one of the best indicators is the view count of its videos. Because views and subscribers are the main indicators of the success of a video or channel, we wanted to make a tool that increases these numbers. Higher numbers in turn increase the exposure of the videos to new users, allowing the channel to grow. Thinking in this way, we finally came up with our project idea, which we called YouTube Creator Assistant.

Breaking down the problem:

We started by looking at the YouTube homepage and identifying which elements, such as the title and thumbnail, would persuade a user to click on one video over another. In our program, we wanted to take these components of the video and generate a predicted view count. The user can test various combinations of thumbnails, titles, video durations and categories to maximise the number of views. While this kind of editing can be done by trial and error after the video has been published, our solution allows it to be done beforehand. Views can then be gained more easily during the time that is most crucial: right at the beginning.

How did we implement it?

We used the Youtube Thumbnail and Youtubers Saying Things datasets, but before we could use them, we had to clean the data. To start off, some of the columns, such as the video link and transcript, were unneeded. While we could use the video itself to pull some features, we decided against it for now and deleted those columns. Next, some of the variables were not in usable formats: viewer and subscriber counts were abbreviated, and the video length was in HH:MM:SS format. Fixing this in MATLAB was very convenient as we could open the data table beside us, allowing us to see changes in real time.
YCA Data.png
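To make the two format fixes concrete, here is a minimal sketch in Python (we did this step in MATLAB, so this is an illustration and not our actual code). The helper names and the exact abbreviation style ("K", "M", "B" suffixes) are assumptions for the example; the datasets may format counts differently.

```python
def parse_count(text):
    """Convert an abbreviated count such as '1.2M' or '15K' to an integer.

    Assumes K/M/B suffixes; this suffix style is an assumption for illustration.
    """
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    text = text.strip().upper().replace(",", "")
    if text and text[-1] in multipliers:
        # Round to avoid floating-point artifacts like 1199999.9999
        return int(round(float(text[:-1]) * multipliers[text[-1]]))
    return int(float(text))


def duration_to_seconds(text):
    """Convert 'HH:MM:SS' (or 'MM:SS') to a total number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        # Each new field is worth 1/60 of the previous one
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    print(parse_count("1.2M"))
    print(duration_to_seconds("01:02:03"))
```

Turning both columns into plain numbers is what lets them be fed into a network as ordinary tabular inputs.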
To create our model, we first looked at the types of data we had, which included images for the thumbnails, language data for the titles, and tabular data for the rest of the information. For the thumbnails, we used a convolutional neural network (CNN) to identify eye-catching elements of the image (AKA clickbait). Next, for the titles, we extracted features that we thought were useful, such as the length and the percentage of capital letters. Finally, for the tabular data, we used a fully connected neural network to predict how each variable relates to the resulting number of viewers. We then combined the outputs of the two networks, giving us the predicted viewer count.
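The two title features mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and computing the percentage over letters only (ignoring digits, spaces and punctuation) is an assumption made for the example.

```python
def title_features(title):
    """Return the title length and the percentage of letters that are capitals."""
    letters = [ch for ch in title if ch.isalpha()]
    capitals = [ch for ch in letters if ch.isupper()]
    # Guard against titles with no letters at all (e.g. '2022!!!')
    percent_capitals = 100.0 * len(capitals) / len(letters) if letters else 0.0
    return {"length": len(title), "percent_capitals": percent_capitals}


if __name__ == "__main__":
    print(title_features("I WON!"))
    print(title_features("Hello"))
```

A title written entirely in capitals scores 100 on the second feature, which is one simple way to put a number on how "clickbait-y" a title looks.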
After being introduced to MATLAB during a workshop at Local Hack Day: Build 2022, we wanted to try using one of the tools, Deep Network Designer, to build our neural networks. While using it, we saw how easy it was to prototype our model. The drag-and-drop interface allowed us to quickly change or swap out our layers without lowering the readability of our code. The process was as easy as creating how our model looked, choosing the input and output datastores, and then starting the training.
YCA Network.png
To deploy our model, we wanted to use Gradio, as it is a web interface that we were more familiar with. The problem is that our model was created using MATLAB, while Gradio uses the Python programming language. Luckily, MATLAB offers the MATLAB Engine API for Python, which allows us to call MATLAB code from Python. To do this, we first installed it, and then imported it and started a MATLAB session using the following code:
import matlab.engine
eng = matlab.engine.start_matlab()
We were then able to take inputs from our Gradio web app in Python, feed them into our MATLAB model, and output the predicted view count as a Python integer.
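As a rough sketch of that hand-off, the function below passes the inputs to MATLAB and converts the result to a Python integer. Note that `predictViews` is a hypothetical name for a MATLAB function on the MATLAB path, and the argument list is an assumption for illustration; the real function in our code may differ. The engine is passed in as a parameter so the wrapper can be tried without MATLAB installed.

```python
def start_engine():
    """Start a MATLAB session (requires the MATLAB Engine API for Python)."""
    # Imported lazily so this file still loads on machines without MATLAB
    import matlab.engine
    return matlab.engine.start_matlab()


def predict_views(eng, thumbnail_path, title, duration_seconds, subscribers):
    """Call a MATLAB prediction function and return the result as a Python int.

    `predictViews` is a HYPOTHETICAL MATLAB function name used for illustration.
    """
    result = eng.predictViews(
        thumbnail_path,            # Python str is passed to MATLAB as a char array
        title,
        float(duration_seconds),   # Python float is passed to MATLAB as a double
        float(subscribers),
        nargout=1,                 # ask MATLAB for exactly one output
    )
    return int(round(result))
```

A Gradio app can then use a function like `predict_views` as its callback, with the engine started once when the app launches so that each prediction does not pay the MATLAB startup cost.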

Results:

We tested our model by creating a fake thumbnail and filling in some details about our hypothetical video and channel. We then tried changing the thumbnail and title to one that we thought would attract more viewers and as expected, the predicted number of views increased!
Overall, our finished model performed well on new inputs, where a more “clickbait-y” thumbnail or title is predicted to have a greater number of views. Despite this, we found during testing that it has some difficulties with outputting an accurate prediction of the viewer count for channels with a small number of subscribers. This is fine for the intended purpose of the model, but we feel that it would benefit from some additional data, as the current data only features the most trending and popular creators. In the future, we plan to give the model more data specifically containing YouTube channels with fewer subscribers, so that the model can better identify how a specific feature impacts the resulting viewer count.
Watch this video to see how our code works.
YCA demo 3.jpg

Key Takeaways:

Compared to Python, we found that MATLAB was easier to use for prototyping, as there were many built-in functions to make coding quick and easy. The huge amount of documentation reduced the difficulty of trying new things, allowing us to explore more of MATLAB’s many features. YouTube Creator Assistant was a fun project to work on and we learned a ton about MATLAB’s features for data science and machine learning, as well as its Deep Network Designer and MATLAB Engine.
If you have any comments or questions about this project, feel free to reach out to us! Our code is available on GitHub, and you can see more about this project on our Devpost submission page.

 
