They say don’t judge a book by its cover, but no one said anything about movies, right? In today’s post, I am pleased to welcome the winners of the “Most creative use of MATLAB” award at TAMU Datathon 2020. TAMU Datathon is the world’s first and only Major League Hacking (MLH) Data Science Hackathon, aiming to connect top Data Science/Machine Learning talent with top companies. Our guest bloggers for the day, Vaishnavi Duraisamy, Adhithiyaraj Sankaranarayanan, Guru Sarath Thangamani, and Priyanka Karuppuch Samy, will talk about their poster-based movie recommendation hack. Over to our guest bloggers…
TAMU Datathon 2020 was our first hackathon, and with the problem statement being so generic, we were unsure how to get started! When asked to come up with a problem statement of our own, what finally helped us, after pondering for a while, was the movie night we had the previous day. We had found it difficult to pick a movie and wished there was a way to get recommendations based on our personal interests. Then, Eureka! The fact that our perception of a movie is significantly influenced by its poster gave us a spark, and we had our idea for the hackathon: design and develop an application that recommends a similar movie based on an input poster. If posters shape what we expect from a movie, recommendations based on posters seem like a fair idea, don’t they?
Breaking down the problem
To identify a similar movie from an input poster image, we needed a machine learning algorithm that captures the important features of the poster and lets us retrieve posters with similar features. Because raw images are very high-dimensional, the algorithm also had to perform dimensionality reduction. Thus, Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and autoencoders became our candidates. We picked autoencoders because they can capture non-linear features well and are known for their performance in lossy image compression tasks. Autoencoders are networks that try to replicate their input. The network is designed so that the input and output layers are the same size; the layers grow smaller up to the middle layer and then grow again towards the output. To recreate the input, the network must therefore keep only the most important features at the middle layer, which gives us dimensionality reduction as well.
How did we implement it?
The encoder consists of multiple 2D convolutional layers that capture the spatial features of the input image. The output feature maps of the last convolution are flattened and passed to multiple dense layers, which generate the encoded vector of the input image. In this way, every image in the training dataset is mapped into a 10D latent space. The decoder reconstructs the image from the encoded vector and is needed only during training, where the autoencoder learns by minimizing the reconstruction loss.
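To give a feel for how quickly the encoder shrinks an image, here is a minimal Python sketch that traces tensor shapes through a stack of strided convolutions. It is an illustration only, not the hackathon code itself (which was written in MATLAB): the poster size, the number of layers, the filter counts, kernel sizes, strides, and padding are all assumed values, and only the 10D latent size comes from the description above. The helper uses the standard convolution output-size formula.

```python
def conv_output_size(size, kernel, stride, padding):
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(height, width, conv_layers):
    """Return the (height, width, channels) shape after each convolutional layer."""
    shapes = []
    for filters, kernel, stride, padding in conv_layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        shapes.append((height, width, filters))
    return shapes


# Hypothetical values for illustration; only the 10D latent size comes from the post.
poster_height, poster_width, poster_channels = 96, 64, 3
conv_layers = [(16, 3, 2, 1), (32, 3, 2, 1), (64, 3, 2, 1)]  # (filters, kernel, stride, padding)
latent_dim = 10

shapes = encoder_shapes(poster_height, poster_width, conv_layers)
final_height, final_width, final_channels = shapes[-1]
flattened = final_height * final_width * final_channels

print("input values:", poster_height * poster_width * poster_channels)
for shape in shapes:
    print("after conv:", shape)
print("flattened:", flattened, "-> dense layers ->", latent_dim)
```

With these assumed sizes, a 96 × 64 × 3 poster (18,432 values) becomes 12 × 8 × 64 feature maps (6,144 values) before the dense layers compress it to 10 numbers.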
During recommendation, the user provides an input image (the poster of a movie). The algorithm encodes it into the same 10D latent space, retrieves the poster with the smallest Euclidean distance to it, and outputs that movie as the recommendation.
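The retrieval step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hackathon code itself (which was written in MATLAB): the movie titles and latent codes below are made up, and the function name `recommend` is hypothetical. In a real run, the codes would come from the trained encoder.

```python
import math


def recommend(query_code, catalog_codes):
    """Return the title whose latent code has the smallest Euclidean distance to query_code."""
    return min(
        catalog_codes,
        key=lambda title: math.dist(query_code, catalog_codes[title]),
    )


# Made-up 10D latent codes standing in for encoder outputs.
catalog_codes = {
    "Movie A": [0.0] * 10,
    "Movie B": [1.0] * 10,
    "Movie C": [0.5] * 5 + [-0.5] * 5,
}

query_code = [0.9] * 10  # latent code of the poster supplied by the user (assumed)
print(recommend(query_code, catalog_codes))
```

One design point to keep in mind: if the input poster is itself part of the catalog, its distance to its own code is zero, so it would need to be excluded before taking the minimum.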
Our goal for the end of the hackathon was to have a simple, proof-of-concept application. Since we were using MATLAB for our machine learning model, we decided to use it to develop our GUI as well. As all of us were quite new to the tool, we came up with a really simple user interface in the limited time: we upload the input poster image in one window, and the recommendation is shown in the other.
The recommended movie is similar to the user input: both are of the same genre, horror! It is evident from the above example that the algorithm retrieves posters with similar features.
As in the previous example, the above example also recommends a similar movie. Both movies are of the comedy-drama genre.
Why did we choose MATLAB?
This being our first Datathon, our focus was on getting our deliverables done on time. Prototyping machine learning algorithms in MATLAB is hassle-free, and the computation time involved is quite low even without GPUs. MATLAB’s Deep Learning Toolbox helped us solve the problem quickly without having to worry about the intricacies of the coding language, allowing us to focus on the problem rather than getting tangled in spaghetti code. Since our code uses autoencoders, handling dimensions and hyperparameters correctly is critical, and MATLAB’s clear, thorough documentation helped us build our algorithms efficiently in a very limited time. Finally, developing an app is a ‘few click’ process in MATLAB. Considering all this, we unanimously opted for MATLAB as our tool to build the recommendation system.
Though the recommendations have turned out well, there is plenty of scope for extending this work. As of now, the system recommends a single movie based on the input poster; the number of recommended movies can be increased to give the user a pool of options. We have also considered the poster’s pixel values to be the only features for the machine learning algorithm. These can be combined with other features such as reviews (using NLP techniques), rating, genre, length, and language to give better recommendations. Beyond improving the model, another major need is a concrete way to measure its performance: since ‘liking a movie’ is not a directly measurable quantity, innovative performance metrics would be valuable. Finally, labeled data from a diverse, global population could lead to a better recommendation system.
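As a small sketch of the first extension, returning several recommendations instead of one only requires keeping the k smallest distances rather than the single minimum. The Python below is illustrative and not part of the hackathon code: the function name `recommend_top_k`, the titles, and the short 2D codes (used instead of 10D ones for readability) are all assumptions.

```python
import heapq
import math


def recommend_top_k(query_code, catalog_codes, k=3):
    """Return the k titles closest to query_code, nearest first (Euclidean distance)."""
    return heapq.nsmallest(
        k,
        catalog_codes,
        key=lambda title: math.dist(query_code, catalog_codes[title]),
    )


# Made-up 2D codes for readability; the same function works for 10D codes.
catalog_codes = {
    "Movie A": [0.0, 0.0],
    "Movie B": [1.0, 0.0],
    "Movie C": [0.0, 2.0],
    "Movie D": [5.0, 5.0],
}

print(recommend_top_k([0.1, 0.0], catalog_codes, k=2))
```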
No more wasting time on the weekends deciding what movie to watch! We now have a go-to application for this purpose, and the best part is that this idea won us the 1st prize at TAMU Datathon 2020 for the most creative use of MATLAB. It was a great experience completing the project in one day, and we thoroughly enjoyed the process of making this machine learning application. Our code is available on GitHub. Feel free to try it out and let us know your thoughts!