Student Lounge

Sharing technical and real-life examples of how students can use MATLAB and Simulink in their everyday projects #studentsuccess

Weather Forecasting in MATLAB for the WiDS Datathon 2023

In today’s blog, Grace Woolson shows how you can get started with machine learning and MATLAB for weather forecasting and take on the WiDS Datathon 2023 challenge. Over to you, Grace.

Introduction

Today, I’m going to show an example of how you can use MATLAB for the WiDS Datathon 2023. This year’s challenge tasks participants with creating a model that produces long-term temperature forecasts, which can help communities adapt to extreme weather events often caused by climate change. WiDS participants will submit their forecasts on Kaggle. This tutorial walks through the following steps of the model-making process:
  1. Importing a Tabular Dataset
  2. Preprocessing Data
  3. Training and Evaluating a Machine Learning Model
  4. Making and Exporting New Predictions
MathWorks is happy to support participants of the Women in Data Science Datathon 2023 by providing complimentary MATLAB licenses, tutorials, workshops, and additional resources. To request complimentary licenses for you and your teammates, go to this MathWorks site, click the “Request Software” button, and fill out the software request form.
To register for the competition and access the dataset, go to the Kaggle page, sign in or register for an account, and click the ‘Join Competition’ button. Once you accept the competition rules, you can download the challenge datasets from the ‘Data’ tab.

Import Data

First, we need to bring the data into the MATLAB workspace. For this tutorial, I will be using a subset of the overall challenge dataset, so the files shown below will differ from the ones provided on Kaggle. The datasets I will be using are:
  • Training data (train.xlsx)
  • Testing data (test.xlsx)
The data is in tabular form, so we can use the readtable function to import it.
% Import the training and testing tables, keeping the original column names
trainingData = readtable('train.xlsx', 'VariableNamingRule', 'preserve');
testingData = readtable('test.xlsx', 'VariableNamingRule', 'preserve');
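Several column names in this dataset, such as gfdl-flor-a_0_x, contain hyphens and are therefore not valid MATLAB identifiers, which is why the 'VariableNamingRule', 'preserve' option matters. The sketch below illustrates the difference on a tiny made-up table written to a temporary CSV file (not the challenge data). It assumes MATLAB R2020b or later, where readtable accepts 'VariableNamingRule'.

```matlab
% Hypothetical two-column table; the hyphenated name mimics the challenge data
T = table([27; 27], [11.99; 11.99], ...
    'VariableNames', {'lat', 'gfdl-flor-a_0_x'});
fname = [tempname '.csv'];
writetable(T, fname);

% Default behavior: readtable converts headers into valid MATLAB identifiers
% (it may also print a warning saying the headers were modified)
Tdefault = readtable(fname);
% 'preserve' keeps the headers exactly as they appear in the file
Tpreserve = readtable(fname, 'VariableNamingRule', 'preserve');
delete(fname);  % clean up the temporary file

disp(Tdefault.Properties.VariableNames)
disp(Tpreserve.Properties.VariableNames)

% A preserved name that is not a valid identifier needs the T.("name") syntax
gfdlValues = Tpreserve.("gfdl-flor-a_0_x");
```

With the real data, the same T.("...") syntax applies to columns such as trainingData.("gfdl-flor-b_0_x").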
The tables are large, so displaying them in full would fill the entire screen! Instead, let’s use the head function, which by default displays the first eight rows of a table, to get a sense of what data we are working with.
head(trainingData)
lat lon start_date cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 tmp2m cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1
27 261 01-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 12.044 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183
27 261 02-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 12.631 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183
27 261 03-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.305 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183
27 261 04-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.396 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183
27 261 05-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.627 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183
27 261 06-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.999 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049
27 261 07-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 14.223 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049
27 261 08-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 14.248 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049
head(testingData)
lat lon start_date cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 tmp2m cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1
38 238 01-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 9.0021 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753
38 238 02-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 9.4104 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753
38 238 03-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 9.7816 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753
38 238 04-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.066 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753
38 238 05-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.35 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753
38 238 06-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.59 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413
38 238 07-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.674 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413
38 238 08-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.995 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413
Now we can see the names of all the columns (also known as variables) and get a sense of their data types, which will make these tables much easier to work with. Notice that both datasets have the same variable names. If you look through them, you’ll find one called ‘tmp2m’: this is the column we will train a model to predict, also called the response variable.
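Both observations can also be checked programmatically instead of by reading the headers. This is a sketch on two tiny stand-in tables whose variable names and values are illustrative; with the real data you would pass trainingData and testingData directly.

```matlab
% Stand-in tables using a few of the challenge's variable names
trainTbl = table([27; 27], [12.049; 12.049], [12.044; 12.631], ...
    'VariableNames', {'lat', 'nmme0_mean_x', 'tmp2m'});
testTbl = table([38; 38], [9.2413; 9.2413], [9.0021; 9.4104], ...
    'VariableNames', {'lat', 'nmme0_mean_x', 'tmp2m'});

% Do both tables have the same variables, in the same order?
sameNames = isequal(trainTbl.Properties.VariableNames, ...
                    testTbl.Properties.VariableNames);

% Is the response variable present in the training table?
hasResponse = ismember('tmp2m', trainTbl.Properties.VariableNames);

% Variables found in the training table but not the testing table
onlyInTrain = setdiff(trainTbl.Properties.VariableNames, ...
                      testTbl.Properties.VariableNames);

fprintf('Same variable names: %d, response present: %d\n', sameNames, hasResponse);
```

If sameNames is false, onlyInTrain (and the reverse setdiff) shows which columns differ.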
It is important to have training and testing sets with known outputs, so you can see how well your model performs on unseen data. In this case, the data has been split ahead of time, but you may need to split a dataset manually. For example, if all of your data is in a single 100,000-row table called ‘train_data’, the code below randomly splits it into 80% training and 20% testing data. These percentages are a common choice, but you may want to try out different values when making your datasets!
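% dividerand (Deep Learning Toolbox) returns random row indices for the training, validation, and test sets
[trainInd, ~, testInd] = dividerand(height(train_data), 0.8, 0, 0.2);
trainingData = train_data(trainInd, :);
testingData = train_data(testInd, :);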
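dividerand ships with Deep Learning Toolbox. If you have Statistics and Machine Learning Toolbox instead, cvpartition offers another way to make a random 80/20 hold-out split. The sketch below uses a small random stand-in table in place of train_data; its column names are placeholders.

```matlab
rng(0);  % make the random split reproducible

% Stand-in for train_data: 1,000 rows of random numbers
n = 1000;
train_data = table(rand(n, 1), rand(n, 1), ...
    'VariableNames', {'predictor', 'tmp2m'});

% Hold out 20% of the rows for testing
c = cvpartition(height(train_data), 'HoldOut', 0.2);
trainingData = train_data(training(c), :);  % training(c) is a logical row mask
testingData = train_data(test(c), :);       % test(c) is its complement

fprintf('%d training rows, %d testing rows\n', ...
    height(trainingData), height(testingData));
```

Changing the 0.2 argument adjusts the hold-out fraction in the same way as the ratios passed to dividerand.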

Preprocess Data

Now that the data is in the workspace, we need to take some steps to clean and format it so it can be used to train a machine learning model. We can use the summary function to see the data type and statistical information about each variable:
summary(trainingData)

Variables:

lat: 146034×1 double

Values:

Min 27
Median 42
Max 49

lon: 146034×1 double

Values:

Min 236
Median 252
Max 266

start_date: 146034×1 datetime

Values:

Min 01-Jan-2016 00:00:00
Median 01-Jul-2016 12:00:00
Max 31-Dec-2016 00:00:00

cancm3_0_x: 146034×1 double

Values:

Min -12.902
Median 10.535
Max 36.077

cancm4_0_x: 146034×1 double

Values:

Min -13.276
Median 12.512
Max 35.795

ccsm3_0_x: 146034×1 double

Values:

Min -11.75
Median 10.477
Max 32.974

ccsm4_0_x: 146034×1 double

Values:

Min -13.264
Median 12.315
Max 34.311

cfsv2_0_x: 146034×1 double

Values:

Min -11.175
Median 11.34
Max 35.749

gfdl-flor-a_0_x: 146034×1 double

Values:

Min -12.85
Median 11.831
Max 37.416

gfdl-flor-b_0_x: 146034×1 double

Values:

Min -13.52
Median 11.837
Max 37.34

gfdl_0_x: 146034×1 double

Values:

Min -11.165
Median 10.771
Max 36.117

nasa_0_x: 146034×1 double

Values:

Min -19.526
Median 14.021
Max 38.22

nmme0_mean_x: 146034×1 double

Values:

Min -12.194
Median 11.893
Max 34.879

cancm3_x: 146034×1 double

Values:

Min -12.969
Median 9.9291
Max 36.235

cancm4_x: 146034×1 double

Values:

Min -12.483
Median 12.194
Max 38.378

ccsm3_x: 146034×1 double

Values:

Min -13.033
Median 10.368
Max 33.42

ccsm4_x: 146034×1 double

Values:

Min -14.28
Median 12.254
Max 34.957

cfsv2_x: 146034×1 double

Values:

Min -14.683
Median 10.897
Max 35.795

gfdl_x: 146034×1 double

Values:

Min -9.8741
Median 10.476
Max 35.95

gfdl-flor-a_x: 146034×1 double

Values:

Min -13.021
Median 11.15
Max 37.834

gfdl-flor-b_x: 146034×1 double

Values:

Min -12.557
Median 11.117
Max 37.192

nasa_x: 146034×1 double

Values:

Min -21.764
Median 13.721
Max 38.154

nmme_mean_x: 146034×1 double

Values:

Min -13.042
Median 11.354
Max 35.169

cancm3_y: 146034×1 double

Values:

Min 0.075757
Median 18.56
Max 124.58

cancm4_y: 146034×1 double

Values:

Min 0.02538
Median 16.296
Max 137.78

ccsm3_y: 146034×1 double

Values:

Min 4.5927e-05
Median 24.278
Max 126.36

ccsm4_y: 146034×1 double

Values:

Min 0.096667
Median 24.455
Max 204.37

cfsv2_y: 146034×1 double

Values:

Min 0.074655
Median 25.91
Max 156.7

gfdl_y: 146034×1 double

Values:

Min 0.0046441
Median 20.49
Max 133.88

gfdl-flor-a_y: 146034×1 double

Values:

Min 0.0044707
Median 20.438
Max 195.32

gfdl-flor-b_y: 146034×1 double

Values:

Min 0.0095625
Median 20.443
Max 187.15

nasa_y: 146034×1 double

Values:

Min 1.9478e-05
Median 17.98
Max 164.94

nmme_mean_y: 146034×1 double

Values:

Min 0.2073
Median 21.494
Max 132

cancm3_0_y: 146034×1 double

Values:

Min 0.016023
Median 19.365
Max 139.94

cancm4_0_y: 146034×1 double

Values:

Min 0.016112
Median 17.354
Max 160.04

ccsm3_0_y: 146034×1 double

Values:

Min 0.00043188
Median 21.729
Max 144.19

ccsm4_0_y: 146034×1 double

Values:

Min 0.02979
Median 23.642
Max 151.3

cfsv2_0_y: 146034×1 double

Values:

Min 0.01827
Median 25.095
Max 176.15

gfdl-flor-a_0_y: 146034×1 double

Values:

Min 0.0058198
Median 17.634
Max 184.7

gfdl-flor-b_0_y: 146034×1 double

Values:

Min 0.0045824
Median 16.937
Max 194.19

gfdl_0_y: 146034×1 double

Values:

Min 0.0030585
Median 19.379
Max 140.16

nasa_0_y: 146034×1 double

Values:

Min 0.00051379
Median 17.81
Max 167.31

nmme0_mean_y: 146034×1 double

Values:

Min 0.061258
Median 20.697
Max 140.1

cancm3_0_x_1: 146034×1 double

Values:

Min 0.016023
Median 19.436
Max 139.94

cancm4_0_x_1: 146034×1 double

Values:

Min 0.016112
Median 17.261
Max 160.04

ccsm3_0_x_1: 146034×1 double

Values:

Min 0.00043188
Median 21.75
Max 144.19

ccsm4_0_x_1: 146034×1 double

Values:

Min 0.02979
Median 23.45
Max 231.72

cfsv2_0_x_1: 146034×1 double

Values:

Min 0.01827
Median 25.096
Max 176.15

gfdl-flor-a_0_x_1: 146034×1 double

Values:

Min 0.0058198
Median 17.617
Max 217.6

gfdl-flor-b_0_x_1: 146034×1 double

Values:

Min 0.0045824
Median 16.915
Max 195.06

gfdl_0_x_1: 146034×1 double

Values:

Min 0.0030585
Median 19.411
Max 140.16

nasa_0_x_1: 146034×1 double

Values:

Min 0.00051379
Median 17.733
Max 180.77

nmme0_mean_x_1: 146034×1 double

Values:

Min 0.061258
Median 20.67
Max 140.1

tmp2m: 146034×1 double

Values:

Min -21.031
Median 12.742
Max 37.239

cancm3_x_1: 146034×1 double

Values:

Min 0.075757
Median 18.649
Max 124.58

cancm4_x_1: 146034×1 double

Values:

Min 0.02538
Median 16.588
Max 116.86

ccsm3_x_1: 146034×1 double

Values:

Min 4.5927e-05
Median 25.242
Max 134.15

ccsm4_x_1: 146034×1 double

Values:

Min 0.21704
Median 24.674
Max 204.37

cfsv2_x_1: 146034×1 double

Values:

Min 0.028539
Median 26.282
Max 154.39

gfdl_x_1: 146034×1 double

Values:

Min 0.0046441
Median 21.028
Max 142.5

gfdl-flor-a_x_1: 146034×1 double

Values:

Min 0.0044707
Median 21.322
Max 187.57

gfdl-flor-b_x_1: 146034×1 double

Values:

Min 0.0095625
Median 21.444
Max 193.19

nasa_x_1: 146034×1 double

Values:

Min 1.9478e-05
Median 17.963
Max 183.71

nmme_mean_x_1: 146034×1 double

Values:

Min 0.24096
Median 21.881
Max 124.19

cancm3_y_1: 146034×1 double

Values:

Min -11.839
Median 10.067
Max 36.235

cancm4_y_1: 146034×1 double

Values:

Min -11.809
Median 12.179
Max 38.378

ccsm3_y_1: 146034×1 double

Values:

Min -11.662
Median 10.552
Max 33.171

ccsm4_y_1: 146034×1 double

Values:

Min -14.66
Median 12.254
Max 34.891

cfsv2_y_1: 146034×1 double

Values:

Min -14.519
Median 10.99
Max 35.795

gfdl_y_1: 146034×1 double

Values:

Min -10.906
Median 10.555
Max 35.95

gfdl-flor-a_y_1: 146034×1 double

Values:

Min -12.995
Median 11.24
Max 37.834

gfdl-flor-b_y_1: 146034×1 double

Values:

Min -12.899
Median 11.255
Max 37.192

nasa_y_1: 146034×1 double

Values:

Min -21.459
Median 13.768
Max 38.154

nmme_mean_y_1: 146034×1 double

Values:

Min -13.219
Median 11.462
Max 35.169

cancm3_0_y_1: 146034×1 double

Values:

Min -12.902
Median 10.475
Max 36.077

cancm4_0_y_1: 146034×1 double

Values:

Min -13.276
Median 12.385
Max 35.795

ccsm3_0_y_1: 146034×1 double

Values:

Min -9.4298
Median 10.452
Max 32.974

ccsm4_0_y_1: 146034×1 double

Values:

Min -12.54
Median 12.237
Max 34.311

cfsv2_0_y_1: 146034×1 double

Values:

Min -10.862
Median 11.315
Max 35.749

gfdl-flor-a_0_y_1: 146034×1 double

Values:

Min -12.85
Median 11.831
Max 37.416

gfdl-flor-b_0_y_1: 146034×1 double

Values:

Min -13.52
Median 11.842
Max 37.34

gfdl_0_y_1: 146034×1 double

Values:

Min -9.2018
Median 10.658
Max 36.117

nasa_0_y_1: 146034×1 double

Values:

Min -19.526
Median 14.002
Max 38.22

nmme0_mean_y_1: 146034×1 double

Values:

Min -12.194
Median 11.861
Max 34.879

This shows that all variables are doubles except for ‘start_date’, which is a datetime. Many machine learning algorithms cannot use datetime values directly, so let’s break this variable up into three separate numeric predictors that may be more helpful when training our model:
trainingData.Day = trainingData.start_date.Day;
trainingData.Month = trainingData.start_date.Month;
trainingData.Year = trainingData.start_date.Year;
trainingData.start_date = [];
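To see what this step produces, here is a sketch on a tiny stand-in table with a start_date column (the dates and temperatures are illustrative). It applies the same assignments, then uses varfun to confirm that every remaining variable is a double, which is quicker than scrolling through the summary output. As an optional extra that is not part of the original workflow, it also computes the day of the year with day(..., 'dayofyear'), a single number that can help a model capture seasonality.

```matlab
% Stand-in table with a datetime column, laid out like the challenge data
T = table([27; 27; 27], [261; 261; 261], datetime(2016, 1, 1:3)', ...
    [12.044; 12.631; 13.305], ...
    'VariableNames', {'lat', 'lon', 'start_date', 'tmp2m'});

% Split the datetime into numeric predictors
T.Day = T.start_date.Day;
T.Month = T.start_date.Month;
T.Year = T.start_date.Year;
% Optional extra feature (not in the original workflow): day of year, 1 to 366
T.DayOfYear = day(T.start_date, 'dayofyear');
% Drop the original datetime column
T.start_date = [];

% List the class of every variable; all should now be 'double'
varClasses = varfun(@class, T, 'OutputFormat', 'cell');
disp(varClasses)
allDouble = all(strcmp(varClasses, 'double'));
```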
I’m also going to move the ‘tmp2m’ variable to the end of the table, which makes it easy to see that this is the variable we want to predict.
trainingData = movevars(trainingData, "tmp2m", "After", "Year");
head(trainingData)
lat lon cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1 Day Month Year tmp2m
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 1 1 2016 12.044
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 2 1 2016 12.631
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 3 1 2016 13.305
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 4 1 2016 13.396
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 5 1 2016 13.627
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 6 1 2016 13.999
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 7 1 2016 14.223
27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 8 1 2016 14.248
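The testing data needs exactly the same treatment. Repeating the lines by hand works fine; an alternative, sketched below, is to loop over both tables so the two cannot drift apart if you later change a step. The stand-in tables and the makeStandIn helper are hypothetical; with the real data you would start from the freshly imported trainingData and testingData, before start_date has been removed.

```matlab
% Hypothetical helper that builds a small table laid out like the imported data
makeStandIn = @(lat, temps) table(repmat(lat, 2, 1), datetime(2016, 1, 1:2)', ...
    temps, 'VariableNames', {'lat', 'start_date', 'tmp2m'});
trainingData = makeStandIn(27, [12.044; 12.631]);
testingData = makeStandIn(38, [9.0021; 9.4104]);

tbls = {trainingData, testingData};
for k = 1:numel(tbls)
    T = tbls{k};
    % Break the datetime into numeric predictors and drop the original
    T.Day = T.start_date.Day;
    T.Month = T.start_date.Month;
    T.Year = T.start_date.Year;
    T.start_date = [];
    % Move the response variable to the end
    T = movevars(T, "tmp2m", "After", "Year");
    tbls{k} = T;
end
[trainingData, testingData] = tbls{:};
```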
Repeat these steps for the testing data:
testingData.Day = testingData.start_date.Day;
testingData.Month = testingData.start_date.Month;
testingData.Year = testingData.start_date.Year;
testingData.start_date = [];
testingData = movevars(testingData, "tmp2m", "After", "Year");
head(testingData)
lat lon cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1 Day Month Year tmp2m
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 1 1 2016 9.0021
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 2 1 2016 9.4104
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 3 1 2016 9.7816
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 4 1 2016 10.066
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 5 1 2016 10.35
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6 1 2016 10.59
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7 1 2016 10.674
38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 8 1 2016 10.995
Now, the data is ready to be used!

Train & Evaluate a Model

There are many different ways to approach this year’s problem, so it’s important to try out different models! In this tutorial, we will be using a machine learning approach to tackle the problem of weather forecasting, and since the response variable ‘tmp2m’ is a number, we will need to create a regression model. Let’s start by opening the Regression Learner app, which will allow us to rapidly prototype several different models.
regressionLearner
When you first open the app, click the “New Session” button in the top-left corner and choose “From Workspace”. Set the “Data Set Variable” to ‘trainingData’; the app automatically selects ‘tmp2m’ as the response variable because it is the last variable in the table. Then, since this is a fairly large dataset, I changed the validation scheme to “Holdout Validation” and set the percentage held out to 15. These are just starting values, so you may want to experiment with the validation scheme when building your own model.
After we’ve clicked “Start Session”, the Regression Learner App interface will load.
Step 1: Start A New Session
[Click on “New Session” > “From Workspace”, set the “Data Set Variable” to ‘trainingData’, set the “Validation Scheme” to ‘Holdout Validation’, set “percent held out” to 15, click “Start Session”]
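If you would like to see what a 15% holdout split looks like outside the app, here is a minimal command-line sketch. It assumes the Statistics and Machine Learning Toolbox (which provides cvpartition) and uses a small synthetic table, demoData, as a hypothetical stand-in for trainingData; it is not the code the app runs internally.

```matlab
% Hypothetical stand-in for trainingData: synthetic values, not the
% competition data. As in the real table, the response tmp2m comes last.
rng(0);                                   % repeatable random numbers
n = 200;
nmme_mean_x = 10 + 2*randn(n, 1);         % made-up forecast predictor
Month = randi(12, n, 1);                  % made-up calendar predictor
tmp2m = nmme_mean_x + 0.3*randn(n, 1);    % made-up response
demoData = table(nmme_mean_x, Month, tmp2m);

% By default the app proposes the last table variable as the response
disp(demoData.Properties.VariableNames{end})

% Holdout validation: hold out about 15% of the rows
cv = cvpartition(height(demoData), 'HoldOut', 0.15);
trainPart = demoData(training(cv), :);    % rows used for fitting
validPart = demoData(test(cv), :);        % rows used for validation
fprintf('%d training rows, %d validation rows\n', ...
    height(trainPart), height(validPart));
```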
From here, I’m going to choose to train “All Quick-to-Train” model options, so I can see which one performs the best out of these few. The steps for doing this are shown below. Note: this recording is slightly sped up since the training will take several seconds.
Step 2: Train Models
[Click “All Quick-To-Train” in the MODELS section of the Toolstrip, delete the “1. Tree” model in the “Models” panel, click “Train All”, wait for all models to finish training]
I chose the “All Quick-to-Train” option so that I could show the process, but if you have the time, you may want to try selecting “All” instead of the “All Quick-to-Train” option. This will give you more models to work with.
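Once you want to script your experiments, two of the app's model types can be approximated at the command line. This sketch assumes that the “Fine Tree” preset corresponds roughly to fitrtree with a small minimum leaf size (4 here) and that the linear model corresponds to fitlm; the app's presets may set other options too, and the synthetic table is hypothetical.

```matlab
% Hypothetical synthetic data standing in for trainingData
rng(1);                                   % repeatable random numbers
n = 300;
nmme_mean_x = 10 + 2*randn(n, 1);
Month = randi(12, n, 1);
tmp2m = nmme_mean_x + 0.5*sin(2*pi*Month/12) + 0.3*randn(n, 1);
demoData = table(nmme_mean_x, Month, tmp2m);

% Approximation of a "Fine Tree": a regression tree with small leaves
treeMdl = fitrtree(demoData, 'tmp2m', 'MinLeafSize', 4);

% Approximation of the linear regression model
linMdl = fitlm(demoData, 'ResponseVar', 'tmp2m');

% Both models accept a table and return one prediction per row
treePred = predict(treeMdl, demoData);
linPred  = predict(linMdl, demoData);
```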
Once those models have finished training, you’ll see their root-mean-squared error (RMSE) values on the left-hand side. RMSE is a common error metric for regression models, and it is the metric that will be used to evaluate your competition submissions. RMSE is calculated using the following equation:
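
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where n is the number of observations, y_i is the actual value, and ŷ_i is the model’s prediction for observation i.
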
This value tells you how well the model performed on the validation data; lower values are better. In this case, the Fine Tree model had the lowest validation RMSE!
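To make the metric concrete, here is RMSE computed by hand on three made-up temperatures: square each prediction error, average the squares, and take the square root. rmseOf is a hypothetical helper name, not a built-in MATLAB function.

```matlab
% Hypothetical helper: RMSE = sqrt(mean(squared errors))
rmseOf = @(actual, predicted) sqrt(mean((actual(:) - predicted(:)).^2));

actual    = [10.1; 11.3; 9.8];    % made-up observed temperatures
predicted = [10.0; 11.0; 10.4];   % made-up model predictions
err = rmseOf(actual, predicted);  % errors are 0.1, 0.3 and -0.6
fprintf('RMSE = %.4f\n', err);
```

Because the errors are squared before averaging, the single large miss (0.6) dominates the result, which is why RMSE penalizes occasional big errors more than many small ones.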
The Regression Learner app also lets you import test data to see how well the trained models perform on new data. This gives you an idea of how accurate a model may be when making your final predictions on the competition test set. Let’s import our ‘testingData’ table and see how these models perform.
Step 3: Evaluate Models with Testing Data
[Click on the “Test Data” dropdown, select “From Workspace”. In the window that opens, set “Test Data Set Variable” to ‘testingData’, then click “Import”. Click “Test All” – new RMSE values will be calculated]
This will take a few seconds to run, but once it finishes we can see that even though the Fine Tree model performed best on the validation data, the Linear Regression model performs best on completely new data.
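Comparing validation and test RMSE side by side is also easy to script. This sketch uses hypothetical synthetic data: rows 301 to 400 play the role of testingData, the remaining rows are split again for 15% holdout validation, and both RMSE values are printed for an approximate fine tree and a linear model. The printed numbers depend only on the synthetic data and say nothing about the competition results.

```matlab
% Hypothetical synthetic data: rows 301-400 play the role of testingData,
% rows 1-300 play the role of trainingData.
rng(2);                                   % repeatable random numbers
n = 400;
x = 10 + 2*randn(n, 1);
tmp2m = 0.8*x + 0.3*randn(n, 1);
allData = table(x, tmp2m);
trainData = allData(1:300, :);
testData  = allData(301:end, :);

% 15% holdout validation inside the training rows
cv = cvpartition(height(trainData), 'HoldOut', 0.15);
fitPart   = trainData(training(cv), :);
validPart = trainData(test(cv), :);

rmseOf = @(a, p) sqrt(mean((a - p).^2));  % hypothetical helper

models = {fitrtree(fitPart, 'tmp2m', 'MinLeafSize', 4), ...
          fitlm(fitPart, 'ResponseVar', 'tmp2m')};
names  = {'Tree (leaf size 4)', 'Linear'};
validRMSE = zeros(1, numel(models));
testRMSE  = zeros(1, numel(models));
for k = 1:numel(models)
    validRMSE(k) = rmseOf(validPart.tmp2m, predict(models{k}, validPart));
    testRMSE(k)  = rmseOf(testData.tmp2m,  predict(models{k}, testData));
    fprintf('%-20s validation RMSE %.3f | test RMSE %.3f\n', ...
        names{k}, validRMSE(k), testRMSE(k));
end
```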
You can also use the ‘PLOT AND INTERPRET’ tab of the Regression Learner app to create visuals that show how the model performed on the test and validation sets. For example, let’s look at the “Predicted vs. Actual (Test)” graph for the Linear Regression model:
Step 4: Plot Results
[Click on the drop-down menu in the PLOT AND INTERPRET section of the Toolstrip, then select “Predicted vs. Actual (Test)”]
Since this model performed relatively well, the blue dots (the predictions) stay fairly close to the diagonal line (where a prediction would exactly match the actual value). I’m happy with how this model performs, so let’s export it to the workspace so we can make predictions on other datasets!
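A similar plot can be drawn outside the app. This minimal sketch uses made-up actual and predicted vectors; in practice you would substitute the test-set responses and your model’s predictions.

```matlab
% Made-up values standing in for test responses and model predictions
rng(3);                                    % repeatable random numbers
actual    = 8 + 4*rand(100, 1);
predicted = actual + 0.4*randn(100, 1);

fig = figure;
scatter(actual, predicted, 12, 'filled');  % one dot per observation
hold on
lims = [min([actual; predicted]), max([actual; predicted])];
plot(lims, lims, 'k--');                   % y = x: perfect predictions
hold off
xlabel('Actual tmp2m');
ylabel('Predicted tmp2m');
title('Predicted vs. actual (sketch)');
```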
Step 5: Export the Model
[In the EXPORT section of the Toolstrip, click “Export Model” > “Export Model”. In the window that appears, click “OK”]
Now the model is in the MATLAB Workspace as “trainedModel” so I can use it outside of the app.
To learn more about exporting models from the Regression Learner app, check out this documentation page!
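The exported struct is convenient because its predictFcn field is a function handle that takes a table and returns predictions. The sketch below only mimics that one field with a hypothetical struct, myModel, wrapped around a fitlm model trained on synthetic data; the real trainedModel struct contains additional fields.

```matlab
% Hypothetical training data and model (synthetic, for illustration)
rng(4);                                   % repeatable random numbers
x = randn(50, 1);
demoData = table(x, 2*x + 0.1*randn(50, 1), 'VariableNames', {'x', 'tmp2m'});
mdl = fitlm(demoData, 'ResponseVar', 'tmp2m');

% Mimic only the predictFcn field of the app's exported struct
myModel = struct();
myModel.predictFcn = @(T) predict(mdl, T);  % takes a table, returns predictions

% New rows contain the predictors but no response, like testingData
newRows = table([0; 1], 'VariableNames', {'x'});
yHat = myModel.predictFcn(newRows);         % one prediction per row
```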

Save and Export Predictions

Once you have a model that you are happy with, it’s time to make predictions on new data. To show you what this workflow looks like, I’m going to remove the “tmp2m” variable from my testing dataset, because the competition test set will not have this variable.
testingData = removevars(testingData, "tmp2m");   % drop the response column
Now we have a dataset that contains the same variables as our training set except for the response variable. To make predictions on this dataset, use predictFcn:
tmp2m = trainedModel.predictFcn(testingData);
This returns an array containing one prediction per row of the test set. To prepare these predictions for submission, we’ll need to create a table with two columns: one containing the index number, and one containing the prediction for that index number. Since the dataset I am using does not provide an index number, I will create an array with index numbers to show you what the resulting table will look like.
index = (1:length(tmp2m))';   % column vector of row indices
outputTable = table(index, tmp2m);
head(outputTable)
index tmp2m
_____ ______
1 11.037
2 11.041
3 11.046
4 11.05
5 11.054
6 13.632
7 13.636
8 13.641
Then we can export the results to a CSV file to be read and used by others!
writetable(outputTable, "datathonSubmission.csv");   % write a comma-separated file
To learn more about submission and evaluation for the competition, refer to the Kaggle page.
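Before uploading, it can be worth reading the file back to confirm the column names and row count. This sketch writes hypothetical predictions to a temporary file (via tempname) so nothing in your current folder is overwritten; the Kaggle page, not this check, defines the required column names.

```matlab
% Hypothetical predictions (values copied from the preview above)
tmp2m = [11.037; 11.041; 11.046];
index = (1:numel(tmp2m))';
outputTable = table(index, tmp2m);

% Write to a temporary file so nothing in the current folder is overwritten
outFile = [tempname, '.csv'];
writetable(outputTable, outFile);

% Read the file back and check the header and row count
check = readtable(outFile);
disp(check.Properties.VariableNames)
fprintf('%d rows written\n', height(check));
delete(outFile);                          % clean up the temporary file
```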

Experiment!

When creating any kind of AI model, it’s important to test out different workflows to see which one performs best for your dataset and challenge! This tutorial was only meant to be an introduction, but there are so many other choices you can make when preprocessing your data or creating your models. There is no one algorithm that suits all problems, so set aside some time to test out different models. Here are some suggestions on how to get started:
  • Try other preprocessing techniques, such as normalizing the data or creating new variables
  • Play around with the training options available in the app
  • Change the variables that you use to train the model
  • Try machine and deep learning workflows
  • Change the breakdown of training, testing, and validation data
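As an example of the first suggestion, this sketch creates a day-of-year variable from the Day, Month, and Year columns and z-score normalizes one predictor. The rows are hypothetical. Note the design choice: the mean and standard deviation come from the training rows only and are then reused for the test rows, so no information from the test set leaks into preprocessing.

```matlab
% Hypothetical rows shaped like the preview above (Day, Month, Year, one
% forecast column); the values are made up for illustration.
trainT = table([1; 15; 30], [1; 2; 3], [2016; 2016; 2016], [9.2; 8.8; 11.7], ...
    'VariableNames', {'Day', 'Month', 'Year', 'nmme_mean_x'});
testT  = table([5; 20], [4; 4], [2016; 2016], [12.1; 12.9], ...
    'VariableNames', {'Day', 'Month', 'Year', 'nmme_mean_x'});

% New variable: day of the year (1-366), a simple seasonality feature
addDoy = @(T) day(datetime(T.Year, T.Month, T.Day), 'dayofyear');
trainT.DayOfYear = addDoy(trainT);
testT.DayOfYear  = addDoy(testT);

% Z-score normalization using statistics from the TRAINING rows only,
% then applying the same shift and scale to the test rows
mu    = mean(trainT.nmme_mean_x);
sigma = std(trainT.nmme_mean_x);
trainT.nmme_mean_x = (trainT.nmme_mean_x - mu) / sigma;
testT.nmme_mean_x  = (testT.nmme_mean_x  - mu) / sigma;
```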
If you are training a deep learning network, you can also utilize the Experiment Manager to train the network under different conditions and compare the results!

Done!

Thank you for joining me on this tutorial! We are excited to find out how you will take what you have learned to create your own models. I recommend looking at the ‘Additional Resources’ section below for more ideas on how you can improve your models.
Feel free to reach out to us at studentcompetitions@mathworks.com if you have any further questions.

Additional Resources

  1. Overview of Supervised Learning (Video)
  2. Preprocessing Data Documentation
  3. Missing Data in MATLAB
  4. Supervised Learning Workflow and Algorithms
  5. Train Regression Models in Regression Learner App
  6. Train Classification Models in Classification Learner App
  7. 8 MATLAB Cheat Sheets for Data Science
  8. MATLAB Onramp
  9. Machine Learning Onramp
  10. Deep Learning Onramp
