Weather Forecasting in MATLAB for the WiDS Datathon 2023

Posted by Connell D'Souza, January 11, 2023


In today’s blog, Grace Woolson shows how you can get started with machine learning in MATLAB for weather forecasting and take on the WiDS Datathon 2023 challenge. Over to you, Grace.

Introduction

Today, I’m going to show an example of how you can use MATLAB for the WiDS Datathon 2023. This year’s challenge tasks participants with creating a model that can produce long-term temperature forecasts, which can help communities adapt to extreme weather events often caused by climate change. WiDS participants will submit their forecasts on Kaggle. This tutorial will walk through the following steps of the model-making process:

Importing a Tabular Dataset
Preprocessing Data
Training and Evaluating a Machine Learning Model
Making New Predictions and Exporting Predictions

MathWorks is happy to support participants of the Women in Data Science Datathon 2023 by providing complimentary MATLAB licenses, tutorials, workshops, and additional resources. To request complimentary licenses for you and your teammates, go to this MathWorks site, click the “Request Software” button, and fill out the software request form.

To register for the competition and access the dataset, go to the Kaggle page, sign in or register for an account, and click the ‘Join Competition’ button. Once you accept the competition rules, you will be able to download the challenge datasets from the ‘Data’ tab.

Import Data

First, we need to bring the training data into the MATLAB workspace. For this tutorial, I will be using a subset of the overall challenge dataset, so the files shown below will differ from the ones you are provided. The datasets I will be using are:

Training data (train.xlsx)
Testing data (test.xlsx)

The data is in tabular form, so we can use the readtable function to import the data.

trainingData = readtable('train.xlsx', 'VariableNamingRule', 'preserve');

testingData = readtable('test.xlsx', 'VariableNamingRule', 'preserve');
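One consequence of the 'preserve' naming rule is that column headers such as 'gfdl-flor-a_0_x' are kept exactly as written, even though they are not valid MATLAB identifiers, so plain dot notation won't work for them. Below is a minimal sketch of how such columns can be accessed. It uses a small made-up table (the name 'demo' and its values are illustrative, not the challenge data) and assumes R2019b or later, where tables accept arbitrary variable names.

```matlab
% Hypothetical two-row table with one hyphenated column name (illustrative values)
demo = table([27; 27], [11.99; 12.29], ...
    'VariableNames', {'lat', 'gfdl-flor-a_0_x'});

% Names that are not valid identifiers need the parenthesized form
col = demo.("gfdl-flor-a_0_x");

% Plain dot notation still works for ordinary names
latValues = demo.lat;

% List every variable name in the table
names = demo.Properties.VariableNames;
disp(names)
```

The same parenthesized syntax should work on 'trainingData' and 'testingData' for any of the hyphenated model columns.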

Since the tables are so large, we don’t want to show the whole dataset at once, because it will take up the entire screen! Let’s use the head function to display the top 8 rows of the tables, so we can get a sense of what data we are working with.

head(trainingData)

lat lon start_date cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 tmp2m cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1

___ ___ ___________ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________ ______ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________

27 261 01-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 12.044 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183

261 02-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 12.631 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183

261 03-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.305 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183

261 04-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.396 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183

261 05-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.627 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183

261 06-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.999 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049

261 07-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 14.223 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049

261 08-Jan-2016 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 14.248 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049

head(testingData)

lat lon start_date cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 tmp2m cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1

___ ___ ___________ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________ ______ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________

38 238 01-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 9.0021 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753

238 02-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 9.4104 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753

238 03-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 9.7816 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753

238 04-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.066 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753

238 05-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.35 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753

238 06-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.59 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413

238 07-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.674 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413

238 08-Jan-2016 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 10.995 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413

Now we can see the names of all of the columns (also known as variables) and get a sense of their datatypes, which will make it much easier to work with these tables. Notice that both datasets have the same variable names. If you look through all of the variable names, you’ll see one called ‘tmp2m’ – this is the column we will be training a model to predict, also called the response variable.
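Both observations above (the two tables share the same variable names, and 'tmp2m' is among them) can also be checked programmatically rather than by eye. The sketch below uses two tiny stand-in tables, 'trainDemo' and 'testDemo', with only three columns each; these names and the cut-down layout are assumptions for illustration, and the same calls should apply to the full 'trainingData' and 'testingData' tables.

```matlab
% Two small hypothetical tables standing in for the training and testing sets
trainDemo = table([27; 27], [261; 261], [12.044; 12.631], ...
    'VariableNames', {'lat', 'lon', 'tmp2m'});
testDemo = table([38; 38], [238; 238], [9.0021; 9.4104], ...
    'VariableNames', {'lat', 'lon', 'tmp2m'});

% True when both tables have identical variable names in the same order
sameNames = isequal(trainDemo.Properties.VariableNames, ...
                    testDemo.Properties.VariableNames);

% Confirm the response variable is present
hasResponse = ismember('tmp2m', trainDemo.Properties.VariableNames);

% Everything except the response is a candidate predictor ('stable' keeps column order)
predictorNames = setdiff(trainDemo.Properties.VariableNames, {'tmp2m'}, 'stable');
```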

It is important to have a training and testing set with known outputs, so you can see how well your model performs on unseen data. In this case, it is split ahead of time, but you may need to split your training set manually. For example, if you have one dataset in a 100,000-row table called ‘train_data’, the example code below would randomly split this table into 80% training and 20% testing data. These percentages are relatively standard when distributing training and testing data, but you may want to try out different values when making your datasets!

 [trainInd, ~, testInd] = dividerand(100000, .8, 0, .2); 

 trainingData = train_data(trainInd, :);

 testingData = train_data(testInd, :);
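Note that dividerand ships with Deep Learning Toolbox. If that toolbox isn't available, a similar random 80/20 split can be sketched in base MATLAB with randperm. The 10-row 'train_data' table below is a made-up stand-in so the snippet runs on its own; it is not the challenge data.

```matlab
rng(0);                                   % fix the seed so the split is repeatable

% Stand-in for the full dataset: 10 rows of made-up values
train_data = table((1:10)', rand(10, 1), 'VariableNames', {'id', 'value'});

n = height(train_data);
shuffled = randperm(n);                   % random ordering of the row indices
nTrain = round(0.8 * n);                  % 80% of the rows go to training

trainInd = shuffled(1:nTrain);
testInd  = shuffled(nTrain+1:end);

trainingData = train_data(trainInd, :);
testingData  = train_data(testInd, :);
```

Because every row index appears exactly once in 'shuffled', the two subsets never overlap and together cover the whole table.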

Preprocess Data

Now that the data is in the workspace, we need to take some steps to clean and format it so it can be used to train a machine learning model. We can use the summary function to see the datatype and statistical information about each variable:

summary(trainingData)

Variables:
lat: 146034×1 double
Values:
Min 27

Median 42

Max 49
lon: 146034×1 double
Values:
Min 236

Median 252

Max 266
start_date: 146034×1 datetime
Values:
Min 01-Jan-2016 00:00:00

Median 01-Jul-2016 12:00:00

Max 31-Dec-2016 00:00:00
cancm3_0_x: 146034×1 double
Values:
Min -12.902

Median 10.535

Max 36.077
cancm4_0_x: 146034×1 double
Values:
Min -13.276

Median 12.512

Max 35.795
ccsm3_0_x: 146034×1 double
Values:
Min -11.75

Median 10.477

Max 32.974
ccsm4_0_x: 146034×1 double
Values:
Min -13.264

Median 12.315

Max 34.311
cfsv2_0_x: 146034×1 double
Values:
Min -11.175

Median 11.34

Max 35.749
gfdl-flor-a_0_x: 146034×1 double
Values:
Min -12.85

Median 11.831

Max 37.416
gfdl-flor-b_0_x: 146034×1 double
Values:
Min -13.52

Median 11.837

Max 37.34
gfdl_0_x: 146034×1 double
Values:
Min -11.165

Median 10.771

Max 36.117
nasa_0_x: 146034×1 double
Values:
Min -19.526

Median 14.021

Max 38.22
nmme0_mean_x: 146034×1 double
Values:
Min -12.194

Median 11.893

Max 34.879
cancm3_x: 146034×1 double
Values:
Min -12.969

Median 9.9291

Max 36.235
cancm4_x: 146034×1 double
Values:
Min -12.483

Median 12.194

Max 38.378
ccsm3_x: 146034×1 double
Values:
Min -13.033

Median 10.368

Max 33.42
ccsm4_x: 146034×1 double
Values:
Min -14.28

Median 12.254

Max 34.957
cfsv2_x: 146034×1 double
Values:
Min -14.683

Median 10.897

Max 35.795
gfdl_x: 146034×1 double
Values:
Min -9.8741

Median 10.476

Max 35.95
gfdl-flor-a_x: 146034×1 double
Values:
Min -13.021

Median 11.15

Max 37.834
gfdl-flor-b_x: 146034×1 double
Values:
Min -12.557

Median 11.117

Max 37.192
nasa_x: 146034×1 double
Values:
Min -21.764

Median 13.721

Max 38.154
nmme_mean_x: 146034×1 double
Values:
Min -13.042

Median 11.354

Max 35.169
cancm3_y: 146034×1 double
Values:
Min 0.075757

Median 18.56

Max 124.58
cancm4_y: 146034×1 double
Values:
Min 0.02538

Median 16.296

Max 137.78
ccsm3_y: 146034×1 double
Values:
Min 4.5927e-05

Median 24.278

Max 126.36
ccsm4_y: 146034×1 double
Values:
Min 0.096667

Median 24.455

Max 204.37
cfsv2_y: 146034×1 double
Values:
Min 0.074655

Median 25.91

Max 156.7
gfdl_y: 146034×1 double
Values:
Min 0.0046441

Median 20.49

Max 133.88
gfdl-flor-a_y: 146034×1 double
Values:
Min 0.0044707

Median 20.438

Max 195.32
gfdl-flor-b_y: 146034×1 double
Values:
Min 0.0095625

Median 20.443

Max 187.15
nasa_y: 146034×1 double
Values:
Min 1.9478e-05

Median 17.98

Max 164.94
nmme_mean_y: 146034×1 double
Values:
Min 0.2073

Median 21.494

Max 132
cancm3_0_y: 146034×1 double
Values:
Min 0.016023

Median 19.365

Max 139.94
cancm4_0_y: 146034×1 double
Values:
Min 0.016112

Median 17.354

Max 160.04
ccsm3_0_y: 146034×1 double
Values:
Min 0.00043188

Median 21.729

Max 144.19
ccsm4_0_y: 146034×1 double
Values:
Min 0.02979

Median 23.642

Max 151.3
cfsv2_0_y: 146034×1 double
Values:
Min 0.01827

Median 25.095

Max 176.15
gfdl-flor-a_0_y: 146034×1 double
Values:
Min 0.0058198

Median 17.634

Max 184.7
gfdl-flor-b_0_y: 146034×1 double
Values:
Min 0.0045824

Median 16.937

Max 194.19
gfdl_0_y: 146034×1 double
Values:
Min 0.0030585

Median 19.379

Max 140.16
nasa_0_y: 146034×1 double
Values:
Min 0.00051379

Median 17.81

Max 167.31
nmme0_mean_y: 146034×1 double
Values:
Min 0.061258

Median 20.697

Max 140.1
cancm3_0_x_1: 146034×1 double
Values:
Min 0.016023

Median 19.436

Max 139.94
cancm4_0_x_1: 146034×1 double
Values:
Min 0.016112

Median 17.261

Max 160.04
ccsm3_0_x_1: 146034×1 double
Values:
Min 0.00043188

Median 21.75

Max 144.19
ccsm4_0_x_1: 146034×1 double
Values:
Min 0.02979

Median 23.45

Max 231.72
cfsv2_0_x_1: 146034×1 double
Values:
Min 0.01827

Median 25.096

Max 176.15
gfdl-flor-a_0_x_1: 146034×1 double
Values:
Min 0.0058198

Median 17.617

Max 217.6
gfdl-flor-b_0_x_1: 146034×1 double
Values:
Min 0.0045824

Median 16.915

Max 195.06
gfdl_0_x_1: 146034×1 double
Values:
Min 0.0030585

Median 19.411

Max 140.16
nasa_0_x_1: 146034×1 double
Values:
Min 0.00051379

Median 17.733

Max 180.77
nmme0_mean_x_1: 146034×1 double
Values:
Min 0.061258

Median 20.67

Max 140.1
tmp2m: 146034×1 double
Values:
Min -21.031

Median 12.742

Max 37.239
cancm3_x_1: 146034×1 double
Values:
Min 0.075757

Median 18.649

Max 124.58
cancm4_x_1: 146034×1 double
Values:
Min 0.02538

Median 16.588

Max 116.86
ccsm3_x_1: 146034×1 double
Values:
Min 4.5927e-05

Median 25.242

Max 134.15
ccsm4_x_1: 146034×1 double
Values:
Min 0.21704

Median 24.674

Max 204.37
cfsv2_x_1: 146034×1 double
Values:
Min 0.028539

Median 26.282

Max 154.39
gfdl_x_1: 146034×1 double
Values:
Min 0.0046441

Median 21.028

Max 142.5
gfdl-flor-a_x_1: 146034×1 double
Values:
Min 0.0044707

Median 21.322

Max 187.57
gfdl-flor-b_x_1: 146034×1 double
Values:
Min 0.0095625

Median 21.444

Max 193.19
nasa_x_1: 146034×1 double
Values:
Min 1.9478e-05

Median 17.963

Max 183.71
nmme_mean_x_1: 146034×1 double
Values:
Min 0.24096

Median 21.881

Max 124.19
cancm3_y_1: 146034×1 double
Values:
Min -11.839

Median 10.067

Max 36.235
cancm4_y_1: 146034×1 double
Values:
Min -11.809

Median 12.179

Max 38.378
ccsm3_y_1: 146034×1 double
Values:
Min -11.662

Median 10.552

Max 33.171
ccsm4_y_1: 146034×1 double
Values:
Min -14.66

Median 12.254

Max 34.891
cfsv2_y_1: 146034×1 double
Values:
Min -14.519

Median 10.99

Max 35.795
gfdl_y_1: 146034×1 double
Values:
Min -10.906

Median 10.555

Max 35.95
gfdl-flor-a_y_1: 146034×1 double
Values:
Min -12.995

Median 11.24

Max 37.834
gfdl-flor-b_y_1: 146034×1 double
Values:
Min -12.899

Median 11.255

Max 37.192
nasa_y_1: 146034×1 double
Values:
Min -21.459

Median 13.768

Max 38.154
nmme_mean_y_1: 146034×1 double
Values:
Min -13.219

Median 11.462

Max 35.169
cancm3_0_y_1: 146034×1 double
Values:
Min -12.902

Median 10.475

Max 36.077
cancm4_0_y_1: 146034×1 double
Values:
Min -13.276

Median 12.385

Max 35.795
ccsm3_0_y_1: 146034×1 double
Values:
Min -9.4298

Median 10.452

Max 32.974
ccsm4_0_y_1: 146034×1 double
Values:
Min -12.54

Median 12.237

Max 34.311
cfsv2_0_y_1: 146034×1 double
Values:
Min -10.862

Median 11.315

Max 35.749
gfdl-flor-a_0_y_1: 146034×1 double
Values:
Min -12.85

Median 11.831

Max 37.416
gfdl-flor-b_0_y_1: 146034×1 double
Values:
Min -13.52

Median 11.842

Max 37.34
gfdl_0_y_1: 146034×1 double
Values:
Min -9.2018

Median 10.658

Max 36.117
nasa_0_y_1: 146034×1 double
Values:
Min -19.526

Median 14.002

Max 38.22
nmme0_mean_y_1: 146034×1 double
Values:
Min -12.194

Median 11.861

Max 34.879

This shows that all variables are doubles except for the ‘start_date’ variable, which is a datetime and is not compatible with many machine learning algorithms. Let’s break this up into three separate numeric predictors (day, month, and year) that may be more helpful when training our algorithms:

trainingData.Day = trainingData.start_date.Day;

trainingData.Month = trainingData.start_date.Month;

trainingData.Year = trainingData.start_date.Year;

trainingData.start_date = [];

I’m also going to move the ‘tmp2m’ variable to the end, which will make it easier to distinguish that this is the variable we want to predict.

trainingData = movevars(trainingData, "tmp2m", "After", "Year");

head(trainingData)

lat lon cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1 Day Month Year tmp2m

___ ___ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________ ___ _____ ____ ______

27 261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 1 1 2016 12.044

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 2 1 2016 12.631

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 3 1 2016 13.305

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 4 1 2016 13.396

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.8245 11.061 10.498 10.408 11.857 8.3761 11.315 11.775 12.281 10.822 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 35.156 28.155 30.717 34.552 28.183 28.298 28.652 34.429 37.595 31.748 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 29.098 21.265 42.821 28.231 40.159 62.355 24.896 24.933 22.981 32.971 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 16.57 18.283 15.485 18.897 17.87 16.714 17.432 13.391 20.003 17.183 5 1 2016 13.627

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 6 1 2016 13.999

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 7 1 2016 14.223

261 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 25.938 16.519 23.387 21.876 39.836 21.261 14.133 36.942 29.398 25.477 13.191 16.105 30.301 26.116 43.048 40.007 26.308 24.571 26.924 27.397 9.2216 12.444 10.616 10.461 11.401 7.7597 12.194 11.664 12.213 10.886 10.801 12.686 11.962 13.5 13.012 11.99 12.29 8.5611 13.64 12.049 8 1 2016 14.248
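The date-splitting and reordering steps above apply to any table that has 'start_date' and 'tmp2m' columns, so one option is to run them in a loop over several tables instead of writing them out once per table. The sketch below is one possible way to do that, not the approach used in this post; 'trainDemo' and 'testDemo' are made-up two-row stand-ins for the real tables.

```matlab
% Made-up stand-ins for the training and testing tables (illustrative values)
trainDemo = table(datetime(2016, 1, [1; 2]), [12.044; 12.631], ...
    'VariableNames', {'start_date', 'tmp2m'});
testDemo = table(datetime(2016, 1, [1; 2]), [9.0021; 9.4104], ...
    'VariableNames', {'start_date', 'tmp2m'});

tables = {trainDemo, testDemo};
for k = 1:numel(tables)
    T = tables{k};
    T.Day = T.start_date.Day;        % numeric day of month
    T.Month = T.start_date.Month;    % numeric month
    T.Year = T.start_date.Year;      % numeric year
    T.start_date = [];               % drop the original datetime column
    T = movevars(T, 'tmp2m', 'After', 'Year');   % response variable goes last
    tables{k} = T;
end

% Unpack the processed tables back into their own variables
[trainDemo, testDemo] = tables{:};
```

Keeping the steps in one place makes it harder for the training and testing tables to drift apart if the preprocessing changes later.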

Repeat these steps for the testing data:

testingData.Day = testingData.start_date.Day;

testingData.Month = testingData.start_date.Month;

testingData.Year = testingData.start_date.Year;

testingData.start_date = [];

testingData = movevars(testingData, "tmp2m", "After", "Year");

head(testingData)

lat lon cancm3_0_x cancm4_0_x ccsm3_0_x ccsm4_0_x cfsv2_0_x gfdl-flor-a_0_x gfdl-flor-b_0_x gfdl_0_x nasa_0_x nmme0_mean_x cancm3_x cancm4_x ccsm3_x ccsm4_x cfsv2_x gfdl_x gfdl-flor-a_x gfdl-flor-b_x nasa_x nmme_mean_x cancm3_y cancm4_y ccsm3_y ccsm4_y cfsv2_y gfdl_y gfdl-flor-a_y gfdl-flor-b_y nasa_y nmme_mean_y cancm3_0_y cancm4_0_y ccsm3_0_y ccsm4_0_y cfsv2_0_y gfdl-flor-a_0_y gfdl-flor-b_0_y gfdl_0_y nasa_0_y nmme0_mean_y cancm3_0_x_1 cancm4_0_x_1 ccsm3_0_x_1 ccsm4_0_x_1 cfsv2_0_x_1 gfdl-flor-a_0_x_1 gfdl-flor-b_0_x_1 gfdl_0_x_1 nasa_0_x_1 nmme0_mean_x_1 cancm3_x_1 cancm4_x_1 ccsm3_x_1 ccsm4_x_1 cfsv2_x_1 gfdl_x_1 gfdl-flor-a_x_1 gfdl-flor-b_x_1 nasa_x_1 nmme_mean_x_1 cancm3_y_1 cancm4_y_1 ccsm3_y_1 ccsm4_y_1 cfsv2_y_1 gfdl_y_1 gfdl-flor-a_y_1 gfdl-flor-b_y_1 nasa_y_1 nmme_mean_y_1 cancm3_0_y_1 cancm4_0_y_1 ccsm3_0_y_1 ccsm4_0_y_1 cfsv2_0_y_1 gfdl-flor-a_0_y_1 gfdl-flor-b_0_y_1 gfdl_0_y_1 nasa_0_y_1 nmme0_mean_y_1 Day Month Year tmp2m

___ ___ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ ________ ________ _______ _______ _______ ______ _____________ _____________ ______ ___________ __________ __________ _________ _________ _________ _______________ _______________ ________ ________ ____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ __________ __________ _________ _________ _________ ________ _______________ _______________ ________ _____________ ____________ ____________ ___________ ___________ ___________ _________________ _________________ __________ __________ ______________ ___ _____ ____ ______

38 238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 1 1 2016 9.0021

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 2 1 2016 9.4104

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 3 1 2016 9.7816

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 4 1 2016 10.066

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6.4785 7.2476 8.747 10.039 9.444 7.7948 10.142 10.421 8.4113 8.7472 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 23.973 23.875 26.89 14.057 36.966 34.703 30.382 35.169 31.349 28.596 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 71.945 84.607 56.394 80.506 123.53 57.872 92.886 107.55 69.046 82.703 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 8.9173 9.2217 10.978 12.627 11.894 13.353 12.966 12.68 13.138 11.753 5 1 2016 10.35

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 6 1 2016 10.59

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7 1 2016 10.674

238 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 60.347 30.144 66.325 43.001 79.737 48.759 57.888 69.24 45.733 55.686 67.619 63.328 78.491 71.75 98.552 61.833 71.848 66.141 54.315 70.431 7.0177 8.0908 8.0417 9.4876 8.3799 8.738 10.34 10.312 8.6557 8.7848 7.3796 8.3793 8.5218 9.3342 9.508 10.423 10.946 9.3034 9.376 9.2413 8 1 2016 10.995

Now, the data is ready to be used!

Train & Evaluate a Model

There are many different ways to approach this year’s problem, so it’s important to try out different models! In this tutorial, we will be using a machine learning approach to tackle the problem of weather forecasting, and since the response variable ‘tmp2m’ is a number, we will need to create a regression model. Let’s start by opening the Regression Learner app, which will allow us to rapidly prototype several different models.

regressionLearner

When you first open the app, you’ll need to click on the “New Session” button in the top left corner. Set the “Data Set Variable” to ‘trainingData’, and the app will automatically select ‘tmp2m’ as the response variable because it is the last variable in the table. Then, since this is a pretty big dataset, I changed the validation scheme to “Holdout Validation” and set the percentage held out to 15. I chose these as starting values, but you may want to play around with the validation scheme when making your own model.

After we’ve clicked “Start Session”, the Regression Learner App interface will load.

Step 1: Start A New Session

[Click on “New Session” > “From Workspace”, set the “Data Set Variable” to ‘trainingData’, set the “Validation Scheme” to ‘Holdout Validation’, set “percent held out” to 15, click “Start Session”]
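If you’d rather set up the same kind of holdout split in code, here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox is installed, and it uses a small synthetic table (demoData, with made-up variables feature1 and feature2) as a stand-in for trainingData, so the numbers are for illustration only.

```matlab
% Synthetic stand-in for trainingData (hypothetical values, illustration only)
rng(0);                                   % make the random split reproducible
n = 200;
feature1 = rand(n, 1);
feature2 = rand(n, 1);
tmp2m = 10 + 2*feature1 - feature2 + 0.1*randn(n, 1);
demoData = table(feature1, feature2, tmp2m);

% Hold out 15% of the rows for validation, mirroring the app settings
holdout   = cvpartition(n, 'HoldOut', 0.15);
trainRows = training(holdout);            % logical index of training rows
valRows   = test(holdout);                % logical index of held-out rows

trainSet = demoData(trainRows, :);
valSet   = demoData(valRows, :);
fprintf('%d training rows, %d validation rows\n', height(trainSet), height(valSet));
```

The app does this split for you when you start a session, so a script like this is only needed if you want to work outside of the app.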

From here, I’m going to choose to train “All Quick-to-Train” model options, so I can see which one performs the best out of these few. The steps for doing this are shown below. Note: this recording is slightly sped up since the training will take several seconds.

Step 2: Train Models

[Click “All Quick-To-Train” in the MODELS section of the Toolstrip, delete the “1. Tree” model in the “Models” panel, click “Train All”, wait for all models to finish training]

I chose the “All Quick-to-Train” option so that I could show the process, but if you have the time, you may want to try selecting “All” instead of the “All Quick-to-Train” option. This will give you more models to work with.

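Once those have finished training, you’ll see the RMSE, or root-mean-squared error, values shown on the left-hand side. This is a common error metric for regression models, and is what will be used to evaluate your submissions for the competition. RMSE is calculated using the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where n is the number of observations, y_i is the actual value, and ŷ_i is the predicted value for observation i.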

This value tells you how well the model performed on the validation data. In this case, the Fine Tree model performed the best!
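To get a feel for the metric, RMSE can also be computed by hand in a couple of lines. The sketch below uses made-up actual and predicted temperatures (not values from the challenge dataset); a lower RMSE means the predictions are closer to the actual values.

```matlab
% Hypothetical actual and predicted temperatures (illustrative values only)
actual    = [9.0; 9.4; 9.8; 10.1];
predicted = [9.2; 9.3; 10.0; 10.0];

% RMSE: square the errors, average them, then take the square root
errors    = actual - predicted;
rmseValue = sqrt(mean(errors.^2));
fprintf('RMSE = %.4f\n', rmseValue);
```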

The Regression Learner app also lets you import test data to see how well the trained models perform on new data. This will give you an idea of how accurate the model may be when making your final predictions for the competition test set. Let’s import our ‘testingData’ table and see how these models perform.

Step 3: Evaluate Models with Testing Data

[Click on the “Test Data” dropdown, select “From Workspace”. In the window that opens, set “Test Data Set Variable” to ‘testingData’, then click “Import”. Click “Test All” – new RMSE values will be calculated]

This will take a few seconds to run, but once it finishes we can see that even though the Fine Tree model performed best on the validation data, the Linear Regression model performs best on completely new data.
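A model that wins on the validation set will not always win on new data, so it is worth comparing both numbers side by side. Here is a rough sketch of that comparison in code, assuming the Statistics and Machine Learning Toolbox. The data is synthetic, the variable names feature1 and feature2 are made up, and the MinLeafSize of 4 is my own choice for a fairly fine tree rather than a setting taken from this tutorial.

```matlab
% Synthetic stand-in data (hypothetical values, illustration only)
rng(1);
n = 300;
feature1 = rand(n, 1);
feature2 = rand(n, 1);
tmp2m = 10 + 3*feature1 - 2*feature2 + 0.2*randn(n, 1);
demoData = table(feature1, feature2, tmp2m);

% Simple fixed split: training, validation, and completely new test rows
trainSet = demoData(1:200, :);
valSet   = demoData(201:250, :);
testSet  = demoData(251:300, :);

% Train two candidate models on the training rows only
treeModel   = fitrtree(trainSet, 'tmp2m', 'MinLeafSize', 4);
linearModel = fitlm(trainSet, 'ResponseVar', 'tmp2m');

% Helper that returns the RMSE of a model on a given table
rmseOf = @(model, tbl) sqrt(mean((tbl.tmp2m - predict(model, tbl)).^2));

results = table( ...
    [rmseOf(treeModel, valSet);  rmseOf(linearModel, valSet)], ...
    [rmseOf(treeModel, testSet); rmseOf(linearModel, testSet)], ...
    'VariableNames', {'ValidationRMSE', 'TestRMSE'}, ...
    'RowNames', {'Tree', 'Linear'});
disp(results)
```

Which model comes out ahead depends entirely on the data, so treat the synthetic results here as a template for the comparison rather than a prediction of what you will see on the challenge dataset.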

You can also use the ‘PLOT AND INTERPRET’ tab of the Regression Learner app to create visuals that show how the model performed on the test and validation sets. For example, let’s look at the “Predicted vs. Actual (Test)” graph for the Linear Regression model:

Step 4: Plot Results

[Click on the drop-down menu in the PLOT AND INTERPRET section of the Toolstrip, then select “Predicted vs. Actual (Test)”]

Since this model performed relatively well, the blue dots (representing the predictions) stay pretty close to the line (where a prediction would exactly match the actual value). I’m happy with how well this model performs, so let’s export it to the workspace so we can make predictions on other datasets!

Step 5: Export the Model

[In the EXPORT section of the Toolstrip, click “Export Model” > “Export Model”. In the window that appears, click “OK”]

Now the model is in the MATLAB Workspace as “trainedModel” so I can use it outside of the app.

To learn more about exporting models from the Regression Learner app, check out this documentation page!

Save and Export Predictions

Once you have a model that you are happy with, it’s time to make predictions on new data. To show you what this workflow looks like, I’m going to remove the “tmp2m” variable from my testing dataset, because the competition test set will not have this variable.

testingData = removevars(testingData, "tmp2m");

Now we have a dataset that contains the same variables as our training set except for the response variable. To make predictions on this dataset, use predictFcn:

tmp2m = trainedModel.predictFcn(testingData);

This returns an array containing one prediction per row of the test set. To prepare these predictions for submission, we’ll need to create a table with two columns: one containing the index number, and one containing the prediction for that index number. Since the dataset I am using does not provide an index number, I will create an array with index numbers to show you what the resulting table will look like.

index = (1:length(tmp2m))';

outputTable = table(index, tmp2m);

head(outputTable)

index    tmp2m
_____    ______
  1      11.037
  2      11.041
  3      11.046
  4      11.05
  5      11.054
  6      13.632
  7      13.636
  8      13.641

Then we can export the results to a CSV file to be read and used by others!

writetable(outputTable, "datathonSubmission.csv");
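Before uploading anything, it can be worth reading the file back in to confirm it has the shape you expect. The sketch below is one way to do that. It writes a small made-up table to a temporary file (the file name datathonSubmissionCheck.csv is just a placeholder) so that it runs on its own; with your real predictions you would check the file you actually exported.

```matlab
% Small made-up submission table (illustrative values only)
index = (1:5)';
tmp2m = [11.037; 11.041; 11.046; 11.05; 11.054];
outputTable = table(index, tmp2m);

% Write to a temporary CSV file, then read it back in
csvFile = fullfile(tempdir, 'datathonSubmissionCheck.csv');
writetable(outputTable, csvFile);
checkTable = readtable(csvFile);

% Basic sanity checks: same number of rows, expected columns, no missing values
sameHeight  = height(checkTable) == height(outputTable);
sameColumns = isequal(checkTable.Properties.VariableNames, {'index', 'tmp2m'});
noMissing   = ~any(ismissing(checkTable), 'all');
fprintf('Rows match: %d, columns match: %d, no missing values: %d\n', ...
    sameHeight, sameColumns, noMissing);

delete(csvFile);                          % clean up the temporary file
```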

To learn more about submission and evaluation for the competition, refer to the Kaggle page.

Experiment!

When creating any kind of AI model, it’s important to test out different workflows to see which one performs best for your dataset and challenge! This tutorial was only meant to be an introduction, but there are so many other choices you can make when preprocessing your data or creating your models. There is no one algorithm that suits all problems, so set aside some time to test out different models. Here are some suggestions on how to get started:

Try other preprocessing techniques, such as normalizing the data or creating new variables
Play around with the training options available in the app
Change the variables that you use to train the model
Try machine and deep learning workflows
Change the breakdown of training, testing, and validation data
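As a starting point for the first suggestion, here is a small sketch of two preprocessing ideas: building a new day-of-year variable from the Day, Month, and Year columns, and normalizing a predictor to zero mean and unit standard deviation. The table values are made up, and the new variable names DayOfYear and nmme_mean_x_z are my own; whether either step helps on the real dataset is something you would need to test.

```matlab
% Tiny made-up table using column names from the training data
Day   = [1; 2; 3];
Month = [1; 1; 1];
Year  = [2016; 2016; 2016];
nmme_mean_x = [8.7; 9.1; 10.2];           % illustrative values only
demoData = table(Day, Month, Year, nmme_mean_x);

% New variable: day of the year (1 to 366), which captures seasonality
dates = datetime(demoData.Year, demoData.Month, demoData.Day);
demoData.DayOfYear = day(dates, 'dayofyear');

% Normalized copy of a predictor (z-score: zero mean, unit standard deviation)
demoData.nmme_mean_x_z = normalize(demoData.nmme_mean_x);

disp(demoData)
```

If you normalize the training data, remember to apply the same scaling to any test data before making predictions.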

If you are training a deep learning network, you can also utilize the Experiment Manager to train the network under different conditions and compare the results!

Done!

Thank you for joining me on this tutorial! We are excited to find out how you will take what you have learned to create your own models. I recommend looking at the ‘Additional Resources’ section below for more ideas on how you can improve your models.

Feel free to reach out to us at studentcompetitions@mathworks.com if you have any further questions.

Additional Resources



Student Lounge
Sharing technical and real-life examples of how students can use MATLAB and Simulink in their everyday projects #studentsuccess
