Loren on the Art of MATLAB

Turn ideas into MATLAB

Note

Loren on the Art of MATLAB has been archived and will not be updated.

New Ways to Arrange and Plot Data in Tables

Today I'd like to introduce a guest blogger, Stephen Doe, who works for the MATLAB Documentation team here at MathWorks. In today's post, Stephen shows us new functions for displaying, arranging, and plotting data in tables and timetables.

Contents

Tables, Then and Now

In R2013b, MATLAB® introduced the table data type, as a convenient container for column-oriented data. And in R2016b, MATLAB introduced the timetable data type, which is a table that has timestamped rows.

From the beginning, these data types offered advantages over cell arrays and structures. But over the course of several releases, the table and graphics development teams have added many new functions for tables and timetables. These functions add convenient ways to display and arrange tabular data. Also, they offer new ways to make plots or charts directly from tables, without the intermediate step of peeling out variables. As of R2018b, MATLAB boasts many new functions to help you make more effective use of tables and timetables.

Read Table and Display First Few Rows

To begin, I will use the readtable function to read data from a sample file that ships with MATLAB. The file outages.csv contains simulated data for electric power outages over a period of 12 years in the United States. The call to readtable returns a table, T, with six variables and 1468 rows, so I will suppress the output using a semicolon.

T = readtable('outages.csv');

One typical way to examine the data in a large table is to display the first few rows of the table. You can use indexing to access a subset of rows (and/or a subset of variables, for that matter). For example, this syntax returns the first three rows of T.

T(1:3,:)
ans =
  3×6 table
      Region          OutageTime        Loss     Customers     RestorationTime         Cause     
    ___________    ________________    ______    __________    ________________    ______________
    'SouthWest'    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    'winter storm'
    'SouthEast'    2003-01-23 00:49    530.14    2.1204e+05                 NaT    'winter storm'
    'SouthEast'    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    'winter storm'

I have a confession to make: I have written many table examples, using that syntax. And occasionally, I still catch myself starting with code like T(3,:), which accesses only one row.

Happily, in R2016b we added the head function to return the top rows of a table. Here's the call to return the first three rows using the head function.

head(T,3)
ans =
  3×6 table
      Region          OutageTime        Loss     Customers     RestorationTime         Cause     
    ___________    ________________    ______    __________    ________________    ______________
    'SouthWest'    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    'winter storm'
    'SouthEast'    2003-01-23 00:49    530.14    2.1204e+05                 NaT    'winter storm'
    'SouthEast'    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    'winter storm'

Similarly, tail returns the bottom rows of a table. (If you do not specify the number of rows, then head and tail return eight rows.)

Move, Add, and Delete Table Variables

After examining your table, you might find that you want to organize your table by moving related variables next to each other. For example, in T you might want to move Region and Cause so that they are together.

One way to move table variables is by indexing. But if you use indexing and want to keep all the variables, then you must specify them all in order, as shown in this syntax.

T = T(:,{'OutageTime','Loss','Customers','RestorationTime','Region','Cause'})

You also can use numeric indices. While more compact, this syntax is less readable.

T = T(:,[2:5 1 6])

When your table has many variables, it is awkward to move variables using indexing. Starting in R2018a, you can use the movevars function instead. Using movevars, you only have to specify the variables of interest. Move the Region variable so it is before Cause.

T = movevars(T,'Region','Before','Cause');
head(T,3)
ans =
  3×6 table
       OutageTime        Loss     Customers     RestorationTime       Region           Cause     
    ________________    ______    __________    ________________    ___________    ______________
    2002-02-01 12:18    458.98    1.8202e+06    2002-02-07 16:50    'SouthWest'    'winter storm'
    2003-01-23 00:49    530.14    2.1204e+05                 NaT    'SouthEast'    'winter storm'
    2003-02-07 21:15     289.4    1.4294e+05    2003-02-17 08:14    'SouthEast'    'winter storm'

It is also likely that you want to add data to your table. For example, let's calculate the duration of the power outages in T. Specify the format to display the duration in days.

OutageDuration = T.RestorationTime - T.OutageTime;
OutageDuration.Format = 'dd:hh:mm:ss';

It is easy to add OutageDuration to the end of a table using dot notation.

T.OutageDuration = OutageDuration;

However, you might want to add it at another location in T. In R2018a, you can use the addvars function. Add OutageDuration so that it is after OutageTime.

T = addvars(T,OutageDuration,'After','OutageTime');
head(T,3)
ans =
  3×7 table
       OutageTime       OutageDuration     Loss     Customers     RestorationTime       Region           Cause     
    ________________    ______________    ______    __________    ________________    ___________    ______________
    2002-02-01 12:18     06:04:32:00      458.98    1.8202e+06    2002-02-07 16:50    'SouthWest'    'winter storm'
    2003-01-23 00:49             NaN      530.14    2.1204e+05                 NaT    'SouthEast'    'winter storm'
    2003-02-07 21:15     09:10:59:00       289.4    1.4294e+05    2003-02-17 08:14    'SouthEast'    'winter storm'

Now, let's remove RestorationTime. You can easily remove variables using dot notation and an empty array.

T.RestorationTime = [];

However, in R2018a there is also a function to remove table variables. To remove RestorationTime, use the removevars function.

T = removevars(T,'RestorationTime');
head(T,3)
ans =
  3×6 table
       OutageTime       OutageDuration     Loss     Customers       Region           Cause     
    ________________    ______________    ______    __________    ___________    ______________
    2002-02-01 12:18     06:04:32:00      458.98    1.8202e+06    'SouthWest'    'winter storm'
    2003-01-23 00:49             NaN      530.14    2.1204e+05    'SouthEast'    'winter storm'
    2003-02-07 21:15     09:10:59:00       289.4    1.4294e+05    'SouthEast'    'winter storm'

Convert to Timetable

If your table contains dates and times in a datetime array, you can easily convert it to a timetable using the table2timetable function. In this example, table2timetable converts the values in OutageTime to row times. Row times are time stamps that label the rows of a timetable.

TT = table2timetable(T);
head(TT,3)
ans =
  3×5 timetable
       OutageTime       OutageDuration     Loss     Customers       Region           Cause     
    ________________    ______________    ______    __________    ___________    ______________
    2002-02-01 12:18     06:04:32:00      458.98    1.8202e+06    'SouthWest'    'winter storm'
    2003-01-23 00:49             NaN      530.14    2.1204e+05    'SouthEast'    'winter storm'
    2003-02-07 21:15     09:10:59:00       289.4    1.4294e+05    'SouthEast'    'winter storm'

When you display a timetable, it looks very similar to a table. One important difference is that a timetable has fewer variables than you might expect by glancing at the display. TT has five variables, not six. The vector of row times, OutageTime, is not considered a timetable variable, since its values label the rows. However, you can still access the row times using dot notation, as in T.OutageTime. You can use the vector of row times as an input argument to a function. For example, you can use it as the x-axis of a plot.

The row times of a timetable do not have to be ordered. If you want to be sure that the rows of a timetable are sorted by the row times, use the sortrows function.

TT = sortrows(TT);
head(TT,3)
ans =
  3×5 timetable
       OutageTime       OutageDuration     Loss     Customers       Region           Cause     
    ________________    ______________    ______    __________    ___________    ______________
    2002-02-01 12:18     06:04:32:00      458.98    1.8202e+06    'SouthWest'    'winter storm'
    2002-03-05 17:53     04:20:48:00      96.563    2.8666e+05    'MidWest'      'wind'        
    2002-03-16 06:18     02:17:05:00      186.44    2.1275e+05    'MidWest'      'severe storm'

Make Stacked Plot of Variables

Now I will show you why I converted T to a timetable. Starting in R2018b, you can plot the variables of a table or timetable in a stacked plot. In a stacked plot, the variables are plotted in separate y-axes, but using a common x-axis. And if you make a stacked plot from a timetable, the x-values are the row times.

To plot the variables of TT, use the stackedplot function. The function plots variables that can be plotted (such as numeric, datetime, and categorical arrays) and ignores variables that cannot be plotted. stackedplot also returns properties of the stacked plot as an object that allows customization of the stacked plot.

s = stackedplot(TT)
s = 
  StackedLineChart with properties:

         SourceTable: [1468×5 timetable]
    DisplayVariables: {'OutageDuration'  'Loss'  'Customers'}
               Color: [0 0.4470 0.7410]
           LineStyle: '-'
           LineWidth: 0.5000
              Marker: 'none'
          MarkerSize: 6

  Use GET to show all properties

One thing you can tell right away from this plot is that there must be a few timetable rows with bad data. There is one point for a power outage that supposedly lasted for over 9,000 days (or 24 years), which would mean it ended some time in the 2040s.

Convert Variables in Place

The stackedplot function ignored the Region and Cause variables, because these variables are cell arrays of character vectors. You might want to convert these variables to a different, and more useful, data type. While you can convert variables one at a time, there is now a more convenient way to convert all table variables of a specified data type.

Starting in R2018b, you can convert table variables in place using the convertvars function. For example, identify all the cell arrays of character vectors in TT (using iscellstr) and convert them to categorical arrays. Now Region and Cause contain discrete values assigned to categories. Categorical values are displayed without any quotation marks.

TT = convertvars(TT,@iscellstr,'categorical');
head(TT,3)
ans =
  3×5 timetable
       OutageTime       OutageDuration     Loss     Customers      Region         Cause    
    ________________    ______________    ______    __________    _________    ____________
    2002-02-01 12:18     06:04:32:00      458.98    1.8202e+06    SouthWest    winter storm
    2002-03-05 17:53     04:20:48:00      96.563    2.8666e+05    MidWest      wind        
    2002-03-16 06:18     02:17:05:00      186.44    2.1275e+05    MidWest      severe storm

Plots of Discrete Data

If your table or timetable has variables with values that belong to a finite set of discrete categories, then there are other interesting plots that you can make. Starting in R2017a, you can make a heat map of any two variables that contain discrete values using the heatmap function. For example, make a heat map of the Region and Cause variables to visualize where and why outages occur. Again, heatmap returns an object so you can customize the plot.

h = heatmap(TT,'Region','Cause')
h = 
  HeatmapChart (Count of Cause vs. Region) with properties:

      SourceTable: [1468×5 timetable]
        XVariable: 'Region'
        YVariable: 'Cause'
    ColorVariable: ''
      ColorMethod: 'count'

  Use GET to show all properties

You also can make a pie chart of any categorical variable (as of R2014b), using the pie function. However, you cannot call pie on a table. So, to make a pie chart of the power outages by region, use dot notation to access the Region variable.

pie(TT.Region)

Other Functions to Rearrange or Join Tables

MATLAB also has other functions to reorganize variables in more complex ways, and to join tables. I won't show them all in action, but I will describe some of them briefly. All these functions work with both tables and timetables.

R2018a includes functions to:

  • Reorient rows to become variables (rows2vars)

And from the original release of tables in R2013b, there are functions to:

Tabled for Discussion

Let's table discussion of these new functions for now. But we are eager to hear about your reactions to them. Do they help you make more effective use of tables and timetables? Please let us know here.




Published with MATLAB® R2018b


  • print

댓글

댓글을 남기려면 링크 를 클릭하여 MathWorks 계정에 로그인하거나 계정을 새로 만드십시오.