Loren on the Art of MATLAB

June 11th, 2010

Simple Date Manipulations

I've recently had an opportunity to work with data for which I wanted to exclude weekend dates in order to see some trends. It took a little getting used to for me, because absolute time is not germane to the scientific and engineering data I am accustomed to.

Contents

Excel Data

I was given some data extracted from an Excel spreadsheet. Here's some of it.

mytimes = [40314.338356482
           40315.334988426
           40316.333935185
           40316.333935237
           40317.341400463
           40318.336875000
           40318.336875032
           40318.336875295
           40319.333912037
           40320.337835648
           40321.335960648
           40322.342372685
           40323.342106481
           40324.335439815
           40325.337696759
           40326.338171033
           40326.338171296
           40327.340648148
           40328.338912037
           40329.336631944
           40330.338657407
           40331.337256944
           40332.342314815
           40333.342164352
           40334.334004630
           40335.341342593
           40336.340104167
           40338.334884259
           40339.337777094
           40339.337777778
           40340.333842593];

As you can see, the times have a basis that is not this year! Digging around, I find I can convert these to datenum by adding a magic number. This is a conversion that you can get using the function x2mdate from Financial Toolbox.

 newtimes = mytimes + 693960;

To see if this makes sense, let me extract a date string. Looking at the documentation, you can see many options for formatting the date. I have used the default setting.

 lastday = datestr(newtimes(end))
lastday =
11-Jun-2010 08:00:44

I want to remove weekends from the data and only care about date, not time of day. I also want to remove duplicate dates. Let me first get the dates alone and create a unique list.

 daysonly = fix(mytimes)
daysonly =
       40314
       40315
       40316
       40316
       40317
       40318
       40318
       40318
       40319
       40320
       40321
       40322
       40323
       40324
       40325
       40326
       40326
       40327
       40328
       40329
       40330
       40331
       40332
       40333
       40334
       40335
       40336
       40338
       40339
       40339
       40340
 uniquedays = unique(daysonly)
uniquedays =
       40314
       40315
       40316
       40317
       40318
       40319
       40320
       40321
       40322
       40323
       40324
       40325
       40326
       40327
       40328
       40329
       40330
       40331
       40332
       40333
       40334
       40335
       40336
       40338
       40339
       40340

Using another date function, I can find the days in my data corresponding to Saturday and Sunday, weekend days in the US.

n = weekday(uniquedays);

Find Saturday and Sunday.

weekends = find(n==1 | n==7);
% Remove weekend days.
myweekdays = uniquedays;
myweekdays(weekends) = [];

You can see I've deleted 7 dates from my unique list, corresponding to the weekend days.

whos
  Name             Size            Bytes  Class     Attributes

  daysonly        31x1               248  double              
  lastday          1x20               40  char                
  mytimes         31x1               248  double              
  myweekdays      18x1               144  double              
  n               26x1               208  double              
  newtimes        31x1               248  double              
  s               18x3               108  char                
  uniquedays      26x1               208  double              
  weekends         8x1                64  double              

I can check that the days I have left are weekdays.

[n,s] = weekday(myweekdays,'short');
s
s =
Mon
Tue
Wed
Thu
Fri
Mon
Tue
Wed
Thu
Fri
Mon
Tue
Wed
Thu
Fri
Tue
Wed
Thu

More Date Tools

Another date tool you might find useful is the function datetick, for labeling plots. For more on dates in MATLAB, check out the documentation.

I am curious how many people have data, other than financial data, where the actual date, rather than a relative one, is important to your data analysis. Let me know here.


Get the MATLAB code

Published with MATLAB® 7.10

8 Responses to “Simple Date Manipulations”

  1. Amy replied on :

    Neat post!!!

    I don’t need to know the day of the week like that. However, I analyze shoreline position which is often affected by waves. Sometimes we know about a storm on a particular date, or sometimes we are looking at a more seasonal signal. So I use datetick a lot.

  2. Brian McGill replied on :

    Anytime the system is externally forced and the external forcing is not regular, absolute time is important.

    As an ecologist (where Matlab is pretty popular by the way) I’m often looking at data (say the population of a species over time). If we’re only looking at internal dynamics, just comparing N_t+1 vs N_t is adequate; but a lot of times the weather (or the population of a predator, or ….) is an important component and weather is chaotic and irregular, so analysis usually uses actual historical weather data as an explanatory factor. This makes absolute time relevant.

    Overall, I would say that this is the biggest limitation I perceive in Matlab. My data always has a space, a time, and a species dimension to it, and Matlab doesn’t do much to help me keep track of the absolute coordinates along these dimensions for numerical data in an array. Or join datasets of different shapes – say I have a vector of body mass by species and another vector of temperature by time and an array of population size (time by species). I have to do all the housekeeping details myself to keep arrays aligned which can get burdensome when I, for example, filter on the species dimension.

    I would love it if Matlab would incorporate some OLAP like perspectives that address this need. (The stats toolkit dataset was a step in the right direction but still not general enough).

  3. Nathaniel replied on :

    Readers may be interested in more information about date-representation, here:
    http://www.fourmilab.ch/documents/calendar/
    Near the bottom of that page is a well-expressed description of the ms-excel date format(s).

    I didn’t know of datetick. It may be handy…
    I’m rather surprised that matlab doesn’t provide more date-handling in the basic distribution, without needing
    a rather outlandish toolbox. Gnuplot, for example, has
    a very handy ability to read input and plot ticks with
    arbitrarily specified date data.
    http://t16web.lanl.gov/Kawano/gnuplot/datetime-e.html#S1

    One example of caring about absolute dates and times, nearer to my area than finance, is in modelling of
    expected demands on an electricity network. (Or, I assume,
    on many `instantaneous’ commodities.) Time of day, day of week, week of year (season, holidays) are all important to how people behave.

  4. Daniel replied on :

    In my case, measurements are usually affected by ambient temperature. Our office buildings ventilation system follows very specific patterns across the day and week and to find those I plot by date.

    As Amy pointed out, plotting by date makes it easier to figure out possible causes for artefacts in the measurement.

  5. Fayssal El Moufatich replied on :

    Hi Loren,

    Unfortunately the magic number 693960 that is used to offset the Excel dates does not account for leap years. This is also reflected in the use of the x2mdate function :(. I think a correct implementation that takes into account calendar days would be appreciated.

    I did not know about the datetick function. It is definitely useful! :)

    Thanks for the article!

  6. Travis replied on :

    Great post. I’ve used MatLab for about 9 months now and love it, and love to learn how to use it better.
    I work in atmospheric science, and we are very interested in looking at day-specific data, especially when we want to observed pollution levels as a function of day-of-week. This will be very useful to me, thanks.

  7. Richard replied on :

    The following example shows some additional ways to work with dates in MATLAB.

    1. The dataset array can store cell strings, categorical arrays, and doubles in the same object. You can represent time using an easy to read cell string or as an ordinal.

    2. The dataset array’s join method gives you fine control when merging multiple dataset arrays. The join method supports inner and outer joins including “right”, “left”, and “full” outer joins.

    The combination of “Joins” and interpolation makes it very easy to solve some of the challenges associated with multirate sampling.

    
    %% Generate some data
    
    % Create dataset 1
    
    Time(1,:) = [2009,  10,  24,  12,  45, 07];
    
     for i = 1 : 99
    
         Time(i+1,:) = Time(i,:) + [0,  0,  0,  0,  0, 1];
    
     end
    
    Time = datestr(Time, 'mmmm dd, yyyy HH:MM:SS AM');
    Time = cellstr(Time(:,:));
    
    Dataset1 = dataset(Time);
    Dataset1.AirSpeed = linspace(8451, 8550, 100)';
    
    % Create dataset 2
    
    Dataset2 = dataset({Dataset1.Time(1:2:end), 'Time'});
    Dataset2.Altitude = linspace(501, 550, 50)';
    
    %% Start by displaying our data
    
    % Both datasets share 'Time' as a common variable
    % The two datasets are sampled at different rates
    
    Dataset1(1:20,:)
    Dataset2(1:10,:)
    
    % Note that the dataset array can store cell strings and
    % doubles in the same container
    
    %% Show an Inner join (The default behavior)
    
    % All rows that share a common time stamp are copied to
    % a new dataset array
    
    Dataset3 = join(Dataset2, Dataset1);
    Dataset3(1:10,:)
    
    % Note that the join method is not limited to numerical
    % data.  Here we're using a cell stirng as the key
    
    %% Show an Outer join
    
    % All rows from the right dataset array are copied to the
    % new dataset array.
    % Matching rows are copied from the left dataset array.
    % Missing data is flagged with a NaN
    
    Dataset4 = join(Dataset2, Dataset1, 'Type', 'rightouter');
    Dataset4(1:10,:)
    
    %% Fill in the blanks
    
    % Use isnan to create an index
    index = isnan(Dataset4.Altitude);
    
    % Convert cell strings to datenums for interpolation
    Time_numeric1 = datenum(Dataset4.Time_left(~index));
    Time_numeric2 = datenum(Dataset4.Time_right(index));
    
    % Use linear interpolation to estimate a plausible value
    Altitude_Fit = fit(Time_numeric1,Dataset4.Altitude(~index),'linearinterp');
    Dataset4.Altitude(index) = Altitude_Fit(Time_numeric2);
    
    % Clean up the dataset array
    
    Dataset4.Time_left = Dataset4.Time_right;
    Dataset4.Time_right = [];
    Dataset4.Properties.VarNames(1) = {'Time'};
    Dataset4(1:10,:)
    
  8. Andrew Kershaw replied on :

    I’m a little late to the party…

    In aerospace, we use both absolute and relative times, but I would say that absolute time is more important for us. Knowing the absolute time is required when analyzing telemetry to compare an event to some external influence. Brian McGill above is right on. But having the capability to go back and forth between absolute time and relative time is essential.

    datetick has been found to be seriously lacking. Why doesn’t it scale, zoom, and pan with the plot?! We’ve been rolling our own replacement for datetick since 2003.


MathWorks
Loren Shure works on design of the MATLAB language at MathWorks. She writes here about once a week on MATLAB programming and related topics.

These postings are the author's and don't necessarily represent the opinions of The MathWorks.