I've recently had an opportunity to work with data for which I wanted to exclude weekend dates in order to see some trends. It took a little getting used to for me, because absolute time is not germane to the scientific and engineering data I am accustomed to.
Contents
Excel Data
I was given some data extracted from an Excel spreadsheet. Here's some of it.
mytimes = [40314.338356482
40315.334988426
40316.333935185
40316.333935237
40317.341400463
40318.336875000
40318.336875032
40318.336875295
40319.333912037
40320.337835648
40321.335960648
40322.342372685
40323.342106481
40324.335439815
40325.337696759
40326.338171033
40326.338171296
40327.340648148
40328.338912037
40329.336631944
40330.338657407
40331.337256944
40332.342314815
40333.342164352
40334.334004630
40335.341342593
40336.340104167
40338.334884259
40339.337777094
40339.337777778
40340.333842593];As you can see, the times have a basis that is not this year! Digging around, I find I can convert these to datenum by adding a magic number. This is a conversion that you can get using the function x2mdate from Financial Toolbox.
newtimes = mytimes + 693960;
To see if this makes sense, let me extract a date string. Looking at the documentation, you can see many options for formatting the date. I have used the default setting.
lastday = datestr(newtimes(end))
lastday = 11-Jun-2010 08:00:44
I want to remove weekends from the data and only care about date, not time of day. I also want to remove duplicate dates. Let me first get the dates alone and create a unique list.
daysonly = fix(mytimes)
daysonly =
40314
40315
40316
40316
40317
40318
40318
40318
40319
40320
40321
40322
40323
40324
40325
40326
40326
40327
40328
40329
40330
40331
40332
40333
40334
40335
40336
40338
40339
40339
40340
uniquedays = unique(daysonly)
uniquedays =
40314
40315
40316
40317
40318
40319
40320
40321
40322
40323
40324
40325
40326
40327
40328
40329
40330
40331
40332
40333
40334
40335
40336
40338
40339
40340
Using another date function, I can find the days in my data corresponding to Saturday and Sunday, weekend days in the US.
n = weekday(uniquedays);
Find Saturday and Sunday.
weekends = find(n==1 | n==7);
% Remove weekend days.
myweekdays = uniquedays;
myweekdays(weekends) = [];You can see I've deleted 7 dates from my unique list, corresponding to the weekend days.
whos
Name Size Bytes Class Attributes daysonly 31x1 248 double lastday 1x20 40 char mytimes 31x1 248 double myweekdays 18x1 144 double n 26x1 208 double newtimes 31x1 248 double s 18x3 108 char uniquedays 26x1 208 double weekends 8x1 64 double
I can check that the days I have left are weekdays.
[n,s] = weekday(myweekdays,'short');
ss = Mon Tue Wed Thu Fri Mon Tue Wed Thu Fri Mon Tue Wed Thu Fri Tue Wed Thu
More Date Tools
Another date tool you might find useful is the function datetick, for labeling plots. For more on dates in MATLAB, check out the documentation.
I am curious how many people have data, other than financial data, where the actual date, rather than a relative one, is important to your data analysis. Let me know here.
Get
the MATLAB code
Published with MATLAB® 7.10



Neat post!!!
I don’t need to know the day of the week like that. However, I analyze shoreline position which is often affected by waves. Sometimes we know about a storm on a particular date, or sometimes we are looking at a more seasonal signal. So I use datetick a lot.
Anytime the system is externally forced and the external forcing is not regular, absolute time is important.
As an ecologist (where Matlab is pretty popular by the way) I’m often looking at data (say the population of a species over time). If we’re only looking at internal dynamics, just comparing N_t+1 vs N_t is adequate; but a lot of times the weather (or the population of a predator, or ….) is an important component and weather is chaotic and irregular, so analysis usually uses actual historical weather data as an explanatory factor. This makes absolute time relevant.
Overall, I would say that this is the biggest limitation I perceive in Matlab. My data always has a space, a time, and a species dimension to it, and Matlab doesn’t do much to help me keep track of the absolute coordinates along these dimensions for numerical data in an array. Or join datasets of different shapes – say I have a vector of body mass by species and another vector of temperature by time and an array of population size (time by species). I have to do all the housekeeping details myself to keep arrays aligned which can get burdensome when I, for example, filter on the species dimension.
I would love it if Matlab would incorporate some OLAP like perspectives that address this need. (The stats toolkit dataset was a step in the right direction but still not general enough).
Readers may be interested in more information about date-representation, here:
http://www.fourmilab.ch/documents/calendar/
Near the bottom of that page is a well-expressed description of the ms-excel date format(s).
I didn’t know of datetick. It may be handy…
I’m rather surprised that matlab doesn’t provide more date-handling in the basic distribution, without needing
a rather outlandish toolbox. Gnuplot, for example, has
a very handy ability to read input and plot ticks with
arbitrarily specified date data.
http://t16web.lanl.gov/Kawano/gnuplot/datetime-e.html#S1
One example of caring about absolute dates and times, nearer to my area than finance, is in modelling of
expected demands on an electricity network. (Or, I assume,
on many `instantaneous’ commodities.) Time of day, day of week, week of year (season, holidays) are all important to how people behave.
In my case, measurements are usually affected by ambient temperature. Our office buildings ventilation system follows very specific patterns across the day and week and to find those I plot by date.
As Amy pointed out, plotting by date makes it easier to figure out possible causes for artefacts in the measurement.
Hi Loren,
Unfortunately the magic number 693960 that is used to offset the Excel dates does not account for leap years. This is also reflected in the use of the x2mdate function :(. I think a correct implementation that takes into account calendar days would be appreciated.
I did not know about the datetick function. It is definitely useful! :)
Thanks for the article!
Great post. I’ve used MatLab for about 9 months now and love it, and love to learn how to use it better.
I work in atmospheric science, and we are very interested in looking at day-specific data, especially when we want to observed pollution levels as a function of day-of-week. This will be very useful to me, thanks.
The following example shows some additional ways to work with dates in MATLAB.
1. The dataset array can store cell strings, categorical arrays, and doubles in the same object. You can represent time using an easy to read cell string or as an ordinal.
2. The dataset array’s join method gives you fine control when merging multiple dataset arrays. The join method supports inner and outer joins including “right”, “left”, and “full” outer joins.
The combination of “Joins” and interpolation makes it very easy to solve some of the challenges associated with multirate sampling.
%% Generate some data % Create dataset 1 Time(1,:) = [2009, 10, 24, 12, 45, 07]; for i = 1 : 99 Time(i+1,:) = Time(i,:) + [0, 0, 0, 0, 0, 1]; end Time = datestr(Time, 'mmmm dd, yyyy HH:MM:SS AM'); Time = cellstr(Time(:,:)); Dataset1 = dataset(Time); Dataset1.AirSpeed = linspace(8451, 8550, 100)'; % Create dataset 2 Dataset2 = dataset({Dataset1.Time(1:2:end), 'Time'}); Dataset2.Altitude = linspace(501, 550, 50)'; %% Start by displaying our data % Both datasets share 'Time' as a common variable % The two datasets are sampled at different rates Dataset1(1:20,:) Dataset2(1:10,:) % Note that the dataset array can store cell strings and % doubles in the same container %% Show an Inner join (The default behavior) % All rows that share a common time stamp are copied to % a new dataset array Dataset3 = join(Dataset2, Dataset1); Dataset3(1:10,:) % Note that the join method is not limited to numerical % data. Here we're using a cell stirng as the key %% Show an Outer join % All rows from the right dataset array are copied to the % new dataset array. % Matching rows are copied from the left dataset array. % Missing data is flagged with a NaN Dataset4 = join(Dataset2, Dataset1, 'Type', 'rightouter'); Dataset4(1:10,:) %% Fill in the blanks % Use isnan to create an index index = isnan(Dataset4.Altitude); % Convert cell strings to datenums for interpolation Time_numeric1 = datenum(Dataset4.Time_left(~index)); Time_numeric2 = datenum(Dataset4.Time_right(index)); % Use linear interpolation to estimate a plausible value Altitude_Fit = fit(Time_numeric1,Dataset4.Altitude(~index),'linearinterp'); Dataset4.Altitude(index) = Altitude_Fit(Time_numeric2); % Clean up the dataset array Dataset4.Time_left = Dataset4.Time_right; Dataset4.Time_right = []; Dataset4.Properties.VarNames(1) = {'Time'}; Dataset4(1:10,:)I’m a little late to the party…
In aerospace, we use both absolute and relative times, but I would say that absolute time is more important for us. Knowing the absolute time is required when analyzing telemetry to compare an event to some external influence. Brian McGill above is right on. But having the capability to go back and forth between absolute time and relative time is essential.
datetick has been found to be seriously lacking. Why doesn’t it scale, zoom, and pan with the plot?! We’ve been rolling our own replacement for datetick since 2003.