Loren on the Art of MATLAB

Turn ideas into MATLAB

Long-term Analysis of the S&P 500 6

Posted by Loren Shure,

Here at the MathWorks we have a great staff to support the financial industry. We also have our fair share of DIYers who like to follow the market and manage their own investments, such as this edition's guest blogger, Steve Cruickshank, an admitted "cocktail party economist".

During his day job, Steve is the marketing manager for our MATLAB Application Deployment products.


Introduction - Long term tracking and statistical analysis of the S&P 500

If you’re an amateur investor like me, you probably experienced great frustration during late 2008 and early 2009, watching the value of your various stock portfolios, IRAs, and 401K accounts plummet. Sure, we lived the high life during the run-ups of the late 90s dot-com boom and subsequent gains during 2003 – 2008, but the Great Recession of late 2008 and early 2009 made me take a closer look at the S&P 500, especially the long term trends.

Caveat emptor - I’m not a financial industry professional and couldn’t tell you the difference between GARCH and Black-Scholes without using Wikipedia, but I do enjoy managing my own investments and crunching numbers to gain perspective. As long as we’re in “full disclosure” mode, I borrowed the phrase “cocktail party economist” from my grad school Economics professor.

When looking at the S&P 500 trends over various timeframes, or reading web pages from the various pundits, it’s easy to see that you can cherry pick a time period to validate virtually any hypothesis. Think the market is below a reasonable level? Then start your analysis fifteen years ago … 1999 … the height of the dot com boom. Since then the S&P has grown a paltry 1.4% annually, far below the historical average annual growth rate of ~7.5%. Plenty of room for growth, so buy everything in sight.

Want to espouse the theory that the market has grown way too fast and is in yet another bubble just waiting to burst? Sure, we can do that! Start your analysis five years ago … early 2009 … the bottom of the financial crisis. Since then the S&P has grown at a ridiculous rate, over 20% annually, roughly the same as during the dot com boom years. Time to sell everything.

Given those obvious extremes, perhaps the best way to eliminate the false conclusions reached by cherry picking time periods is to analyze data from as far back as possible and evaluate the current market based on the long term trends. The example uses Yahoo Finance to provide data for the S&P 500 since 1950, but you could use other data sources and time frames as you see fit.

The MathWorks' software used in this example is MATLAB and the Datafeed Toolbox. Home users may want to use MATLAB Home and hard-core number crunchers may prefer the advanced functionality provided by the Curve Fitting Toolbox.

Setting the initial constraints

I am using fetch from the Datafeed Toolbox to pull data from Yahoo based on a user specified timeframe, i.e. how many years from now. This example uses the 'w' option to get weekly data, but could also use 'm' for month or 'd' for day without materially affecting the results and conclusions.

% Set Yahoo as data source.
y = yahoo;

% Determine how far back to look. Yahoo provides data starting at 1950,
% so 65 years gathers all the available data.
years = 65;
start = now - (365.25 * years);

% Get data from Yahoo.
CloseData = fetch(y, '^GSPC', 'Close', start, now, 'w');

Curve fit calculations

We can then extract the closing price and date for each point, using the polyfit and polyval functions to determine the best fit line through the data.

% Extract data of interest from Yahoo results
ClosePrice = CloseData(:,2);
CloseDate = CloseData(:,1);

% Curve fit using a polynomial fit to create curve fit item and trend data
% based on curve fit result.
TrendLine = polyfit(CloseDate,log(ClosePrice), 1);
TrendData = polyval(TrendLine,CloseDate);

Option: If your analysis requires other types of curve fitting, you could also use Curve Fitting Toolbox, as shown here:

  TrendLine2 = fit(CloseDate, log(ClosePrice), 'poly1');
  TrendData = feval(TrendLine2, CloseDate);

Plotting and visualizing

Plotting the variables CloseDate, ClosePrice, and TrendData on a log plot provides the following image. The variables plotted along the y-axis (ClosePrice and TrendData) are expressed as exponential functions, such as $ClosePrice = e^y$. Plotting data using exponents and/or log plots provides a better visual representation of average growth rates over long periods of time.

% Plotting the actual performance against best fit line
title(['S&P 500 Performance and Best Fit Curve, last ',num2str(years),' years'])
ylabel('Closing and Best Fit Prices (e^y)')
xlabel('Closing Dates')

This plot provides an effective visual mechanism to tell if the market is currently above or below the long term average. Interestingly, the current market (as of late-April 2014) is landing very close to the best fit curve, so perhaps the market is just about where it “should” be. You probably won’t hear that sentiment from market prognosticators because bulls, bears, and bubbles generate more page views.

Using the axis function to zoom in on the previous plot, we can highlight two easily recognizable data points: the peak of the late ‘90’s dot com bubble and the financial crisis roughly ten years later. In this example, the annotation function provides the graphics.

% Setting appropriate zoom coordinates
axis([729661, 736053, 6.4, 7.6])

% Create ellipses to highlight areas of interest
annotation('ellipse',[0.168, 0.652, 0.149, 0.152], 'LineWidth',2, 'Color','r');
annotation('ellipse',[0.592, 0.183, 0.078, 0.431], 'LineWidth',2, 'Color','r');

Quantifying the data with statistics

Visuals are nice, but what if you, like so many of us here at MathWorks, prefer numbers and want to quantify the information rather than eyeball it? The first step in quantifying this data is to determine the variance, the variable Delta in my example, between the actual closing price and the projected price based on the best fit line.

% Calculate the difference between actual results and best fit curve
Delta = log(ClosePrice) - TrendData;

The calculations above will transpose the sloped line of the best fit curve to a horizontal line, but the familiar data points from the previous figure are clear, the dot com boom and financial crisis. We can then plot Delta relative to the best fit curve using the functions below. The zeros function converts a single '0' to a vector for effective plotting.

% Clear the figure then plot the raw data. Note that the BestFitCurve is
% multiplied by zeros to get reasonable legend
h1 = plot(CloseDate, Delta);
title(['S&P 500 Variance from Best Fit Curve, last ',num2str(years),' years'])
datetick ('x');
BestFitCurve = line(CloseDate, zeros(size(CloseDate)), 'Color', 'k');
ylabel('Variance from Best Fit Line')
xlabel('Closing Dates')

% Create ellipses to highlight areas of interest
annotation('ellipse', [0.753, 0.155, 0.077, 0.364], 'LineWidth',2, 'Color','r');
annotation('ellipse', [0.631, 0.793, 0.104, 0.131], 'LineWidth',2, 'Color','r');

To continue quantifying the results, we can use the std function to determine the standard deviation, variable name Sigma, for the variable Delta. To cover a broader range of variances we can then apply multipliers (2, 1, ½, -½, -1, -2) to Sigma, showing statistically how far above or below the market is with respect to the long term average. Thankfully none of the market fluctuations reached the Six Sigma levels.

% Define Sigma as the standard deviation of Delta, then convert to a vector
% using a matrix of ones
Sigma = std(Delta)*ones(size(CloseDate));

% Clear the previous figure, define the various Sigma's, then plot the data
% with a legend
h1 = plot (CloseDate, Delta);
title(['S&P 500 Variance from Best Fit Curve, last ',num2str(years),' years'])
BestFitCurve = line(CloseDate, zeros(size(CloseDate)), 'Color', 'k');
TwoSigma = line(CloseDate, 2*Sigma, 'Color', 'g', 'LineStyle', ':');
OneSigma = line(CloseDate, Sigma, 'Color', 'g', 'LineStyle', '--');
HalfSigma = line(CloseDate, 0.5*Sigma, 'Color', 'g');
NegHalfSigma = line(CloseDate, -0.5*Sigma, 'Color', 'r');
NegOneSigma = line(CloseDate, -Sigma, 'Color', 'r', 'LineStyle', '--');
NegTwoSigma = line(CloseDate, -2*Sigma, 'Color', 'r', 'LineStyle', ':');
ylabel('Variance from Best Fit Line')
xlabel('Closing Dates')

legend([h1 TwoSigma OneSigma HalfSigma BestFitCurve ...
    NegHalfSigma NegOneSigma NegTwoSigma], ...
    {'Delta', '+ 2\sigma', '+ \sigma', '+ \sigma/2' 'Best Fit', ...
    '- \sigma/2', '- \sigma','- 2\sigma'}, 'Location', 'NorthEastOutside')


With data and analysis like above, hindsight is always 20/20. In a perfect world, most investors like myself probably wish they had sold everything when the S&P went above the +2Sigma level near the end of the dot com bubble, and bought as much as possible on margin during extremely short window below -2Sigma in early 2009.

We all have our own risk profiles, so I won’t pretend to tell you what level represents an attractive buying or selling opportunity ... you need to do your own homework and decide for yourself. The example in this article is just one method to cut through the propaganda and utilize math to provide relatively unbiased data.

One final question for the readers ... do you agree with the statement made earlier that the current level of S&P 500 is about where it "should" be given the long term data? If not, what's your expected level and rationale? Please let me know here.

Get the MATLAB code

Published with MATLAB® R2014a


Comments are closed.

6 CommentsOldest to Newest

Tony Sturges replied on : 1 of 6

A nice bit of exposition — but
first, a question: why do you use an arbitrary curve-fitting routine? Why not merely plot the data (first) and then use various smoothing filters, if necessary, to suppress what you suspect are mostly the day-to-day noise?

Steve Cruickshank replied on : 2 of 6

Hi Tony,

Thanks for reading and the comment. I didn’t think about filtering the data via smoothing routines, but in hindsight I wouldn’t want to select which smoothing filter to use because that may influence the data. A true best fit curve helps keep it unbiased, at least in my thoughts process, but yours may differ.

Regarding “noise”, the fetch command lets you select various levels of data (daily closing price, weekly, or monthly) so you can easily filter noise out using those options. I chose the weekly option, which generated ~3300 data points (65 years x 52 weeks/year). I tried it with the other options and saw that the results (actual curve and standard deviation values) didn’t vary much at all.

Jotaf replied on : 3 of 6

I don’t usually play with financial data but this is a very nice article indeed. As you mentioned hindsight is 20/20, so it would be interesting to crop the data to different times (i.e., pretend that you were making a decision at that time, with only data from 1965 to that year available) and see if the “2-sigmas is a warning sign”-rule would have been correct, while yielding few false positives in years when nothing bad happened.

Steve Cruickshank replied on : 4 of 6


Great question. I made a minor tweak to the code to allow more flexible finish dates. The results are the same even when you clip to the respective historical decision points: both the 1999 dot com bubble and the 2008 financial crisis fall above/below their respective 2Sigma thresholds. The added/edited lines of code are in the initial few lines of real code. Simply establish a finish date other than ‘now’ and then have the fetch command use that finish date rather than ‘now’. For example, the code for ending five years ago is: “finish = now – (5 * 365.25);” and “CloseData = fetch(y, ‘^GSPC’, ‘Close’, start, finish, ‘w’);”

I think that most financial analysis looks at the percentage return. This will usually give you a pretty stationary time series which you can use as input for other forecasting or analysis techniques that assume stationarity.

Pakt has a very good title, Introduction to R for Quantitative Finance. It covers GARCH, which helps you assess volatility, and many other topics, including how to build portfolios.

Steve Cruickshank replied on : 6 of 6


Thanks for the info. I agree that percent return is an interesting data point, although I tend not to dwell too much on a return rate for any single holding because I tend to buy & sell equities and ETF’s multiple times and at varying prices. Certainly not enough to be considered a day trader.

Since the typical return for the S&P 500 is well known, around 7-8%, the goal of my example was to provide a mechanism to evaluate if the S&P 500 was riding high or low with respect to the long term trend. I also acknowledge the market is far from perfect and can be very irrational for long periods of time, so evaluating the current market relative to the long term trend is certainly not a guarantee of any upcoming market behavior. It’s still a guess. I’ll take a look at the Pakt book.