“All models are wrong. Some models are useful.”

This post is a white paper that I wrote for Economic Pulse Newsletter subscribers a few months ago. If you’re interested in learning more about the newsletter see here. Also, for more info on all kinds of TAA models I highly recommend Allocate Smartly.

This is one of our main tenets at the Economic Pulse Newsletter. We know our base TAA model, the SPY-COMP system, will fail. Hopefully not too often, hopefully not for too long, and hopefully not catastrophically. This is true of any active investment model, including all TAA models. In this post, I will analyze the history of the SPY-COMP system, its triggers, and its failures. An analysis of a TAA model’s failures and successes is something I would like to see discussed more. I think it would help investors set expectations for real-time, month-to-month performance and the actual month-to-month workings of the models, as opposed to only looking at long-term results.

Performance and Return Distribution

Let’s start out by baselining performance. Table 1 shows the performance of various TAA portfolios. The green ones are the portfolios in the Economic Pulse Newsletter that are based on the SPY-COMP risk-on risk-off indicator that I will discuss here.

Table 1: TAA Portfolio Performance Statistics

Overall, great results. Higher performance with lower risk. But to get a better picture of what it is like to live with these portfolios on a month-to-month basis, I find it useful to look at the distribution of monthly returns, in particular downside returns. It gives you a better feel for the tail risk, or maximum pain points, of each portfolio. Table 2 provides summary statistics on the monthly returns for SPY-COMP & DM-COMP vs. various benchmarks.

Table 2: Distribution of Monthly Returns for TAA Models

Table 2 shows the average monthly return for each portfolio, the standard deviation of the monthly returns, and the number of months (also the % of months) in which the portfolio experienced negative returns of various sizes. It highlights one of the biggest benefits of any TAA portfolio: a significant reduction in tail risk, or maximum pain point. This would be extremely beneficial even if TAA portfolios did not increase returns. But they do that as well.
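To make the Table 2 statistics concrete, here is a minimal sketch of how such a summary could be computed from a series of monthly returns. The function name, the loss thresholds, and the sample returns are all assumptions for illustration; they are not the newsletter's actual data or the exact buckets used in the table.

```python
from statistics import mean, stdev

def monthly_return_summary(returns, thresholds=(0.0, -0.02, -0.05, -0.10)):
    """Summarize monthly returns given as decimal fractions (-0.03 means -3%).

    The thresholds are hypothetical loss buckets chosen for illustration.
    """
    n = len(returns)
    summary = {
        "mean": mean(returns),
        "stdev": stdev(returns),  # sample standard deviation
        "worse_than": {},
    }
    for t in thresholds:
        # Count the months whose return was below the threshold.
        count = sum(1 for r in returns if r < t)
        summary["worse_than"][t] = (count, count / n)
    return summary

# Made-up returns for illustration only, not SPY-COMP results.
sample = [0.02, -0.01, 0.03, -0.06, 0.01, -0.12, 0.04, 0.00]
result = monthly_return_summary(sample)
for threshold, (count, share) in result["worse_than"].items():
    print(f"months below {threshold:+.0%}: {count} ({share:.1%})")
```

Running the same function over a TAA portfolio and its benchmark side by side is one way to compare how often each one hits a given pain point.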

However, the benefits of TAA models, or any active management system, do not come at zero cost. In the next section, we’re going to look at model failures, using the SPY-COMP system as an example.

System Failures

First, let’s define what failure means. I’m going to look at it in three ways. I’ll start with the most obvious. Since one of the premises of any TAA system is to avoid big drawdowns, which are the biggest during recessions, let’s look at how well SPY-COMP catches recessions. Since 1971, the SPY-COMP system has caught every recession before a significant market downturn; i.e., no false negatives. Fantastic. But that isn’t very difficult to do if you’re just fitting historical data. Therefore, out-of-sample testing should be done with the same basic system. That is, I fitted the system to a shorter period, 1971 through 2002 for example, to see if it would pick up the subsequent out-of-sample recession of 2007. It did. That makes you feel a little better that you’re working with a reasonable model.

The next question is how many times the system triggered outside a recession. Such a trigger is called a false positive.

Since 1971, there have been 6 recessions but the SPY-COMP system triggered 31 times outside of those recessionary periods. That’s a lot and is one of the costs of an active system. But we’re investors, not academic economists. What we care about is making money and the avoidance of large drawdowns. We need to look at the system in terms of that. In other words, did those triggers hurt us?

Let’s define success as avoiding all monthly market drawdowns of more than 10%. I think that is pretty reasonable. How did SPY-COMP do relative to this market-based measure of success? Since 1971, there have been 13 periods, of varying lengths, with market drawdowns in excess of 10%. The SPY-COMP system caught every one; again, no false negatives. More importantly, the system itself had only 5 periods with drawdowns in excess of 10%. That compares to 13 with a Buy & Hold SPY strategy. Even with a conservative 60/40 US stock/bond Buy & Hold system, there were 7 periods with drawdowns in excess of 10%. But the system also triggered 18 more times outside of those periods. Let’s take a look at the consequences of those false positives.
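Counting "periods with drawdowns in excess of 10%" can be sketched as follows. This assumes one particular definition, namely that a period runs from a peak until the prior high is regained and is counted if its deepest drawdown exceeds the threshold. That definition, the function name, and the sample returns are assumptions for illustration and may differ from how the figures above were produced.

```python
def drawdown_episodes(returns, threshold=0.10):
    """Return the maximum drawdown of each peak-to-recovery episode
    whose maximum drawdown exceeds `threshold`.

    `returns` are monthly returns as decimal fractions.
    """
    value = 1.0   # cumulative portfolio value
    peak = 1.0    # highest value seen so far
    worst = 0.0   # deepest drawdown within the current episode
    episodes = []
    for r in returns:
        value *= 1.0 + r
        if value >= peak:
            # Prior high regained: close the current episode.
            if worst > threshold:
                episodes.append(worst)
            peak = value
            worst = 0.0
        else:
            worst = max(worst, 1.0 - value / peak)
    if worst > threshold:
        # The episode is still open at the end of the data.
        episodes.append(worst)
    return episodes

# Made-up returns for illustration only.
sample_returns = [0.05, -0.08, -0.06, 0.20, -0.03, 0.04, -0.15, 0.02]
episodes = drawdown_episodes(sample_returns)
print(len(episodes), [round(d, 4) for d in episodes])
```

Applying the same function to the model's returns and to a buy-and-hold series gives the kind of comparison quoted above (5 periods vs. 13).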

Given our definition of success and failure, our system made 18 mistakes in the last 46 years. That’s a total error rate of about 3% (18 of 552 months). In other words, given that it could have triggered in any given month over that 46-year period, SPY-COMP triggered incorrectly only about 3% of the time. Nonetheless, these are real, potentially money-losing mistakes. How much did the system suffer during these mistakes? We can measure the consequences of a mistake/false positive in two basic ways:

  1. Whether or not the mistake caused the system to fall behind a SPY buy and hold
  2. Whether or not the mistake made money or lost money.

In 12 out of 18 of the mistakes the system fell behind a Buy & Hold investment in the SP500; in the other 6 instances it was ahead of the SP500. In the 12 instances in which it fell behind the SP500, our system lost money 4 times. In the other 8 instances our system, while falling behind a simple Buy & Hold, still made you money. The maximum loss in the 4 money-losing instances was less than 2%. It’s important to point out that a mistake/false positive in this situation is a switch to an investment in US intermediate government bonds for a minimum of one month. This is not an extremely risky or volatile asset class, so we should not expect the loss to be very large. And that indeed is the case.

Let’s look at recent history for some more concrete and specific examples. Since the end of the 2008 recession, the SPY-COMP indicator has triggered twice: once in June 2010 for a period of one month and once in Sep 2011 for a period of 3 months. I classified June 2010 as a failure since we were out of the recession and the market was not off more than 10% from its recent high. Despite the failure, the SPY-COMP system outperformed the SPY by 5.9% during this brief period. The Sep 2011 period was a success. The drop in the market was steep but short-lived. By the end of the period, the SPY-COMP model was invested again and had fallen behind the SPY by 0.8% over that 3-month period. I think these are the kind of triggers most of us can live with.

In summary, even when the SPY-COMP system errs and falls behind a reasonable benchmark at some point in time, the performance is still well within an acceptable range. I have found this to be true of most TAA models. That is exactly the type of performance we’re striving to achieve with a TAA system as it makes it much easier to stay with an investment system when even during ‘bad’ periods its performance is still pretty good.

Let’s summarize the analysis of system errors.

  • System period: 1971 to 2016. 46 years, 552 months
  • Total of 37 trigger periods of varying length. From 1 month to 17 months
  • The system is invested in risk assets 68% of the time, 32% of the time risk-off
  • 6 recessions, caught all 6, no false negatives
  • 31 triggers outside of a recession
  • 13 triggers during periods with DDs > 10%, caught all 13
  • 18 triggers during periods that market DDs were < 10%; i.e. a mistake
  • In 12 of 18 mistakes the system fell behind SPY buy & hold; the worst-case deviation from the index was 7.6%
  • Only 4 of the mistakes actually lost money; the worst-case loss was < 2%

The key take-away is that historically the system only triggered incorrectly 3% of the time and only lost money 1% of the time. The SPY-COMP system has historically kept the mistakes small, and as is obvious from the overall positive returns and low drawdowns shown in Table 1, the successes far outweigh the mistakes. This is true for most TAA models.
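The percentages above follow directly from the counts in the summary list. The short check below reproduces them, under the assumption that each mistake is counted as a single monthly decision out of the 552 months in the sample; the variable names are illustrative.

```python
# Counts taken from the summary list above.
months = 46 * 12              # 1971 to 2016
recession_triggers = 6
non_recession_triggers = 31
drawdown_triggers = 13        # triggers during market drawdowns > 10%
mistakes = 18                 # triggers during market drawdowns < 10%
losing_mistakes = 4

# The counts should be internally consistent.
total_triggers = recession_triggers + non_recession_triggers
assert drawdown_triggers + mistakes == non_recession_triggers

# Assumption: each mistake is treated as one monthly decision.
error_rate = mistakes / months
loss_rate = losing_mistakes / months

print(months, total_triggers)                              # 552 37
print(f"incorrect triggers: {error_rate:.1%} of months")   # 3.3%
print(f"money-losing triggers: {loss_rate:.1%} of months") # 0.7%
```

So the "3%" and "1%" figures are rounded versions of roughly 3.3% and 0.7%.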

This point becomes clear when examining the base rates of the investment systems (Table 3). Base rates tell you how often the methodologies have beaten their benchmarks over various periods of time. They are a good indication of how hard the methodologies are to stick to. For example, if the system you’re using is behind its index 80% of the time over a rolling 3-year period, who cares if it outperforms its index 100% of the time over a rolling 20-year period? Almost none of us would stick with such a system for 20 years. The table below shows the base rates for SPY-COMP and DM-COMP over 1-year, 3-year, and 5-year periods. SPY-COMP outperforms its benchmark in 85% of rolling 12-month periods (62% for DM-COMP). That increases to 91% (83% for DM-COMP) over rolling 36-month periods. Most of us can stick with a system that has that level of performance. Jake at Econompicdata has a great post covering this topic and the current state of trend following vs buy and hold.
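A base rate of this kind can be sketched as the share of rolling windows in which the model's compounded return beats the benchmark's. The function below is a minimal illustration under that assumption; the name `base_rate` and the sample returns are hypothetical, and the actual Table 3 figures may have been computed differently.

```python
def base_rate(model_returns, benchmark_returns, window):
    """Fraction of rolling `window`-month periods in which the model's
    compounded return exceeds the benchmark's compounded return."""
    if len(model_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    n = len(model_returns)
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the series length")

    def compound(rs):
        total = 1.0
        for r in rs:
            total *= 1.0 + r
        return total - 1.0

    windows = n - window + 1
    wins = 0
    for start in range(windows):
        m = compound(model_returns[start:start + window])
        b = compound(benchmark_returns[start:start + window])
        if m > b:
            wins += 1
    return wins / windows

# Made-up monthly returns for illustration only.
model = [0.01, 0.02, -0.01, 0.03, 0.00, 0.01]
bench = [0.02, 0.01, -0.03, 0.02, 0.04, 0.03]
rate_3m = base_rate(model, bench, window=3)
print(f"{rate_3m:.0%} of rolling 3-month windows beat the benchmark")
```

With real data, `window=12` and `window=36` would correspond to the 1-year and 3-year base rates discussed above.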

Table 3: Investment Model Base Rates

The Real Cost of Mistakes

There are two costs to making a mistake. The first is not making money in the short term, both versus your benchmark and on an absolute basis. This is undesirable, in particular if it happens for an extended period. But every investment system experiences this. There is no system that beats a passive benchmark all of the time. Think of it this way: if a system beat its benchmark all of the time, its outperformance would be quickly arbitraged away. Why? Because when an outcome becomes predictable, i.e. there is no variation associated with it, everyone adopts it.

But not making money or falling behind the benchmark in the short term isn’t the biggest cost to investors that adopt a TAA approach, since as the data demonstrate, in the long run you make more money and experience lower drawdowns using a TAA model, like the SPY-COMP system. The real cost is the impact on your behavior. When a system mistake causes you to fall behind the Buy-&-Hold approach, there is a tendency to abandon the system and chase performance. And then you’re bound to lose money both in the short and long term, since you’re abandoning the system at the worst possible time.

Fortunately, as we’ve just shown, TAA models like the SPY-COMP and DM-COMP methodologies historically have not fallen very far behind their respective benchmarks and have not stayed behind very long. As a matter of fact, over rolling 12-month periods the methodologies have historically beaten their benchmarks a large majority of the time. And over rolling 36-month periods they’ve come out ahead more than 90% (SPY-COMP) and more than 80% (DM-COMP) of the time. That is a system that most of us can stick with.

Summary

TAA models work really well over the long term, as shown by their great returns and lower drawdowns. But they’re not perfect. They do suffer from periods of underperformance, which I have classified here as mistakes, or model errors. I think more systematic analysis of model errors would help investors make decisions on TAA models and also help set expectations based on how historical periods of underperformance have turned out. As I have shown here, TAA models like SPY-COMP have had few and short-lived errors. Of course, we then need to look at the other side of the coin, When Models Win. The successes of the models drive the outperformance over buy and hold portfolios. I’ll turn to that in my next post.


3 Comments

Ben · February 24, 2018 at 7:49 am

Hello Paul,
Do you also have a version of your COMP models for international markets?
Thank you
Ben

    paul.novell@gmail.com · February 24, 2018 at 8:20 am

    Only the US for now. Using the US economy as a proxy for WW investments works pretty well. At least for a dollar based investor.

    I plan to work on a European model in the next 6 months or so.

    Paul

James Herman · February 24, 2018 at 5:33 pm

Thanks for all your hard work!
