
Enhancing Short-Term Mean-Reversion Strategies

By Rob Reider, originally posted Mar 13, 2017 on Quantopian (copyright reserved by the author)

For the majority of quant equity hedge funds with holding periods on the order of a few days to a couple of weeks (“medium frequency” funds), by far the most common strategy is some variation of short-term mean reversion. No hard data exists to support this claim, and in my experience working alongside several dozen quant groups within two multi-strategy hedge funds, I admittedly only saw aggregate performance or individual performance that was anonymized. Still, I was able to observe a strong correlation between hedge fund performance and the returns to a simple mean-reversion strategy (for example, buying the quintile of stocks in the S&P 500 with the lowest five-day returns and shorting the quintile with the highest five-day returns). When the simple mean-reversion strategy was going through a rough period, the quant groups were almost universally down as well.

Given how common short-term mean-reversion strategies are, and, more importantly, how well and consistently these strategies have held up over the years, it’s worthwhile to consider ways to enhance the performance of a simple mean-reversion strategy. Of course, every quant fund has its own way of implementing a mean-reversion strategy, which it often refers to as its “secret sauce”. In this post, I’d like to offer some possible ingredients that the community can use to create their own secret sauce.

I backtested numerous ideas on Quantopian, some worked as expected and many failed. Here is a summary of a few interesting ones:

  • Distinguish between liquidity events and news events
    o Use news sentiment data from Accern and Sentdex
    o Use volume data
    o Look for steady stock moves instead of jumps

  • Trade on a universe of stocks where there is less uncertainty about fair value
    o Use a low volatility universe
    o Use a universe of stocks that have a lower dispersion of analyst estimates

  • Miscellaneous enhancements
    o Separate different types of announcements using EventVestor data
    o Trade on a universe of lower liquidity stocks
    o Trade on a universe that excludes extreme momentum stocks
    o Skip the most recent day when computing mean reversion

As a baseline, I compared all the potential enhancements against a simple mean-reversion strategy: I sorted stocks in the Q500US into quintiles based on five-day returns. For the current day’s return, I used the 3:55 PM price, and I rebalanced the equally weighted, unleveraged portfolio daily at the close. Stocks were unwound when they dropped out of the extreme quintile. For such a simple strategy, it performed reasonably well: the cumulative return over the entire backtesting period, from 2002 to the present, was about 100%, and the Sharpe Ratio was 0.57.
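For reference, the core of the baseline sort can be expressed in a few lines of pandas. This is a minimal sketch rather than the attached algorithm; it assumes a DataFrame 'prices' of daily closes with one column per stock:

import pandas as pd

def baseline_quintiles(prices, lookback=5, nq=5):
    # Five-day total return for each stock, ending at the most recent close
    rets = prices.iloc[-1] / prices.iloc[-1 - lookback] - 1
    # Bucket the cross-section into quintiles (1 = biggest losers)
    quantiles = pd.qcut(rets, nq, labels=False) + 1
    longs = quantiles[quantiles == 1].index    # buy past losers
    shorts = quantiles[quantiles == nq].index  # short past winners
    return longs, shorts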

Distinguishing between liquidity events and news events

There are at least two competing theories about why short-term mean-reversion strategies work (for example, see Subrahmanyam):

  • Because of behavioral biases (for example, investors overweight recent information), the market overreacts to both good news and bad news

  • Liquidity shocks (for example, a large portfolio rebalancing trade by an uninformed trader) lead to temporary moves that get reversed

There is some evidence that for certain news events, investors actually underreact to news, leading to trending, rather than mean reversion. So if it were possible to identify liquidity trades as opposed to news-related trades, you could avoid potentially money-losing news-related trades and focus on the more profitable liquidity trades.

RavenPack, a news analytics data provider that competes with Accern AlphaOne and Sentdex, has released a white paper arguing that their data can enhance a mean-reversion strategy (there’s no public link, but you can request a copy of the paper, “Enhancing Short-term Stock Reversal Strategies With News Analytics”, from their website). Their first enhancement is to combine mean reversion with news sentiment, using their own “Sentiment Strength Indicator”. They do a double sort on five-day returns and news sentiment and find that if they buy past losers that have strong positive sentiment and short past winners that have strong negative sentiment, they can improve on the straight mean-reversion strategy. They also combine mean reversion with a measure of the number of news stories, regardless of whether they are positive or negative, which is their “Event Volume Indicator”. Here, they buy losers with low event volume but sell winners with high event volume. I would have expected selling on low event volume to work better, given the premise that high event volume represents more news-related trades.
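As a rough illustration of the double sort (this is not RavenPack’s actual methodology, and the 'sentiment' Series, scored from strongly negative to strongly positive, is hypothetical), the conditioning might look like this in pandas:

import pandas as pd

def double_sort(five_day_ret, sentiment, nq=5):
    # Cross-sectional Series indexed by stock: past five-day return and news sentiment
    df = pd.DataFrame({'ret': five_day_ret, 'sent': sentiment}).dropna()
    df['ret_q'] = pd.qcut(df['ret'], nq, labels=False) + 1
    df['sent_q'] = pd.qcut(df['sent'], nq, labels=False) + 1
    # Buy past losers with strongly positive sentiment;
    # short past winners with strongly negative sentiment
    longs = df[(df['ret_q'] == 1) & (df['sent_q'] == nq)].index
    shorts = df[(df['ret_q'] == nq) & (df['sent_q'] == 1)].index
    return longs, shorts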

I tried something similar with the daily estimates of news sentiment supplied by Accern and Sentdex (neither dataset has a field for news volume). I made numerous attempts to combine the data with a mean-reversion signal, but was unable to enhance the simple mean-reversion strategy or replicate RavenPack’s results.

But there are other, simpler ways to potentially distinguish liquidity events from news events. I tried using volume information - for example, sorting by the ratio of average daily volume over the five-day mean-reversion period to average daily volume over a longer period (a sketch follows below). I wasn’t that successful using volume, but I found a more fruitful approach was to look at the pattern of returns. My conjecture was that a 10% one-day return is more likely to be news-related, whereas five consecutive days of 2% returns is more likely to be liquidity-related, given that there is some evidence that large liquidity trades take place over consecutive days. In fact, Heston, Korajczyk, and Sadka argue that large trades get executed not only on consecutive days but also at the same time each day.
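Here is what that volume ratio might look like (the 60-day baseline window is an arbitrary choice of mine; 'volume' is a DataFrame of daily share volume with one column per stock):

def volume_ratio(volume, short_window=5, long_window=60):
    # Average daily volume over the mean-reversion period...
    recent = volume.iloc[-short_window:].mean()
    # ...relative to average daily volume over the preceding baseline period
    baseline = volume.iloc[-long_window:-short_window].mean()
    # High values suggest unusually heavy (possibly news-driven) trading
    return recent / baseline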

There are many ways to penalize return patterns that are dominated by large one-day moves and reward steadier return patterns that have the same cumulative return. I only tried one simple, but obvious, filter: I sorted stocks by the five-day standard deviation of returns. This worked very well and was robust; the results were nearly monotonic across the standard-deviation quantiles. Nonetheless, other techniques may work better and achieve the same goal.
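To see why the standard deviation separates the two patterns in my conjecture above, compare a single 10% jump with five steady 2% days; the cumulative returns are similar, but the dispersions are very different:

import numpy as np

jumpy = np.array([0.10, 0.0, 0.0, 0.0, 0.0])  # one 10% day, then flat
steady = np.array([0.02] * 5)                 # five consecutive 2% days

print(jumpy.std())   # ~0.04 -- penalized by the filter
print(steady.std())  # 0.0   -- kept by the filter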

Trading on a universe of stocks where there is less uncertainty about fair value

Another enhancement is to use a universe of lower volatility stocks. This idea was presented at a UBS Quant Conference in 2013. The rationale is that when there is less uncertainty about a stock’s “fair value”, stock prices are more likely to reverse after large price moves that deviate from “fair value”.

The improvement was modest but robust: it worked for different trading frequencies as well as for a different universe (UBS looked at a monthly reversal strategy and a universe of 1000 stocks in North America). And although the results were not always strictly monotonic, the highest-volatility quantile consistently performed worse than the other quantiles, in terms of both Sharpe Ratio and returns.

Applying this concept to a low analyst dispersion universe performed even better, according to UBS. Their measure of dispersion was the standard deviation of analyst earnings estimates. The rationale is the same, and in fact, the two measures are correlated. Quantopian is in the process of incorporating a dataset of analyst earnings forecasts, and as soon as this data is available, I’ll post the results and the algorithm.

These strategies also highlight the idea of using data, like analyst earnings estimates, not as a signal per se, but in a totally different way: as a means to condition your alpha or modify the universe of stocks.

Miscellaneous Enhancements

One could also try to examine, and then separate, different types of stock-moving events using EventVestor data. For some events, like analyst upgrades and downgrades, announcements of stock buybacks, and unexpected changes in dividends, the market may underreact to news and fail to mean revert. For other events, like earnings announcements, the market may overreact. As other posts have pointed out, Post Earnings Announcement Drift no longer seems to work, and indeed, when I backtested the simple mean-reversion strategy with earnings announcements excluded from the sample, returns were cut in half (although the Sharpe Ratio was almost the same, because mean-reversion trades following earnings announcements, while positive, are also more volatile).
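As a sketch of how such an exclusion might be wired up (the 'had_earnings' boolean Series is hypothetical; on Quantopian it would be derived from the EventVestor earnings dataset):

def exclude_recent_earnings(five_day_ret, had_earnings):
    # Drop stocks with an earnings announcement during the lookback window
    mask = had_earnings.reindex(five_day_ret.index, fill_value=False)
    return five_day_ret[~mask]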

An academic paper by Avramov, Chordia, and Goyal suggests that stocks that are less liquid have stronger reversals. The argument is that the compensation demanded by liquidity providers is greater for less liquid stocks. I tested this using a simple measure of liquidity, volume/(shares outstanding), and it worked reasonably well. Avramov et al. suggest a more sophisticated measure of liquidity, which I did not try but which might be interesting to look at.
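My simple measure is just daily turnover. A common, more sophisticated alternative from the academic literature (I am not claiming it is the exact measure Avramov et al. use) is the Amihud illiquidity ratio, the average of |daily return| per dollar of volume:

def turnover(volume, shares_outstanding):
    # Simple liquidity proxy: the fraction of shares outstanding traded in a day
    return volume / shares_outstanding

def amihud_illiquidity(daily_returns, dollar_volume):
    # Amihud (2002): higher values indicate less liquid stocks
    return (daily_returns.abs() / dollar_volume).mean()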

I also looked at filtering out the extreme momentum deciles. The idea here is that some momentum stocks seem to move in one direction, and when they reverse, they also move in one direction rather than snapping back. This filter resulted in only a modest improvement.

Finally, the last idea is to skip the most recent day when computing the five-day mean reversion. In other words, compute returns from five days ago up to the previous day’s close. Whether you believe the source of mean-reversion profits is from providing liquidity or from overreaction, it’s plausible that it takes longer than one day for the (non-high frequency trading) liquidity providers to step in with capital, or that the overreaction to the news cycle lasts more than one day. Indeed, I found a one-day mean-reversion strategy did not perform very well. But for whatever reason, skipping the most recent day significantly enhanced performance and was robust to all the variations I tried.

While I believe it makes sense to test alpha generating signals separately, ultimately the goal is to combine them. I didn’t focus on this aspect, but I did notice that when I combined several of these ideas, the results were dominated by two simple enhancements - filtering out return jumps and skipping the last day’s return. These two simple enhancements triple the cumulative returns and raise the Sharpe Ratio to 1.34 (see attached algorithm).

I should point out that I’ve focused only on alpha ideas, as opposed to other performance-enhancing techniques that are surely used by practitioners, like coming up with smarter ways to unwind positions, improving upon equal weighting of stocks, sector- or industry-neutralizing the portfolio, or employing portfolio optimization techniques (see this post on the Optimize API), etc.

These enhancements will hopefully spur more ideas. Even just a few of these ideas can increase the Sharpe Ratio of a simple mean-reversion strategy to relatively high levels.


"""

This algorithm enhances a simple five-day mean reversion strategy by:
1. Skipping the last day's return
2. Sorting stocks based on the volatility of the five-day returns, to separate steady moves from jumpy ones
I also commented out two other filters that I looked at: 
1. Six month volatility
2. Liquidity (volume/(shares outstanding))


"""

import numpy as np
import pandas as pd
from quantopian.pipeline import Pipeline
from quantopian.pipeline import CustomFactor
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.filters import Q500US
from quantopian.pipeline.filters import Q1500US, QTradableStocksUS

def initialize(context):
    
    # Set long/short ratio
    context.long_ratio = 0.5
    context.short_ratio = 0.5
    
    # Set benchmark to short-term Treasury note ETF (SHY) since strategy is dollar neutral
#    set_benchmark(sid(23911))
    
    # Schedule our rebalance function to run every day, 30 minutes after the open.
    schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(minutes=30))
#    schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(minutes=5))
    # Record variables at the end of each day.
    schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
    
    # Get prices each day shortly after the open. If you are not skipping the most
    # recent day's return, schedule this just before the close instead.
    schedule_function(get_prices, date_rules.every_day(), time_rules.market_open())
    
    # Set commissions and slippage to 0 to determine pure alpha
    # set_commission(commission.PerShare(cost=0, min_trade_cost=0))
    # set_slippage(slippage.FixedSlippage(spread=0))
    
    # Number of quantiles for sorting returns for mean reversion
    context.nq=5
    
    # Number of quantiles for sorting volatility over five-day mean reversion period
    context.nq_vol=3
    

    # Create our pipeline and attach it to our algorithm.
    my_pipe = make_pipeline()
    attach_pipeline(my_pipe, 'my_pipeline')

class Volatility(CustomFactor):  
    inputs = [USEquityPricing.close]
    window_length=132
    
    def compute(self, today, assets, out, close):
        # Compute roughly six-month volatility, excluding the most recent six days
        # so the window ends just before the five-day mean-reversion period
        daily_returns = np.log(close[1:-6]) - np.log(close[0:-7])
        out[:] = daily_returns.std(axis = 0)           

class Liquidity(CustomFactor):   
    inputs = [USEquityPricing.volume, morningstar.valuation.shares_outstanding] 
    window_length = 1
    
    def compute(self, today, assets, out, volume, shares):       
        out[:] = volume[-1]/shares[-1]        
        
class Sector(CustomFactor):
    inputs=[morningstar.asset_classification.morningstar_sector_code]
    window_length=1
    
    def compute(self, today, assets, out, sector):
        out[:] = sector[-1]   
        
        
        
        
def make_pipeline():
    """
    Create our pipeline.
    """
    
    pricing=USEquityPricing.close.latest

    # Volatility filter (I made it sector-neutral to replicate what UBS did). Uncomment
    # it and adjust the percentile bounds as you like before adding it to 'universe'.
    # vol = Volatility(mask=Q500US())
    # sector = morningstar.asset_classification.morningstar_sector_code.latest
    # vol = vol.zscore(groupby=sector)
    # volatility_filter = vol.percentile_between(0, 100)

    # Liquidity filter. Uncomment it and adjust the percentile bounds as you like before
    # adding it to 'universe'. NaNs are kept because of the large amount of missing
    # shares-outstanding data.
    # liquidity = Liquidity(mask=Q500US())
    # liquidity_filter = liquidity.percentile_between(0, 75) | liquidity.isnan()
    
    universe = (
        Q500US()
        & (pricing > 5)
        # & liquidity_filter
        # & volatility_filter
    )


    return Pipeline(
        screen=universe
    )


def before_trading_start(context, data):
    # Gets our pipeline output every day.
    context.output = pipeline_output('my_pipeline')
       

def get_prices(context, data):
    # Get the last 6 days of prices for every stock in our universe
    Universe500=context.output.index.tolist()
    prices = data.history(Universe500,'price',6,'1d')
    daily_rets=np.log(prices/prices.shift(1))

    # I used data.history instead of Pipeline to get historical prices so you have the
    # option of using the intraday price just before the close for the most recent return.
    # In my post, I argue that you generally get better results when you skip that return.
    # If you don't want to skip the most recent return, use .iloc[-1] instead of .iloc[-2]:
    # rets = (prices.iloc[-1] - prices.iloc[0]) / prices.iloc[0]
    rets = (prices.iloc[-2] - prices.iloc[0]) / prices.iloc[0]
    
    stdevs=daily_rets.std(axis=0)

    rets_df=pd.DataFrame(rets,columns=['five_day_ret'])
    stdevs_df=pd.DataFrame(stdevs,columns=['stdev_ret'])
    
    context.output=context.output.join(rets_df,how='outer')
    context.output=context.output.join(stdevs_df,how='outer')
    
    context.output['ret_quantile']=pd.qcut(context.output['five_day_ret'],context.nq,labels=False)+1
    context.output['stdev_quantile']=pd.qcut(context.output['stdev_ret'],context.nq_vol,labels=False)+1

    context.longs=context.output[(context.output['ret_quantile']==1) & 
                                (context.output['stdev_quantile']<context.nq_vol)].index.tolist()
    context.shorts=context.output[(context.output['ret_quantile']==context.nq) & 
                                 (context.output['stdev_quantile']<context.nq_vol)].index.tolist()    

    
def my_rebalance(context, data):
    """
    Rebalance daily.
    """
    Universe500=context.output.index.tolist()


    # Count held positions that remain in the extreme quantile but are not in today's
    # new longs/shorts lists, so the target weights below account for them
    existing_longs = 0
    existing_shorts = 0
    for security in context.portfolio.positions:
        # Unwind stocks that have moved out of Q500US
        if security not in Universe500 and data.can_trade(security): 
            order_target_percent(security, 0)
        else:
            if data.can_trade(security):
                current_quantile=context.output['ret_quantile'].loc[security]
                if context.portfolio.positions[security].amount>0:
                    if (current_quantile==1) and (security not in context.longs):
                        existing_longs += 1
                    elif (current_quantile>1) and (security not in context.shorts):
                        order_target_percent(security, 0)
                elif context.portfolio.positions[security].amount<0:
                    if (current_quantile==context.nq) and (security not in context.shorts):
                        existing_shorts += 1
                    elif (current_quantile<context.nq) and (security not in context.longs):
                        order_target_percent(security, 0)

    for security in context.longs:
        if data.can_trade(security):
            order_target_percent(security, (context.long_ratio)/(len(context.longs)+existing_longs))

    for security in context.shorts:
        if data.can_trade(security):
            order_target_percent(security, -(context.short_ratio)/(len(context.shorts)+existing_shorts))


def my_record_vars(context, data):
    """
    Record variables at the end of each day.
    """
    longs = shorts = 0
    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            longs += 1
        elif position.amount < 0:
            shorts += 1
    # Record our variables.
    record(leverage=context.account.leverage, long_count=longs, short_count=shorts)
    
    log.info("Today's shorts: "  +", ".join([short_.symbol for short_ in context.shorts]))
    log.info("Today's longs: "  +", ".join([long_.symbol for long_ in context.longs]))


There are a lot of interesting ideas in this post. Sadly, without the infrastructure, there are more basic questions to be asked first, such as: how can one create a substitute for all the datasets used, let alone the other data sources mentioned in the text?

One idea I found particularly interesting: looking at volume and patterns of returns to distinguish news events from liquidity events. I’ll surely keep that in mind when I go about testing things myself, particularly since it connects with other ideas here. For example: if you can distinguish liquidity events from news events in this way, and if it is true that liquidity events lead to trending, could one build a short-term trend-following algo around this idea? In particular, by combining it with the idea mentioned in the text of repetitive large trades on consecutive days at similar times intraday?

@APTrade

Very well said and true. We have two issues here:

  1. Infrastructure
  2. Dataset

For the first point, we should probably try to run zipline locally and no longer depend on a third party's (a company's) continued operation. I am sure many such projects are in development out there.

For the second, which is probably the trickier one, can we resolve it by relying only on free data available on the internet? Could it be some kind of scraper that gathers and processes disorganized raw data from the web (Form 10-K filings / annual reports)?

There are vast possibilities!

Kyo

Oh yes, I personally decided some months ago that I wanted my own database for backtesting, so as not to depend on providers like Quantopian. As it turns out, my hunch wasn’t wrong.

Using zipline locally is a good idea, and I have already experimented with that. It is not that hard to do, but you still need some data to build your own bundle from.
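For anyone going down that road, here is a minimal sketch of a local backtest using zipline's run_algorithm (it assumes you have already ingested a data bundle, for example with `zipline ingest -b quandl`; the buy-and-hold logic is just a placeholder):

import pandas as pd
from zipline import run_algorithm
from zipline.api import order_target_percent, symbol

def initialize(context):
    context.asset = symbol('AAPL')  # placeholder asset

def handle_data(context, data):
    # Trivial strategy: hold 100% of capital in one asset
    order_target_percent(context.asset, 1.0)

results = run_algorithm(
    start=pd.Timestamp('2014-01-02', tz='utc'),
    end=pd.Timestamp('2016-12-30', tz='utc'),
    initialize=initialize,
    handle_data=handle_data,
    capital_base=100000,
    bundle='quandl',
)
print(results[['portfolio_value']].tail())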

As far as I can see, there are quite a few data sources that are in principle available for free, if you do the work yourself to scrape and aggregate them. I'm getting close to being able to do that, but not quite there yet.