Last month's reorganization by index providers Standard & Poor's and MSCI of their Global Industry Classification Standard has been a major catalyst for market activity.
In the process, it has raised transaction costs for some investors. To take one example, the $23 billion SPDR Technology ETF alone will ultimately have had to reallocate almost $4 billion of positions, roughly a sixth of the fund, in order to faithfully track the new GICS "communication services" sector.
This forced rebalancing is hardly consistent with the idea of efficient markets, and raises the question of whether certain types of investors would benefit from exploring alternative classifications that could avoid such frenzied turnover. As a quantitative investment manager, Winton considers a range of classification methods for use in our investment systems.
Market participants classify companies into sectors for many purposes, including to:
- Invest according to thematic views (through sector-specific ETFs, for example).
- Construct hedges for single-stock positions.
- Benchmark performance.
- Identify closely related companies for which fundamental indicators can be compared.
- Improve estimates of risk.
It is always vital to consider the limitations as well as the advantages of a given approach. While some classifications will deliver superior solutions for certain problems, no single classification contains all the answers.
As the industry standard, the GICS framework has some virtues. It provides a consistent taxonomy for classifying companies across geographies and its methodology is relatively transparent. But it also has significant drawbacks. The top-level view allows companies to belong to just one sector — a limiting assumption given the commercial diversity of many modern companies. Furthermore, its classifications are relatively fixed. Since its launch in 1999, there had been only one recategorization prior to the Sept. 28 reorganization. In 2016, a real estate sector was added for companies previously lumped in with financials.
Winton has long considered alternative classifications that impose fewer constraints on the companies under observation and that adapt to changing circumstances more readily. Of these, we would highlight two interesting methods that use data and statistics. These methods are systematic, can account for the diversified nature of firms, can change over time according to a consistent methodology, and measure clearly defined properties.
The first uses a machine learning technique called natural language processing to read regulatory filings and thereby determine a company's relevant sector classification. We estimate that it would take a small team about 70 years to read 24 years' worth of annual financial reports for the Russell 3000's constituents. By using computational heft, however, we processed, parsed and standardized these documents in under a day. This structured data was then used to train our NLP machine to produce its own sector categories for S&P 500 constituents.
This NLP approach produces intriguing results. For 2016, our NLP classification named only three of the same sectors as GICS: energy, real estate and utilities (see Figure 1). The three largest 2016 GICS sectors by share of market capitalization were information technology, financials and health care; for our NLP machine, meanwhile, they were retail, hardware and software. The GICS industrials sector, ranking sixth for market-cap share in 2016, had no obvious NLP equivalent. And instead of a health-care sector like GICS, the NLP classification created one for pharma/biotech and another for "health-care equipment and services."
One advantage of the NLP approach is that it classifies companies by the nature of their businesses, both in terms of customers and competitors, and the fundamental commercial and economic changes to which they might respond. It allows for the modern diversity of businesses, with each company able to belong to multiple sectors. And it avoids being purely backward-looking by analyzing the words companies themselves use to describe their future intentions as well as current activities. The approach does have limits, however. An NLP classification may sometimes produce incoherent results, meaning subsequent human interpretation could introduce bias.
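To make the idea of text-based classification more concrete, the sketch below compares companies by the words in their business descriptions. It is a deliberately simple stand-in (bag-of-words cosine similarity) and not a description of Winton's NLP system; the company names and descriptions are hypothetical, hand-written examples rather than real filings.

```python
import math
from collections import Counter

# Hypothetical snippets standing in for business descriptions in filings.
descriptions = {
    "DrugCo": "drug trials patients therapies drug pipeline",
    "ChipCo": "chips servers hardware devices chips",
    "MedDeviceCo": "patients therapies hardware devices implants",
}


def term_counts(text):
    """Turn a description into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = {name: term_counts(text) for name, text in descriptions.items()}
sim = {(a, b): cosine_similarity(vectors[a], vectors[b])
       for a in vectors for b in vectors}

for (a, b), value in sim.items():
    if a < b:
        print(f"{a} vs {b}: {value:.2f}")
```

In this toy example DrugCo and ChipCo share no vocabulary, while MedDeviceCo overlaps with both. That is the kind of evidence that would let a text-based scheme place one company in more than one sector.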
The second method involves computing a covariance matrix to capture relationships in the movements of company share prices. This method identifies stocks that might move together in the future by examining what moved together in the past. To do this, Winton applies statistical clustering methods to a covariance matrix of returns. For this exercise, the first step was to take price changes over some period and measure the correlation. Although Winton computes intraday correlations for a large collection of global assets, here we restricted our attention to daily returns during 2016 for stocks in the S&P 500.
Each pairwise correlation can be converted into a distance, so that highly correlated stocks lie close to one another. Groups of stocks lying close together can then be considered clusters, and there are many ways to identify clusters automatically once we have a distance measure. If the clusters are sufficiently large, it is possible to identify them as sectors.
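The steps described here (measure correlations, convert them to distances, group nearby stocks) can be sketched in a few lines. This is an illustration under stated assumptions, not Winton's implementation: the returns are synthetic, the tickers are hypothetical, the distance sqrt(2(1 - correlation)) is one common choice among several, and average-linkage agglomerative clustering is just one of the "many ways" to find clusters.

```python
import math
import random


def pearson(x, y):
    """Sample correlation between two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_distance(x, y):
    """0 for perfectly correlated series, 2 for perfectly anti-correlated ones."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - pearson(x, y))))


def cluster(returns, n_clusters):
    """Average-linkage agglomerative clustering of named return series."""
    names = list(returns)
    dist = {frozenset((a, b)): correlation_distance(returns[a], returns[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [[name] for name in names]

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: average_distance(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Synthetic daily returns: three stocks driven by one common factor, three by another.
rng = random.Random(0)
n_days = 250
factor_a = [rng.gauss(0, 1) for _ in range(n_days)]
factor_b = [rng.gauss(0, 1) for _ in range(n_days)]


def noisy(factor):
    """A stock's returns: the shared factor plus its own idiosyncratic noise."""
    return [f + rng.gauss(0, 0.5) for f in factor]


returns = {
    "A1": noisy(factor_a), "A2": noisy(factor_a), "A3": noisy(factor_a),
    "B1": noisy(factor_b), "B2": noisy(factor_b), "B3": noisy(factor_b),
}
sectors = cluster(returns, n_clusters=2)
print(sectors)
```

On real data the number of clusters is not known in advance, and the choice of distance, linkage rule and cut-off all influence the sectors that emerge.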
Various themes emerged from our covariance-based sectors for 2016. Health care, a single, top-level GICS sector, was split into two major clusters. The pharmaceutical or biotech companies in the first group — the likes of Amgen Inc. or Gilead Sciences Inc. — behaved distinctly from the companies providing hardware and services in the second group, such as Aetna Inc. or Medtronic PLC. The telecommunications services sector, meanwhile, was merged into utilities, which seems a reasonable reflection of the underlying companies' role in the modern economy. It also provides an interesting counterpoint to the creation of the communication services sector under GICS.
The flexibility of this data-driven covariance approach has promising applications, such as for risk management and designing investment systems. Yet, as with GICS, it is purely backward-looking, since it is derived from historical price data.
In moving away from somewhat imposed, discretionary, one-size-fits-all schemes for stock market sector classification, it is possible to adopt different approaches for different business needs. In our equity trading, we are particularly interested in how investment signals apply to groups of stocks. Some signals have greater statistical power when applied to a collection of stocks rather than to the entire market in all its diversity, or to a single stock with its own idiosyncrasies. By contrast, other signals may, when traded, lead to unwanted sector exposure if this is not carefully risk managed.
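As a rough illustration of how sector exposure might be monitored when companies can belong to more than one sector, the sketch below splits each position across sectors in proportion to the company's sector weights. The portfolio, the company names and the weights are all hypothetical, and the function is an assumption made for illustration rather than a description of Winton's risk systems.

```python
from collections import defaultdict

# Hypothetical portfolio: signed dollar positions (negative means short).
positions = {"DrugCo": 100.0, "ChipCo": -40.0, "MedDeviceCo": 60.0}

# Hypothetical soft classification: each company's sector weights sum to 1.
sector_weights = {
    "DrugCo": {"pharma/biotech": 1.0},
    "ChipCo": {"hardware": 1.0},
    "MedDeviceCo": {"health-care equipment and services": 0.75, "hardware": 0.25},
}


def sector_exposure(positions, sector_weights):
    """Net dollar exposure per sector, splitting each position by sector weight."""
    exposure = defaultdict(float)
    for name, dollars in positions.items():
        for sector, weight in sector_weights[name].items():
            exposure[sector] += dollars * weight
    return dict(exposure)


exposure = sector_exposure(positions, sector_weights)
for sector, dollars in exposure.items():
    print(f"{sector}: {dollars:+.1f}")
```

Under a single-sector scheme the short ChipCo position would be the only hardware exposure. With the weights assumed here, part of the MedDeviceCo position offsets it.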
The benefits of flexible, data-driven classification schemes are clear, but caution is required. Noise in the data may throw up spurious relationships between stocks or identify counterintuitive groupings that are unlikely to persist.
Furthermore, if we simulate a trading system that makes use of proprietary, dynamic sectors, we must decide how we would have defined them in the past and how they would have changed through time, introducing the possibility of hindsight bias. A long history may be difficult to reconstruct, given changes in the data available to feed into machine learning algorithms, though we note that other sector definitions, including GICS, may also have relatively short histories.
These methods might have resulted in less frenetic market churn than the recent GICS reorganization. But neither should be considered a default choice, as GICS often appears to be. Rather, when deciding how to classify a company, one ought to start by considering what question the classification is meant to answer. The next steps are to assess what data is required to make the decision and which statistical methods are best suited to the task in hand.
Geraint Harker is vice president, product research at Winton Group, London. This content represents the views of the author. It was submitted and edited under P&I guidelines but is not a product of P&I's editorial team.