Last month's reorganization by index providers Standard & Poor's and MSCI of their Global Industry Classification Standard has been a major catalyst for market activity.
As such, it has raised transaction costs for some investors. To take one example, the $23 billion SPDR Technology ETF alone will ultimately have to reallocate almost $4 billion of positions, roughly a sixth of its assets, to track the new GICS "communication services" sector faithfully.
This forced rebalancing is hardly consistent with the idea of efficient markets, and raises the question of whether certain types of investors would benefit from exploring alternative classifications that could avoid such frenzied turnover. As a quantitative investment manager, Winton considers a range of classification methods for use in our investment systems.
Market participants classify companies into sectors for many purposes, including to:
- Invest according to thematic views (through sector-specific ETFs, for example).
- Construct hedges for single-stock positions.
- Benchmark performance.
- Identify closely related companies for which fundamental indicators can be compared.
- Improve estimates of risk.
It is always vital to consider the limitations as well as the advantages of a given approach. While some classifications will deliver superior solutions for certain problems, no single classification contains all the answers.
As the industry standard, the GICS framework has some virtues. It provides a consistent taxonomy for classifying companies across geographies, and its methodology is relatively transparent. But it also has significant drawbacks. The top-level view allows each company to belong to just one sector, a limiting assumption given the commercial diversity of many modern companies. Furthermore, its classifications are relatively fixed. Since its launch in 1999, there had been only one recategorization before the Sept. 28 reorganization: in 2016, a real estate sector was added for companies previously lumped in with financials.
Winton has long considered alternative classifications that impose fewer constraints on the companies under observation and that adapt to changing circumstances more readily. Of these, we would highlight two interesting methods that use data and statistics. These methods are systematic, can account for the diversified nature of firms, can change over time according to a consistent methodology, and measure clearly defined properties.
The first uses natural language processing, a set of machine learning techniques for analyzing text, to read regulatory filings and thereby determine a company's relevant sector classification. We estimate that it would take a small team about 70 years to read 24 years' worth of annual financial reports for the Russell 3000's constituents. By using computational heft, however, we processed, parsed and standardized these documents in under a day. This structured data was then used to train our NLP machine to produce its own sector categories for S&P 500 constituents.
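To give a flavor of how text alone can group companies, here is a minimal sketch in Python. It is not Winton's system: the company names and filing snippets are invented, and TF-IDF weighting with cosine similarity is a common textbook technique that we assume here purely for illustration. The idea is that companies describing their businesses in similar words end up close together, which is the raw material for data-driven sectors.

```python
import math
import re
from collections import Counter

# Invented snippets standing in for business descriptions in annual reports.
FILINGS = {
    "AlphaSoft": "we license cloud software and subscription software services to enterprises",
    "BetaCloud": "our cloud platform delivers subscription software to enterprises and developers",
    "GammaPower": "we generate and distribute electricity and natural gas to regulated utility customers",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (word -> weight) per document."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many filings does each word appear?
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            # Term frequency times a smoothed inverse document frequency.
            w: (c / len(words)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
            for w, c in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

VECTORS = tfidf_vectors(FILINGS)
software_pair = cosine(VECTORS["AlphaSoft"], VECTORS["BetaCloud"])
cross_pair = cosine(VECTORS["AlphaSoft"], VECTORS["GammaPower"])
print(f"AlphaSoft vs BetaCloud:  {software_pair:.2f}")
print(f"AlphaSoft vs GammaPower: {cross_pair:.2f}")
```

In this toy example the two software businesses score as more similar to each other than either does to the utility. A production system would work from full filings and a far richer language model, but the principle of letting the companies' own words define the groupings is the same.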
This NLP approach produces intriguing results. For 2016, our NLP classification shared only three sector names with GICS: energy, real estate and utilities (see Figure 1). The three largest 2016 GICS sectors by share of market capitalization were information technology, financials and health care; for our NLP machine, meanwhile, they were retail, hardware and software. The GICS industrials sector, ranking sixth for market-cap share in 2016, had no obvious NLP equivalent. And instead of a single health-care sector, as in GICS, the NLP classification created one for pharma/biotech and another for "health-care equipment and services."
One advantage of the NLP approach is that it classifies companies by the nature of their businesses, both in terms of customers and competitors, and the fundamental commercial and economic changes to which they might respond. It allows for the modern diversity of businesses, with each company able to belong to multiple sectors. And it avoids being purely backward-looking by analyzing the words companies themselves use to describe their future intentions as well as current activities. The approach does have limits, however. An NLP classification may sometimes produce incoherent results that require human interpretation, which could in turn introduce bias.
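The idea of a company belonging to several sectors at once can be made concrete with a small Python sketch. Everything in it is an assumption for illustration: the theme vocabularies are hand-picked, whereas the approach described above derives its categories from the data, and simple keyword counting stands in for a trained model. What it shows is the output format, a set of weights that sum to one rather than a single label.

```python
import re
from collections import Counter

# Hypothetical theme vocabularies, hand-picked for illustration only.
THEMES = {
    "retail": {"stores", "shoppers", "merchandise", "retail"},
    "software": {"cloud", "software", "subscription", "developers"},
    "hardware": {"devices", "chips", "servers", "hardware"},
}

def sector_weights(text, themes=THEMES):
    """Split a company across themes in proportion to keyword hits."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    hits = {name: sum(words[w] for w in vocab) for name, vocab in themes.items()}
    total = sum(hits.values())
    if total == 0:
        # No theme words found: leave the company unassigned.
        return {name: 0.0 for name in themes}
    return {name: count / total for name, count in hits.items()}

# Invented description of a diversified business.
description = (
    "we sell merchandise to shoppers online and rent cloud servers "
    "and software to developers"
)
weights = sector_weights(description)
for name, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name}: {weight:.2f}")
```

Under a single-label scheme this company would have to be filed under one heading. Here it is half software, a third retail and a sixth hardware, which is closer to how such a business actually earns its revenue.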