In the next five years, big data analysis is poised to become one of the most important and competitive skill sets around. Portfolio analysis in particular is where pension funds are focusing their big data investments.
Big data is a set of techniques that underpins today's most sophisticated technologies: social media analytics, digital video recognition, 5G cellular networks and much more. Its capabilities are incredibly powerful and extend far beyond those of traditional systems. Core big data techniques were developed in the 1940s, 1960s and 1980s, honed as intelligence tools during World War II and, later, the Cold War, and are by now well tested and established.
At the heart of big data is the concept of data as a structure that obeys well-defined principles. No matter how random, disjointed or riddled with gaps your data pool might be (much of big data, in fact, deals with missing values), big data techniques help distill the important drivers and draw meaningful inferences.
Many portfolio managers swim in data. And the more data, the merrier, in the view of modern financial industry stalwarts such as Goldman Sachs Group Inc. Indeed, Goldman recently created a central data "lake": a repository of all sorts of data, accessible to all channels of the organization. Synthesizing data and drawing inferences quickly and efficiently is what big data techniques do best. In other words, separating the wheat from the chaff with big data analytics takes only seconds, not the weeks it traditionally took.
As explained in our latest research paper, "Big Data in Portfolio Allocation," the key to successful data processing lies in the concept of eigenvalues, first developed in the 18th century as a tool for solving differential equations. Eigenvalues, also known as characteristic values, help explain and reduce any data set, no matter how disjointed, random or incomplete. Furthermore, the past 50 years have seen an explosion of interest in eigenvalue behavior. In particular, eigenvalues have been shown to obey strict distributional properties, including how closely to one another they can occur.
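To see the idea in code, consider a minimal sketch using Python's numpy; the 3x3 correlation matrix here is purely illustrative and not drawn from our paper:

```python
import numpy as np

# A toy 3x3 correlation matrix (illustrative only, not from the paper).
C = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

# Eigendecomposition solves C v = lambda v; for a symmetric matrix,
# np.linalg.eigh returns real eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Each eigenvalue's share of the total is the fraction of the data's
# variation that its eigenvector (a "factor") accounts for.
print(eigenvalues / eigenvalues.sum())
```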
Eigenvalue analysis is performed using two now-common big data techniques: principal component analysis and its cousin, singular value decomposition. Both techniques identify and rank the critical and less relevant factors in the data, allowing researchers to replace the original data with smaller, more informative, yet nearly equivalent data sets.
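To make the connection between the two techniques concrete, here is a hedged sketch showing that they recover the same factor strengths on a small synthetic returns matrix; the data and the choice of keeping three components are illustrative assumptions, not the setup of our paper:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_normal((250, 10))   # 250 days x 10 assets, synthetic
X = returns - returns.mean(axis=0)         # center each asset's returns

# Route 1: PCA, i.e., eigendecomposition of the covariance matrix.
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order

# Route 2: singular value decomposition of the centered data itself.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The two agree: squared singular values / (n - 1) equal the eigenvalues.
print(np.allclose(np.sort(s**2 / (len(X) - 1)), eigvals))

# Keeping only the top k components replaces the original 10-asset data
# with a smaller, nearly equivalent data set of k factor scores per day.
k = 3
reduced = X @ Vt[:k].T                     # 250 x 3 matrix of factor scores
```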
Eigenvalue analysis is, therefore, particularly important to institutional portfolio managers dealing with a large number of financial instruments and a rapidly growing universe of exchange-traded funds. The sheer number of instruments available today can produce unwieldy inferences in traditional modern portfolio theory management frameworks. In addition, multiple studies have shown that the vanilla MPT framework, applied to a large number of instruments, creates two major problems for optimal portfolio allocation: unreasonably extreme positions, which result in poor diversification, and potentially extreme reallocations, which generate high transaction costs in practice. Many portfolio managers resort to "art" (experience and intuition) and strict threshold policies to constrain extreme positions and portfolio composition changes, but that tends to defeat the purpose of mathematical approaches. Big data eigenvalue analysis, on the other hand, allows researchers to rework the MPT optimization framework with scientific precision.
MPT reallocation involves computing the correlation matrix of the assets, then inverting it to generate optimal portfolio weights. Several big data studies have proposed decomposing the correlation matrix and keeping only its key components in an effort to reduce the matrix's complexity. This head-on decomposition of the correlation matrix, however, has met with muted results, failing to produce significant portfolio gains.
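The paper's exact optimization is not reproduced here, but a minimal sketch of the standard step just described, inverting the correlation matrix to obtain minimum-variance weights, might look like this (using the correlation matrix in place of the covariance matrix, i.e., assuming equal asset volatilities, purely for illustration):

```python
import numpy as np

def min_variance_weights(corr):
    """Classic minimum-variance allocation: w = C^{-1} 1 / (1' C^{-1} 1).

    Uses the correlation matrix as a stand-in for covariance (equal
    volatilities assumed); an illustrative simplification, not the
    paper's exact setup.
    """
    ones = np.ones(len(corr))
    inv_corr = np.linalg.inv(corr)   # the inverse discussed in the text
    raw = inv_corr @ ones
    return raw / (ones @ raw)        # normalize so weights sum to 1

# Illustrative matrix: two highly correlated assets plus a diversifier.
C = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
print(min_variance_weights(C))       # roughly [0.26, 0.26, 0.48]
```

In this toy case the two highly correlated assets split roughly half the weight and the diversifier takes the rest; scaled up to hundreds of instruments, the same inversion is what produces the extreme positions described above.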
Our research paper is the first to study the big data properties of the inverse of the correlation matrix. The paper shows that, from a big data perspective, the inverse is much more informative than the correlation matrix itself. It then proposes big data approaches that harness the correlation inverse to deliver superior out-of-sample returns.
Our research uses daily closing prices to construct monthly correlation matrices for end-of-month portfolio reallocation. By decomposing the inverse of the correlation matrix with a technique that is brand-new even in the big data universe, we show that the performance of a Standard & Poor's 500 portfolio alone can be improved by as much as 400% over the 20 years from 1998 through 2017.
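The new decomposition technique itself is beyond the scope of this article, but the generic pipeline, building a monthly correlation matrix from daily closes and examining the spectrum of its inverse, can be sketched as follows; pandas, the function name and the column layout are assumptions, and a plain eigendecomposition stands in for the paper's method:

```python
import numpy as np
import pandas as pd

def monthly_inverse_spectrum(close_prices: pd.DataFrame, month: str):
    """Correlation matrix of one month's daily returns and the
    eigenvalues of its inverse.

    close_prices: daily closing prices indexed by date, one column per
    asset. A plain eigendecomposition stands in for the paper's own
    decomposition technique, which is not reproduced here.
    """
    daily_returns = close_prices.pct_change().dropna()
    window = daily_returns.loc[month]       # e.g. "2017-06"
    corr = window.corr().to_numpy()
    # Inverting requires more daily observations than assets; with a
    # 500-stock universe a longer estimation window would be needed.
    inv_corr = np.linalg.inv(corr)
    eigvals = np.linalg.eigvalsh(inv_corr)  # spectrum of the inverse
    return corr, eigvals
```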
And this is just one application showcasing the big power of big data. Armed with eigenvalues and other big data techniques, researchers can identify key trends, drivers and even missing or latent variables quickly, seamlessly and with great precision. Huge amounts of data become tractable, and information of all sorts becomes more valuable than ever. Most important, researchers, analysts and portfolio managers can save countless hours and ramp up productivity by letting technology do the grunt work, freeing them to focus on drawing inferences from the smaller, far more powerful data sets that big data condenses for them.