Multivariate Synthetic Control to Estimate Missing Data in Hierarchically Correlated Series

Abstract:
The multivariate synthetic control method extends synthetic control to cases where the target data are completely missing but correlated donor data exist. The synthetic control is built by applying a Bayesian latent factor model and a constrained quadratic program to hierarchically grouped time series. Weights are chosen by selecting donor groups with large correlations across latent factors and nested levels. The number of latent factors is chosen by a cross-validation procedure that holds out known series.

Other info:
A real-world example using website visitation data for mobile and desktop users in Latin America will be presented. We worked with Rok C. (Stan dev) to update the quadratic optimizer for use in the parameters block. We will show the combined use of the quadratic optimizer, a multivariate latent factor model (MVN Cholesky parameterization), and multivariate priors coded in Stan. We ultimately used variational inference (VI) and a cross-validation procedure to select the number of latent factors.
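
To give a rough sense of the constrained quadratic program behind the weights, here is a minimal Python sketch. It is not the Stan implementation described above. It assumes the common synthetic control formulation, in which donor weights are non-negative and sum to one, and it uses SciPy's SLSQP solver in place of Stan's quadratic optimizer. The function name `simplex_weights`, the feature matrix, and the simulated numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def simplex_weights(donor_features, target_features):
    """Least-squares donor weights constrained to be >= 0 and to sum to 1.

    donor_features:  (n_features, n_donors) array, e.g. factor loadings per donor
    target_features: (n_features,) array to be matched by a weighted donor mix
    """
    n_donors = donor_features.shape[1]

    def objective(w):
        resid = target_features - donor_features @ w
        return resid @ resid

    def gradient(w):
        return -2.0 * donor_features.T @ (target_features - donor_features @ w)

    result = minimize(
        objective,
        x0=np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,        # non-negativity
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


# Simulated illustration only: 6 features, 4 donors, target built from known weights.
rng = np.random.default_rng(0)
donors = rng.normal(size=(6, 4))
true_w = np.array([0.6, 0.3, 0.1, 0.0])
target = donors @ true_w

weights = simplex_weights(donors, target)
print(np.round(weights, 3))
```

In the setting of the talk the target series is entirely missing, so the quantity being matched would have to come from the latent factor model and the hierarchy rather than from observed target data; the sketch above only illustrates the constrained weighting step.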

Presenter biography:
Sean Pinkney

Sean leads a team of data scientists at Comscore that creates statistical and machine learning tools for internal and external use. The team is currently involved in cross-platform media measurement and is working closely with the industry on a number of digital privacy initiatives. He has a decade of data science experience within the marketing and insurance industries.