The State of GPU Computation Support for Stan

Our presentation details the current state of, and future work on, the OpenCL-based framework that allows the Stan automatic differentiation library to use GPUs. Our research was initially motivated by large Gaussian Process models, where computation is dominated by the Cholesky decomposition, but it has since developed into an extensible framework. Major improvements in the past year include simpler use of the framework, improved data caching, and an implementation of kernel fusion; the latter will substantially reduce the work required to add GPU support to other Stan Math routines. We will show experimental results that illustrate how computation times scale when running multiple chains with multiple CPU cores and a single GPU. Finally, we will discuss directions for future work: which routines to implement next, auto-tuning of tunable GPU parameters, and support for multiple heterogeneous devices. A relatively up-to-date account of the Stan GPU framework can be found in our arXiv paper: