ArviZ, InferenceData, and NetCDF: A unified file format for Bayesians

Recently, many libraries have been built to specify probabilistic models as executable code.
ArviZ aims to unify common Bayesian statistical analysis by providing up-to-date plotting, diagnostic capabilities, and a common data structure, InferenceData. InferenceData is based on the netCDF file format, which is a standard, language-agnostic format. ArviZ is available in Python and Julia and works with PyStan, CmdStan, and other probabilistic programming languages (PPLs).

By using a common format, Stan users will be able to share their results so that any Bayesian, not just other Stan users, can analyze them after inference. If other plotting packages also adopt InferenceData, inference and analysis can be performed as virtually independent tasks using different libraries and tools. Users of other PPLs could then save their results in netCDF and use R packages such as loo and bayesplot, while users of RStan or brms could use ArviZ to take advantage of xarray's capabilities for high-dimensional analysis and distributed computing, or to use specific Matplotlib and Bokeh features.
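
As a rough illustration of the save-and-share workflow described above, the sketch below builds an InferenceData object from plain arrays, writes it to a netCDF file, and reads it back. It assumes the ArviZ 0.x Python API (az.from_dict, InferenceData.to_netcdf, az.from_netcdf) and a netCDF backend being installed. The variable names mu and theta, the school dimension, and the file name are hypothetical placeholders, and the draws are random numbers rather than real inference output.

```python
import os
import tempfile

import arviz as az
import numpy as np

# Hypothetical posterior draws laid out as (chain, draw, ...),
# standing in for the output of any PPL.
rng = np.random.default_rng(0)
posterior = {
    "mu": rng.normal(size=(4, 500)),
    "theta": rng.normal(size=(4, 500, 8)),
}

# Wrap the arrays in an InferenceData object, naming the extra dimension.
idata = az.from_dict(
    posterior=posterior,
    dims={"theta": ["school"]},
    coords={"school": np.arange(8)},
)

# Write to netCDF; the resulting file is what would be shared.
path = os.path.join(tempfile.mkdtemp(), "example_idata.nc")
idata.to_netcdf(path)

# Anyone with the file can reload it, independently of the PPL used.
loaded = az.from_netcdf(path)
mu_shape = loaded.posterior["mu"].shape
theta_dims = loaded.posterior["theta"].dims
```

The reloaded object keeps the named dimensions, so downstream plotting or diagnostics do not need to know which library produced the draws.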

We therefore hope InferenceData will become a key tool in Bayesian analysis by providing users with a consistent data format to analyze and compare results across different PPLs. By unifying the common pre- and post-inference toolset, the workflow becomes simpler for PPL designers, maintainers of Bayesian packages, and Bayesian practitioners alike. We hope this unification increases reproducibility and knowledge sharing across the Bayesian community.


Presenter biography:
Ravin Kumar

ArviZ is developed and maintained by many folks internationally. All biographies for recorded talks are in this Google Doc.