Stan (software)

Stan
Original author(s)	Stan Development Team
Initial release	August 30, 2012
Stable release	2.36.0 / 10 December 2024^[1]
Repository	github.com/stan-dev/stan
Written in	C++
Operating system	Unix-like, Microsoft Windows, Mac OS X
Platform	Intel x86 - 32-bit, x64
Type	Statistical package
License	New BSD License
Website	mc-stan.org

Stan is a probabilistic programming language for statistical inference, written in C++.^[2] In the Stan language, a (Bayesian) statistical model is specified as an imperative program that calculates the model's log probability density function.^[2]
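The following Python sketch illustrates what "an imperative program calculating the log probability density" means. It is not Stan code, and the function name `log_density`, the Bernoulli model and the Beta(2, 2) prior are hypothetical choices made for illustration. The function starts a running total at zero and adds one log-density term per statement, much as a Stan model block accumulates its target. In Stan itself, roughly the same model could be written as `theta ~ beta(2, 2); y ~ bernoulli(theta);`. The `~` form drops constant terms, which this sketch keeps.

```python
import math

def log_density(theta, y, a=2.0, b=2.0):
    """Hypothetical log posterior (up to the evidence) of a Bernoulli model
    with a Beta(a, b) prior, written imperatively: start at zero and add
    one log-density term per statement, like a Stan model block."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # theta lies outside its support (0, 1)
    target = 0.0
    # Prior term: log Beta(theta | a, b), including its normalizing constant
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    target += (a - 1) * math.log(theta) + (b - 1) * math.log1p(-theta) - log_beta_fn
    # Likelihood terms: one Bernoulli log probability per observation
    for yn in y:
        target += math.log(theta) if yn == 1 else math.log1p(-theta)
    return target

# Usage: Beta(2, 2) density at 0.5 is 1.5, times 0.5**3 for y = [1, 0, 1]
print(log_density(0.5, [1, 0, 1]))  # log(0.1875), about -1.674
```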

Stan is licensed under the New BSD License. Stan is named in honour of Stanislaw Ulam, pioneer of the Monte Carlo method.^[2]

Stan was created by a development team of 52 members,^[3] including Andrew Gelman, Bob Carpenter, Daniel Lee, Ben Goodrich, and others.

Example

A simple linear regression model can be described as $y_{n}=\alpha +\beta x_{n}+\epsilon _{n}$, where $\epsilon _{n}\sim {\text{normal}}(0,\sigma )$ and $\sigma$ is the standard deviation of the errors. Equivalently, $y_{n}\sim {\text{normal}}(\alpha +\beta x_{n},\sigma )$. The latter form can be written in Stan as follows:

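data {
  int<lower=0> N;       // number of observations
  vector[N] x;          // predictor
  vector[N] y;          // outcome
}
parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // residual standard deviation
}
model {
  // No priors are stated, so alpha, beta, and sigma receive improper
  // uniform priors over their supports; the likelihood is vectorized.
  y ~ normal(alpha + beta * x, sigma);
}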
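The quantity computed by this model block can be sketched in Python. The names `normal_lpdf` (a Python stand-in here), `regression_lp` and the `drop_constants` flag are hypothetical. Because no priors are stated, the log density reduces to the log likelihood. The sketch also shows one detail of the `~` statement: it omits terms that do not depend on the parameters, here $-\tfrac{1}{2}\log(2\pi)$ per observation. By contrast, `target += normal_lpdf(y | alpha + beta * x, sigma);` keeps them. The sketch works on the constrained scale and leaves out the Jacobian adjustment that Stan adds when it maps `sigma` to an unconstrained scale.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Full log density of normal(mu, sigma), with sigma the standard deviation."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def regression_lp(alpha, beta, sigma, x, y, drop_constants=True):
    """Hypothetical mirror of the model block above (flat priors implied).

    drop_constants=True mimics `y ~ normal(...)`, which omits terms that do
    not depend on parameters; False mimics `target += normal_lpdf(...)`.
    """
    if sigma <= 0:
        return -math.inf  # violates the <lower=0> constraint
    target = 0.0
    for xn, yn in zip(x, y):
        target += normal_lpdf(yn, alpha + beta * xn, sigma)
    if drop_constants:
        # Cancel the parameter-free -log(2*pi)/2 term of every observation
        target += 0.5 * math.log(2.0 * math.pi) * len(y)
    return target
```

Both variants differ only by a constant, so they define the same posterior and lead to the same draws.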

Interfaces

The Stan language itself can be accessed through several interfaces:

CmdStan – a command-line executable for the shell,
CmdStanR and rstan – R software libraries,
CmdStanPy and PyStan – libraries for the Python programming language,
CmdStan.rb – library for the Ruby programming language,
MatlabStan – integration with the MATLAB numerical computing environment,
Stan.jl – integration with the Julia programming language,
StataStan – integration with Stata,
Stan Playground – an online interface that runs in a web browser.

In addition, higher-level interfaces are provided by packages that use Stan as a backend, primarily in R:^[4]

rstanarm provides Bayesian drop-in replacements for frequentist models from base R and lme4, using the R formula syntax;
brms^[5] provides a wide array of linear and nonlinear models using the R formula syntax;
prophet provides automated procedures for time series forecasting.

Algorithms

Stan implements gradient-based Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference, stochastic gradient-based variational methods for approximate Bayesian inference, and gradient-based optimization for penalized maximum likelihood estimation.

MCMC algorithms:
- Hamiltonian Monte Carlo (HMC)
- No-U-Turn sampler (NUTS),^[2]^[6] a variant of HMC and Stan's default MCMC engine
Variational inference algorithms:
- Automatic Differentiation Variational Inference^[7]
- Pathfinder: Parallel quasi-Newton variational inference^[8]
Optimization algorithms:
- Limited-memory BFGS (L-BFGS) (Stan's default optimization algorithm)
- Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS)
- Laplace's approximation for classical standard error estimates and approximate Bayesian posteriors
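The core idea of HMC can be sketched in Python. This is textbook HMC with a fixed step size, a fixed number of leapfrog steps and an identity mass matrix. It is not Stan's implementation: Stan by default uses NUTS to choose path lengths adaptively, and it tunes the step size and mass matrix during warmup. The function names and default settings below are hypothetical. Each iteration draws a fresh Gaussian momentum and simulates Hamiltonian dynamics with the leapfrog integrator, using the gradient of the log density. A Metropolis test then corrects for the integrator's discretization error.

```python
import math
import random

def leapfrog(q, p, grad_lp, step_size, n_steps):
    """Simulate Hamiltonian dynamics for n_steps leapfrog steps."""
    q, p = list(q), list(p)
    g = grad_lp(q)
    for _ in range(n_steps):
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]  # half step in momentum
        q = [qi + step_size * pi for qi, pi in zip(q, p)]        # full step in position
        g = grad_lp(q)
        p = [pi + 0.5 * step_size * gi for pi, gi in zip(p, g)]  # half step in momentum
    return q, p

def hmc(lp, grad_lp, q0, n_draws, step_size=0.2, n_steps=10, seed=0):
    """Hypothetical basic HMC sampler with an identity mass matrix."""
    rng = random.Random(seed)
    q = list(q0)
    draws = []
    for _ in range(n_draws):
        p0 = [rng.gauss(0.0, 1.0) for _ in q]  # resample momentum
        q_new, p_new = leapfrog(q, p0, grad_lp, step_size, n_steps)
        # Hamiltonian H(q, p) = -log p(q) + |p|^2 / 2
        h0 = -lp(q) + 0.5 * sum(pi * pi for pi in p0)
        h1 = -lp(q_new) + 0.5 * sum(pi * pi for pi in p_new)
        # Metropolis acceptance; min() avoids overflow in exp
        if rng.random() < math.exp(min(0.0, h0 - h1)):
            q = q_new
        draws.append(list(q))
    return draws

# Usage: sample a two-dimensional standard normal, log p(q) = -|q|^2 / 2 + const
draws = hmc(lambda q: -0.5 * sum(v * v for v in q),
            lambda q: [-v for v in q],
            q0=[0.0, 0.0], n_draws=4000)
```

With a well-chosen step size, the leapfrog integrator nearly conserves H, so almost all proposals are accepted even though they land far from the starting point. Tuning the step size and choosing how many steps to take is the part NUTS and Stan's warmup adaptation automate.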

Automatic differentiation

Stan implements reverse-mode automatic differentiation to calculate the gradient of the model's log density, which is required by HMC, NUTS, L-BFGS, BFGS, and variational inference.^[2] Stan's automatic differentiation can also be used on its own, outside the probabilistic programming language.
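Reverse mode evaluates the function once while recording each operation and its local partial derivatives. It then sweeps backwards from the output, accumulating adjoints, so one backward pass yields the derivative with respect to every input. This suits log densities, which have a single output and many parameters. The toy Python sketch below records operations in a small graph. The `Var` class and the `log` and `grad` functions are hypothetical, unrelated to the API of Stan's C++ library, and support only addition, multiplication and the logarithm.

```python
import math

class Var:
    """Scalar node in a computation graph for reverse-mode autodiff."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local partial derivative)
        self.adj = 0.0          # adjoint: d(output) / d(this node)

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

def log(x):
    """Natural logarithm of a Var, with local derivative 1 / x."""
    return Var(math.log(x.value), ((x, 1.0 / x.value),))

def grad(output):
    """Backward pass: propagate adjoints from output to all inputs.

    Adjoints accumulate, so build fresh Vars for each gradient evaluation.
    """
    order, seen = [], set()

    def visit(node):  # topological order: parents before children
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.adj = 1.0
    for node in reversed(order):  # each node's adjoint is complete here
        for parent, local in node.parents:
            parent.adj += local * node.adj

# Usage: f(x, y) = x*y + log(x), so df/dx = y + 1/x and df/dy = x
x, y = Var(2.0), Var(3.0)
f = x * y + log(x)
grad(f)
print(x.adj, y.adj)  # 3.5 2.0
```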

Usage

Stan is used in fields including social science,^[9] pharmaceutical statistics,^[10] market research,^[11] and medical imaging.^[12]

References

^ "Release 2.36.0". 10 December 2024. Retrieved 30 December 2024.
^ Stan Development Team (2015). Stan Modeling Language User's Guide and Reference Manual, Version 2.9.0.
^ "Development Team". stan-dev.github.io. Retrieved 2024-11-21.
^ Gabry, Jonah. "The current state of the Stan ecosystem in R". Statistical Modeling, Causal Inference, and Social Science. Retrieved 25 August 2020.
^ "BRMS: Bayesian Regression Models using 'Stan'". 23 August 2021.
^ Hoffman, Matthew D.; Gelman, Andrew (April 2014). "The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo". Journal of Machine Learning Research. 15: 1593–1623.
^ Kucukelbir, Alp; Ranganath, Rajesh; Blei, David M. (June 2015). "Automatic Variational Inference in Stan". arXiv:1506.03431. Bibcode:2015arXiv150603431K.
^ Zhang, Lu; Carpenter, Bob; Gelman, Andrew; Vehtari, Aki (2022). "Pathfinder: Parallel quasi-Newton variational inference". Journal of Machine Learning Research. 23 (306): 1–49.
^ Goodrich, Benjamin King; Wawro, Gregory; Katznelson, Ira (2012). "Designing Quantitative Historical Social Inquiry: An Introduction to Stan". APSA 2012 Annual Meeting Paper. SSRN 2105531.
^ Natanegara, Fanni; Neuenschwander, Beat; Seaman, John W.; Kinnersley, Nelson; Heilmann, Cory R.; Ohlssen, David; Rochester, George (2013). "The current state of Bayesian methods in medical product development: survey results and recommendations from the DIA Bayesian Scientific Working Group". Pharmaceutical Statistics. 13 (1): 3–12. doi:10.1002/pst.1595. ISSN 1539-1612. PMID 24027093. S2CID 19738522.
^ Feit, Elea (15 May 2017). "Using Stan to Estimate Hierarchical Bayes Models". Retrieved 19 March 2019.
^ Gordon, GSD; Joseph, J; Alcolea, MP; Sawyer, T; Macfaden, AJ; Williams, C; Fitzpatrick, CRM; Jones, PH; di Pietro, M; Fitzgerald, RC; Wilkinson, TD; Bohndiek, SE (2019). "Quantitative phase and polarization imaging through an optical fiber applied to detection of early esophageal tumorigenesis". Journal of Biomedical Optics. 24 (12): 1–13. arXiv:1811.03977. Bibcode:2019JBO....24l6004G. doi:10.1117/1.JBO.24.12.126004. PMC 7006047. PMID 31840442.