Integrated nested Laplace approximation (INLA) is a method for approximate Bayesian inference based on Laplace's method.[1] It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo (MCMC) methods for computing posterior marginal distributions.[2][3][4] Because it remains relatively fast even for large data sets in certain classes of problems and models, INLA has become a popular inference method in applied statistics, in particular in spatial statistics, ecology, and epidemiology.[5][6][7] INLA can also be combined with a finite element method solution of a stochastic partial differential equation to study, for example, spatial point processes and species distribution models.[8][9] The INLA method is implemented in the R-INLA package for R.[10]
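As a brief illustration of the package interface, the following sketch fits a Poisson regression with an independent Gaussian random effect. The data frame and variable names (y, z, idx) are hypothetical, and exact arguments may vary across R-INLA versions.

```r
# Hypothetical example: Poisson counts y with one covariate z and an
# i.i.d. Gaussian random effect indexed by idx (all names are made up).
library(INLA)

df <- data.frame(
  y   = rpois(100, lambda = 5),   # simulated response
  z   = rnorm(100),               # covariate
  idx = 1:100                     # index for the random effect
)

# The linear predictor is additive: intercept + fixed effect + random effect.
formula <- y ~ z + f(idx, model = "iid")

result <- inla(formula, family = "poisson", data = df)

# Posterior marginal summaries of the latent effects and hyperparameters.
summary(result)
```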
Let $\boldsymbol{y} = (y_1, \dots, y_n)$ denote the response variable (that is, the observations), which belongs to an exponential family, with the mean $\mu_i$ (of $y_i$) linked to a linear predictor $\eta_i$ via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector $\boldsymbol{x}$. The hyperparameters of the model are denoted by $\boldsymbol{\theta}$. As per Bayesian statistics, $\boldsymbol{x}$ and $\boldsymbol{\theta}$ are random variables with prior distributions.
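For instance (an illustrative example, with hypothetical covariate $z_i$ and random effect $u_i$), count data might be modeled as
$$y_i \mid \eta_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \eta_i = \beta_0 + \beta_1 z_i + u_i, \qquad u_i \sim N(0, \tau^{-1}),$$
so that the latent field is $\boldsymbol{x} = (\boldsymbol{\eta}, \beta_0, \beta_1, u_1, \dots, u_n)$ and the hyperparameter vector is $\boldsymbol{\theta} = (\tau)$, assuming Gaussian priors on $\beta_0$ and $\beta_1$.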
The observations are assumed to be conditionally independent given $\boldsymbol{x}$ and $\boldsymbol{\theta}$:
$$\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}} \pi(y_i \mid \eta_i, \boldsymbol{\theta}),$$
where $\mathcal{I}$ is the set of indices of the observed elements of $\boldsymbol{y}$ (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor $\boldsymbol{\eta}$ is part of $\boldsymbol{x}$.
For the model to be a latent Gaussian model, it is assumed that $\boldsymbol{x} \mid \boldsymbol{\theta}$ is a Gaussian Markov random field (GMRF)[1] (that is, a multivariate Gaussian with additional conditional independence properties) with probability density
$$\pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \propto \left|\boldsymbol{Q}_{\boldsymbol{\theta}}\right|^{1/2} \exp\left(-\tfrac{1}{2}\, \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}}\, \boldsymbol{x}\right),$$
where $\boldsymbol{Q}_{\boldsymbol{\theta}}$ is a $\boldsymbol{\theta}$-dependent precision matrix and $\left|\boldsymbol{Q}_{\boldsymbol{\theta}}\right|$ is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution $\pi(\boldsymbol{\theta})$ for the hyperparameters need not be Gaussian. However, the number of hyperparameters, $m = \dim(\boldsymbol{\theta})$, is assumed to be small (say, less than 15).
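A minimal sketch in R (an illustration, not R-INLA code) of such a GMRF: the tridiagonal precision matrix of a stationary AR(1) process, with the log-density evaluated through a Cholesky factorization. A dense matrix is used here for brevity; in practice, sparse-matrix storage (e.g., via the Matrix package) exploits the tridiagonal structure.

```r
# Illustrative sketch: the sparse, tridiagonal precision matrix of a
# stationary AR(1) process x_t = rho * x_{t-1} + eps_t with innovation
# precision tau, so theta = (rho, tau).
ar1_precision <- function(n, rho, tau) {
  Q <- diag(c(1, rep(1 + rho^2, n - 2), 1))  # diagonal entries
  Q[cbind(1:(n - 1), 2:n)] <- -rho           # superdiagonal
  Q[cbind(2:n, 1:(n - 1))] <- -rho           # subdiagonal
  tau * Q
}

gmrf_logdens <- function(x, Q) {
  R <- chol(Q)                      # Q = t(R) %*% R with R upper triangular
  logdet <- 2 * sum(log(diag(R)))   # log|Q| from the Cholesky factor
  quad <- sum((R %*% x)^2)          # x' Q x = ||R x||^2
  -0.5 * length(x) * log(2 * pi) + 0.5 * logdet - 0.5 * quad
}

Q <- ar1_precision(n = 100, rho = 0.8, tau = 2)
x <- rnorm(100)
gmrf_logdens(x, Q)   # log pi(x | theta)
```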
In Bayesian inference, one wants to solve for the posterior distribution of the latent variables $\boldsymbol{x}$ and $\boldsymbol{\theta}$. Applying Bayes' theorem,
$$\pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta})\, \pi(\boldsymbol{x} \mid \boldsymbol{\theta})\, \pi(\boldsymbol{\theta})}{\pi(\boldsymbol{y})},$$
the joint posterior distribution of $\boldsymbol{x}$ and $\boldsymbol{\theta}$ is given by
$$\begin{aligned} \pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) &\propto \pi(\boldsymbol{\theta})\, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}) \prod_i \pi(y_i \mid \eta_i, \boldsymbol{\theta}) \\ &\propto \pi(\boldsymbol{\theta}) \left|\boldsymbol{Q}_{\boldsymbol{\theta}}\right|^{1/2} \exp\left(-\tfrac{1}{2}\, \boldsymbol{x}^T \boldsymbol{Q}_{\boldsymbol{\theta}}\, \boldsymbol{x} + \sum_i \log \pi(y_i \mid \eta_i, \boldsymbol{\theta})\right). \end{aligned}$$
Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals
$$\begin{aligned} \pi(x_i \mid \boldsymbol{y}) &= \int \pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})\, \pi(\boldsymbol{\theta} \mid \boldsymbol{y})\, d\boldsymbol{\theta}, \\ \pi(\theta_j \mid \boldsymbol{y}) &= \int \pi(\boldsymbol{\theta} \mid \boldsymbol{y})\, d\boldsymbol{\theta}_{-j}, \end{aligned}$$
where $\boldsymbol{\theta}_{-j} = (\theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m)$.
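To make the unnormalized joint posterior concrete, the following sketch continues the illustrative AR(1)/Poisson setup from above (all names hypothetical) and evaluates $\log \pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y})$ up to an additive constant.

```r
# Sketch: unnormalized log joint posterior for Poisson observations
# y_i | eta_i ~ Poisson(exp(eta_i)) with the AR(1) GMRF prior from the
# previous sketch on x = eta and hyperparameters theta = (rho, tau).
log_joint_posterior <- function(x, y, rho, tau, log_prior_theta) {
  Q <- ar1_precision(length(x), rho, tau)
  R <- chol(Q)
  # (1/2) log|Q| - (1/2) x' Q x, the GMRF term up to a constant
  log_gmrf <- sum(log(diag(R))) - 0.5 * sum((R %*% x)^2)
  # sum_i log pi(y_i | eta_i, theta), the conditional likelihood term
  log_lik <- sum(dpois(y, lambda = exp(x), log = TRUE))
  log_gmrf + log_lik + log_prior_theta(rho, tau)
}
```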
A key idea of INLA is to construct nested approximations
$$\begin{aligned} \widetilde{\pi}(x_i \mid \boldsymbol{y}) &= \int \widetilde{\pi}(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})\, \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y})\, d\boldsymbol{\theta}, \\ \widetilde{\pi}(\theta_j \mid \boldsymbol{y}) &= \int \widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y})\, d\boldsymbol{\theta}_{-j}, \end{aligned}$$
where $\widetilde{\pi}(\cdot \mid \cdot)$ denotes an approximated posterior density. The approximation to the marginal density $\pi(x_i \mid \boldsymbol{y})$ is obtained in a nested fashion by first approximating $\pi(\boldsymbol{\theta} \mid \boldsymbol{y})$ and $\pi(x_i \mid \boldsymbol{\theta}, \boldsymbol{y})$, and then numerically integrating out $\boldsymbol{\theta}$ as
$$\widetilde{\pi}(x_i \mid \boldsymbol{y}) = \sum_k \widetilde{\pi}(x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y}) \times \widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y}) \times \Delta_k,$$
where the summation runs over a set of evaluation points $\boldsymbol{\theta}_k$, with integration weights given by $\Delta_k$. The approximation of $\pi(\theta_j \mid \boldsymbol{y})$ is computed by numerically integrating $\boldsymbol{\theta}_{-j}$ out of $\widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y})$.
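A sketch of this integration step for a one-dimensional $\boldsymbol{\theta}$, where `cond_density` and `log_post_theta` are hypothetical placeholders standing in for the two approximations described in the text:

```r
# Sketch: numerical integration over a grid of hyperparameter values.
# theta_grid holds the evaluation points theta_k, delta their weights.
marginal_x_i <- function(xi_values, theta_grid, delta,
                         cond_density,    # ~pi(x_i | theta_k, y) at xi_values
                         log_post_theta)  # unnormalized log ~pi(theta_k | y)
{
  lw <- sapply(theta_grid, log_post_theta)
  w  <- exp(lw - max(lw)) * delta   # numerically stabilized weights
  w  <- w / sum(w)                  # normalize so the mixture integrates to 1
  # Weighted mixture of the conditional marginals over the grid points.
  rowSums(sapply(seq_along(theta_grid),
                 function(k) w[k] * cond_density(xi_values, theta_grid[k])))
}
```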
To obtain the approximate distribution $\widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y})$, one can use the identity
$$\pi(\boldsymbol{\theta} \mid \boldsymbol{y}) = \frac{\pi(\boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{y})}{\pi(\boldsymbol{x} \mid \boldsymbol{\theta}, \boldsymbol{y})\, \pi(\boldsymbol{y})}$$
as the starting point. Then $\widetilde{\pi}(\boldsymbol{\theta} \mid \boldsymbol{y})$ is obtained at a specific value of the hyperparameters $\boldsymbol{\theta} = \boldsymbol{\theta}_k$ with Laplace's approximation[1]
$$\begin{aligned} \widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y}) &\propto \left.\frac{\pi(\boldsymbol{x}, \boldsymbol{\theta}_k, \boldsymbol{y})}{\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})}\right|_{\boldsymbol{x} = \boldsymbol{x}^*(\boldsymbol{\theta}_k)} \\ &\propto \left.\frac{\pi(\boldsymbol{y} \mid \boldsymbol{x}, \boldsymbol{\theta}_k)\, \pi(\boldsymbol{x} \mid \boldsymbol{\theta}_k)\, \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})}\right|_{\boldsymbol{x} = \boldsymbol{x}^*(\boldsymbol{\theta}_k)}, \end{aligned}$$
where $\widetilde{\pi}_G(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})$ is the Gaussian approximation to $\pi(\boldsymbol{x} \mid \boldsymbol{\theta}_k, \boldsymbol{y})$ and $\boldsymbol{x}^*(\boldsymbol{\theta}_k)$ is its mode at the given $\boldsymbol{\theta}_k$. The mode can be found numerically, for example with the Newton-Raphson method.
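A minimal sketch of this mode search, assuming the Poisson setting of the earlier sketches, in which the negative Hessian of the log full conditional is $\boldsymbol{Q}_{\boldsymbol{\theta}} + \mathrm{diag}(e^{\boldsymbol{x}})$ (names are illustrative):

```r
# Sketch: Newton-Raphson search for the mode x*(theta) of the full
# conditional pi(x | theta, y) when log pi(x | theta, y) equals
# -x'Qx/2 + sum_i (y_i * x_i - exp(x_i)) + const (Poisson, log link).
find_mode <- function(y, Q, x = rep(0, length(y)), tol = 1e-8, maxit = 50) {
  for (it in 1:maxit) {
    mu   <- exp(x)
    grad <- -Q %*% x + (y - mu)       # gradient of the log density
    H    <- Q + diag(as.vector(mu))   # negative Hessian (positive definite)
    step <- solve(H, grad)            # Newton direction
    x    <- x + as.vector(step)
    if (max(abs(step)) < tol) break
  }
  # At convergence, the Gaussian approximation ~pi_G(x | theta, y) has
  # mean x and precision Q + diag(exp(x)).
  list(mode = as.vector(x), precision = Q + diag(as.vector(exp(x))))
}
```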
The key to the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of $\boldsymbol{x}$ in the denominator, which is usually close to Gaussian due to the GMRF property of $\boldsymbol{x}$. Applying the approximation here improves the accuracy of the method, since the posterior $\pi(\boldsymbol{\theta} \mid \boldsymbol{y})$ itself need not be close to Gaussian, and so the Gaussian approximation is not applied directly to $\pi(\boldsymbol{\theta} \mid \boldsymbol{y})$. The second important property of a GMRF, the sparsity of the precision matrix $\boldsymbol{Q}_{\boldsymbol{\theta}_k}$, is required for efficient computation of $\widetilde{\pi}(\boldsymbol{\theta}_k \mid \boldsymbol{y})$ for each value $\boldsymbol{\theta}_k$.[1]
Obtaining the approximate distribution $\widetilde{\pi}(x_i \mid \boldsymbol{\theta}_k, \boldsymbol{y})$ is more involved, and the INLA method provides three options for this: the Gaussian approximation, the Laplace approximation, or the simplified Laplace approximation.[1] For the numerical integration used to obtain $\widetilde{\pi}(x_i \mid \boldsymbol{y})$, three options are likewise available: grid search, the central composite design (CCD), or empirical Bayes.[1]
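In the R-INLA package, these choices are exposed through the control.inla argument; a sketch, reusing the formula and data frame from the introductory example (option names follow the package documentation but may vary by version):

```r
# Sketch: selecting the approximation strategy for ~pi(x_i | theta, y)
# and the integration scheme over theta in R-INLA.
result <- inla(formula, family = "poisson", data = df,
               control.inla = list(
                 strategy     = "simplified.laplace",  # or "gaussian", "laplace"
                 int.strategy = "ccd"                  # or "grid", "eb"
               ))
```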