In computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure: new states are proposed using (overdamped) Langevin dynamics, which use evaluations of the gradient of the target probability density function, and these proposals are then accepted or rejected using the Metropolis–Hastings algorithm, which uses evaluations of the target density itself.
Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994[1] (although the closely related Smart Monte Carlo method was already introduced in 1978[2]), and its properties were examined in detail by Gareth Roberts together with Richard Tweedie[3] and Jeff Rosenthal.[4] Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011).[5] The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.[5]
Let π denote a probability density function on ℝ^d, one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion

Ẋ = ∇ log π(X) + √2 Ẇ
driven by the time derivative of a standard Brownian motion W. (Note that another commonly used normalization for this diffusion is

Ẋ = (1/2) ∇ log π(X) + Ẇ,

which generates the same dynamics up to a rescaling of time.) In the limit as t → ∞, the probability distribution ρ(t) of X(t) approaches a stationary distribution, which is also invariant under the diffusion; we denote it ρ_∞. It turns out that, in fact, ρ_∞ = π.
Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step τ > 0. We set X_0 := x_0 and then recursively define an approximation X_k to the true solution X(kτ) by

X_{k+1} := X_k + τ ∇ log π(X_k) + √(2τ) ξ_k,
where each ξ_k is an independent draw from a multivariate normal distribution on ℝ^d with mean 0 and covariance matrix equal to the d × d identity matrix. Note that X_{k+1} is normally distributed with mean X_k + τ ∇ log π(X_k) and covariance equal to 2τ times the d × d identity matrix.
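The unadjusted Euler–Maruyama recursion above can be sketched in a few lines of numpy; the function name `ula_step` and the standard-Gaussian example target are illustrative, not part of the source.

```python
import numpy as np

def ula_step(x, grad_log_pi, tau, rng):
    """One Euler-Maruyama step of the Langevin diffusion (no accept/reject):
    X_{k+1} = X_k + tau * grad log pi(X_k) + sqrt(2 tau) * xi_k."""
    xi = rng.standard_normal(x.shape)  # xi_k ~ N(0, I_d)
    return x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi

# Example: standard 2-D Gaussian target, log pi(x) = -||x||^2 / 2 + const,
# so grad log pi(x) = -x.
rng = np.random.default_rng(0)
x = np.zeros(2)
for _ in range(5000):
    x = ula_step(x, lambda x: -x, tau=0.1, rng=rng)
```

Note that without the Metropolis correction described next, the chain samples from a biased approximation of π whose error depends on τ.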
In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates X_k according to the update rule

X_{k+1} := X_k + τ ∇ log π(X_k) + √(2τ) ξ_k,
MALA incorporates an additional step. We consider the above update rule as defining a proposal X̃_{k+1} for a new state,

X̃_{k+1} := X_k + τ ∇ log π(X_k) + √(2τ) ξ_k.
This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set

α := min(1, π(X̃_{k+1}) q(X_k ∣ X̃_{k+1}) / (π(X_k) q(X̃_{k+1} ∣ X_k))),
where

q(x′ ∣ x) ∝ exp(−‖x′ − x − τ ∇ log π(x)‖² / (4τ))
is the transition probability density from x to x′ (note that, in general, q(x′ ∣ x) ≠ q(x ∣ x′)). Let u be drawn from the continuous uniform distribution on the interval [0, 1]. If u ≤ α, then the proposal is accepted, and we set X_{k+1} := X̃_{k+1}; otherwise, the proposal is rejected, and we set X_{k+1} := X_k.
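A complete MALA transition, combining the Langevin proposal with the Metropolis–Hastings accept/reject step, might look as follows; working with log densities and the log acceptance ratio (rather than π and α directly) is a standard numerical-stability choice, and the example target is again an illustrative standard Gaussian.

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, tau, rng):
    """One MALA transition: Langevin proposal plus Metropolis-Hastings correction."""
    def log_q(xp, xc):
        # log q(xp | xc), up to the additive constant that cancels in the ratio
        diff = xp - xc - tau * grad_log_pi(xc)
        return -np.dot(diff, diff) / (4.0 * tau)

    xi = rng.standard_normal(x.shape)
    x_prop = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi
    # log of the acceptance ratio pi(x~) q(x | x~) / (pi(x) q(x~ | x))
    log_alpha = log_pi(x_prop) + log_q(x, x_prop) - log_pi(x) - log_q(x_prop, x)
    if np.log(rng.uniform()) <= log_alpha:  # accept with probability min(1, ratio)
        return x_prop, True
    return x, False

# Example target: standard 2-D Gaussian.
rng = np.random.default_rng(1)
log_pi = lambda x: -0.5 * np.dot(x, x)
grad_log_pi = lambda x: -x
x = np.zeros(2)
accepts = 0
for _ in range(2000):
    x, ok = mala_step(x, log_pi, grad_log_pi, tau=0.5, rng=rng)
    accepts += ok
```

Because the asymmetric proposal density q appears in the ratio, both the forward and the reverse transition densities must be evaluated at each step.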
The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution ρ_∞ = π. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher π probability, which are then more likely to be accepted. On the other hand, when π is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take 0 < τ ≪ 1 in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix A ∈ ℝ^{d×d} can help to alleviate this problem, by generating proposals according to

X̃_{k+1} := X_k + τ A ∇ log π(X_k) + √(2τ) A^{1/2} ξ_k,
so that X̃_{k+1} has mean X_k + τ A ∇ log π(X_k) and covariance 2τA.
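A preconditioned proposal with the stated mean and covariance can be sketched as below; choosing A equal to (an estimate of) the target covariance is a common heuristic, and the anisotropic Gaussian example here is an assumption for illustration.

```python
import numpy as np

def preconditioned_proposal(x, grad_log_pi, tau, A_sqrt, rng):
    """Draw a proposal with mean x + tau * A grad log pi(x) and covariance
    2 tau A, where A = A_sqrt @ A_sqrt.T is positive definite."""
    A = A_sqrt @ A_sqrt.T
    xi = rng.standard_normal(x.shape)  # xi ~ N(0, I_d)
    return x + tau * A @ grad_log_pi(x) + np.sqrt(2.0 * tau) * (A_sqrt @ xi)

# Example: strongly anisotropic Gaussian target with covariance diag(100, 1);
# preconditioning with A = Sigma equalizes the scales of the two directions.
rng = np.random.default_rng(2)
Sigma = np.diag([100.0, 1.0])
grad_log_pi = lambda x: -np.linalg.solve(Sigma, x)  # grad log pi = -Sigma^{-1} x
A_sqrt = np.linalg.cholesky(Sigma)
x_prop = preconditioned_proposal(np.zeros(2), grad_log_pi, tau=0.1,
                                 A_sqrt=A_sqrt, rng=rng)
```

The Metropolis–Hastings correction must then use the correspondingly preconditioned transition density q in its acceptance ratio.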
For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be 0.574; if the observed acceptance rate is substantially different in practice, τ should be adjusted accordingly.[4]
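One common way to carry out this adjustment (a Robbins–Monro-style heuristic that is not prescribed by the source) is to nudge log τ after each step by a decaying amount proportional to the gap between the accept indicator and the target rate:

```python
import numpy as np

def adapt_step_size(tau, accepted, k, target=0.574, kappa=0.6):
    """Robbins-Monro update: increase tau after accepts, decrease after
    rejects, so the empirical acceptance rate drifts toward `target`.
    The gain k**(-kappa) decays so the adaptation eventually freezes."""
    log_tau = np.log(tau) + k ** (-kappa) * (float(accepted) - target)
    return np.exp(log_tau)

# Toy driver: a placeholder accept indicator firing ~30% of the time,
# i.e. below target, so tau should shrink.
tau = 1.0
rng = np.random.default_rng(3)
for k in range(1, 1001):
    accepted = rng.uniform() < 0.3  # stand-in for a real MALA accept/reject
    tau = adapt_step_size(tau, accepted, k)
```

In a real sampler the `accepted` flag would come from the MALA accept/reject step itself, and adaptation is typically restricted to a burn-in phase so the final chain uses a fixed τ.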