Consensus-based optimization (CBO)[1] is a multi-agent derivative-free optimization method, designed to obtain solutions for global optimization problems of the form
$$\min_{x \in \mathcal{X}} f(x),$$
where $f \colon \mathcal{X} \to \mathbb{R}$ denotes the objective function acting on the state space $\mathcal{X}$, which is assumed to be a normed vector space. The function $f$ can potentially be nonconvex and nonsmooth. The algorithm employs particles or agents to explore the state space; these particles communicate with each other to update their positions. Their dynamics follow the paradigm of metaheuristics, which blend exploration with exploitation. In this sense, CBO is comparable to ant colony optimization, wind driven optimization,[2] particle swarm optimization, or simulated annealing.
Consider an ensemble of points $x_t = (x_t^1, \dots, x_t^N) \in \mathcal{X}^N$, depending on the time $t \in [0, \infty)$. Then the update for the $i$th particle is formulated as a stochastic differential equation,
$$dx_t^i = -\lambda \underbrace{(x_t^i - c_\alpha(x_t))\, dt}_{\text{consensus drift}} + \sigma \underbrace{D(x_t^i - c_\alpha(x_t))\, dB_t^i}_{\text{scaled diffusion}},$$
with the following components:
- $\lambda > 0$ is the drift rate, controlling how strongly each particle is pulled towards the consensus point.
- $c_\alpha(x_t)$ is the consensus point, a weighted average of the particle positions,
  $$c_\alpha(x_t) = \frac{1}{\sum_{i=1}^N \omega_\alpha(x_t^i)} \sum_{i=1}^N x_t^i\, \omega_\alpha(x_t^i), \quad \text{with} \quad \omega_\alpha(\,\cdot\,) = \exp(-\alpha f(\,\cdot\,)),$$
  where the parameter $\alpha > 0$ controls how strongly the weighting concentrates on particles with small objective value.
- $\sigma > 0$ is the noise strength.
- $D(\,\cdot\,)$ scales the diffusion; common choices are the norm $\|\cdot\|$ (isotropic diffusion) or the componentwise absolute value (anisotropic diffusion).
- $B_t^i$ are independent standard Brownian motions.
In practice, the SDE is discretized via the Euler–Maruyama method, yielding the following explicit update formula for the ensemble $x = (x^1, \dots, x^N)$:
$$x^i \gets x^i - \lambda\, (x^i - c_\alpha(x))\, dt + \sigma D(x^i - c_\alpha(x))\, B^i,$$
where the $B^i$ are independent Gaussian increments distributed as $\mathcal{N}(0, dt\, \mathrm{Id})$. Employing an efficient implementation of the LogSumExp function can be beneficial for the numerical stability of the consensus point computation. We refer to existing implementations in Python [1] and Julia [2].
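As an illustration, the following is a minimal Python sketch of one such discretized CBO step with a LogSumExp-stabilized consensus point computation. The anisotropic choice of $D$ as the componentwise absolute value, the parameter values, and the function names are illustrative assumptions, not a particular reference implementation.

```python
import numpy as np
from scipy.special import logsumexp

def consensus_point(x, f_vals, alpha):
    # x: (N, d) particle positions, f_vals: (N,) objective values.
    # Weights w_i = exp(-alpha * f(x^i)) are normalized in log-space
    # via LogSumExp to avoid overflow/underflow for large alpha.
    log_w = -alpha * f_vals
    log_w -= logsumexp(log_w)           # log of normalized weights
    return np.exp(log_w) @ x            # weighted average of positions

def cbo_step(x, f, alpha=30.0, lam=1.0, sigma=0.8, dt=0.01, rng=None):
    # One Euler-Maruyama step of the CBO dynamics; D(.) is taken as the
    # componentwise absolute value (anisotropic diffusion, an assumption).
    rng = np.random.default_rng() if rng is None else rng
    c = consensus_point(x, np.apply_along_axis(f, 1, x), alpha)
    drift = x - c                                   # (N, d)
    noise = rng.normal(size=x.shape) * np.sqrt(dt)  # Brownian increments B^i
    return x - lam * drift * dt + sigma * np.abs(drift) * noise

# Usage example: minimize a shifted quadratic with N = 50 particles in d = 2.
if __name__ == "__main__":
    f = lambda z: np.sum((z - 1.0) ** 2)
    x = np.random.default_rng(0).normal(size=(50, 2)) * 3.0
    for _ in range(500):
        x = cbo_step(x, f)
    print("approximate minimizer:", x.mean(axis=0))
```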
Consensus-based optimization can be transformed into a sampling method[4] by modifying the noise term and choosing appropriate hyperparameters. Namely, one considers the following SDE
$$dx_t^i = -(x_t^i - c_\alpha(x_t))\, dt + \sqrt{2\tilde{\lambda}^{-1}\, C_\alpha(x_t)}\, dB_t^i,$$
where the weighted covariance matrix is defined as
$$C_\alpha(x_t) := \frac{1}{\sum_{i=1}^N \omega_\alpha(x_t^i)} \sum_{i=1}^N (x_t^i - c_\alpha(x_t)) \otimes (x_t^i - c_\alpha(x_t))\, \omega_\alpha(x_t^i).$$
If the parameters are chosen such that $\tilde{\lambda}^{-1} = (1 + \alpha)$, the above scheme creates approximate samples of a probability distribution with a density that is proportional to $\exp(-\alpha f)$.
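A minimal Python sketch of this sampling variant, under the parameter choice $\tilde{\lambda}^{-1} = 1 + \alpha$, could look as follows. The Gaussian noise with covariance $2\tilde{\lambda}^{-1} C_\alpha(x_t)\, dt$ is drawn via a Cholesky factorization; the jitter term, parameter values, and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def weighted_mean_cov(x, f_vals, alpha):
    # Consensus point c_alpha and weighted covariance C_alpha of the ensemble.
    log_w = -alpha * f_vals
    w = np.exp(log_w - logsumexp(log_w))     # normalized weights
    c = w @ x                                # weighted mean (consensus point)
    diff = x - c
    C = (w[:, None] * diff).T @ diff         # sum_i w_i (x^i - c) (x^i - c)^T
    return c, C

def cbs_step(x, f, alpha=1.0, dt=0.05, rng=None):
    # One Euler-Maruyama step of the sampling dynamics with
    # lambda_tilde^{-1} = 1 + alpha, the choice targeting exp(-alpha f).
    rng = np.random.default_rng() if rng is None else rng
    c, C = weighted_mean_cov(x, np.apply_along_axis(f, 1, x), alpha)
    # Cholesky factor of the noise covariance 2 (1 + alpha) C dt
    # (a small jitter is added for numerical robustness -- an assumption).
    L = np.linalg.cholesky(2.0 * (1.0 + alpha) * dt * C
                           + 1e-10 * np.eye(x.shape[1]))
    noise = rng.normal(size=x.shape) @ L.T
    return x - (x - c) * dt + noise
```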
If the function $f$ is multi-modal, i.e., has more than one global minimum, the standard CBO algorithm can only find one of these points. However, one can "polarize"[5] the consensus computation by introducing a kernel $k \colon \mathcal{X} \times \mathcal{X} \to [0, \infty)$ that includes local information into the weighting. In this case, every particle has its own version of the consensus point, which is computed as
$$c_\alpha^j(x) = \frac{1}{\sum_{i=1}^N \omega_\alpha^j(x^i)} \sum_{i=1}^N x^i\, \omega_\alpha^j(x^i), \quad \text{with} \quad \omega_\alpha^j(\,\cdot\,) = \exp(-\alpha f(\,\cdot\,))\, k(\,\cdot\,, x^j).$$
In this case, the drift is a vector field over the state space $\mathcal{X}$. Intuitively, particles are now not only attracted to other particles based on their objective value, but also based on their spatial locality. For a constant kernel function, the polarized version corresponds to standard CBO and is therefore a generalization; a sketch of the polarized consensus computation is given below.
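As a concrete illustration, here is a minimal Python sketch of the polarized consensus points under an assumed Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\kappa^2))$; the kernel choice, the bandwidth $\kappa$, and the function name are assumptions for illustration only.

```python
import numpy as np
from scipy.special import logsumexp

def polarized_consensus_points(x, f_vals, alpha, kappa=1.0):
    # x: (N, d) positions, f_vals: (N,) objective values.
    # Returns an (N, d) array whose j-th row is c_alpha^j(x), the consensus
    # point seen by particle j, weighted by objective value AND by the
    # (assumed) Gaussian kernel k(x^i, x^j) = exp(-||x^i - x^j||^2 / (2 kappa^2)).
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # (N, N)
    log_w = -alpha * f_vals[None, :] - sq_dists / (2.0 * kappa ** 2)  # log w_alpha^j(x^i)
    log_w -= logsumexp(log_w, axis=1, keepdims=True)                  # normalize per particle j
    return np.exp(log_w) @ x                                          # (N, d)
```

Each particle then performs the same update as in standard CBO, with $c_\alpha(x)$ replaced by its own consensus point $c_\alpha^j(x)$. We briefly give some examples of common configurations: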