Consensus-based optimization (CBO)[1] is a multi-agent derivative-free optimization method, designed to obtain solutions for global optimization problems of the form
$$\min_{x \in \mathcal{X}} f(x),$$
where $f \colon \mathcal{X} \to \mathbb{R}$ denotes the objective function acting on the state space $\mathcal{X}$, which is assumed to be a normed vector space. The function $f$ can potentially be nonconvex and nonsmooth. The algorithm employs particles or agents to explore the state space; these particles communicate with each other to update their positions. Their dynamics follow the paradigm of metaheuristics, which blend exploration with exploitation. In this sense, CBO is comparable to ant colony optimization, wind driven optimization,[2] particle swarm optimization, or simulated annealing.
Consider an ensemble of points $x_t = (x_t^1, \dots, x_t^N) \in \mathcal{X}^N$, depending on the time $t \in [0, \infty)$. Then the update for the $i$th particle is formulated as a stochastic differential equation,
$$dx_t^i = -\lambda \underbrace{(x_t^i - c_\alpha(x_t))\, dt}_{\text{consensus drift}} + \sigma \underbrace{D(x_t^i - c_\alpha(x_t))\, dB_t^i}_{\text{scaled diffusion}},$$
with the following components:
- $\lambda > 0$ is the drift rate, controlling how strongly each particle is pulled towards the consensus point.
- $c_\alpha(x_t)$ is the consensus point, a weighted average of the particle positions,
  $$c_\alpha(x_t) = \frac{1}{\sum_{i=1}^N \omega_\alpha(x_t^i)} \sum_{i=1}^N x_t^i\, \omega_\alpha(x_t^i), \quad \text{with} \quad \omega_\alpha(\,\cdot\,) = \exp(-\alpha f(\,\cdot\,)),$$
  where the parameter $\alpha > 0$ controls how strongly the weighting concentrates on particles with small objective value.
- $\sigma > 0$ is the noise strength.
- $D(\,\cdot\,)$ scales the diffusion; common choices are the norm $\|\cdot\|$ (isotropic diffusion) or the componentwise absolute value (anisotropic diffusion).
- $B_t^i$ are independent standard Brownian motions.
In practice, the SDE is discretized via the Euler–Maruyama method, yielding the following explicit update formula for the ensemble $x = (x^1, \dots, x^N)$:
$$x^i \gets x^i - \lambda\, (x^i - c_\alpha(x))\, dt + \sigma D(x^i - c_\alpha(x))\, B^i,$$
where the $B^i$ are independent Gaussian increments distributed as $\mathcal{N}(0, dt\, \mathrm{Id})$. Employing an efficient implementation of the LogSumExp function can be beneficial for the numerical stability of the consensus point computation. We refer to existing implementations in Python [1] and Julia [2].
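As an illustration, the following is a minimal Python sketch of one such discretized CBO step with a LogSumExp-stabilized consensus point computation. The anisotropic choice of $D$ as the componentwise absolute value, the parameter values, and the function names are illustrative assumptions, not a particular reference implementation.

```python
import numpy as np
from scipy.special import logsumexp

def consensus_point(x, f_vals, alpha):
    # x: (N, d) particle positions, f_vals: (N,) objective values.
    # Weights w_i = exp(-alpha * f(x^i)) are normalized in log-space
    # via LogSumExp to avoid overflow/underflow for large alpha.
    log_w = -alpha * f_vals
    log_w -= logsumexp(log_w)           # log of normalized weights
    return np.exp(log_w) @ x            # weighted average of positions

def cbo_step(x, f, alpha=30.0, lam=1.0, sigma=0.8, dt=0.01, rng=None):
    # One Euler-Maruyama step of the CBO dynamics; D(.) is taken as the
    # componentwise absolute value (anisotropic diffusion, an assumption).
    rng = np.random.default_rng() if rng is None else rng
    c = consensus_point(x, np.apply_along_axis(f, 1, x), alpha)
    drift = x - c                                   # (N, d)
    noise = rng.normal(size=x.shape) * np.sqrt(dt)  # Brownian increments B^i
    return x - lam * drift * dt + sigma * np.abs(drift) * noise

# Usage example: minimize a shifted quadratic with N = 50 particles in d = 2.
if __name__ == "__main__":
    f = lambda z: np.sum((z - 1.0) ** 2)
    x = np.random.default_rng(0).normal(size=(50, 2)) * 3.0
    for _ in range(500):
        x = cbo_step(x, f)
    print("approximate minimizer:", x.mean(axis=0))
```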
Consensus-based optimization can be transformed into a sampling method[4] by modifying the noise term and choosing appropriate hyperparameters. Namely, one considers the following SDE
$$dx_t^i = -(x_t^i - c_\alpha(x_t))\, dt + \sqrt{2\tilde{\lambda}^{-1}\, C_\alpha(x_t)}\, dB_t^i,$$
where the weighted covariance matrix is defined as
$$C_\alpha(x_t) := \frac{1}{\sum_{i=1}^N \omega_\alpha(x_t^i)} \sum_{i=1}^N (x_t^i - c_\alpha(x_t)) \otimes (x_t^i - c_\alpha(x_t))\, \omega_\alpha(x_t^i).$$
If the parameters are chosen such that $\tilde{\lambda}^{-1} = (1 + \alpha)$, the above scheme creates approximate samples of a probability distribution with a density that is proportional to $\exp(-\alpha f)$.
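A minimal Python sketch of this sampling variant, under the parameter choice $\tilde{\lambda}^{-1} = 1 + \alpha$, could look as follows. The Gaussian noise with covariance $2\tilde{\lambda}^{-1} C_\alpha(x_t)\, dt$ is drawn via a Cholesky factorization; the jitter term, parameter values, and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def weighted_mean_cov(x, f_vals, alpha):
    # Consensus point c_alpha and weighted covariance C_alpha of the ensemble.
    log_w = -alpha * f_vals
    w = np.exp(log_w - logsumexp(log_w))     # normalized weights
    c = w @ x                                # weighted mean (consensus point)
    diff = x - c
    C = (w[:, None] * diff).T @ diff         # sum_i w_i (x^i - c) (x^i - c)^T
    return c, C

def cbs_step(x, f, alpha=1.0, dt=0.05, rng=None):
    # One Euler-Maruyama step of the sampling dynamics with
    # lambda_tilde^{-1} = 1 + alpha, the choice targeting exp(-alpha f).
    rng = np.random.default_rng() if rng is None else rng
    c, C = weighted_mean_cov(x, np.apply_along_axis(f, 1, x), alpha)
    # Cholesky factor of the noise covariance 2 (1 + alpha) C dt
    # (a small jitter is added for numerical robustness -- an assumption).
    L = np.linalg.cholesky(2.0 * (1.0 + alpha) * dt * C
                           + 1e-10 * np.eye(x.shape[1]))
    noise = rng.normal(size=x.shape) @ L.T
    return x - (x - c) * dt + noise
```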
If the function $f$ is multi-modal, i.e., has more than one global minimum, the standard CBO algorithm can only find one of these points. However, one can "polarize"[5] the consensus computation by introducing a kernel $k \colon \mathcal{X} \times \mathcal{X} \to [0, \infty)$ that includes local information into the weighting. In this case, every particle has its own version of the consensus point, which is computed as
$$c_\alpha^j(x) = \frac{1}{\sum_{i=1}^N \omega_\alpha^j(x^i)} \sum_{i=1}^N x^i\, \omega_\alpha^j(x^i), \quad \text{with} \quad \omega_\alpha^j(\,\cdot\,) = \exp(-\alpha f(\,\cdot\,))\, k(\,\cdot\,, x^j).$$
In this case, the drift is a vector field over the state space $\mathcal{X}$. Intuitively, particles are now not only attracted to other particles based on their objective value, but also based on their spatial locality. For a constant kernel function, the polarized version corresponds to standard CBO and is therefore a generalization; a sketch of the polarized consensus computation is given below.
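As a concrete illustration, here is a minimal Python sketch of the polarized consensus points under an assumed Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\kappa^2))$; the kernel choice, the bandwidth $\kappa$, and the function name are assumptions for illustration only.

```python
import numpy as np
from scipy.special import logsumexp

def polarized_consensus_points(x, f_vals, alpha, kappa=1.0):
    # x: (N, d) positions, f_vals: (N,) objective values.
    # Returns an (N, d) array whose j-th row is c_alpha^j(x), the consensus
    # point seen by particle j, weighted by objective value AND by the
    # (assumed) Gaussian kernel k(x^i, x^j) = exp(-||x^i - x^j||^2 / (2 kappa^2)).
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)  # (N, N)
    log_w = -alpha * f_vals[None, :] - sq_dists / (2.0 * kappa ** 2)  # log w_alpha^j(x^i)
    log_w -= logsumexp(log_w, axis=1, keepdims=True)                  # normalize per particle j
    return np.exp(log_w) @ x                                          # (N, d)
```

Each particle then performs the same update as in standard CBO, with $c_\alpha(x)$ replaced by its own consensus point $c_\alpha^j(x)$. We briefly give some examples of common configurations: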