Explaining the brain's abilities through statistical principles
Bayesian approaches to brain function investigate the capacity of the nervous system to operate under uncertainty in a fashion that is close to the optimum prescribed by Bayesian statistics.[1][2] The term is used in the behavioural sciences and neuroscience, and studies associated with it often strive to explain the brain's cognitive abilities based on statistical principles. It is frequently assumed that the nervous system maintains internal probabilistic models that are updated by neural processing of sensory information using methods approximating those of Bayesian probability.[3][4]
Origins
This field of study has its historical roots in numerous disciplines including machine learning, experimental psychology and Bayesian statistics. As early as the 1860s, with the work of Hermann Helmholtz in experimental psychology, the brain's ability to extract perceptual information from sensory data was modeled in terms of probabilistic estimation.[5][6] The basic idea is that the nervous system needs to organize sensory data into an accurate internal model of the outside world.
Bayesian probability has been developed by many important contributors. Pierre-Simon Laplace, Thomas Bayes, Harold Jeffreys, Richard Cox and Edwin Jaynes developed mathematical techniques and procedures for treating probability as the degree of plausibility that can be assigned to a given supposition or hypothesis based on the available evidence.[7] In 1988, Edwin Jaynes presented a framework for using Bayesian probability to model mental processes.[8] It was thus realized early on that the Bayesian statistical framework has the potential to yield insights into the function of the nervous system.
This idea was taken up in branches of machine learning, namely research on unsupervised learning and in particular the analysis-by-synthesis approach.[9][10] In 1983, Geoffrey Hinton and colleagues proposed that the brain could be seen as a machine making decisions based on the uncertainties of the outside world.[11] During the 1990s, researchers including Peter Dayan, Geoffrey Hinton and Richard Zemel proposed that the brain represents knowledge of the world in terms of probabilities, and made specific proposals for tractable neural processes that could implement such a model, known as a Helmholtz machine.[12][13][14]
A wide range of studies interpret the results of psychophysical experiments in light of Bayesian perceptual models. Many aspects of human perceptual and motor behavior can be modeled with Bayesian statistics. This approach, with its emphasis on behavioral outcomes as the ultimate expressions of neural information processing, is also known for modeling sensory and motor decisions using Bayesian decision theory. Examples are the work of Landy,[15][16] Jacobs,[17][18] Jordan, Knill,[19][20] Kording and Wolpert,[21][22] and Goldreich.[23][24][25]
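As an illustration of the kind of model used in this literature, the following minimal Python sketch combines two independent, noisy Gaussian cues about the same quantity (for example a visual and a haptic estimate of size) under a flat prior. It is a textbook case of Bayesian cue combination and is not taken from any of the studies cited above; the function name and the numbers are hypothetical.

```python
import math


def combine_gaussian_cues(mu_a, sigma_a, mu_b, sigma_b):
    """Posterior mean and standard deviation for two independent
    Gaussian cues about the same quantity, assuming a flat prior."""
    # The reliability (precision) of a cue is its inverse variance.
    prec_a = 1.0 / sigma_a ** 2
    prec_b = 1.0 / sigma_b ** 2
    # The posterior mean is the reliability-weighted average of the cues.
    mu = (prec_a * mu_a + prec_b * mu_b) / (prec_a + prec_b)
    # Precisions add, so the combined estimate is less uncertain
    # than either cue on its own.
    sigma = math.sqrt(1.0 / (prec_a + prec_b))
    return mu, sigma


if __name__ == "__main__":
    # Hypothetical cues: a reliable one (s.d. 1.0) and a noisier one (s.d. 2.0).
    mu, sigma = combine_gaussian_cues(10.0, 1.0, 12.0, 2.0)
    print(f"combined estimate: {mu:.2f} +/- {sigma:.2f}")
```

The combined estimate lies closer to the more reliable cue, and its standard deviation is smaller than that of either cue alone. Behaviour that matches this pattern is often described as close to Bayes-optimal.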
Neural coding
Many theoretical studies ask how the nervous system could implement Bayesian algorithms. Examples are the work of Pouget, Zemel, Deneve, Latham, Hinton and Dayan. George and Hawkins published a paper that establishes a model of cortical information processing, called hierarchical temporal memory, that is based on a Bayesian network of Markov chains. They further map this mathematical model onto existing knowledge about the architecture of the cortex and show how neurons could recognize patterns by hierarchical Bayesian inference.[26]
Electrophysiology
A number of recent electrophysiological studies focus on the representation of probabilities in the nervous system. Examples are the work of Shadlen and Schultz.
Predictive coding
Predictive coding is a neurobiologically plausible scheme for inferring the causes of sensory input based on minimizing prediction error.[27] These schemes are related formally to Kalman filtering and other Bayesian update schemes.
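The formal link to Kalman filtering can be sketched with a scalar Kalman filter, in which the estimate is corrected at every step by a gain-weighted prediction error. The Python code below is a minimal illustration under assumed random-walk dynamics and Gaussian noise. It is not a model of any neural circuit, and the function names and parameter values are hypothetical.

```python
def kalman_step(mean, var, observation, process_var, obs_var):
    """One predict/update cycle of a scalar Kalman filter
    with random-walk dynamics."""
    # Predict: the state is expected to stay put, but uncertainty grows.
    pred_mean = mean
    pred_var = var + process_var
    # Prediction error: the part of the input the model did not predict.
    error = observation - pred_mean
    # The gain weighs the error by the relative reliability of the input.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * error
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var, error


def track(observations, mean=0.0, var=1.0, process_var=0.01, obs_var=0.5):
    """Run the filter over a sequence and record the prediction errors."""
    errors = []
    for y in observations:
        mean, var, error = kalman_step(mean, var, y, process_var, obs_var)
        errors.append(error)
    return mean, var, errors


if __name__ == "__main__":
    # A constant input: the prediction error shrinks as the estimate converges.
    mean, var, errors = track([1.0] * 50)
    print(f"final estimate {mean:.3f}, first error {errors[0]:.3f}, "
          f"last error {errors[-1]:.5f}")
```

With a constant input, each prediction error is smaller than the one before it, which is the sense in which such a scheme suppresses prediction error as its internal estimate improves.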
During the 1990s, some researchers, such as Geoffrey Hinton and Karl Friston, began examining the concept of free energy as a computationally tractable measure of the discrepancy between actual features of the world and representations of those features captured by neural network models.[28] Karl Friston has more recently attempted a synthesis[29] in which the Bayesian brain emerges from a general principle of free energy minimisation.[30] In this framework, both action and perception are seen as consequences of suppressing free energy, leading to perceptual[31] and active inference[32] and a more embodied (enactive) view of the Bayesian brain. Using variational Bayesian methods, it can be shown how internal models of the world are updated by sensory information to minimize free energy, or the discrepancy between sensory input and predictions of that input. This can be cast (in neurobiologically plausible terms) as predictive coding or, more generally, Bayesian filtering.
"The free-energy considered here represents a bound on the surprise inherent in any exchange with the environment, under expectations encoded by its state or configuration. A system can minimise free energy by changing its configuration to change the way it samples the environment, or to change its expectations. These changes correspond to action and perception, respectively, and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment implies that the system’s state and structure encode an implicit and probabilistic model of the environment."[33]
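The sense in which free energy bounds surprise can be written out using the standard variational identity. The notation here is an assumption made for illustration and may differ from that of the quoted source: s denotes sensory input, θ its hidden causes, q(θ) the approximate (recognition) density encoded by the system, and −ln p(s) the surprise.

```latex
\begin{aligned}
F(s, q) &= \mathbb{E}_{q(\theta)}\big[\ln q(\theta) - \ln p(s, \theta)\big] \\
        &= -\ln p(s) + D_{\mathrm{KL}}\big[\,q(\theta)\,\|\,p(\theta \mid s)\,\big] \\
        &\ge -\ln p(s)
\end{aligned}
```

Because the Kullback–Leibler divergence is non-negative, F is never smaller than the surprise. Changing q to reduce F brings q closer to the posterior p(θ | s), which corresponds to perception in the quoted account, while changing how s is sampled corresponds to action.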
This area of research was summarized in terms understandable by the layperson in a 2008 article in New Scientist that offered a unifying theory of brain function.[34] Friston makes the following claims about the explanatory power of the theory:
"This model of brain function can explain a wide range of anatomical and physiological aspects of brain systems; for example, the hierarchical deployment of cortical areas, recurrent architectures using forward and backward connections and functional asymmetries in these connections. In terms of synaptic physiology, it predicts associative plasticity and, for dynamic models, spike-timing-dependent plasticity. In terms of electrophysiology it accounts for classical and extra-classical receptive field effects and long-latency or endogenous components of evoked cortical responses. It predicts the attenuation of responses encoding prediction error with perceptual learning and explains many phenomena like repetition suppression, mismatch negativity and the P300 in electroencephalography. In psychophysical terms, it accounts for the behavioural correlates of these physiological phenomena, e.g., priming, and global precedence."[33]
"It is fairly easy to show that both perceptual inference and learning rest on a minimisation of free energy or suppression of prediction error."[33]
References
Kenji Doya, Shin Ishii, Alexandre Pouget, Rajesh P. N. Rao (eds.) (2007), Bayesian Brain: Probabilistic Approaches to Neural Coding, The MIT Press.
Knill, David; Pouget, Alexandre (2004), "The Bayesian brain: the role of uncertainty in neural coding and computation", Trends in Neurosciences, Vol. 27, No. 12, December 2004.
Helmholtz, H. (1860/1962). Handbuch der physiologischen Optik (Southall, J. P. C. (Ed.), English trans.), Vol. 3. New York: Dover.
Westheimer, G. (2008). "Was Helmholtz a Bayesian?" Perception 37, 642–50.
Jaynes, E. T. (1986). "Bayesian Methods: General Background", in Maximum-Entropy and Bayesian Methods in Applied Statistics, J. H. Justice (ed.), Cambridge Univ. Press, Cambridge.
Ghahramani, Z. (2004). Unsupervised learning. In O. Bousquet, G. Raetsch, & U. von Luxburg (Eds.), Advanced lectures on machine learning. Berlin: Springer-Verlag.
Neisser, U. (1967). Cognitive Psychology. Appleton-Century-Crofts, New York.
Fahlman, S. E., Hinton, G. E. and Sejnowski, T. J. (1983). Massively parallel architectures for A.I.: Netl, Thistle, and Boltzmann machines. Proceedings of the National Conference on Artificial Intelligence, Washington DC.
Dayan, P., Hinton, G. E., & Neal, R. M. (1995). The Helmholtz machine. Neural Computation, 7, 889–904.
Dayan, P. and Hinton, G. E. (1996). Varieties of Helmholtz machines. Neural Networks, 9, 1385–1403.
Hinton, G. E., Dayan, P., To, A. and Neal, R. M. (1995). The Helmholtz machine through time. Fogelman-Soulie and R. Gallinari (editors), ICANN-95, 483–490.
Goldreich, D; Peterson, MA (2012). "A Bayesian observer replicates convexity context effects in figure-ground perception". Seeing and Perceiving. 25 (3–4): 365–95. doi:10.1163/187847612X634445. PMID 22564398. S2CID 4931501.
George, D; Hawkins, J (2009). "Towards a Mathematical Theory of Cortical Micro-circuits". PLoS Comput Biol 5(10): e1000532. doi:10.1371/journal.pcbi.1000532
Rao, RPN; Ballard, DH (1999). "Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects". Nature Neuroscience. 2: 79–87.