Kruschke's popular textbook, Doing Bayesian Data Analysis,[2] was notable for its accessibility and unique scaffolding of concepts. The first half of the book used the simplest type of data (i.e., dichotomous values) for presenting all the fundamental concepts of Bayesian analysis, including generalized Bayesian power analysis and sample-size planning. The second half of the book used the generalized linear model as a framework for explaining applications to a spectrum of other types of data.
Kruschke has written many tutorial articles about Bayesian data analysis, including an open-access article that explains Bayesian and frequentist concepts side-by-side.[6]
There is an accompanying online app that interactively does frequentist and Bayesian analyses simultaneously.
Kruschke gave a video-recorded plenary talk on this topic at the United States Conference on Teaching Statistics (USCOTS).
Bayesian analysis reporting guidelines
Bayesian data analyses are increasing in popularity but are still relatively novel in many fields, and guidelines for reporting Bayesian analyses are useful for researchers, reviewers, and students. Kruschke's open-access Bayesian analysis reporting guidelines (BARG)[7] provide a step-by-step list with explanation. For instance, the BARG recommend that if the analyst uses Bayesian hypothesis testing, then the report should include not only the Bayes factor but also the minimum prior model probability for the posterior model probability to exceed a decision criterion.
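The arithmetic behind that recommendation can be sketched as follows. For two competing models, the posterior odds equal the Bayes factor times the prior odds, so the smallest prior probability that lifts the posterior probability to a given criterion has a closed form. The function names and example values are illustrative assumptions and are not taken from the BARG.

```python
def min_prior_prob(bayes_factor: float, criterion: float) -> float:
    """Smallest prior probability of model 1 for which its posterior
    probability reaches `criterion`, given a Bayes factor for model 1
    relative to model 2.

    Uses posterior_odds = bayes_factor * prior_odds; the posterior
    probability reaches `criterion` when the posterior odds reach
    criterion / (1 - criterion).
    """
    if bayes_factor <= 0:
        raise ValueError("bayes_factor must be positive")
    if not 0 < criterion < 1:
        raise ValueError("criterion must be strictly between 0 and 1")
    required_prior_odds = (criterion / (1 - criterion)) / bayes_factor
    return required_prior_odds / (1 + required_prior_odds)


def posterior_prob(bayes_factor: float, prior_prob: float) -> float:
    """Posterior probability of model 1 given its prior probability."""
    posterior_odds = bayes_factor * prior_prob / (1 - prior_prob)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical example: a Bayes factor of 3 and a criterion of 0.95.
needed = min_prior_prob(bayes_factor=3.0, criterion=0.95)
print(round(needed, 3), round(posterior_prob(3.0, needed), 3))
```

In this hypothetical case the prior probability would have to be at least 19/22 (about 0.86), which illustrates why a Bayes factor alone can give an incomplete picture of the evidence.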
Assessing null values of parameters
Kruschke proposed a decision procedure for assessing null values of parameters, based on the uncertainty of the posterior estimate of the parameter.[8]
This approach contrasts with Bayesian hypothesis testing as model comparison.[9]
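The text above does not spell out the procedure. A formulation commonly associated with Kruschke's work compares the highest density interval (HDI) of the posterior distribution with a region of practical equivalence (ROPE) around the null value. The sketch below assumes that formulation; the function names, the 95% mass, and the decision labels are illustrative choices rather than a definitive implementation.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("samples must not be empty")
    k = min(n, max(1, math.ceil(mass * n)))  # points inside the interval
    # Slide a window of k consecutive sorted points and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def rope_decision(samples, rope, mass=0.95):
    """Compare the HDI with the ROPE (assumed decision rule)."""
    lo, hi = hdi(samples, mass)
    rope_lo, rope_hi = rope
    if hi < rope_lo or lo > rope_hi:
        return "reject null value"          # HDI entirely outside the ROPE
    if rope_lo <= lo and hi <= rope_hi:
        return "accept null value"          # HDI entirely inside the ROPE
    return "withhold decision"              # HDI and ROPE partly overlap
```

Because the decision depends on the width of the HDI, a wide (uncertain) posterior tends to yield a withheld decision, whereas a narrow posterior can support either rejecting or accepting the null value.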
Ordinal data
Liddell and Kruschke[10] showed that the common practice of treating ordinal data (such as subjective ratings) as if they were metric values can systematically lead to errors of interpretation, even inversions of means. The problems were addressed by treating ordinal data with ordinal models, in particular an ordered-probit model. Frequentist techniques can also use ordered-probit models, but the authors favored Bayesian techniques for their robustness.
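As a rough illustration of how an inversion of means can arise, the sketch below computes rating probabilities under an ordered-probit model, in which a normally distributed latent variable is cut into ordinal categories by fixed thresholds. The thresholds, latent means, and standard deviations are hypothetical values chosen for the example and do not come from the cited paper.

```python
from statistics import NormalDist


def ordered_probit_probs(mu, sigma, thresholds):
    """Probability of each ordinal category when a Normal(mu, sigma)
    latent variable is cut at the given increasing thresholds."""
    latent = NormalDist(mu, sigma)
    cdf = [0.0] + [latent.cdf(t) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


def metric_mean(probs):
    """Mean rating when categories 1..K are treated as metric values."""
    return sum(k * p for k, p in enumerate(probs, start=1))


# Hypothetical 5-point scale with thresholds between adjacent ratings.
THRESHOLDS = [1.5, 2.5, 3.5, 4.5]

# Group B has the higher latent mean but a much larger latent spread.
group_a = ordered_probit_probs(mu=5.0, sigma=0.5, thresholds=THRESHOLDS)
group_b = ordered_probit_probs(mu=5.5, sigma=3.0, thresholds=THRESHOLDS)

print("metric mean A:", round(metric_mean(group_a), 2))
print("metric mean B:", round(metric_mean(group_b), 2))
```

With these assumed values, group B has the higher latent mean but the lower average rating, because its wider latent distribution spills into the lower categories while group A is compressed against the top of the scale.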
Models of learning
Kruschke's models of attentional learning through 2010 are reviewed in reference [11]. That review summarizes numerous findings from human learning that suggest attentional learning, and it presents a series of Kruschke's models of learning under a general framework.
Dimensionality in backpropagation networks
Back-propagation networks are a type of connectionist model, at the core of deep-learning neural networks. Kruschke's early work with back-propagation networks created algorithms for expanding or contracting the dimensionality of hidden layers in the network, thereby affecting how the network generalized from training cases to testing cases.[12]
The algorithms also improved the speed of learning.[13]
Exemplar-based models and learned attention
The ALCOVE model of associative learning[1] used gradient descent on error, as in back-propagation networks, to learn what stimulus dimensions to attend to or to ignore. The ALCOVE model was derived from the generalized context model[14] of R. M. Nosofsky. These models mathematically represent stimuli in a multi-dimensional space based on human perceived dimensions (such as color, size, etc.), and assume that training examples are stored in memory as complete exemplars (that is, as combinations of values on the dimensions). The ALCOVE model is trained with input-output pairs and gradually associates exemplars with trained outputs while simultaneously shifting attention toward relevant dimensions and away from irrelevant dimensions.
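The mechanism described above can be sketched in a few lines. This is a simplified illustration rather than the published model specification: it uses a toy task with two binary dimensions in which only the first dimension predicts the category, plain +1/-1 targets, and hypothetical parameter names and values (specificity, lr_w, lr_a).

```python
import math

# Four stored exemplars; only dimension 0 predicts the category.
EXEMPLARS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {(0, 0): (1, -1), (0, 1): (1, -1), (1, 0): (-1, 1), (1, 1): (-1, 1)}


def hidden_activations(stimulus, attention, specificity=1.0):
    """Similarity of the stimulus to each exemplar, using an
    attention-weighted city-block distance."""
    return [
        math.exp(-specificity * sum(a * abs(h - x)
                                    for a, h, x in zip(attention, ex, stimulus)))
        for ex in EXEMPLARS
    ]


def predict(stimulus, attention, weights):
    """Category outputs as weighted sums of exemplar activations."""
    hid = hidden_activations(stimulus, attention)
    return [sum(w * a for w, a in zip(row, hid)) for row in weights]


def train_step(stimulus, target, attention, weights,
               lr_w=0.1, lr_a=0.05, specificity=1.0):
    """One gradient-descent step on squared error for both the
    attention strengths and the association weights."""
    hid = hidden_activations(stimulus, attention, specificity)
    out = [sum(w * a for w, a in zip(row, hid)) for row in weights]
    err = [t - o for t, o in zip(target, out)]
    new_attention = []
    for i, alpha in enumerate(attention):
        grad = 0.0  # derivative of the squared error w.r.t. attention i
        for j, ex in enumerate(EXEMPLARS):
            back = sum(e * row[j] for e, row in zip(err, weights))
            grad += back * hid[j] * specificity * abs(ex[i] - stimulus[i])
        new_attention.append(max(0.0, alpha - lr_a * grad))  # keep non-negative
    new_weights = [[w + lr_w * e * a for w, a in zip(row, hid)]
                   for row, e in zip(weights, err)]
    return new_attention, new_weights


def train(epochs=50):
    attention = [0.5, 0.5]
    weights = [[0.0] * len(EXEMPLARS) for _ in range(2)]
    for _ in range(epochs):
        for stimulus in EXEMPLARS:
            attention, weights = train_step(
                stimulus, TARGETS[stimulus], attention, weights)
    return attention, weights


attention, weights = train()
print("attention strengths:", [round(a, 2) for a in attention])
```

In this toy setting the attention strength on the relevant dimension should end up larger than the one on the irrelevant dimension, mirroring the shift of attention described in the text.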
An enhancement of the ALCOVE model, called RASHNL, provided a mathematically coherent mechanism for gradient descent with limited-capacity attention.[15]
The RASHNL model assumed that attention is shifted rapidly when a stimulus is presented, while learning of attention across trials is more gradual.
These models were fitted to empirical data from numerous human learning experiments, and provided good accounts of relative difficulties of learning different types of associations, and of accuracies of individual stimuli during training and generalization. Those models cannot explain all aspects of learning; for example, an additional mechanism was needed to account for the rapidity of human learning of reversal shift (i.e., what was "A" is now "B" and vice versa).[16]
The highlighting effect
When people learn to categorize combinations of discrete features successively across a training session, people will tend to learn about the distinctive features of the later-learned items instead of learning about their complete combination of features. This attention to distinctive features of later-learned items is called "the highlighting effect", and is derived from an earlier finding known as "the inverse base-rate effect".[17]
Kruschke conducted an extensive series of novel learning experiments with human participants, and developed two connectionist models to account for the findings. The ADIT model[18] learned to attend to distinctive features, and the EXIT model[19] used rapid shifts of attention on each trial.
A canonical highlighting experiment and a review of findings were presented in reference [20].
Hybrid representation models for rules or functions with exceptions
People can learn to classify stimuli according to rules such as "a container for liquids that is wider than it is tall is called a bowl", along with exceptions to the rule such as "unless it is this specific case that is called a mug". A series of experiments demonstrated that people tend to classify novel items that are relatively close to an exceptional case according to the rule more than would be predicted by exemplar-based models. To account for the data, Erickson and Kruschke developed hybrid models that shifted attention between rule-based representation and exemplar-based representation.[21][22][23]
People can also learn continuous relationships between variables, called functions, such as "a page's height is about 1.5 times its width". When people are trained with examples of functions that have exceptional cases, the data are accounted for by hybrid models that combine locally applicable functional rules.[24]
Bayesian models of learning
Kruschke also explored Bayesian models of human-learning results that were addressed by his connectionist models. The effects of sequential or successive learning (such as highlighting, mentioned above) can be especially challenging for Bayesian models, which typically assume order-independence. Instead of assuming that the entire learning system is globally Bayesian, Kruschke developed models in which layers of the system are locally Bayesian.[25]
This "locally Bayesian learning" accounted for combinations of phenomena that are difficult for non-Bayesian learning models or for globally Bayesian learning models.
Another advantage of Bayesian representations is that they inherently represent uncertainty of parameter values, unlike typical connectionist models that save only a single value for each parameter. The representation of uncertainty can be used to guide active learning in which the learner decides which cases would be most useful to learn about next.[26]
Kruschke attained a B.A. in mathematics, with High Distinction in General Scholarship, from the University of California at Berkeley in 1983. In 1990, he received a Ph.D. in psychology, also from U.C. Berkeley.
Kruschke attended the 1978 Summer Science Program at The Thacher School in Ojai, California, which focused on astrophysics and celestial mechanics. He attended the 1988 Connectionist Models Summer School[27] at Carnegie Mellon University.
Provost Professor, Indiana University, 2018.[3][4]
References
Kruschke, John K. (1992). "ALCOVE: An exemplar-based connectionist model of category learning". Psychological Review. 99 (1): 22–44. doi:10.1037/0033-295X.99.1.22. PMID 1546117.
Kruschke, John K. (2015). Doing Bayesian Data Analysis: A tutorial with R, JAGS, and Stan (2nd ed.). Academic Press. ISBN 9780124058880.
"Provost Professor Award". Office of the Vice Provost for Faculty & Academic Affairs. Retrieved 2022-05-27.
Nosofsky, R. M. (1986). "Attention, similarity, and the identification-categorization relationship". Journal of Experimental Psychology: General. 115 (1): 39–57. doi:10.1037/0096-3445.115.1.39. PMID 2937873.
Kruschke, John K.; Johansen, M. K. (1999). "A model of probabilistic category learning". Journal of Experimental Psychology: Learning, Memory, and Cognition. 25 (5): 1083–1119. doi:10.1037/0278-7393.25.5.1083. PMID 10505339.
Kruschke, John K. (1996). "Dimensional relevance shifts in category learning". Connection Science. 8 (2): 201–223. doi:10.1080/095400996116893.
Medin, D. L.; Edelson, S. M. (1988). "Problem structure and the use of base-rate information from experience". Journal of Experimental Psychology: General. 117 (1): 68–85. doi:10.1037/0096-3445.117.1.68. PMID 2966231.
Kruschke, John K. (1996). "Base rates in category learning". Journal of Experimental Psychology: Learning, Memory, and Cognition. 22 (1): 3–26. doi:10.1037/0278-7393.22.1.3. PMID 8648289.
Kruschke, John K. (2001). "The inverse base rate effect is not explained by eliminative inference". Journal of Experimental Psychology: Learning, Memory, and Cognition. 27 (6): 1385–1400. doi:10.1037/0278-7393.27.6.1385. PMID 11713874.
Erickson, M. A.; Kruschke, John K. (1998). "Rules and exemplars in category learning". Journal of Experimental Psychology: General. 127 (2): 107–140. doi:10.1037/0096-3445.127.2.107. PMID 9622910.
Kalish, M. L.; Lewandowsky, S. (2004). "Population of Linear Experts: Knowledge Partitioning and Function Learning". Psychological Review. 111 (4): 1072–1099. doi:10.1037/0033-295X.111.4.1072. PMID 15482074.
Kruschke, John K. (2006). "Locally Bayesian Learning with Applications to Retrospective Revaluation and Highlighting". Psychological Review. 113 (4): 677–699. doi:10.1037/0033-295X.113.4.677. PMID 17014300.