Strong and weak sampling

Strong and weak sampling are two sampling approaches^[1] in statistics that are widely used in computational cognitive science and language learning.^[2] In strong sampling, it is assumed that the data are intentionally generated as positive examples of a concept,^[3] while in weak sampling, it is assumed that the data are generated without any restrictions.^[4]

Formal definition

In strong sampling, we assume that each observation $x$ is sampled uniformly at random from the true hypothesis $h$, where $|h|$ is the number of items in $h$:

$P(x|h)={\begin{cases}{\frac {1}{|h|}}&{\text{, if }}x\in h\\0&{\text{, otherwise}}\end{cases}}$

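In weak sampling, we assume that observations are sampled at random, independently of the hypothesis, and then classified, so the likelihood only records whether $x$ is consistent with $h$: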

$P(x|h)={\begin{cases}1&{\text{, if }}x\in h\\0&{\text{, otherwise}}\end{cases}}$
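The two likelihoods can be compared directly in a short sketch. It assumes that each hypothesis is represented as a finite set of items; the function names and the example hypotheses (`evens`, `powers_of_two`) are illustrative and do not come from the cited literature.

```python
from fractions import Fraction


def strong_likelihood(x, h):
    """P(x | h) under strong sampling: uniform over the members of h."""
    return Fraction(1, len(h)) if x in h else Fraction(0)


def weak_likelihood(x, h):
    """P(x | h) under weak sampling: 1 if x is consistent with h, else 0."""
    return Fraction(1) if x in h else Fraction(0)


# Hypothetical hypotheses over the integers 1..10 (assumed for illustration).
evens = frozenset({2, 4, 6, 8, 10})
powers_of_two = frozenset({2, 4, 8})

for name, h in [("evens", evens), ("powers_of_two", powers_of_two)]:
    print(name, strong_likelihood(4, h), weak_likelihood(4, h))
```

Under strong sampling the smaller hypothesis assigns the observation a higher likelihood (1/3 versus 1/5 here), whereas under weak sampling both consistent hypotheses receive the same likelihood of 1.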

Consequence: posterior computation under weak sampling

$P(h|x)={\frac {P(x|h)P(h)}{\sum \limits _{h'}P(x|h')P(h')}}={\begin{cases}{\frac {P(h)}{\sum \limits _{h':x\in h'}P(h')}}&{\text{, if }}x\in h\\0&{\text{, otherwise}}\end{cases}}$
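The formula says that, under weak sampling, the posterior is simply the prior renormalized over the hypotheses consistent with $x$. A minimal sketch of that computation follows, assuming a finite hypothesis space in which each hypothesis is a set of items; the function name `weak_posterior`, the hypothesis names, and the prior values are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def weak_posterior(x, hypotheses, prior):
    """Posterior P(h | x) under weak sampling for a single observation x.

    hypotheses: dict mapping a name to a set of items (assumed finite).
    prior:      dict mapping the same names to prior probabilities P(h).
    """
    # Hypotheses with x in h have likelihood 1; all others have likelihood 0.
    consistent = [name for name, h in hypotheses.items() if x in h]
    total = sum(prior[name] for name in consistent)
    if total == 0:
        raise ValueError("no hypothesis with nonzero prior contains x")
    return {
        name: (prior[name] / total if name in consistent else Fraction(0))
        for name in hypotheses
    }


# Hypothetical hypothesis space and prior (assumed for illustration).
hypotheses = {
    "evens": {2, 4, 6, 8, 10},
    "powers_of_two": {2, 4, 8},
    "odds": {1, 3, 5, 7, 9},
}
prior = {
    "evens": Fraction(1, 2),
    "powers_of_two": Fraction(1, 4),
    "odds": Fraction(1, 4),
}

posterior = weak_posterior(4, hypotheses, prior)
print(posterior)
```

For the observation 4, the inconsistent hypothesis `odds` drops to zero and the remaining prior mass of 3/4 is rescaled, giving 2/3 for `evens` and 1/3 for `powers_of_two`. The sizes of the hypotheses play no role in this result.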