In computer science, specifically in the study of parsing algorithms, the inside–outside algorithm is a way of re-estimating production probabilities in a probabilistic context-free grammar. It was introduced by James K. Baker in 1979 as a generalization of the forward–backward algorithm for parameter estimation on hidden Markov models to stochastic context-free grammars. It is used to compute expectations, for example as part of the expectation–maximization algorithm (an unsupervised learning algorithm).
The inside probability $\beta_j(p,q)$ is the total probability of generating the words $w_p \cdots w_q$, given the root nonterminal $N^j$ and a grammar $G$:[1]

$\beta_j(p,q) = P(w_p \cdots w_q \mid N_{pq}^{j}, G)$

The outside probability $\alpha_j(p,q)$ is the total probability of beginning with the start symbol $N^1$ and generating the nonterminal $N_{pq}^{j}$ together with all the words outside $w_p \cdots w_q$, given a grammar $G$:[1]

$\alpha_j(p,q) = P(w_1 \cdots w_{p-1},\, N_{pq}^{j},\, w_{q+1} \cdots w_n \mid G)$
Base case:

$\beta_j(p,p) = P(w_p \mid N^j, G)$
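For instance, if the grammar contains a lexical rule $N^j \rightarrow w_p$ with probability 0.4 (a hypothetical value, purely for illustration), the base case initializes the corresponding chart cell directly from the rule probability:

$\beta_j(p,p) = P(N^j \rightarrow w_p) = 0.4$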
General case:
Suppose there is a rule $N^j \rightarrow N^r N^s$ in the grammar. Then the probability of generating $w_p \cdots w_q$ with a subtree rooted at $N^j$, in which $N^r$ spans $w_p \cdots w_k$ and $N^s$ spans $w_{k+1} \cdots w_q$, is:

$\sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\,\beta_r(p,k)\,\beta_s(k+1,q)$
The inside probability $\beta_j(p,q)$ is just the sum over all such possible rules and split points:

$\beta_j(p,q) = \sum_{N^r,N^s} \sum_{k=p}^{q-1} P(N^j \rightarrow N^r N^s)\,\beta_r(p,k)\,\beta_s(k+1,q)$
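Computing all inside probabilities amounts to a CYK-style dynamic program over increasing span lengths. The following is a minimal Python sketch; the toy grammar, rule probabilities, and example sentence are hypothetical choices for illustration, and the grammar is assumed to be in Chomsky normal form, as the binary recursion above presupposes.

    from collections import defaultdict

    # Hypothetical toy PCFG in Chomsky normal form.
    binary_rules = {              # P(N^j -> N^r N^s)
        ("S", "NP", "VP"): 1.0,
        ("VP", "V", "NP"): 1.0,
    }
    lexical_rules = {             # P(N^j -> w)
        ("NP", "astronomers"): 0.4,
        ("NP", "stars"): 0.6,
        ("V", "saw"): 1.0,
    }
    words = ["astronomers", "saw", "stars"]
    n = len(words)

    # inside[(j, p, q)] stores beta_j(p, q); word positions are 1-based.
    inside = defaultdict(float)

    # Base case: beta_j(p, p) = P(N^j -> w_p).
    for p, w in enumerate(words, start=1):
        for (j, word), prob in lexical_rules.items():
            if word == w:
                inside[(j, p, p)] += prob

    # General case: widen spans, trying every split point p <= k < q.
    for span in range(2, n + 1):
        for p in range(1, n - span + 2):
            q = p + span - 1
            for (j, r, s), prob in binary_rules.items():
                for k in range(p, q):
                    inside[(j, p, q)] += prob * inside[(r, p, k)] * inside[(s, k + 1, q)]

    print(inside[("S", 1, n)])    # beta_1(1, n): total probability of the sentence

The value $\beta_1(1,n)$, the inside probability of the start symbol over the whole sentence, is the total probability of the sentence under the grammar; for this toy grammar it prints 0.24 (up to floating-point rounding).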
Base case:

$\alpha_j(1,n) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$

Here the start symbol is $N^1$.
General case:

Suppose there is a rule $N^r \rightarrow N^j N^s$ in the grammar, in which $N^j$ occurs as the left child. Then the left contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$\sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\,\alpha_r(p,k)\,\beta_s(q+1,k)$
Now suppose there is a rule $N^r \rightarrow N^s N^j$ in the grammar, in which $N^j$ occurs as the right child. Then the right contribution of that rule to the outside probability $\alpha_j(p,q)$ is:

$\sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\,\alpha_r(k,q)\,\beta_s(k,p-1)$
The outside probability $\alpha_j(p,q)$ is the sum of the left and right contributions over all such rules:

$\alpha_j(p,q) = \sum_{N^r,N^s} \sum_{k=q+1}^{n} P(N^r \rightarrow N^j N^s)\,\alpha_r(p,k)\,\beta_s(q+1,k) + \sum_{N^r,N^s} \sum_{k=1}^{p-1} P(N^r \rightarrow N^s N^j)\,\alpha_r(k,q)\,\beta_s(k,p-1)$
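The outside chart is filled in the opposite direction, from the widest span down to single words, consuming the inside chart. The sketch below continues the same hypothetical grammar and sentence from the previous example, recomputing the inside chart so that it runs standalone.

    from collections import defaultdict

    # Same hypothetical toy PCFG and sentence as in the inside sketch.
    binary_rules = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
    lexical_rules = {("NP", "astronomers"): 0.4, ("NP", "stars"): 0.6, ("V", "saw"): 1.0}
    words = ["astronomers", "saw", "stars"]
    n = len(words)

    # Rebuild the inside chart (identical to the previous sketch).
    inside = defaultdict(float)
    for p, w in enumerate(words, start=1):
        for (j, word), prob in lexical_rules.items():
            if word == w:
                inside[(j, p, p)] += prob
    for span in range(2, n + 1):
        for p in range(1, n - span + 2):
            q = p + span - 1
            for (j, r, s), prob in binary_rules.items():
                for k in range(p, q):
                    inside[(j, p, q)] += prob * inside[(r, p, k)] * inside[(s, k + 1, q)]

    # Base case: alpha_j(1, n) is 1 for the start symbol, 0 otherwise.
    outside = defaultdict(float)
    outside[("S", 1, n)] = 1.0

    # General case: fill the chart from the widest spans down to single words.
    for span in range(n - 1, 0, -1):
        for p in range(1, n - span + 2):
            q = p + span - 1
            for (r, left, right), prob in binary_rules.items():
                # Left contribution: N^left covers (p, q), its sibling covers (q+1, k).
                for k in range(q + 1, n + 1):
                    outside[(left, p, q)] += prob * outside[(r, p, k)] * inside[(right, q + 1, k)]
                # Right contribution: N^right covers (p, q), its sibling covers (k, p-1).
                for k in range(1, p):
                    outside[(right, p, q)] += prob * outside[(r, k, q)] * inside[(left, k, p - 1)]

    # Sanity check: for every position p, the quantity
    # sum_j alpha_j(p, p) * beta_j(p, p) equals the sentence probability.
    nonterminals = {j for (j, _, _) in binary_rules} | {j for (j, _) in lexical_rules}
    for p in range(1, n + 1):
        print(p, sum(outside[(j, p, p)] * inside[(j, p, p)] for j in nonterminals))

Every line of the sanity check should print the sentence probability 0.24 (up to floating-point rounding): for each position $p$, the sum of $\alpha_j(p,p)\,\beta_j(p,p)$ over nonterminals totals the probability of all parses, since each word is covered by exactly one constituent in any parse. Such $\alpha\beta$ products are exactly the expectations the expectation–maximization algorithm uses to re-estimate the production probabilities.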