Language identification in the limit is a formal model for inductive inference of formal languages, mainly by computers (see machine learning and induction of regular languages). It was introduced by E. Mark Gold in a technical report[1] and a journal article[2] with the same title.
In this model, a teacher provides a learner with some presentation (i.e. a sequence of strings) of some formal language. Learning is seen as an infinite process: each time the learner reads an element of the presentation, it should provide a representation (e.g. a formal grammar) for the language.
Gold defines that a learner can identify a class of languages in the limit if, given any presentation of any language in the class, the learner produces only a finite number of wrong representations and then sticks with the correct representation. However, the learner need not be able to announce its correctness, and the teacher might present a counterexample to any representation arbitrarily long afterwards.
Gold defined two types of presentations:
- Text (positive information): an enumeration of all the strings the language consists of.
- Complete presentation (positive and negative information): an enumeration of all possible strings, each one labelled as to whether it belongs to the language or not.
This model is an early attempt to formally capture the notion of learnability. Gold's journal article[3] introduces for contrast the stronger models
- finite identification (where the learner has to announce correctness after a finite number of steps), and
- fixed-time identification (where correctness has to be reached after an a priori specified number of steps).
A weaker formal model of learnability is the probably approximately correct (PAC) learning model, introduced by Leslie Valiant in 1984.
It is instructive to look at concrete examples (given in the tables) of the learning sessions that the definition of identification in the limit speaks about.
More formally,[7]
- a language is a non-empty set of strings, and a language family is a set of languages;
- an environment (i.e. a presentation) of a language $L$ is an infinite sequence of strings $e_1, e_2, \ldots$ drawn from $L$ in which every string of $L$ appears at least once;
- a learner is a function $f$ that maps each finite sequence of strings $e_1, \ldots, e_n$ to a language;
- a learner $f$ learns a language $L$ in an environment $E = e_1, e_2, \ldots$ if the sequence of guesses $f(e_1), f(e_1, e_2), f(e_1, e_2, e_3), \ldots$ equals $L$ at all but finitely many steps;
- a learner learns a family of languages $C$ if it learns every language $L \in C$ in every environment of $L$.
Notes:
- The learner is not required to announce when it has converged; it only has to be eventually always correct (cf. above).
- The learner may be an arbitrary function; effectiveness (computability) is treated as a separate requirement (see the characterizations by Angluin below).
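To make the objects in the formal definition concrete, here is a minimal sketch in Python, using the hypothetical toy family $L_n = \{a^k : 1 \le k \le n\}$; the function names and the particular learner are illustrative assumptions, not part of Gold's formalism:

```python
from itertools import islice

def environment(n):
    """An environment for the toy language L_n = {a^k : 1 <= k <= n}:
    an infinite stream in which every string of L_n occurs (infinitely often)."""
    while True:
        for k in range(1, n + 1):
            yield "a" * k

def learner(prefix):
    """Maps the finite sequence seen so far to a guess: here, the index n
    of the smallest L_n consistent with that sequence."""
    return max((len(s) for s in prefix), default=1)

# On any environment of L_3, the guesses converge: once "aaa" has appeared,
# the learner answers 3 at every later step, i.e. it identifies L_3 in the limit.
seen = []
for s in islice(environment(3), 10):
    seen.append(s)
    print(learner(seen), end=" ")  # prints: 1 2 3 3 3 3 3 3 3 3
```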
Gold's theorem (Theorem I.8 of Gold 1967) — If a language family $C$ contains languages $L_1, L_2, \ldots$ and $L_\infty$ such that $L_1 \subsetneq L_2 \subsetneq \cdots$ and $L_\infty = \bigcup_{n=1}^{\infty} L_n$, then $C$ is not learnable.
Suppose $f$ is a learner that can learn $L_1, L_2, \ldots$; we show that it cannot learn $L_\infty$, by constructing an environment for $L_\infty$ that "tricks" $f$.
First, construct environments $E_1, E_2, \ldots$ for the languages $L_1, L_2, \ldots$.
Next, construct the environment $E$ for $L_\infty$ inductively as follows:
- Stage 1: present the elements of $E_1$ one by one. Since $f$ learns $L_1$ from the environment $E_1$, it outputs $L_1$ after some finite prefix of $E_1$.
- Stage $n+1$: after $f$ has output $L_n$, first present the first $n$ elements of each of $E_1, \ldots, E_n$ (these all lie in $L_n \subseteq L_\infty$), then present the elements of $E_{n+1}$ one by one until $f$ outputs $L_{n+1}$. This again happens after finitely many steps: every string presented so far lies in $L_{n+1}$, so continuing with $E_{n+1}$ forever would yield an environment for $L_{n+1}$, on which $f$ must converge to $L_{n+1}$.
By construction, the resulting environment $E$ contains the entirety of every $E_n$; hence the strings occurring in it form $\bigcup_n E_n = \bigcup_n L_n = L_\infty$, and every string presented lies in $L_\infty$, so $E$ is an environment for $L_\infty$. But since the learner always switches to $L_n$ for some finite $n$, it never converges to $L_\infty$.
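The construction can be made concrete with a small sketch. The chain $L_n = \{a^k : 1 \le k \le n\}$ with $L_\infty = \{a^k : k \ge 1\}$ and the particular learner below are illustrative assumptions (the theorem itself quantifies over all learners):

```python
def learner(seen):
    """A natural learner: guesses the smallest L_n consistent with the
    strings seen so far (the guess is returned as the index n)."""
    return max((len(s) for s in seen), default=1)

def adversarial_environment(stages):
    """Present strings of L_infinity so that the learner announces
    L_1, L_2, L_3, ... in turn and therefore never converges."""
    seen = []
    for n in range(1, stages + 1):
        # Stage n: present elements of an environment for L_n until
        # the learner switches its guess to L_n.
        for k in range(1, n + 1):
            seen.append("a" * k)
        print(f"stage {n}: learner guesses L_{learner(seen)}")

adversarial_environment(5)
# stage 1: learner guesses L_1
# stage 2: learner guesses L_2
# ...and so on: the guess keeps changing, so it is never eventually L_infinity.
```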
Gold's theorem is easily bypassed if negative examples are allowed. In particular, the language family $\{L_1, L_2, \ldots, L_\infty\}$ can be learned by a learner that always guesses $L_\infty$ until it receives the first negative example $\neg a_n$, where $a_n \in L_{n+1} \setminus L_n$, at which point it always guesses $L_n$.
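A sketch of such a learner, again on the assumed chain $L_n = \{a^k : 1 \le k \le n\}$ and with an assumed encoding of the complete presentation as labelled pairs:

```python
INFINITY = None  # stands for the guess "L_infinity"

def informant_learner(labelled):
    """labelled: the finite prefix of a complete presentation, given as
    (string, is_member) pairs. Guess L_infinity until a negative example
    appears; the shortest negative example a^(n+1) then yields the guess L_n."""
    negatives = [s for s, is_member in labelled if not is_member]
    if not negatives:
        return INFINITY
    return min(len(s) for s in negatives) - 1  # a^(n+1) excluded => guess L_n

# A complete presentation of L_2 labels every string; the first few elements:
presentation = [("a", True), ("aa", True), ("aaa", False), ("aaaa", False)]
for t in range(1, len(presentation) + 1):
    print(informant_learner(presentation[:t]))
# prints: None, None, 2, 2 -- the guesses converge to L_2.
```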
Dana Angluin gave characterizations of learnability from text (positive information) in a 1980 paper.[8] If a learner is required to be effective, then an indexed class of recursive languages is learnable in the limit if and only if there is an effective procedure that uniformly enumerates tell-tales for each language in the class (Condition 1).[9] It is not hard to see that if an ideal learner (i.e. an arbitrary function) is allowed, then an indexed class of languages is learnable in the limit if and only if each language in the class has a tell-tale (Condition 2).[10]
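The notion of a tell-tale used here can be stated symbolically: a tell-tale of a language $L$ within a class $C$ is a finite subset of $L$ that no language of the class can contain while being a proper sub-language of $L$, i.e. (following Angluin's definition)

```latex
T_L \subseteq L,\quad T_L \text{ finite},\quad
\forall L' \in C :\ \bigl( T_L \subseteq L' \subseteq L \implies L' = L \bigr).
```

Intuitively, once a learner has seen a tell-tale of $L$, the guess $L$ cannot overgeneralize: no proper sub-language of $L$ in the class contains all the strings seen so far.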
The table shows which language classes are identifiable in the limit in which learning model. On the right-hand side, each language class is a superclass of all lower classes. Each learning model (i.e. type of presentation) can identify in the limit all classes below it. In particular, the class of finite languages is identifiable in the limit by text presentation (cf. Example 2 above), while the class of regular languages is not.
Pattern languages, introduced by Dana Angluin in another 1980 paper,[12] are also identifiable by normal text presentation; they are omitted from the table, since they lie above the singleton class and below the class of primitive recursive languages, but are incomparable to the classes in between.[note 7]
Condition 1 in Angluin's paper[9] is not always easy to verify. Therefore, various sufficient conditions for the learnability of a language class have been devised. See also Induction of regular languages for learnable subclasses of regular languages.
A class of languages has finite thickness if every non-empty set of strings is contained in at most finitely many languages of the class. This is exactly Condition 3 in Angluin's paper.[13] Angluin showed that if a class of recursive languages has finite thickness, then it is learnable in the limit.[14]
A class with finite thickness certainly satisfies the MEF-condition and the MFF-condition; in other words, finite thickness implies M-finite thickness.[15]
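Restated symbolically (a paraphrase of the definition above, not Angluin's original wording), a class $C$ has finite thickness when

```latex
\forall S \subseteq \Sigma^{*},\ S \neq \emptyset :\quad
\bigl|\{\, L \in C \;:\; S \subseteq L \,\}\bigr| \;<\; \infty .
```

For example, the class of all singleton languages $\{w\}$ has finite thickness, since a non-empty set of strings is contained in at most one singleton; by contrast, a chain $L_1 \subsetneq L_2 \subsetneq \cdots$ as in Gold's theorem typically does not have it, e.g. for $L_n = \{a^k : 1 \le k \le n\}$ the set $\{a\}$ is contained in every $L_n$.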
A class of languages is said to have finite elasticity if, for every infinite sequence of strings $s_0, s_1, \ldots$ and every infinite sequence of languages $L_1, L_2, \ldots$ in the class, there exists a finite number $n$ such that $s_n \notin L_n$ implies that $L_n$ is inconsistent with $\{s_0, \ldots, s_{n-1}\}$.[16]
It has been shown that a class of recursively enumerable languages is learnable in the limit if it has finite elasticity.
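Unfolding the definition, a class fails finite elasticity (i.e. has infinite elasticity) exactly when witness sequences of the following form exist:

```latex
\exists\, s_0, s_1, \ldots \quad \exists\, L_1, L_2, \ldots \in C \quad
\forall n \geq 1 :\quad
\{s_0, \ldots, s_{n-1}\} \subseteq L_n \ \wedge\ s_n \notin L_n .
```

For instance, the chain $L_n = \{a^k : 1 \le k \le n\}$ from Gold's theorem has infinite elasticity: take $s_i = a^{i+1}$, so that $\{s_0, \ldots, s_{n-1}\} = \{a^1, \ldots, a^n\} \subseteq L_n$ while $s_n = a^{n+1} \notin L_n$.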
A mind change bound is a bound on the number of hypothesis changes that occur before convergence.
A language $L$ has the infinite cross property within a class of languages $\mathcal{L}$ if there is an infinite sequence $L_1, L_2, \ldots$ of distinct languages in $\mathcal{L}$ and a sequence of finite subsets $T_1, T_2, \ldots$ such that:
- $T_1 \subsetneq T_2 \subsetneq \cdots$,
- $T_i \subseteq L_i$,
- $T_{i+1} \not\subseteq L_i$, and
- $\bigcup_i T_i = L$.
Note that $L$ is not necessarily a member of the class of languages.
It is not hard to see that if there is a language with the infinite cross property within a class of languages, then that class has infinite elasticity: picking $s_0 \in T_1$ and $s_n \in T_{n+1} \setminus L_n$ for $n \geq 1$ gives $\{s_0, \ldots, s_{n-1}\} \subseteq T_n \subseteq L_n$ while $s_n \notin L_n$.