A macrolanguage is a book-keeping mechanism for the ISO 639 international standard of language codes. Macrolanguages are established to assist mapping between different sets of ISO language codes. Specifically, there may be a many-to-one correspondence between ISO 639-3, intended to identify all the thousands of languages of the world, and either of two other sets, ISO 639-1, established to identify languages in computer systems, and ISO 639-2, which encodes a few hundred languages for library cataloguing and bibliographic purposes. When such many-to-one ISO 639-2 codes are included in an ISO 639-3 context, they are called "macrolanguages" to distinguish them from the corresponding individual languages of ISO 639-3.[1] According to the ISO,
Some existing code elements in ISO 639-2, and the corresponding code elements in ISO 639-1, are designated in those parts of ISO 639 as individual language code elements, yet are in a one-to-many relationship with individual language code elements in [ISO 639-3]. For purposes of [ISO 639-3], they are considered to be macrolanguage code elements.
— ISO 639-3: Relationship between ISO 639-3 and the other parts of ISO 639[2]
The mapping often covers borderline cases in which two language varieties may be considered either strongly divergent dialects of the same language or very closely related languages (dialect continua). It may also encompass varieties that are considered to belong to the same language on ethnic, cultural, or political grounds rather than linguistic ones. However, this is not its primary function, and the classification is not evenly applied.
For example, Chinese is a macrolanguage encompassing many languages that are not mutually intelligible, but "Standard German", "Bavarian German", and other closely related languages do not form a macrolanguage, despite being more mutually intelligible. Other examples include Tajiki not being part of the Persian macrolanguage despite sharing much of its lexicon, and Urdu and Hindi not forming a macrolanguage despite forming a mutually intelligible dialect continuum. All dialects of Hindi are considered separate languages. In essence, ISO 639-2 and ISO 639-3 use different criteria for dividing language varieties into languages: 639-2 gives more weight to shared writing systems and literature, whereas 639-3 focuses on mutual intelligibility and shared lexicon. The macrolanguages exist within the ISO 639-3 code set to make mapping between the two sets easier.
Ethnologue has used macrolanguages since its 16th edition.[3] As of 21 December 2023, there are fifty-nine language codes in ISO 639-2 that are counted as macrolanguages in ISO 639-3.[4] The most recently registered macrolanguage is Sanskrit, with code san, adopted on 15 December 2023, though it had already existed as an individual language for several years.[5]
Some of the macrolanguages had no individual language (as defined by 639-3) in ISO 639-2, e.g. "ara" (Arabic), but ISO 639-3 recognizes different varieties of Arabic as separate languages under some circumstances. Others, like "nor" (Norwegian), already had their two individual parts (nno Nynorsk, nob Bokmål) in 639-2. This means that some languages (e.g. "arb" Standard Arabic) that ISO 639-2 treated as dialects of one language ("ara") are, in certain contexts, considered individual languages in ISO 639-3. This is an attempt to deal with varieties that may be linguistically distinct from each other but are treated by their speakers as forms of the same language, e.g. in cases of diglossia. For example,
ISO 639-2 also includes codes for collections of languages; these are not the same as macrolanguages. These collections of languages are excluded from ISO 639-3, because they never refer to individual languages. Most such codes are included in ISO 639-5.
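The one-to-many relationship between a macrolanguage code and its individual language codes can be sketched as a simple lookup table. The following Python example is only an illustration, not an official data set: the table is deliberately partial and contains only codes named in this article, and the names MACROLANGUAGE_MEMBERS, is_macrolanguage, macrolanguage_of, and same_macrolanguage are hypothetical helpers introduced here.

```python
# Partial, illustrative table: only codes mentioned in this article.
# A real application would load the full mapping published for ISO 639-3.
MACROLANGUAGE_MEMBERS = {
    "nor": {"nno", "nob"},  # Norwegian: Nynorsk, Bokmål
    "ara": {"arb"},         # Arabic: Standard Arabic (other members omitted)
    "hbs": {"srp", "hrv"},  # Serbo-Croatian: Serbian, Croatian (others omitted)
}

# Reverse index: individual language code -> macrolanguage code.
MEMBER_TO_MACRO = {
    member: macro
    for macro, members in MACROLANGUAGE_MEMBERS.items()
    for member in members
}


def is_macrolanguage(code):
    """Return True if the code is a macrolanguage in this partial table."""
    return code in MACROLANGUAGE_MEMBERS


def macrolanguage_of(code):
    """Return the macrolanguage an individual code belongs to, or None."""
    return MEMBER_TO_MACRO.get(code)


def same_macrolanguage(a, b):
    """Return True if both codes resolve to the same known macrolanguage."""
    macro_a = a if is_macrolanguage(a) else macrolanguage_of(a)
    macro_b = b if is_macrolanguage(b) else macrolanguage_of(b)
    return macro_a is not None and macro_a == macro_b
```

With such a table, a catalogue record tagged with the ISO 639-2 code nor can be matched against ISO 639-3 records tagged nno or nob, which is the mapping role the macrolanguage designation is meant to serve.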
Types of macrolanguages
elements that have no ISO 639-2 code: 4 (bnc, hbs, kln, luy)
elements that have no ISO 639-1 code: 29
elements that do have ISO 639-1 codes: 33
elements whose individual languages have ISO 639-1 codes: 4
In addition, there are six closely associated individual codes:
nsk – Naskapi (part of the Cree language group but not included under the cre macrolanguage designation)
moe – Montagnais (part of the Cree language group but not included under the cre macrolanguage designation)
atj – Atikamekw (part of the Cree language group but not included under the cre macrolanguage designation)
crg – Michif language (Cree-French mixed language with strong influences from Ojibwe language group and not included under the cre macrolanguage designation)
ojs – Ojibwa, Severn (Ojibwa, Northern) (part of the Ojibwa language group with strong influences from the Cree language group and not included under the cre macrolanguage designation)
ojw – Ojibwa, Western (part of the Ojibwa language group with strong influences from the Cree language group and not included under the cre macrolanguage designation)
In addition, there is one other language without individual codes closely associated, but not part of, this macrolanguage code:
hbs is the ISO 639-3 language code for Serbo-Croatian. It formerly had the ISO 639-1 code sh, but this was deprecated in 2000. There are four individual language codes assigned:
blu – Hmong Njua (Split into Hmong Njua [hnj] (new identifier), Chuanqiandian Cluster Miao [cqd], Horned Miao [hrm], and Small Flowery Miao [sfm] on 14 January 2008)
In addition, there are three closely associated individual codes:
alq – Algonquin language (part of the Ojibwe language group but not included under the oji macrolanguage designation)
pot – Potawatomi language (formerly part of the Ojibwe language group and not included under the oji macrolanguage designation)
crg – Michif language (Cree-French mixed language with strong influences from Ojibwe language group and not included under the oji macrolanguage designation)
In addition, there are two other languages without individual codes closely associated, but not part of, this macrolanguage code:
Broken Ojibwa (pidgin language used until the end of the 19th century)
ccx – Northern Zhuang (Split into Guibian Zh [zgn], Liujiang Zh [zlj], Qiubei Zh [zqe], Guibei Zh [zgb], Youjiang Zh [zyj], Central Hongshuihe Zh [zch], Eastern Hongshuihe Zh [zeh], Liuqian Zh [zlq], Yongbei Zh [zyb], and Lianshan Zh [zln] on 14 January 2008)
ccy – Southern Zhuang (Split into Nong Zhuang [zhn], Yang Zhuang [zyg], Yongnan Zhuang [zyn], Zuojiang Zhuang [zzj], and Dai Zhuang [zhd] on 18 July 2007)
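The retired codes noted above (blu, ccx, ccy) were each split into several new identifiers, so software that still holds data tagged with a retired code needs a way to look up its successors. The sketch below records only the splits and dates given in this article; RETIRED_SPLITS and successors are hypothetical names chosen for illustration, and choosing a single replacement among the successors would still require knowledge of the underlying data.

```python
from datetime import date

# Retired code -> (date of the split, successor codes), as listed in this article.
RETIRED_SPLITS = {
    "blu": (date(2008, 1, 14), ["hnj", "cqd", "hrm", "sfm"]),
    "ccx": (date(2008, 1, 14), ["zgn", "zlj", "zqe", "zgb", "zyj",
                                "zch", "zeh", "zlq", "zyb", "zln"]),
    "ccy": (date(2007, 7, 18), ["zhn", "zyg", "zyn", "zzj", "zhd"]),
}


def successors(code):
    """Return the codes that replaced a retired code, or [code] if not retired."""
    entry = RETIRED_SPLITS.get(code)
    if entry is None:
        return [code]
    _split_date, new_codes = entry
    return list(new_codes)  # copy, so callers cannot mutate the table
```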
Although the Dungan language (dng) is a dialect of Mandarin, it is not listed under Chinese in ISO 639-3 due to separate historical and cultural development.[11]
ISO 639 also lists codes for Old Chinese (och) and Late Middle Chinese (ltc). They are not listed under Chinese in ISO 639-3 because they are categorized as ancient and historical languages, respectively.
This code was deprecated in 2000 because there were separate language codes for each individual language represented (Serbian, Croatian, and then Bosnian was added). It was published in a revision of ISO 639-1, but was never included in ISO 639-2. It is considered a macrolanguage (general name for a cluster of closely related individual languages) in ISO 639-3. Its deprecated status was reaffirmed by the ISO 639 JAC in 2005.
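ISO 639-1 | ISO 639-2 | English name | French name | Date | Category of change | Notes
sr | srp [scc] | Serbian | serbe | 2008-06-28 | CC | ISO 639-2/B code deprecated in favor of ISO 639-2/T code
hr | hrv [scr] | Croatian | croate | 2008-06-28 | CC | ISO 639-2/B code deprecated in favor of ISO 639-2/T code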
Rimsky-Korsakoff, Svetlana (1967). "Soviet Dungan: The Chinese language of Central Asia. Alphabet, phonology, morphology". Monumenta Serica. 26: 352–421. doi:10.1080/02549948.1967.11744973.