Hindi–Urdu transliteration (or Hindustani transliteration) is essential for Hindustani speakers to understand each other's text, and it is especially important because the language underlying the Hindi and Urdu registers is almost the same.[4] Transliteration is theoretically possible because of the common Hindustani phonology underlying Hindi–Urdu. In the present day, the Hindustani language is seen as a unifying language,[5] as initially proposed by Mahatma Gandhi to resolve the Hindi–Urdu controversy.[6] ("Hindustani" is not to be confused with followers of Hinduism, as "Hindu" in Persian means "Indo".)
Technically, a direct one-to-one script mapping or rule-based lossless transliteration of Hindi–Urdu is not possible, mainly because Hindi is written in an abugida script and Urdu in an abjad script, and also because of other constraints, such as multiple similar Perso-Arabic characters mapping onto a single Devanagari character.[7] However, dictionary-based mapping attempts have yielded very high accuracy, providing near-perfect transliterations.[8] For literary domains, mere transliteration between Hindi and Urdu will not suffice, as formal Hindi is more inclined towards Sanskrit vocabulary whereas formal Urdu is more inclined towards Persian and Arabic vocabulary; hence a system combining transliteration and translation would be necessary in such cases.[9]
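The many-to-one constraint described above can be illustrated with a minimal Python sketch. The letter groups are a small illustrative subset chosen for this example, and the names `URDU_TO_DEVANAGARI`, `candidates_by_devanagari` and `is_one_to_one` are hypothetical; this is not taken from any cited system.

```python
from collections import defaultdict

# Devanagari targets, defined once so every entry uses the same encoding.
ZA = "ज़"  # z
SA = "स"  # s
TA = "त"  # t

# Illustrative subset only: several Perso-Arabic letters that are pronounced
# alike in Urdu are usually written with a single Devanagari letter.
URDU_TO_DEVANAGARI = {
    "ز": ZA, "ذ": ZA, "ض": ZA, "ظ": ZA,
    "س": SA, "ص": SA, "ث": SA,
    "ت": TA, "ط": TA,
}


def candidates_by_devanagari(mapping):
    """Group the Urdu letters by the Devanagari letter they map to."""
    inverse = defaultdict(list)
    for urdu, devanagari in mapping.items():
        inverse[devanagari].append(urdu)
    return dict(inverse)


def is_one_to_one(mapping):
    """A mapping is reversible only if no two keys share a value."""
    return len(set(mapping.values())) == len(mapping)


CANDIDATES = candidates_by_devanagari(URDU_TO_DEVANAGARI)
```

Because `is_one_to_one(URDU_TO_DEVANAGARI)` is false, converting Devanagari back to Urdu by rule leaves several candidate letters for each character in `CANDIDATES`. Choosing among them requires knowing the word, which is one reason a dictionary-based approach can help.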
Hindustani has a rich set of consonants in its full alphabet, since it has a mixed vocabulary (rekhta) derived from Old Hindi (from Dehlavi), with loanwords from Persian (from Pahlavi) and Arabic. These three sources belong to three different language families: Indo-Aryan, Iranian and Semitic, respectively.
The following table provides an approximate one-to-one mapping for Hindi–Urdu consonants,[18] especially for computational purposes (lossless script conversion). Note that this direct script conversion will not yield correct spellings,[19] but rather text that is readable for both sets of readers. Hindi–Urdu transliteration schemes can also be used for Punjabi, to convert Gurmukhi (Eastern Punjabi) to Shahmukhi (Western Punjabi), since Shahmukhi is a superset of the Urdu alphabet (with 2 extra consonants) and the Gurmukhi script can be easily converted to the Devanagari script.
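As a rough illustration of how such a consonant mapping might be applied computationally, the sketch below converts Devanagari text to the Urdu script one character at a time. It assumes a handful of consonants chosen for this example rather than the full table, and the names `CONSONANTS` and `devanagari_to_urdu` are hypothetical. Aspirated consonants are built by appending do-chashmi he (U+06BE) to the unaspirated Urdu letter.

```python
# Do-chashmi he, the Urdu letter that marks aspiration.
DO_CHASHMI_HE = "\u06be"

# Illustrative subset of a consonant mapping (Devanagari -> Urdu script).
CONSONANTS = {
    "क": "ک",
    "ख": "ک" + DO_CHASHMI_HE,  # aspirated k: one Devanagari letter, two Urdu letters
    "ब": "ب",
    "भ": "ب" + DO_CHASHMI_HE,  # aspirated b
    "न": "ن",
    "म": "م",
    "ल": "ل",
    "र": "ر",
}


def devanagari_to_urdu(text: str) -> str:
    """Replace each mapped consonant; leave every other character unchanged."""
    return "".join(CONSONANTS.get(ch, ch) for ch in text)
```

A word written only with bare consonants, such as `devanagari_to_urdu("नमक")`, converts directly, since the inherent vowel is unwritten in both scripts. Vowel signs, conjuncts and nukta forms are passed through untouched in this sketch, so a real converter would need further rules, and the output is a readable approximation rather than a correct spelling.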
^Ray, Aniruddha (2011). The Varied Facets of History: Essays in Honour of Aniruddha Ray. Primus Books. ISBN 978-93-80607-16-0. There was the Hindustani Dictionary of Fallon published in 1879; and two years later (1881), John J. Platts produced his Dictionary of Urdu, Classical Hindi and English, which implied that Hindi and Urdu were literary forms of a single language. More recently, Christopher R. King in his One Language, Two Scripts (1994) has presented the late history of the single spoken language in two forms, with the clarity and detail that the subject deserves.
^Ashmore, Harry S. (1961). Encyclopaedia Britannica: a new survey of universal knowledge, Volume 11. Encyclopædia Britannica. p. 579. The everyday speech of well over 50,000,000 persons of all communities in the north of India and in West Pakistan is the expression of a common language, Hindustani.
^Durrani, Nadir; Sajjad, Hassan; Fraser, Alexander; Schmid, Helmut (July 2010). "Hindi-to-Urdu Machine Translation through Transliteration". Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics: 465–474.
^Diacritics in Urdu are normally not written; they are usually implied and interpreted from the context of the sentence.
^[ɛ] occurs as a conditioned allophone of /ə/ near an /ɦ/ surrounded on both sides by schwas. Usually, the second schwa undergoes syncopation, and the resultant is just an [ɛ] preceding an /ɦ/.
^Hindi does not have a diacritic to represent ə as it is usually implied after unmarked consonants.
^Hindi has individual letters for aspirated consonants, whereas Urdu uses a specific letter to mark aspiration on the preceding consonant.
^No words in Hindustani can begin with a nasalised letter/diacritic.
In Urdu the initial form (letter) for representing a nasalised word is: ن٘ (nūn + small nūn ghunna diacritic)
^ a b c d e Shapiro, Michael C. (1989). A Primer of Modern Standard Hindi. Motilal Banarsidass Publ. p. 20. ISBN 978-81-208-0508-8. In addition to the basic consonantal sounds discussed in sections 3.1 and 3.2, many speakers use any or all five additional consonants (क़ ḳ, ख़ ḳh, ग़ ġ, ज़ z, फ़ f) in words of foreign origin (primarily from Persian, Arabic, English, and Portuguese). The last two of these, ज़ z and फ़ f, are the initial sounds in English zig and fig respectively. The consonant क़ ḳ is a voiceless uvular stop, somewhat like k, but pronounced further back in the mouth. ख़ ḳh is a voiceless fricative similar in pronunciation to the final sound of the German ach. ग़ ġ is generally pronounced as a voiceless uvular fricative, although it is occasionally heard as a stop rather than a fricative. In devanāgari each of these five sounds is represented by the use of a subscript dot under one of the basic consonant signs. In practice, however, the dot is often omitted, leaving it to the reader to render the correct pronunciation on the basis of his prior knowledge of the language.
^ a b c d e Pandey, Dipti; Mondal, Tapabrata; Agrawal, S. S.; Bangalore, Srinivas (2013). "Development and suitability of Indian languages speech database for building watson based ASR system". 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE). p. 3. doi:10.1109/ICSDA.2013.6709861. ISBN 978-1-4799-2378-6. S2CID 26461938. Only in Hindi 10 Phonemes व /v/ क़ /q/ ञ /ɲ/ य /j/ ष /ʂ/ ख़ /x/ ग़ /ɣ/ ज़ /z/ झ़ /ʒ/ फ़ /f/