Bidirectional text

Bidirectional text contains two text directionalities, right-to-left (RTL) and left-to-right (LTR). It generally involves text containing different types of alphabets, but the term may also refer to boustrophedon, in which the direction of the text changes with each row.

An example is the RTL Hebrew name Sarah: שרה, spelled sin (ש) on the right, resh (ר) in the middle, and heh (ה) on the left. Many computer programs failed to display this correctly, because they were designed to display text in one direction only.
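The stored order of the letters can be inspected with Python's standard `unicodedata` module. This is a minimal sketch, assuming the name is encoded as the three code points U+05E9, U+05E8 and U+05D4; the helper name `describe` is hypothetical. The letter that is read first comes first in storage, even though it is displayed rightmost.

```python
import unicodedata

# The Hebrew name Sarah in storage (reading) order: shin/sin, resh, he.
sarah = "\u05E9\u05E8\u05D4"

def describe(text):
    """Return (character name, Bidi_Class) for each code point, in stored order."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]

for name, bidi_class in describe(sarah):
    # Every letter carries the strong right-to-left class "R".
    print(name, bidi_class)
```

A renderer that understands only one direction would draw these code points from left to right in stored order, which puts the first letter on the wrong side.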

Some so-called right-to-left scripts, such as the Persian script and Arabic, are mostly, but not exclusively, right-to-left: mathematical expressions, numeric dates and numbers bearing units are embedded from left to right. The same happens if text from a left-to-right language such as English is embedded in them or, vice versa, if Arabic is embedded in a left-to-right script such as English.

Bidirectional script support

Bidirectional script support is the capability of a computer system to correctly display bidirectional text. The term is often shortened to "BiDi" or "bidi".

Early computer installations were designed only to support a single writing system, typically for left-to-right scripts based on the Latin alphabet only. Adding new character sets and character encodings enabled a number of other left-to-right scripts to be supported, but did not easily support right-to-left scripts such as Arabic or Hebrew, and mixing the two was not practical. Right-to-left scripts were introduced through encodings like ISO/IEC 8859-6 and ISO/IEC 8859-8, storing the letters (usually) in writing and reading order. It is possible to simply flip the left-to-right display order to a right-to-left display order, but doing this sacrifices the ability to correctly display left-to-right scripts. With bidirectional script support, it is possible to mix characters from different scripts on the same page, regardless of writing direction.
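The cost of simply flipping the display order can be illustrated with a short Python sketch. The function name `naive_flip` and the sample string are assumptions made for illustration; the code only models a display that reverses every line, not any particular historical system.

```python
def naive_flip(logical_text):
    """Reverse a whole line, as a display that only draws right-to-left would."""
    return logical_text[::-1]

# Hebrew "Sarah" followed by its Latin transliteration, in logical order.
mixed = "\u05E9\u05E8\u05D4 Sarah"
flipped = naive_flip(mixed)

# Print code points rather than the string itself, so that a bidi-aware
# terminal does not reorder the output a second time.
print([f"U+{ord(ch):04X}" for ch in flipped])
```

Read strictly from left to right, the flipped line shows the Hebrew letters in their correct visual arrangement, but the Latin word now reads "haraS", which is the loss described above.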

In particular, the Unicode standard provides foundations for complete BiDi support, with detailed rules as to how mixtures of left-to-right and right-to-left scripts are to be encoded and displayed.

Unicode bidi support

The Unicode standard calls for characters to be ordered 'logically', i.e. in the sequence in which they are intended to be interpreted, as opposed to 'visually', the sequence in which they appear. This distinction is relevant for bidi support because at any bidi transition, the visual presentation ceases to be the 'logical' one. Thus, in order to offer bidi support, Unicode prescribes an algorithm for how to convert the logical sequence of characters into the correct visual presentation. For this purpose, the Unicode encoding standard assigns each of its characters to one of four types: 'strong', 'weak', 'neutral', and 'explicit formatting'.[1]
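The four types can be looked up from a character's Bidi_Class property. The following Python sketch uses the standard `unicodedata` module; the grouping of class codes into the four types follows the table of character types later in this article, and the function name `bidi_type` is hypothetical.

```python
import unicodedata

# Bidi_Class codes grouped by type, following the table in this article.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}

def bidi_type(ch):
    """Map one character to 'strong', 'weak', 'neutral' or 'explicit'."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG:
        return "strong"
    if bidi_class in WEAK:
        return "weak"
    if bidi_class in NEUTRAL:
        return "neutral"
    if bidi_class in EXPLICIT:
        return "explicit"
    return "unknown"  # e.g. code points unknown to this Python's Unicode data

for ch in ["A", "\u05D0", "7", " ", "\u202E"]:
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), bidi_type(ch))
```

The result depends on the Unicode version bundled with the Python interpreter, so very recent characters may be reported as unknown.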

Strong characters

Strong characters are those with a definite direction. Examples of this type of character include most alphabetic characters, syllabic characters, Han ideographs, non-European or non-Arabic digits, and punctuation characters that are specific to only those scripts.

Weak characters

Weak characters are those with vague direction. Examples of this type of character include European digits, Eastern Arabic-Indic digits, arithmetic symbols, and currency symbols.

Neutral characters

Neutral characters have direction indeterminable without context. Examples include paragraph separators, tabs, and most other whitespace characters. Some punctuation that is common to many scripts, namely the colon, comma, full stop, and the no-break space, is not neutral: these characters are classified as weak (common number separators), as shown in the table below.

Explicit formatting

Explicit formatting characters, also referred to as "directional formatting characters", are special Unicode characters that direct the algorithm to modify its default behavior. These characters are subdivided into "marks", "embeddings", "isolates", and "overrides". The effects of embeddings, isolates, and overrides continue until the occurrence of either a paragraph separator or a "pop" character.

Marks

If a "weak" character is followed by another "weak" character, the algorithm will look at the first neighbouring "strong" character. Sometimes this leads to unintentional display errors. These errors are corrected or prevented with "pseudo-strong" characters. Such Unicode control characters are called marks. The mark (U+200E LEFT-TO-RIGHT MARK (LRM) or U+200F RIGHT-TO-LEFT MARK (RLM)) is to be inserted into a location to make an enclosed weak character inherit its writing direction.

For example, to correctly display the U+2122 TRADE MARK SIGN for an English name brand (LTR) in an Arabic (RTL) passage, an LRM mark is inserted after the trademark symbol if the symbol is not followed by LTR text (e.g. "قرأ Wikipedia™‎ طوال اليوم."). If the LRM mark is not added, the weak character ™ will be neighbored by a strong LTR character and a strong RTL character. Hence, in an RTL context, it will be considered to be RTL, and displayed in an incorrect order (e.g. "قرأ Wikipedia™ طوال اليوم.").
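Inserting the mark can be automated when LTR fragments are placed into RTL text. The Python sketch below is one possible approach, not a prescribed one: the helper name `protect_trailing_symbol` is hypothetical, and it assumes that any fragment whose last character is not of class L should receive a trailing LRM.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK, an invisible strong LTR character

def protect_trailing_symbol(fragment):
    """Append an LRM if an LTR fragment ends in a character that is not strong LTR."""
    if fragment and unicodedata.bidirectional(fragment[-1]) != "L":
        return fragment + LRM
    return fragment

# "Wikipedia" followed by U+2122 TRADE MARK SIGN.
brand = protect_trailing_symbol("Wikipedia\u2122")

# The Arabic sentence from the example above, written with escapes so that
# the source code itself is not reordered by an editor.
sentence = (
    "\u0642\u0631\u0623 " + brand + " \u0637\u0648\u0627\u0644 "
    "\u0627\u0644\u064A\u0648\u0645."
)
print([f"U+{ord(ch):04X}" for ch in brand])
```

A fragment that already ends in a Latin letter is returned unchanged, because its last character is itself strong LTR.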

Embeddings

The "embedding" directional formatting characters are the classical Unicode method of explicit formatting, and as of Unicode 6.3, are being discouraged in favor of "isolates". An "embedding" signals that a piece of text is to be treated as directionally distinct. The text within the scope of the embedding formatting characters is not independent of the surrounding text. Also, characters within an embedding can affect the ordering of characters outside. Unicode 6.3 recognized that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use.

Isolates

The "isolate" directional formatting characters signal that a piece of text is to be treated as directionally isolated from its surroundings. As of Unicode 6.3, these are the formatting characters that are being encouraged in new documents – once target platforms are known to support them. These formatting characters were introduced after it became apparent that directional embeddings usually have too strong an effect on their surroundings and are thus unnecessarily difficult to use. Unlike the legacy 'embedding' directional formatting characters, 'isolate' characters have no effect on the ordering of the text outside their scope. Isolates can be nested, and may be placed within embeddings and overrides.
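A common use of isolates is wrapping text of unknown direction, such as a user name, before inserting it into a sentence. The Python sketch below assumes, following the character table in this article, that U+2069 POP DIRECTIONAL ISOLATE closes an isolate; the function names `isolate` and `greet` are hypothetical.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text, direction=None):
    """Wrap text in an isolate; direction is 'ltr', 'rtl' or None (first strong)."""
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI

def greet(user_name):
    """Insert a name of unknown direction without disturbing the sentence around it."""
    return "Hello, " + isolate(user_name) + "!"

print([f"U+{ord(ch):04X}" for ch in greet("\u05E9\u05E8\u05D4")])
```

With no direction given, the sketch uses FSI, which leaves the choice of direction to the first strong character inside the isolate.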

Overrides

The "override" directional formatting characters allow for special cases, such as for part numbers (e.g. to force a part number made of mixed English, digits and Hebrew letters to be written from right to left), and are recommended to be avoided wherever possible. As is true of the other directional formatting characters, "overrides" can be nested one inside another, and in embeddings and isolates.
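The part-number case can be sketched in Python as follows. The sample part number and the function names `force_rtl` and `strip_override` are made up for illustration. The sketch shows that an override changes only how the enclosed characters are displayed: their stored order is untouched.

```python
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def force_rtl(text):
    """Ask the renderer to lay out every enclosed character right to left."""
    return RLO + text + PDF

def strip_override(text):
    """Remove the wrapper again, recovering the stored characters."""
    return text.replace(RLO, "").replace(PDF, "")

# Hypothetical part number mixing Latin letters, digits and a Hebrew letter.
part_number = "AB12\u05D0"
wrapped = force_rtl(part_number)

# The logical (stored) order of the part number is unchanged by the override.
print(strip_override(wrapped) == part_number)
```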

Using Unicode to override

U+202D LEFT-TO-RIGHT OVERRIDE forces the characters that follow it to be treated as strong left-to-right characters, so that they are displayed from left to right whatever their own direction. Similarly, U+202E RIGHT-TO-LEFT OVERRIDE forces the characters that follow it to be treated as strong right-to-left characters. Refer to the Unicode Bidirectional Algorithm.

Pops

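The "pop" directional formatting characters terminate the scope of the most recent explicit formatting character. U+202C POP DIRECTIONAL FORMATTING (PDF) terminates the scope of the most recent "embedding" or "override", while U+2069 POP DIRECTIONAL ISOLATE (PDI) terminates the scope of the most recent "isolate".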

Runs

In the algorithm, each sequence of concatenated strong characters is called a "run". A "weak" character that is located between two "strong" characters with the same orientation will inherit their orientation. A "weak" character that is located between two "strong" characters with a different writing direction will inherit the main context's writing direction (in an LTR document the character will become LTR, in an RTL document, it will become RTL).
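The rule described above can be written down as a small Python sketch. It is a deliberate simplification that covers only this one rule: the full algorithm in UAX #9 has many more steps, for example for numbers and explicit formatting characters. The function names are hypothetical.

```python
import unicodedata

def strong_direction(ch):
    """Return 'L' or 'R' for a strong character, or None for anything else."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "L"
    if bidi_class in ("R", "AL"):
        return "R"
    return None

def resolve_at(text, index, base):
    """Direction of the non-strong character at text[index]; base is 'L' or 'R'."""
    # Nearest strong direction on each side, falling back to the base direction.
    before = next(
        (d for d in map(strong_direction, reversed(text[:index])) if d), base
    )
    after = next(
        (d for d in map(strong_direction, text[index + 1:]) if d), base
    )
    # Same direction on both sides: inherit it. Otherwise use the base direction.
    return before if before == after else base

print(resolve_at("a b", 1, base="R"))        # space between two Latin letters
print(resolve_at("a \u05D0", 1, base="R"))   # space between Latin and Hebrew
```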

Table of possible BiDi character types

Bidirectional character type (Bidi_Class Unicode character property)[1]

| Type[2] | Description | Strength | Directionality | General scope | Bidi_Control character[3] |
|---|---|---|---|---|---|
| L | Left-to-Right | Strong | L-to-R | Most alphabetic and syllabic characters, Chinese characters, non-European or non-Arabic digits, LRM character, ... | U+200E LEFT-TO-RIGHT MARK (LRM) |
| R | Right-to-Left | Strong | R-to-L | Adlam, Garay, Hebrew, Mandaic, Mende Kikakui, N'Ko, Samaritan, ancient scripts like Kharoshthi and Nabataean, RLM character, ... | U+200F RIGHT-TO-LEFT MARK (RLM) |
| AL | Arabic Letter | Strong | R-to-L | Arabic, Hanifi Rohingya, Sogdian, Syriac, and Thaana alphabets, and most punctuation specific to those scripts, ALM character, ... | U+061C ARABIC LETTER MARK (ALM) |
| EN | European Number | Weak | | European digits, Eastern Arabic-Indic digits, Coptic epact numbers, ... | |
| ES | European Separator | Weak | | plus sign, minus sign, ... | |
| ET | European Number Terminator | Weak | | degree sign, currency symbols, ... | |
| AN | Arabic Number | Weak | | Arabic-Indic digits, Arabic decimal and thousands separators, Rumi digits, Hanifi Rohingya digits, ... | |
| CS | Common Number Separator | Weak | | colon, comma, full stop, no-break space, ... | |
| NSM | Nonspacing Mark | Weak | | Characters in General Categories Mark, nonspacing, and Mark, enclosing (Mn, Me) | |
| BN | Boundary Neutral | Weak | | Default ignorables, non-characters, control characters other than those explicitly given other types | |
| B | Paragraph Separator | Neutral | | paragraph separator, appropriate Newline Functions, higher-level protocol paragraph determination | |
| S | Segment Separator | Neutral | | Tabs | |
| WS | Whitespace | Neutral | | space, figure space, line separator, form feed, General Punctuation block spaces (smaller set than the Unicode whitespace list) | |
| ON | Other Neutrals | Neutral | | All other characters, including object replacement character | |
| LRE | Left-to-Right Embedding | Explicit | L-to-R | LRE character only | U+202A LEFT-TO-RIGHT EMBEDDING (LRE) |
| LRO | Left-to-Right Override | Explicit | L-to-R | LRO character only | U+202D LEFT-TO-RIGHT OVERRIDE (LRO) |
| RLE | Right-to-Left Embedding | Explicit | R-to-L | RLE character only | U+202B RIGHT-TO-LEFT EMBEDDING (RLE) |
| RLO | Right-to-Left Override | Explicit | R-to-L | RLO character only | U+202E RIGHT-TO-LEFT OVERRIDE (RLO) |
| PDF | Pop Directional Format | Explicit | | PDF character only | U+202C POP DIRECTIONAL FORMATTING (PDF) |
| LRI | Left-to-Right Isolate | Explicit | L-to-R | LRI character only | U+2066 LEFT-TO-RIGHT ISOLATE (LRI) |
| RLI | Right-to-Left Isolate | Explicit | R-to-L | RLI character only | U+2067 RIGHT-TO-LEFT ISOLATE (RLI) |
| FSI | First Strong Isolate | Explicit | | FSI character only | U+2068 FIRST STRONG ISOLATE (FSI) |
| PDI | Pop Directional Isolate | Explicit | | PDI character only | U+2069 POP DIRECTIONAL ISOLATE (PDI) |

Notes
1.^ Unicode Bidirectional Algorithm (UAX#9), As of Unicode version 16.0
2.^ Possible Bidirectional character types for character property: Bidi_Class or 'type'
3.^ Bidi_Control characters: Twelve Bidi_Control formatting characters are defined. They are invisible, and have no effect apart from directionality. Nine of them have a unique, overruling BiDi-type that is used by the algorithm. Their type is also their acronym (e.g. character 'LRE' has BiDi type 'LRE').

Security

Unicode bidirectional characters are used in the Trojan Source vulnerability.[2]

Visual Studio Code has highlighted BiDi control characters since version 1.62, released in October 2021.[3]

Visual Studio has highlighted BiDi control characters since version 17.0.3, released on December 14, 2021.[4]
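A similar check can be run on source files outside an editor. The following Python sketch is an illustration only and is not how either editor implements its highlighting; the function name `find_bidi_controls` is hypothetical. It reports every occurrence of the twelve Bidi_Control characters listed in the table above.

```python
import unicodedata

# The twelve Bidi_Control characters: ALM, LRM, RLM, the embeddings and
# overrides with PDF, and the isolates with PDI.
BIDI_CONTROLS = {
    "\u061C", "\u200E", "\u200F",
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, column, unicodedata.name(ch)))
    return hits

sample = 'x = "a\u202Eb"\ny = 1'
for hit in find_bidi_controls(sample):
    print(hit)
```

Whether a reported character is malicious still requires human review, since the same characters are used legitimately in right-to-left text.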

Scripts using bidirectional text

Egyptian hieroglyphs

Egyptian hieroglyphs were written bidirectionally, where the signs that had a distinct "head" or "tail" faced the beginning of the line.

Chinese characters and other CJK scripts

Chinese characters can be written in either direction as well as vertically (top to bottom then right to left), especially in signs (such as plaques), but the orientation of the individual characters does not change. This can often be seen on tour buses in China, where the company name customarily runs from the front of the vehicle to its rear, that is, from right to left on the right side of the bus, and from left to right on the left side of the bus. English texts on the right side of the vehicle are also quite commonly written in reverse order.

Likewise, other CJK scripts made up of the same square characters, such as the Japanese writing system and Korean writing system, can also be written in any direction, although horizontally left-to-right, top-to-bottom and vertically top-to-bottom right-to-left are the two most common forms.

Boustrophedon

Boustrophedon is a writing style found in ancient Greek inscriptions, in Old Sabaic (an Old South Arabian language) and in Hungarian runes. This method of writing alternates direction, and usually reverses the individual characters, on each successive line.
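The alternation of direction can be mimicked on plain strings. The Python sketch below assumes that the first line runs left to right and reverses only the order of characters on every second line; it cannot mirror the individual glyphs, which the historical style usually did. The function name `boustrophedon` is hypothetical.

```python
def boustrophedon(lines):
    """Reverse the character order of every second line, starting with the second."""
    return [
        line if index % 2 == 0 else line[::-1]
        for index, line in enumerate(lines)
    ]

for line in boustrophedon(["first line", "second line", "third line"]):
    print(line)
```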

Moon type

Moon type is an embossed adaptation of the Latin alphabet invented as a tactile alphabet for the blind. Initially the text changed direction (but not character orientation) at the end of the lines. Special embossed lines connected the end of a line and the beginning of the next.[5] Around 1990, it changed to a left-to-right orientation.

References

  1. ^ "UAX #9: Unicode Bi-directional Algorithm". Unicode.org. 2018-05-09. Retrieved 2018-06-26.
  2. ^ "Trojan Source Attacks". trojansource.codes. Retrieved 17 January 2022.
  3. ^ "Visual Studio Code October 2021". code.visualstudio.com. Retrieved 11 November 2021.
  4. ^ "Visual Studio 2022 version 17.0 Release Notes". docs.microsoft.com. Retrieved 17 January 2022.
  5. ^ Moon Type for the Blind, Ramseyer Bible Collection, Kathryn A. Martin Library, University of Minnesota Duluth.
