Word-sense induction

In computational linguistics, word-sense induction (WSI), or word-sense discrimination, is an open problem in natural language processing concerned with automatically identifying the senses (i.e., meanings) of a word. Because the output of word-sense induction is a set of senses for the target word (a sense inventory), the task is closely related to word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to resolve the ambiguity of words in context.

Approaches and methods

The output of a word-sense induction algorithm is a clustering of contexts in which the target word occurs or a clustering of words related to the target word. Three main methods have been proposed in the literature:[1][2]

  • Context clustering
  • Word clustering
  • Co-occurrence graphs

Context clustering

The underlying hypothesis of this approach is that words are semantically similar if they appear in similar documents, within similar context windows, or in similar syntactic contexts.[3] Each occurrence of a target word in a corpus is represented as a context vector. These context vectors can be first-order vectors, which directly represent the context at hand. They can also be second-order vectors, which represent a context through the co-occurrence vectors of the words it contains. Under a second-order representation, two contexts of the target word count as similar if their words tend to co-occur with the same words, even when the contexts share no words themselves. The vectors are then clustered into groups, each identifying a sense of the target word. A well-known approach to context clustering is the Context-group Discrimination algorithm,[4] which is based on large matrix computation methods.
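The difference between first-order and second-order representations can be illustrated with a minimal Python sketch. It assumes a small, hypothetical table of first-order co-occurrence counts (`COOC`), which stands in for statistics gathered from a large corpus. The sketch builds a second-order vector for each context by summing the co-occurrence vectors of its words. It then clusters the contexts with a plain cosine k-means using deterministic farthest-first seeding. All names are illustrative. This is not the Context-group Discrimination algorithm itself, which operates on much larger matrices.

```python
import math
from collections import Counter

# Hypothetical first-order co-occurrence statistics, assumed to come from a
# large background corpus: word -> {co-occurring word: count}.
COOC = {
    "money":    {"finance": 3, "account": 2},
    "deposit":  {"finance": 2, "account": 3},
    "loan":     {"finance": 3, "interest": 2},
    "interest": {"finance": 2, "loan": 2},
    "river":    {"water": 3, "shore": 2},
    "fish":     {"water": 3, "river": 1},
    "boat":     {"water": 2, "shore": 2},
    "muddy":    {"water": 1, "shore": 1},
}


def second_order_vector(context_words):
    """Represent a context as the sum of its words' co-occurrence vectors."""
    vec = Counter()
    for word in context_words:
        vec.update(COOC.get(word, {}))  # words without statistics contribute nothing
    return vec


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {k: val / len(vectors) for k, val in total.items()}


def cluster_contexts(contexts, k, iterations=10):
    """Cosine k-means over second-order context vectors; returns one label per context."""
    vectors = [second_order_vector(c) for c in contexts]
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Farthest-first seeding: pick the context least similar to existing centroids.
        idx = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[idx])
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


# Hypothetical contexts of the target word "bank" (target and stopwords removed).
contexts = [
    ["loan", "interest"],
    ["money", "deposit"],
    ["river", "fish"],
    ["boat", "muddy"],
    ["deposit", "loan"],
    ["river", "boat"],
]
print(cluster_contexts(contexts, k=2))  # [0, 0, 1, 1, 0, 1]
```

The first two contexts share no word. Their second-order vectors are still similar, because in the assumed statistics `loan`, `interest`, `money` and `deposit` all co-occur with `finance`. A first-order representation would miss this similarity.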

Word clustering

Word clustering is a different approach to the induction of word senses. It clusters words that are semantically similar and can thus bear a specific meaning. Lin’s algorithm[5] is a prototypical example of word clustering. It uses syntactic dependency statistics gathered from a corpus to produce a set of words for each discovered sense of a target word.[6] Clustering By Committee (CBC)[7] also uses syntactic contexts, but it exploits a similarity matrix to encode the similarities between words and relies on the notion of committees to output the different senses of the word of interest. These approaches are hard to apply on a large scale across many domains and languages.
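A minimal Python sketch of the word-clustering idea follows. It assumes each word is described by hypothetical dependency features, weighted by precomputed mutual information. Word similarity takes the form of Lin's information-theoretic measure: the information carried by the features two words share, divided by the total information of both words' features. The greedy complete-link grouping of the target's neighbors is a simple stand-in. It is not the clustering procedure of Lin's algorithm or of CBC, and the feature names and weights are invented for illustration.

```python
# Hypothetical dependency features for each word, weighted by (assumed
# precomputed) mutual information between the word and the feature.
FEATURES = {
    "bank":        {("obj-of", "deposit"): 2.0, ("mod", "central"): 1.5,
                    ("mod", "river"): 1.8, ("subj-of", "erode"): 1.2,
                    ("obj-of", "rob"): 1.0},
    "lender":      {("obj-of", "deposit"): 1.5, ("mod", "central"): 1.0,
                    ("obj-of", "rob"): 0.5},
    "institution": {("mod", "central"): 1.2, ("obj-of", "deposit"): 0.8},
    "shore":       {("mod", "river"): 1.6, ("subj-of", "erode"): 1.4},
    "embankment":  {("mod", "river"): 1.0, ("subj-of", "erode"): 1.0},
}


def lin_similarity(w1, w2):
    """Information of the shared features over the total information of both words."""
    f1, f2 = FEATURES[w1], FEATURES[w2]
    shared = sum(f1[f] + f2[f] for f in f1.keys() & f2.keys())
    total = sum(f1.values()) + sum(f2.values())
    return shared / total if total else 0.0


def induce_senses(target, threshold=0.3):
    """Group the target's similar words into clusters, one cluster per induced sense."""
    # Visit candidate words from most to least similar to the target.
    neighbors = sorted((w for w in FEATURES if w != target),
                       key=lambda w: lin_similarity(target, w), reverse=True)
    clusters = []
    for word in neighbors:
        for cluster in clusters:
            # Complete link: join a cluster only if similar enough to every member.
            if all(lin_similarity(word, m) >= threshold for m in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])  # no compatible cluster: start a new sense
    return clusters


print(induce_senses("bank"))
# [['lender', 'institution'], ['shore', 'embankment']]
```

Both groups are similar to "bank". However, the words in one group share no features with the words in the other, so they fall into separate clusters. Each cluster can be read as one sense of the target word.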

Co-occurrence graphs

Co-occurrence graph approaches assume that the semantics of a word can be represented by a co-occurrence graph. The graph's vertices are words that co-occur with the target word, and its edges are co-occurrence relations between them. These approaches are related to word clustering methods, in which co-occurrences between words can be obtained on the basis of grammatical[8] or collocational relations.[9] HyperLex is a successful graph-based approach that identifies hubs in co-occurrence graphs, but it requires tuning a large number of parameters.[10] To deal with this issue, several graph-based algorithms built on simple graph patterns have been proposed: Curvature Clustering; Squares, Triangles and Diamonds (SquaT++); and Balanced Maximum Spanning Tree Clustering (B-MST).[11] These patterns aim to identify meanings using local structural properties of the co-occurrence graph.

Chinese Whispers is a randomized algorithm that partitions the graph vertices by iteratively transferring the mainstream message (i.e., word sense) to neighboring vertices.[12] Co-occurrence graph approaches have been shown to achieve state-of-the-art performance in standard evaluation tasks.
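The co-occurrence graph pipeline can be sketched in Python as follows. First, a graph is built from hypothetical, already tokenized contexts of the target word (stopwords assumed removed), with raw co-occurrence counts as edge weights. The graph is then partitioned with Chinese Whispers, and each resulting class is taken as one induced sense. This is a simplified sketch in two respects. Ties between classes are broken deterministically by the smallest label rather than randomly. The toy graph also consists of two disconnected cliques, whereas real co-occurrence graphs are much larger and noisier.

```python
import random
from collections import Counter, defaultdict

# Hypothetical tokenized contexts of the target "bank" (stopwords assumed removed).
CONTEXTS = [
    ["bank", "loan", "money"],
    ["money", "bank", "deposit"],
    ["deposit", "loan", "bank"],
    ["river", "bank", "water"],
    ["water", "fish", "bank"],
    ["fish", "river", "bank"],
]


def build_cooccurrence_graph(contexts, target):
    """Vertices: words co-occurring with the target; edge weights: co-occurrence counts."""
    graph = {}
    for ctx in contexts:
        words = sorted(set(ctx) - {target})
        for w in words:
            graph.setdefault(w, Counter())
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                graph[a][b] += 1
                graph[b][a] += 1
    return graph


def chinese_whispers(graph, iterations=20, seed=0):
    """Each vertex repeatedly adopts the class with the most edge weight among its neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in graph}  # every vertex starts in its own class
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)  # randomized processing order
        changed = False
        for v in nodes:
            scores = defaultdict(int)
            for u, weight in graph[v].items():
                scores[labels[u]] += weight
            if not scores:
                continue  # an isolated vertex keeps its own class
            best = max(scores.values())
            # Simplification: break ties by the smallest label instead of randomly.
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    return labels


graph = build_cooccurrence_graph(CONTEXTS, target="bank")
labels = chinese_whispers(graph)
senses = defaultdict(list)
for word, label in sorted(labels.items()):
    senses[label].append(word)
print(sorted(senses.values()))
# [['deposit', 'loan', 'money'], ['fish', 'river', 'water']]
```

The target word itself is excluded from the graph. If it were included, it would connect every vertex and blur the boundaries between senses.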

Applications

  • Word-sense induction has been shown to benefit Web information retrieval for highly ambiguous queries.[9]
  • Simple word-sense induction algorithms considerably boost Web search result clustering and improve the diversification of search results returned by search engines such as Yahoo![13]
  • Word-sense induction has been applied to enrich lexical resources such as WordNet.[14]
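To illustrate how induced senses can group search results, the following Python sketch assigns each result snippet to the sense whose word cluster it overlaps with most. Snippets that match no sense are left unassigned. The senses and snippets are hypothetical inputs, for example the output of a graph-based induction step. Bag-of-words overlap is an assumed, simplified scoring function, not the exact method evaluated in [13].

```python
# Hypothetical induced senses for the query "bank", e.g. from a graph-based WSI step.
SENSES = [
    {"loan", "money", "deposit"},   # financial sense
    {"river", "water", "fish"},     # geographical sense
]


def group_results(snippets, senses):
    """Assign each snippet to the induced sense sharing the most words with it."""
    groups = {i: [] for i in range(len(senses))}
    unassigned = []
    for snippet in snippets:
        tokens = set(snippet.lower().split())
        overlaps = [len(tokens & sense) for sense in senses]
        best = max(range(len(senses)), key=lambda i: overlaps[i])
        if overlaps[best] == 0:
            unassigned.append(snippet)  # no evidence for any induced sense
        else:
            groups[best].append(snippet)
    return groups, unassigned


results = [
    "Compare loan rates at your local bank",
    "Where fish gather near the river bank",
    "Open an account and deposit money online",
    "Bank holiday opening hours",
]
groups, unassigned = group_results(results, SENSES)
for i, members in groups.items():
    print(i, members)
print("unassigned:", unassigned)
```

Grouping results by sense is one way to make a result page cover every meaning of an ambiguous query, rather than only the dominant one.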

Software

  • SenseClusters is a freely available open source software package that performs both context clustering and word clustering.

References

  1. Navigli, R. (2009). "Word Sense Disambiguation: A Survey". ACM Computing Surveys. 41 (2): 1–69. doi:10.1145/1459352.1459355. S2CID 461624.
  2. Nasiruddin, M. (2013). A State of the Art of Word Sense Induction: A Way Towards Word Sense Disambiguation for Under-Resourced Languages. TALN-RÉCITAL 2013. Les Sables d'Olonne, France. pp. 192–205.
  3. Van de Cruys, T. (2010). "Mining for Meaning. The Extraction of Lexico-Semantic Knowledge from Text".
  4. Schütze, H. (1992). Dimensions of meaning. 1992 ACM/IEEE Conference on Supercomputing. Los Alamitos, CA: IEEE Computer Society Press. pp. 787–796. doi:10.1109/SUPERC.1992.236684.
  5. Lin, D. (1998). Automatic retrieval and clustering of similar words. 17th International Conference on Computational Linguistics (COLING). Montreal, Canada. pp. 768–774.
  6. Van de Cruys, T.; Apidianaki, M. (2011). "Latent Semantic Word Sense Induction and Disambiguation".
  7. Lin, D.; Pantel, P. (2002). Discovering word senses from text. 8th International Conference on Knowledge Discovery and Data Mining (KDD). Edmonton, Canada. pp. 613–619. CiteSeerX 10.1.1.12.6771.
  8. Widdows, D.; Dorow, B. (2002). A graph model for unsupervised lexical acquisition. 19th International Conference on Computational Linguistics (COLING). Taipei, Taiwan. pp. 1–7.
  9. Véronis, J. (2004). "Hyperlex: Lexical cartography for information retrieval". Computer Speech and Language. 18 (3): 223–252. CiteSeerX 10.1.1.66.6499. doi:10.1016/j.csl.2004.05.002.
  10. Agirre, E.; Martinez, D.; Lopez de Lacalle, O.; Soroa, A. (2006). Two graph-based algorithms for state-of-the-art WSD. 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP). Sydney, Australia. pp. 585–593.
  11. Di Marco, A.; Navigli, R. (2013). "Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction". Computational Linguistics. 39 (3): 709–754. doi:10.1162/coli_a_00148. S2CID 1775181.
  12. Biemann, C. (2006). "Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems".
  13. Navigli, R.; Crisafulli, G. (2010). Inducing Word Senses to Improve Web Search Result Clustering. 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010). Massachusetts, USA: MIT Stata Center. pp. 116–126.
  14. Nasiruddin, M.; Schwab, D.; Tchechmedjiev, A.; Sérasset, G.; Blanchon, H. (2014). Induction de sens pour enrichir des ressources lexicales (Word Sense Induction for the Enrichment of Lexical Resources). 21ème conférence sur le Traitement Automatique des Langues Naturelles (TALN 2014). Marseille, France. pp. 598–603.
