Collocation

In corpus linguistics, a collocation is a series of words or terms that co-occur more often than would be expected by chance. In phraseology, a collocation is a type of compositional phraseme, meaning that it can be understood from the words that make it up. This contrasts with an idiom, where the meaning of the whole cannot be inferred from its parts, and may be completely unrelated.

Seven main types of collocation are commonly distinguished: adjective + noun, noun + noun (such as collective nouns), noun + verb, verb + noun, adverb + adjective, verb + prepositional phrase (phrasal verbs), and verb + adverb.

Collocation extraction is a computational technique that finds collocations in a document or corpus, using methods from computational linguistics that resemble data mining.

Expanded definition

Collocations are partly or fully fixed expressions that become established through repeated context-dependent use. Such terms as crystal clear, middle management, nuclear family, and cosmetic surgery are examples of collocated pairs of words.

The words in a collocation can stand in a syntactic relation (such as verb–object: make and decision), in a lexical relation (such as antonymy), or in no linguistically defined relation at all. Knowledge of collocations is vital for the competent use of a language: a grammatically correct sentence will stand out as awkward if collocational preferences are violated. This makes collocation a common focus for language teaching.

Corpus linguists specify a key word in context (KWIC) and identify the words immediately surrounding it, to illustrate the way words are used in practice.
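
As a rough illustration of a KWIC display, the following Python sketch lists each occurrence of a key word together with a fixed window of neighbouring words. The function name kwic, the window parameter, and the sample sentence are assumptions made for this example; real concordancers also handle tokenisation, punctuation, and sorting, which are omitted here.

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, node, right context) for each hit of keyword."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            # Take up to `window` tokens on each side of the node word.
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((" ".join(left), tok, " ".join(right)))
    return lines


# Hypothetical sample text, split naively on whitespace.
text = "she had to make a decision quickly because the decision could not wait"
for left, node, right in kwic(text.split(), "decision", window=2):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>20} [{node}] {right}")
```

Aligning the node word in a single column, as the print statement does, is what lets a reader scan the recurring neighbours on either side.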

The processing of collocations involves a number of parameters, the most important of which is the measure of association, which evaluates whether a co-occurrence is purely due to chance or is statistically significant. Because language is non-random, most collocations are classed as significant, and the association scores are used simply to rank the results. Commonly used measures of association include mutual information, t-scores, and log-likelihood.[1][2]
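
To make the idea of ranking by an association measure concrete, the sketch below scores adjacent word pairs with pointwise mutual information, one common form of the mutual information measure. It is a minimal example under stated assumptions: the function name pmi_scores, the min_count cut-off, the use of the token count as the denominator for both unigram and bigram probabilities, and the sample text are all choices made for illustration, not a standard implementation.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (base 2)."""
    n = len(tokens)  # assumed denominator for all probability estimates
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        if c12 < min_count:
            continue  # PMI is unreliable for very rare pairs
        p12 = c12 / n
        p1 = unigrams[w1] / n
        p2 = unigrams[w2] / n
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p12 / (p1 * p2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus.
sample = "crystal clear water is crystal clear and the water is cold".split()
for pair, score in pmi_scores(sample):
    print(pair, round(score, 3))
```

The scores are used here only to order the candidate pairs, in line with the ranking use described above; the frequency cut-off reflects the well-known tendency of mutual information to overrate rare pairs.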

Rather than select a single definition, Gledhill[3] proposes that collocation involves at least three different perspectives: co-occurrence, a statistical view, which sees collocation as the recurrent appearance in a text of a node and its collocates;[4][5][6] construction, which sees collocation either as a correlation between a lexeme and a lexical-grammatical pattern,[7] or as a relation between a base and its collocative partners;[8] and expression, a pragmatic view of collocation as a conventional unit of expression, regardless of form.[9][10] These different perspectives contrast with the usual way of presenting collocation in phraseological studies. Traditionally speaking, collocation is explained in terms of all three perspectives at once, in a continuum:

Free combination ↔ bound collocation ↔ frozen idiom

In dictionaries

In 1933, Harold Palmer's Second Interim Report on English Collocations highlighted the importance of collocation as a key to producing natural-sounding language, for anyone learning a foreign language.[11] Thus from the 1940s onwards, information about recurrent word combinations became a standard feature of monolingual learner's dictionaries. As these dictionaries became "less word-centred and more phrase-centred",[12] more attention was paid to collocation. This trend was supported, from the beginning of the 21st century, by the availability of large text corpora and intelligent corpus-querying software, making it possible to provide a more systematic account of collocation in dictionaries. Using these tools, dictionaries such as the Macmillan English Dictionary and the Longman Dictionary of Contemporary English included boxes or panels with lists of frequent collocations.[13]

There are also a number of specialized dictionaries devoted to describing the frequent collocations in a language.[14] These include (for Spanish) Redes: Diccionario combinatorio del español contemporáneo (2004), (for French) Le Robert: Dictionnaire des combinaisons de mots (2007), and (for English) the LTP Dictionary of Selected Collocations (1997) and the Macmillan Collocations Dictionary (2010).[15]

Statistically significant collocation

Student's t-test can be used to determine whether the occurrence of a collocation in a corpus is statistically significant.[16] For a bigram $w_1 w_2$, let $P(w_1) = \frac{\#w_1}{N}$ be the unconditional probability of occurrence of $w_1$ in a corpus with size $N$, and let $P(w_2) = \frac{\#w_2}{N}$ be the unconditional probability of occurrence of $w_2$ in the corpus. The t-score for the bigram $w_1 w_2$ is calculated as:

$$t = \frac{\bar{x} - \mu}{\sqrt{\frac{s^2}{N}}}$$

where $\bar{x} = \frac{\#w_1 w_2}{N}$ is the sample mean of the occurrence of $w_1 w_2$, $\#w_1 w_2$ is the number of occurrences of $w_1 w_2$, $\mu = P(w_1) P(w_2)$ is the probability of $w_1 w_2$ under the null hypothesis that $w_1$ and $w_2$ appear independently in the text, and $s^2 = \bar{x}(1 - \bar{x}) \approx \bar{x}$ is the sample variance. With a large $N$, the t-test is equivalent to a Z-test.
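
The calculation can be sketched in Python as follows. This is an illustrative example rather than a reference implementation: it takes the sample mean as the bigram count divided by the corpus size, the expected value under independence as the product of the two unigram probabilities, and the variance as that of a Bernoulli trial. The function name bigram_t_score and the counts in the usage lines are assumptions chosen for the example.

```python
import math


def bigram_t_score(count_w1, count_w2, count_bigram, n):
    """t-score for a bigram from raw counts and corpus size n."""
    x_bar = count_bigram / n               # observed bigram probability
    mu = (count_w1 / n) * (count_w2 / n)   # expected under independence
    s2 = x_bar * (1 - x_bar)               # Bernoulli sample variance
    return (x_bar - mu) / math.sqrt(s2 / n)


# Hypothetical counts: two frequent words that co-occur only 8 times
# in a corpus of about 14 million tokens.
t = bigram_t_score(count_w1=15828, count_w2=4675, count_bigram=8, n=14307668)

# 2.576 is an assumed critical value (significance level 0.005); a score
# below it means independence cannot be rejected for this pair.
print(round(t, 4), "significant" if t > 2.576 else "not significant")
```

With these counts the observed frequency is only slightly above what independence predicts, so the score stays low even though both words are common.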

References

  1. Dunning, Ted (1993): "Accurate methods for the statistics of surprise and coincidence". Computational Linguistics 19, 1 (Mar. 1993), 61–74.
  2. Dunning, Ted (2008-03-21). "Surprise and Coincidence". blogspot.com. Archived from the original on 2012-01-20. Retrieved 2012-04-09.
  3. Gledhill C. (2000): Collocations in Science Writing, Narr, Tübingen.
  4. Firth J.R. (1957): Papers in Linguistics 1934–1951. Oxford: Oxford University Press.
  5. Sinclair J. (1996): "The Search for Units of Meaning", in Textus, IX, 75–106.
  6. Smadja F. A. & McKeown, K. R. (1990): "Automatically extracting and representing collocations for language generation", Proceedings of ACL'90, 252–259, Pittsburgh, Pennsylvania.
  7. Hunston S. & Francis G. (2000): Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English, Amsterdam, John Benjamins.
  8. Hausmann F. J. (1989): Le dictionnaire de collocations. In Hausmann F.J., Reichmann O., Wiegand H.E., Zgusta L. (eds), Wörterbücher: ein internationales Handbuch zur Lexikographie. Dictionaries. Dictionnaires. Berlin/New York: De Gruyter. 1010–1019.
  9. Moon R. (1998): Fixed Expressions and Idioms, a Corpus-Based Approach. Oxford, Oxford University Press.
  10. Frath P. & Gledhill C. (2005): "Free-Range Clusters or Frozen Chunks? Reference as a Defining Criterion for Linguistic Units", in Recherches anglaises et Nord-américaines, vol. 38: 25–43.
  11. Cowie, A.P., English Dictionaries for Foreign Learners, Oxford University Press 1999: 54–56.
  12. Bejoint, H., The Lexicography of English, Oxford University Press 2010: 318.
  13. "MED Second Edition – Key features – Macmillan". macmillandictionaries.com. Archived from the original on 2020-09-28. Retrieved 2011-08-24.
  14. Herbst, T. and Klotz, M. 'Syntagmatic and Phraseological Dictionaries' in Cowie, A.P. (Ed.) The Oxford History of English Lexicography, 2009: part 2, 234–243.
  15. "Macmillan Collocation Dictionary – How it was written – Macmillan". macmillandictionaries.com. Archived from the original on 2018-12-21. Retrieved 2011-08-24.
  16. Manning, Chris; Schütze, Hinrich (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. pp. 163–166. ISBN 0262133601.
