Ontology learning

Ontology learning (also called ontology extraction, ontology augmentation generation, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies. It includes extracting a domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them in an ontology language for easy retrieval. Because building ontologies manually is extremely labor-intensive and time-consuming, there is strong motivation to automate the process.

Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical[1] or symbolic[2][3] techniques are used to extract relation signatures, often based on pattern-based[4] or definition-based[5] hypernym extraction techniques.

Procedure

Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text.[6][7] The process is usually split into the following eight tasks, which are not all necessarily applied in every ontology learning system.

Domain terminology extraction

During the domain terminology extraction step, domain-specific terms are extracted; the following step (concept discovery) uses them to derive concepts. Relevant terms can be determined, e.g., by calculating TF-IDF values or by applying the C-value/NC-value method. The resulting list of terms has to be filtered by a domain expert. In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because synonyms share the same meaning and therefore correspond to the same concept. The most common methods for this are clustering and the application of statistical similarity measures.
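The TF-IDF ranking mentioned above can be illustrated with a minimal sketch. It assumes a plain tf × log(N/df) weighting and a hypothetical three-document toy corpus; real systems typically add stop-word handling, multi-word term detection, and a contrast corpus, none of which is shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the enzyme binds the protein and the enzyme cleaves the substrate",
    "the court heard the appeal and the judge dismissed the appeal",
    "the protein folds and the enzyme binds the ligand",
]
scores = tf_idf(corpus)
# Candidate terms for the second document, highest score first.
top_terms = sorted(scores[1], key=scores[1].get, reverse=True)[:3]
print(top_terms)
```

Words that occur in every document (here "the" and "and") receive a score of zero, which is why the ranking favors vocabulary specific to one document. The ranked list is only a set of candidates; as noted above, a domain expert still has to filter it.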

Concept discovery

In the concept discovery step, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms that were identified in the domain terminology extraction step.
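One simple way to picture this grouping is as merging terms that have been judged synonymous into sets, each set standing for one concept. The sketch below assumes the synonym pairs are already available (for example from a similarity measure) and uses a union-find structure; the function name `group_terms` and the example terms are hypothetical.

```python
def group_terms(terms, synonym_pairs):
    """Group terms into concepts, given pairs of terms judged synonymous.

    Every term in synonym_pairs must also appear in terms.
    """
    parent = {term: term for term in terms}

    def find(term):
        # Follow parent links to the representative, halving paths on the way.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in synonym_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for term in terms:
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical extracted terms and synonym judgments.
terms = ["car", "automobile", "motorcar", "physician", "doctor", "hospital"]
pairs = [("car", "automobile"), ("automobile", "motorcar"), ("physician", "doctor")]
concepts = group_terms(terms, pairs)
print(concepts)
```

Synonymy is treated as transitive here, so "car" and "motorcar" end up in the same concept even though they were never paired directly; a term without synonyms forms a concept on its own.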

Concept hierarchy derivation

In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for deriving a concept hierarchy uses lexico-syntactic patterns that indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they often occur too infrequently to yield enough sub- or supersumption relationships. Bootstrapping methods have therefore been developed, which learn such patterns automatically and so achieve broader coverage.
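The two patterns quoted above can be matched with regular expressions. The following is a minimal sketch, assuming single-word class names and lowercase matching; the function name and the sample text are hypothetical, and a realistic system would work on noun phrases from a chunker instead of bare words.

```python
import re

# The two patterns named in the text, restricted to single-word X and Y.
# The second pattern skips "that" so that "X, that is a Y" is not also
# read as "that is a Y".
PATTERNS = [
    re.compile(r"\b(\w+), that is an? (\w+)"),
    re.compile(r"\b(?!that\b)(\w+) is an? (\w+)"),
]


def extract_subclass_pairs(text):
    """Return a set of (subclass, superclass) pairs found in the text."""
    pairs = set()
    lowered = text.lower()
    for pattern in PATTERNS:
        for sub, sup in pattern.findall(lowered):
            pairs.add((sub, sup))
    return pairs


# Hypothetical sample text.
text = "A sparrow is a bird. The oak, that is a tree, grows slowly. A bird is an animal."
pairs = extract_subclass_pairs(text)
print(sorted(pairs))
```

The sketch also shows why such patterns are noisy: a sentence like "the oak is a tall tree" would yield the pair (oak, tall), which is one reason the extracted hierarchy is usually reviewed before use.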

Learning of non-taxonomic relations

In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption, e.g., works-for or located-in. There are two common approaches to this subtask. The first extracts anonymous associations, which are named appropriately in a second step. The second extracts verbs, which indicate a relationship between the entities represented by the surrounding words. The results of both approaches need to be evaluated by an ontologist to ensure accuracy.
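The second, verb-based approach can be sketched as follows. The sketch assumes a small hand-written table that maps verb phrases to relation names and treats capitalized single words as entity mentions; both are simplifications introduced here for illustration, standing in for the parser and entity recognizer a real system would use.

```python
import re

# Hypothetical mapping from verb phrases to relation names.
RELATION_PHRASES = {
    "works for": "works-for",
    "is located in": "located-in",
}

# Simplifying assumption: an entity mention is one capitalized word.
ENTITY = r"([A-Z][a-z]+)"


def extract_relations(text):
    """Return (subject, relation, object) triples found in the text."""
    triples = []
    for phrase, label in RELATION_PHRASES.items():
        pattern = re.compile(ENTITY + " " + re.escape(phrase) + " " + ENTITY)
        for subject, obj in pattern.findall(text):
            triples.append((subject, label, obj))
    return triples


# Hypothetical sample text.
triples = extract_relations("Alice works for Acme. Acme is located in Berlin.")
print(triples)
```

Each triple is a candidate relation between the concepts to which the two entities belong, and would still be reviewed by an ontologist as described above.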

Rule discovery

During rule discovery,[8] axioms (formal descriptions of concepts) are generated for the extracted concepts. This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which are afterwards combined into a concept description. This output is then evaluated by an ontologist.

Ontology population

At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples.
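As an illustration of pattern-based population with concept instances, the sketch below matches one lexico-syntactic pattern, "Ys such as A, B and C", and records A, B, and C as instances of Y. The choice of this particular pattern, the restriction to capitalized single-word instances, and the function name are assumptions made for the example.

```python
import re

# "<class word> such as <Instance>, <Instance> and <Instance>"
SUCH_AS = re.compile(r"(\w+) such as ((?:[A-Z]\w+(?:, | and | or )?)+)")


def populate(text):
    """Return {class word: [instances]} for every 'such as' match in the text."""
    instances = {}
    for class_word, listing in SUCH_AS.findall(text):
        names = [name for name in re.split(r", | and | or ", listing) if name]
        instances.setdefault(class_word.lower(), []).extend(names)
    return instances


# Hypothetical sample text.
text = (
    "Cities such as Berlin, Paris and Rome attract tourists. "
    "Rivers such as Elbe or Rhine cross several states."
)
found = populate(text)
print(found)
```

The class word is kept in the plural form in which it appears; mapping it to the corresponding concept in the ontology (for example "cities" to a City concept) is a separate step not shown here.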

Concept hierarchy extension

In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures.
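The unsupervised variant can be sketched with cosine similarity over context-word counts: the new concept is compared with every existing concept, and the most similar one is proposed as the attachment point. The context profiles below are hypothetical, and the choice of cosine similarity over raw co-occurrence counts is an assumption of this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_concept(new_context, existing_contexts):
    """Return the name of the existing concept most similar to the new one."""
    return max(
        existing_contexts,
        key=lambda name: cosine(new_context, existing_contexts[name]),
    )


# Hypothetical context-word counts for concepts already in the ontology.
existing = {
    "bird": Counter({"fly": 3, "nest": 2, "wing": 2}),
    "fish": Counter({"swim": 3, "fin": 2, "water": 2}),
}
# Hypothetical context-word counts for a new concept.
sparrow = Counter({"fly": 2, "nest": 1, "seed": 1})
print(nearest_concept(sparrow, existing))
```

Whether the new concept becomes a child or a sibling of the nearest concept is a design decision the similarity score alone does not settle; the supervised variant mentioned above would instead train a classifier to predict the attachment point.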

Frame and event detection

During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from applying support vector machines (SVMs) with kernel methods to semantic role labeling (SRL)[9] to deep semantic parsing techniques.[10]

Tools

DOG4DAG (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBO-Edit 2.1. It supports term generation, sibling generation, definition generation, and relationship induction. Because it is integrated into both editors, DOG4DAG allows ontology extension for all common ontology formats (e.g., OWL and OBO). It is limited largely to EBI and BioPortal lookup service extensions.[11]

Bibliography

  • P. Buitelaar, P. Cimiano (Eds.). Ontology Learning and Population: Bridging the Gap between Text and Knowledge, Frontiers in Artificial Intelligence and Applications, IOS Press, 2008.
  • P. Buitelaar, P. Cimiano, and B. Magnini (Eds.). Ontology Learning from Text: Methods, Evaluation and Applications, Frontiers in Artificial Intelligence and Applications, IOS Press, 2005.
  • Wong, W. (2009), "Learning Lightweight Ontologies from Text across Different Domains using the Web as Background Knowledge". Doctor of Philosophy thesis, University of Western Australia.
  • Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
  • Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926

References

  1. ^ A. Maedche and S. Staab. Learning ontologies for the semantic web. In Semantic Web Workshop 2001.
  2. ^ Roberto Navigli and Paola Velardi. Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites, Computational Linguistics, 30(2), MIT Press, 2004, pp. 151-179.
  3. ^ P. Velardi, S. Faralli, R. Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.
  4. ^ Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 539-545, Nantes, France, July 1992.
  5. ^ R. Navigli, P. Velardi. Learning Word-Class Lattices for Definition and Hypernym Extraction. Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, July 11–16, 2010, pp. 1318-1327.
  6. ^ Cimiano, Philipp; Völker, Johanna; Studer, Rudi (2006). "Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text", Information, Wissenschaft und Praxis, 57, p. 315 - 320, http://people.aifb.kit.edu/pci/Publications/iwp06.pdf (retrieved: 18.06.2012).
  7. ^ Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
  8. ^ Johanna Völker; Pascal Hitzler; Cimiano, Philipp (2007). "Acquisition of OWL DL Axioms from Lexical Resources", Proceedings of the 4th European conference on The Semantic Web, p. 670 - 685, http://smartweb.dfki.de/Vortraege/lexo_2007.pdf (retrieved: 18.06.2012).
  9. ^ Coppola B.; Gangemi A.; Gliozzo A.; Picca D.; Presutti V. (2009). "Frame Detection over the Semantic Web", Proceedings of the European Semantic Web Conference (ESWC2009), Springer, 2009.
  10. ^ Presutti V.; Draicchio F.; Gangemi A. (2012). "Knowledge extraction based on Discourse Representation Theory and Linguistic Frames", Proceedings of the Conference on Knowledge Engineering and Knowledge Management (EKAW2012), LNCS, Springer, 2012.
  11. ^ Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926 http://www.biotec.tu-dresden.de/research/schroeder/dog4dag/

