
Ontology learning

Ontology learning (also called ontology extraction, ontology augmentation generation, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies. It includes extracting a domain's terms, and the relationships between the concepts those terms represent, from a corpus of natural language text, and encoding them in an ontology language for easy retrieval. Because building ontologies manually is extremely labor-intensive and time-consuming, there is strong motivation to automate the process.

Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical[1] or symbolic[2][3] techniques are used to extract relation signatures, often based on pattern-based[4] or definition-based[5] hypernym extraction techniques.

Procedure

Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text.[6][7] The process is usually split into the following eight tasks, not all of which are necessarily applied in every ontology learning system.

Domain terminology extraction

During the domain terminology extraction step, domain-specific terms are extracted; the following step (concept discovery) uses them to derive concepts. Relevant terms can be determined, e.g., by calculating TF-IDF values or by applying the C-value/NC-value method. The resulting list of terms has to be filtered by a domain expert. In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods for this are clustering and the application of statistical similarity measures.
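The TF-IDF scoring mentioned above can be sketched as follows. This is a minimal illustration, not a production term extractor: the function names, the regex tokenizer, and the use of a separate background corpus for the IDF part are all assumptions made for the example, and the C-value/NC-value method is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens (simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_scores(domain_docs, background_docs):
    """Score every term of the domain documents with a simple TF-IDF variant.

    TF is the relative frequency of the term across all domain documents.
    IDF is log(N / df), computed over domain and background documents together,
    so terms that occur everywhere (e.g. "the") receive a score of zero.
    """
    all_docs = [tokenize(doc) for doc in domain_docs + background_docs]
    domain_tokens = [tok for doc in domain_docs for tok in tokenize(doc)]
    if not domain_tokens:
        return {}
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc))  # count each term once per document
    term_freq = Counter(domain_tokens)
    total = len(domain_tokens)
    n_docs = len(all_docs)
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }


def top_terms(domain_docs, background_docs, k=5):
    """Return the k highest-scoring candidate terms (ties broken alphabetically)."""
    scores = tf_idf_scores(domain_docs, background_docs)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

The ranked list is only a set of candidates; as the text notes, a domain expert still has to filter it.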

Concept discovery

In the concept discovery step, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms identified in the domain terminology extraction step.

Concept hierarchy derivation

In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for deriving a concept hierarchy uses lexical patterns that indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they often occur too infrequently to extract enough sub- or supersumption relationships. Bootstrapping methods have therefore been developed that learn such patterns automatically and so achieve broader coverage.
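The two patterns quoted above can be matched with plain regular expressions. The sketch below is illustrative only: it assumes single-word concepts, English sentences, and at most one match per sentence, and the names IS_A_PATTERNS, extract_is_a, and ancestors are made up for this example. A real system would match over noun phrases produced by a chunker.

```python
import re

# The two patterns from the text: "X, that is a Y" and "X is a Y".
# An optional leading determiner (a / an / the) is skipped.
IS_A_PATTERNS = [
    re.compile(r"\b(?:an?\s+|the\s+)?(\w+), that is an? (\w+)", re.IGNORECASE),
    re.compile(r"\b(?:an?\s+|the\s+)?(\w+) is an? (\w+)", re.IGNORECASE),
]


def extract_is_a(text):
    """Return a set of (subclass, superclass) pairs found in the text."""
    pairs = set()
    for sentence in re.split(r"[.!?]\s*", text):
        for pattern in IS_A_PATTERNS:
            match = pattern.search(sentence)
            if match:
                pairs.add((match.group(1).lower(), match.group(2).lower()))
                break  # the first matching pattern wins for this sentence
    return pairs


def ancestors(concept, parent_of):
    """Follow child -> parent links upwards and return the chain of superclasses."""
    chain = []
    seen = {concept}
    while concept in parent_of and parent_of[concept] not in seen:
        concept = parent_of[concept]
        chain.append(concept)
        seen.add(concept)  # guards against cycles in noisy extractions
    return chain
```

Converting the pairs with dict(pairs) gives a child-to-parent map; this keeps only one parent per concept, which is a simplification, since extracted hierarchies may contain several superclasses for one concept.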

Learning of non-taxonomic relations

In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption. Such relationships are, e.g., works-for or located-in. There are two common approaches to this subtask. The first is based on the extraction of anonymous associations, which are named appropriately in a second step. The second approach extracts verbs, which indicate a relationship between the entities represented by the surrounding words. The results of both approaches need to be evaluated by an ontologist to ensure accuracy.
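A rough approximation of the second, verb-based approach is sketched below. It assumes the entity names are already known (for example from the earlier steps) and simply takes the words between two neighbouring entity mentions as the relation label. Real systems would use a part-of-speech tagger or dependency parser to isolate the verb; the function name and the hyphenated label format are assumptions made for this example.

```python
import re


def extract_relations(sentences, entities):
    """Return (subject, relation, object) triples.

    The relation label is the text found between two consecutive mentions
    of known entities within one sentence, with spaces replaced by hyphens.
    """
    # Longest names first so that multi-word entities win over their substrings.
    names = sorted(entities, key=len, reverse=True)
    entity_re = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    triples = []
    for sentence in sentences:
        mentions = list(entity_re.finditer(sentence))
        for left, right in zip(mentions, mentions[1:]):
            between = sentence[left.end():right.start()].strip()
            if between:
                label = "-".join(between.split())
                triples.append((left.group(), label, right.group()))
    return triples
```

The output is deliberately raw: labels such as "is-located-in" would still need to be normalized and, as the text says, checked by an ontologist.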

Rule discovery

During rule discovery,[8] axioms (formal descriptions of concepts) are generated for the extracted concepts. This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which is afterwards combined into a concept description. This output is then evaluated by an ontologist.

Ontology population

At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples.
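The bootstrapping idea for collecting relation tuples can be sketched as a loop that alternates between learning patterns from known tuples and applying those patterns to find new tuples. This is a toy illustration under strong assumptions: entities are single words, a pattern is just the literal text between the two members of a tuple, and no confidence scoring is applied, so real corpora would quickly produce wrong tuples (semantic drift). All function names are hypothetical.

```python
import re


def learn_patterns(corpus, tuples):
    """Collect the literal text found between the two members of a known tuple."""
    patterns = set()
    for sentence in corpus:
        for left, right in tuples:
            match = re.search(re.escape(left) + r"(.+?)" + re.escape(right), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(corpus, patterns):
    """Find new (left, right) word pairs connected by one of the learned patterns."""
    found = set()
    for sentence in corpus:
        for infix in patterns:
            match = re.search(r"(\w+)" + re.escape(infix) + r"(\w+)", sentence)
            if match:
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(corpus, seeds, rounds=2):
    """Grow the set of relation tuples from a few seed tuples."""
    known = set(seeds)
    for _ in range(rounds):
        patterns = learn_patterns(corpus, known)
        known |= apply_patterns(corpus, patterns)
    return known
```

With a single seed such as ("Paris", "France"), the first round learns one pattern and finds tuples expressed the same way; a later round can then learn a second pattern from a newly found tuple and reach sentences the first pattern missed.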

Concept hierarchy extension

In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures.
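The unsupervised variant can be illustrated with cosine similarity over context-word counts. The sketch assumes each concept is described by a Counter of words observed near it in the corpus, and it places a new concept under the parent of its most similar existing concept. That placement rule is one possible design chosen for the example, not a method prescribed by the text, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def attach(new_concept, contexts, parent_of):
    """Return the parent under which new_concept should be inserted.

    contexts maps each concept to a Counter of its context words;
    parent_of maps each existing concept to its parent in the taxonomy.
    """
    candidates = [c for c in parent_of if c in contexts]
    best = max(candidates, key=lambda c: cosine(contexts[new_concept], contexts[c]))
    return parent_of[best]
```

For example, a new concept whose contexts overlap mostly with those of "dog" would be inserted next to "dog", under the same parent.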

Frame and event detection

During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from support vector machines (SVMs) with kernel methods, through semantic role labeling (SRL),[9] to deep semantic parsing techniques.[10]

Tools

DOG4DAG (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBO-Edit 2.1. It supports term generation, sibling generation, definition generation, and relationship induction. Integrated into both editors, it allows ontology extension for all common ontology formats (e.g., OWL and OBO). It is largely limited to EBI and BioPortal lookup service extensions.[11]

Bibliography

  • P. Buitelaar, P. Cimiano (Eds.). Ontology Learning and Population: Bridging the Gap between Text and Knowledge, Frontiers in Artificial Intelligence and Applications, IOS Press, 2008.
  • P. Buitelaar, P. Cimiano, and B. Magnini (Eds.). Ontology Learning from Text: Methods, Evaluation and Applications, Frontiers in Artificial Intelligence and Applications, IOS Press, 2005.
  • Wong, W. (2009), "Learning Lightweight Ontologies from Text across Different Domains using the Web as Background Knowledge". Doctor of Philosophy thesis, University of Western Australia.
  • Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
  • Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926

References

  1. ^ A. Maedche and S. Staab. Learning ontologies for the semantic web. In Semantic Web Workshop 2001.
  2. ^ Roberto Navigli and Paola Velardi. Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites, Computational Linguistics, 30(2), MIT Press, 2004, pp. 151-179.
  3. ^ P. Velardi, S. Faralli, R. Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.
  4. ^ Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pp. 539-545, Nantes, France, July 1992.
  5. ^ R. Navigli, P. Velardi. Learning Word-Class Lattices for Definition and Hypernym Extraction. Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, July 11–16, 2010, pp. 1318-1327.
  6. ^ Cimiano, Philipp; Völker, Johanna; Studer, Rudi (2006). "Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text", Information, Wissenschaft und Praxis, 57, pp. 315-320, http://people.aifb.kit.edu/pci/Publications/iwp06.pdf (retrieved: 18.06.2012).
  7. ^ Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
  8. ^ Völker, Johanna; Hitzler, Pascal; Cimiano, Philipp (2007). "Acquisition of OWL DL Axioms from Lexical Resources", Proceedings of the 4th European Conference on The Semantic Web, pp. 670-685, http://smartweb.dfki.de/Vortraege/lexo_2007.pdf (retrieved: 18.06.2012).
  9. ^ Coppola B.; Gangemi A.; Gliozzo A.; Picca D.; Presutti V. (2009). "Frame Detection over the Semantic Web", Proceedings of the European Semantic Web Conference (ESWC2009), Springer, 2009.
  10. ^ Presutti V.; Draicchio F.; Gangemi A. (2012). "Knowledge extraction based on Discourse Representation Theory and Linguistic Frames", Proceedings of the Conference on Knowledge Engineering and Knowledge Management (EKAW2012), LNCS, Springer, 2012.
  11. ^ Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926 http://www.biotec.tu-dresden.de/research/schroeder/dog4dag/

