Multi-document summarization

Multi-document summarization is an automatic procedure that extracts information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with the information contained in a large cluster of documents. In this way, multi-document summarization systems complement news aggregators by taking the next step in coping with information overload.

Key benefits and difficulties

Multi-document summarization creates information reports that are both concise and comprehensive. With different opinions put together and outlined, every topic is described from multiple perspectives within a single document. While the goal of a brief summary is to simplify information search and save time by pointing to the most relevant source documents, a comprehensive multi-document summary should in theory itself contain the required information, limiting the need to access the original files to cases where refinement is required. In practice, it is hard to summarize multiple documents with conflicting views and biases. In fact, it is almost impossible to achieve a clear extractive summary of documents with conflicting views; abstractive summarization is the preferred avenue in this case.

Automatic summaries present information extracted from multiple sources algorithmically, without any editorial touch or subjective human intervention, which is intended to keep them free of editorial bias. Difficulties remain, however, when producing automatic extractive summaries of documents with conflicting views.
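Extracting information algorithmically from several sources can be illustrated with a minimal sketch. It is not the method of any system named in this article. It assumes, for illustration only, that words occurring in many documents of the cluster mark the shared theme, and it scores each sentence by the average document frequency of its words. The function names, the stopword list, and the naive sentence splitter are all hypothetical choices.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "on", "for", "that", "it", "was", "by", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS]


def summarize(documents, max_sentences=2):
    """Return the sentences whose words are shared by the most documents."""
    sentences = [s for doc in documents for s in split_sentences(doc)]

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))

    def score(sentence):
        words = tokenize(sentence)
        return sum(doc_freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original reading order
    return [sentences[i] for i in chosen]


docs = [
    "The storm hit the coast on Monday. Schools were closed.",
    "A storm hit the coast early Monday. Power was lost in many homes.",
    "Officials said the storm hit the coast hard. Repairs will take weeks.",
]
summary = summarize(docs, max_sentences=2)
```

Because the sketch only copies sentences verbatim, it also shows why conflicting sources are awkward for extractive methods: two contradictory sentences about the same theme would both score highly and be placed side by side without reconciliation.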

Technological challenges

The multi-document summarization task is more complex than summarizing a single document, even a long one. The difficulty arises from thematic diversity within a large set of documents. A good summarization technology aims to combine the main themes with completeness, readability, and concision. The Document Understanding Conferences,[1] conducted annually by NIST, have developed sophisticated evaluation criteria for techniques accepting the multi-document summarization challenge.

An ideal multi-document summarization system not only shortens the source texts but also presents information organized around the key aspects, so that diverse views are represented. When it succeeds, the result is an overview of a given topic. Such text compilations should also meet the basic requirements for an overview text compiled by a human. The multi-document summary quality criteria are as follows:

  • clear structure, including an outline of the main content, from which it is easy to navigate to the full text sections
  • text within sections is divided into meaningful paragraphs
  • gradual transition from more general to more specific thematic aspects
  • good readability.

The last point deserves an additional note. Care is taken to ensure that the automatic overview shows:

  • no off-topic "information noise" from the respective documents (e.g., web pages)
  • no dangling references to what is not mentioned or explained in the overview
  • no text breaks across a sentence
  • no semantic redundancy.
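The "no semantic redundancy" requirement can be approximated, very roughly, by refusing to add a sentence that overlaps too much with one already selected. The sketch below uses Jaccard similarity over word sets as a stand-in for semantic similarity. That choice, the threshold of 0.6, and the function names are assumptions made for illustration; word overlap will not catch paraphrases that use different vocabulary.

```python
import re


def word_set(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def drop_redundant(sentences, threshold=0.6):
    """Keep a sentence only if it is not too similar to any sentence kept so far."""
    kept = []
    for sentence in sentences:
        words = word_set(sentence)
        if all(jaccard(words, word_set(k)) < threshold for k in kept):
            kept.append(sentence)
    return kept


candidates = [
    "The storm hit the coast on Monday.",
    "On Monday the storm hit the coast.",
    "Schools were closed.",
]
filtered = drop_redundant(candidates)
```

Here the second candidate uses exactly the same words as the first, so it is dropped, while the third shares no words with the first and is kept.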

Real-life systems

The multi-document summarization technology is now coming of age - a view supported by a choice of advanced web-based systems that are currently available.

  • ReviewChomp presents summaries of customer reviews for any given product or service. Some products have thousands of online reviews, far more than a person can read in a reasonable time. The website itself performs the search for the product or service.
  • Ultimate Research Assistant[2] performs text mining on Internet search results to help summarize and organize them, making it easier for the user to perform online research. Specific text mining techniques used by the tool include concept extraction, text summarization, hierarchical concept clustering (e.g., automated taxonomy generation), and various visualization techniques, including tag clouds and mind maps.
  • iResearch Reporter[3] is a commercial text extraction and text summarization system. Its free demo site accepts a user-entered query, passes it on to the Google search engine, retrieves multiple relevant documents, and produces categorized, easily readable natural-language summary reports covering the documents in the retrieved set, with all extracts linked to the original documents on the Web. Its tool set covers post-processing, entity extraction, event and relationship extraction, text extraction, extract clustering, linguistic analysis, categorization rules, and text summary construction.
  • Newsblaster[4] is a system that helps users find news that is of the most interest to them. The system automatically collects, clusters, categorizes, and summarizes news from several sites on the web (CNN, Reuters, Fox News, etc.) on a daily basis, and it provides users an interface to browse the results.
  • NewsInEssence[5] may be used to retrieve and summarize a cluster of articles from the web. It can start from a URL and retrieve documents that are similar, or it can retrieve documents that match a given set of keywords. NewsInEssence also downloads news articles daily and produces news clusters from them.
  • NewsFeed Researcher[6] is a news portal performing continuous automatic summarization of documents initially clustered by news aggregators (e.g., Google News). NewsFeed Researcher is backed by a free online engine covering major events related to business, technology, U.S. and international news. The tool is also available in on-demand mode, allowing a user to build summaries on selected topics.
  • Scrape This[7] is like a search engine, but instead of providing links to the most relevant websites based on a query, it scrapes the pertinent information off of the relevant websites and provides the user with a consolidated multi-document summary, along with dictionary definitions, images, and videos.
  • JistWeb[8] is a query-specific multi-document summarizer.
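Several of the systems above are described as clustering retrieved articles before summarizing them. The following is a minimal sketch of such a clustering step, not a reconstruction of how any of these systems works. It assumes bag-of-words vectors, cosine similarity, and a simple single-pass scheme in which each document joins the first cluster whose seed document is similar enough. The threshold of 0.3 and all names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector for a text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster_documents(documents, threshold=0.3):
    """Single-pass clustering; returns lists of document indices.

    The first index in each cluster is its seed document.
    """
    vectors = [bag_of_words(d) for d in documents]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([i])
    return clusters


articles = [
    "storm hits coast homes flooded",
    "coast storm leaves homes without power",
    "team wins cup final",
    "cup final ends in win for home team",
]
groups = cluster_documents(articles)
```

Each resulting group of indices could then be passed to a summarizer as one topic cluster. A single-pass scheme depends on document order, so a production system would likely use a more robust clustering method.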

As auto-generated multi-document summaries increasingly resemble the overviews written by a human, their use of extracted text snippets may one day face copyright issues in relation to the fair use copyright concept.

Bibliography

  • Günes Erkan; Dragomir R. Radev (1 December 2004). "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization". Journal of Artificial Intelligence Research. 22: 457–479. arXiv:1109.2128. doi:10.1613/JAIR.1523. ISSN 1076-9757. Wikidata Q81312697.
  • Dragomir R. Radev, Hongyan Jing, Malgorzata Styś, and Daniel Tam. Centroid-based summarization of multiple documents. Information Processing and Management, 40:919–938, December 2004.
  • Kathleen R. McKeown and Dragomir R. Radev. Generating summaries of multiple news articles. In Proceedings, ACM Conference on Research and Development in Information Retrieval SIGIR'95, pages 74–82, Seattle, Washington, July 1995.
  • C.-Y. Lin, E. Hovy, "From single to multi-document summarization: A prototype system and its evaluation", In "Proceedings of the ACL", pp. 457–464, 2002
  • Kathleen McKeown, Rebecca J. Passonneau, David K. Elson, Ani Nenkova, Julia Hirschberg, "Do Summaries Help? A Task-Based Evaluation of Multi-Document Summarization", SIGIR’05, Salvador, Brazil, August 15–19, 2005
  • R. Barzilay, N. Elhadad, K. R. McKeown, "Inferring strategies for sentence ordering in multidocument news summarization", Journal of Artificial Intelligence Research, v. 17, pp. 35–55, 2002
  • M. Soubbotin, S. Soubbotin, "Trade-Off Between Factors Influencing Quality of the Summary", Document Understanding Workshop (DUC), Vancouver, B.C., Canada, October 9–10, 2005
  • C Ravindranath Chowdary, and P. Sreenivasa Kumar. "Esum: an efficient system for query-specific multi-document summarization." In ECIR (Advances in Information Retrieval), pp. 724–728. Springer Berlin Heidelberg, 2009.

References

  1. ^ "Document Understanding Conferences". Nlpir.nist.gov. 2014-09-09. Retrieved 2016-01-10.
  2. ^ "Generate Research Report". Ultimate Research Assistant. Retrieved 2016-01-10.
  3. ^ "iResearch Reporter service". Iresearch-reporter.com. Archived from the original on 2013-06-09. Retrieved 2016-01-10.
  4. ^ [1] Archived April 16, 2013, at the Wayback Machine
  5. ^ [2] Archived April 11, 2011, at the Wayback Machine
  6. ^ "News Feed Researcher | General Stuff". Newsfeedresearcher.com. Retrieved 2016-01-10.
  7. ^ [3] Archived September 19, 2009, at the Wayback Machine
  8. ^ [4] Archived May 29, 2013, at the Wayback Machine
