
Web query classification

Web query topic classification (or categorization) is a problem in information science. The task is to assign a Web search query to one or more predefined categories based on its topics. Its importance is underscored by the many services that Web search provides. A direct application is to provide better search result pages for users with interests in different categories. For example, users issuing the Web query "apple" might expect to see Web pages related to the fruit, or they may prefer to see products or news related to the computer company. Online advertising services can rely on query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. However, query classification is non-trivial. Unlike the documents in document classification tasks, queries submitted by Web search users are usually short and ambiguous, and their meanings evolve over time. Query topic classification is therefore much more difficult than traditional document classification.

Difficulties

Web query topic classification automatically assigns a query to one or more predefined categories. Compared with traditional document classification, several major difficulties hinder progress in Web query understanding:

Deriving an appropriate feature representation for Web queries

Many queries are short, and query terms are noisy. For example, in the KDDCUP 2005 dataset, three-word queries are the most frequent (22%), and 79% of queries have no more than four words. A user query often has multiple meanings as well: "apple" can mean a kind of fruit or a computer company, and "Java" can mean a programming language or an island in Indonesia. In the KDDCUP 2005 dataset, most queries have more than one meaning. A vector space model built only from the keywords of the query is therefore not an appropriate basis for classification.

Query-enrichment based methods[1][2] start by enriching each user query into a collection of text documents through search engines. Each query is thus represented by a pseudo-document consisting of the snippets of the top-ranked result pages retrieved by a search engine. The pseudo-documents are then classified into the target categories using a synonym-based classifier or statistical classifiers such as Naive Bayes (NB) and Support Vector Machines (SVMs).
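The enrichment idea can be illustrated with a minimal sketch. It assumes a hypothetical lookup table (FAKE_SNIPPETS) in place of a real search engine, an invented three-category training set, and a small hand-written multinomial Naive Bayes with add-one smoothing. None of these names or values come from the cited work; they only show how a short query becomes a longer pseudo-document before classification.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a search engine. A real system would query a
# search API; these snippets are invented for illustration.
FAKE_SNIPPETS = {
    "java tutorial": [
        "java programming language tutorial",
        "learn java software programming",
    ],
}


def enrich(query):
    """Turn a short query into a pseudo-document (a list of tokens)."""
    snippets = FAKE_SNIPPETS.get(query, [query])
    return " ".join(snippets).lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def scores(self, tokens):
        """Return the log-probability score of each label for the tokens."""
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)
            for token in tokens:
                logp += math.log((counts[token] + 1) / denom)
            result[label] = logp
        return result

    def predict(self, tokens):
        label_scores = self.scores(tokens)
        return max(label_scores, key=label_scores.get)


# Tiny invented training set: one labeled document per target category.
TRAIN = {
    "Computers": "computer software programming language tutorial store",
    "Food": "fruit recipes orchard cooking",
    "Travel": "island travel hotel indonesia",
}

clf = NaiveBayes()
clf.fit([text.split() for text in TRAIN.values()], list(TRAIN.keys()))

# The two-word query is classified through its eight-token pseudo-document.
predicted = clf.predict(enrich("java tutorial"))
print(predicted)
```

The classifier never sees the raw two-word query; it sees the snippet text, which shares far more vocabulary with the training documents than the query keywords alone.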

Adapting to changes of the queries and categories over time

The meanings of queries may also evolve over time, so old labeled training queries may soon become out of date and useless. Making the classifier adaptive over time is therefore an important issue. For example, the word "Barcelona" has gained a new meaning as the name of an AMD microprocessor, whereas before 2007 it referred to a city or a football club. The distribution of the meanings of this term on the Web is therefore a function of time.

The intermediate taxonomy based method[3] first builds a bridging classifier on an intermediate taxonomy, such as the Open Directory Project (ODP), in an offline mode. This classifier is then used online to map user queries to the target categories via the intermediate taxonomy. The advantage of this approach is that the bridging classifier needs to be trained only once and can then adapt to each new set of target categories and incoming queries.
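One way to picture the bridging step is as a marginalization over intermediate categories: the score of a target category is the sum, over intermediate categories, of p(target | intermediate) times p(intermediate | query). The sketch below assumes that simplified form; the cited paper's exact formulation may differ. The function name bridge, the category labels, and all probabilities are hypothetical.

```python
def bridge(p_inter_given_query, p_target_given_inter):
    """Combine offline and online estimates into target-category scores.

    p_inter_given_query: {intermediate category: p(intermediate | query)},
        assumed to come from a classifier trained once on the intermediate
        taxonomy.
    p_target_given_inter: {intermediate category: {target: p(target | intermediate)}},
        a mapping that is rebuilt whenever the target categories change.
    """
    scores = {}
    for inter, p_iq in p_inter_given_query.items():
        for target, p_ti in p_target_given_inter.get(inter, {}).items():
            scores[target] = scores.get(target, 0.0) + p_ti * p_iq
    return scores


# Invented numbers for the query "barcelona".
p_inter = {
    "Sports/Soccer": 0.5,
    "Regional/Europe/Spain": 0.3,
    "Computers/Hardware": 0.2,
}

# Invented mapping from intermediate categories to one set of target categories.
mapping = {
    "Sports/Soccer": {"Sports": 1.0},
    "Regional/Europe/Spain": {"Travel": 0.8, "Sports": 0.2},
    "Computers/Hardware": {"Computers": 1.0},
}

target_scores = bridge(p_inter, mapping)
best = max(target_scores, key=target_scores.get)
print(best, target_scores)
```

Switching to a different set of target categories only requires a new mapping; the estimates of p(intermediate | query) are reused unchanged, which is the sense in which the bridging classifier is trained only once.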

Using unlabeled query logs to help with query classification

Because manually labeled training data for query classification is expensive, an active question is how to use a very large Web search engine query log as a source of unlabeled data to aid automatic query classification. These logs record Web users' behavior when they search for information via a search engine. Over the years, query logs have become a rich resource containing Web users' knowledge about the World Wide Web.

The query clustering method[4] tries to associate related queries by clustering "session data", which contain multiple queries and click-through information from a single user interaction. It takes into account terms from the result documents that a set of queries has in common. Using query keywords together with session data is shown to be the most effective way of performing query clustering.
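A minimal sketch of combining keyword evidence with click-through evidence is shown below. It assumes a simple weighted sum of two Jaccard overlaps, one over query terms and one over clicked documents; the weight alpha, the record layout, and the document identifiers are hypothetical, and the cited work's actual similarity measure may differ.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def query_similarity(q1, q2, alpha=0.5):
    """Weighted mix of keyword overlap and clicked-document overlap."""
    keyword_sim = jaccard(q1["terms"], q2["terms"])
    click_sim = jaccard(q1["clicked"], q2["clicked"])
    return alpha * keyword_sim + (1 - alpha) * click_sim


# Invented session records: query terms plus the documents users clicked.
q_a = {"terms": ["cheap", "flights"], "clicked": {"d1", "d2"}}
q_b = {"terms": ["low", "cost", "airfare"], "clicked": {"d1", "d2", "d3"}}
q_c = {"terms": ["cheap", "laptops"], "clicked": {"d9"}}

sim_ab = query_similarity(q_a, q_b)
sim_ac = query_similarity(q_a, q_c)
print(sim_ab, sim_ac)
```

In this toy data, q_a and q_b share no keywords but lead to the same clicked documents, so they come out as more similar than q_a and q_c, which share the keyword "cheap" but no clicks.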

The selectional preference based method[5] tries to exploit association rules between query terms to help with query classification. Given the training data, the authors explore several classification approaches, including exact match using labeled data, n-gram match using labeled data, and perceptron-based classifiers. They emphasize an approach adapted from computational linguistics called selectional preferences: if x and y form a pair (x; y) and y belongs to category c, then all other pairs (x; z) headed by x belong to c. They use unlabeled query log data to mine these rules and validate the effectiveness of their approach on a set of labeled queries.
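The rule "if (x; y) and y belongs to c, then pairs headed by x belong to c" can be sketched as follows. This is an illustration under stated assumptions, not the cited authors' procedure: it swaps in plain support and confidence thresholds (min_support and min_confidence, both hypothetical) for whatever preference score the original work uses, and the query pairs and the small term lexicon are invented.

```python
from collections import Counter, defaultdict


def mine_rules(query_pairs, term_category, min_support=2, min_confidence=0.6):
    """Mine head-term rules x -> c from unlabeled (x, y) query pairs.

    term_category is a small lexicon of terms whose category is known.
    """
    head_counts = defaultdict(Counter)
    for x, y in query_pairs:
        if y in term_category:
            head_counts[x][term_category[y]] += 1

    rules = {}
    for x, counts in head_counts.items():
        category, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if hits >= min_support and hits / total >= min_confidence:
            rules[x] = category
    return rules


def classify_pair(pair, rules):
    """Label a pair (x, z) by the rule for its head x, or None if no rule."""
    return rules.get(pair[0])


# Invented pairs extracted from an unlabeled query log.
pairs = [
    ("lyrics", "beatles"),
    ("lyrics", "madonna"),
    ("lyrics", "java"),
    ("download", "java"),
]

# Invented lexicon of terms with known categories.
lexicon = {"beatles": "Music", "madonna": "Music", "java": "Computers"}

rules = mine_rules(pairs, lexicon)
label = classify_pair(("lyrics", "coldplay"), rules)
print(rules, label)
```

The tail term "coldplay" is not in the lexicon, yet the pair is still labeled through its head "lyrics"; the head "download" yields no rule because it appears only once with a known tail.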

Applications

  • Metasearch engines send a user's query to multiple search engines and blend the top results from each into one overall list. The search engine can organize the large number of Web pages in the search results according to the potential categories of the issued query, which makes navigation more convenient for Web users.
  • Vertical search, compared to general search, focuses on specific domains and addresses the particular information needs of niche audiences and professions. Once the search engine can predict the category of information a Web user is looking for, it can select an appropriate vertical search engine automatically, without forcing the user to access that vertical search engine explicitly.
  • Online advertising[6][7] aims to provide interesting advertisements to Web users during their search activities. The search engine can provide relevant advertising to Web users according to their interests, so that Web users save time and effort in searching while advertisers reduce their advertising costs.

All these services rely on understanding Web users' search intents through their Web queries.

References

  1. Shen et al. "Q2C@UST: Our Winning Solution to Query Classification". ACM SIGKDD Explorations, Volume 7, Issue 2, December 2005.
  2. Shen et al. "Query Enrichment for Web-query Classification". ACM TOIS, Volume 24, Issue 3, July 2006.
  3. Shen et al. "Building bridges for web query classification". ACM SIGIR, 2006.
  4. Wen et al. "Query Clustering Using User Logs". ACM TOIS, Volume 20, Issue 1, January 2002.
  5. Beitzel et al. "Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs". ACM TOIS, Volume 25, Issue 2, April 2007.
  6. Data Mining and Audience Intelligence for Advertising (ADKDD'07), KDD workshop, 2007.
  7. Targeting and Ranking for Online Advertising (TROA'08), WWW workshop, 2008.
