Federated search

Federated search retrieves information from a variety of sources via a search application built on top of one or more search engines.[1] A user submits a single query, which is distributed to the search engines, databases or other query engines participating in the federation. The federated search application then aggregates the results received from those sources for presentation to the user. Federated search can be used to integrate disparate information resources within a single large organization ("enterprise") or across the entire web.

Federated search, unlike distributed search, requires centralized coordination of the searchable resources. This involves both coordination of the queries transmitted to the individual search engines and fusion of the search results returned by each of them.

Purpose

Federated search came about to meet the need to search multiple disparate content sources with one query. It lets a user search multiple databases at once in real time; the system then arranges the results from the various databases into a useful form and presents them to the user.

As such, it is an information aggregation or integration approach: it provides single-point access to many information resources and typically returns the data in a standard or partially homogenized form. Other approaches include constructing an enterprise data warehouse, data lake, or data hub. Federated search issues many queries in many ways (each source is queried separately at search time), whereas the other approaches import and transform data many times, typically in overnight batch processes. Federated search provides a real-time view of all sources (to the extent they are all online and available).

In industrial search engines, such as LinkedIn, federated search is used to personalize vertical preference for ambiguous queries.[2] For instance, a user who issues a query like "machine learning" on LinkedIn could be searching for people with machine learning skills, jobs requiring machine learning skills or content about the topic. In such cases, federated search can exploit user intent (e.g., hiring, job seeking or content consuming) to personalize the order of verticals for each individual user.

Process

As described by Péter Jacsó (2004),[3] federated searching consists of (1) transforming a query and broadcasting it to a group of disparate databases or other web resources, with the appropriate syntax, (2) merging the results collected from the databases, (3) presenting them in a succinct and unified format with minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort the merged result set.
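
The four steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two in-memory sources, their `translate` functions, the document fields and the use of the URL as the duplicate key are hypothetical, and a real portal would call remote search interfaces instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources: each has its own query syntax rule and documents.
SOURCES = {
    "catalogue": {
        "translate": lambda q: q.lower(),
        "docs": [
            {"title": "Federated search in libraries", "url": "http://example.org/a", "year": 2004},
            {"title": "Cataloguing rules", "url": "http://example.org/b", "year": 1998},
        ],
    },
    "web": {
        "translate": lambda q: q.lower().strip('"'),
        "docs": [
            {"title": "Federated Search in Libraries", "url": "http://example.org/a", "year": 2004},
            {"title": "Enterprise federated search", "url": "http://example.org/c", "year": 2015},
        ],
    },
}


def search_source(name, query):
    """Stand-in for a remote call: substring match on the lower-cased title."""
    source = SOURCES[name]
    native_query = source["translate"](query)  # step 1: source-specific syntax
    return [dict(doc, source=name) for doc in source["docs"]
            if native_query in doc["title"].lower()]


def federated_search(query, sort_key="year"):
    # (1) transform the query for each source and broadcast it in parallel
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda name: search_source(name, query), SOURCES))
    # (2) merge the result lists and (3) drop duplicates, keyed here by URL
    merged = {}
    for results in result_lists:
        for hit in results:
            merged.setdefault(hit["url"], hit)
    # (4) sort the merged set, here by year, newest first
    return sorted(merged.values(), key=lambda hit: hit[sort_key], reverse=True)
```

Calling `federated_search("federated search")` on this toy data yields two hits: the record found by both sources appears once, and the newer document is listed first.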

Federated search portals, either commercial or open access, generally search public access bibliographic databases, public access Web-based library catalogues (OPACs), Web-based search engines like Google and/or open-access, government-operated or corporate data collections. These individual information sources send back to the portal's interface a list of results from the search query. The user can review this hit list. Some portals will merely screen scrape the actual database results and not directly allow a user to enter the information source's application. More sophisticated ones will de-dupe the results list by merging and removing duplicates. There are additional features available in many portals, but the basic idea is the same: to improve the accuracy and relevance of individual searches as well as reduce the amount of time required to search for resources.

This process gives federated search some key advantages over crawler-based search engines. Federated search need not place any requirements or burdens on the owners of the individual information sources, other than handling increased traffic. Federated searches are inherently as current as the individual information sources, because those sources are searched in real time.

Implementation

Figure: Federating across three search engines

One application of federated searching is the metasearch engine. However, the metasearch approach does not overcome the shortcomings of the component search engines, such as incomplete indexes. Documents that are not indexed by search engines create what is known as the deep Web, or invisible Web. Google Scholar is one example of many projects trying to address this by indexing electronic documents that search engines ignore. Moreover, the metasearch approach, like the underlying search engine technology, only works with information sources stored in electronic form.

One of the main challenges of metasearch is ensuring that the search query is compatible with the component search engines that are being federated and combined. When the search vocabulary or data model of the search system differs from that of one or more of the foreign target systems, the query must be translated for each of those target systems. This can be done using simple data-element translation or may require semantic translation. For example, if one search engine allows quoting of exact strings or n-grams and another does not, the query must be translated to be compatible with each search engine. A quoted exact-string query can be translated by breaking it down into a set of overlapping n-grams that are most likely to give the desired search results in each search engine.
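
As a rough sketch of the last step, the Python functions below split a quoted phrase into overlapping word n-grams for an engine that does not accept quoted strings. The function names, the `supports_quotes` flag and the choice of word bigrams are assumptions for illustration; how the resulting terms are combined into a native query depends on the target engine.

```python
def phrase_to_ngrams(phrase, n=2):
    """Split a (possibly quoted) phrase into overlapping word n-grams."""
    words = phrase.strip('"').split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def translate_query(query, supports_quotes, n=2):
    """Return the list of query terms to send to one target engine.

    Engines that accept quoted phrases receive the query unchanged;
    for the others, a quoted phrase is replaced by its n-grams.
    """
    is_quoted = len(query) >= 2 and query.startswith('"') and query.endswith('"')
    if supports_quotes or not is_quoted:
        return [query]
    return phrase_to_ngrams(query, n)
```

For the query `"deep web search engine"` and an engine without phrase support, the sketch produces the terms `deep web`, `web search` and `search engine`.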

Another challenge faced in the implementation of federated search engines is scalability. It is difficult to maintain the performance, that is, the response speed, of a federated search engine as it combines more and more information sources. One implementation of federated search that has begun to address this issue is WorldWideScience, hosted by the U.S. Department of Energy's Office of Scientific and Technical Information. WorldWideScience[4] is composed of more than 40 information sources, several of which are federated search portals themselves. One such portal is Science.gov,[5] which itself federates more than 30 information sources representing most of the R&D output of the U.S. Federal government. Science.gov returns its highest-ranked results to WorldWideScience, which then merges and ranks these results with the results returned by the other information sources that comprise WorldWideScience.[5] This approach of cascaded federated search enables a large number of information sources to be searched via a single query.
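
The cascading idea can be sketched in Python by giving a federation the same call signature as a single source, so that one federation can be plugged into another. This is only an illustration: the function names and the toy documents are hypothetical, and the sketch assumes that scores from different sources are directly comparable, which is often not true in practice.

```python
import heapq


def make_source(docs):
    """Leaf source over (score, title) pairs; returns its k best matches."""
    def search(query, k):
        hits = [(score, title) for score, title in docs if query in title]
        return heapq.nlargest(k, hits)
    return search


def make_federation(sources):
    """A federation is called exactly like a leaf, so federations can nest."""
    def search(query, k):
        # Each member returns only its own top k; the federation re-ranks them.
        merged = [hit for source in sources for hit in source(query, k)]
        return heapq.nlargest(k, merged)
    return search


# Hypothetical two-level cascade: `inner` is itself one source of `outer`.
inner = make_federation([
    make_source([(0.9, "solar energy"), (0.2, "solar wind")]),
    make_source([(0.5, "solar cells")]),
])
outer = make_federation([inner, make_source([(0.7, "solar flares")])])
```

Because each level forwards only its highest-ranked results, the outer federation handles at most `k` hits per member regardless of how many sources sit below it.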

Another application, Sesam, running in both Norway and Sweden, has been built on top of an open-source platform specialised for federated search solutions. Sesat,[6] an acronym for Sesam Search Application Toolkit, is a platform that provides much of the framework and functionality required for handling parallel and pipelined searches and displaying them elegantly in a user interface, allowing engineers to focus on tuning the index/database configuration.

To personalize vertical orders in federated search, the LinkedIn search engine[2] exploits the searcher's profile and recent activities to infer the searcher's intent, such as hiring, job seeking or content consuming, and then uses that intent, along with many other signals, to rank verticals so that they are personally relevant to the individual searcher.

SWIRL Search[7] is an open source federated search engine, released under the Apache 2.0 license. It includes pre-built connectors to popular open source search engines, and re-ranks results using cosine vector similarity.
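
The general idea of re-ranking by cosine similarity can be sketched as follows. This is not SWIRL's implementation: the sketch assumes simple bag-of-words term counts as vectors, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a)  # missing terms count as 0
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rerank(query, results):
    """Order result texts by similarity to the query, most similar first."""
    query_vector = Counter(query.lower().split())
    return sorted(
        results,
        key=lambda text: cosine(query_vector, Counter(text.lower().split())),
        reverse=True,
    )
```

Because the similarity is computed by the federating application itself, results from different sources are scored on one scale, independent of each source's native relevance score.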

Challenges

Federated searches present a number of significant challenges, as compared with conventional, single-source searches:

1. Passing of credentials

When federated search is performed against secure data sources, the users' credentials must be passed on to each underlying search engine, so that appropriate security is maintained. If the user has different login credentials for different systems, there must be a means to map their login ID to each search engine's security domain.[8]
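
A minimal sketch of such a mapping is shown below. The user names, source names and table layout are invented for illustration; a production system would typically rely on an identity provider or a secured credential store rather than an in-memory dictionary.

```python
# Hypothetical mapping: portal login ID -> identity in each source's security domain.
CREDENTIAL_MAP = {
    "alice": {"hr_search": "asmith", "wiki_search": "alice.smith"},
    "bob": {"wiki_search": "bob.jones"},
}


def credentials_for(user, source):
    """Return the user's identity in `source`, or None if no mapping exists."""
    return CREDENTIAL_MAP.get(user, {}).get(source)


def searchable_sources(user, sources):
    """Fail closed: query only sources where the user has a mapped identity."""
    return [s for s in sources if credentials_for(user, s) is not None]
```

Skipping unmapped sources, rather than querying them with a shared account, keeps each source's own access controls in force.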

2. Mapping results list navigators into a common form

Suppose three real-estate sites are searched, and each provides a list of hyperlinked city names that narrow the matches to a single city. Ideally these facets would be combined into one set, but that presents additional technical challenges.[9] The system also needs to understand "next page" links if it is going to allow the user to page through the combined results.

Some of this challenge of mapping to a common form can be solved if the federated resources support linked open data via RDF. Ontologies (rules) can be added to map results to common forms using that technology.
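
For the real-estate example, one simple way to combine city facets is sketched below. It assumes each site reports its facet as a mapping from label to hit count and that labels differ only in case and surrounding whitespace; both are assumptions, and summing counts will overcount listings that appear on more than one site.

```python
from collections import Counter


def merge_facets(facet_lists):
    """Combine per-source facet counts under a normalized label."""
    merged = Counter()
    for facets in facet_lists:
        for label, count in facets.items():
            merged[label.strip().casefold()] += count
    return dict(merged)


# Hypothetical facet lists returned by three sites.
site_facets = [
    {"Boston": 12, "Austin": 3},
    {"boston ": 5},
    {"AUSTIN": 1, "Denver": 2},
]
```

Labels that differ in more than spelling variants (for example "NYC" and "New York") would need a lookup table or an ontology-based mapping instead of simple normalization.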

3. Sorting and scoring results

Each web resource has its own notion of relevance score and may support some sort orders. Relevance scoring varies greatly among the "federates" in the search, so knowing how to interleave results to show the most relevant first is difficult or impossible.
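
One common heuristic, shown here only as a sketch, is to rescale each federate's scores to the range 0 to 1 before merging. It does not solve the underlying problem: it assumes that the best hit of every source is equally relevant, which may be false. The function names and data layout are hypothetical.

```python
def min_max_normalize(hits):
    """Rescale one source's (doc, raw_score) pairs so scores lie in [0, 1]."""
    scores = [score for _, score in hits]
    low, high = min(scores), max(scores)
    if high == low:
        return [(doc, 1.0) for doc, _ in hits]
    return [(doc, (score - low) / (high - low)) for doc, score in hits]


def merge_by_normalized_score(per_source_hits):
    """Merge several sources' hit lists, best normalized score first."""
    merged = []
    for hits in per_source_hits:
        if hits:  # skip sources that returned nothing
            merged.extend(min_max_normalize(hits))
    # sorted() is stable, so ties keep the order in which sources were listed
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```

With one source scoring on a 0 to 120 scale and another on a 5 to 9 scale, the merged list interleaves hits from both instead of placing every hit of the first source on top.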

4. Robust query

Federated search may have to restrict itself to the minimal set of query capabilities that are common to all federates. For example, if Google supports negation and quoted phrases but Science.gov does not, the federated search cannot support negation or quoted phrases.
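
The "minimal common set" is a set intersection, as the sketch below shows. The engine names and capability labels are hypothetical and do not describe any real search engine.

```python
# Hypothetical capability declarations for three federates.
CAPABILITIES = {
    "engine_a": {"keyword", "negation", "quoted_phrase"},
    "engine_b": {"keyword", "quoted_phrase"},
    "engine_c": {"keyword"},
}


def common_capabilities(capabilities):
    """Query features that every federate supports."""
    if not capabilities:
        return set()
    return set.intersection(*capabilities.values())


def federates_supporting(capabilities, feature):
    """Alternative policy: route a query only to federates that can run it."""
    return sorted(name for name, caps in capabilities.items() if feature in caps)
```

The second function illustrates a trade-off: instead of dropping a feature for everyone, a portal can keep it and search fewer sources when the feature is used.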

5. Availability and timeout

As the number of federates (federated sources) grows, the likelihood that one or more of them is slow or offline becomes high. The federated search must decide whether to wait for a slow response or to consider the federate offline. Without such a cutoff, response time is dictated by the slowest federate.
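
A deadline-based policy can be sketched with Python's standard `concurrent.futures` module. The two federate functions are stand-ins that simulate a fast and a slow source, and the deadline value is arbitrary; all names are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def fast_federate(query):
    """Stand-in for a source that answers immediately."""
    return [f"{query}: fast hit"]


def slow_federate(query):
    """Stand-in for a source that takes half a second to answer."""
    time.sleep(0.5)
    return [f"{query}: slow hit"]


def search_with_deadline(federates, query, deadline_s):
    """Query all federates in parallel; return what arrived before the deadline.

    Returns (results by federate name, names of federates that timed out).
    Federates that raised an exception are dropped from the results.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(federates)))
    futures = {pool.submit(fn, query): name for name, fn in federates.items()}
    done, not_done = wait(futures, timeout=deadline_s)
    pool.shutdown(wait=False)  # do not block the caller on stragglers
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    timed_out = sorted(futures[f] for f in not_done)
    return results, timed_out
```

With a deadline of 0.1 seconds, only the fast federate's hits are returned and the slow one is reported as timed out, so the user sees partial results quickly instead of waiting for every source.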

6. Development and testing within an enterprise (vs. on the public internet)

Development groups should typically not hit live production systems during regular work, much less during intensive load testing. Also, some resources are secure and should not be arbitrarily queried and exposed in development, because of privacy and security concerns. Therefore, the development, testing and performance-test environments must include installation and configuration of many sub-systems to allow safe, secure testing.

For the overall federated system to offer high availability and disaster recovery (HA/DR), every sub-system must offer HA/DR.
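
The effect on availability can be made concrete with a small calculation. It rests on two simplifying assumptions that are not stated in the text above: federates fail independently of each other, and a query counts as served only if every federate is up.

```python
import math


def combined_availability(availabilities):
    """Probability that all independent sub-systems are up at the same time."""
    return math.prod(availabilities)


# Hypothetical example: ten federates, each available 99.9% of the time.
ten_federates = combined_availability([0.999] * 10)
```

Under these assumptions, ten federates at 99.9% each give a combined availability of roughly 99.0%, about ten times the downtime of any single federate.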

Similarly, performance modeling and capacity planning for the federated system requires modeling, planning and sometimes expansion of all federates.

For all of the above reasons, within an enterprise, a data hub or data lake may be preferable, or a hybrid approach. Data hubs and lakes simplify development and access, but may incur some time lag before data is available (without special synchronizing logic). On the web, federation is more typical.

References

  1. ^ "What is Federated Search?". Coveo Blog. Coveo. 16 June 2020. Retrieved June 29, 2020.
  2. ^ a b Arya, Dhruv; Ha-Thuc, Viet; Sinha, Shakti (2015). "Personalized Federated Search at LinkedIn". Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM). pp. 1699–1702. arXiv:1602.04924. doi:10.1145/2806416.2806615. ISBN 9781450337946.
  3. ^ Jacsó, Péter (October 2004). "Thoughts About Federated Searching". Information Today. Vol. 21, no. 9.
  4. ^ WorldWideScience
  5. ^ a b Science.gov
  6. ^ "Sesat". Archived from the original on 2015-07-20. Retrieved 2019-08-17.
  7. ^ "SWIRL SEARCH". GitHub. Retrieved 2022-09-08.
  8. ^ Mapping Security Requirements to Enterprise Search
  9. ^ 20+ Differences Between Internet vs. Enterprise Search - part 1
