Federated search

Federated search retrieves information from a variety of sources via a search application built on top of one or more search engines.[1] A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user. Federated search can be used to integrate disparate information resources within a single large organization ("enterprise") or for the entire web.

Federated search, unlike distributed search, requires centralized coordination of the searchable resources. This involves both coordination of the queries transmitted to the individual search engines and fusion of the search results returned by each of them.

Purpose

Federated search came about to meet the need of searching multiple disparate content sources with one query. This allows a user to search multiple databases at once in real time, arrange the results from the various databases into a useful form and then present the results to the user.

As such, it is an information aggregation or integration approach: it provides single-point access to many information resources, and typically returns the data in a standard or partially homogenized form. Other approaches include constructing an enterprise data warehouse, data lake, or data hub. Federated search issues many queries in many ways (each source is queried separately), whereas the other approaches import and transform data many times, typically in overnight batch processes. Federated search provides a real-time view of all sources (to the extent they are all online and available).

In industrial search engines, such as LinkedIn, federated search is used to personalize vertical preference for ambiguous queries.[2] For instance, a user who issues a query like "machine learning" on LinkedIn could be looking for people with machine learning skills, jobs requiring machine learning skills, or content about the topic. In such cases, federated search can exploit user intent (e.g., hiring, job seeking or content consuming) to personalize the vertical order for each individual user.

Process

As described by Péter Jacsó (2004),[3] federated searching consists of (1) transforming a query and broadcasting it to a group of disparate databases or other web resources, with the appropriate syntax, (2) merging the results collected from the databases, (3) presenting them in a succinct and unified format with minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort the merged result set.
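The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the two source functions, their result fields (`title`, `url`, `year`) and the per-source query translators are all hypothetical stand-ins for real databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources: each accepts a query in its own syntax and
# returns a list of result dicts. Real sources would be remote services.
def search_catalog(query):
    return [{"title": "Deep Web Basics", "url": "http://a.example/1", "year": 2004}]

def search_articles(query):
    return [
        {"title": "Deep Web Basics", "url": "http://a.example/1", "year": 2004},
        {"title": "Metasearch Today", "url": "http://b.example/2", "year": 2010},
    ]

# Each entry pairs a search function with a (made-up) query translator.
SOURCES = {
    "catalog": (search_catalog, lambda q: q.upper()),
    "articles": (search_articles, lambda q: f'"{q}"'),
}

def federated_search(query, sort_key="year"):
    # (1) Transform the query per source and broadcast it in parallel.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fn, translate(query))
                   for fn, translate in SOURCES.values()]
        batches = [future.result() for future in futures]
    # (2) Merge the result lists and (3) drop duplicates, keyed here by URL.
    merged = {}
    for batch in batches:
        for hit in batch:
            merged.setdefault(hit["url"], hit)
    # (4) Sort the merged set on a caller-chosen field, newest first.
    return sorted(merged.values(), key=lambda hit: hit[sort_key], reverse=True)

results = federated_search("deep web")
```

Keying duplicates by URL is only one possible choice; sources that expose no stable identifier need a fuzzier match.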

Federated search portals, either commercial or open access, generally search public access bibliographic databases, public access Web-based library catalogues (OPACs), Web-based search engines like Google and/or open-access, government-operated or corporate data collections. These individual information sources send back to the portal's interface a list of results from the search query. The user can review this hit list. Some portals will merely screen scrape the actual database results and not directly allow a user to enter the information source's application. More sophisticated ones will de-dupe the results list by merging and removing duplicates. There are additional features available in many portals, but the basic idea is the same: to improve the accuracy and relevance of individual searches as well as reduce the amount of time required to search for resources.
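De-duplication usually depends on deciding when two hits from different sources refer to the same item. The sketch below assumes, purely for illustration, that each hit carries a `url` and a `source` field and that two URLs match when host and path agree after light normalization; real portals often need to compare titles, identifiers or other metadata instead.

```python
from urllib.parse import urlsplit

def dedupe_key(hit):
    """Build a comparison key from the URL: lowercase host without a
    leading 'www.', plus the path without a trailing slash. The scheme
    and query string are ignored, which is a simplifying assumption."""
    parts = urlsplit(hit["url"].strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return (host, parts.path.rstrip("/"))

def dedupe(hits):
    """Merge hits that share a key, remembering every source that returned them."""
    seen = {}
    for hit in hits:
        key = dedupe_key(hit)
        if key in seen:
            seen[key]["sources"].append(hit["source"])
        else:
            seen[key] = {**hit, "sources": [hit["source"]]}
    return list(seen.values())

hits = [
    {"url": "http://www.Example.org/doc/1/", "source": "a"},
    {"url": "https://example.org/doc/1", "source": "b"},
    {"url": "https://example.org/doc/2", "source": "b"},
]
unique = dedupe(hits)
```

Keeping the list of contributing sources lets the interface show where a merged hit was found.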

This process gives federated search some key advantages over crawler-based search engines. Federated search need not place any requirements or burdens on owners of the individual information sources, other than handling increased traffic. Federated searches are inherently as current as the individual information sources, as they are searched in real time.

Implementation

[Figure: Federating across three search engines]

One application of federated searching is the metasearch engine. However, the metasearch approach does not overcome the shortcomings of the component search engines, such as incomplete indexes. Documents that are not indexed by search engines create what is known as the deep Web, or invisible Web. Google Scholar is one example of many projects trying to address this, by indexing electronic documents that search engines ignore. Moreover, the metasearch approach, like the underlying search engine technology, only works with information sources stored in electronic form.

One of the main challenges of metasearch is ensuring that the search query is compatible with the component search engines that are being federated and combined. When the search vocabulary or data model of the search system differs from that of one or more of the foreign target systems, the query must be translated for each of those systems. This can be done using simple data-element translation or may require semantic translation. For example, if one search engine allows quoting of exact strings or n-grams and another does not, the query must be translated to be compatible with each search engine. A quoted exact-string query can be broken down into a set of overlapping n-grams that are most likely to give the desired search results in each search engine.
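A minimal sketch of the n-gram translation described above follows. The function names, the `supports_quotes` flag and the default n-gram size of 2 are assumptions made for the example; how the resulting sub-queries are submitted and recombined depends on the target engine.

```python
def overlapping_ngrams(phrase, n=2):
    """Split a phrase into overlapping word n-grams.
    A phrase with n or fewer words is returned as a single n-gram."""
    words = phrase.split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def translate_query(query, supports_quotes, n=2):
    """Return the list of query strings to send to one engine.
    Quoted phrases pass through unchanged when the engine supports them;
    otherwise they are replaced by their overlapping n-grams."""
    is_quoted = len(query) >= 2 and query.startswith('"') and query.endswith('"')
    if not is_quoted or supports_quotes:
        return [query]
    return overlapping_ngrams(query[1:-1], n)

for_engine_a = translate_query('"federated search engine"', supports_quotes=True)
for_engine_b = translate_query('"federated search engine"', supports_quotes=False)
```

Here `for_engine_b` contains the two bigrams "federated search" and "search engine", which together approximate the original exact phrase.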

Another challenge faced in the implementation of federated search engines is scalability. It is difficult to maintain the performance, that is, the response speed, of a federated search engine as it combines more and more information sources. One implementation of federated search that has begun to address this issue is WorldWideScience, hosted by the U.S. Department of Energy's Office of Scientific and Technical Information. WorldWideScience[4] is composed of more than 40 information sources, several of which are federated search portals themselves. One such portal is Science.gov,[5] which itself federates more than 30 information sources representing most of the R&D output of the U.S. Federal government. Science.gov returns its highest-ranked results to WorldWideScience, which then merges and ranks these results with the results returned by the other information sources that comprise WorldWideScience.[5] This approach of cascaded federated search enables a large number of information sources to be searched via a single query.
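The cascading idea can be illustrated by treating a portal as just another source that returns only its top results. The sketch below is a toy model and does not describe how WorldWideScience or Science.gov are implemented; the leaf sources, the `score` field and the `top_k` cut-offs are invented, and scores are assumed to be directly comparable across sources.

```python
def make_portal(sources, top_k):
    """Wrap several sources as one portal that returns its top_k hits.
    The returned function has the same shape as a leaf source, so
    portals can themselves be federated by a higher-level portal."""
    def search(query):
        hits = [hit for source in sources for hit in source(query)]
        return sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]
    return search

# Hypothetical leaf sources returning fixed, pre-scored hits.
def leaf_a(query):
    return [{"id": "a1", "score": 0.9}, {"id": "a2", "score": 0.1}]

def leaf_b(query):
    return [{"id": "b1", "score": 0.5}]

def leaf_c(query):
    return [{"id": "c1", "score": 0.7}]

inner_portal = make_portal([leaf_a, leaf_b], top_k=2)        # lower tier
outer_portal = make_portal([inner_portal, leaf_c], top_k=3)  # upper tier
top_ids = [hit["id"] for hit in outer_portal("energy storage")]
```

Because each tier forwards only its best hits, the upper tier handles a bounded number of results no matter how many leaf sources sit below it.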

Another application, Sesam, which runs in both Norway and Sweden, has been built on top of an open-source platform specialised for federated search solutions. Sesat,[6] an acronym for Sesam Search Application Toolkit, is a platform that provides much of the framework and functionality required for handling parallel and pipelined searches and displaying them elegantly in a user interface, allowing engineers to focus on index and database configuration tuning.

To personalize vertical orders in federated search, the LinkedIn search engine[2] exploits the searcher's profile and recent activities to infer the searcher's intent, such as hiring, job seeking or content consuming. It then uses that intent, along with many other signals, to rank verticals in an order that is personally relevant to the individual searcher.
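As a toy illustration of intent-based vertical ordering, the sketch below scores each vertical by a weighted sum over intent probabilities. It is not the model described in the cited paper: the affinity table, the intent names as dictionary keys and the scoring rule are all assumptions made for the example.

```python
VERTICALS = ("people", "jobs", "content")

# Hypothetical affinity of each intent for each vertical (made-up numbers).
INTENT_AFFINITY = {
    "hiring": {"people": 1.0, "jobs": 0.2, "content": 0.3},
    "job_seeking": {"people": 0.3, "jobs": 1.0, "content": 0.4},
    "content_consuming": {"people": 0.2, "jobs": 0.2, "content": 1.0},
}

def rank_verticals(intent_probs):
    """Order verticals by expected affinity under the given intent
    probabilities (a dict mapping intent name to probability)."""
    scores = {
        vertical: sum(prob * INTENT_AFFINITY[intent][vertical]
                      for intent, prob in intent_probs.items())
        for vertical in VERTICALS
    }
    return sorted(VERTICALS, key=lambda v: scores[v], reverse=True)

recruiter_order = rank_verticals(
    {"hiring": 0.8, "job_seeking": 0.1, "content_consuming": 0.1})
seeker_order = rank_verticals(
    {"hiring": 0.1, "job_seeking": 0.7, "content_consuming": 0.2})
```

With these made-up numbers, a likely recruiter sees the people vertical first, while a likely job seeker sees jobs first for the same query.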

SWIRL Search[7] is an open source federated search engine, released under the Apache 2.0 license. It includes pre-built connectors to popular open source search engines, and re-ranks results using cosine vector similarity.
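Re-ranking by cosine similarity can be illustrated with simple word-count vectors. This sketch is not SWIRL's implementation, which may use different text representations; the tokenization (lowercased whitespace split) and the choice to compare the query against result titles are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rerank(query, titles):
    """Sort titles by similarity to the query, most similar first."""
    query_vec = Counter(query.lower().split())
    return sorted(
        titles,
        key=lambda title: cosine(query_vec, Counter(title.lower().split())),
        reverse=True,
    )

ranked = rerank(
    "federated search",
    ["Cooking pasta", "Federated search engines", "Search tips"],
)
```

Because every result is scored against the same query with the same function, the scores are comparable across sources, unlike the relevance scores each source reports for itself.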

Challenges

Federated searches present a number of significant challenges, as compared with conventional, single-source searches:

1. Passing of credentials

When federated search is performed against secure data sources, the users' credentials must be passed on to each underlying search engine, so that appropriate security is maintained. If the user has different login credentials for different systems, there must be a means to map their login ID to each search engine's security domain.[8]
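One way to picture the mapping is a lookup table from the portal's login ID to the identity used in each security domain. The table contents, federate names and `Credential` type below are hypothetical, and a real deployment would keep secrets in a credential store rather than in source code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    username: str
    token: str

# Hypothetical mapping: portal login ID -> per-federate identity.
IDENTITY_MAP = {
    "alice@corp.example": {
        "hr_index": Credential("asmith", "token-hr"),
        "wiki": Credential("alice", "token-wiki"),
    },
}

def credentials_for(user_id, federates):
    """Return the credential to present to each federate.
    Federates with no mapped identity are omitted, so they are skipped
    rather than queried with someone else's (or elevated) privileges."""
    known = IDENTITY_MAP.get(user_id, {})
    return {name: known[name] for name in federates if name in known}

creds = credentials_for("alice@corp.example", ["hr_index", "wiki", "finance"])
```

Omitting unmapped federates is a conservative design choice: the user sees fewer results, but never results they are not entitled to.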

2. Mapping results list navigators into a common form

Suppose three real-estate sites are searched, and each provides a list of hyperlinked city names that the user can click to see matches only in that city. Ideally these facets would be combined into one set, but that presents additional technical challenges.[9] The system also needs to understand "next page" links if it is going to allow the user to page through the combined results.

Some of this challenge of mapping to a common form can be solved if the federated resources support linked open data via RDF. Ontologies (rules) can be added to map results to common forms using that technology.
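For the real-estate example, a simple (non-RDF) way to combine city facets is to normalize the labels and add up the counts. The facet dictionaries below are invented, and the normalization (trimming and case folding) is an assumption; it will not reconcile genuinely different spellings or abbreviations of the same city.

```python
from collections import Counter

def merge_facets(per_source_facets):
    """Combine facet counts from several sources into one Counter,
    keyed by a normalized label."""
    merged = Counter()
    for facets in per_source_facets:
        for label, count in facets.items():
            merged[label.strip().casefold()] += count
    return merged

combined = merge_facets([
    {"Austin": 12, "Dallas": 3},    # site 1
    {"austin ": 5},                 # site 2, inconsistent label
    {"DALLAS": 1, "Houston": 2},    # site 3
])
```

Note that summed counts overstate the number of distinct listings whenever the same property appears on more than one site.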

3. Sorting and scoring results

Each web resource has its own notion of relevance score, and may support only some sort orders. Relevance scoring varies greatly among the "federates" in the search, so knowing how to interleave results to show the most relevant first is difficult or impossible.
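A common heuristic, shown here only as a sketch, is to rescale each federate's scores to the range 0 to 1 before merging. Min-max normalization is one assumption among several possible choices, and it does not make scores truly comparable: it merely prevents a source with large raw scores from dominating. The `score` and `id` fields are hypothetical.

```python
def min_max_normalize(hits):
    """Add a 'norm' field that rescales one source's scores to [0, 1].
    If all scores are equal, every hit gets 1.0."""
    scores = [hit["score"] for hit in hits]
    low, high = min(scores), max(scores)
    span = high - low
    return [{**hit, "norm": (hit["score"] - low) / span if span else 1.0}
            for hit in hits]

def interleave(result_lists):
    """Normalize each non-empty list separately, then merge by 'norm'."""
    merged = [hit
              for hits in result_lists if hits
              for hit in min_max_normalize(hits)]
    return sorted(merged, key=lambda hit: hit["norm"], reverse=True)

source_a = [{"id": "a1", "score": 100.0}, {"id": "a2", "score": 50.0},
            {"id": "a3", "score": 0.0}]
source_b = [{"id": "b1", "score": 1.0}, {"id": "b2", "score": 0.7},
            {"id": "b3", "score": 0.0}]
ordered_ids = [hit["id"] for hit in interleave([source_a, source_b])]
```

Each source's best hit ends up tied at 1.0, which shows the limitation: normalization cannot tell whether one source's best hit is actually better than another's.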

4. Robust query

Federated search may have to restrict itself to the minimal set of query capabilities that are common to all federates. For example, if Google supports negation and quoted phrases but science.gov does not, it will be impossible for the federated search to support negation or quoted phrases across both.
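The "minimal common set" is a set intersection. In the sketch below the engine names and capability labels are made up for illustration and do not describe the features of any real search engine.

```python
# Hypothetical capability declarations for three federates.
FEDERATE_CAPABILITIES = {
    "engine_a": {"keywords", "quoted_phrase", "negation"},
    "engine_b": {"keywords"},
    "engine_c": {"keywords", "quoted_phrase"},
}

def common_capabilities(capabilities):
    """Query features supported by every federate."""
    sets = list(capabilities.values())
    return set.intersection(*sets) if sets else set()

def unsupported_features(required, capabilities):
    """Features a query needs that at least one federate cannot handle."""
    return set(required) - common_capabilities(capabilities)

shared = common_capabilities(FEDERATE_CAPABILITIES)
missing = unsupported_features({"keywords", "negation"}, FEDERATE_CAPABILITIES)
```

A portal can use the `missing` set either to reject the query or to fall back to per-federate translation for the engines that lack a feature.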

5. Availability and timeout

As the number of federates (federated sources) grows, the likelihood of one or more slow or offline federates becomes high. The federated search must decide when to consider a federate offline, or wait for a slow response. Response times will be dictated by the slowest federate of the bunch.
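A per-federate timeout lets the portal return whatever arrived in time instead of waiting for the slowest source. The following sketch uses Python's `asyncio`; the two simulated sources and the 0.2-second budget are assumptions chosen for the example.

```python
import asyncio

# Simulated federates: one answers quickly, one is far too slow.
async def fast_source(query):
    await asyncio.sleep(0.01)
    return [f"fast hit for {query}"]

async def slow_source(query):
    await asyncio.sleep(5)
    return [f"slow hit for {query}"]

async def query_with_timeout(name, search, query, timeout):
    """Run one federate; return (name, hits), or (name, None) on timeout."""
    try:
        return name, await asyncio.wait_for(search(query), timeout)
    except asyncio.TimeoutError:
        return name, None

async def federated_search(query, sources, timeout=0.2):
    """Query all federates concurrently and report which ones timed out."""
    outcomes = await asyncio.gather(
        *(query_with_timeout(name, search, query, timeout)
          for name, search in sources.items())
    )
    hits = [hit for _, result in outcomes if result is not None for hit in result]
    timed_out = [name for name, result in outcomes if result is None]
    return hits, timed_out

hits, timed_out = asyncio.run(
    federated_search("solar", {"fast": fast_source, "slow": slow_source})
)
```

With the timeout in place, total response time is bounded by the budget rather than by the slowest federate, at the cost of possibly incomplete results; reporting `timed_out` lets the interface tell the user which sources are missing.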

6. Development and testing within an enterprise (vs. on the public internet)

Development groups should typically not hit live, production systems as they do regular work, much less intensive load testing. Also, some resources are secure, and should not be arbitrarily queried and exposed in development due to privacy and security concerns. Therefore, the development, testing and performance test environments must include installation and configuration for many sub-systems to allow safe, secure testing.

For the overall federated system to offer high availability and disaster recovery (HA/DR), every sub-system must offer HA/DR as well.

Similarly, performance modeling and capacity planning for the federated system requires modeling, planning and sometimes expansion of all federates.

For all of the above reasons, within an enterprise, a data hub or data lake may be preferable, or a hybrid approach. Data hubs and lakes simplify development and access, but may incur some time lag before data is available (without special synchronizing logic). On the web, federation is more typical.

References

  1. "What is Federated Search?". Coveo Blog. Coveo. 16 June 2020. Retrieved 29 June 2020.
  2. Arya, Dhruv; Ha-Thuc, Viet; Sinha, Shakti (2015). "Personalized Federated Search at LinkedIn". Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM). pp. 1699–1702. arXiv:1602.04924. doi:10.1145/2806416.2806615. ISBN 9781450337946.
  3. Jacsó, Péter (October 2004). "Thoughts About Federated Searching". Information Today. Vol. 21, no. 9.
  4. WorldWideScience.
  5. Science.gov.
  6. "Sesat". Archived from the original on 20 July 2015. Retrieved 17 August 2019.
  7. "SWIRL SEARCH". GitHub. Retrieved 8 September 2022.
  8. "Mapping Security Requirements to Enterprise Search".
  9. "20+ Differences Between Internet vs. Enterprise Search - part 1".
