R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984[2] and has found significant use in both theoretical and applied contexts.[3] A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" (to display them in a navigation system) or "find the nearest gas station" (although not taking roads into account). The R-tree can also accelerate nearest neighbor search[4] for various distance metrics, including great-circle distance.[5]
R-tree idea
The key idea of the data structure is to group nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree; the "R" in R-tree is for rectangle. Since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle also cannot intersect any of the contained objects. At the leaf level, each rectangle describes a single object; at higher levels the aggregation includes an increasing number of objects. This can also be seen as an increasingly coarse approximation of the data set.
Similar to the B-tree, the R-tree is also a balanced search tree (so all leaf nodes are at the same depth), organizes the data in pages, and is designed for storage on disk (as used in databases). Each page can contain a maximum number of entries, often denoted as . It also guarantees a minimum fill (except for the root node), however best performance has been experienced with a minimum fill of 30%–40% of the maximum number of entries (B-trees guarantee 50% page fill, and B*-trees even 66%). The reason for this is the more complex balancing required for spatial data as opposed to linear data stored in B-trees.
As with most trees, the searching algorithms (e.g., intersection, containment, nearest neighbor search) are rather simple. The key idea is to use the bounding boxes to decide whether or not to search inside a subtree. In this way, most of the nodes in the tree are never read during a search. Like B-trees, R-trees are suitable for large data sets and databases, where nodes can be paged to memory when needed, and the whole tree cannot be kept in main memory. Even if data can be fit in memory (or cached), the R-trees in most practical applications will usually provide performance advantages over naive check of all objects when the number of objects is more than few hundred or so. However, for in-memory applications, there are similar alternatives that can provide slightly better performance or be simpler to implement in practice. [citation needed] To maintain in-memory computing for R-tree in a computer cluster where computing nodes are connected by a network, researchers have used RDMA (Remote Direct Memory Access) to implement data-intensive applications under R-tree in a distributed environment.[6] This approach is scalable for increasingly large applications and achieves high throughput and low latency performance for R-tree.
The key difficulty of R-tree is to build an efficient tree that on one hand is balanced (so the leaf nodes are at the same height) on the other hand the rectangles do not cover too much empty space and do not overlap too much (so that during search, fewer subtrees need to be processed). For example, the original idea for inserting elements to obtain an efficient tree is to always insert into the subtree that requires least enlargement of its bounding box. Once that page is full, the data is split into two sets that should cover the minimal area each. Most of the research and improvements for R-trees aims at improving the way the tree is built and can be grouped into two objectives: building an efficient tree from scratch (known as bulk-loading) and performing changes on an existing tree (insertion and deletion).
R-trees do not guarantee good worst-case performance, but generally perform well with real-world data.[7] While more of theoretical interest, the (bulk-loaded) Priority R-tree variant of the R-tree is worst-case optimal,[8] but due to the increased complexity, has not received much attention in practical applications so far.
When data is organized in an R-tree, the neighbors within a given distance r and the k nearest neighbors (for any Lp-Norm) of all points can efficiently be computed using a spatial join.[9][10] This is beneficial for many algorithms based on such queries, for example the Local Outlier Factor. DeLi-Clu,[11] Density-Link-Clustering is a cluster analysis algorithm that uses the R-tree structure for a similar kind of spatial join to efficiently compute an OPTICS clustering.
Data in R-trees is organized in pages that can have a variable number of entries (up to some pre-defined maximum, and usually above a minimum fill). Each entry within a non-leaf node stores two pieces of data: a way of identifying a child node, and the bounding box of all entries within this child node. Leaf nodes store the data required for each child, often a point or bounding box representing the child and an external identifier for the child. For point data, the leaf entries can be just the points themselves. For polygon data (that often requires the storage of large polygons) the common setup is to store only the MBR (minimum bounding rectangle) of the polygon along with a unique identifier in the tree.
Search
In range searching, the input is a search rectangle (Query box). Searching is quite similar to searching in a B+ tree. The search starts from the root node of the tree. Every internal node contains a set of rectangles and pointers to the corresponding child node and every leaf node contains the rectangles of spatial objects (the pointer to some spatial object can be there). For every rectangle in a node, it has to be decided if it overlaps the search rectangle or not. If yes, the corresponding child node has to be searched also. Searching is done like this in a recursive manner until all overlapping nodes have been traversed. When a leaf node is reached, the contained bounding boxes (rectangles) are tested against the search rectangle and their objects (if there are any) are put into the result set if they lie within the search rectangle.
For priority search such as nearest neighbor search, the query consists of a point or rectangle. The root node is inserted into the priority queue. Until the queue is empty or the desired number of results have been returned the search continues by processing the nearest entry in the queue. Tree nodes are expanded and their children reinserted. Leaf entries are returned when encountered in the queue.[12] This approach can be used with various distance metrics, including great-circle distance for geographic data.[5]
Insertion
To insert an object, the tree is traversed recursively from the root node. At each step, all rectangles in the current directory node are examined, and a candidate is chosen using a heuristic such as choosing the rectangle which requires least enlargement. The search then descends into this page, until reaching a leaf node. If the leaf node is full, it must be split before the insertion is made. Again, since an exhaustive search is too expensive, a heuristic is employed to split the node into two. Adding the newly created node to the previous level, this level can again overflow, and these overflows can propagate up to the root node; when this node also overflows, a new root node is created and the tree has increased in height.
Choosing the insertion subtree
The algorithm needs to decide in which subtree to insert. When a data object is fully contained in a single rectangle, the choice is clear. When there are multiple options or rectangles in need of enlargement, the choice can have a significant impact on the performance of the tree.
The objects are inserted into the subtree that needs the least enlargement. A Mixture heuristic is employed throughout. What happens next is it tries to minimize the overlap (in case of ties, prefer least enlargement and then least area); at the higher levels, it behaves similar to the R-tree, but on ties again preferring the subtree with smaller area. The decreased overlap of rectangles in the R*-tree is one of the key benefits over the traditional R-tree.
Splitting an overflowing node
Since redistributing all objects of a node into two nodes has an exponential number of options, a heuristic needs to be employed to find the best split. In the classic R-tree, Guttman proposed two such heuristics, called QuadraticSplit and LinearSplit. In quadratic split, the algorithm searches for the pair of rectangles that is the worst combination to have in the same node, and puts them as initial objects into the two new groups. It then searches for the entry which has the strongest preference for one of the groups (in terms of area increase) and assigns the object to this group until all objects are assigned (satisfying the minimum fill).
There are other splitting strategies such as Greene's Split,[13] the R*-tree splitting heuristic[14] (which again tries to minimize overlap, but also prefers quadratic pages) or the linear split algorithm proposed by Ang and Tan[15] (which however can produce very irregular rectangles, which are less performant for many real world range and window queries). In addition to having a more advanced splitting heuristic, the R*-tree also tries to avoid splitting a node by reinserting some of the node members, which is similar to the way a B-tree balances overflowing nodes. This was shown to also reduce overlap and thus increase tree performance.
Finally, the X-tree[16] can be seen as a R*-tree variant that can also decide to not split a node, but construct a so-called super-node containing all the extra entries, when it doesn't find a good split (in particular for high-dimensional data).
Effect of different splitting heuristics on a database with US postal districts
Guttman's quadratic split.[2] Pages in this tree overlap a lot.
Guttman's linear split.[2] Even worse structure, but also faster to construct.
Greene's split.[13] Pages overlap much less than with Guttman's strategy.
Ang-Tan linear split.[15] This strategy produces sliced pages, which often yield bad query performance.
R* tree topological split.[14] The pages overlap much less since the R*-tree tries to minimize page overlap, and the reinsertions further optimized the tree. The split strategy prefers quadratic pages, which yields better performance for common map applications.
Bulk loaded R* tree using Sort-Tile-Recursive (STR). The leaf pages do not overlap at all, and the directory pages overlap only little. This is a very efficient tree, but it requires the data to be completely known beforehand.
M-trees are similar to the R-tree, but use nested spherical pages. Splitting these pages is, however, much more complicated and pages usually overlap much more.
Deletion
Deleting an entry from a page may require updating the bounding rectangles of parent pages. However, when a page is underfull, it will not be balanced with its neighbors. Instead, the page will be dissolved and all its children (which may be subtrees, not only leaf objects) will be reinserted. If during this process the root node has a single element, the tree height can decrease.
This section needs expansion. You can help by adding to it. (October 2011)
Bulk-loading
Nearest-X: Objects are sorted by their first coordinate ("X") and then split into pages of the desired size.
Packed Hilbert R-tree: variation of Nearest-X, but sorting using the Hilbert value of the center of a rectangle instead of using the X coordinate. There is no guarantee the pages will not overlap.
Sort-Tile-Recursive (STR):[17] Another variation of Nearest-X, that estimates the total number of leaves required as , the required split factor in each dimension to achieve this as , then repeatedly splits each dimensions successively into equal sized partitions using 1-dimensional sorting. The resulting pages, if they occupy more than one page, are again bulk-loaded using the same algorithm. For point data, the leaf nodes will not overlap, and "tile" the data space into approximately equal sized pages.
Overlap Minimizing Top-down (OMT):[18] Improvement over STR using a top-down approach which minimizes overlaps between slices and improves query performance.
^Roussopoulos, N.; Kelley, S.; Vincent, F. D. R. (1995). "Nearest neighbor queries". Proceedings of the 1995 ACM SIGMOD international conference on Management of data – SIGMOD '95. p. 71. doi:10.1145/223784.223794. ISBN0897917316.
^ abSchubert, E.; Zimek, A.; Kriegel, H. P. (2013). "Geodetic Distance Queries on R-Trees for Indexing Geographic Data". Advances in Spatial and Temporal Databases. Lecture Notes in Computer Science. Vol. 8098. p. 146. doi:10.1007/978-3-642-40235-7_9. ISBN978-3-642-40234-0.
^Mengbai Xiao, Hao Wang, Liang Geng, Rubao Lee, and Xiaodong Zhang (2022). " An RDMA-enabled In-memory Computing Platform for R-tree on Clusters". ACM Transactions on Spatial Algorithms and Systems. pp. 1–26. doi:10.1145/3503513.{{cite conference}}: CS1 maint: multiple names: authors list (link)
^Böhm, Christian; Krebs, Florian (2003-09-01). "Supporting KDD Applications by the k-Nearest Neighbor Join". Database and Expert Systems Applications. Lecture Notes in Computer Science. Vol. 2736. Springer, Berlin, Heidelberg. pp. 504–516. CiteSeerX10.1.1.71.454. doi:10.1007/978-3-540-45227-0_50. ISBN9783540408062.
^Achtert, Elke; Böhm, Christian; Kröger, Peer (2006). "DeLi-Clu: Boosting Robustness, Completeness, Usability, and Efficiency of Hierarchical Clustering by a Closest Pair Ranking". In Ng, Wee Keong; Kitsuregawa, Masaru; Li, Jianzhong; Chang, Kuiyu (eds.). Advances in Knowledge Discovery and Data Mining, 10th Pacific-Asia Conference, PAKDD 2006, Singapore, April 9-12, 2006, Proceedings. Lecture Notes in Computer Science. Vol. 3918. Springer. pp. 119–128. doi:10.1007/11731139_16.
^Kuan, J.; Lewis, P. (1997). "Fast k nearest neighbour search for R-tree family". Proceedings of ICICS, 1997 International Conference on Information, Communications and Signal Processing. Theme: Trends in Information Systems Engineering and Wireless Multimedia Communications (Cat. No.97TH8237). p. 924. doi:10.1109/ICICS.1997.652114. ISBN0-7803-3676-3.
^ abGreene, D. (1989). "An implementation and performance analysis of spatial data access methods". [1989] Proceedings. Fifth International Conference on Data Engineering. pp. 606–615. doi:10.1109/ICDE.1989.47268. ISBN978-0-8186-1915-1. S2CID7957624.
^ abAng, C. H.; Tan, T. C. (1997). "New linear node splitting algorithm for R-trees". In Scholl, Michel; Voisard, Agnès (eds.). Proceedings of the 5th International Symposium on Advances in Spatial Databases (SSD '97), Berlin, Germany, July 15–18, 1997. Lecture Notes in Computer Science. Vol. 1262. Springer. pp. 337–349. doi:10.1007/3-540-63238-7_38.
Este artículo o sección necesita referencias que aparezcan en una publicación acreditada.Este aviso fue puesto el 5 de abril de 2015. Organización de Estados Centroamericanos (ODECA) Bandera Escudo Himno: La Granadera noicon Situación de Organización de Estados Centroamericanos Sede San Salvador Idiomas oficiales español Tipo Organismo regional Fundación 14 de octubre de 1951 Superficie • Total 423.016[1] km² Población • Total • Densida...
Short story by L. Sprague de CampThe Owl and the ApeShort story by L. Sprague de CampW. E. Terry's illustration ofthe story in ImaginationCountryUnited StatesLanguageEnglishGenre(s)FantasyPublicationPublished inImagination: Stories of Science and FantasyMedia typePrint (Magazine)Publication dateNovember, 1951ChronologySeriesPusadian series The Eye of Tandyla The Hungry Hercynian The Owl and the Ape is a fantasy story by American writer L. Sprague de Camp, part of his Pusadian...
هذه المقالة بحاجة لصندوق معلومات. فضلًا ساعد في تحسين هذه المقالة بإضافة صندوق معلومات مخصص إليها. هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإضافة وصلة إليها في مقالات متعلقة بها. (سبتمبر 2017) جدار الأعظمية هو جدار فاصل خرساني بنته القوات الأمريكية ...
Jargon and esoteric terms used in BDSM Most of the terms used in Wikipedia glossaries are already defined and explained within Wikipedia itself. However, lists like the following indicate where new articles need to be written and are also useful for looking up and comparing large numbers of terms together. This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sou...
Campeonato MundialAtletismo 2022 Provas de pista 100 m masc fem 200 m masc fem 400 m masc fem 800 m masc fem 1500 m masc fem 5000 m masc fem 10000 m masc fem 100 m com barreiras fem 110 m com barreiras masc 400 m com barreiras masc fem 3000 mcom obstáculos masc fem Revezamento 4×100 m masc fem Revezamento 4×400 m masc fem mis Provas de estrada Maratona masc fem 20 km marcha atlética masc fem 35 km marcha atlética masc fem Provas de campo Salto em distância masc fem Salto triplo ...
Eurovision Song Contest 2017Country GermanyNational selectionSelection processUnser Song 2017Selection date(s)9 February 2017Selected entrantLevinaSelected songPerfect LifeSelected songwriter(s)Lindsey RayLindy RobbinsDave BassettFinals performanceFinal result25th, 6 pointsGermany in the Eurovision Song Contest ◄2016 • 2017 • 2018► Germany participated in the Eurovision Song Contest 2017 with the song Perfect Life written by Lindsey Ray, Lindy Robbin...
This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Pink Cream 69 album – news · newspapers · books · scholar · JSTOR (November 2006) (Learn how and when to remove this template message) 1989 studio album by Pink Cream 69Pink Cream 69Studio album by Pink Cream 69Released2 October 1989RecordedMay–A...
This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Metropolitan University Prague – news · newspapers · books · scholar · JSTOR (December 2013) (Learn how and when to remove this template message) Metropolitan University PragueMetropolitní univerzita PrahaTypePrivateEstablished2001RectorMichal KlímaLocationPr...
2006 studio album by CalibanThe Undying DarknessStudio album by CalibanReleased4 April 2006 (2006-04-04)RecordedSeptember–October 2005StudioPrincipal Studios, Senden, GermanyGenreMetalcoreLength46:12LabelAbacus RecordingsProducerAnders FridénCaliban chronology The Split Program II(2005) The Undying Darkness(2006) The Awakening(2007) Professional ratingsReview scoresSourceRatingAllMusic[1]Blabbermouth.net6/10[2]Punknews.org[3]Sputnikmusic3.0/5&...
American actor (born 1973) This biography of a living person needs additional citations for verification. Please help by adding reliable sources. Contentious material about living persons that is unsourced or poorly sourced must be removed immediately from the article and its talk page, especially if potentially libelous.Find sources: Shawn Harrison actor – news · newspapers · books · scholar · JSTOR (September 2018) (Learn how and when to remove ...
Papiro P {\displaystyle {\mathfrak {P}}} 52Manoscritto del Nuovo Testamento P {\displaystyle {\mathfrak {P}}} 52, recto (parte anteriore)NomeRylands P52 TestoVangelo secondo Giovanni 18,31–33,37–38 Datazionefine II/inizi III secolo (precedentemente 125–175) Scritturagreca RitrovamentoEgitto ConservazioneJohn Rylands Library Editio princepsC. H. Roberts, «An Unpublished Fragment of the Fourth Gospel in the John Rylands Library» (Manchester, 1935) Dimensioneframmento CategoriaI Il Papir...
Dieser Artikel beschreibt die Luftabwehrrakete südafrikanischer Produktion. Für die militärische Bewegung siehe Umkhonto we Sizwe. Umkhonto Allgemeine Angaben Typ Flugabwehrlenkwaffe Hersteller Denel Aerospace Systems Indienststellung 2005 Technische Daten Länge 3,32 m (IR) Durchmesser 180 mm Gefechtsgewicht 135 kg (IR) Spannweite 500 mm Antrieb Erste Stufe Feststoffrakete Geschwindigkeit Mach 2 Reichweite 12 km (IR) Ausstattung Lenkung Trägheitsnavigationsplattform plus Datenlink Zielo...
American judge (1909–1993) David L. BazelonSenior Judge of the United States Court of Appeals for the District of Columbia CircuitIn officeJune 30, 1979 – February 19, 1993Chief Judge of the United States Court of Appeals for the District of Columbia CircuitIn officeOctober 9, 1962 – March 27, 1978Preceded byWilbur Kingsbury MillerSucceeded byJ. Skelly WrightJudge of the United States Court of Appeals for the District of Columbia CircuitIn officeOctober 21, 1949 ...
Church in Lincoln, United KingdomSt Hugh's ChurchSt Hugh of Lincoln ChurchSt Hugh's ChurchLocation in Lincoln53°13′50″N 0°32′09″W / 53.2305°N 0.5357°W / 53.2305; -0.5357OS grid referenceSK9784071388LocationLincolnCountryUnited KingdomDenominationRoman CatholicWebsiteStHughsLincoln.org.ukHistoryStatusActiveDedicationHugh of LincolnArchitectureFunctional statusParish churchHeritage designationGrade II listedDesignated8 July 1991[1]Architect(s)Albert ...
Strategi Solo vs Squad di Free Fire: Cara Menang Mudah!