Database index

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time that table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access to ordered records.

An index is a copy of selected columns of data from a table, designed to enable very efficient search. An index normally includes a "key" or direct link to the original row of data from which it was copied, to allow the complete row to be retrieved efficiently. Some databases extend the power of indexing by letting developers create indexes on column values that have been transformed by functions or expressions. For example, an index could be created on upper(last_name), which would store only the upper-case versions of the last_name field in the index. Another option sometimes supported is the partial index, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions.
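The expression index and partial index described above can be tried out with a small sketch. It assumes SQLite (through Python's sqlite3 module) as the engine; the table people, its active column, and the index names are hypothetical and chosen only for illustration. Other database systems use their own syntax and planner output.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO people (last_name, active) VALUES (?, ?)",
    [("Smith", 1), ("smith", 0), ("Jones", 1)],
)

# Expression index: stores upper(last_name), not the raw column value.
conn.execute("CREATE INDEX idx_people_upper_last ON people (upper(last_name))")

# Partial index: only rows with active = 1 get an index entry.
conn.execute(
    "CREATE INDEX idx_people_active_last ON people (last_name) WHERE active = 1"
)

expr_plan = query_plan(
    conn, "SELECT id FROM people WHERE upper(last_name) = 'SMITH'"
)
partial_plan = query_plan(
    conn, "SELECT id FROM people WHERE last_name = 'Smith' AND active = 1"
)
matches = conn.execute(
    "SELECT id FROM people WHERE upper(last_name) = 'SMITH' ORDER BY id"
).fetchall()
```

In this sketch the query must repeat the indexed expression, and the condition of the partial index, for the planner to consider the corresponding index.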

Usage

Support for fast lookup

Most database software includes indexing technology that enables sub-linear time lookup to improve performance, as linear search is inefficient for large databases.

Suppose a database contains N data items and one must be retrieved based on the value of one of the fields. A simple implementation retrieves and examines each item according to the test. If there is only one matching item, this can stop when it finds that single item, but if there are multiple matches, it must test everything. This means that the number of operations in the average case is O(N) or linear time. Since databases may contain many objects, and since lookup is a common operation, it is often desirable to improve performance.

An index is any data structure that improves the performance of lookup. There are many different data structures used for this purpose. There are complex design trade-offs involving lookup performance, index size, and index-update performance. Many index designs exhibit logarithmic (O(log(N))) lookup performance, and in some applications it is possible to achieve constant-time (O(1)) performance.
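The difference between linear, logarithmic and constant-time lookup can be illustrated with a minimal sketch. A sorted Python list stands in for a tree-like index and a dict stands in for a hash-based one; both are simplifying assumptions, not how a database engine stores its indexes.

```python
from bisect import bisect_left

def linear_lookup(items, target):
    """Scan every item until a match is found; returns (position, comparisons)."""
    comparisons = 0
    for pos, value in enumerate(items):
        comparisons += 1
        if value == target:
            return pos, comparisons
    return None, comparisons

def sorted_lookup(sorted_items, target):
    """Binary search over a sorted list; O(log N) comparisons."""
    pos = bisect_left(sorted_items, target)
    if pos < len(sorted_items) and sorted_items[pos] == target:
        return pos
    return None

keys = list(range(0, 200_000, 2))          # 100,000 even numbers, already sorted
hash_index = {key: pos for pos, key in enumerate(keys)}  # O(1) expected lookup

target = 199_998                           # worst case for the linear scan
linear_pos, linear_steps = linear_lookup(keys, target)
binary_pos = sorted_lookup(keys, target)
hashed_pos = hash_index.get(target)
```

All three approaches find the same position, but the linear scan examines every one of the 100,000 items, while the binary search needs only about 17 comparisons.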

Policing the database constraints

Indexes are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY and FOREIGN KEY. An index may be declared as UNIQUE, which creates an implicit constraint on the underlying table. Database systems usually implicitly create an index on a set of columns declared PRIMARY KEY, and some are capable of using an already-existing index to police this constraint. Many database systems require that both referencing and referenced sets of columns in a FOREIGN KEY constraint are indexed, thus improving performance of inserts, updates and deletes to the tables participating in the constraint.
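As a minimal sketch of an index policing a constraint, the following declares a UNIQUE index and attempts to insert a duplicate. SQLite is assumed as the engine, and the users table and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
# Declaring the index UNIQUE creates an implicit constraint on the table.
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")

conn.execute("INSERT INTO users (email) VALUES ('a@example.org')")
try:
    # The second insert finds the existing index entry and is rejected.
    conn.execute("INSERT INTO users (email) VALUES ('a@example.org')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```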

Some database systems support an EXCLUSION constraint that ensures that, for a newly inserted or updated record, a certain predicate holds for no other record. This can be used to implement a UNIQUE constraint (with equality predicate) or more complex constraints, like ensuring that no overlapping time ranges or no intersecting geometry objects would be stored in the table. An index supporting fast searching for records satisfying the predicate is required to police such a constraint.[1]

Index architecture and indexing methods

Non-clustered

The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines).

In a non-clustered index,

  • The physical order of the rows is not the same as the index order.
  • The indexed columns are typically non-primary key columns used in JOIN, WHERE, and ORDER BY clauses.

There can be more than one non-clustered index on a database table.
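The relationship between unordered row storage and a sorted non-clustered index can be sketched as follows. The list positions stand in for record pointers, and the sample rows are invented for illustration.

```python
from bisect import bisect_left

# Heap of rows in arbitrary (insertion) order: (id, name).
heap = [(42, "Plug"), (7, "Lamp"), (19, "Fuse"), (3, "Bulb")]

# Non-clustered index on id: sorted keys, each paired with the row's heap position.
index = sorted((row[0], pos) for pos, row in enumerate(heap))
index_keys = [key for key, _ in index]

def lookup(key):
    """Binary-search the index, then follow the pointer into the heap."""
    i = bisect_left(index_keys, key)
    if i < len(index) and index_keys[i] == key:
        return heap[index[i][1]]
    return None

# Reading rows in index order yields sorted output although the heap is unsorted.
rows_in_index_order = [heap[pos] for _, pos in index]
```

Because the index is separate from the rows, several such indexes on different columns could point into the same heap.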

Clustered

Clustering rearranges the data blocks into a distinct order that matches the index, so that the row data is stored in index order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items is selected.

Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s).
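The effect of physical ordering on block reads for a range query can be estimated with a rough sketch. The block size of 10 rows, the 1,000-row table and the fixed random seed are arbitrary assumptions; real engines have variable row sizes and caching that this ignores.

```python
import random

BLOCK_SIZE = 10          # rows per data block (illustrative value)
keys = list(range(1000))

# Clustered layout: the row with key k sits at position k, so blocks follow key order.
clustered_position = {k: k for k in keys}

# Unclustered layout: the same rows stored in arbitrary order.
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
unclustered_position = {k: pos for pos, k in enumerate(shuffled)}

def blocks_touched(position_of, low, high):
    """Count distinct data blocks read to fetch all keys in [low, high)."""
    return len({position_of[k] // BLOCK_SIZE for k in range(low, high)})

clustered_reads = blocks_touched(clustered_position, 100, 200)
unclustered_reads = blocks_touched(unclustered_position, 100, 200)
```

With the clustered layout, the 100 requested rows occupy exactly 10 adjacent blocks. With the shuffled layout, the same rows are scattered over many more blocks.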

Cluster

When the records of multiple tables that are frequently joined are stored together, the arrangement is called a cluster (not to be confused with the clustered index described previously). The records of the tables that share the same value of a cluster key are stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them.[2] The cluster configuration defines the data layout in the tables that are part of the cluster. A cluster can be keyed with a B-tree index or a hash table. The data block where a table record is stored is determined by the value of the cluster key.

Column order

The order that the index definition defines the columns in is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.

For example, in a phone book organized by city first, then by last name, and then by first name, in a particular city, one can easily extract the list of all phone numbers. However, it would be very tedious to find all the phone numbers for a particular last name. One would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index.

In the phone book example with a composite index created on the columns (city, last_name, first_name), if we search by giving exact values for all three fields, search time is minimal. If we provide values for city and first_name only, the search uses only the city field to retrieve all matching records, and a sequential lookup then checks each of them against first_name. So, to improve performance, one must ensure that the columns of the index are listed in the order in which queries constrain them.
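The phone book example can be reproduced as a sketch that asks the query planner which index columns it would use. SQLite is assumed as the engine; the table phone_book, the phone column, the index name and the sample values are hypothetical. The planner text shown by other systems, or by SQLite after statistics have been gathered, may differ.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_book (city TEXT, last_name TEXT, first_name TEXT, phone TEXT)"
)
conn.execute(
    "CREATE INDEX idx_phone_book ON phone_book (city, last_name, first_name)"
)

# All three indexed columns are constrained: the whole key is usable.
all_three = query_plan(
    conn,
    "SELECT phone FROM phone_book "
    "WHERE city = 'Oslo' AND last_name = 'Hansen' AND first_name = 'Kari'",
)

# The gap at last_name means only the leading city column narrows the search.
city_and_first = query_plan(
    conn,
    "SELECT phone FROM phone_book WHERE city = 'Oslo' AND first_name = 'Kari'",
)

# No constraint on the leading column: the index is not used here.
last_only = query_plan(
    conn, "SELECT phone FROM phone_book WHERE last_name = 'Hansen'"
)
```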

Applications and limitations

Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the index data structure (typically a B-tree) until the Smith entry has been found; this is much less computationally expensive than a full table scan.

Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@wikipedia.org';. This query would yield an email address for every customer whose email address ends with "@wikipedia.org", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying index data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@wikipedia.org');. This puts the wild-card at the right-most part of the query (now gro.aidepikiw@%), which the index on reverse(email_address) can satisfy.
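The idea behind the reversed index can be sketched without a database: a sorted list of reversed strings turns a suffix search into a prefix search over a contiguous range. The sample addresses and the function name ends_with are invented for illustration.

```python
from bisect import bisect_left

emails = [
    "ann@example.com",
    "bob@wikipedia.org",
    "cy@wikipedia.org",
    "dee@example.org",
]

# Index on reverse(email_address): reversed strings kept in sorted order.
reversed_index = sorted(e[::-1] for e in emails)

def ends_with(suffix):
    """Find all emails ending in `suffix` via a prefix range on the reversed index."""
    prefix = suffix[::-1]                  # '@wikipedia.org' -> 'gro.aidepikiw@'
    start = bisect_left(reversed_index, prefix)
    results = []
    for entry in reversed_index[start:]:
        if not entry.startswith(prefix):
            break                          # left the contiguous matching range
        results.append(entry[::-1])
    return sorted(results)

wikipedia_addresses = ends_with("@wikipedia.org")
```

The binary search jumps straight to the first candidate, and the loop stops at the first non-matching entry, so only the matching range is visited.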

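When wildcard characters are used on both sides of the search term, as in %wikipedia.org%, neither the ordinary index nor the reversed index on this field can be used, because the pattern fixes neither the beginning nor the end of the value. The database falls back to a sequential search, which takes time proportional to the number of rows.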

Types of indexes

Bitmap index

A bitmap index is a special kind of indexing that stores the bulk of its data as bit arrays (bitmaps) and answers most queries by performing bitwise logical operations on these bitmaps. The most commonly used indexes, such as B+ trees, are most efficient if the values they index do not repeat or repeat a small number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the sex field in a customer database usually contains at most three distinct values: male, female or unknown (not recorded). For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.
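A bitmap index can be sketched with Python integers used as bit arrays, where bit i of a bitmap is set when row i holds the value. The two sample columns are invented, and real implementations usually add compression that this sketch omits.

```python
# One value per row; row i corresponds to bit i of each bitmap.
sex = ["male", "female", "female", "unknown", "male", "female"]
country = ["NO", "NO", "SE", "SE", "NO", "SE"]

def build_bitmaps(column):
    """Map each distinct value to an integer whose set bits mark matching rows."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def rows_of(bitmap):
    """Decode a bitmap back into a list of row numbers."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]

sex_index = build_bitmaps(sex)
country_index = build_bitmaps(country)

# WHERE sex = 'female' AND country = 'SE' becomes a single bitwise AND.
female_in_se = rows_of(sex_index["female"] & country_index["SE"])

# WHERE sex = 'male' OR sex = 'unknown' becomes a bitwise OR.
male_or_unknown = rows_of(sex_index["male"] | sex_index["unknown"])
```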

Dense index

A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indices with duplicate keys, the dense index points to the first record with that key.[3]

Sparse index

A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.
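The contrast between a dense and a sparse index over the same sorted data file can be sketched as follows. Blocks of three records, the (block, slot) pointer format, and the use of a dict for the dense index are simplifying assumptions.

```python
from bisect import bisect_right

# Sorted data file split into blocks of three records: (key, payload).
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Dense index: one key -> (block, slot) entry for every record.
dense = {
    key: (b, s)
    for b, block in enumerate(blocks)
    for s, (key, _) in enumerate(block)
}

# Sparse index: one entry per block, holding the lowest key in that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    """Follow the pointer straight to the record."""
    loc = dense.get(key)
    return blocks[loc[0]][loc[1]][1] if loc is not None else None

def sparse_lookup(key):
    """Find the last block whose lowest key <= key, then scan inside it."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, payload in blocks[b]:
        if k == key:
            return payload
    return None
```

The sparse index holds three entries instead of nine, at the cost of a short scan inside the selected block.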

Reverse index

A reverse-key index reverses the key value before entering it in the index. For example, the value 24538 becomes 83542 in the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where new key values monotonically increase: the reversed keys spread consecutive inserts across the index instead of concentrating them at one end.
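The spreading effect of key reversal can be seen in a small sketch. It reverses decimal digits, as in the 24538 example; an actual implementation may reverse the key's bytes instead, so the digit-based function here is only an assumption made for readability.

```python
def reverse_key(key):
    """Reverse the decimal digits of a key, keeping the result as a string."""
    return str(key)[::-1]

sequence = list(range(24538, 24543))        # monotonically increasing keys
reversed_keys = [reverse_key(k) for k in sequence]

# Original keys share a leading digit, so they sort next to each other.
original_leading = {str(k)[0] for k in sequence}

# Reversed keys start with the fastest-changing digit and sort far apart.
reversed_leading = {k[0] for k in reversed_keys}
```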

Inverted index

An inverted index maps each content word to the documents containing it, thereby allowing full-text searches.
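A minimal inverted index can be sketched with a dictionary from words to sets of document identifiers. The three sample documents and the whitespace tokenization are illustrative assumptions; full-text engines normally add stemming, stop words and ranking.

```python
from collections import defaultdict

documents = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick dog tricks",
}

# Map each word to the set of document ids that contain it.
inverted = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        inverted[word].add(doc_id)

def search(*words):
    """Return ids of documents containing every given word."""
    postings = [inverted.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*postings)) if postings else []
```

For example, search("quick", "dog") intersects the posting sets of both words and returns only the documents that contain both.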

Primary index

The primary index contains the key fields of the table and a pointer to the non-key fields of the table. The primary index is created automatically when the table is created in the database.

Secondary index

A secondary index is used to index fields that are neither ordering fields nor key fields (there is no assurance that the file is organized on a key field or primary key field). It holds one index entry for every tuple in the data file (that is, it is a dense index), and each entry contains the value of the indexed attribute and a pointer to the block or record.

Hash index

A hash index is a commonly used type of index in data management. It is typically created on a column that contains unique values, such as a primary key or email address.

Linear hashing

Another type of index used in database systems is linear hashing.

Index implementations

Indices can be implemented using a variety of data structures. Popular indices include balanced trees, B+ trees and hashes.[4]

In Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere, as is the case with a non-clustered index.[5] Each relation can have a single clustered index and many unclustered indices.[6]

Index concurrency control

An index is typically accessed concurrently by several transactions and processes, and thus needs concurrency control. While in principle indexes can use the common database concurrency control methods, specialized concurrency control methods for indexes exist, which are applied in conjunction with the common methods for a substantial performance gain.

Covering index

In most cases, an index is used to quickly locate the data records from which the required data is read. In other words, the index is only used to locate data records in the table and not to return data.

A covering index is a special case where the index itself contains all the required data fields and can therefore answer the query without reading the table.

Consider the following table (other fields omitted):

ID   Name   Other fields
12   Plug   ...
13   Lamp   ...
14   Fuse   ...

To find the Name for ID 13, an index on (ID) is useful, but the record must still be read to get the Name. However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.
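The difference between the two indexes can be observed in a sketch that asks the planner how it would answer the same query with each one. SQLite is assumed as the engine, the table items and the index names are hypothetical, and id is deliberately not declared as a primary key so that the lookup has to go through an explicit index.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, other TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(12, "Plug", "..."), (13, "Lamp", "..."), (14, "Fuse", "...")],
)

query = "SELECT name FROM items WHERE id = 13"

# Index on (id): locates the row, but name must still be read from the table.
conn.execute("CREATE INDEX idx_items_id ON items (id)")
plan_id_only = query_plan(conn, query)

# Index on (id, name): the index entry already contains the requested field.
conn.execute("DROP INDEX idx_items_id")
conn.execute("CREATE INDEX idx_items_id_name ON items (id, name)")
plan_covering = query_plan(conn, query)

name = conn.execute(query).fetchone()[0]
```

In this sketch SQLite labels the second plan as using a covering index, which indicates that the table row is not read.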

Each covering index applies to a specific table. Queries that join or otherwise access multiple tables may use covering indexes on more than one of these tables.[7]

A covering index can dramatically speed up data retrieval but may itself be large due to the additional keys, which slows down data insertion and update. To reduce the index size, some systems allow non-key fields to be included in the index. Non-key fields are not part of the index ordering and are stored only at the leaf level, allowing for a covering index with a smaller overall size.

This can be done in SQL with CREATE INDEX my_index ON my_table (id) INCLUDE (name);.[8][9]

Standardization

No standard defines how to create indexes, because the ISO SQL Standard does not cover physical aspects. Indexes are one of the physical parts of database design, along with others such as storage (tablespaces or filegroups). RDBMS vendors all provide a CREATE INDEX syntax with some specific options that depend on their software's capabilities.

References

  1. PostgreSQL 9.1.2 Documentation: CREATE TABLE
  2. Overview of Clusters. Oracle® Database Concepts 10g Release 1 (10.1)
  3. Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom. Database Systems: The Complete Book.
  4. Gavin Powell (2006). Chapter 8: Building Fast-Performing Database Models. Wrox Publishing. ISBN 978-0-7645-7490-0.
  5. "Clustered Index Structures". SQL Server 2005 Books Online (September 2007). 4 October 2012.
  6. Daren Bieniek; Randy Dess; Mike Hotek; Javier Loria; Adam Machanic; Antonio Soto; Adolfo Wiernik (January 2006). "Chapter 4: Creating Indices". SQL Server 2005 Implementation and Management. Microsoft Press.
  7. Covering Indexes for Query Optimization
  8. "11.9. Index-Only Scans and Covering Indexes". PostgreSQL Documentation. 2023-02-09. Retrieved 2023-04-08.
  9. MikeRayMSFT. "Create indexes with included columns - SQL Server". learn.microsoft.com. Retrieved 2023-04-08.
