Set (abstract data type)

In computer science, a set is an abstract data type that can store unique values, without any particular order. It is a computer implementation of the mathematical concept of a finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

Some set data structures are designed for static or frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements, such as checking whether a given value is in the set or enumerating the values in some arbitrary order. Other variants, called dynamic or mutable sets, also allow the insertion and deletion of elements from the set.

A multiset is a generalization of a set in which an element can appear multiple times.

Type theory

In type theory, sets are generally identified with their indicator function (characteristic function): accordingly, a set of values of type $A$ may be denoted by $2^A$ or $\mathcal{P}(A)$. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by setoids.) The characteristic function $F$ of a set $S$ is defined as:

$$F(x) = \begin{cases} 1, & \text{if } x \in S \\ 0, & \text{if } x \notin S \end{cases}$$

In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional axioms imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a min(S) operation that returns the element of smallest value.

Operations

Core set-theoretical operations

One may define the operations of the algebra of sets:

  • union(S,T): returns the union of sets S and T.
  • intersection(S,T): returns the intersection of sets S and T.
  • difference(S,T): returns the difference of sets S and T.
  • subset(S,T): a predicate that tests whether the set S is a subset of set T.
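As a minimal sketch, assuming Python as the illustration language, the four operations map directly onto methods of the built-in set type. The wrapper function names simply mirror the list above; they are illustrative and not part of any library.

```python
def union(s: set, t: set) -> set:
    return s.union(t)            # equivalent to s | t

def intersection(s: set, t: set) -> set:
    return s.intersection(t)     # equivalent to s & t

def difference(s: set, t: set) -> set:
    return s.difference(t)       # elements of s that are not in t (s - t)

def subset(s: set, t: set) -> bool:
    return s.issubset(t)         # equivalent to s <= t

S, T = {1, 2, 3}, {2, 3, 4}
print(sorted(union(S, T)))         # [1, 2, 3, 4]
print(sorted(intersection(S, T)))  # [2, 3]
print(sorted(difference(S, T)))    # [1]
print(subset({2, 3}, S))           # True
```

The results are passed through sorted() only because a set's printed order is not something to rely on.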

Static sets

Typical operations that may be provided by a static set structure S are:

  • is_element_of(x,S): checks whether the value x is in the set S.
  • is_empty(S): checks whether the set S is empty.
  • size(S) or cardinality(S): returns the number of elements in S.
  • iterate(S): returns a function that returns one more value of S at each call, in some arbitrary order.
  • enumerate(S): returns a list containing the elements of S in some arbitrary order.
  • build(x1,x2,…,xn): creates a set structure with values x1,x2,...,xn.
  • create_from(collection): creates a new set structure containing all the elements of the given collection or all the elements returned by the given iterator.
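A static set can be sketched in Python with the immutable frozenset type. The helper names below mirror the list above and are hypothetical, not part of Python's API; enumerate is renamed enumerate_set so that it does not shadow Python's built-in function of that name.

```python
from typing import Callable, Hashable, Iterable

def build(*xs: Hashable) -> frozenset:
    # build(x1, ..., xn): duplicates among the arguments are dropped
    return frozenset(xs)

def create_from(collection: Iterable) -> frozenset:
    # Works for lists, other sets, and iterators alike
    return frozenset(collection)

def is_element_of(x: Hashable, s: frozenset) -> bool:
    return x in s

def is_empty(s: frozenset) -> bool:
    return len(s) == 0

def size(s: frozenset) -> int:
    return len(s)

def iterate(s: frozenset) -> Callable[[], Hashable]:
    # Each call returns one more element; raises StopIteration once s is exhausted
    it = iter(s)
    return lambda: next(it)

def enumerate_set(s: frozenset) -> list:
    # The order of the returned list is arbitrary
    return list(s)

S = build(3, 1, 2, 3)
print(size(S), is_element_of(2, S), is_empty(S))  # 3 True False
```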

Dynamic sets

Dynamic set structures typically add:

  • create(): creates a new, initially empty set structure.
    • create_with_capacity(n): creates a new set structure, initially empty but capable of holding up to n elements.
  • add(S,x): adds the element x to S, if it is not present already.
  • remove(S, x): removes the element x from S, if it is present.
  • capacity(S): returns the maximum number of values that S can hold.

Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.
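The dynamic operations can be sketched as a small Python class wrapping the built-in set. BoundedSet is a hypothetical name, and raising OverflowError when the capacity is reached is an assumption of this sketch; the operations above do not prescribe what add should do on a full set.

```python
from typing import Hashable, Optional

class BoundedSet:
    """Hypothetical mutable set with an optional fixed capacity."""

    def __init__(self, capacity: Optional[int] = None) -> None:
        # BoundedSet() plays the role of create();
        # BoundedSet(n) plays the role of create_with_capacity(n)
        self._items: set = set()
        self._capacity = capacity

    def add(self, x: Hashable) -> None:
        if x in self._items:
            return  # already present: adding again changes nothing
        if self._capacity is not None and len(self._items) >= self._capacity:
            raise OverflowError("set is at capacity")
        self._items.add(x)

    def remove(self, x: Hashable) -> None:
        self._items.discard(x)  # silently ignores absent elements

    def capacity(self) -> Optional[int]:
        return self._capacity  # None means unbounded

    def __contains__(self, x: Hashable) -> bool:
        return x in self._items

    def __len__(self) -> int:
        return len(self._items)
```

Since Python's set grows on demand, the capacity here is only an enforced limit; in a structure with preallocated storage it would also reflect memory actually reserved.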

Additional operations

There are many other operations that can (in principle) be defined in terms of the above, such as:

  • pop(S): returns an arbitrary element of S, deleting it from S.[1]
  • pick(S): returns an arbitrary element of S.[2][3][4] Functionally, the mutator pop can be interpreted as the pair of selectors (pick, rest), where rest returns the set consisting of all elements except for the arbitrary element.[5] pick can also be implemented in terms of iterate.[a]
  • map(F,S): returns the set of distinct values resulting from applying function F to each element of S.
  • filter(P,S): returns the subset containing all elements of S that satisfy a given predicate P.
  • fold(A0, F, S): starting from $A_0$, returns the value $A_{|S|}$ after applying $A_{i+1} := F(A_i, e)$ for each element $e$ of $S$, for some binary operation $F$. $F$ must be associative and commutative for this to be well-defined.
  • clear(S): delete all elements of S.
  • equal(S1, S2): checks whether the two given sets are equal (i.e. contain all and only the same elements).
  • hash(S): returns a hash value for the static set S such that if equal(S1, S2) then hash(S1) = hash(S2).
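Several of these derived operations can be sketched in a few lines of Python. The names map_set, filter_set, and set_hash are hypothetical and chosen to avoid shadowing Python's built-ins map, filter, and hash; pick_rest is an illustrative name for the (pick, rest) pair described above.

```python
import operator
from functools import reduce
from typing import Any, Callable, Tuple

def pop(s: set) -> Any:
    return s.pop()  # removes and returns an arbitrary element

def pick(s: set) -> Any:
    return next(iter(s))  # arbitrary element; s is not modified

def pick_rest(s: set) -> Tuple[Any, frozenset]:
    # Functional counterpart of pop: the picked element plus the remaining set
    x = pick(s)
    return x, frozenset(s) - {x}

def map_set(f: Callable, s: set) -> set:
    return {f(x) for x in s}  # equal results collapse into one element

def filter_set(p: Callable, s: set) -> set:
    return {x for x in s if p(x)}

def fold(a0: Any, f: Callable, s: set) -> Any:
    return reduce(f, s, a0)  # A_{i+1} = f(A_i, e) for each element e of s

def clear(s: set) -> None:
    s.clear()

def equal(s1: set, s2: set) -> bool:
    return s1 == s2

def set_hash(s: set) -> int:
    # Mutable sets are unhashable; equal frozensets have equal hashes
    return hash(frozenset(s))

S = {1, 2, 3, 4}
print(sorted(map_set(lambda x: x % 2, S)))     # [0, 1]
print(sorted(filter_set(lambda x: x > 2, S)))  # [3, 4]
print(fold(0, operator.add, S))                # 10
```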

Other operations can be defined for sets with elements of a special type:

  • sum(S): returns the sum of all elements of S for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, S).
  • collapse(S): given a set of sets, returns the union.[6] For example, collapse({{1}, {2, 3}}) == {1, 2, 3}. May be considered a kind of sum.
  • flatten(S): given a set consisting of sets and atomic elements (elements that are not sets), returns a set whose elements are the atomic elements of the original top-level set or elements of the sets it contains. In other words, it removes a level of nesting, like collapse but allowing atoms. This can be done a single time, or recursively to obtain a set of only atomic elements.[7] For example, flatten({1, {2, 3}}) == {1, 2, 3}.
  • nearest(S,x): returns the element of S that is closest in value to x (by some metric).
  • min(S), max(S): returns the minimum/maximum element of S.
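A Python sketch of these operations follows. Because Python set elements must be hashable, nested sets are represented as frozensets, and "atomic" here means "not a frozenset"; both are assumptions of this sketch. The default metric for nearest (absolute difference) is likewise a hypothetical choice.

```python
from typing import Any, Callable, Iterable

def set_sum(s: Iterable) -> Any:
    return sum(s)  # fold(0, add, S) for numbers

def collapse(s: Iterable[frozenset]) -> frozenset:
    # Union of all member sets; an empty input gives the empty set
    return frozenset().union(*s)

def flatten(s: Iterable, recursive: bool = False) -> frozenset:
    out = set()
    for e in s:
        if isinstance(e, frozenset):
            # Lift the member set's elements up one level (or all levels)
            out |= (flatten(e, recursive=True) if recursive else e)
        else:
            out.add(e)  # atomic element
    return frozenset(out)

def nearest(s: Iterable, x: Any, dist: Callable = lambda a, b: abs(a - b)) -> Any:
    # Ties are broken by iteration order, i.e. arbitrarily
    return min(s, key=lambda e: dist(e, x))

print(sorted(collapse({frozenset({1}), frozenset({2, 3})})))  # [1, 2, 3]
print(sorted(flatten({1, frozenset({2, 3})})))                # [1, 2, 3]
print(nearest({1, 5, 9}, 6))                                  # 5
print(min({4, 2, 7}), max({4, 2, 7}))                         # 2 7
```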

Implementations

Sets can be implemented using various data structures, which provide different time and space trade-offs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as nearest or union. Implementations described as "general use" typically strive to optimize the is_element_of, add, and remove operations. A simple implementation is to use a list, ignoring the order of the elements and taking care to avoid repeated values. This is simple but inefficient, as operations like set membership or element deletion are O(n), since they require scanning the entire list.[b] Sets are often instead implemented using more efficient data structures, particularly various flavors of trees, tries, or hash tables.
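The list-based implementation can be sketched as follows; ListSet is a hypothetical class name. Each membership test is a linear scan, so add (which must first rule out a duplicate) and remove are O(n). Because element order does not matter, remove can overwrite the found slot with the last element and shrink the list, avoiding the shift that deleting from the middle of a list would need.

```python
from typing import Hashable

class ListSet:
    """Hypothetical set stored as an unordered list with no repeated values."""

    def __init__(self) -> None:
        self._items: list = []

    def is_element_of(self, x: Hashable) -> bool:
        return x in self._items  # linear scan: O(n)

    def add(self, x: Hashable) -> None:
        if not self.is_element_of(x):  # the duplicate check makes add O(n)
            self._items.append(x)

    def remove(self, x: Hashable) -> None:
        try:
            i = self._items.index(x)  # O(n) search
        except ValueError:
            return  # x is absent: nothing to do
        # Order is irrelevant, so move the last element into slot i
        self._items[i] = self._items[-1]
        self._items.pop()

    def __len__(self) -> int:
        return len(self._items)
```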

As sets can be interpreted as a kind of map (by the indicator function), sets are commonly implemented in the same way as (partial) maps (associative arrays), with the value of each key-value pair being the unit type or a sentinel value (like 1). Typical choices are a self-balancing binary search tree for sorted sets (which has O(log n) for most operations) or a hash table for unsorted sets (which has O(1) average-case, but O(n) worst-case, for most operations). A sorted linear hash table[8] may be used to provide deterministically ordered sets.

Further, in languages that support maps but not sets, sets can be implemented in terms of maps. For example, a common programming idiom in Perl that converts an array to a hash whose values are the sentinel value 1, for use as a set, is:

my %elements = map { $_ => 1 } @elements;
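The same idiom can be sketched in Python, purely for illustration since Python also has a native set type. dict.fromkeys maps every element to None, which plays the role of the unit or sentinel value; the wrapper name set_from_list is hypothetical.

```python
def set_from_list(elements):
    # Each element becomes a key; the None values are never read
    return dict.fromkeys(elements)

fruits = set_from_list(["apple", "pear", "apple", "fig"])
print("pear" in fruits)  # True: membership is a key lookup
print(len(fruits))       # 3: the repeated "apple" collapses to one key
fruits["plum"] = None    # add
fruits.pop("fig", None)  # remove if present; no error if absent
```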

Other popular methods include arrays. In particular, a subset of the integers 1..n can be implemented efficiently as an n-bit bit array, which also supports very efficient union and intersection operations. A Bloom filter implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
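A bit-array set can be sketched in Python using an arbitrary-precision int as the bit array, an assumption that stands in for a fixed n-bit array. The sketch indexes the universe as 0..n-1 rather than 1..n (a shift by one), and BitSet is a hypothetical name. Union and intersection reduce to a single bitwise OR and AND.

```python
class BitSet:
    """Hypothetical subset of {0, ..., n-1} stored in the bits of a Python int."""

    def __init__(self, n: int, bits: int = 0) -> None:
        self.n = n        # size of the universe
        self.bits = bits  # bit x is 1 exactly when x is in the set

    def _check(self, x: int) -> None:
        if not 0 <= x < self.n:
            raise ValueError(f"{x} is outside 0..{self.n - 1}")

    def add(self, x: int) -> None:
        self._check(x)
        self.bits |= 1 << x

    def remove(self, x: int) -> None:
        self._check(x)
        self.bits &= ~(1 << x)

    def __contains__(self, x: int) -> bool:
        return 0 <= x < self.n and (self.bits >> x) & 1 == 1

    def union(self, other: "BitSet") -> "BitSet":
        return BitSet(max(self.n, other.n), self.bits | other.bits)  # bitwise OR

    def intersection(self, other: "BitSet") -> "BitSet":
        return BitSet(max(self.n, other.n), self.bits & other.bits)  # bitwise AND

    def __len__(self) -> int:
        return bin(self.bits).count("1")  # number of set bits
```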

The Boolean set operations can be implemented in terms of more elementary operations (pop, clear, and add), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for union(S,T) will take time proportional to the length m of S times the length n of T; whereas a variant of the list merging algorithm will do the job in time proportional to m+n. Moreover, there are specialized set data structures (such as the union-find data structure) that are optimized for one or more of these operations, at the expense of others.
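The merging approach for sorted lists can be sketched in Python as follows (sorted_union is a hypothetical name). Each loop iteration advances at least one of the two indices, so the loop runs at most m + n times, whereas the naive method checks each element of one list against every element of the other.

```python
def sorted_union(s: list, t: list) -> list:
    """Union of two sorted, duplicate-free lists in O(m + n) time."""
    i, j = 0, 0
    out = []
    while i < len(s) and j < len(t):
        if s[i] < t[j]:
            out.append(s[i])
            i += 1
        elif t[j] < s[i]:
            out.append(t[j])
            j += 1
        else:  # same value in both lists: emit it once
            out.append(s[i])
            i += 1
            j += 1
    # At most one list still has elements left, all larger than those emitted
    out.extend(s[i:])
    out.extend(t[j:])
    return out

print(sorted_union([1, 3, 5, 7], [2, 3, 6]))  # [1, 2, 3, 5, 6, 7]
```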

Language support

One of the earliest languages to support sets was Pascal; many languages now include them, whether in the core language or in a standard library.

  • In C++, the Standard Template Library (STL) provides the set template class, which is typically implemented using a binary search tree (e.g. red–black tree); SGI's STL also provides the hash_set template class, which implements a set using a hash table. C++11 has support for the unordered_set template class, which is implemented using a hash table. In sets, the elements themselves are the keys, in contrast to sequenced containers, where elements are accessed using their (relative or absolute) position. Set elements must have a strict weak ordering.
  • The Rust standard library provides the generic HashSet and BTreeSet types.
  • Java offers the Set interface to support sets (with the HashSet class implementing it using a hash table), and the SortedSet sub-interface to support sorted sets (with the TreeSet class implementing it using a binary search tree).
  • Apple's Foundation framework (part of Cocoa) provides the Objective-C classes NSSet, NSMutableSet, NSCountedSet, NSOrderedSet, and NSMutableOrderedSet. The CoreFoundation APIs provide the CFSet and CFMutableSet types for use in C.
  • Python has built-in set and frozenset types since 2.4, and since Python 3.0 and 2.7, supports non-empty set literals using a curly-bracket syntax, e.g.: {x, y, z}; empty sets must be created using set(), because Python uses {} to represent the empty dictionary.
  • The .NET Framework provides the generic HashSet and SortedSet classes that implement the generic ISet interface.
  • Smalltalk's class library includes Set and IdentitySet, using equality and identity for inclusion test respectively. Many dialects provide variations for compressed storage (NumberSet, CharacterSet), for ordering (OrderedSet, SortedSet, etc.) or for weak references (WeakIdentitySet).
  • Ruby's standard library includes a set module which contains Set and SortedSet classes that implement sets using hash tables, the latter allowing iteration in sorted order.
  • OCaml's standard library contains a Set module, which implements a functional set data structure using binary search trees.
  • The GHC implementation of Haskell provides a Data.Set module, which implements immutable sets using binary search trees.[9]
  • The Tcl Tcllib package provides a set module which implements a set data structure based upon Tcl lists.
  • The Swift standard library contains a Set type, since Swift 1.2.
  • JavaScript introduced Set as a standard built-in object with the ECMAScript 2015[10] standard.
  • Erlang's standard library has a sets module.
  • Clojure has literal syntax for hashed sets, and also implements sorted sets.
  • LabVIEW has native support for sets, from version 2019.
  • Ada provides the Ada.Containers.Hashed_Sets and Ada.Containers.Ordered_Sets packages.

As noted in the previous section, in languages which do not directly support sets but do support associative arrays, sets can be emulated using associative arrays, by using the elements as keys, and using a dummy value as the values, which are ignored.

Multiset

A generalization of the notion of a set is that of a multiset or bag, which is similar to a set but allows repeated ("equal") values (duplicates). This is used in two distinct senses: either equal values are considered identical, and are simply counted, or equal values are considered equivalent, and are stored as distinct items. For example, given a list of people (by name) and ages (in years), one could construct a multiset of ages, which simply counts the number of people of a given age. Alternatively, one can construct a multiset of people, where two people are considered equivalent if their ages are the same (but may be different people and have different names), in which case each pair (name, age) must be stored, and selecting on a given age gives all the people of a given age.
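Both senses can be sketched with Python's standard library, using hypothetical sample data: collections.Counter counts equal values, while a defaultdict of lists keeps every (name, age) pair, grouped by the age under which they are considered equivalent.

```python
from collections import Counter, defaultdict

people = [("Ann", 31), ("Bob", 25), ("Cy", 31)]

# Sense 1: equal values are identical and are simply counted
ages = Counter(age for _, age in people)
print(ages[31])  # 2

# Sense 2: equivalent values are stored as distinct items
by_age = defaultdict(list)
for name, age in people:
    by_age[age].append((name, age))
print(by_age[31])  # [('Ann', 31), ('Cy', 31)]
```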

Formally, it is possible for objects in computer science to be considered "equal" under some equivalence relation but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure, while others will collapse them down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.

As with sets, multisets can naturally be implemented using hash tables or trees, which yield different performance characteristics.

The set of all bags over type T is given by the expression bag T. If by multiset one considers equal items identical and simply counts them, then a multiset can be interpreted as a function from the input domain to the non-negative integers (natural numbers), generalizing the identification of a set with its indicator function. In some cases a multiset in this counting sense may be generalized to allow negative values, as in Python.

Where a multiset data structure is not available, a workaround is to use a regular set but override the equality predicate of its items to always return "not equal" on distinct objects (this still cannot store multiple occurrences of the same object), or to use an associative array mapping the values to their integer multiplicities (this cannot distinguish between equal elements at all).
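Both workarounds can be sketched in Python. In Python, a class that does not define __eq__ already compares by identity, so leaving equality undefined has the effect of the first workaround; the Person class and the sample data are hypothetical.

```python
class Person:
    # No __eq__ or __hash__ override: Python falls back to identity,
    # so two Person objects with equal fields are still "not equal"
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

a, b = Person("Ann", 31), Person("Ann", 31)
people = {a, b, a}
print(len(people))  # 2: a and b are kept apart, but a is stored only once

# Associative array from value to multiplicity: equal values are indistinguishable
counts = {}
for age in [31, 25, 31]:
    counts[age] = counts.get(age, 0) + 1
print(counts)  # {31: 2, 25: 1}
```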

Typical operations on bags:

  • contains(B, x): checks whether the element x is present (at least once) in the bag B
  • is_sub_bag(B1, B2): checks whether each element in the bag B1 occurs in B1 no more often than it occurs in the bag B2; sometimes denoted as $B_1 \sqsubseteq B_2$.
  • count(B, x): returns the number of times that the element x occurs in the bag B; sometimes denoted as B # x.
  • scaled_by(B, n): given a natural number n, returns a bag which contains the same elements as the bag B, except that every element that occurs m times in B occurs n * m times in the resulting bag; sometimes denoted as $n \otimes B$.
  • union(B1, B2): returns a bag containing just those values that occur in either the bag B1 or the bag B2, except that the number of times a value x occurs in the resulting bag is equal to (B1 # x) + (B2 # x); sometimes denoted as $B_1 \uplus B_2$.
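These bag operations can be sketched on top of collections.Counter, which maps each element to its multiplicity. The function names mirror the list above and are hypothetical; bag_union is used instead of union to keep it distinct from the set operation. Counter's + operator adds counts and keeps only positive results, which matches the additive union when all counts are natural numbers.

```python
from collections import Counter
from typing import Hashable

def contains(b: Counter, x: Hashable) -> bool:
    return b[x] > 0

def count(b: Counter, x: Hashable) -> int:
    return b[x]  # a missing element has count 0 (and is not inserted)

def is_sub_bag(b1: Counter, b2: Counter) -> bool:
    # Every element occurs in b1 no more often than in b2
    return all(m <= b2[x] for x, m in b1.items())

def scaled_by(b: Counter, n: int) -> Counter:
    # An element occurring m times in b occurs n * m times in the result
    return Counter({x: n * m for x, m in b.items() if n * m > 0})

def bag_union(b1: Counter, b2: Counter) -> Counter:
    # Additive union: the count of x is (B1 # x) + (B2 # x)
    return b1 + b2

B1, B2 = Counter("aab"), Counter("abbc")
print(count(bag_union(B1, B2), "a"))  # 3
print(is_sub_bag(B1, B2))             # False: "a" occurs twice in B1, once in B2
```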

Multisets in SQL

In relational databases, a table can be a (mathematical) set or a multiset, depending on the presence of uniqueness constraints on some columns (which turn them into a candidate key).

SQL allows the selection of rows from a relational table: this operation will in general yield a multiset, unless the keyword DISTINCT is used to force the rows to be all different, or the selection includes the primary (or a candidate) key.
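The difference can be observed with Python's built-in sqlite3 module and an in-memory database; the table and rows are hypothetical sample data. A plain SELECT on a non-key column returns a multiset, DISTINCT removes the duplicates, and selecting the primary key column cannot produce duplicates. This sketch does not exercise the MULTISET keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT PRIMARY KEY, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", 31), ("Bob", 25), ("Cy", 31)])

ages = [row[0] for row in conn.execute("SELECT age FROM people ORDER BY age")]
print(ages)      # [25, 31, 31]: a multiset, duplicates kept

distinct = [row[0] for row in
            conn.execute("SELECT DISTINCT age FROM people ORDER BY age")]
print(distinct)  # [25, 31]

# name is the primary key, so this selection cannot contain duplicates
names = [row[0] for row in conn.execute("SELECT name FROM people")]
conn.close()
```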

In ANSI SQL the MULTISET keyword can be used to transform a subquery into a collection expression:

SELECT expression1, expression2... FROM table_name...

is a general select that can be used as subquery expression of another more general query, while

MULTISET(SELECT expression1, expression2... FROM table_name...)

transforms the subquery into a collection expression that can be used in another query, or in assignment to a column of appropriate collection type.

Notes

  a. ^ For example, in Python pick can be implemented on a derived class of the built-in set as follows:
    class Set(set):
        def pick(self):
            # Return an arbitrary element without removing it;
            # raises StopIteration if the set is empty
            return next(iter(self))

  b. ^ Element insertion can be done in O(1) time by simply inserting at an end, but if one avoids duplicates this takes O(n) time.

References

  1. ^ Python: pop()
  2. ^ Management and Processing of Complex Data Structures: Third Workshop on Information Systems and Artificial Intelligence, Hamburg, Germany, February 28 - March 2, 1994. Proceedings, ed. Kai v. Luck, Heinz Marburger, p. 76
  3. ^ Python Issue7212: Retrieve an arbitrary element from a set without removing it; see msg106593 regarding standard name
  4. ^ Ruby Feature #4553: Add Set#pick and Set#pop
  5. ^ Inductive Synthesis of Functional Programs: Universal Planning, Folding of Finite Programs, and Schema Abstraction by Analogical Reasoning, Ute Schmid, Springer, Aug 21, 2003, p. 240
  6. ^ Recent Trends in Data Type Specification: 10th Workshop on Specification of Abstract Data Types Joint with the 5th COMPASS Workshop, S. Margherita, Italy, May 30 - June 3, 1994. Selected Papers, Volume 10, ed. Egidio Astesiano, Gianna Reggio, Andrzej Tarlecki, p. 38
  7. ^ Ruby: flatten()
  8. ^ Wang, Thomas (1997), Sorted Linear Hash Table, archived from the original on 2006-01-12
  9. ^ Stephen Adams, "Efficient sets: a balancing act", Journal of Functional Programming 3(4):553-562, October 1993. Retrieved on 2015-03-11.
  10. ^ "ECMAScript 2015 Language Specification – ECMA-262 6th Edition". www.ecma-international.org. Retrieved 2017-07-11.
