Sequential pattern mining

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.[1][2] The values are usually presumed to be discrete; time series mining is therefore closely related, but is usually considered a different activity. Sequential pattern mining is a special case of structured data mining.

Several traditional computational problems are addressed within this field. These include building efficient databases and indexes for sequence information, extracting frequently occurring patterns, comparing sequences for similarity, and recovering missing sequence members. In general, sequence mining problems can be classified as string mining, which is typically based on string processing algorithms, and itemset mining, which is typically based on association rule learning. Local process models[3] extend sequential pattern mining to more complex patterns that can include (exclusive) choices, loops, and concurrency constructs in addition to the sequential ordering construct.
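
As an illustration of what "frequently occurring patterns" means in practice, the following minimal Python sketch counts how often an ordered pattern occurs in a small sequence database. The function names, the toy database, and the choice to allow gaps between matched items are assumptions made for this example, not definitions taken from the cited surveys.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence`
    in the same order, with arbitrary gaps allowed between them."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern`."""
    hits = sum(is_subsequence(pattern, seq) for seq in database)
    return hits / len(database)


# Hypothetical toy database of four event sequences.
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "b"],
    ["b", "a", "d"],
    ["a", "d"],
]

print(support(["a", "d"], database))  # 0.75
```

A pattern is usually called frequent when its support reaches a user-chosen threshold; the threshold is a parameter of the analysis rather than a property of the data.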

String mining

String mining typically deals with a limited alphabet for the items that appear in a sequence, but the sequence itself may be very long. Examples of an alphabet include the ASCII character set used in natural language text, the nucleotide bases 'A', 'G', 'C' and 'T' in DNA sequences, and the amino acids in protein sequences. In biological applications, analysis of the arrangement of the alphabet in strings can be used to examine gene and protein sequences to determine their properties. Knowing the sequence of letters of a DNA molecule or a protein is not a goal in itself. Rather, the major task is to understand the sequence in terms of its structure and biological function. This is typically achieved by first identifying individual regions or structural units within each sequence and then assigning a function to each structural unit. In many cases this requires comparing a given sequence with previously studied ones. The comparison between strings becomes complicated when insertions, deletions and mutations occur in a string.
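
One common way to compare strings in the presence of insertions, deletions and substitutions is the edit (Levenshtein) distance. The sketch below is a standard dynamic-programming formulation with unit costs. The unit cost model and the example strings are assumptions for illustration, and practical bioinformatics tools generally use more elaborate scoring.

```python
def edit_distance(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `s` into string `t`."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs from s
                curr[j - 1] + 1,           # insert ct into s
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(edit_distance("GATTACA", "GATACA"))   # 1 (one deletion)
print(edit_distance("GATTACA", "GACTACA"))  # 1 (one substitution)
```

The two-row formulation keeps memory proportional to the length of the shorter dimension, which matters when one of the sequences is very long.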

A survey and taxonomy of the key algorithms for sequence comparison in bioinformatics is presented by Abouelhoda & Ghanem (2010), which covers:[4]

  • Repeat-related problems, which deal with operations on single sequences and can be based on exact or approximate string matching methods for finding dispersed fixed-length and maximal-length repeats, finding tandem repeats, and finding unique subsequences and missing (un-spelled) subsequences.
  • Alignment problems, which deal with comparison between strings by first aligning one or more sequences; examples of popular methods include BLAST for comparing a single sequence with multiple sequences in a database, and ClustalW for multiple alignments. Alignment algorithms can be based on either exact or approximate methods, and can also be classified as global, semi-global or local alignments. See sequence alignment.
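
To make the notion of local alignment concrete, the following Python sketch computes the score of the best local alignment between two strings using the classic Smith-Waterman recurrence. The scoring parameters (`match`, `mismatch`, `gap`) are arbitrary values assumed for this example, and the sketch returns only the score, not the alignment itself. Tools such as BLAST rely on heuristics rather than this exhaustive computation.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between `a` and `b`
    (Smith-Waterman recurrence with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(
                0,                   # start a new local alignment here
                diagonal,            # align ca with cb
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        best = max(best, max(curr))
        prev = curr
    return best


# The shared region "ACG" scores 3 matches * 2 = 6.
print(local_alignment_score("TTACGTTT", "GGACGGG"))  # 6
```

Clamping each cell at zero is what makes the alignment local: a poorly matching region never drags down the score of a well matching region elsewhere in the strings.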

Itemset mining

Some problems in sequence mining lend themselves to discovering frequent itemsets and the order in which they appear. For example, one may seek rules of the form "if a {customer buys a car}, he or she is likely to {buy insurance} within 1 week", or, in the context of stock prices, "if {Nokia up and Ericsson up}, it is likely that {Motorola up and Samsung up} within 2 days". Traditionally, itemset mining is used in marketing applications for discovering regularities between frequently co-occurring items in large transactions. For example, by analysing transactions of customer shopping baskets in a supermarket, one can produce a rule which reads "if a customer buys onions and potatoes together, he or she is likely to also buy hamburger meat in the same transaction".
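
The strength of a rule such as the shopping-basket example is commonly quantified by its support and confidence. The sketch below computes both for a small, made-up set of baskets. The data and the function name `rule_metrics` are assumptions introduced for illustration only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing both sides
    confidence = fraction of antecedent-containing transactions
                 that also contain the consequent
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical shopping baskets.
baskets = [
    {"onions", "potatoes", "hamburger meat"},
    {"onions", "potatoes", "hamburger meat", "beer"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "hamburger meat"},
]

support, confidence = rule_metrics(
    baskets, {"onions", "potatoes"}, {"hamburger meat"}
)
print(f"support={support:.2f} confidence={confidence:.2f}")
# support=0.40 confidence=0.67
```

In this toy data, two of the five baskets contain all three items, and two of the three baskets containing onions and potatoes also contain hamburger meat.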

A survey and taxonomy of the key algorithms for itemset mining is presented by Han et al. (2007).[5]

Two common techniques applied to sequence databases for frequent itemset mining are the influential Apriori algorithm and the more recent FP-growth technique.
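
The level-wise idea behind Apriori can be sketched in a few lines of Python: frequent itemsets of size k are built only from frequent itemsets of size k-1, because every subset of a frequent itemset must itself be frequent. The code below is a simplified, unoptimized sketch under that assumption. The function name, the use of an absolute count as `min_support`, and the sample transactions are illustrative choices, not part of any particular library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset that occurs in at
    least `min_support` transactions (absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

result = apriori(transactions, min_support=3)
print(result[frozenset({"diapers", "beer"})])  # 3
```

FP-growth addresses the same problem without generating candidates explicitly, by compressing the transactions into a prefix tree; it is not shown here.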

Applications

With a great variety of products and customer buying behaviors, the shelf space on which products are displayed is one of the most important resources in a retail environment. Through proper management of shelf space allocation and product display, retailers can both increase profit and decrease cost. To address this problem, George and Binu (2013) proposed an approach that mines customer buying patterns using the PrefixSpan algorithm and places products on shelves according to the order of the mined purchasing patterns.[6]
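
The prefix-projection idea that PrefixSpan is built on can be sketched as follows. This is a simplified version that assumes each sequence element is a single item (the full algorithm also handles elements that are itemsets), and it is not the implementation used by George and Binu. The function name, the purchase data, and the absolute `min_support` threshold are assumptions for illustration.

```python
def prefixspan(database, min_support):
    """Return a list of (pattern, count) pairs for all sequential
    patterns found in at least `min_support` sequences.
    Simplification: every sequence element is a single item."""
    results = []

    def mine(prefix, projected):
        # Count, per item, the number of suffixes in which it appears.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, count))
            # Project: keep only what follows the first occurrence of item.
            new_projected = [s[s.index(item) + 1:]
                             for s in projected if item in s]
            mine(pattern, new_projected)

    mine([], [list(s) for s in database])
    return results


# Hypothetical purchase sequences, one per customer.
purchases = [
    ["bread", "milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
]

for pattern, count in prefixspan(purchases, min_support=3):
    print(pattern, count)
# ['bread'] 3
# ['bread', 'eggs'] 3
# ['eggs'] 4
# ['milk'] 3
# ['milk', 'eggs'] 3
```

In a shelf-placement setting, a mined pattern such as bread followed by eggs could be read as a hint about the order in which customers tend to pick up those products; how such hints translate into an actual layout is outside the scope of this sketch.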

Algorithms

Commonly used algorithms include:

  • GSP algorithm
  • Sequential Pattern Discovery using Equivalence classes (SPADE)
  • FreeSpan
  • PrefixSpan
  • MAPres[7]
  • Seq2Pat (for constraint-based sequential pattern mining)[8][9]

References

  1. ^ Mabroukeh, N. R.; Ezeife, C. I. (2010). "A taxonomy of sequential pattern mining algorithms". ACM Computing Surveys. 43: 1–41. CiteSeerX 10.1.1.332.4745. doi:10.1145/1824795.1824798. S2CID 207180619.
  2. ^ Bechini, A.; Bondielli, A.; Dell'Oglio, P.; Marcelloni, F. (2023). "From basic approaches to novel challenges and applications in Sequential Pattern Mining". Applied Computing and Intelligence. 3 (1): 44–78. doi:10.3934/aci.2023004.
  3. ^ Tax, N.; Sidorova, N.; Haakma, R.; van der Aalst, Wil M. P. (2016). "Mining Local Process Models". Journal of Innovation in Digital Ecosystems. 3 (2): 183–196. arXiv:1606.06066. doi:10.1016/j.jides.2016.11.001. S2CID 10872379.
  4. ^ Abouelhoda, M.; Ghanem, M. (2010). "String Mining in Bioinformatics". In Gaber, M. M. (ed.). Scientific Data Mining and Knowledge Discovery. Springer. doi:10.1007/978-3-642-02788-8_9. ISBN 978-3-642-02787-1.
  5. ^ Han, J.; Cheng, H.; Xin, D.; Yan, X. (2007). "Frequent pattern mining: current status and future directions". Data Mining and Knowledge Discovery. 15 (1): 55–86. doi:10.1007/s10618-006-0059-1.
  6. ^ George, A.; Binu, D. (2013). "An Approach to Products Placement in Supermarkets Using PrefixSpan Algorithm". Journal of King Saud University-Computer and Information Sciences. 25 (1): 77–87. doi:10.1016/j.jksuci.2012.07.001.
  7. ^ Ahmad, Ishtiaq; Qazi, Wajahat M.; Khurshid, Ahmed; Ahmad, Munir; Hoessli, Daniel C.; Khawaja, Iffat; Choudhary, M. Iqbal; Shakoori, Abdul R.; Nasir-ud-Din (1 May 2008). "MAPRes: Mining association patterns among preferred amino acid residues in the vicinity of amino acids targeted for post-translational modifications". Proteomics. 8 (10): 1954–1958. doi:10.1002/pmic.200700657. PMID 18491291. S2CID 22362167.
  8. ^ Hosseininasab A, van Hoeve WJ, Cire AA (2019). "Constraint-Based Sequential Pattern Mining with Decision Diagrams". Proceedings of the AAAI Conference on Artificial Intelligence. 33: 1495–1502. arXiv:1811.06086. doi:10.1609/aaai.v33i01.33011495. S2CID 53427299.
  9. ^ "Seq2Pat: Sequence-to-Pattern Generation Library". GitHub. 9 April 2022.
  • SPMF includes open-source implementations of GSP, PrefixSpan, SPADE, SPAM and many others.
