Pattern matching

In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, outputting some component of the matched pattern, and substituting the matching pattern with some other token sequence (i.e., search and replace).

Sequence patterns (e.g., a text string) are often described using regular expressions and matched using techniques such as backtracking.

Tree patterns are used in some programming languages as a general tool to process data based on its structure. For example, C#,[1] F#,[2] Haskell,[3] ML, Python,[4] Ruby,[5] Rust,[6] Scala,[7] Swift[8] and the symbolic mathematics language Mathematica have special syntax for expressing tree patterns and a language construct for conditional execution and value retrieval based on them.

Often it is possible to give alternative patterns that are tried one by one, which yields a powerful conditional programming construct. Pattern matching sometimes includes support for guards, which are additional Boolean conditions that must hold for an alternative to be selected.
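
As a minimal sketch of alternatives combined with guards, written in Haskell (one of the languages listed above), the following function tries a literal pattern first and then a variable pattern whose guards are checked from top to bottom. The function name classify and its result strings are hypothetical and serve only as illustration.

```haskell
-- Hypothetical example: alternatives are tried in order, and a guard adds
-- a Boolean condition on top of a successful pattern match.
classify :: Int -> String
classify 0 = "zero"                -- literal pattern, tried first
classify n
  | n < 0     = "negative"         -- guard on the variable pattern n
  | even n    = "positive even"
  | otherwise = "positive odd"     -- otherwise is simply True
```

Under these definitions, classify (-3) selects the second alternative and its first guard.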

History

Early programming languages with pattern matching constructs include COMIT (1957), SNOBOL (1962), Refal (1968) with tree-based pattern matching, Prolog (1972), St Andrews Static Language (SASL) (1976), NPL (1977), and Kent Recursive Calculator (KRC) (1981).

Many text editors support pattern matching of various kinds: the QED editor supports regular expression search, and some versions of TECO support the OR operator in searches.

Computer algebra systems generally support pattern matching on algebraic expressions.[9]

Primitive patterns

The simplest pattern in pattern matching is an explicit value or a variable. For example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, and = denotes definition rather than assignment):

f 0 = 1

Here, 0 is a single value pattern. Whenever f is given 0 as its argument, the pattern matches and the function returns 1. With any other argument, the match fails and so does the function. As the syntax supports alternative patterns in function definitions, we can continue the definition, extending it to take more general arguments:

f n = n * f (n-1)

Here, the first n is a single variable pattern, which matches any argument and binds it to the name n for use in the rest of the definition. In Haskell (unlike, for example, Hope), patterns are tried in order, so the first definition still applies in the specific case of the input being 0, while for any other argument the function returns n * f (n-1) with n being the argument.
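
Put together, the two clauses form a complete definition. The sketch below renames f to factorial and adds a type signature; both are additions made here for illustration. Note that a negative argument never reaches the 0 clause, so the recursion does not terminate for such inputs.

```haskell
-- The two clauses from the text combined; the name and signature are
-- illustrative additions.
factorial :: Integer -> Integer
factorial 0 = 1                      -- value pattern: matches only 0
factorial n = n * factorial (n - 1)  -- variable pattern: matches anything else
```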

The wildcard pattern (often written as _) is also simple: like a variable name, it matches any value, but does not bind the value to any name. Algorithms for matching wildcards in simple string-matching situations have been developed in a number of recursive and non-recursive varieties.[10]
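
One recursive variety of wildcard string matching can be sketched as follows. This is a naive illustration, not the algorithm of any particular reference: it assumes the common convention that ? stands for exactly one character and * for any run of characters, and the name wildcardMatch is hypothetical. Its running time can grow exponentially on patterns with many * characters.

```haskell
-- wildcardMatch pattern text
-- Assumed convention: '?' matches exactly one character,
-- '*' matches any run of characters, including the empty run.
wildcardMatch :: String -> String -> Bool
wildcardMatch [] [] = True
wildcardMatch [] _  = False
wildcardMatch ('*':ps) s =
  wildcardMatch ps s                          -- '*' matches nothing
    || case s of
         []     -> False
         (_:cs) -> wildcardMatch ('*':ps) cs  -- '*' absorbs one character
wildcardMatch ('?':ps) (_:cs) = wildcardMatch ps cs
wildcardMatch (p:ps)   (c:cs) = p == c && wildcardMatch ps cs
wildcardMatch _        []     = False         -- pattern left over, text exhausted
```

The definition itself uses the wildcard pattern _ wherever a value is irrelevant to the result.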

Tree patterns

More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference is that, with variable and wildcard parts, a pattern does not build into a single value but matches a group of values: the combinations of the concrete elements with whatever is allowed to vary within the structure of the pattern.

A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of the abstract syntax tree of a programming language and algebraic data types.

In Haskell, the following line defines an algebraic data type Color that has a single data constructor ColorConstructor that wraps an integer and a string.

data Color = ColorConstructor Integer String

The constructor is a node in a tree and the integer and string are leaves in branches.

To make Color an abstract data type, we write functions that serve as its interface, and these need to extract data from a Color value, for example just the string or just the integer part.

If we pass a variable that is of type Color, how can we get the data out of this variable? For example, for a function to get the integer part of Color, we can use a simple tree pattern and write:

integerPart (ColorConstructor theInteger _) = theInteger

Similarly:

stringPart (ColorConstructor _ theString) = theString

The creation of these functions can be automated by Haskell's data record syntax.
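
A minimal sketch of the record form, reusing the names from the functions above as field names. The value example is hypothetical.

```haskell
-- With record syntax, each field name is also an accessor function,
-- so integerPart and stringPart no longer need to be written by hand.
data Color = ColorConstructor
  { integerPart :: Integer
  , stringPart  :: String
  }

-- Hypothetical value used only to show the accessors.
example :: Color
example = ColorConstructor 255 "red"
```

Here integerPart example yields the integer field and stringPart example yields the string field.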

This OCaml example, which defines a red–black tree and a function to rebalance it after element insertion, shows how to match on a more complex structure generated by a recursive data type. The compiler verifies at compile time that the list of cases is exhaustive and that none are redundant.

type color = Red | Black
type 'a tree = Empty | Tree of color * 'a tree * 'a * 'a tree

let rebalance t = match t with
    | Tree (Black, Tree (Red, Tree (Red, a, x, b), y, c), z, d)
    | Tree (Black, Tree (Red, a, x, Tree (Red, b, y, c)), z, d)
    | Tree (Black, a, x, Tree (Red, Tree (Red, b, y, c), z, d))
    | Tree (Black, a, x, Tree (Red, b, y, Tree (Red, c, z, d)))
        ->  Tree (Red, Tree (Black, a, x, b), y, Tree (Black, c, z, d))
    | _ -> t (* the 'catch-all' case if no previous pattern matches *)
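
For comparison with the Haskell examples used elsewhere in this article, the same function can be sketched in Haskell. Standard Haskell has no counterpart to the OCaml or-pattern that shares one right-hand side among several alternatives, so this sketch routes the four cases through a helper; the helper name balanced is an assumption made here.

```haskell
data Color = Red | Black deriving (Eq, Show)
data Tree a = Empty | Tree Color (Tree a) a (Tree a) deriving (Eq, Show)

-- Shared right-hand side for the four unbalanced shapes (hypothetical helper).
balanced :: Tree a -> a -> Tree a -> a -> Tree a -> a -> Tree a -> Tree a
balanced a x b y c z d = Tree Red (Tree Black a x b) y (Tree Black c z d)

rebalance :: Tree a -> Tree a
rebalance (Tree Black (Tree Red (Tree Red a x b) y c) z d) = balanced a x b y c z d
rebalance (Tree Black (Tree Red a x (Tree Red b y c)) z d) = balanced a x b y c z d
rebalance (Tree Black a x (Tree Red (Tree Red b y c) z d)) = balanced a x b y c z d
rebalance (Tree Black a x (Tree Red b y (Tree Red c z d))) = balanced a x b y c z d
rebalance t = t  -- catch-all case if no previous pattern matches
```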

Filtering data with patterns

Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a list comprehension could be used for this kind of filtering:

[A x|A x <- [A 1, B 1, A 2, B 2]]

evaluates to

[A 1, A 2]
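
The expression above presupposes a type with constructors A and B. A self-contained sketch, in which the type name Tagged and the function name onlyA are hypothetical:

```haskell
-- Hypothetical type supplying the constructors A and B used in the example.
data Tagged = A Int | B Int deriving (Eq, Show)

-- In a list comprehension, elements that fail to match the pattern on the
-- left of <- are silently skipped rather than causing an error.
onlyA :: [Tagged] -> [Tagged]
onlyA xs = [A x | A x <- xs]
```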

Pattern matching in Mathematica

In Mathematica, the only structure that exists is the tree, which is populated by symbols. In the Haskell syntax used thus far, this could be defined as

data SymbolTree = Symbol String [SymbolTree]

An example tree could then look like

Symbol "a" [Symbol "b" [], Symbol "c" []]

In the traditional, more suitable syntax, the symbols are written as they are and the levels of the tree are represented using [], so that for instance a[b,c] is a tree with a as the parent, and b and c as the children.

A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern

A[_]

will match elements such as A[1], A[2], or more generally A[x] where x is any entity. In this case, A is the concrete element, while _ denotes the piece of tree that can be varied. A symbol prepended to _ binds the match to that variable name while a symbol appended to _ restricts the matches to nodes of that symbol. Note that even blanks themselves are internally represented as Blank[] for _ and Blank[x] for _x.

The Mathematica function Cases filters elements of the first argument that match the pattern in the second argument:[11]

Cases[{a[1], b[1], a[2], b[2]}, a[_] ]

evaluates to

{a[1], a[2]}

Pattern matching applies to the structure of expressions. In the example below,

Cases[ {a[b], a[b, c], a[b[c], d], a[b[c], d[e]], a[b[c], d, e]}, a[b[_], _] ]

returns

{a[b[c],d], a[b[c],d[e]]}

because only these elements will match the pattern a[b[_],_] above.
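
The structural matching performed in this example can be approximated in Haskell using the SymbolTree type defined earlier. This is only a rough model under stated assumptions: the Pattern type, the matches function and the cases function are hypothetical names, and the sketch supports nothing beyond the plain blank _, ignoring the many other features of Mathematica patterns.

```haskell
data SymbolTree = Symbol String [SymbolTree] deriving (Eq, Show)

-- A pattern is either a blank (matches any subtree) or a node with a
-- fixed head symbol and one sub-pattern per child.
data Pattern = Blank | Node String [Pattern]

matches :: Pattern -> SymbolTree -> Bool
matches Blank _ = True
matches (Node h ps) (Symbol h' ts) =
  h == h' && length ps == length ts && and (zipWith matches ps ts)

-- Rough analogue of Cases: keep the elements that match the pattern.
cases :: [SymbolTree] -> Pattern -> [SymbolTree]
cases ts p = filter (matches p) ts
```

With the pattern Node "a" [Node "b" [Blank], Blank], which models a[b[_], _], only trees with head a, exactly two children, and a first child of the form b[...] with one child are kept.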

In Mathematica, it is also possible to extract structures as they are created in the course of computation, regardless of how or where they appear. The function Trace can be used to monitor a computation, and return the elements that arise which match a pattern. For example, we can define the Fibonacci sequence as

fib[0|1]:=1
fib[n_]:= fib[n-1] + fib[n-2]

Then, we can ask the question: Given fib[3], what is the sequence of recursive Fibonacci calls?

Trace[fib[3], fib[_]]

returns a structure that represents the occurrences of the pattern fib[_] in the computational structure:

{fib[3],{fib[2],{fib[1]},{fib[0]}},{fib[1]}}

Declarative programming

In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate.

For instance, the Mathematica function Compile can be used to make more efficient versions of the code. In the following example the details do not particularly matter; what matters is that the subexpression {{com[_], Integer}} instructs Compile that expressions of the form com[_] can be assumed to be integers for the purposes of compilation:

com[i_] := Binomial[2i, i]
Compile[{x, {i, _Integer}}, x^com[i], {{com[_],  Integer}}]

Mailboxes in Erlang also work this way.

The Curry–Howard correspondence between proofs and programs relates ML-style pattern matching to case analysis and proof by exhaustion.

Pattern matching and strings

By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular string syntax is used to represent regular expressions, which are patterns describing sets of character strings.

However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.

Tree patterns for strings

In Mathematica, strings are represented as trees with root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed, in contrast to _, which would match only a single character.

In Haskell and functional programming languages in general, strings are represented as functional lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax:

[] -- an empty list
x:xs -- an element x constructed on a list xs

The structure for a list with some elements is thus element:list. When pattern matching, we assert that a certain piece of data has the shape described by a certain pattern. For example, in the function

head (element:list) = element

we assert that the first element of head's argument is called element, and the function returns it. We know that this is the first element because of the way lists are defined: a single element constructed onto a list. This single element must be the first. The empty list would not match the pattern at all, as an empty list does not have a head (the first element that is constructed).

In the example, we have no use for list, so we can disregard it, and thus write the function:

head (element:_) = element

The equivalent Mathematica transformation is expressed as

head[element_, ___] := element
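
Returning to the Haskell definition: because the empty list does not match element:_, the function is undefined for it. One common way to make this explicit, sketched here with the hypothetical name safeHead, is to add an alternative for the empty list and return a Maybe value.

```haskell
-- Total variant of head: the second alternative covers the empty list,
-- which the pattern (element:_) cannot match.
safeHead :: [a] -> Maybe a
safeHead (element:_) = Just element
safeHead []          = Nothing
```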

Example string patterns

In Mathematica, for instance,

StringExpression["a",_]

will match a string that has two characters and begins with "a".

The same pattern in Haskell:

['a', _]

Symbolic entities can be introduced to represent many different classes of relevant features of a string. For instance,

StringExpression[LetterCharacter, DigitCharacter]

will match a string that consists of a letter followed by a digit.

In Haskell, guards could be used to achieve the same matches:

[letter, digit] | isAlpha letter && isDigit digit
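
The fragment above is only the pattern and guard part of a definition. A complete sketch, with the hypothetical function name letterThenDigit, might look as follows. It assumes that isAlpha and isDigit from Data.Char are acceptable stand-ins for LetterCharacter and DigitCharacter; the exact character classes may differ between the two systems.

```haskell
import Data.Char (isAlpha, isDigit)

-- The list pattern [letter, digit] matches only two-character strings;
-- the guard then restricts which characters are accepted.
letterThenDigit :: String -> Bool
letterThenDigit [letter, digit]
  | isAlpha letter && isDigit digit = True
letterThenDigit _ = False  -- reached when the pattern or the guard fails
```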

The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to build up the patterns themselves or analyze and transform the programs that contain them.

SNOBOL

SNOBOL (StriNg Oriented and symBOlic Language) is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky.

SNOBOL4 stands apart from most programming languages by having patterns as a first-class data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.

SNOBOL was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the humanities.

Since SNOBOL's creation, newer languages such as AWK and Perl have made string manipulation by means of regular expressions fashionable. SNOBOL4 patterns, however, subsume BNF grammars, which are equivalent to context-free grammars and more powerful than regular expressions.[12]
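
The idea of patterns as first-class values with concatenation and alternation operators can be loosely illustrated in Haskell. This sketch is an analogy only and does not reproduce SNOBOL4 syntax or semantics; the names Pattern, lit, andThen, orElse and matchesWhole are all hypothetical. A pattern is modeled as a function from an input string to every remainder that can be left over after a match, which gives backtracking over alternatives for free.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (maybeToList)

-- A pattern returns all possible remainders of the input after matching.
newtype Pattern = Pattern { runPattern :: String -> [String] }

-- Match a literal string at the start of the input.
lit :: String -> Pattern
lit s = Pattern (maybeToList . stripPrefix s)

-- Concatenation: match p, then match q on each remainder.
andThen :: Pattern -> Pattern -> Pattern
andThen p q = Pattern (\input -> concatMap (runPattern q) (runPattern p input))

-- Alternation: collect the results of both alternatives.
orElse :: Pattern -> Pattern -> Pattern
orElse p q = Pattern (\input -> runPattern p input ++ runPattern q input)

-- True when some way of matching consumes the whole input.
matchesWhole :: Pattern -> String -> Bool
matchesWhole p = any null . runPattern p

-- Patterns are ordinary values and can be named and combined.
word :: Pattern
word = (lit "B" `orElse` lit "R")
         `andThen` (lit "E" `orElse` lit "EA")
         `andThen` (lit "D" `orElse` lit "DS")
```

Under these definitions, word accepts strings such as "BED", "READS" and "BEAD", and rejects "BAD".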

References

  1. "Pattern Matching - C# Guide".
  2. "Pattern Matching - F# Guide".
  3. "A Gentle Introduction to Haskell: Patterns".
  4. "What's New In Python 3.10 — Python 3.10.0b3 documentation". docs.python.org. Retrieved 2021-07-06.
  5. "pattern_matching - Documentation for Ruby 3.0.0". docs.ruby-lang.org. Retrieved 2021-07-06.
  6. "Pattern Syntax - The Rust Programming Language".
  7. "Pattern Matching". Scala Documentation. Retrieved 2021-01-17.
  8. "Patterns — The Swift Programming Language (Swift 5.1)".
  9. Moses, Joel (December 1967). "Symbolic Integration". MIT Project MAC, MAC-TR-47.
  10. Cantatore, Alessandro (2003). "Wildcard matching algorithms".
  11. "Cases—Wolfram Language Documentation". reference.wolfram.com. Retrieved 2020-11-17.
  12. Gimpel, J. F. (February 1973). "A theory of discrete patterns and their implementation in SNOBOL4". Communications of the ACM. 16 (2): 91–100. doi:10.1145/361952.361960.
