Standard ML is a functional programming language with some impure features. Programs written in Standard ML consist of expressions in contrast to statements or commands, although some expressions of type unit are only evaluated for their side-effects.
Functions
Like all functional languages, a key feature of Standard ML is the function, which is used for abstraction. The factorial function can be expressed as follows:
funfactorialn=ifn=0then1elsen*factorial(n-1)
Type inference
An SML compiler must infer the static type valfactorial:int->int without user-supplied type annotations. It has to deduce that n is only used with integer expressions, and must therefore itself be an integer, and that all terminal expressions are integer expressions.
Declarative definitions
The same function can be expressed with clausal function definitions where the if-then-else conditional is replaced with templates of the factorial function evaluated for specific values:
Here, the keyword val introduces a binding of an identifier to a value, fn introduces an anonymous function, and rec allows the definition to be self-referential.
Local definitions
The encapsulation of an invariant-preserving tail-recursive tight loop with one or more accumulator parameters within an invariant-free outer function, as seen here, is a common idiom in Standard ML.
Using a local function, it can be rewritten in a more efficient tail-recursive style:
A type synonym is defined with the keyword type. Here is a type synonym for points on a plane, and functions computing the distances between two points, and the area of a triangle with the given corners as per Heron's formula. (These definitions will be used in subsequent examples).
Standard ML provides strong support for algebraic datatypes (ADT). A data type can be thought of as a disjoint union of tuples (or a "sum of products"). They are easy to define and easy to use, largely because of pattern matching, and most Standard ML implementations' pattern-exhaustiveness checking and pattern redundancy checking.
In object-oriented programming languages, a disjoint union can be expressed as class hierarchies. However, in contrast to class hierarchies, ADTs are closed. Thus, the extensibility of ADTs is orthogonal to the extensibility of class hierarchies. Class hierarchies can be extended with new subclasses which implement the same interface, while the functions of ADTs can be extended for the fixed set of constructors. See expression problem.
A datatype is defined with the keyword datatype, as in:
datatypeshape=Circleofloc*real(* center and radius *)|Squareofloc*real(* upper-left corner and side length; axis-aligned *)|Triangleofloc*loc*loc(* corners *)
Note that a type synonym cannot be recursive; datatypes are necessary to define recursive constructors. (This is not at issue in this example.)
Pattern matching
Patterns are matched in the order in which they are defined. C programmers can use tagged unions, dispatching on tag values, to do what ML does with datatypes and pattern matching. Nevertheless, while a C program decorated with appropriate checks will, in a sense, be as robust as the corresponding ML program, those checks will of necessity be dynamic; ML's static checks provide strong guarantees about the correctness of the program at compile time.
Function arguments can be defined as patterns as follows:
funarea(Circle(_,r))=Math.pi*squarer|area(Square(_,s))=squares|area(Trianglep)=heronp(* see above *)
The so-called "clausal form" of function definition, where arguments are defined as patterns, is merely syntactic sugar for a case expression:
There is no pattern for the Triangle case in the center function. The compiler will issue a warning that the case expression is not exhaustive, and if a Triangle is passed to this function at runtime, exceptionMatch will be raised.
Redundancy checking
The pattern in the second clause of the following (meaningless) function is redundant:
funf(Circle((x,y),r))=x+y|f(Circle_)=1.0|f_=0.0
Any value that would match the pattern in the second clause would also match the pattern in the first clause, so the second clause is unreachable. Therefore, this definition as a whole exhibits redundancy, and causes a compile-time warning.
The following function definition is exhaustive and not redundant:
valhasCorners=fn(Circle_)=>false|_=>true
If control gets past the first pattern (Circle), we know the shape must be either a Square or a Triangle. In either of those cases, we know the shape has corners, so we can return true without discerning the actual shape.
Higher-order functions
Functions can consume functions as arguments:
funmapf(x,y)=(fx,fy)
Functions can produce functions as return values:
funconstantk=(fn_=>k)
Functions can also both consume and produce functions:
funcompose(f,g)=(fnx=>f(gx))
The function List.map from the basis library is one of the most commonly used higher-order functions in Standard ML:
funmap_[]=[]|mapf(x::xs)=fx::mapfxs
A more efficient implementation with tail-recursive List.foldl:
funmapf=List.revoList.foldl(fn(x,acc)=>fx::acc)[]
Exceptions
Exceptions are raised with the keyword raise and handled with the pattern matching handle construct. The exception system can implement non-local exit; this optimization technique is suitable for functions like the following.
When exceptionZero is raised, control leaves the function List.foldl altogether. Consider the alternative: the value 0 would be returned, it would be multiplied by the next integer in the list, the resulting value (inevitably 0) would be returned, and so on. The raising of the exception allows control to skip over the entire chain of frames and avoid the associated computation. Note the use of the underscore (_) as a wildcard pattern.
The same optimization can be obtained with a tail call.
Standard ML's advanced module system allows programs to be decomposed into hierarchically organized structures of logically related type and value definitions. Modules provide not only namespace control but also abstraction, in the sense that they allow the definition of abstract data types. Three main syntactic constructs comprise the module system: signatures, structures and functors.
Signatures
A signature is an interface, usually thought of as a type for a structure; it specifies the names of all entities provided by the structure, the arity of each type component, the type of each value component, and the signature of each substructure. The definitions of type components are optional; type components whose definitions are hidden are abstract types.
This signature describes a module that provides a polymorphic type 'aqueue, exceptionQueueError, and values that define basic operations on queues.
Structures
A structure is a module; it consists of a collection of types, exceptions, values and structures (called substructures) packaged together into a logical unit.
This definition declares that structureTwoListQueue implements signatureQUEUE. Furthermore, the opaque ascription denoted by :> states that any types which are not defined in the signature (i.e. type'aqueue) should be abstract, meaning that the definition of a queue as a pair of lists is not visible outside the module. The structure implements all of the definitions in the signature.
The types and values in a structure can be accessed with "dot notation":
A functor is a function from structures to structures; that is, a functor accepts one or more arguments, which are usually structures of a given signature, and produces a structure as its result. Functors are used to implement generic data structures and algorithms.
One popular algorithm[6] for breadth-first search of trees makes use of queues. Here is a version of that algorithm parameterized over an abstract queue structure:
(* after Okasaki, ICFP, 2000 *)functorBFS(Q:QUEUE)=structdatatype'atree=E|Tof'a*'atree*'atreelocalfunbfsQq=ifQ.isEmptyqthen[]elsesearch(Q.removeq)andsearch(E,q)=bfsQq|search(T(x,l,r),q)=x::bfsQ(insert(insertql)r)andinsertqa=Q.insert(a,q)infunbfst=bfsQ(Q.singletont)endendstructureQueueBFS=BFS(TwoListQueue)
Within functorBFS, the representation of the queue is not visible. More concretely, there is no way to select the first list in the two-list queue, if that is indeed the representation being used. This data abstraction mechanism makes the breadth-first search truly agnostic to the queue's implementation. This is in general desirable; in this case, the queue structure can safely maintain any logical invariants on which its correctness depends behind the bulletproof wall of abstraction.
Here, the classic mergesort algorithm is implemented in three functions: split, merge and mergesort. Also note the absence of types, with the exception of the syntax op:: and [] which signify lists. This code will sort lists of any type, so long as a consistent ordering function cmp is defined. Using Hindley–Milner type inference, the types of all variables can be inferred, even complicated types such as that of the function cmp.
Split
funsplit is implemented with a stateful closure which alternates between true and false, ignoring the input:
funalternator{}=letvalstate=reftrueinfna=>!statebeforestate:=not(!state)end(* Split a list into near-halves which will either be the same length, * or the first will have one more element than the other. * Runs in O(n) time, where n = |xs|. *)funsplitxs=List.partition(alternator{})xs
Merge
Merge uses a local function loop for efficiency. The inner loop is defined in terms of cases: when both lists are non-empty (x::xs) and when one list is empty ([]).
This function merges two sorted lists into one sorted list. Note how the accumulator acc is built backwards, then reversed before being returned. This is a common technique, since 'alist is represented as a linked list; this technique requires more clock time, but the asymptotics are not worse.
(* Merge two ordered lists using the order cmp. * Pre: each list must already be ordered per cmp. * Runs in O(n) time, where n = |xs| + |ys|. *)funmergecmp(xs,[])=xs|mergecmp(xs,y::ys)=letfunloop(a,acc)(xs,[])=List.revAppend(a::acc,xs)|loop(a,acc)(xs,y::ys)=ifcmp(a,y)thenloop(y,a::acc)(ys,xs)elseloop(a,y::acc)(xs,ys)inloop(y,[])(ys,xs)end
Mergesort
The main function:
funapf(x,y)=(fx,fy)(* Sort a list in according to the given ordering operation cmp. * Runs in O(n log n) time, where n = |xs|. *)funmergesortcmp[]=[]|mergesortcmp[x]=[x]|mergesortcmpxs=(mergecmpoap(mergesortcmp)osplit)xs
The IntInf module provides arbitrary-precision integer arithmetic. Moreover, integer literals may be used as arbitrary-precision integers without the programmer having to do anything.
The following program implements an arbitrary-precision factorial function:
Curried functions have many applications, such as eliminating redundant code. For example, a module may require functions of type a->b, but it is more convenient to write functions of type a*c->b where there is a fixed relationship between the objects of type a and c. A function of type c->(a*c->b)->a->b can factor out this commonality. This is an example of the adapter pattern.[citation needed]
In this example, fund computes the numerical derivative of a given function f at point x:
The type of fund indicates that it maps a "float" onto a function with the type (real->real)->real->real. This allows us to partially apply arguments, known as currying. In this case, function d can be specialised by partially applying it with the argument delta. A good choice for delta when using this algorithm is the cube root of the machine epsilon.[citation needed]
-vald'=d1E~8;vald'=fn:(real->real)->real->real
The inferred type indicates that d' expects a function with the type real->real as its first argument. We can compute an approximation to the derivative of at . The correct answer is .
-d'(fnx=>x*x*x-x-1.0)3.0;valit=25.9999996644:real
Libraries
Standard
The Basis Library[7] has been standardized and ships with most implementations. It provides modules for trees, arrays, and other data structures, and input/output and system interfaces.
Moscow ML: a light-weight implementation, based on the Caml Light runtime engine which implements the full Standard ML language, including modules and much of the basis library
Poly/ML: a full implementation of Standard ML that produces fast code and supports multicore hardware (via Portable Operating System Interface (POSIX) threads); its runtime system performs parallel garbage collection and online sharing of immutable substructures.
ML KitArchived 2016-01-07 at the Wayback Machine: an implementation based very closely on the Definition, integrating a garbage collector (which can be disabled) and region-based memory management with automatic inference of regions, aiming to support real-time applications
SML#: an extension of SML providing record polymorphism and C language interoperability. It is a conventional native compiler and its name is not an allusion to running on the .NET framework
SOSML: an implementation written in TypeScript, supporting most of the SML language and select parts of the basis library
Research
CakeML is a REPL version of ML with formally verified runtime and translation to assembler.
Isabelle (Isabelle/MLArchived 2020-08-30 at the Wayback Machine) integrates parallel Poly/ML into an interactive theorem prover, with a sophisticated IDE (based on jEdit) for official Standard ML (SML'97), the Isabelle/ML dialect, and the proof language. Starting with Isabelle2016, there is also a source-level debugger for ML.
All of these implementations are open-source and freely available. Most are implemented themselves in Standard ML. There are no longer any commercial implementations; Harlequin, now defunct, once produced a commercial IDE and compiler called MLWorks which passed on to Xanalys and was later open-sourced after it was acquired by Ravenbrook Limited on April 26, 2013.
Major projects using SML
The IT University of Copenhagen's entire enterprise architecture is implemented in around 100,000 lines of SML, including staff records, payroll, course administration and feedback, student project management, and web-based self-service interfaces.[8]
^ abMilner, Robin; Tofte, Mads; Harper, Robert; MacQueen, David (1997). The Definition of Standard ML (Revised). MIT Press. ISBN0-262-63181-4.
^ abOkasaki, Chris (2000). "Breadth-First Numbering: Lessons from a Small Exercise in Algorithm Design". International Conference on Functional Programming 2000. ACM.