Expression templates are a C++template metaprogramming technique that builds structures representing a computation at compile time, where expressions are evaluated only as needed to produce efficient code for the entire computation.[1] Expression templates thus allow programmers to bypass the normal order of evaluation of the C++ language and achieve optimizations such as loop fusion.
Expression templates were invented independently by Todd Veldhuizen and David Vandevoorde;[2][3] it was Veldhuizen who gave them their name.[3] They are a popular technique for the implementation of linear algebra software.[1]
Motivation and example
Consider a library representing vectors and operations on them. One common mathematical operation is to add two vectors u and v, element-wise, to produce a new vector. The obvious C++ implementation of this operation would be an overloadedoperator+ that returns a new vector object:
/// @brief class representing a mathematical 3D vectorclassVec:publicstd::array<double,3>{public:usingstd::array<double,3>::array;// inherit constructor (C++11)// see https://en.cppreference.com/w/cpp/language/using_declaration};/// @brief sum 'u' and 'v' into a new instance of VecVecoperator+(Vecconst&u,Vecconst&v){Vecsum;for(size_ti=0;i<u.size();i++){sum[i]=u[i]+v[i];}returnsum;}
Users of this class can now write Vec x = a + b; where a and b are both instances of Vec.
A problem with this approach is that more complicated expressions such as Vec x = a + b + c are implemented inefficiently. The implementation first produces a temporary Vec to hold a + b, then produces another Vec with the elements of c added in. Even with return value optimization this will allocate memory at least twice and require two loops.
Delayed evaluation solves this problem, and can be implemented in C++ by letting operator+ return an object of an auxiliary type, say VecSum, that represents the unevaluated sum of two Vecs, or a vector with a VecSum, etc. Larger expressions then effectively build expression trees that are evaluated only when assigned to an actual Vec variable. But this requires traversing such trees to do the evaluation, which is in itself costly.[4]
Expression templates implement delayed evaluation using expression trees that only exist at compile time. Each assignment to a Vec, such as Vec x = a + b + c, generates a new Vec constructor if needed by template instantiation. This constructor operates on three Vec; it allocates the necessary memory and then performs the computation. Thus only one memory allocation is performed.
Example implementation of expression templates :
An example implementation of expression templates looks like the following. A base class VecExpression represents any vector-valued expression. It is templated on the actual expression type E to be implemented, per the curiously recurring template pattern. The existence of a base class like VecExpression is not strictly necessary for expression templates to work. It will merely serve as a function argument type to distinguish the expressions from other types (note the definition of a Vec constructor and operator+ below).
template<typenameE>classVecExpression{public:staticconstexprboolis_leaf=false;doubleoperator[](size_ti)const{// Delegation to the actual expression type. This avoids dynamic polymorphism (a.k.a. virtual functions in C++)returnstatic_cast<Econst&>(*this)[i];}size_tsize()const{returnstatic_cast<Econst&>(*this).size();}};
The Boolean is_leaf is there to tag VecExpressions that are "leafs", i.e. that actually contain data. The Vec class is a leaf that stores the coordinates of a fully evaluated vector expression, and becomes a subclass of VecExpression.
classVec:publicVecExpression<Vec>{std::array<double,3>elems;public:staticconstexprboolis_leaf=true;decltype(auto)operator[](size_ti)const{returnelems[i];}decltype(auto)&operator[](size_ti){returnelems[i];}size_tsize()const{returnelems.size();}// construct Vec using initializer list Vec(std::initializer_list<double>init){std::copy(init.begin(),init.end(),elems.begin());}// A Vec can be constructed from any VecExpression, forcing its evaluation.template<typenameE>Vec(VecExpression<E>const&expr){for(size_ti=0;i!=expr.size();++i){elems[i]=expr[i];}}};
The sum of two Vecs is represented by a new type, VecSum, that is templated on the types of the left- and right-hand sides of the sum so that it can be applied to arbitrary pairs of Vec expressions. An overloaded operator+ serves as syntactic sugar for the VecSum constructor. A subtlety intervenes in this case: in order to reference the original data when summing two VecExpressions, VecSum needs to store a constreference to each VecExpression if it is a leaf, otherwise it is a temporary object that needs to be copied to be properly saved.
template<typenameE1,typenameE2>classVecSum:publicVecExpression<VecSum<E1,E2>>{// cref if leaf, copy otherwisetypenamestd::conditional<E1::is_leaf,constE1&,constE1>::type_u;typenamestd::conditional<E2::is_leaf,constE2&,constE2>::type_v;public:staticconstexprboolis_leaf=false;VecSum(E1const&u,E2const&v):_u(u),_v(v){assert(u.size()==v.size());}decltype(auto)operator[](size_ti)const{return_u[i]+_v[i];}size_tsize()const{return_v.size();}};template<typenameE1,typenameE2>VecSum<E1,E2>operator+(VecExpression<E1>const&u,VecExpression<E2>const&v){returnVecSum<E1,E2>(*static_cast<constE1*>(&u),*static_cast<constE2*>(&v));}
With the above definitions, the expression a + b + c is of type
VecSum<VecSum<Vec,Vec>,Vec>
so Vec x = a + b + c invokes the templated Vec constructor Vec(VecExpression<E> const& expr) with its template argument E being this type (meaning VecSum<VecSum<Vec, Vec>, Vec>). Inside this constructor, the loop body
elems[i]=expr[i];
is effectively expanded (following the recursive definitions of operator+ and operator[]on this type) to
elems[i]=a.elems[i]+b.elems[i]+c.elems[i];
with no temporary Vec objects needed and only one pass through each memory block.
Basic Usage :
intmain(){Vecv0={23.4,12.5,144.56};Vecv1={67.12,34.8,90.34};Vecv2={34.90,111.9,45.12};// Following assignment will call the ctor of Vec which accept type of // `VecExpression<E> const&`. Then expand the loop body to // a.elems[i] + b.elems[i] + c.elems[i]Vecsum_of_vec_type=v0+v1+v2;for(size_ti=0;i<sum_of_vec_type.size();++i)std::cout<<sum_of_vec_type[i]<<std::endl;// To avoid creating any extra storage, other than v0, v1, v2// one can do the following (Tested with C++11 on GCC 5.3.0)autosum=v0+v1+v2;for(size_ti=0;i<sum.size();++i)std::cout<<sum[i]<<std::endl;// Observe that in this case typeid(sum) will be VecSum<VecSum<Vec, Vec>, Vec>// and this chaining of operations can go on.}
^ abMatsuzaki, Kiminori; Emoto, Kento (2009). Implementing fusion-equipped parallel skeletons by expression templates. Proc. Int'l Symp. on Implementation and Application of Functional Languages. pp. 72–89.
^Veldhuizen, Todd (2000). Just when you thought your little language was safe: "Expression Templates" in Java. Int'l Symp. Generative and Component-Based Software Engineering. CiteSeerX10.1.1.22.6984.