Xeon Phi launched in 2010. Since it was originally based on an earlier GPU design (codenamed "Larrabee") by Intel[6] that was cancelled in 2009,[7] it shared application areas with GPUs. The main difference between Xeon Phi and a GPGPU like Nvidia Tesla was that Xeon Phi, with an x86-compatible core, could, with less modification, run software that was originally targeted to a standard x86 CPU.
Xeon Phi was initially offered in the form of PCI Express-based add-on cards. A second-generation product, codenamed Knights Landing, was announced in June 2013.[8] These second-generation chips could be used as a standalone CPU, rather than just as an add-in card.
The Xeon Phi product line competed directly with Nvidia's Tesla and AMD's Radeon Instinct lines of deep learning and GPGPU cards. It was discontinued due to a lack of demand and Intel's problems with its 10 nm node.[12]
History
| Code name | Process | Comments |
|---|---|---|
| Knights Ferry | 45 nm | offered as PCI Express card; derived from Larrabee project |
| Knights Corner | 22 nm | derived from P54C; vector processing unit; first device to be announced as Xeon Phi; AVX-512-like encoding |
| Knights Landing | 14 nm | derived from Silvermont/Airmont (Intel Atom);[13] AVX-512 |
| Knights Mill | 14 nm | nearly identical to Knights Landing but optimized for deep learning |
| Knights Hill | 10 nm | cancelled |
Background
The Larrabee microarchitecture (in development since 2006[14]) introduced very wide (512-bit) SIMD units to an x86 architecture based processor design, extended to a cache-coherent multiprocessor system connected via a ring bus to memory; each core was capable of four-way multithreading. Due to the design being intended for GPU as well as general purpose computing, the Larrabee chips also included specialised hardware for texture sampling.[15][16] The project to produce a retail GPU product directly from the Larrabee research project was terminated in May 2010.[17]
Another contemporary Intel research project implementing the x86 architecture on a many-core processor was the 'Single-chip Cloud Computer' (prototype introduced 2009[18]), a design mimicking a cloud computing datacentre on a single chip with multiple independent cores. The prototype included 48 cores per chip, with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for inter-chip messaging. The design lacked cache-coherent cores and focused on principles that would allow it to scale to many more cores.[19]
The Teraflops Research Chip (prototype unveiled 2007[20]) is an experimental 80-core chip with two floating-point units per core, implementing a 96-bit VLIW architecture instead of the x86 architecture.[21] The project investigated intercore communication methods and per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz while consuming 62 W of power.[22][23]
Knights Ferry
Intel's Many Integrated Core (MIC) prototype board, named Knights Ferry and incorporating a processor codenamed Aubrey Isle, was announced on 31 May 2010. The product was stated to be a derivative of the Larrabee project and other Intel research, including the Single-chip Cloud Computer.[24][25]
The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with four threads per core, 2 GB GDDR5 memory,[26] and 8 MB coherent L2 cache (256 KB per core with 32 KB L1 cache), and a power requirement of ~300 W,[26] built at a 45 nm process.[27] In the Aubrey Isle core a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory.[28] Single-board performance has exceeded 750 GFLOPS.[27] The prototype boards only support single-precision floating-point instructions.[29]
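The cache and ring-bus figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below is purely illustrative: the constant and function names are invented for this example, and the inputs are only the numbers given in the paragraph.

```python
# Figures quoted in the text for the Knights Ferry development card.
CORES = 32
L2_PER_CORE_KB = 256
RING_WIDTH_PER_DIRECTION_BITS = 512


def total_l2_mb(cores: int, l2_per_core_kb: int) -> float:
    """Aggregate L2 capacity in MB (1 MB = 1024 KB)."""
    return cores * l2_per_core_kb / 1024


def ring_width_bits(per_direction_bits: int, directions: int = 2) -> int:
    """Total ring width when each direction has its own data path."""
    return per_direction_bits * directions


def hardware_threads(cores: int, threads_per_core: int = 4) -> int:
    """Hardware threads exposed by the card (four threads per core)."""
    return cores * threads_per_core


if __name__ == "__main__":
    print("L2 total (MB):", total_l2_mb(CORES, L2_PER_CORE_KB))
    print("Ring width (bits):", ring_width_bits(RING_WIDTH_PER_DIRECTION_BITS))
    print("Hardware threads:", hardware_threads(CORES))
```

With the quoted inputs, 32 cores at 256 KB each give the stated 8 MB of L2, and two 512-bit directions give the stated 1,024-bit ring.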
Knights Corner
The Knights Corner product line was made on a 22 nm process using Intel's Tri-gate technology, with more than 50 cores per chip, and was Intel's first commercial many-core product.[24][27]
In June 2011, SGI announced a partnership with Intel to use the MIC architecture in its high-performance computing products.[31] In September 2011, it was announced that the Texas Advanced Computing Center (TACC) would use Knights Corner cards in its 10-petaFLOPS "Stampede" supercomputer, providing 8 petaFLOPS of compute power.[32] According to "Stampede: A Comprehensive Petascale Computing Environment", the "second-generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS."[33]
On 15 November 2011, Intel showed an early silicon version of a Knights Corner processor.[34][35]
On 5 June 2012, Intel released open source software and documentation regarding Knights Corner.[36]
On 18 June 2012, Intel announced at the 2012 Hamburg International Supercomputing Conference that Xeon Phi would be the brand name used for all products based on its Many Integrated Core architecture.[3][37][38][39][40][41][42] In June 2012, Cray announced it would be offering 22 nm Knights Corner chips (branded as Xeon Phi) as a co-processor in its 'Cascade' systems.[43][44]
In June 2012, ScaleMP announced a virtualization update that supported Xeon Phi as a transparent processor extension, allowing legacy MMX/SSE code to run without code changes.[45]
An important component of the Intel Xeon Phi coprocessor's core is its vector processing unit (VPU).[46]
The VPU features a novel 512-bit SIMD instruction set, officially known as Intel Initial Many Core Instructions (Intel IMCI). Thus, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports Fused Multiply-Add (FMA) instructions and hence can execute 32 SP or 16 DP floating point operations per cycle. It also provides support for integers.
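The per-cycle figures in the paragraph above follow directly from the 512-bit vector width. As a minimal sketch (the function names are made up for illustration, and an FMA is counted as two floating-point operations per lane, as the text does):

```python
VECTOR_WIDTH_BITS = 512  # width of the VPU's SIMD registers, per the text


def simd_lanes(element_bits: int, vector_bits: int = VECTOR_WIDTH_BITS) -> int:
    """Number of elements of the given size that fit in one vector."""
    return vector_bits // element_bits


def flops_per_cycle(element_bits: int, fma: bool = True) -> int:
    """Peak floating-point operations per cycle for one VPU.

    A fused multiply-add performs a multiply and an add on every lane,
    so it counts as two operations per lane.
    """
    return simd_lanes(element_bits) * (2 if fma else 1)


if __name__ == "__main__":
    for name, bits in (("single precision", 32), ("double precision", 64)):
        print(name, simd_lanes(bits), "lanes,",
              flops_per_cycle(bits), "FLOPs/cycle with FMA")
```

For 32-bit elements this gives 16 lanes and 32 operations per cycle; for 64-bit elements, 8 lanes and 16 operations per cycle, matching the quoted numbers.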
The VPU also features an Extended Math Unit (EMU) that can execute operations such as reciprocal, square root, and logarithm, thereby allowing these operations to be executed in a vector fashion with high bandwidth. The EMU operates by calculating polynomial approximations of these functions.
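To illustrate what evaluating a function by polynomial approximation across a vector looks like, the sketch below applies one fixed polynomial to every lane using Horner's rule. It is an assumption-laden toy: the coefficients are a plain truncated Taylor series for ln(1 + t), chosen for readability, and are not the EMU's actual coefficients or method, which the text does not describe. A Python list stands in for a 16-lane vector register.

```python
import math

# Truncated Taylor series: ln(1 + t) = t - t^2/2 + t^3/3 - ...
# Coefficients run from the highest degree down to the constant term,
# which is the order Horner's rule consumes them in.
DEGREE = 8
COEFFS = [(-1) ** (k + 1) / k for k in range(DEGREE, 0, -1)] + [0.0]
LN_1_5 = math.log(1.5)  # precomputed constant for the expansion point


def horner(coeffs, t):
    """Evaluate a polynomial at t with one multiply-add per coefficient."""
    acc = 0.0
    for c in coeffs:
        acc = acc * t + c
    return acc


def approx_ln(x):
    """Approximate ln(x) for x in [1, 2) by expanding around 1.5."""
    t = (x - 1.5) / 1.5  # x = 1.5 * (1 + t), with |t| <= 1/3 on [1, 2)
    return LN_1_5 + horner(COEFFS, t)


def approx_ln_vector(xs):
    """Apply the same polynomial to every lane of a 'vector' (a list here)."""
    return [approx_ln(x) for x in xs]


if __name__ == "__main__":
    lanes = [1.0 + i / 16 for i in range(16)]  # 16 inputs in [1, 2)
    worst = max(abs(a - math.log(x))
                for a, x in zip(approx_ln_vector(lanes), lanes))
    print("largest absolute error across lanes:", worst)
```

Because every lane runs the same sequence of multiply-adds, this style of evaluation maps naturally onto SIMD hardware with FMA support.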
On 12 November 2012, Intel announced two Xeon Phi coprocessor families using the 22 nm process size: the Xeon Phi 3100 and the Xeon Phi 5110P.[47][48][49] The Xeon Phi 3100 was specified to deliver more than 1 teraFLOPS of double-precision floating-point performance with 240 GB/s memory bandwidth at 300 W.[47][48][49] The Xeon Phi 5110P was specified to deliver 1.01 teraFLOPS of double-precision floating-point performance with 320 GB/s memory bandwidth at 225 W.[47][48][49] A further model, the Xeon Phi 7120P, was specified to deliver 1.2 teraFLOPS of double-precision floating-point performance with 352 GB/s memory bandwidth at 300 W.
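The quoted figures can be turned into two commonly used derived metrics: energy efficiency (GFLOPS per watt) and the ratio of memory bandwidth to compute (bytes per floating-point operation). The following is a rough sketch using only the numbers in the paragraph above; the helper names are hypothetical, and the Xeon Phi 3100 entry uses 1.0 teraFLOPS as a lower bound because the text says only "more than 1 teraFLOPS".

```python
# (model, peak double-precision TFLOPS, memory bandwidth GB/s, power W)
# Values are the ones quoted in the text; 1.00 for the 3100 is a lower bound.
MODELS = [
    ("Xeon Phi 3100", 1.00, 240, 300),
    ("Xeon Phi 5110P", 1.01, 320, 225),
    ("Xeon Phi 7120P", 1.20, 352, 300),
]


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Peak double-precision GFLOPS delivered per watt of rated power."""
    return tflops * 1000 / watts


def bytes_per_flop(bandwidth_gb_s: float, tflops: float) -> float:
    """Memory bytes available per peak floating-point operation."""
    return bandwidth_gb_s / (tflops * 1000)


if __name__ == "__main__":
    for name, tflops, bandwidth, watts in MODELS:
        print(f"{name}: {gflops_per_watt(tflops, watts):.2f} GFLOPS/W, "
              f"{bytes_per_flop(bandwidth, tflops):.3f} bytes/FLOP")
```

On these peak figures, the 5110P comes out as the most power-efficient of the three and also has the most memory bandwidth per floating-point operation.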
On 17 June 2013, TOP500 announced[9] the Tianhe-2 supercomputer as the world's fastest. Tianhe-2 used Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS. It topped the list for two and a half years, for the last time in November 2015.[50]
Design and programming
The cores of Knights Corner are based on a modified version of the P54C design, used in the original Pentium.[51] The basis of the Intel MIC architecture is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can use existing parallelization software tools.[27] Programming tools include OpenMP,[52] OpenCL,[53] Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++[54] and math libraries.[55]
Design elements inherited from the Larrabee project include x86 ISA, 4-way SMT per core, 512-bit SIMD units, 32 KB L1 instruction cache, 32 KB L1 data cache, coherent L2 cache (512 KB per core[56]), and ultra-wide ring bus connecting processors and memory.
The Knights Corner 512-bit SIMD instructions share many intrinsic functions with the AVX-512 extension. The instruction set documentation is available from Intel under the extension name of KNC.[57][58][59][60]
Knights Landing
Knights Landing is the code name for the second-generation MIC architecture product from Intel.[33] Intel first officially revealed details of its second-generation Intel Xeon Phi products on 17 June 2013.[11] Intel said that the next generation of Intel MIC Architecture-based products would be available in two forms, as a coprocessor or as a host processor (CPU), and would be manufactured using Intel's 14 nm process technology. Knights Landing products would include integrated on-package memory for significantly higher memory bandwidth.
Knights Landing contains up to 72 Airmont (Atom) cores with four threads per core,[75][76] using the LGA 3647 socket[77] and supporting up to 384 GB of "far" DDR4-2133 RAM and 8–16 GB of stacked "near" 3D MCDRAM, a version of the Hybrid Memory Cube. Each core has two 512-bit vector units and supports AVX-512 SIMD instructions, specifically the Intel AVX-512 Foundational Instructions (AVX-512F) with Intel AVX-512 Conflict Detection Instructions (AVX-512CD), Intel AVX-512 Exponential and Reciprocal Instructions (AVX-512ER), and Intel AVX-512 Prefetch Instructions (AVX-512PF). Support for IMCI has been removed in favor of AVX-512.[78]
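Software that depends on these AVX-512 subsets generally has to check for each one separately, since a processor may implement AVX-512F without, for example, AVX-512ER. A minimal sketch of such a check follows, assuming the feature list is available as a whitespace-separated string in the style of the "flags" line of /proc/cpuinfo on Linux. The helper names and the two sample strings are hypothetical and do not describe any particular machine.

```python
# AVX-512 subsets listed in the text for Knights Landing, spelled the way
# Linux reports CPU feature flags in /proc/cpuinfo.
KNL_AVX512_FLAGS = frozenset({"avx512f", "avx512cd", "avx512er", "avx512pf"})


def parse_flags(flags_line: str) -> set:
    """Split a whitespace-separated flag list into a set of lower-case names."""
    return set(flags_line.lower().split())


def missing_knl_subsets(flags_line: str) -> set:
    """Return the Knights Landing AVX-512 subsets absent from flags_line."""
    return set(KNL_AVX512_FLAGS) - parse_flags(flags_line)


def supports_knl_avx512(flags_line: str) -> bool:
    """True if every subset listed for Knights Landing is present."""
    return not missing_knl_subsets(flags_line)


if __name__ == "__main__":
    # Hypothetical flag strings, for illustration only.
    knl_like = "fpu sse2 avx2 avx512f avx512cd avx512er avx512pf"
    other = "fpu sse2 avx2 avx512f avx512cd avx512bw avx512dq avx512vl"
    print(supports_knl_avx512(knl_like))
    print(sorted(missing_knl_subsets(other)))
```

Checking each subset by name, rather than testing for "AVX-512" as a whole, keeps code that relies on AVX-512ER or AVX-512PF from being dispatched to processors that lack them.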
On 20 June 2016, Intel launched the Intel Xeon Phi product family x200 based on the Knights Landing architecture, stressing its applicability not just to traditional simulation workloads but also to machine learning.[80][81] The model lineup announced at launch included only bootable (host-processor) Xeon Phi parts, in two versions: standard processors and processors with integrated Intel Omni-Path architecture fabric.[82] The latter is denoted by the suffix F in the model number. Integrated fabric is expected to provide better latency at a lower cost than discrete high-performance network cards.[80]
On 14 November 2016, the 48th list of TOP500 contained two systems using Knights Landing in the Top 10.[83]
The PCIe-based co-processor variant of Knights Landing was never offered to the general market and was discontinued by August 2017.[84] This included the 7220A, 7240P and 7220P coprocessor cards.
Intel announced they were discontinuing Knights Landing in summer 2018.[85]
Models
All models can boost to their peak speeds, adding 200 MHz to their base frequency when running just one or two cores. When running from three to the maximum number of cores, the chips can only boost 100 MHz above the base frequency. All chips run high-AVX code at a frequency reduced by 200 MHz.[86]
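The clocking rules above can be written as a small function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the high-AVX reduction is interpreted as 200 MHz below the base frequency regardless of the number of active cores, which is one reading of the sentence above rather than something it states explicitly. The 1400 MHz, 68-core part in the usage example is an illustrative input only.

```python
def knl_frequency_mhz(base_mhz: int, active_cores: int, total_cores: int,
                      high_avx: bool = False) -> int:
    """Operating frequency in MHz under the boost rules described in the text.

    Assumption: high-AVX code runs at base minus 200 MHz, independent of
    how many cores are active.
    """
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    if high_avx:
        return base_mhz - 200
    if active_cores <= 2:
        return base_mhz + 200  # peak boost with one or two cores active
    return base_mhz + 100      # boost with three or more cores active


if __name__ == "__main__":
    base, cores = 1400, 68  # illustrative values, not taken from the text
    for active in (1, 2, 3, cores):
        print(active, "active cores:",
              knl_frequency_mhz(base, active, cores), "MHz")
    print("high-AVX:",
          knl_frequency_mhz(base, cores, cores, high_avx=True), "MHz")
```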
Knights Mill
Knights Mill is Intel's codename for a Xeon Phi product specialized in deep learning,[99] initially released in December 2017.[100] Nearly identical in specifications to Knights Landing, Knights Mill includes optimizations for better utilization of AVX-512 instructions. Single-precision and variable-precision floating-point performance increased, at the expense of double-precision floating-point performance.
Knights Hill
Knights Hill was the codename for the third-generation MIC architecture, for which Intel announced the first details at SC14.[101] It was to be manufactured in a 10 nm process.[102]
In 2017, Intel announced that Knights Hill had been cancelled in favor of another architecture built from the ground up to enable exascale computing in the future. At the time, this new architecture was expected in 2020–2021.[107][108]
Programming
One performance and programmability study reported that achieving high performance with Xeon Phi still needs help from programmers and that merely relying on compilers with traditional programming models is insufficient.[109] Other studies in various domains, such as life sciences[110] and deep learning,[111] have shown that exploiting the thread- and SIMD-parallelism of Xeon Phi achieves significant speed-ups.
Intel Corporation (18 June 2012), "Latest Intel Xeon Processors E5 Product Family Achieves Fastest Adoption of New Technology on Top500 List", marketwatch.com, archived from the original on 20 June 2012, retrieved 18 June 2012. "Intel Xeon Phi is the new brand name for all future Intel Many Integrated Core Architecture based products targeted at HPC, enterprise, datacenters and workstations. The first Intel Xeon Phi product family member is scheduled for volume production by the end of 2012."
Barker, J; Bowden, J (2013). "Manycore Parallelism through OpenMP". OpenMP in the Era of Low Power Devices and Accelerators. IWOMP. Lecture Notes in Computer Science. Vol. 8122. Springer. pp. 45–57. doi:10.1007/978-3-642-40698-0_4. ISBN 978-3-642-40697-3.
Fang, Jianbin; Sips, Henk; Zhang, Lilun; Xu, Chuanfu; Yonggang, Che; Varbanescu, Ana Lucia (2014). Test-Driving Intel Xeon Phi (PDF). 2014 ACM/SPEC International Conference on Performance Engineering. Archived from the original (PDF) on 11 November 2017. Retrieved 30 December 2013.
Memeti, Suejb; Pllana, Sabri; Benkner, Siegfried; Sandrieser, Martin; Bachmayer, Beverly (29 June 2015), Accelerating DNA Sequence Analysis using Intel Xeon Phi, arXiv:1506.08612, Bibcode:2015arXiv150608612M
Viebke, Andre; Pllana, Sabri; Benkner, Siegfried; Sandrieser, Martin; Bachmayer, Beverly (30 June 2015), The Potential of the Intel Xeon Phi for Supervised Deep Learning, arXiv:1506.09067, Bibcode:2015arXiv150609067V