The Machine is an experimental computer made by Hewlett Packard Enterprise. It was created as part of a research project to develop a new type of computer architecture for servers. The design focused on a “memory centric computing” architecture, where NVRAM replaced traditional DRAM and disks in the memory hierarchy. The NVRAM was byte addressable and could be accessed from any CPU via a photonic interconnect.[1][2] The aim of the project was to build and evaluate this new design.
Hardware overview
The Machine was a computer cluster with many individual nodes connected over a memory fabric. The fabric interconnect used VCSEL-based silicon photonics with a custom chip called the X1.[3] Access to memory was non-uniform and could involve multiple hops. The Machine was envisioned as a rack-scale computer, initially with 80 processors and 320 TB of fabric attached memory, with the potential to scale to more enclosures and up to 32 ZB.[4][5] The fabric attached memory was not cache coherent, and software had to be aware of this property.[4] Since traditional locks need cache coherency, hardware was added to the bridges to perform atomic operations at that level.[4] Each node also had a limited amount (256 GB) of local, private, cache-coherent memory.[6][4] Storage and compute on each node had completely separate power domains.[4]
The whole fabric attached memory of The Machine was too large to be mapped into a processor's virtual address space, which was 48 bits wide,[4] so windows of the fabric attached memory had to be mapped into processor memory instead. Communication between each node SoC and the memory pool therefore went through an FPGA-based “Z-bridge” component that managed memory mapping between the local SoC and the fabric attached memory.[4] The Z-bridge dealt with two kinds of addresses: 53-bit logical Z addresses and 75-bit Z addresses, which can address 8 PB and 32 ZB respectively.[4] Each Z-bridge also contained a firewall to enforce access control.[7] The interconnect protocol was developed in-house and known as Next Generation Memory Interconnect (NGMI).[4] This protocol evolved into the open Gen-Z standard.[8][9] The Z-bridge connected to the SoC using PCIe, avoiding major software changes.[9]
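The address widths quoted above can be checked with a little arithmetic. The following is a minimal sketch, not taken from HPE material: the constant and function names are illustrative, and it assumes that "320 TB" means decimal terabytes, while the 8 PB and 32 ZB figures correspond exactly to the binary quantities 2^53 and 2^75 bytes.

```python
# Address widths quoted in the text.
CPU_VIRTUAL_BITS = 48   # processor virtual address space
LOGICAL_Z_BITS = 53     # logical Z addresses
Z_BITS = 75             # Z addresses

# Binary unit sizes in bytes.
TIB = 1 << 40
PIB = 1 << 50
ZIB = 1 << 70

# Assumption: the planned 320 TB of fabric attached memory is read as
# decimal terabytes (10**12 bytes each).
PLANNED_FAM_BYTES = 320 * 10**12


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses available with `bits` address bits."""
    return 1 << bits


def fits_in_cpu_virtual_space(size_bytes: int) -> bool:
    """True if a memory pool of this size could be mapped in one piece."""
    return size_bytes <= addressable_bytes(CPU_VIRTUAL_BITS)


if __name__ == "__main__":
    print("48-bit space in TiB:", addressable_bytes(CPU_VIRTUAL_BITS) // TIB)
    print("53-bit space in PiB:", addressable_bytes(LOGICAL_Z_BITS) // PIB)
    print("75-bit space in ZiB:", addressable_bytes(Z_BITS) // ZIB)
    print("320 TB fits in 48 bits:", fits_in_cpu_virtual_space(PLANNED_FAM_BYTES))
```

A 48-bit space covers 2^48 bytes, roughly 281 TB in decimal units, which is smaller than the planned 320 TB pool even before local memory and other mappings are counted. That gap is why only windows of the pool could be mapped at any one time.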
A half-rack prototype of The Machine was unveiled at HPE Discover in London in 2016.[10] Each node contained ARMv8-A based Broadcom/Cavium ThunderX2 SoCs.[11][12][3] In total there were 40 32-core SoCs.[13] Because adequate memristor-based NVRAM or phase-change memory was unavailable, the prototype used 160 TB of battery-backed DRAM.[14][12][15] Despite this setback, software architect Keith Packard said this "can be used to prove the other parts of the design before switching".[4] According to The Register, HPE's partnership with SK Hynix to develop memristor-based NVRAM ran into funding and directional problems, and HPE was working with SanDisk on resistive RAM (ReRAM) for The Machine.[16] According to The Next Platform, HPE considered switching to Intel Optane DIMMs "when production quantities of are available on the market".[9]
The Next Platform estimated the rack prototype to consume 24 kW to 36 kW of power.[9]
Software overview
Two major software projects were created for The Machine.[17] The first was an experimental version of Linux called Linux++,[18] with the enhancements needed to configure the hardware and work with traditional programming models.[19] These included bridge configuration, access control, and mapping using the DAX subsystem. In parallel, a new operating system (OS) called Carbon[20][21] was announced, to be designed from first principles to take full advantage of an NVRAM-based computer.[22][23][24]
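On Linux, DAX generally lets an application memory-map a file and reach the underlying memory directly with loads and stores, bypassing the page cache. The sketch below illustrates only the general idea of mapping one window of a larger region at a time. It is not code from Linux++: it uses an ordinary temporary file as a stand-in for fabric attached memory, and the names `WINDOW_SIZE`, `make_backing_file`, `write_window` and `read_window` are hypothetical.

```python
import mmap
import os
import tempfile

# Window offsets passed to mmap must be multiples of the allocation
# granularity, so that value is used as the (illustrative) window size.
WINDOW_SIZE = mmap.ALLOCATIONGRANULARITY


def make_backing_file(num_windows: int) -> str:
    """Create a zero-filled temporary file that stands in for a memory pool."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "r+b") as f:
        f.truncate(num_windows * WINDOW_SIZE)
    return path


def write_window(path: str, window_index: int, payload: bytes) -> None:
    """Map a single window of the file and store `payload` at its start."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), WINDOW_SIZE,
                       access=mmap.ACCESS_WRITE,
                       offset=window_index * WINDOW_SIZE) as window:
            window[:len(payload)] = payload
            window.flush()  # push the stores back to the backing file


def read_window(path: str, window_index: int, length: int) -> bytes:
    """Map a single window read-only and return its first `length` bytes."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), WINDOW_SIZE,
                       access=mmap.ACCESS_READ,
                       offset=window_index * WINDOW_SIZE) as window:
            return bytes(window[:length])


if __name__ == "__main__":
    pool = make_backing_file(num_windows=4)
    try:
        write_window(pool, 2, b"hello")
        print(read_window(pool, 2, 5))
    finally:
        os.remove(pool)
```

Only the requested window is mapped into the process at any time; the rest of the file stays unmapped. This mirrors, in a very loose way, how a small address space can be used to reach a much larger pool.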
Primary workloads for The Machine included in-memory databases, Hadoop-style software, and real-time big data analytics.[25][26] HPE claimed that a memory-driven computing design like The Machine could "improve speeds by up to 8000x compared to conventional systems".[27]
In the prototype system, the fabric attached memory was organised by a "top of rack" management server component called The Librarian.[4][28] The Librarian divided the memory into "shelves" of 8 GB "books", and hardware protections could be configured on book boundaries.[4] A finer-grained 64 KB "booklet" was also supported.[4]
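The book and booklet granularities can be illustrated by splitting a byte offset into its parts. This is a minimal sketch under stated assumptions: it treats the memory as one flat, linearly numbered range, reads 8 GB and 64 KB as binary units (2^33 and 2^16 bytes), and uses hypothetical names (`locate`, `books_spanned`) that do not come from The Librarian itself.

```python
# Assumption: sizes are binary units, laid out contiguously from offset 0.
BOOK_SIZE = 8 * 2**30        # 8 GiB per book
BOOKLET_SIZE = 64 * 2**10    # 64 KiB per booklet
BOOKLETS_PER_BOOK = BOOK_SIZE // BOOKLET_SIZE


def locate(offset: int) -> tuple[int, int, int]:
    """Split a byte offset into (book index, booklet index, byte in booklet)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    book, within_book = divmod(offset, BOOK_SIZE)
    booklet, byte_in_booklet = divmod(within_book, BOOKLET_SIZE)
    return book, booklet, byte_in_booklet


def books_spanned(start: int, length: int) -> range:
    """Indices of every book touched by the byte range [start, start + length)."""
    if start < 0 or length <= 0:
        raise ValueError("start must be non-negative and length positive")
    first = start // BOOK_SIZE
    last = (start + length - 1) // BOOK_SIZE
    return range(first, last + 1)


if __name__ == "__main__":
    print("booklets per book:", BOOKLETS_PER_BOOK)
    print("location:", locate(BOOK_SIZE + BOOKLET_SIZE + 5))
    print("books spanned:", list(books_spanned(BOOK_SIZE - 1, 2)))
```

Because protections were configured on book boundaries, a helper like `books_spanned` shows which books an allocation would touch, and therefore which protection settings would apply to it under this simplified layout.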
The mapping of memory is handled by the OS, while the access controls for the memory are configured by the management infrastructure of The Machine system as a whole.[4] Software needs to be aware that reads from fabric attached memory can fail with synchronous errors, whilst writes can fail with asynchronous errors. On the Linux system, a memory error is reported to the process with the SIGBUS signal.[4]
Changes to programming models and data structures were also explored, including changes to thread libraries and heap data structures to make them resilient to non-volatile memory failure modes.[29][30][31][32][33]
History
A few years after HP’s rediscovery of the memristor,[34] the newly appointed CTO of HP, Martin Fink, created an HP Labs project to build a computer system based on memristors to tackle the slowing of Moore's law. He announced the project at HP’s Discover event in the summer of 2014.[35] Some of the ideas of The Machine also came from Dragonhawk system designs.[4][36] Three-quarters of HP Labs’ 200 staff were focused on the hardware and software of The Machine.[22]
Speaking to Bloomberg, HP said it would commercialize The Machine within a few years, “or fall on its face trying.”[35]
Kirk Bresniker served as Chief Architect, and Keith Packard was hired to work on the Linux enhancements.[37][7] Bdale Garbee was hired to manage open source development.[38]
In 2015, Hewlett-Packard split into two companies, HP Inc and Hewlett Packard Enterprise (HPE), with The Machine project assigned to the latter.[39]
In late 2016, Martin Fink retired as HPE CTO.[40] Fink's retirement announcement also said that Hewlett Packard Labs staff would be moved into the Enterprise product group to "align our R&D work on The Machine with the business".[41][42]
By early 2017, Hewlett Packard Labs had a slide saying that the project's aim was “to demonstrate progress, not develop products” and that they would “collaborate to deliver differentiating Machine value into existing architectures as well as disruptive architectures”.[43] BleepingComputer said, "In other words, The Machine is no longer a product in its own right. Instead it will provide technologies that will be used in other HPE products going forward." HPE restructured its pure R&D organization and placed it in the products group.[44] Yahoo! Finance reported that The Machine prototype "remains years away from being commercially available".[45]
In 2018, HPE stated that the project had reached the stage where it needed commercial applications from customers in the next step of its evolution.[46]
Hsu, Terry Ching-Hsiang; Brügner, Helge; Roy, Indrajit; Keeton, Kimberly; Eugster, Patrick (2017-04-23). "NVthreads". Proceedings of the Twelfth European Conference on Computer Systems. EuroSys '17. New York, NY, USA: Association for Computing Machinery. pp. 468–482. doi:10.1145/3064176.3064204. ISBN 978-1-4503-4938-3.