
Hardware Development

 

The EuroEXA project group collaborated on an array of hardware and electronic elements that support and advance our vision of ExaScale performance.

Modularity is key to our approach, ensuring that the system is both scalable and practical. This aligns with our focus on key open standards – including the Open Compute Project (OCP) and COM Express – which provide flexibility and adaptability, leaving the door open to further development and innovation.

We worked to build highly dense PCBs, bringing the electronics as close together as possible to drive performance and reduce energy demands. We also developed a three-level interconnect that gives data the shortest possible routes to travel, while allowing the system to expand with commodity OpenFlow switches – offering scalability with fewer bottlenecks.

Finally, the project harnesses fully reprogrammable accelerators to enable data-flow-based processing, which not only keeps data moving but also leaves less of the processor sitting idle – making the fullest use of the compute capacity.

Proximity-Optimised Computing Architecture

 

Thanks to the thermal and power-density capabilities of our infrastructure, we can place the processors closer together while integrating accelerators and IO elements on a single module.

With numerous modules on each carrier, we have direct peer-to-peer connectivity in addition to the system-level interconnect. And with numerous carriers in each chassis, we create a hierarchy of proximity – keeping peers close without placing any communication load on the system interconnect.

We also make the distributed storage drives an integral element of the network interface of each compute module. This removes the latency and bandwidth bottlenecks that often come with centralised network storage models. As such, it’s a system that aggregates the total storage bandwidth, delivering storage-centric high-throughput computation.
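As a rough illustration of that aggregation, the sketch below compares a centralised filer, capped by its own uplinks, with per-module drives whose combined bandwidth grows with the module count. Every figure in it is an assumption for illustration, not a EuroEXA specification or measurement.

```python
# Hypothetical illustration of why per-module drives aggregate storage
# bandwidth. All figures are assumptions, not EuroEXA specifications.

def aggregate_storage_bandwidth(num_modules: int,
                                drives_per_module: int,
                                per_drive_gbps: float) -> float:
    """Total storage bandwidth when every compute module owns its drives."""
    return num_modules * drives_per_module * per_drive_gbps

# A centralised filer is capped by its own uplinks; distributed drives
# scale linearly with the number of compute modules.
centralised_cap_gbps = 4 * 100.0   # e.g. four 100 Gb/s uplinks (assumed)
distributed_gbps = aggregate_storage_bandwidth(num_modules=1024,
                                               drives_per_module=1,
                                               per_drive_gbps=25.0)

print(f"centralised ceiling: {centralised_cap_gbps:,.0f} Gb/s")
print(f"distributed total:   {distributed_gbps:,.0f} Gb/s")
```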

High Compute Density Hardware

 

We created a unique compute module, the Co-design Recommended Daughter Board (CRDB). It uses an innovative node architecture that places compute acceleration at the centre of the node through an FPGA, directly connected to a smart IO controller that unifies storage and communication services alongside the application host.

We then mount 16 of these CRDB modules across four carriers to create a single Open Compute Unit (1OU) blade. Each blade forms the final level of the proximity-optimised interconnect hierarchy, switching directly between modules – both within the chassis and between peer chassis in a network group – while bridging into the system-level interconnect. This hierarchy avoids the otherwise escalating hop count and interface radix, offering an approach that can scale up without losing compute performance.
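The counts implied above can be written out as a small sketch: four CRDB modules per carrier and four carriers per blade give 16 modules per 1OU blade, while each module keeps the same small set of links as the system grows. The blades-per-chassis and chassis-per-group figures below are placeholders, not EuroEXA numbers.

```python
# Module counts implied by the text: four CRDB modules per carrier and four
# carriers per 1OU blade give 16 modules per blade. Figures further up the
# hierarchy are placeholders for illustration only.

MODULES_PER_CARRIER = 4      # from the CRDB/carrier description
CARRIERS_PER_BLADE = 4       # four carriers per 1OU blade
BLADES_PER_CHASSIS = 8       # assumption
CHASSIS_PER_GROUP = 4        # assumption

modules_per_blade = MODULES_PER_CARRIER * CARRIERS_PER_BLADE   # 16
modules_per_chassis = modules_per_blade * BLADES_PER_CHASSIS
modules_per_group = modules_per_chassis * CHASSIS_PER_GROUP

# Each module keeps the same few links (its carrier peers plus an uplink)
# however large the system grows, which is the point about interface radix.
print(modules_per_blade, modules_per_chassis, modules_per_group)
```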

The design and implementation of our new ARM-based processor elements allow us to investigate memory compression technologies that relieve pressure on the memory bottleneck and therefore improve the rate of FLOPS we can sustain per byte of memory traffic.
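A roofline-style sketch makes the FLOPS-per-byte point concrete: when a kernel is memory-bound, compressing its data raises the effective memory bandwidth and lifts the sustainable FLOP rate. The peak, bandwidth, intensity and compression figures below are illustrative assumptions only.

```python
# Roofline-style sketch: for a memory-bound kernel the sustainable FLOP rate
# is capped by memory bandwidth times arithmetic intensity. Compressing data
# in memory raises the effective bandwidth and lifts that cap. All figures
# are illustrative assumptions, not measured EuroEXA numbers.

def sustainable_gflops(peak_gflops: float,
                       mem_bandwidth_gbs: float,
                       flops_per_byte: float,
                       compression_ratio: float = 1.0) -> float:
    """Minimum of the compute roof and the (compression-boosted) memory roof."""
    effective_bandwidth = mem_bandwidth_gbs * compression_ratio
    return min(peak_gflops, effective_bandwidth * flops_per_byte)

baseline = sustainable_gflops(peak_gflops=1000, mem_bandwidth_gbs=50,
                              flops_per_byte=2.0)                           # 100 GFLOP/s
compressed = sustainable_gflops(peak_gflops=1000, mem_bandwidth_gbs=50,
                                flops_per_byte=2.0, compression_ratio=2.0)  # 200 GFLOP/s

print(baseline, compressed)
```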

We then enhance our innovative global shared-memory architecture, UNIMEM, with a processor-native interface that can cut the interconnect latency of critical small packets seven-fold, while enabling native streaming of data and advanced control functions.


Scalable Interconnect

 

As the node count grows, a homogeneous interconnect comes under untenable pressure: it suffers multi-hop latencies and demands longer cables and more communication energy. Our solution is the Trifecta™ Scalable Interconnect.

It’s called Trifecta because it optimises at three levels – a proximity-optimised interconnect that brings data and communication together at successive levels of the peer hierarchy. This allows us to create direct peer links between groups of four compute modules on each carrier, then to link groups of four carriers, before creating a homogeneous interconnect across an optimally sized network group at chassis level.
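A minimal sketch of that hierarchy, assuming a simple (chassis, carrier, module) addressing scheme of our own invention, shows how traffic is kept at the lowest level that can carry it:

```python
# Minimal sketch of the three-level proximity hierarchy described above.
# A module's position is written as (chassis, carrier, module); the groups
# of four come from the text, the addressing scheme itself is assumed.

from typing import Tuple

Position = Tuple[int, int, int]   # (chassis, carrier, module)

def interconnect_level(src: Position, dst: Position) -> str:
    """Pick the lowest Trifecta level that can carry src -> dst traffic."""
    if src[:2] == dst[:2]:
        return "level 1: direct peer link on the carrier"
    if src[0] == dst[0]:
        return "level 2: carrier-to-carrier link inside the chassis"
    return "level 3: chassis-level network group / system interconnect"

print(interconnect_level((0, 1, 2), (0, 1, 3)))   # same carrier
print(interconnect_level((0, 1, 2), (0, 3, 0)))   # same chassis
print(interconnect_level((0, 1, 2), (5, 0, 0)))   # different chassis
```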

We can then use OpenFlow to provide a geographically addressed, scalable switch-based topology that interconnects our network groups and the resulting cabinets. It’s an approach that offers software-defined configuration of switches across the network – giving Trifecta™ the flexibility to adjust port speed according to demand. The result is a switching capability that permits higher bandwidth at higher levels of the interconnect hierarchy without imposing those port requirements on every node.
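The port-speed idea can be sketched in plain Python – this is not OpenFlow controller code, and the tier names and speeds are assumptions – simply to show that only the higher tiers of the hierarchy need to be provisioned at the higher rates:

```python
# Plain-Python sketch of the software-defined port-speed idea: ports higher
# up the hierarchy can be provisioned at higher rates without forcing every
# node interface to the top speed. Tier names and speeds are assumptions.

PORT_SPEED_CEILING_GBPS = {
    "module":        10,    # node-facing ports (assumed)
    "carrier":       25,
    "network_group": 100,
    "cabinet":       400,   # highest level of the interconnect hierarchy
}

def provision_port(tier: str, demand_gbps: float) -> float:
    """Track demand, but never exceed the ceiling configured for this tier."""
    return min(demand_gbps, PORT_SPEED_CEILING_GBPS[tier])

print(provision_port("cabinet", demand_gbps=350.0))   # 350.0 – headroom at the top
print(provision_port("module", demand_gbps=35.0))     # 10    – capped at node level
```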

Data Fluent Processing

 

Today’s von Neumann-derived processors consume twice as much energy working out what to do next as they do carrying out the operation itself. The use of vector and SIMD extensions and the simplification of the control unit (for instance, in GPUs) improve this ratio, but both come at the cost of flexibility and overall scalability.

To meet this challenge, we have designed our system around Data Fluent Processing, which promotes the flow of data through a graph of operations. We’re harnessing today’s FPGAs to deliver Adaptive Dataflow Accelerators that implement a defined graph of operators and stream data between them without any control overhead.
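As a software analogy for that accelerator model – the real graph is realised on the FPGA fabric – the sketch below builds a small chain of operators and streams data through it with no central control loop; the operator names are hypothetical:

```python
# Sketch of the dataflow idea: a fixed graph of operators with data streamed
# through it, rather than a control unit deciding each step. This is a
# software analogy for illustration, not the FPGA implementation.

from typing import Callable, Iterable, Iterator

def scale(factor: float) -> Callable[[Iterator[float]], Iterator[float]]:
    def op(stream: Iterator[float]) -> Iterator[float]:
        for x in stream:
            yield x * factor
    return op

def offset(delta: float) -> Callable[[Iterator[float]], Iterator[float]]:
    def op(stream: Iterator[float]) -> Iterator[float]:
        for x in stream:
            yield x + delta
    return op

def pipeline(*ops):
    """Compose operators into a graph (here a simple chain) that data flows through."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        it = iter(stream)
        for op in ops:
            it = op(it)
        return it
    return run

graph = pipeline(scale(2.0), offset(1.0))
print(list(graph(range(5))))   # [1.0, 3.0, 5.0, 7.0, 9.0]
```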

Likewise, we are investigating various task models to identify the best approach for scheduling activities that affect the flow of data. Integrating these approaches with current processing scalability models is then helping us create a platform capable of executing current codes, while leaving the door open for further investigation into future data-flow approaches.

Hardware Meta-modularity

 

EuroEXA uses a Meta-Modular approach to adapt to the latest Open Hardware specifications. Drawing inspiration from the Open Compute Project, EuroEXA’s hardware includes a number of modular components, with different levels of modularity across both the hardware and infrastructure – from the building to the electronics.

At the hardware level, EuroEXA uses 1OU blades, each of which houses four modular compute carriers, which in turn carry four modular compute boards, themselves based on an extended version of an open specification. With this meta-modular approach, we’re pursuing our vision of an infrastructure that can swiftly adopt new hardware technologies, building an extensible system for scaling out to ExaScale while offering maximum flexibility and variety of hardware.