Testbeds 

 

The EuroEXA project worked toward a series of three centralised testbeds, installed at STFC Daresbury Laboratory in the UK. Each testbed offers increasing complexity and scale, demonstrating the technologies that the EuroEXA project group have under development.

Testbed 1 was built to enable software development for the networking and storage functions that run on the Xilinx ZU9 FPGA – with four mounted on each node – together with FPGA-software development within the Xilinx environment. At this stage, we also deployed a series of Development Testbeds at partner locations to aid with their own development work.

The original plan for Testbed 2 was for it to be a scaled-out version of Testbed 1, with additional nodes housing the same set of 4xZU9 chips. However, as part of the project’s co-design approach, the specification evolved – instead making it part of the development journey for Testbed 3. While this did cause a delay in its deployment, the resulting Testbed 2 offers more opportunities for demonstrating the potential for ExaScale computing, with each node including a VU9 Accelerator and a ZU9 network/storage processor.

Finally, the project culminates in Testbed 3. This will use the project’s key architectural compute-node elements: a Network/Storage Processor (Xilinx ZU9), a fully reprogrammable Accelerator (Xilinx VU9) and a powerful 64-bit processor host. Testbed 3 will also include both air-cooled and liquid-cooled variants running side by side. While the air-cooled sections will only use commodity components, this approach will help us demonstrate and quantify the density and proximity optimisations that come with our liquid-cooled infrastructure.

Testbed 1: QFDB with FPGAs

Our first testbed includes eight interconnected Quad-FPGA Daughter Board (QFDB) nodes – each of which contains four Xilinx Zynq ZU9 chips that pair ARM processors with FPGAs. The nodes are housed within a liquid cooling system designed by Iceotope and developed as part of the ExaNeSt project.


Total number of Nodes: 8

Processors per Node: 4x Xilinx ZU9 Quad A53 ARM+FPGA

RAM per Node: 32GB

Storage per Node: 480GB

Infrastructure: Iceotope Petagen

Status: Live, Installed Q4 2018
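
The per-node figures above imply the following system-wide totals for Testbed 1. This is a minimal Python sketch of the arithmetic, assuming that "Quad A53" means four ARM Cortex-A53 cores per ZU9 device; the constant names are illustrative and not part of any project tooling.

```python
# Testbed 1 figures as listed above.
NODES = 8
ZU9_PER_NODE = 4
A53_CORES_PER_ZU9 = 4  # assumption: "Quad A53" = four cores per ZU9
RAM_GB_PER_NODE = 32
STORAGE_GB_PER_NODE = 480

# System-wide totals derived from the per-node figures.
total_zu9 = NODES * ZU9_PER_NODE
total_a53_cores = total_zu9 * A53_CORES_PER_ZU9
total_ram_gb = NODES * RAM_GB_PER_NODE
total_storage_gb = NODES * STORAGE_GB_PER_NODE

print(f"ZU9 devices: {total_zu9}")          # 32
print(f"A53 cores:   {total_a53_cores}")    # 128
print(f"RAM (GB):    {total_ram_gb}")       # 256
print(f"Storage (GB): {total_storage_gb}")  # 3840
```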

Testbed 2: Co-designed Scale-Out Testbed

EuroEXA is a co-designed project – an approach that ensures we develop our technologies around the needs of a range of applications. It’s this approach that has driven the evolution of Testbed 2 to become much more of a stepping stone on the journey toward Testbed 3. As such, it uses the much more powerful VU9 FPGA, which was shown to be up to four times faster than 4xZU9 FPGAs for key partner applications – offering both greater scalability and greater energy efficiency.

Testbed 2 Individual and Paired Development Nodes

Total number of Nodes: 12 (potentially increasing to 20)

Processors per Node: 1x Xilinx ZU9 Quad A53 ARM + FPGA; 1x Xilinx VU9 FPGA

RAM per Node: 64GB

Storage per Node: 480GB

Infrastructure: Iceotope Cold Plate

Status: Live @ Iceotope; Q3 2020 - Distributing to partners

Roles: Software Development; Runtimes Development; Firmware Development; Interconnect Development

Testbed 2 Liquid Cooled System

Total number of Nodes: 256

Processors per Node: 1x Xilinx ZU9 Quad A53 ARM + FPGA; 1x Xilinx VU9 FPGA

RAM per Node: 64GB

Storage per Node: 480GB

Infrastructure: Iceotope K:UL; Schneider Modular Data Centre Container

Status: Under Construction at Daresbury Labs, UK, ETA Q1 2021

Roles: Benchmarking; Real Operations

Testbed 3: High-performance host

The final testbed for the EuroEXA project is designed with bundles of nodes, putting a high-performance host together with an accelerator and a network/storage controller. It also includes three different node configurations, offering the opportunity to showcase and contrast different air-cooled and liquid-cooled technological approaches.

Testbed 3 Air Cooled ARM node

Total number of Nodes: 1

Processors per Node: 1x ARM 64 Core; 1x Xilinx FPGA Accelerator; 1x Xilinx Storage/Network Processor/FPGA

RAM per Node: 64GB

Storage per Node: 480GB

Status: Under Construction at University of Manchester, UK; ETA Q4 2020

Roles: NEEDED

Testbed 3 Air Cooled Cluster

Total number of Nodes: 32

Processors per Node: 1x AMD EPYC; 1x Xilinx FPGA Accelerator; 1x Xilinx Storage/Network Processor/FPGA

RAM per Node: 64GB

Storage per Node: 480GB

Status: Under Construction at Daresbury Labs, UK; ETA Q4 2020

Roles: Software Development; Benchmarking; Real Operations

Notes: Built from modern COTS building blocks; the 32 nodes take up more space than 2U of Testbed 3

Testbed 3 High Density, Proximity Optimised Liquid Cooled Cluster

Total number of Nodes: 32

Processors per Node: 1x AMD EPYC; 1x Xilinx VU9 FPGA Accelerator; 1x Xilinx ZU9 Storage/Network Processor/FPGA

RAM per Node: 96GB

Storage per Node: 480GB

Status: Under Construction at Daresbury Labs, UK; ETA Q2 2021

Roles: Software Development; Benchmarking; Real Operations

Notes: Physically Combined/Retrofitted to Testbed 2 to create a single compact system with hundreds of nodes in half a cabinet.
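
To compare the scale of the deployments listed on this page, the node counts, RAM and storage figures can be collected in one place. The Python sketch below is only an illustration of that arithmetic: the `Deployment` class and its field names are hypothetical, and the Testbed 2 development entry uses the current count of 12 nodes rather than the potential 20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """One testbed deployment, using the per-node figures listed above."""
    name: str
    nodes: int
    ram_gb_per_node: int
    storage_gb_per_node: int

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node

    @property
    def total_storage_gb(self) -> int:
        return self.nodes * self.storage_gb_per_node


DEPLOYMENTS = [
    Deployment("Testbed 1", 8, 32, 480),
    Deployment("Testbed 2 development nodes", 12, 64, 480),  # may grow to 20
    Deployment("Testbed 2 liquid-cooled system", 256, 64, 480),
    Deployment("Testbed 3 air-cooled ARM node", 1, 64, 480),
    Deployment("Testbed 3 air-cooled cluster", 32, 64, 480),
    Deployment("Testbed 3 liquid-cooled cluster", 32, 96, 480),
]

# Print aggregate RAM and storage for each deployment.
for d in DEPLOYMENTS:
    print(f"{d.name}: {d.nodes} nodes, "
          f"{d.total_ram_gb} GB RAM, {d.total_storage_gb} GB storage")
```

On these figures, the 256-node Testbed 2 system accounts for 16,384 GB of RAM, while the liquid-cooled Testbed 3 cluster adds 3,072 GB across its 32 nodes.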