
HPC Vega Architecture

The tables below summarize the type and quantity of the major hardware components of the proposed solution for the Vega system:

Computing

GPU partition

| Category | Component | Quantity | Description |
|---|---|---|---|
| Infrastructure | Rack | 2 | XH2000 DLC rack with PSUs, HYC and IB HDR switches |
| Compute | GPU node | 60 | 4x NVIDIA A100, 2x AMD Rome 7H12, 512GB RAM, 2x HDR dual port mezzanine, 1x 1.92TB M.2 SSD |

CPU partition

| Category | Component | Quantity | Description |
|---|---|---|---|
| Infrastructure | Rack | 10 | XH2000 DLC rack with PSUs, HYC and IB HDR switches |
| Compute | CPU node Standard | 768 | 256 blades of 3 compute nodes each; per node: 2x AMD Rome 7H12 (64c, 2.6GHz, 280W), 256GB RAM, 1x HDR100 single port mezzanine, 1x 1.92TB M.2 SSD |
| Compute | CPU node Large Memory | 192 | 64 blades of 3 compute nodes each; per node: 2x AMD Rome (64c, 2.6GHz, 280W), 1TB RAM, 1x HDR100 single port mezzanine, 1x 1.92TB M.2 SSD |
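
The node counts in the compute tables follow from the blade counts, and the aggregate core, memory and GPU figures can be derived from the per-node specifications. The following is a minimal sketch of that arithmetic, not an official tally: the `NodeType` class and its field names are hypothetical, 1TB is taken as 1024 GB, and the 64-core figure stated for the CPU nodes is assumed to apply to the 7H12 processors in the GPU nodes as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One row of the compute tables (field names are illustrative)."""
    name: str
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    gpus_per_node: int = 0


# Two 64-core sockets per node. The 64c figure is stated for the CPU nodes;
# applying it to the GPU nodes' 7H12 CPUs is an assumption.
CORES_PER_NODE = 2 * 64

NODE_TYPES = [
    NodeType("GPU node", 60, CORES_PER_NODE, 512, gpus_per_node=4),
    # Node counts come from the blade counts: 3 compute nodes per blade.
    NodeType("CPU node Standard", 256 * 3, CORES_PER_NODE, 256),
    # 1TB is taken as 1024 GB here (assumption).
    NodeType("CPU node Large Memory", 64 * 3, CORES_PER_NODE, 1024),
]


def totals(node_types):
    """Sum nodes, cores, RAM and GPUs over the given node types."""
    return {
        "nodes": sum(t.nodes for t in node_types),
        "cores": sum(t.nodes * t.cores_per_node for t in node_types),
        "ram_gb": sum(t.nodes * t.ram_gb_per_node for t in node_types),
        "gpus": sum(t.nodes * t.gpus_per_node for t in node_types),
    }


if __name__ == "__main__":
    cpu_only = [t for t in NODE_TYPES if t.gpus_per_node == 0]
    print("CPU partition:", totals(cpu_only))
    print("All compute:  ", totals(NODE_TYPES))
```

Under these assumptions the CPU partition comes to 960 nodes and 122,880 cores, and the GPU partition to 240 GPUs. The 960 figure agrees with the number of compute ports listed in the interconnect table.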

Storage

HPST - High-performance storage tier

| Category | Component | Quantity | Description |
|---|---|---|---|
| Storage | Flash-based building block | 10 | 2U ES400NVX (per device: 23x 6.4TB NVMe, 8x InfiniBand HDR100, 4 embedded Lustre VMs, 1 OST and MDT per VM) |
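
The per-device figures for the HPST can be scaled up to tier-wide totals. The sketch below does that under two stated assumptions: capacities are raw decimal units (no RAID or file system overhead is modelled), and "1 OST and MDT per VM" is read as one OST plus one MDT per VM. The constant and function names are hypothetical.

```python
DEVICES = 10             # ES400NVX building blocks
NVME_PER_DEVICE = 23
NVME_GB = 6_400          # 6.4 TB as decimal GB, kept integer to avoid float error
IB_PORTS_PER_DEVICE = 8  # InfiniBand HDR100
VMS_PER_DEVICE = 4       # embedded Lustre VMs


def hpst_summary():
    """Tier-wide totals derived from the per-device figures."""
    vms = DEVICES * VMS_PER_DEVICE
    return {
        "nvme_drives": DEVICES * NVME_PER_DEVICE,
        "raw_tb": DEVICES * NVME_PER_DEVICE * NVME_GB // 1000,
        "ib_ports": DEVICES * IB_PORTS_PER_DEVICE,
        "lustre_vms": vms,
        # Assumption: one OST and one MDT per VM.
        "osts": vms,
        "mdts": vms,
    }


if __name__ == "__main__":
    for key, value in hpst_summary().items():
        print(f"{key}: {value}")
```

This gives 1,472 TB of raw NVMe capacity across 230 drives and 80 HDR100 ports, the latter matching the 10 (x8) entry in the interconnect table. Usable capacity will be lower and is not stated in the source.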

LCST - Large Capacity Storage tier

| Category | Component | Quantity | Description |
|---|---|---|---|
| Storage | Storage node | 61 | Supermicro SuperStorage 6029P-E1CR24L with 2x Intel Xeon Silver 4214R (12c, 2.4GHz, 100W), 256GB RAM DDR4 RDIMM 2933MT/s, 1x 240GB SSD, 2x 6.4TB NVMe, 24x 16TB HDD, 2x 25GbE Mellanox ConnectX-4 DP, 1x 1GbE IPMI |
| Internal Ceph Network | Ethernet switch | 8 | Mellanox SN2010 (per switch: 18x 25GbE + 4x 100GbE ports) |
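
The LCST rows allow a quick estimate of raw capacity and a sanity check of the 25GbE port budget. The sketch below is illustrative only: it assumes that every 25GbE port on a storage node connects to one of the SN2010 switches, which the source does not state, and it reports raw capacity only, since the Ceph replication or erasure-coding scheme is not given. All names are hypothetical.

```python
NODES = 61
HDD_PER_NODE, HDD_TB = 24, 16
NVME_PER_NODE, NVME_GB = 2, 6_400  # 6.4 TB as integer decimal GB
PORTS_25G_PER_NODE = 2
SWITCHES, PORTS_25G_PER_SWITCH = 8, 18


def lcst_summary():
    """Raw capacity and 25GbE port budget for the large-capacity tier."""
    node_ports = NODES * PORTS_25G_PER_NODE
    switch_ports = SWITCHES * PORTS_25G_PER_SWITCH
    return {
        "hdd_raw_tb": NODES * HDD_PER_NODE * HDD_TB,
        "nvme_raw_gb": NODES * NVME_PER_NODE * NVME_GB,
        "node_25g_ports": node_ports,
        "switch_25g_ports": switch_ports,
        # Assumption: all node ports land on these switches.
        "spare_25g_ports": switch_ports - node_ports,
    }


if __name__ == "__main__":
    for key, value in lcst_summary().items():
        print(f"{key}: {value}")
```

With these inputs the tier holds 23,424 TB of raw HDD capacity, and the 122 node-side 25GbE ports fit within the 144 ports that the eight switches provide.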

Login and Virtualization

| Category | Component | Quantity | Description |
|---|---|---|---|
| CPU login | Login nodes | 4 | Atos BullSequana X430-A5 with 2x AMD EPYC 7H12, 256GB RAM DDR4 3200MT/s, 2x 7.6TB U.2 SSD, 1x 100GbE DP ConnectX-5, 1x 100Gb IB HDR ConnectX-6 SP |
| GPU login | Login nodes | 4 | Atos BullSequana X430-A5 with 1x NVIDIA Ampere A100 PCIe GPU and 2x AMD EPYC 7452 (32c, 2.35GHz, 155W), 256GB RAM DDR4 3200MT/s, 2x 7.6TB U.2 SSD, 1x 100GbE DP ConnectX-5, 1x 100Gb IB HDR ConnectX-6 SP |
| Service | Virtualization/Service nodes | 30 | Atos BullSequana X430-A5 with 2x AMD EPYC 7502 (32c, 2.5GHz, 180W), 512GB RAM DDR4 3200MT/s, 2x 7.6TB U.2 SSD, 1x 100GbE DP ConnectX-5, 1x 100Gb IB HDR ConnectX-6 SP |

Network and Interconnect Infrastructure

| Category | Component | Quantity | Description |
|---|---|---|---|
| Interconnect Network | IB switch | 68 | 40-port Mellanox HDR switch, Dragonfly+ topology |
| Interconnect Connections | IB HDR100/200 ports on IB card | 1230 | 960 Compute, 60 (x2) GPU, 8 Login, 30 Virtualization, 10 (x8) HPST and 8 (x4) Skyway Gateways with Mellanox ConnectX-6 (single or dual port) |
| IPoIB Gateway | IB/Ethernet Data Gateway | 4 | Mellanox Skyway IB to Ethernet Gateway Appliance (per gateway: 8x IB and 8x 100GbE ports) |
| Ethernet Data Network | Top-Level Switches | 2 | Cisco Nexus N3K-C3408-S, 192x 100GbE ports activated |
| WAN Connectivity | IP Routers | 2 | Cisco Nexus N3K-C3636C-R, 5x 100GbE to WAN (provided end of 2021) |
| Top Management Network | 10GbE switch | 2 | Mellanox 2410 switches (per switch: 48x 10GbE ports) |
| In/Out of Band Management Network | 1GbE switch | 4 | Mellanox 4610 switches (per switch: 48x 1GbE + 2x 10GbE ports) |
| Rack Management Network | WELB switch | 24 | Two integrated WELB (sWitch Ethernet Leaf Board) switches per rack, each with three 24-port Ethernet switch instances and one Ethernet Management Controller (EMC) |
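
The stated total of 1230 InfiniBand ports can be cross-checked against the per-group breakdown in the Interconnect Connections row, and the WELB switch count against the rack counts of the two compute partitions. The sketch below is a simple consistency check; the dictionary layout and function name are hypothetical, and the storage group is taken to be the ten HPST building blocks with eight HDR100 ports each.

```python
# (number of endpoints, IB ports per endpoint) as given in the
# Interconnect Connections row.
IB_PORT_GROUPS = {
    "Compute": (960, 1),
    "GPU": (60, 2),
    "Login": (8, 1),
    "Virtualization": (30, 1),
    "HPST": (10, 8),            # assumed to be the ten ES400NVX devices
    "Skyway gateways": (8, 4),
}
STATED_IB_PORTS = 1230

WELB_SWITCHES = 24
WELB_PER_RACK = 2
COMPUTE_RACKS = 2 + 10          # GPU partition racks + CPU partition racks


def total_ib_ports(groups):
    """Sum endpoints times ports-per-endpoint over all groups."""
    return sum(count * ports for count, ports in groups.values())


if __name__ == "__main__":
    print("IB ports:", total_ib_ports(IB_PORT_GROUPS), "stated:", STATED_IB_PORTS)
    print("Racks implied by WELB count:", WELB_SWITCHES // WELB_PER_RACK,
          "racks listed:", COMPUTE_RACKS)
```

Both checks agree with the tables: the groups sum to 1230 ports, and 24 WELB switches at two per rack correspond to the 12 XH2000 racks.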
