NVIDIA Volta, IBM POWER9 Land Contracts For New US Government Supercomputers

The launch of Oak Ridge National Laboratorys Titan Supercomputer was in many ways a turning point for NVIDIAs GPU compute business. Though already into their third generation of Tesla products by that time, getting Tesla into the worlds most powerful supercomputer is as much of a singular mark of making it as there can be. Supercomputer contracts are not just large orders in and of themselves, but they indicate that the HPC industry has accepted GPUs as reliable and performant, and is ready to significantly invest in them. Since then Tesla has ended up in several other supercomputer contracts, with Tesla K20 systems powering 2 of the worlds top 10 supercomputers, and Tesla sales overall for this generation have greatly surpassed the Fermi generation.

Of course while landing their first supercomputer contract was a major accomplishment for NVIDIA, its not the only factor in making the current success of Tesla a sustainable success. To steal a restaurant analogy, NVIDIA was able to get customers in the door, but could they get them to come back? As announced by the US Department of Energy at the end of last week the answer to that is yes. The DoE is building 2 more supercomputers, and it will be NVIDIA and IBM powering them.

The two supercomputers will be Summit and Sierra. At a combined price tag of $325 million, the supercomputers will be built by IBM for Oak Ridge National Laboratory and Lawrence Livermore National Laboratory respectively. They will be the successors to the laboratories respective current supercomputers, Titan and Sequoia.

Both systems will be of similar design, with Summit being the more powerful of the two. Powering the systems will be a triumvirate of technologies; IBM POWER9 CPUs, NVIDIA Volta-based Tesla GPUs, and Mellanox EDR Infiniband for the system interconnect.

Starting with the CPU, at this point this is the first real attention POWER9 has received. Relatively little information is available on the CPU, though IBM has previously mentioned that POWER9 is going to emphasize the use of accelerators (specialist hardware), which meshes well with what is being done for these supercomputers. Otherwise beyond this we dont know much else other than that it will be building on top of IBMs existing POWER8 technologies.

Meanwhile on the GPU side, this supercomputer announcement marks the reintroduction of Volta by NVIDIA since going quiet on it after the announcement of Pascal earlier this year. Volta was then and still remains a blank slate, so not unlike the POWER9 CPU we dont know what new functionality is due with Volta, only that it is a distinct product that is separate from Pascal and that it will be building off of Pascal. Pascal of course introduces support for 3D stacked memory and NVLink, both of which will be critical for these supercomputers.

Speaking of NVLink, as IBMs POWER family is the first CPU family to support NVLink it should come as no surprise that NVLink will be the CPU-GPU and GPU-GPU interconnect for these computers. NVIDIAs high-speed PCIe replacement, NVLink is intended to allow faster, lower latency, and lower energy communication between processors, and is expected to play a big part in NVIDIAs HPC performance goals. While GPU-GPU NVLink has been expected to reach production systems from day one, the DoE supercomputer announcement means that the CPU-GPU implementation is also becoming reality. Until now it was unknown whether an NVLink equipped POWER CPU would be manufactured (it was merely an option to licensees), so this confirms that well be seeing NVLink CPUs as well as GPUs.

With NVLink in place for CPU-GPU communications these supercomputers will be able to offer unified memory support, which should go a long way towards opening up these systems to tasks that require frequent CPU/GPU interaction, as opposed to the more homogenous nature of systems such as Titan. Meanwhile it is likely though unconfirmed that these systems will be using NVLink 2.0, which as originally announced was expected for the GPU after Pascal. NVLink 2.0 introduces cache coherency, which would allow for further performance improvements and the ability to more readily execute programs in a heterogeneous manner.

Read more:

NVIDIA Volta, IBM POWER9 Land Contracts For New US Government Supercomputers

Related Posts

Comments are closed.