A world leader in AI supercomputing solutions, NVIDIA debuted a suite of next-gen capabilities for its AI supercomputing platform on Monday, introducing the A100 80GB GPU, the DGX Station A100, and the Mellanox 400G InfiniBand, during a pre-briefing virtual event Interesting Engineering attended last week.
NVIDIA debuts supercomputing triple-whammy with A100 80GB GPU
NVIDIA unveiled three major innovations for its NVIDIA HGX AI supercomputing platform, the first of which is the A100 80GB GPU. With twice the memory capacity of its predecessor, it helps engineers and researchers reach the next wave of AI, with new levels of speed and performance, and apply it to tomorrow's scientific breakthroughs, according to an embargoed press release shared with Interesting Engineering (IE).
The A100 80GB employs HBM2e technology, doubling the A100 40GB GPU's high-bandwidth memory capacity, and offers more than 2 terabytes per second of memory bandwidth. This lets data flow rapidly to the A100, accelerating researchers' applications and enabling them to tackle larger models and datasets than they could before.
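As a rough sanity check on that 2TB-per-second figure, the bandwidth can be reproduced from the GPU's published memory interface specs (a 5120-bit HBM2e bus at roughly 3.2 Gbps per pin; these numbers come from NVIDIA's datasheet, not this announcement):

```python
# Back-of-the-envelope memory bandwidth estimate for the A100 80GB.
# Bus width and per-pin data rate are assumed from published HBM2e
# specs, not figures stated in the announcement itself.
bus_width_bits = 5120        # total width of the HBM2e memory interface
pin_rate_gbps = 3.2          # effective data rate per pin, in Gbps

# Divide by 8 to convert bits per second to bytes per second.
bandwidth_gb_per_s = bus_width_bits * pin_rate_gbps / 8

print(f"{bandwidth_gb_per_s / 1000:.2f} TB/s")  # ~2.05 TB/s
```

The result lands just over the "2TB per second" threshold the press release highlights.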
"Achieving state-of-the-art results in HPC and AI research requires building the biggest models, but these demand more memory capacity and bandwidth than ever before," said Vice President of Applied Deep Learning at NVIDIA Bryan Catanzaro, in the press release.
Systems providers offering A100 80GB GPUs in 2021
"The A100 80GB GPU provides double the memory of its predecessor, which was introduced just six months ago, and breaks the 2TB second barrier, enabling researchers to tackle the world's most important scientific and big data challenges," Catanzaro added.
Numerous systems providers — including Dell Technologies, GIGABYTE, Fujitsu, Atos, Lenovo, Hewlett Packard Enterprise, Inspur, Supermicro, and Quanta — are slated to offer systems built with HGX A100 integrated baseboards using A100 80GB GPUs in the first half of next year.
AI applications normalizing across industries
This comes roughly a month after an earlier announcement from NVIDIA — when the company declared the industry benchmarking consortium called MLPerf had monitored more GPUs than CPUs in active inference performance on cloud service platforms, for the first time.
As an industry leader in AI performance, software, and services, NVIDIA is quickly becoming a go-to developer as AI applications become normalized across multiple tech industries, from smartphone apps to cutting-edge quantum physics.
NVIDIA A100 delivers world's fastest memory bandwidth at 2TB per second
This new hardware will benefit a wide range of scientific applications, from quantum chemistry to weather forecasting. Quantum Espresso, a materials simulation, achieved throughput gains of nearly a factor of two with a single node of A100 80GB.
"Speedy and ample memory bandwidth and capacity are vital to realizing high performance in supercomputing applications," said RIKEN Center for Computational Science Director Satoshi Matsuoka. "The NVIDIA A100 with 80GB of HBM2e GPU memory, providing the world's fastest 2TB per second of bandwidth, will help deliver a big boost in application performances."
NVIDIA debuts DGX Station A100, AI data research in-a-box
NVIDIA also debuted the DGX Station A100 — the only petascale workgroup server in the world, according to a second press release shared with IE under embargo. As a second-generation AI system, it's designed to accelerate heavy data science and machine learning workloads for specialized teams working from corporate offices, labs, research facilities — or even home offices anywhere on the planet.
The DGX Station A100 offers 2.5 petaflops of AI processing, and is the only workgroup server with four of the newest NVIDIA A100 Tensor Core GPUs, fully interconnected via NVIDIA NVLink, providing up to 320GB of GPU memory and setting up the possibility of major speed breakthroughs in private AI and data science.
DGX Station 'brings AI out of the data center'
Additionally, the DGX Station A100 supports NVIDIA's Multi-Instance GPU (MIG) technology, which allows a single DGX Station A100 to run up to 28 separate GPU instances in parallel, serving several users without dragging down system performance, according to the press release.
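The 28-instance figure follows directly from the system's configuration. A sketch of the arithmetic, assuming the published MIG limit of up to seven instances per A100 GPU (a spec not restated in this announcement):

```python
# How a DGX Station A100 reaches 28 parallel GPU instances.
# The per-GPU MIG limit of 7 is an assumed published spec for the A100.
gpus_per_station = 4           # four A100 Tensor Core GPUs per station
mig_instances_per_gpu = 7      # maximum MIG partitions per A100 GPU

total_instances = gpus_per_station * mig_instances_per_gpu
print(total_instances)  # 28, matching the press release's figure
```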
"DGX Station A100 brings AI out of the data center with a server-class system that can plug in anywhere," said NVIDIA's Vice President and General Manager of DGX systems Charlie Boyle. "Teams of data science and AI researchers can accelerate their work using the same software stack as NVIDIA DGX A100 systems, enabling them to easily scale from development to deployment."
Lockheed Martin, BMW Group use NVIDIA DGX Stations
Several major organizations worldwide have already integrated DGX Station to carry out AI and data science, in industries like financial services, healthcare, education, government, and retail.
Lockheed Martin uses DGX Station to generate AI models using sensor data and service logs to anticipate maintenance — which improves manufacturing cycles, reduces costs of operations, and enhances safety for workers. BMW Group Production uses NVIDIA DGX Stations to accelerate the investigation of new ideas, strategically deploying AI to improve operations.
DGX Station A100 available this quarter
There are numerous other examples, but the value of NVIDIA's DGX Station is plain: without the need for data center-grade cooling or power, it allows users to carry out experiments, analytics, management tasks, and much more via remote connection — decentralizing AI supercomputing at unprecedented scales and efficiency.
NVIDIA's DGX Station A100 will be made available this quarter via the company's partner network of global resellers. Additionally, customers who already own a DGX A100 320GB system can upgrade to the new standard.
NVIDIA unveils Mellanox InfiniBand, offers exascale AI supercomputing
NVIDIA also debuted the next generation of Mellanox 400G InfiniBand, to confront the exponential growth of computing requirements, according to a third press release shared with IE under embargo.
The Mellanox 400G InfiniBand is designed to accelerate work in climate research, drug discovery, and genomics — via a substantial jump in performance.
As the seventh generation of Mellanox InfiniBand, it doubles data throughput via NDR 400Gb/s at ultra-low latency, and adds in-network computing engines, which further accelerate task processing.
Mellanox 400G InfiniBand 'hyperscales cloud infrastructures'
Leading infrastructure manufacturers like Dell Technologies, Lenovo, Atos, and Supermicro aim to integrate the Mellanox 400G InfiniBand into their hardware and HPC offerings, complemented by ongoing support from storage infrastructure partners like IBM Storage, DDN, and others.
"The most important work of our customers is based on AI and increasingly complex applications that demand faster, smarter, more scalable networks," said NVIDIA's Senior Vice President of Networking Gilad Shainer, in the press release.
"The NVIDIA Mellanox 400G InfiniBand's massive throughput and smart acceleration engines let HPC, AI and hyperscale cloud infrastructures achieve unmatched performance with less cost and complexity," added Shainer.
Microsoft Azure partnered with NVIDIA to scale HPC, AI
As of writing, Microsoft Azure has partnered with NVIDIA Networking to help advance the work of scientists via scalable HPC and AI, Shainer continued.
"In AI, to meet the high-ambition needs of AI innovation, the Azure NDv4 VMs also leverage HDR InfiniBand with 200 GB/s per GPU, a massive total of 1.6 Tb/s of interconnected bandwidth per VM, and scale to thousands of GPUs under the same low-latency Infiniband fabric," wrote Microsoft's Head of Product and Specialized Acure Compute AI Nidhi Chappell, in the press release.
AI applications are spreading across every industry and scientific field at lightspeed. From the search for habitable exoplanets to crunching numbers in data analytics, NVIDIA is continually upping the stakes as an increasingly mainstream source for next-gen AI supercomputing.