New NCAR-Wyoming supercomputer to accelerate scientific discovery

New system will advance the nation’s understanding of geosciences

Jan 27, 2021 - by David Hosansky

BOULDER—The National Center for Atmospheric Research (NCAR) announced today that it has selected its next supercomputer for advancing the Earth system sciences, following a competitive open procurement process. The new machine will help scientists conduct research needed to better understand a range of phenomena that affect society, from the behavior of major wildfires to eruptions of solar storms that can threaten GPS and other sensitive technologies.

The innovative system will be built by Hewlett Packard Enterprise (HPE) and installed this year at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne, Wyoming. It will become operational in early 2022 and will replace the existing system, known as Cheyenne.

NCAR is holding a statewide contest for Wyoming schoolchildren to propose a name for the new system.

The HPE Cray EX supercomputer will be a 19.87-petaflops system, meaning it will have the theoretical ability to perform 19.87 quadrillion calculations per second. That is about 3.5 times the speed of scientific computing performed by the Cheyenne supercomputer. In a single second, the new system could match the work of every man, woman, and child on the planet solving one equation every second for a month. Once operational, the HPE-powered system is expected to rank among the 25 or so fastest supercomputers in the world.
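The population comparison can be checked with quick arithmetic. The sketch below is illustrative only: the world population of roughly 7.8 billion and the 30-day month are assumptions, not figures from the announcement.

```python
# Back-of-the-envelope check of the "one equation every second for a month"
# comparison. Only PEAK_CALCS_PER_SECOND comes from the article.
PEAK_CALCS_PER_SECOND = 19.87e15        # 19.87 petaflops (theoretical peak)
WORLD_POPULATION = 7.8e9                # assumption: approximate 2021 population
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month

# Total equations solved if everyone solves one per second for a month.
human_calcs_per_month = WORLD_POPULATION * SECONDS_PER_MONTH

# Time the new system would need for the same number of calculations.
seconds_needed = human_calcs_per_month / PEAK_CALCS_PER_SECOND

print(f"Human calculations in a month: {human_calcs_per_month:.3e}")
print(f"Seconds needed at peak speed:  {seconds_needed:.2f}")
```

Under these assumptions the two figures agree to within a few percent: a month of planet-wide hand calculation corresponds to about one second of the system's theoretical peak.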

“This new system is a major step forward in supercomputing power, providing the scientific community with the most cutting-edge technology to better understand the Earth system,” said Anke Kamrath, director of NCAR’s Computational and Information Systems Laboratory. “The resulting research will lead to new insights into potential threats ranging from severe weather and solar storms to climate change, helping to advance the knowledge needed for improved predictions that will strengthen society’s resilience to potential disasters.”

Funding for the system, which will cost $35-$40 million, comes from the National Science Foundation (NSF). The NWSC is funded by NSF and the state of Wyoming through an appropriation to the University of Wyoming.

Since the NWSC opened its doors in 2012, more than 4,000 users from more than 575 universities and other institutions across the nation and overseas have used its resources. Last year, the NWSC joined the COVID-19 High Performance Computing Consortium to accelerate understanding of the novel coronavirus.

The HPE-powered system will be a critical tool for researchers across the country and overseas who study climate change, severe weather, the hydrological cycle, geomagnetic storms, seismic activity, air quality, wildfires, and other important Earth system processes that can have wide-ranging impacts on society.

“The timing and nature of the NCAR upgrade could not be better for Wyoming. Researchers at the University of Wyoming will make great use of the new system to better understand areas of fundamental and economic interest impacted by flows in the atmosphere and underground,” said Ed Synakowski, vice president of research and economic development at the University of Wyoming. “The advances in computing that are captured in this upgrade, and the potential for impactful application of its results, are tremendous. We look forward to working with NCAR and the National Science Foundation in using this increased capacity to advance the fundamental science that determines so many issues of potentially high economic and social importance.”

More speed, greater efficiency

One of the most innovative features of the new system is its use of accelerated computing with NVIDIA A100 Tensor Core graphics processing units (GPUs). The supercomputer will get 20 percent of its sustained computing capability from GPUs, with the remainder coming from traditional central processing units (CPUs).

GPUs offer significant advantages over CPUs for Earth system research. They are far more powerful and energy efficient, delivering up to six times the performance (as measured by floating-point operations) per watt. Adoption of GPU computing will also position the NWSC for the eventual use of exascale computing, which is many times faster than the most advanced systems today.

GPUs are also more effective for newly developed artificial intelligence and machine learning techniques because they perform large numbers of computations simultaneously on one accelerator, resulting in lower power usage and less hardware for the same number of parallel operations. GPUs have less onboard memory than CPUs, but the ones being used in NWSC-3 are top-of-the-line in terms of both memory and number of cores. This will allow researchers to load more data and train larger machine learning models than previously possible.

As a result of the GPUs and other energy-efficient features, the new NWSC system will use just 40 percent more electricity than Cheyenne – which is itself highly energy efficient – despite being almost 3.5 times faster.
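The two figures in that comparison imply a rough gain in computing delivered per unit of electricity. The sketch below treats "almost 3.5 times faster" as exactly 3.5 and assumes both systems are compared at similar utilization. Both are simplifying assumptions.

```python
# Rough efficiency comparison between the new system and Cheyenne,
# using only the two ratios quoted in the article.
SPEEDUP = 3.5        # new system vs. Cheyenne ("almost 3.5 times faster")
POWER_RATIO = 1.40   # new system uses 40 percent more electricity

# Work done per unit of electricity, relative to Cheyenne.
perf_per_watt_gain = SPEEDUP / POWER_RATIO

print(f"Relative performance per watt: {perf_per_watt_gain:.2f}x")
```

On these numbers, the new system would deliver roughly two and a half times as much computing per unit of electricity as Cheyenne.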

The system will also be able to burst out to commercial cloud computing services, which are highly flexible and scalable, giving researchers access to the computing and storage resources needed for on-demand high-performance computing and specialized requirements.

The system will have 60 petabytes of high-performance storage, almost double the capacity of Cheyenne. It will feature HPE Slingshot, a purpose-built networking solution developed for high-performance systems to address demands for higher speed and congestion control for data-intensive workloads.

“With its extra storage and a 3.5-fold capability improvement over the current NCAR supercomputer, Cheyenne, the new supercomputer will provide the necessary resources for our scientists to continue expanding their research in the atmospheric and geospatial sciences,” said NCAR’s Irfan Elahi, project director of NWSC-3 and director of the High-Performance Computing Division. “To provide this capability, the new supercomputer is designed for highly energy-efficient operations.”

A more complete picture of the Earth system

High-performance computers enable researchers to run increasingly detailed models that simulate complex processes and how they might unfold in the future. Scientists can also harness the increased computing power to run multiple simulations that provide a more complete picture of the Earth system. This type of ensemble modeling enables them to quantify the likely range of outcomes, or uncertainty, of a given event.

Ensemble predictions are particularly helpful for providing resource managers and policy experts with valuable information for planning ahead and mitigating risk.
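The idea behind ensemble modeling can be illustrated with a toy example. The sketch below is not an NCAR model: `toy_forecast` is a hypothetical stand-in (a random walk with drift), and the starting value, perturbation sizes, and member count are arbitrary assumptions chosen only to show how a spread of outcomes is summarized.

```python
import random
import statistics


def toy_forecast(initial_value: float, steps: int, rng: random.Random) -> float:
    """Hypothetical stand-in for one model run: a random walk with drift."""
    value = initial_value
    for _ in range(steps):
        value += 0.1 + rng.gauss(0.0, 0.5)
    return value


def run_ensemble(n_members: int, seed: int = 0) -> list:
    """Run the toy model many times from slightly perturbed initial states."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    results = []
    for _ in range(n_members):
        start = 20.0 + rng.gauss(0.0, 0.2)  # perturbed initial condition
        results.append(toy_forecast(start, steps=10, rng=rng))
    return results


members = run_ensemble(n_members=50)

# Summarize the ensemble: a central estimate plus a 5th-95th percentile range.
cuts = statistics.quantiles(members, n=20)  # 19 cut points at 5% intervals
low, high = cuts[0], cuts[-1]

print(f"Ensemble mean: {statistics.mean(members):.2f}")
print(f"5th to 95th percentile range: {low:.2f} to {high:.2f}")
```

The width of the percentile range is the quantity of interest here: a narrow range suggests the outcome is well constrained, while a wide one signals greater uncertainty.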

Some of the areas in which the new system is expected to accelerate research include the following:

Severe weather. More realistic simulations of weather hazards such as thunderstorms, tornadoes, and hurricanes will enable scientists to gain new insights into the processes involved, improve the models used for weather prediction, and better represent hazards and their impacts in a changing climate.

Climate change. The system’s increased capability will be critical to the development of the next version of NCAR’s flagship global climate model, the Community Earth System Model, which will allow more detailed and societally relevant projections of global and regional climate change.

Water availability and flooding. Detailed simulations of streamflow and prevailing climate patterns will lead to increasingly realistic predictions of seasonal water supply, drought risk, and flooding, providing vital information to water managers, farmers, and other decision makers.

Wildfires. The increased computational power will enable scientists to improve representations of physical processes such as local winds, soil moisture, and vegetation patterns, setting the stage for more reliable, probabilistic forecasts of wildfire risk and behavior.

Subseasonal to decadal prediction. The new system will facilitate research into a wide range of phenomena across the atmosphere, oceans, sea ice, and land that can be predicted from a few weeks to a decade out, helping society anticipate such events as heat waves, shifts in precipitation patterns, or changing fisheries conditions.

Air quality. Scientists will gain new insights into the feedbacks of atmospheric conditions and the complex movement and evolution of air pollutants in ways that can provide policy makers with additional information about human exposure in specific locations, helping to better safeguard human health and advance knowledge needed for more accurate air quality forecasts.

Renewable energy. By running ensembles of specialized models at high resolution, scientists can help utilities better estimate the amount of energy that will likely be generated by wind farms and major solar arrays days in advance, thereby reducing the costs of energy production.

Subsurface flows. More accurate and detailed models will enable researchers to better simulate the subsurface flows of water, oil, and gas, leading to a greater understanding of these resources.

Solar storms. Increasingly detailed, three-dimensional simulations of the Sun's turbulent plasma flows and magnetic fields will help improve predictions of powerful solar storms that can disrupt Earth's atmosphere and trigger space weather events, threatening communications systems and power grids.

“More powerful supercomputing is a vital component of the research infrastructure of our nation, enabling scientists to advance fundamental research and deepen our understanding of the complex and interconnected nature of the Earth system,” said NCAR Director Everette Joseph. “This new NWSC system will support basic research in ways that will lead to more detailed and useful predictions of the world around us, thereby helping to make our society more resilient to increasingly costly disasters and contributing to improved human health and well-being. It further equips NCAR to deliver on the priorities in its new strategic plan on actionable Earth system science for society.”

A simulation of a solar flare followed by a coronal mass ejection. Such simulations, run on supercomputers, can help scientists better understand solar storms that can disrupt Earth’s atmosphere with widespread impacts on technology. (Image: Matthias Rempel, ©UCAR)

Quick facts

Key features of the new NWSC system:

  • A 19.87-petaflops HPE Cray EX supercomputer, a platform engineered to support next-generation supercomputing, including exascale systems
  • 2,570 compute nodes total: 2,488 Homogeneous Compute and 82 Heterogeneous (GPU) nodes
    • Homogeneous nodes have 2x 3rd Gen AMD EPYC™ CPUs
    • Heterogeneous (GPU) nodes have 1x 3rd Gen AMD EPYC™ CPU and 4x NVIDIA 1.41 GHz A100 Tensor Core GPUs with 40 GiB HBM2 memory and a 600 GB/s NVIDIA NVLink GPU interconnect
  • 692 terabytes (TB) of total memory
  • HPE Slingshot (v11) high-speed interconnect in a Dragonfly topology
    • Homogeneous compute nodes have one Slingshot injection port per node; GPU nodes have four
    • HPE Slingshot bandwidth is 200 Gb/sec per port per direction
    • HPE Slingshot MPI latency is 1.7-2.6 usec
  • 8 login nodes, each with 512 GB DDR4-3200 memory
    • six nodes with 2x AMD EPYC™ 7742 CPUs
    • two nodes with 2x AMD EPYC™ 7742 CPUs and 2x NVIDIA V100 GPUs
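A few totals follow directly from the per-node figures listed above. The sketch below only multiplies and adds numbers from the list. It counts compute nodes alone, and leaving the login nodes out of the port count is an assumption.

```python
# Totals derived from the per-node figures in the quick facts list.
HOMOGENEOUS_NODES = 2488
GPU_NODES = 82
GPUS_PER_GPU_NODE = 4
GPU_MEMORY_GIB = 40            # HBM2 memory per A100 GPU
PORTS_PER_HOMOGENEOUS_NODE = 1
PORTS_PER_GPU_NODE = 4
GBITS_PER_PORT = 200           # Slingshot bandwidth per port, per direction

total_nodes = HOMOGENEOUS_NODES + GPU_NODES
total_gpus = GPU_NODES * GPUS_PER_GPU_NODE
total_gpu_memory_gib = total_gpus * GPU_MEMORY_GIB

# Injection ports on compute nodes only (login nodes are not counted here).
injection_ports = (HOMOGENEOUS_NODES * PORTS_PER_HOMOGENEOUS_NODE
                   + GPU_NODES * PORTS_PER_GPU_NODE)
aggregate_injection_gbits = injection_ports * GBITS_PER_PORT

print(f"Compute nodes:        {total_nodes}")           # 2570
print(f"GPUs:                 {total_gpus}")            # 328
print(f"GPU memory (GiB):     {total_gpu_memory_gib}")  # 13120
print(f"Injection ports:      {injection_ports}")       # 2816
print(f"Injection Gb/s total: {aggregate_injection_gbits}")
```

The node count matches the 2,570 total given in the list. The aggregate injection figure is a theoretical upper bound per direction, not a measured throughput.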

Software Environment

  • HPE Cray Operating System (OS), a tuned version of SUSE Linux
  • Altair Accelerator Plus scheduler with PBS Professional Workload Manager
  • Support for Docker containers, Singularity containers, and containers that support the Open Container Initiative standard.
  • HPE Cray Programming Environment, support for OpenMP 4.5 and 5.0, and MPI v3.1
  • Performance analysis and optimization tools in the HPE Cray Programming Environment to improve performance of applications.
  • NVIDIA HPC SDK, a comprehensive set of compilers, libraries and tools for the accelerated platform.
  • Intel Parallel Studio XE compiler suite
  • Cray ClusterStor E1000 storage system from HPE (based on Lustre 2.12 LTS)

The new NWSC-3 supercomputer and the existing NWSC GLADE file systems are complemented by a new parallel file system and data storage components.

Key features of the new data storage system:

  • Six Cray ClusterStor E1000 storage systems from HPE
  • 60 petabytes of usable file system space (can be expanded to 120 petabytes by exercising options)
  • 300 GB per second aggregate I/O bandwidth to/from the NWSC-3 system
  • 5,088 × 16-TB drives
  • 40 TB of SSD storage for Lustre file system metadata
  • Two metadata management units (MDU) exporting four MDTs (one MDT exported per MDS), configured in highly available storage pairs
  • Cray ClusterStor E1000 storage file system from HPE
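The storage figures can be cross-checked with simple arithmetic. The sketch below assumes decimal units (1 PB = 1,000 TB = 1,000,000 GB) and a sustained transfer at the full quoted bandwidth. Attributing the gap between raw and usable capacity to redundancy and file system overhead is also an assumption, since the list does not say.

```python
# Cross-check of the storage figures, assuming decimal (SI) units.
DRIVE_COUNT = 5088
DRIVE_SIZE_TB = 16
USABLE_PB = 60
BANDWIDTH_GB_PER_S = 300   # aggregate I/O bandwidth to/from NWSC-3

# Raw capacity of all drives, in petabytes.
raw_pb = DRIVE_COUNT * DRIVE_SIZE_TB / 1000

# Share of raw capacity that ends up as usable file system space.
usable_fraction = USABLE_PB / raw_pb

# Time to write the full usable capacity at the quoted aggregate bandwidth.
fill_seconds = USABLE_PB * 1_000_000 / BANDWIDTH_GB_PER_S
fill_hours = fill_seconds / 3600

print(f"Raw capacity:    {raw_pb:.1f} PB")
print(f"Usable fraction: {usable_fraction:.1%}")
print(f"Time to fill:    {fill_hours:.1f} hours")
```

Under these assumptions the drives provide about 81 PB raw, of which roughly three-quarters is usable, and filling the file system at full bandwidth would take a little over two days.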

AMD, the AMD Arrow logo, EPYC, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

 
