5. High Performance Computing Research Facilities

It is at these facilities that (1) full-scale production systems are used on Grand Challenge scale applications that cannot be scaled down without modifying the problem being addressed or requiring unacceptably long execution times, (2) early prototype systems are evaluated, feedback is provided to developers, and systems are made more mature and robust, (3) small versions of mature systems are used for developing parallel applications, and (4) advanced visualization systems are integrated into the high performance computing environment. The largest of the Grand Challenge scale applications are being run on multiple high performance systems located at these HPCC facilities across the country and networked at gigabit speeds. These facilities are demonstrating a new paradigm for conducting advanced R&D by the Federal government and American industry and academia.

The facilities serve as a virtual meeting place for interdisciplinary groups of experts, including facility staff, hardware and software vendors, Grand Challenge applications researchers, industrial affiliates that want to develop industry-specific software, and academic researchers interested in advancing the frontier of high performance computing. HPCC funding is heavily leveraged by discipline-specific agency funds, equipment and personnel from hardware vendors, and funds from state and local governments and universities, with industrial affiliates paying their fair share. Industrial affiliation offers a low-risk environment for exploring and ultimately exploiting HPCC technology.

Production-quality operating systems software and software tools are developed at these facilities. Applications software developers access facility resources over the Internet. Production-quality applications are often first run at these facilities, with user access increasingly at gigabit speeds. With their wide range of hardware and applications software, these facilities are often sites for benchmarking systems and applications, with feedback provided to hardware and software developers and vendors.

Other broadly common features of the facilities are extensive K-12 and undergraduate educational opportunities; training for researchers, graduate students, and faculty; and publication of articles in professional journals, annual reports, and newsletters.

Many of the systems listed below are funded by the named agency and receive additional funding from other HPCC agencies. For example, funding for systems at NSF centers also comes from ARPA, NASA, and NIH.

The resources at each facility and their key focus areas are as follows:

NSF Supercomputer Centers

NSF funds four Supercomputer Centers and augments the computing facilities at NCAR, the National Center for Atmospheric Research. The systems at these centers are listed below.

The term Metacenter refers to the joint cooperative activities of these centers and others in naturally overlapping research and technology areas. The Metacenter facilitates collaboration, communication, technical progress, and interoperability among participating institutions. To encourage further extension and collaborative integration of Metacenter activities with those of other providers of high performance computing, communications, and information infrastructure, in FY 1994 NSF initiated a program of Metacenter Regional Alliances (MRAs) resulting in six awards. In addition to augmenting national support activities, these MRAs are intended to complement, expand, and strengthen existing Metacenter activities at the regional, state, or local level. Participants in MRAs are expected to prototype new local and regional activities having the potential for being broadly replicated. NSF intends to add a similar number of MRAs in FY 1995.

http://pscinfo.psc.edu/Metacenter/MetaScience/welcome.html

Cornell Theory Center (CTC), Ithaca, NY

The main CTC system is a 512-processor IBM SP-2. One CTC focus area is a globally scalable computing environment, including mass storage, I/O capability, networking, archival storage, data processing power, and graphics power.

http://www.tc.cornell.edu/ctc.html

National Center for Supercomputing Applications (NCSA), Urbana-Champaign, IL

Resources include:

The three-tiered NCSA network consists of (1) Ethernet or FDDI to the desktop; (2) an FDDI backbone linking buildings, high-end systems, and the Internet; and (3) HiPPI linking high performance computing systems, mass storage, and high-end peripherals.

NCSA is also involved in ATM research in (1) a local area network, (2) a transcontinental 155 Mb/s (SONET OC-3) national network, and (3) the BLANCA gigabit testbed at 622 Mb/s (SONET OC-12).

NCSA's virtual reality CAVE is described in Section II.3, and NCSA Mosaic is described earlier.

http://www.ncsa.uiuc.edu/General/NCSAHome.html

Pittsburgh Supercomputing Center (PSC), Pittsburgh, PA

Resources include:

San Diego Supercomputer Center (SDSC), San Diego, CA

Resources include:

An environmental modeling case study performed at SDSC appears in the Case Studies Section.

http://www.sdsc.edu/SDSCHome.html

National Center for Atmospheric Research (NCAR), Boulder, CO

NSF HPCC funds enabled NCAR to acquire a 64-processor Cray Research T3D and an 8-processor IBM SP-1 for use in the global climate modeling Grand Challenge.

http://www.ucar.edu/homepage.html

NSF Science and Technology Centers

Each of these four centers addresses a particular research area; common to all four are a cross-disciplinary focus, knowledge transfer and links to the private sector, and education and outreach. The centers are:

The Center for Research on Parallel Computation (CRPC) at Rice University

CRPC aims to make parallel computing systems as easy to use as conventional computing systems. Its efforts include HPF, PVM, MPI, NHSE, HPC++, ScaLAPACK, algorithms for physical simulation, and algorithms using parallel optimization; these are described in Section II.3.
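
The flavor of the message-passing model supported by PVM and MPI can be seen in the following minimal MPI program in C. It is a generic sketch, not code drawn from CRPC materials: each process contributes one partial value, and the values are summed on process 0.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    double local, total;

    MPI_Init(&argc, &argv);                  /* start the MPI run-time system */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identifier     */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes     */

    /* Each process contributes one partial value; here the value is simply
       the process rank, standing in for a piece of real computational work. */
    local = (double) rank;

    /* Combine the partial values into a global sum held by process 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks over %d processes = %g\n", nprocs, total);

    MPI_Finalize();
    return 0;
}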

http://www.cs.rice.edu/CRPC/bluebook/bluebook.html

The Center for Computer Graphics and Scientific Visualization at the University of Utah

This center is building and displaying models that are visually and measurably indistinguishable from real-world entities.

http://www.graphics.cornell.edu/GVSTC.html

The Center for Discrete Mathematics and Theoretical Computer Science (DIMACS), headquartered at Rutgers University

This center applies discrete mathematics and theoretical computer science to fundamental problems in science and engineering. In FY 1995 a special year on Mathematical Support for Molecular Biology, focusing on DNA sequencing and protein structure, was begun; in FY 1996 a special year on Logic and Algorithms, focusing on the relationship between mathematics and computational algorithms, will begin.

http://dimacs.rutgers.edu/

The Center for Cognitive Science at the University of Pennsylvania

This center studies the human mind through the interaction of disciplines such as psychology, philosophy, linguistics, logic, and computer science. Work in human cognition, perception, natural language processing, and parallel computing has applications in robotic and manufacturing systems, human-machine interfaces, and language teaching and translation tools.

http://www.cis.upenn.edu/~ircs/homepage.html

NASA Testbeds

NASA maintains testbeds throughout the country to offer a diversity of configurations and capabilities. The testbeds are:

Ames Research Center, Moffett Field, CA

Resources include:

Goddard Space Flight Center, Greenbelt, MD

Resources include:

Jet Propulsion Laboratory (JPL), Pasadena, CA

Resources include:

The Paragon came into service in August 1993 for Earth and Space Science Grand Challenge applications development. Peak performance is 5.6 Gflops in single precision (using 56 nodes with 1.8 GB aggregate memory and more than 20 GB aggregate on-line disk). The system is available to NASA HPCC investigators and select collaborators. Interactive access to all compute nodes is available 24 hours a day, seven days a week.
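
As a point of reference, 5.6 Gflops across 56 nodes works out to 100 Mflops per node, the nominal single precision peak of the i860 XP processors used in Paragon compute nodes.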

Langley Research Center, Hampton, VA

Resources include an Intel Paragon (with 72 compute nodes, 3 service nodes, 8 I/O nodes, 2 Ethernet nodes, 2 HiPPI nodes, and 2 FDDI nodes, each with 32 MB memory and 38 GB disk).

Lewis Research Center, Cleveland, OH

Resources include an IBM SP-1 (with 16 processors, 2 GB disk, 800 Mflops peak speed).

Other Resources

These include:

Further information about these NASA testbeds is at:

http://cesdis.gsfc.nasa.gov/hpccm/accomp/94accomp/ess94.accomps/ess4.html



DOE Laboratories

National Energy Research Supercomputer Center (NERSC), Lawrence Livermore National Laboratory (LLNL), Livermore, CA

The Supercomputer Access Program at NERSC provides production computing for investigators supported by the Office of Energy Research in the following areas: materials sciences, chemistry, geosciences, biosciences, engineering, health and environmental research, high energy and nuclear physics, fusion energy, and applied mathematics and computational science. The center serves more than 4,000 accounts involved in some 700 projects. NERSC resources include:

Anticipated FY 1995 accomplishments include providing Energy Research programs with initial production use of a storage system based on National Storage Laboratory (NSL) technology, and installing a pilot early-production massively parallel system. FY 1996 plans include replacing this system with a fully configured system, increasing disk capacity and robotic tape capabilities for the storage system, installing Kerberos-based security for NERSC services, and providing client access to the Distributed File System (DFS) from major NERSC computing platforms. DFS, which is derived from the Andrew File System (AFS), is part of the Open Software Foundation's Distributed Computing Environment (DCE).

Los Alamos National Laboratory (LANL), NM, and

Oak Ridge National Laboratory (ORNL), TN

These DOE HPCC Research Centers provide full-scale high performance computing systems for work on Grand Challenge applications and for use in scalability studies. These applications must be run on large prototype systems -- they cannot be scaled down without removing essential aspects of their physics.

LANL operates a Thinking Machines CM-5 (1,024 compute nodes, 32 GB memory, 128 GB Scalable Disk Array using four HiPPI channels). The system has a theoretical peak performance of 128 Gflops, with 50 Gflops observed on several codes.
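
For scale, 128 Gflops over 1,024 compute nodes is roughly 125 Mflops per node, in line with the nominal 128 Mflops peak of a CM-5 node equipped with four vector units; the 50 Gflops observed on several codes is thus about 40 percent of the theoretical peak.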

ORNL resources include:

Several system upgrades are anticipated in FY 1995. FY 1996 plans include expanding DOE storage capabilities by implementing the High Performance Storage System (HPSS), network-centered parallel storage system management software.

NIH Systems

The Division of Computer Research and Technology (DCRT) has a 128-processor Intel iPSC/860 with 2 GB memory. In FY 1995 DCRT will install a 10- to 20-Gflops massively parallel system. Both systems are used by NIH staff in biomedical applications.

http://hpccwww.dcrt.nih.gov/

The National Cancer Institute (NCI) Frederick Biomedical Supercomputing Center has an 8-processor Cray Y-MP and a 4,096-processor MasPar MP-2, along with a comprehensive collection of biomedical software available to all scientists who use the facility.

http://www-lmmb.ncifcrf.gov/

http://www-ips.ncifcrf.gov/

http://www-pdd.ncifcrf.gov/

The National Center for Research Resources (NCRR) supports systems for biomedical research applications at its High Performance Computing Resource Centers at CTC, PSC, SDSC, NCSA, and Columbia University.

NOAA Laboratories

The Forecast Systems Laboratory in Boulder, CO, has a 221-processor Intel Paragon with 6.5 GB memory and 28.8 GB disk, rated at 15 Gflops peak. This system is used to parallelize regional and mesoscale forecast models.

The Geophysical Fluid Dynamics Laboratory (GFDL) in Princeton, NJ, is acquiring a high performance computing system with massively parallel capabilities in FY 1995, and the National Meteorological Center in Camp Springs, MD, will acquire a system in FY 1996. These systems are to be used for the global climate modeling and weather forecasting Grand Challenges. The GFDL procurement is funded in part by NOAA HPCC funds; it will provide an order of magnitude improvement in performance, have more advanced data archiving facilities, and use the new scalable models now being developed to address important climate problems.

EPA Systems

EPA's National Environmental Supercomputing Center in Bay City, MI, has a Cray C90 (with 3 processors, 64 Mwords of memory, and 90 GB disk). In FY 1995 EPA plans to acquire a scalable massively parallel system for installation at this center. These systems are dedicated to environmental research and problem solving.

http://www.epa.gov/nesc/