2. High Performance Computing Systems

When the HPCC Program began in 1991, traditional vector computers were still the primary high performance computing systems. Those systems were known to be approaching their physical limits, and a number of computing systems vendors were developing parallel systems that promised to overcome those limits. Today all major U.S. vendors have adopted parallel technology. Products span a wide range, including scalable parallel, fine- and coarse-grained parallel, vector and vector/parallel, networked workstations with high speed connectivity, and heterogeneous systems connected by high speed networks.

Performance Accomplishments

One of the fastest of these systems, an Intel Paragon at DOE's Sandia National Laboratories with 1,904 nodes (each consisting of two Intel i860 processors) and 38 GB of memory, achieved a world-record 143.4 gigaflops (Gflops) on the Massively Parallel (MP) Linpack benchmark.

By loosely coupling two large Paragons with a total of 2,256 of Intel's new multiprocessing nodes (each consisting of three i860 XP microprocessors, for a total of 6,768 processors), world records of 281 Gflops on the MP Linpack benchmark and 328 Gflops on a double-precision complex LU factorization code were set. The two systems were connected by 16 HiPPI channels with over 3 GB/s of capacity. Both ran SUNMOS (Sandia/UNM Operating System), a lightweight custom message-passing OS kernel optimized for high performance and low latency, whose development was funded by DOE. Applications demanding this level of performance are described beginning in Section II.6.
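The Linpack rates quoted above are obtained by timing the solution of a dense linear system Ax = b by LU factorization with partial pivoting, then dividing a standard operation count, 2/3 n^3 + 2 n^2 for an order-n system, by the elapsed time. The following single-processor Python sketch illustrates that calculation under simplifying assumptions: the names `lu_solve`, `linpack_flops`, and `linpack_gflops` are hypothetical, and a real MP Linpack run distributes blocks of the matrix across nodes and exchanges them by message passing, which this sketch does not attempt.

```python
import random
import time


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination (LU) with partial pivoting.

    a is a list of n rows of n floats; b is a list of n floats.
    Both are copied, so the caller's data is left unchanged.
    """
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the row with the largest |a[i][k]| into row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate below the pivot, storing the multipliers (the L factor) in place.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            a[i][k] = m
            for j in range(k + 1, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution with the upper-triangular factor U.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_flops(n):
    """Operation count credited for an order-n solve: 2/3 n^3 + 2 n^2."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def linpack_gflops(n, seconds):
    """Reported rate in Gflops for an order-n solve that took `seconds`."""
    return linpack_flops(n) / seconds / 1e9


if __name__ == "__main__":
    n = 100
    rng = random.Random(0)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so that the exact solution is all ones
    t0 = time.perf_counter()
    x = lu_solve(a, b)
    elapsed = time.perf_counter() - t0
    print(f"max error {max(abs(v - 1.0) for v in x):.2e}, "
          f"{linpack_gflops(n, elapsed):.4f} Gflops")
```

Because the credited operation count is fixed by n rather than by what a given machine actually executes, the reported rate depends only on the problem size and the measured time, which is what makes results from different systems directly comparable.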

http://www.cs.sandia.gov/HPCCIT/main.html

Applications exhibiting half-teraflops performance over a network are expected to be demonstrated on the exhibition hall floor at Supercomputing '95 in San Diego in November 1995. The HPCC Program remains well on track toward demonstrating technologies capable of sustaining one trillion operations per second (teraops) on selected large scientific and engineering problems.

ARPA is the lead HPCC agency developing scalable computing technologies capable of sustained teraflops and faster performance. The ARPA program will enable industry to develop such systems for broad commercial use, while enabling defense agencies to procure large-scale versions built from the low-cost commercial technology base for special applications without redeveloping architectures, operating systems, or applications software. In FY 1995 ARPA will support the hardware, software, and system architecture design of teraflops-scale systems expected to emerge in FY 1996. Efforts include support for the convergence of architectural models (for example, message passing and distributed shared memory, or scalable vector and massively parallel). This convergence will yield systems that run critical defense parallel applications more efficiently (for example, automatic target recognition, sensor fusion, and critical mobile targets). These systems will take advantage of ARPA's microsystems efforts, which are expected to produce computer backplane data rates approaching one Gb/s per wire. FY 1995 ARPA efforts will also create early prototypes of significantly higher-performance, secure operating systems. In workstation cluster computing, a more loosely coupled scalable architecture, ARPA will pursue work on resource discovery, operating environments, and new models of computing.


Some of the members of the team that broke the world computing systems speed record.


Microsystems

ARPA's microsystems program provides scalable microarchitecture building blocks for next-generation computing systems, specifically targeting key limitations in power and computational efficiency in embedded defense computing applications. FY 1995 efforts include the design of programmable protocol accelerator integrated circuits that support multiple communications protocols on the backplanes of next-generation systems. Work in high-speed signaling will improve inter-node communication bandwidth. New microarchitectures that use VLSI (very large scale integration) technology are being investigated as part of an effort to develop parallel-friendly architectures that support security, work cooperatively with compilers, have wider bandwidth to memory, and use new caching strategies. The microsystems program also supports CAD tools and environments for designing complex digital systems. Computational prototyping is used to accelerate the design process; in FY 1995 it will be used in virtual design environments for modeling from the atomic level to the system level.

Embedded Systems

Work in embeddable high performance computing systems enables the transfer of high performance scalable technologies to military applications. Prototypes of embedded versions of emerging high performance computing systems reduce future risk. In FY 1995 efficient programming environments will be developed for this class of systems, and real-time operating systems that meet performance, size, and security needs will emerge. Open interfaces for scalable computing systems will be developed, including new communications standards that enable low-latency, high-performance interconnection of the elements of a heterogeneous distributed system for embedded applications. Also in FY 1995, interoperability mechanisms that insulate application development from the hardware system will be explored. Scalable instrumentation will be developed to speed debugging and to validate performance. In FY 1996 ARPA plans to prototype scalable embedded modules that contain memory and power on a single unit of replication.

Networks of Workstations (NOW)

In an ARPA-funded effort at the University of California at Berkeley, the NOW project is developing hardware and software support for using a network of workstations as a building-wide distributed computing system. The project addresses distributed scalable computing technologies typical of large embedded military programs. Advances in switched networks such as ATM have made it possible to integrate processors, memory, and disks closely, so that a cluster of workstations can be connected into an integrated high performance system almost as powerful as a traditional supercomputer. Parallel scientific software, computer-aided engineering software, databases, file servers, and large-scale commercial information servers can all benefit. R&D issues include network interface hardware, communications protocols, network-wide resource management, distributed scheduling, and parallel file systems. A 100-workstation system will be demonstrated with the following features: (1) high performance (delivering a large share of the system's capacity to demanding sequential and parallel applications while guaranteeing good performance to interactive users), (2) incremental scalability by adding workstations, (3) fault tolerance (the system remains usable even when a workstation fails), and (4) easy administration. Commercially available systems are used for fast prototyping.
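The scheduling and fault-tolerance goals of the NOW project can be illustrated with a toy model. The following Python sketch is hypothetical and is not the Berkeley NOW software: `ToyNowScheduler` and its `join`, `leave`, `submit`, and `finish` methods are invented for illustration, and the sketch assumes a single central scheduler and batch jobs that can simply be restarted elsewhere. Batch jobs run only on workstations that have no interactive user; when a user returns or a machine fails, the displaced job goes back to the front of the queue, and new machines join the pool without reconfiguration.

```python
from collections import deque


class ToyNowScheduler:
    """Hypothetical central scheduler for a network of workstations (illustration only)."""

    def __init__(self):
        self.available = []    # idle workstations, in the order they became idle
        self.running = {}      # workstation name -> job it is currently running
        self.queue = deque()   # jobs waiting for an idle workstation

    def join(self, ws):
        """A workstation becomes usable: newly added, recovered, or its user went idle."""
        if ws not in self.available and ws not in self.running:
            self.available.append(ws)
        self._dispatch()

    def leave(self, ws):
        """A workstation is lost because its interactive user returned or it failed.

        Any batch job it was running is requeued at the front so it restarts first.
        """
        job = self.running.pop(ws, None)
        if job is not None:
            self.queue.appendleft(job)
        elif ws in self.available:
            self.available.remove(ws)
        self._dispatch()

    def submit(self, job):
        self.queue.append(job)
        self._dispatch()

    def finish(self, ws):
        """The job on ws completed; ws returns to the idle pool."""
        del self.running[ws]
        self.available.append(ws)
        self._dispatch()

    def _dispatch(self):
        # Pair waiting jobs with idle workstations until one side runs out.
        while self.queue and self.available:
            ws = self.available.pop(0)
            self.running[ws] = self.queue.popleft()


if __name__ == "__main__":
    s = ToyNowScheduler()
    for ws in ("ws1", "ws2"):
        s.join(ws)
    for job in ("sim-a", "sim-b", "sim-c"):
        s.submit(job)
    s.leave("ws1")  # ws1 fails; sim-a is requeued ahead of sim-c
    s.join("ws3")   # adding a machine picks up sim-a immediately
    print(s.running, list(s.queue))
```

A production system would also have to detect failures on its own and checkpoint or migrate jobs rather than restart them from scratch; both are outside the scope of this sketch.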

http://now.cs.berkeley.edu/

Rapid Prototyping Facility

R&D in high performance computing systems involves government, industry, and academia. One example is the NSF-funded high performance rapid prototyping facility at the University of Michigan, which can be accessed over the Internet. In FY 1995 NSF began funding university-based R&D in the design and analysis of memory architectures and in the use of high performance computing in prototyping and manufacturing. In FY 1996 the agency plans to support additional research on the continuing imbalance between processor and memory speeds, which is becoming a major roadblock to advances in high performance computing systems. Also in FY 1996, NSF's Engineering Directorate plans to fund university-based R&D in optical and optoelectronic technologies that will enable future advances in ultra-high-capacity computing and communications. Together with NSF's Computer and Information Science and Engineering Directorate (CISE), the Engineering Directorate will begin to support the implementation of wireless network architectures and their interfaces to optical networks; research on integration at the device-system interface can help accelerate the implementation of these technologies.


A wafer containing multiple PIM (Processor-in-Memory) chips, each with 128 kb of memory and 64 processors. A total of 0.25 million of these processors, with their memory, have passed initial testing in a single Cray-3 quadrant as part of the Cray-3/SSS (Super Scalable System), a joint venture between NSA and Cray Computer Corporation.


Specialized Very High Performance Architectures

NSA R&D is directed at order of magnitude improvements for deriving information using mathematical and signal processing approaches. Activities include:


Components of a prototype superconductive crossbar switch being developed at NSA. Data are transferred (via ribbon cable) from room temperature to cryogenic temperatures and back to room temperature at 2.5 Gb/s. A full 128-by-128 configuration is intended for use as a switch for massively parallel computer memory data transfers.


Mass Storage

High performance computing systems handle substantially more data, both input and output, than traditional systems. Large-scale simulations, experiments, and observational projects generate large multidimensional datasets on meshes of space and time. Accurate modeling requires that the mesh be as dense as possible, and fast simulations require that relevant subsets of information in large datasets be accessed quickly, in seconds or minutes rather than hours. For example, scientific investigations of environmental and earth science phenomena require ever-increasing volumes of data in order to develop accurate models that can explain and predict these phenomena in a timely fashion.

The mass storage industry has developed technologies for handling petabytes (10^15 bytes) of data today and is developing technologies for handling exabytes (10^18 bytes) in the future. Technology advancements include increasing the density of storage media such as disks and tapes, RAID (Redundant Arrays of Inexpensive Disks), and robotic tape storage systems. Software that has been developed to manage mass storage systems includes file and volume managers and systems for updating, archiving, and backup.
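RAID protects against disk failure by adding redundancy across a group of disks. In the parity-based RAID levels (3, 4, and 5), one block in each stripe holds the byte-wise XOR of the stripe's data blocks, so any single lost block can be recomputed from the survivors. The following minimal Python sketch shows only that parity arithmetic; `xor_blocks`, `make_stripe`, and `rebuild` are illustrative names, and the fixed parity position corresponds to a RAID 4 style layout (RAID 5 rotates the parity block across disks, and RAID 1 uses mirroring instead of parity).

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Data blocks followed by one parity block (fixed parity disk, RAID 4 style)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost):
    """Recompute block `lost` of a stripe by XORing every surviving block."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)


if __name__ == "__main__":
    data = b"HIGH PERFORMANCE"
    blocks = [data[i:i + 4] for i in range(0, len(data), 4)]  # four 4-byte "disks"
    stripe = make_stripe(blocks)
    print(rebuild(stripe, 2))  # disk 2 lost; prints b'FORM'
```

The same XOR identity recovers the parity block itself, so the scheme tolerates the loss of any one disk per stripe at the cost of one extra disk's worth of capacity.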

On October 21-23, 1994, twelve mass storage vendors and two other organizations presented non-disclosure briefings to the HPCCIT Subcommittee on issues including trends, obstacles, standards, forums, international issues, the HPCC Program, and current and planned product lines.

The High Performance Storage System (HPSS) Consortium was one of the two non-vendor organizations at the October briefings. The consortium, consisting of DOE's Oak Ridge (ORNL), Lawrence Livermore (LLNL), Los Alamos, and Sandia National Laboratories together with IBM Federal, is developing HPSS, a high performance, parallel, network-centered data storage and access system capable of GB/s transfers. Experimental implementations are in place at the development laboratories and at Cornell University, with general availability expected in late 1995.

Several HPCC-funded projects are addressing specific mass storage issues, including the following: