The Networking and Information Technology Research and Development (NITRD) Program's member agencies coordinate their NITRD research activities and plans by Program Component Areas (PCAs) or program focus areas. For each PCA, agency representatives meet in an Interagency Working Group (IWG) to exchange information and collaborate on research plans and activities such as testbeds, workshops, and cooperative solicitations.
The Pittsburgh Supercomputing Center (PSC) is carrying out an accelerated development pilot project to create, deploy, and test software building blocks and hardware that implement functionality specifically designed to support data analytics for data-intensive scientific research. The project works in collaboration with, and strives to serve the needs of, a growing and diverse set of emerging and existing applications that require computing and other IT capabilities beyond those typically available to individual research groups, yet are not well served by existing HPC systems. This project, the Data Exacell (DXC), builds on PSC’s successful Data Supercell (DSC) technology, which replaced a conventional tape-based archive with a disk-based system to economically provide the much lower latency and higher bandwidth data access that data-intensive activities require. DXC will implement additional functionality important to such work and bring it to production quality, including improved local performance, additional capabilities for remote data access and storage, enhanced data integrity, data tagging, and improved manageability. In support of data analytics and data-intensive processing, we are acquiring hardware suited to running a broad collection of databases and to interacting with them either directly (e.g., over the Web) or as part of data-intensive workflows, thereby increasing DXC’s effectiveness and applicability across the full range of data-analytic research. In the technological ecosystem, DXC fills the gap between computationally intensive systems (now driving toward exascale) and shared-nothing clusters (including commercial clouds).
We will discuss DXC’s current application complement; its architecture; its computational engines (including small and large database systems, a large cache-coherent memory system, and graph-analytics and data-analytics systems); and its storage systems (including shared SSDs, local storage, and multi-petabyte, shared, high-performance file systems), all interconnected by a high-performance fabric in a ‘data-centric’ architecture. In addition, we will sketch the extension of the DXC pilot project to Bridges, the large-scale production facility funded by a recently announced NSF award and scheduled to go online in 2016.