The Convergence of High Performance Computing, Big Data, and Machine Learning

Logistics

Event: The Convergence of High Performance Computing, Big Data, and Machine Learning; A NITRD workshop organized by the High-End Computing and Big Data Interagency Working Groups
Date: October 29-30, 2018
Location: National Institutes of Health, Bethesda, MD
Agenda
Workshop Participants
Federal Register Notice: 83 FR 50413, “Notice of Workshop on the Convergence of High Performance Computing, Big Data, and Machine Learning”, The National Coordination Office (NCO) for Networking and Information Technology Research and Development (NITRD), National Science Foundation, October 5, 2018.

Purpose

To examine key challenges and opportunities in integrating high performance computing (HPC) with big data (BD) and machine learning (ML).
To identify research and development opportunities including practical systems implementation.

Rationale

An evolving scientific and technological landscape requires computing platforms for HPC, BD, and ML to be more integrated, but convergence can strain traditional paradigms of computing and software development. Science-based simulation increasingly relies on embedded machine learning models to interpret results from massive outputs as well as to steer computations. Likewise, science-based models are being combined with data-driven models to represent complex systems and phenomena. This process is not seamless, and numerous challenges remain, including the need for innovative distributed computing and workflow architectures, new software capabilities that incorporate simulation and analytics, and advanced workforce training. This workshop will bring together experts from the Federal agencies, academia, and the private sector to discuss these challenges and opportunities, facilitate information sharing and collaboration, and identify research needs.

The National Strategic Computing Initiative (NSCI) and the national Big Data R&D Initiative have highlighted the rapid development of hardware and software for extreme-scale and big data computing, while the National Artificial Intelligence Research and Development Plan, and other government Artificial Intelligence and ML initiatives, have highlighted the ubiquitous use of ML and related techniques across a variety of domains.

Goals

The workshop will identify and discuss:

Use cases and applications from a variety of domains such as: science and engineering, medicine and biomedicine, finance and business, national security and intelligence, and smart and connected communities.
Current activities to address the convergence challenge and the research and technologies that are still needed.
Strategies for combining the HPC, BD, and ML software and hardware ecosystems, including the use of heterogeneous computing; special processors, e.g., GPUs, Tensor Processing Units, and other custom hardware; and innovations in memory and I/O subsystems.
Strategies for combining the “people culture” of HPC, BD, and ML.
Different modes of operation such as: interactive workflows, batch vs. interactive processing, burst computing, in situ analysis, and streaming data.

Documents/Publications/References

The Convergence of High Performance Computing, Big Data, and Machine Learning: Summary of the Big Data and High End Computing Interagency Working Groups Joint Workshop, Big Data Interagency Working Group; High End Computing Interagency Working Group; Networking and Information Technology Research and Development Subcommittee; Committee on Science and Technology Enterprise; National Science and Technology Council, September 9, 2019.
Workshop Resources and Reading List, October 2018.
Webcast: Day #1: https://youtu.be/fvlyqLJdIV8; Day #2: https://youtu.be/pnEtOzpqm9U, October 29-30, 2018.

Presentations

Keynote, Jim Kurose, Assistant Director, NSF, Directorate for Computer & Information Science & Engineering
Convergence of HPC, Big Data and Machine Learning: A Science and Engineering Perspective, Professor Tony Hey, Chief Data Scientist, Rutherford Appleton Laboratory, Science and Technology Facilities Council (STFC)
Cancer Research: Computing and Data, Warren A. Kibbe, Ph.D., Professor, Biostats & Bioinformatics; Chief Data Officer, Duke Cancer Institute
Convergence of HPC and Big Data: Software Panel, Alex Aiken, Stanford/SLAC
Convergence of HPC and Big Data: Architecture Panel, Rangan Sukumar, Senior Analytics Architect, Office of the CTO, Cray Inc.
Modes of Operation: Perspectives from NERSC, Debbie Bard, Acting Group Lead, Data Science Engagement Group, NERSC, LBNL
Doing More With Less: Democratizing Computing Across Scientific Domains, Fernanda Foertter, GPU Developer Advocate, Healthcare HPC + AI, NVIDIA