  • Big Data Interagency Working Group (BD IWG)

    The Big Data Interagency Working Group (BD IWG) works to facilitate and further the goals of the White House Big Data R&D Initiative.

  • Cyber Physical Systems Interagency Working Group (CPS IWG)

    The CPS IWG coordinates programs, budgets, and policy recommendations for Cyber Physical Systems (CPS) research and development (R&D).

  • Cyber Security and Information Assurance Interagency Working Group (CSIA IWG)

    The Cyber Security and Information Assurance (CSIA) Interagency Working Group coordinates the activities of the CSIA Program Component Area.

  • Health IT R&D Interagency Working Group

    The Health Information Technology Research and Development Interagency Working Group coordinates programs, budgets and policy recommendations for Health IT R&D.

  • Human Computer Interaction & Information Management Interagency Working Group (HCI&IM IWG)

    HCI&IM focuses on information interaction, integration, and management research to develop and measure the performance of new technologies.

  • High Confidence Software & Systems Interagency Working Group (HCSS IWG)

    HCSS R&D supports development of scientific foundations and enabling software and hardware technologies for the engineering, verification and validation, assurance, and certification of complex, networked, distributed computing systems and cyber-physical systems (CPS).

  • High End Computing Interagency Working Group (HEC IWG)

    The HEC IWG coordinates the activities of the High End Computing (HEC) Infrastructure and Applications (I&A) and HEC Research and Development (R&D) Program Component Areas (PCAs).

  • Large Scale Networking Interagency Working Group (LSN IWG)

    LSN members coordinate Federal agency networking R&D in leading-edge networking technologies, services, and enhanced performance.

  • Software Productivity, Sustainability, and Quality Interagency Working Group (SPSQ IWG)

    The SPSQ IWG coordinates R&D efforts across agencies that transform the frontiers of software science and engineering, and identifies R&D areas in need of development that span the science and technology of software creation and sustainment.

  • Video and Image Analytics Interagency Working Group (VIA IWG)

    The VIA IWG was formed to ensure and maximize successful coordination and collaboration across the Federal government in the important and growing area of video and image analytics.

  • Wireless Spectrum Research and Development Interagency Working Group (WSRD IWG)

    The Wireless Spectrum R&D (WSRD) Interagency Working Group (IWG) has been formed to coordinate spectrum-related research and development activities across the Federal government.


Superfacility For Data Intensive Science



"A Superfacility for Data Intensive Science"

Dr. Katherine Yelick
Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory
Professor of Electrical Engineering and Computer Sciences, University of California at Berkeley.
(presented at FASTER CoP on November 08, 2016)



About the Speaker

Dr. Katherine Yelick
Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley
Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory


Katherine Yelick is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley and the Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory. She is known for her research in parallel languages, compilers, algorithms, libraries, architecture, and runtime systems. She earned her Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology and has been on the faculty at UC Berkeley since 1991, with a joint research appointment at Berkeley Lab since 1996. She was the director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012, and in her current role as Associate Laboratory Director she manages a 300-person organization that includes NERSC, the Energy Sciences Network (ESnet), and the Computational Research Division. She is a member of the National Academies Computer Science and Telecommunications Board (CSTB) and the Computing Community Consortium (CCC), and she previously served on the California Council on Science and Technology. Yelick is an ACM Fellow and recipient of the ACM-W Athena Award and the 2015 ACM/IEEE Ken Kennedy Award.



Abstract

In the same way that the Internet has combined with web content and search engines to revolutionize every aspect of our lives, the scientific process is poised to undergo a radical transformation based on the ability to access, analyze, and merge complex data sets. Scientists will be able to combine their own data with that of other scientists, validating models, interpreting experiments, re-using and re-analyzing data, and making use of sophisticated mathematical analyses and simulations to drive the discovery of relationships across data sets. This “scientific web” will yield higher quality science, more insights per experiment, a higher impact from major investments in scientific instruments, and an increased democratization of science—allowing people from a wide variety of backgrounds to participate in the science process.


Scientists have always demanded some of the fastest computers for computer simulations, and while this has not abated, there is a new driver for computer performance: the need to analyze large experimental and observational data sets. With exponential growth rates in detectors, sequencers, and other observational technologies, data sets across many science disciplines are outstripping the storage, computing, and algorithmic techniques available to individual scientists. The first step in realizing this vision is to consider the model used for scientific user facilities, including experimental facilities, wide area networks, and computing and data facilities. To maximize scientific productivity and the efficiency of the infrastructure, these facilities should be viewed as a single, tightly integrated “superfacility” where data streams between locations and experiments can be integrated with high-speed analytics and simulation.


Equally important to this model is the need for advanced research in computer science, applied mathematics, and statistics to deal with increasingly sophisticated scientific questions and the complexity of the data. In this talk I will describe some examples of how science disciplines such as biology, materials science, and cosmology are changing in the face of their own data explosions, and how this will lead to a set of research questions driven by the scale of the data sets, the data rates, inherent noise and complexity, and the need to “fuse” disparate data sets. What is really needed for data-driven science workloads in terms of hardware, systems software, networks, and programming environments, and how well can those needs be supported on systems that also run simulation codes? How will the imminent hardware disruptions affect the ability to perform data analysis computations, and what types of algorithms will be required?


Webcast

Video Link: https://youtu.be/E4dDm1ixojQ