 
National Coordination Office for Networking and Information Technology Research and Development
 
 
 

Information Management

 

The Information Tsunami


Yottabyte [ 1,000,000,000,000,000,000,000,000 bytes OR 10^24 bytes ]
Zettabyte [ 1,000,000,000,000,000,000,000 bytes OR 10^21 bytes ]
5 exabytes: All words ever spoken by human beings
2 exabytes: Total volume of information generated worldwide annually
Exabyte [ 1,000,000,000,000,000,000 bytes OR 10^18 bytes ]
200 petabytes: All printed material
8 petabytes: All information available on the Web
2 petabytes: All U.S. academic research libraries
1 petabyte: 3 years of Earth Observing System (EOS) data (2001)
Petabyte [ 1,000,000,000,000,000 bytes OR 10^15 bytes ]
400 terabytes: National Climatic Data Center (NOAA) database
50 terabytes: The contents of a large mass storage system
10 terabytes: The printed collection of the U.S. Library of Congress
2 terabytes: An academic research library
1 terabyte: 50,000 trees made into paper and printed OR daily rate of EOS data (1998)
Terabyte [ 1,000,000,000,000 bytes OR 10^12 bytes ]
500 gigabytes: The biggest FTP site
100 gigabytes: A floor of academic journals
50 gigabytes: A floor of books
2 gigabytes: 1 movie on a Digital Video Disk (DVD)
1 gigabyte: A pickup truck filled with paper
Gigabyte [ 1,000,000,000 bytes OR 10^9 bytes ]
500 megabytes: A CD-ROM
100 megabytes: 1 meter of shelved books
10 megabytes: A minute of high-fidelity sound
5 megabytes: The complete works of Shakespeare
2 megabytes: A high-resolution photograph
1 megabyte: A small novel OR a 3.5-inch floppy disk
Megabyte [ 1,000,000 bytes OR 10^6 bytes ]
200 kilobytes: A box of punched cards
100 kilobytes: A low-resolution photograph
50 kilobytes: A compressed document image page
10 kilobytes: An encyclopaedia page
2 kilobytes: A typewritten page
1 kilobyte: A very short story
Kilobyte [ 1,000 bytes OR 10^3 bytes ]
100 bytes: A telegram or a punched card
10 bytes: A single word
1 byte: A single character
Byte [ 8 bits ]
Bit [ A binary digit - either 0 or 1 ]

 

Credit: "How much Information?," University of California at Berkeley, 2001, http://sims.berkeley.edu/research/projects/how-much-info/index.html
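
The scale above steps up by a factor of 1,000 at each named unit. As a rough illustration only, the short Python sketch below expresses a raw byte count in the largest unit from that scale. The function name describe_size and the unit abbreviations are assumptions made for this example; they do not come from the Berkeley study.

```python
# Decimal prefixes from the scale above: each step is a factor of 1,000.
# The abbreviations are an assumption chosen for this sketch.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def describe_size(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    index = 0
    # Integer comparisons avoid floating-point rounding at very large sizes.
    while index < len(UNITS) - 1 and num_bytes >= 1000 ** (index + 1):
        index += 1
    value = num_bytes / 1000 ** index
    return f"{value:g} {UNITS[index]}"


if __name__ == "__main__":
    print(describe_size(500 * 10**6))   # a CD-ROM on the scale above
    print(describe_size(2 * 10**18))    # annual worldwide information output
    print(describe_size(10**24))        # one yottabyte
```

Because the scale is decimal, the sketch divides by 1,000 rather than 1,024; a binary convention would give slightly different figures.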

 

Advanced Technologies To Build Knowledge from Data


 
Idealized simulation of large-scale ocean eddies from NOAA's Geophysical Fluid Dynamics Laboratory. (The lighter areas are warmer water.) Current global models are not able to resolve data for this vigorous oceanic action on scales of 10-100 kilometers. We do not yet understand the effects of the constant eddies on global climate.


Representative FY 2003 agency activities

NSF: Support for development of new online collections; research in architectures, tools, and technologies for digital libraries; preservation of digital records; knowledge discovery, analysis, and visualization in multiscale, heterogeneous data sets; multilingual access to large audio archives; methods of information extraction and synthesis

DARPA: Multilingual processing for summarization; data-mining technologies with cross-repository information analysis functionality (semantic retrieval, indexing, value filtering, user-defined alerting, and categorizing); bio-surveillance technologies

NIH: Aggregate and manage large-scale data resources for the medical community

NASA: Novel algorithms and software tools for extraction and visualization of very-large-scale, multisource data sets

DOE Office of Science: Research in software and infrastructure to manage very-large-scale data, instrumentation, and research results; integration of massive, heterogeneous data sets

NIST: Evaluation methods to measure relevance of content extraction; metrics, standards, and testing to advance technologies for access to and use of multimedia information; measuring performance of intelligent systems for information handling

NOAA: Apply advanced communications technologies to speed national dissemination of critical weather information; extend real-time collaborative access to disaster data with synchronous interfaces and tools; enhance scientific study of environmental data through advanced visualization techniques

ODDR&E: University-based research in reasoning across data with diverse measures of uncertainty; representations of uncertainty for decision making

AHRQ: Information management to enable studies of health care and delivery system effectiveness; tools to enhance patient safety by reducing medical errors; IT methods enabling providers to share information with patients; establish and maintain National Quality Measures Clearinghouse™ with detailed online information about health care metrics

Information superiority is really decision superiority, an executive recently told a newsweekly. That sounds right - if the critical pieces of information are readily available and identifiable. But today information is, as one IT researcher puts it, a tsunami - an inconceivable volume of data engulfing everyone and everything at an overwhelming rate. A University of California-Berkeley study estimates that the world now produces one to two exabytes of non-redundant information per year, about 93 percent of which is stored in digital form. The U.S. generates more than 50 percent of the total output, much of it in key scientific and government activities.

For example, to produce a single day's 24-hour radar scan of the weather, seen by billions of people around the world in televised weather forecasts, NOAA's National Weather Service collects a half-terabyte of real-time data from Doppler radars and turns that computationally into near-real-time visual images. The flagship Terra Spacecraft in NASA's Earth Observing System (EOS) program circles the globe every 99 minutes, gathering information about the cycling of water, trace gases, energy, and nutrients throughout the Earth's climate system. Terra's instruments generate 850 gigabytes of data every 24 hours - the contents of 100,000 encyclopedias.
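
To put these daily volumes in perspective, the following Python sketch converts them into average sustained data rates and, for Terra, into data gathered per orbit. It uses only the figures quoted above (a half-terabyte per day, 850 gigabytes per day, a 99-minute orbit) and assumes decimal units and a perfectly even data flow over the day, which real instruments will not exhibit. The variable and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def average_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average sustained rate in megabytes per second (decimal units)."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


# Figures quoted in the text, in decimal bytes.
radar_bytes_per_day = 0.5e12   # half a terabyte of Doppler radar data
terra_bytes_per_day = 850e9    # 850 gigabytes from Terra's instruments
orbit_minutes = 99             # Terra circles the globe every 99 minutes

orbits_per_day = MINUTES_PER_DAY / orbit_minutes
terra_gb_per_orbit = terra_bytes_per_day / orbits_per_day / 1e9

if __name__ == "__main__":
    print(f"Radar: {average_rate_mb_per_s(radar_bytes_per_day):.1f} MB/s")  # about 5.8
    print(f"Terra: {average_rate_mb_per_s(terra_bytes_per_day):.1f} MB/s")  # about 9.8
    print(f"Terra per orbit: {terra_gb_per_orbit:.1f} GB")                  # about 58.4
```

Under these assumptions Terra completes roughly 14.5 orbits a day and returns a little under 60 gigabytes on each one.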

The information tsunami, which experts predict will swell by orders of magnitude over this decade, presents enormous challenges for society along with unprecedented opportunities for U.S. advanced research and technological innovation. The September 11 attacks showed that even a few ruthless adversaries within our borders can use everyday information and communication systems to devastating effect. Anticipating, detecting, and thwarting such attacks in the future requires the Nation to sustain an unprecedented level of national alertness. Among the most powerful tools in this work will be high-end computing capabilities first developed by NITRD agencies to collect, manage, search or "mine," synthesize, analyze, and visualize massive amounts of data.

For people who work with advanced scientific, engineering, and commercial processes as well as in such time-critical activities as air traffic control and intelligence gathering, accessing and making use of relevant data are core necessities. NITRD research therefore focuses not just on technologies to store and organize information, but on how technologies can help people find, "see" the significance of, and interact with the information they need. These next-generation, interactive information management technologies include the array of innovative hardware and software capabilities we need to maximize the benefits of information to our quality of life. Many scientists consider effectively managing the information tsunami to be the top technical challenge of the new millennium.

In FY 2003, the NITRD agencies, which pioneered the Nation's first digital libraries and Internet search engines, will continue to support R&D in interoperable technologies and tools for archiving, cataloguing, accessing, and using heterogeneous materials in the online environment and development of online information repositories for research and education. The agencies will also support research in advanced search and data-mining techniques; software and hardware issues in integrating, accessing, and using very-large-scale data sets; advanced interactive methods and tools for information display and analysis; and technical issues in scalable archiving and digital preservation.

Major Research Challenges

  • Ultra-large-scale data-mining technologies for rapid mining, filtering, correlating, and assessing of vast quantities of heterogeneous and unstructured data (such as text and audio in many languages, video, images, and embedded code); intelligent search agents; tools for abstraction and summarization
  • User-oriented frameworks and interfaces for analysis, reporting, and presentation
  • Data storage and management technologies:
    • Tools for collecting, indexing, archiving, and synthesis
    • Protocols for data compatibility, conversion, interoperability, and interpretation in networked environments
    • Technologies and tools for fusion of databases, such as molecules and macromolecular structures or disparate real-time weather observations, with remote access and analysis capabilities
    • Component technologies and integration of dynamic, scalable, and flexible information environments
    • Representation, preservation, and storage of multimedia collections
    • Digital classification frameworks and interoperable search architectures
    • Metadata technologies and tools for distributed multimedia archives
    • Testbeds for prototyping and evaluating integration of types of digital content, software functionality, and large-scale applications
 
 
 
4201 Wilson Blvd, Suite II-405, Arlington, VA 22230 | (703) 292-4873 | (703) 292-9097 (fax)
 