Yottabyte [ 1,000,000,000,000,000,000,000,000 bytes OR 10^24 bytes ]
Zettabyte [ 1,000,000,000,000,000,000,000 bytes OR 10^21 bytes ]
5 exabytes: All words ever spoken by human beings
2 exabytes: Total volume of information generated worldwide annually
Exabyte [ 1,000,000,000,000,000,000 bytes OR 10^18 bytes ]
200 petabytes: All printed material
8 petabytes: All information available on the Web
2 petabytes: All U.S. academic research libraries
1 petabyte: 3 years of Earth Observing System (EOS) data (2001)
Petabyte [ 1,000,000,000,000,000 bytes OR 10^15 bytes ]
400 terabytes: National Climatic Data Center (NOAA) database
50 terabytes: The contents of a large mass storage system
10 terabytes: The printed collection of the U.S. Library of Congress
2 terabytes: An academic research library
1 terabyte: 50,000 trees made into paper and printed OR
daily rate of EOS data (1998)
Terabyte [ 1,000,000,000,000 bytes OR 10^12 bytes ]
500 gigabytes: The biggest FTP site
100 gigabytes: A floor of academic journals
50 gigabytes: A floor of books
2 gigabytes: 1 movie on a Digital Video Disk (DVD)
1 gigabyte: A pickup truck filled with paper
Gigabyte [ 1,000,000,000 bytes OR 10^9 bytes ]
500 megabytes: A CD-ROM
100 megabytes: 1 meter of shelved books
10 megabytes: A minute of high-fidelity sound
5 megabytes: The complete works of Shakespeare
2 megabytes: A high-resolution photograph
1 megabyte: A small novel OR a 3.5-inch floppy disk
Megabyte [ 1,000,000 bytes OR 10^6 bytes ]
200 kilobytes: A box of punched cards
100 kilobytes: A low-resolution photograph
50 kilobytes: A compressed document image page
10 kilobytes: An encyclopaedia page
2 kilobytes: A typewritten page
1 kilobyte: A very short story
Kilobyte [ 1,000 bytes OR 10^3 bytes ]
100 bytes: A telegram or a punched card
10 bytes: A single word
1 byte: A single character
Byte [ 8 bits ]
Bit [ A binary digit - either 0 or 1 ]
Credit: "How much Information?," University of
California at Berkeley, 2001, http://sims.berkeley.edu/research/projects/how-much-info/index.html
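
The decimal definitions in the scale above, where each named unit is 1,000 times the one below it, can be turned into a small conversion routine. The following is only an illustrative sketch: the names `UNITS` and `format_bytes` are hypothetical and do not come from the Berkeley study or from any NITRD software.

```python
# Decimal unit names from the scale above, smallest to largest.
# Each step up is a factor of 1,000 (10^3), not 1,024.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def format_bytes(n: int) -> str:
    """Express a non-negative byte count in the largest unit that keeps the value >= 1."""
    if n < 0:
        raise ValueError("byte count must be non-negative")
    index = 0
    # Integer comparisons avoid floating-point rounding at unit boundaries.
    while index < len(UNITS) - 1 and n >= 1000 ** (index + 1):
        index += 1
    value = n / 1000 ** index
    unit = UNITS[index] if value == 1 else UNITS[index] + "s"
    return f"{value:g} {unit}"


if __name__ == "__main__":
    print(format_bytes(2 * 10**18))   # annual worldwide information, per the scale
    print(format_bytes(10**24))       # one yottabyte
```

The sketch deliberately uses powers of 1,000 to match the scale; many operating systems instead report sizes in powers of 1,024, which would give slightly different figures.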
Idealized simulation of large-scale ocean eddies from NOAA's Geophysical
Fluid Dynamics Laboratory. (The lighter areas are warmer water.)
Current global models are not able to resolve data for this vigorous
oceanic action on scales of 10-100 kilometers. We do not yet understand the effects
of the constant eddies on global climate.
Representative FY 2003 agency activities
NSF: Support for development of new online collections;
research in architectures, tools, and technologies for digital libraries;
preservation of digital records; knowledge discovery, analysis,
and visualization in multiscale, heterogeneous data sets; multilingual
access to large audio archives; methods of information
extraction and synthesis
DARPA: Multilingual processing for summarization; data-mining technologies
with cross-repository information analysis functionality (semantic
retrieval, indexing, value filtering, user-defined alerting,
and categorizing); bio-surveillance technologies
NIH: Aggregate and manage large-scale
data resources for the medical community
NASA: Novel algorithms and software tools for extraction and
visualization of very-large-scale, multisource data sets
DOE Office of Science: Research in software and infrastructure
to manage very-large-scale data, instrumentation, and research
results; integration of massive, heterogeneous data sets
NIST: Evaluation methods to measure relevance of content extraction;
metrics, standards, and testing to advance technologies for access
to and use of multimedia information; measuring performance of
intelligent systems for information handling
NOAA: Apply advanced communications technologies to speed national
dissemination of critical weather information; extend real-time
collaborative access to disaster data with synchronous interfaces
and tools; enhance scientific study of environmental data through
advanced visualization techniques
ODDR&E: University-based research in reasoning across data
with diverse measures of uncertainty; representations of uncertainty
for decision making
AHRQ: Information management to enable studies
of health care and delivery system effectiveness; tools to enhance
patient safety by reducing medical errors; IT methods enabling providers
to share information with patients; establish and maintain National
Quality Measures Clearinghouse with detailed online information
about health care metrics
"Information superiority is really decision
superiority," an executive recently told a newsweekly. That sounds
right - if the critical pieces of information are readily available
and identifiable. But today information is, as one IT researcher
puts it, a tsunami - an inconceivable volume of data engulfing everyone
and everything at an overwhelming rate. A University of California-Berkeley
study estimates that the world now produces one to two exabytes
of non-redundant information per year, about 93 percent of which
is stored in digital form. The U.S. generates more than 50 percent
of the total output, much of it in key scientific and government
activities.
For example, to produce a single day's 24-hour radar scan of the
weather, seen by billions of people around the world in televised
weather forecasts, NOAA's National Weather Service collects a half-terabyte
of real-time data from Doppler radars and turns that computationally
into near-real-time visual images. The flagship Terra Spacecraft
in NASA's Earth Observing System (EOS) program circles the globe
every 99 minutes, gathering information about the cycling of water,
trace gases, energy, and nutrients throughout the Earth's climate
system. Terra's instruments generate 850 gigabytes of data every
24 hours - the contents of 100,000 encyclopedias.
The information tsunami, which experts predict will swell by orders
of magnitude over this decade, presents enormous challenges for
society along with unprecedented opportunities for U.S. advanced
research and technological innovation. The September 11 attacks showed
that even a few ruthless adversaries within our borders can use
everyday information and communication systems to devastating effect.
Anticipating, detecting, and thwarting such attacks in the future
requires the Nation to sustain an unprecedented level of national
alertness. Among the most powerful tools in this work will be high-end
computing capabilities first developed by NITRD agencies to collect,
manage, search or "mine," synthesize, analyze, and visualize
massive amounts of data.
For people who work with advanced scientific, engineering, and
commercial processes as well as in such time-critical activities
as air traffic control and intelligence gathering, accessing and
making use of relevant data are core necessities. NITRD research
therefore focuses not just on technologies to store and organize
information, but on how technologies can help people find, "see" the
significance of, and interact with the information they need. These next-generation,
interactive information management technologies include the array
of innovative hardware and software capabilities we need to maximize
the benefits of information to our quality of life. Many scientists
consider effectively managing the information tsunami to be the
top technical challenge of the new millennium.
In FY 2003, the NITRD agencies, which pioneered the Nation's first
digital libraries and Internet search engines, will continue to
support R&D in interoperable technologies and tools for archiving,
cataloguing, accessing, and using heterogeneous materials in the
online environment and development of online information repositories
for research and education. The agencies will also support research
in advanced search and data-mining techniques; software and hardware
issues in integrating, accessing, and using very-large-scale data
sets; advanced interactive methods and tools for information display
and analysis; and technical issues in scalable archiving
and digital preservation.
Major Research Challenges
- Ultra-large-scale data-mining technologies for rapid mining, filtering, correlating, and assessing of vast quantities of heterogeneous and unstructured data (such as text and audio in many languages, video, images, and embedded code); intelligent search agents; tools for abstraction and summarization
- User-oriented frameworks and interfaces for analysis, reporting, and presentation
- Data storage and management technologies:
- Tools for collecting, indexing, archiving, and synthesis
- Protocols for data compatibility, conversion, interoperability, and interpretation in networked environments
- Technologies and tools for fusion of databases, such as molecules and macromolecular structures or disparate real-time weather observations, with remote access and analysis capabilities
- Component technologies and integration of dynamic, scalable, and flexible information environments
- Representation, preservation, and storage of multimedia collections
- Digital classification frameworks and interoperable search architectures
- Metadata technologies and tools for distributed multimedia archives
- Testbeds for prototyping and evaluating integration of types of digital content, software functionality, and large-scale applications