
Graphic by Janet Ward of NOAA's High Performance Computing and Communications
Program
Representative FY 2002 agency activities
NSF: Development of digital library collections, including
research in architectures, tools, and technologies for organizing,
annotating, searching, and preserving distributed multimedia archives;
expansion of online scientific data and large-scale data-mining research
to accelerate the use of existing data to supplement new observations
DARPA: Deploy scalable prototype analysis environment in defense application
with cross-repository information analysis functionality (semantic
retrieval, indexing, value filtering, user-defined alerting, and categorizing)
NIH: Continue work on query by image content to produce a reliable
method for computer-assisted x-ray image segmentation, indexing, and
query; establish an information storage, curation, analysis, and retrieval
(ISCAR) program for biological data
DOE Office of Science: Modular electronic notebook prototype, whiteboard,
and related tools for collaboratory sharing of scientific data, instrumentation,
and research results
NIST: Initial Internet-accessible repository of full structural crystallographic
data for inorganic materials and a second repository of Internet-accessible
molecular recognition knowledge; intelligent interfaces for using
existing bioinformatics tools for protein databases
NOAA: Extend real-time collaborative access to chemical disaster information
by building synchronous collaboration tools around this functionality,
enabling experts nationwide to consult one another while maintaining a
consistent view of the data
AHRQ: Develop Web-based applications to improve health data systems
and quality of care, innovative strategies for data collection in
clinical settings, and approaches for integrating quality and outcomes
data into the care process
Early Federal IT investments pioneered the development
and implementation of digital information repositories and of such
basic enabling technologies as search engines, record management systems,
and linkages among distributed archives. Creating digital libraries
across the range of human knowledge, and developing the technologies
and tools to make that knowledge universally available on demand, is
a core information technology challenge whose solutions benefit
every profession, every academic discipline, every learner, and every
citizen.
Digital libraries form the basis of the Nation's 21st-century knowledge
network. Federally supported research to decode the human genome,
for example, was accelerated by many years because researchers could
create, store, and immediately share over the Internet massive databases
of genetic information representing pieces of the enormous biological
puzzle. Federal digital library funding not only established major
digital collections in such areas as Earth and space sciences, the
humanities, law, medicine, oral history, and science, mathematics,
and engineering education, but also spun off search engine technologies
that have become successful commercial enterprises.
Development challenges in the digital libraries field are growing in
tandem with today's knowledge explosion. The scale of that explosion is
suggested by a recent University of California at Berkeley study estimating
that the world now produces between one and two exabytes (an exabyte
is a billion billion 8-bit bytes) of information annually. Most of
this vast output consists of images, sound, and numeric data already in
digital formats; only 0.003 percent is print documents. At the same
time, barely 10 percent of all public information ever produced in
print has been digitized and made available on the Internet. For
archivists, the question of how to identify, collect, and preserve what
is of value in the world's dizzying new digital output now joins older
questions of what to digitize from humanity's pre-digital knowledge
stores, and how.
Building archives is only one step in creating the technological
framework that makes a digital library usable. It also takes advanced
technologies for managing and working with digital information, ranging
from visualization, data fusion, and analysis capabilities to remote
collaboration tools, metadata notation schemes, and advanced interoperable
systems.
The NITRD effort is building on early Federal successes to develop
the next-generation technologies needed to realize the full potential
of electronic information. Today's search engines, for example, are
based on fundamental algorithms developed 20 years ago, and current
search tools cannot locate audio or image information by content
description. Strategies to ensure long-term preservation of digital
records are another particularly pressing research issue. As storage
technologies evolve ever faster to meet the growing demand for storage
space, the obsolescence of older storage hardware and software threatens
to cut us off from the electronically stored past.
Federal agencies' FY 2002 research efforts will include development
of large-scale digital collections in engineering, the sciences, and
the humanities; research to increase the interoperability and integration
of software in distributed systems; protocols and tools for data annotation
and management; and research on technical issues in preservation.
- Data storage and management technologies:
  - Tools for collection, indexing, synthesis, and archiving
  - Protocols for data compatibility, conversion, interoperability,
    and interpretation
  - Technologies and tools for fusing databases, such as those of
    molecules and macromolecular structures in biology or of disparate
    real-time weather observations, with remote access and analysis
    capabilities
  - Component technologies and integration of dynamic, scalable,
    flexible information environments
  - Digital representation, preservation, and storage of
    multimedia collections
  - Protocols and tools to address legal issues such as
    copyright protection, privacy, and intellectual property
    management
- Usability of large-scale data sets:
  - Intelligent search agents, improved abstracting and
    summarizing techniques, and advanced interfaces
  - Digital classification frameworks and interoperable
    search architectures
  - Metadata technologies and tools for distributed multimedia
    archives
  - Ultra-scale data-mining technologies
  - Testbeds for prototyping and evaluating media integration,
    software functionality, and large-scale applications