Overview
The invention of the printing press confronted 15th-century scholars and publishers with challenging problems as they attempted to transform hand-drawn and hand-lettered documents into multiple printed copies while developing libraries to house and organize them. Even so, by 1501 printers had produced some 20 million copies of 35,000 manuscripts, fueling the expansion of literacy and the spread of human knowledge beyond the social elites. By comparison, the fast-evolving capabilities of today's computing systems and networks make it possible to recreate, archive, display, and manipulate vastly greater quantities of electronically generated documents, data, images, sounds, and video streams, and to offer potentially instant access to this knowledge to a significant portion of the world's population. But organizing "collections" of these many forms of electronic information, and developing systems and software tools to make them available to end users, requires complex technical innovation as well as collaboration among experts from widely disparate fields of knowledge. Launched in 1994, the Digital Libraries Initiative (DLI) addresses the conceptual, structural, and computational challenges that must be met before we can realize the vision of universally accessible electronic repositories of human knowledge. Despite the modest scale of the initial DLI program, early interdisciplinary successes of participating researchers, demonstrated in such commercial spinoffs as the Lycos and Google search engines and Go2Net, attracted the attention of growing numbers of scholars and highlighted the enormous potential of digital information resources. DLI Phase Two, begun in FY 1999, spans a larger, more diverse set of research efforts that apply today's increasing computational and bandwidth capacities to the goal of making large-scale, distributed electronic collections accessible, interoperable, and usable through global knowledge networks.
DLI Phase Two activities are jointly supported by NSF, DARPA, NIH/NLM, the Library of Congress, NASA, the National Endowment for the Humanities, and the FBI, in partnership with the National Archives and Records Administration, the Smithsonian Institution, and the Institute of Museum and Library Services.
DLI Phase Two directions
DLI Phase Two activities are drawing computer scientists and engineers from academia, industry, and government together with researchers and archivists in the humanities, the arts, and biomedical and physical sciences to develop new digital resource collections and testbed linkages among distributed archives; create frameworks, software, and network architectures that enable fusion of multimedia materials into unified records; resolve semantic problems that currently prevent integration of digital resources from distributed collections; experiment with system designs to ensure the preservation, integrity, and privacy of data; and explore and codify educational applications of digital materials. Phase Two research focuses on three essential dimensions of digital libraries:
Human-centered research
These activities explore next-generation methods, algorithms, and software that can empower expanded educational, professional, and personal uses of high-quality digital information resources. Research focuses on development of intelligent search agents, improved abstracting and summarization techniques, advanced interfaces, and collaboration technologies and tools to enable individuals and groups to search for, retrieve, manipulate, and present electronic information archived in a variety of forms in a distributed network of source collections. Issues addressed by DLI Phase Two efforts include the following:
Personalized Retrieval and Summarization of Image, Video, and Language Resources (PERSIVAL)
The explosion in Internet
sites devoted to medical and health-related information makes it increasingly
difficult for health care providers and consumers to find the most valuable
and useful current resources. In the Personalized Retrieval and Summarization
of Image, Video, and Language Resources (PERSIVAL) project, Columbia University
researchers are experimenting with system designs to provide practitioners
with quick and easy access to online medical resources tailored to individual
patient needs. The goal is to develop personalized search and presentation
tools to sort through distributed medical information, weed out repetitious
and non-germane content, and summarize and present current findings that
best match the real-time requirements of the practitioner or consumer. Using
secure online patient records available at Columbia Presbyterian Medical
Center as test models, the research team is linking a multimodal query interface
with information extracted from a patient's medical record and user background
to create a query graph for an online search of distributed medical resources.
Search results are then filtered using natural language processing to provide
the best matches with the patient's background. The results are presented
in a customized multimedia format. http://www.cs.columbia.edu/diglib/PERSIVAL/
Digital resources designed for children
The ways in which children
ages 5 to 10 access, explore, and organize digital learning materials and
the issues involved in creating learning environments suited to children's
age-specific needs are the focus of a University of Maryland project. University
researchers are fashioning developmentally appropriate tools for visualizing,
browsing, querying, and organizing information in digital libraries designed
for children. Audio, image, video, and text materials for the interdisciplinary
research effort, which will include construction of a testbed digital collection
about animals, are being made available by the Discovery Channel and the
U.S. Department of the Interior's Patuxent (Maryland) Wildlife Research
Center. http://www.cs.umd.edu/hcil/kiddiglib/
Technologies and tools for students
Technologies and tools
to make online educational resources more accessible and useful to communities
of older learners, including college students and adults, are under development
in several collaborative research efforts. For example, researchers at the
Hypermedia and Visualization Laboratory (HVL) at Georgia State University
and the Association for Computing Machinery (ACM) SIGGRAPH Education Committee
are developing a model for a reusable national collection of peer-reviewed
undergraduate educational applications in XML and improved navigation capabilities
using information visualization techniques based on XML and 3-D Web graphics.
Related work by researchers at the University of South Carolina in association
with collaboratories at the University of Iowa and Georgia State is creating
a "Web-lab Library" of simulation software, experiments, and databases designed
for students and researchers in the social and economic sciences. http://econ.badm.sc.edu/beam/
Video information collage
Researchers at Carnegie
Mellon University are creating an electronic workspace for video materials
called a "video information collage," which will enable users to search
for, view, and manipulate multiple video, text, image, and sound files from
heterogeneous distributed sources. This will allow them to organize their
discoveries into "chrono-collages" based on time relationships, "geo-collages"
based on spatial relationships, or "auto-documentaries" preserving video's
temporal nature. The research also involves creating a public video archive
of recordings of historical, political, and scientific importance. http://www.informedia.cs.cmu.edu
Alexandria Digital Earth Prototype (ADEPT)
The Alexandria Digital
Earth Prototype (ADEPT) program is a component of a large-scale digital
library collaboration of the University of California-Berkeley, the University
of California-Santa Barbara (UCSB), Stanford University, the San Diego Supercomputer Center (SDSC), and the California
Digital Library (CDL). The ADEPT project builds on a DLI Phase One project
that used UCSB's map and imagery collections to create a large-scale geospatial
digital archive, called the Alexandria Digital Library (ADL), featuring
maps, aerial photos, gazetteer items, and bibliographic records. In the
ADEPT effort, researchers are constructing customizable learning environments
based on the ADL's geographically referenced contents, and will evaluate their
educational effectiveness. These environments enable students to bookmark and organize information
from heterogeneous resources and online services for multidisciplinary academic
work. The ADEPT model employs a personalized interface called an Iscape,
or Information landscape, with several layers of service and resource materials
including meta-information tools indicating which resources in the personalized
collection can be used collaboratively. http://alexandria.ucsb.edu/adept/
Power browsers
Stanford researchers are experimenting
with "power browsers": handheld information appliances that access information
sources, such as the Web, through wireless connections and software that
maximizes the visual and navigation performance of very small displays.
The software includes special information crawlers that save time by automatically
performing certain search-related tasks. Researchers are also working on
a large-scale "WebBase" database technology to store and index millions of
Web pages, distributed across computers worldwide, for subsequent searching
or analysis. http://www-diglib.stanford.edu/
Content and collections
Researchers are creating novel digital archives of sound, image, and video as well as textual records from broad knowledge domains and specific disciplines in the sciences, arts, and humanities. They are evaluating methods of digital representation, preservation, and storage; exploring effective metadata systems (standard structures for presenting the intellectual context and pertinent related information about records in a collection); expanding access to educational materials and courseware; and developing technologies and protocols for addressing related legal and societal issues, such as copyright protection, privacy, and intellectual property management. Current research activities include:
Digital library for the humanities
Tufts University researchers,
in partnership with the Max Planck Institute in Berlin, the Modern Language
Association (MLA), the Boston Museum of Fine Arts, and the Stoa electronic
publishing consortium, are developing the foundations for a scalable, interdisciplinary
digital library accessible and useful to scholars as well as everyday Internet
users. Materials included will date from ancient Egypt through 19th-century
London. The project's Web site was processing 5 million requests per month in fall 1999.
http://www.perseus.tufts.edu
National Gallery of the Spoken Word (NGSW)
An interdisciplinary team at Michigan
State University is building the nation's first large-scale, fully searchable
database and repository of historically significant audio materials spanning
the 20th century. The "gallery" will also provide high-quality digital versions
of spoken-word recordings such as Thomas Edison's first cylinder recordings and the
voices of Babe Ruth and Florence Nightingale, with standard bibliographic
and metadata access. A key research product will be a set of best practices
for future Web sound development, including methods for conversion, preservation,
access, and copyright compliance. http://www.ngsw.org/app.html
National digital library for science, mathematics, engineering, and technology education (SMETE)
University of California-Berkeley
researchers who developed the National Engineering Education Delivery System
(NEEDS) digital library are exploring ways to expand the collection to encompass
science, mathematics, and technology. The group is using its Web-based information
portal, which supports cataloguing, searching, displaying, and reviewing
of digital learning materials and courseware, to begin developing a SMETE
digital library, demonstrate the online resource's capabilities, and evaluate
the initial SMETE testbed collection. The NSF-supported effort aims to create
a broad-based digital learning resource for K-12 and postsecondary education.
http://www.needs.org
Digital Atheneum
NSF-funded researchers at the University
of Kentucky, in partnership with the British Library and with support from
IBM's Shared University Research (SUR) program, are developing state-of-the-art
techniques to digitally restore and enhance aging and damaged original documents
and create searchable archives of such materials. Working with documents
from the British Library's Cottonian Collection (which contains Greek, Hebrew,
and Anglo-Saxon manuscripts collected by 17th-century antiquarian Sir Robert
Bruce Cotton), they are testing new methods to illuminate otherwise invisible
text and markings on documents and create digital annotation systems and
semantic frameworks for domain- and data-specific searches of these materials.
http://www.digitalatheneum.org
Digital workflow management
The more than 29,000
pieces of American popular sheet music in Johns Hopkins University's
Lester S. Levy Collection, already converted into digital records, will
be made more accessible and usable through a project that creates sound
renditions and enhanced search capabilities. From items in the collection,
which covers the period from 1790 to 1960, researchers will generate audio
files and full-text lyrics using optical music recognition software written
by staff of the Peabody Conservatory of Music at Johns Hopkins, and will
develop workflow management tools to reduce and focus the human labor involved.
The activities will result in a framework, tested process, and set of tools
transferable to other large-scale digitization projects. http://levysheetmusic.mse.jhu.edu
Data provenance
Research at the University of Pennsylvania addresses one of the most difficult aspects of online resource collections: the questions surrounding the origin, or provenance, of an electronic record, such as how old it is, how it was originally generated, who produced it, and who has modified it. These questions are even more challenging in electronic than in traditional archives because the material involved can range from a single pixel in a digital image to an entire database. Drawing on concepts from emerging software for presenting structured documents on the Web, researchers will develop prototype document "attachments" in which annotations regarding provenance can be stored and queried, providing new data models, query languages, and storage techniques. http://db.cis.upenn.edu/Research/provenance.html
Systems and testbeds
Systems research focuses on developing component technologies and the integration needed to create information environments that are dynamic and flexible; responsive at the individual, group, and institutional levels; and capable of continually adapting growing and changing bodies of data to new user-defined structures. These capabilities are prototyped and evaluated in testbed demonstrations that focus on media integration, software functionality, and breakthrough applications that offer transformative paradigms for social and work practices on a large scale.
New model for scholarly publishing
The current print model of academic
publishing, based on centralized control and restricted distribution, originated
long before the start of the information age. In another component of the
large-scale DL collaboration at California institutions, University of California-Berkeley
researchers are developing technologies and tools to create a distributed,
continuous, self-publishing paradigm to use and disseminate scholarly information
in this era of instantaneous global communication. The publishing system
prototypes will be tested and demonstrated in the emerging CDL and on a
testbed developed by SDSC. http://elib.cs.berkeley.edu
Classification systems
Among the most complex technical
challenges of digital archives is how to adapt or re-invent standardized
identification and classification schemes for their contents, as well as
interoperable search architectures that users need to locate these resources.
On top of traditional print catalog taxonomies, archivists of electronic
artifacts are juggling a number of new content categories (for example,
video, image, sound, and software programs), formats (such as the JPEG and
GIF formats for graphic images), and related operational annotations. Researchers
at the University of Arizona, in partnership with SGI, NCSA, NIH's NLM and
NCI, GeoRef Information Services, and Petroleum Abstracts, are working on
an architecture and associated techniques to automatically generate classification
systems from large domain-specific textual collections and unify them with
manually created classification structures. To generate and test prototypes,
they are parallelizing and benchmarking computationally intensive scalable
automatic clustering methods for keyword searching on large-scale collections
with existing classification systems such as CancerLit (700,000 abstracts)
and the NLM's UMLS (500,000 medical concepts) in medicine; GeoRef and Petroleum
Abstracts (800,000 abstracts) and GeoRef Thesaurus (26,000 terms) in geoscience;
and on Web applications including a collection with 1.5 million Web pages
and the Yahoo! classification system (20,000 categories). Using simulations
on parallel, high performance platforms, scientists will optimize and evaluate
the output of the various algorithms and develop hierarchical display methods
to visualize the results. http://ai.bpa.arizona.edu/go/dl/
Virtual workspaces
Even when digital collections
are structured and catalogued internally, many users also need something
akin to a large, well-lit library table on which they can spread out items
from various sources to work with and compare. Harvard-MIT Data Center researchers
are jointly designing a Virtual Data Center (VDC) to manage and share quantitative
social-science materials for research and teaching among multiple institutions
and the public. The VDC will link with other research centers and databases,
enabling participants to deposit data in many formats and set terms of access
to their materials. Users will be able to download data containing only
the variables they specify. The VDC's suite of open software tools ultimately
will be offered as a free, portable product. http://www.thedata.org
Security, quality, access, and reliability
In addition to effective classification systems and tools for users, the infrastructure of digital libraries, like that of their physical counterparts, requires systems ensuring the physical security of the collection, quality control, and remote access to the contents. Stanford University researchers are exploring ways to guarantee the long-term survival of digital information despite media obsolescence, natural disasters, and institutional change. They are prototyping techniques for automatically monitoring changes in collections and continuously "mirroring" the information into a large-scale archive that is automatically replicated at other sites. The prototype also uses mathematical models of projected failures in storage media to alert human operators to possible malfunctions. At Cornell University, researchers
are focusing on the integrity of digital library information, devising
prototype administrative architectures to ensure that archived information
is reliable and readily available, and that the intellectual-property
rights of authors and the privacy rights of users are protected. http://www.prism.cornell.edu
International efforts
In only a few years, DL activities have expanded to encompass not only digital construction work on important human records but also international collaborations to facilitate universal access to these new information resources.
U.S.-U.K. activities
An initiative
of NSF and Britain's Joint Information Systems Committee, for example, supports
international research to solve fundamental technical problems in linking
and accessing geographically distributed materials in differing formats.
These projects include:
U.S.-Germany activities
In January
2000, NSF/DLI Phase Two and Germany's Deutsche Forschungsgemeinschaft issued
a joint call for collaborative proposals from U.S. and German university
researchers on developing and organizing internationally accessible digital
collections.
NSF-EU working groups
The Joint
NSF-European Union (EU) working groups on future directions for digital
libraries research have completed their initial studies of national, technical,
social, and economic issues and plans for common research agendas. Five
working groups, each of which included U.S. researchers from academia, industry,
and government, addressed economic issues and intellectual property rights;
interoperability among digital library systems; metadata; multilingual information
access; and resource indexing and searching in globally distributed
digital libraries. The final report, entitled "An International Research
Agenda for Digital Libraries," and working papers can be found at http://www.si.umich.edu/UMDL/EU_Grant/home.htm
and http://www.iei.pi.cnr.it/DELOS//NSF/nsf.htm.