Information Technology Frontiers for a New Millennium
Digital Libraries Initiative, Phase Two


Overview

To help advance the digital revolution, the Digital Libraries (DL) Phase Two Initiative is building upon the successes of previous Federally supported DL research. The Initiative provides leadership in research to develop next-generation digital libraries, to advance the use and usability of globally distributed networked information resources, and to focus on innovative applications. The DL Phase Two Initiative is jointly supported by NSF, DARPA, NLM, the Library of Congress, NASA, and the National Endowment for the Humanities, in partnership with the National Archives and Records Administration, the Smithsonian Institution, and the Institute of Museum and Library Services.
 
DL researchers face the continuing challenge of applying increased computational capacity and network bandwidth to large amounts of distributed, complex data and transforming that data into coherent, usable, and accessible information. DL researchers will build on and extend research and testbed activities in promising digital libraries areas; accelerate development, management, and accessibility of digital content and collections; create new capabilities and opportunities for digital libraries to serve existing and new user communities, including all levels of education; and study the interactions between humans and digital libraries in social and organizational contexts. The Initiative encourages partnering arrangements to create next-generation operational systems in such areas as education, engineering and design, Earth and space sciences, biosciences, geography, economics, and the arts and humanities. It also addresses the digital libraries life cycle, from information creation, access, and use to archiving and preservation.
 
 
Many of poet and engraver William Blake's best-known poems were self-published by the author as "illuminated books," limited editions engraved and hand-colored by Blake himself, perhaps with the assistance of his wife. Because these volumes are so rare, most readers have never seen Blake's poems as he actually wished to present them. This problem is now being addressed with digital library technologies. Digital reproductions of many of Blake's illuminated books now appear in The William Blake Archive, which, according to its creators, is a free Web site "conceived as an international public resource that would provide unified access to major works of visual and literary art that are highly disparate, widely dispersed, and often severely restricted. The Archive integrates editions, catalogues, databases, and scholarly tools into an archive with fully searchable texts and images." The accompanying image was featured in the January 1999 issue of "D-Lib Magazine: The Magazine of Digital Library Research," which is produced by the Corporation for National Research Initiatives and is sponsored by DARPA on behalf of the Digital Libraries Initiative.
 
William Blake's The Book of Thel, copy O, plate 1 (detail). Lessing J. Rosenwald Collection, Library of Congress. Courtesy of the William Blake Archive <http://www.iath.virginia.edu/blake/>. Used by permission.



Focus areas

DL focus areas are:

  • Human-centered research
  • Content and collections-based research
  • Systems-centered research

Research topics include intelligent user interfaces; collaboration technologies and tools; methods, algorithms, and software leading to wide-spectrum information discovery, search, retrieval, manipulation, and presentation capabilities; efficient data capture, representation, preservation, and archiving; metadata; interoperability of content and collections; technologies, methods, and processes to address social, economic, and legal issues associated with the creation and use of digital collections; intelligent agents; advanced multimedia information capture, representation, and digitization; and open, networked architectures for new information environments capable of supporting complex information access, analysis, and collaborative work.

Current DL projects include:



Tracking footprints through an information space: leveraging the document selections of expert problem solvers


At the Oregon Graduate Institute of Science and Technology, NSF-supported researchers are helping problem solvers find information in large, complex information spaces without the distraction of redundant or irrelevant information. Their research focuses on healthcare, where patients' medical records are typically large, complex, geographically distributed collections of documents created by numerous healthcare professionals for divergent purposes over an extended period. Their approach is to capture and trace the information that experts use as they solve problems and then exploit this information to assist others. This research is conducted by a cross-disciplinary team composed of a medical doctor, who is studying the information-seeking behavior of physicians, and computer scientists, who are extracting and using regularly structured information.
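The idea of reusing expert "footprints" can be illustrated with a minimal sketch. It is not the Oregon Graduate Institute system: the class name `FootprintLog`, the problem-type labels, and the document identifiers are all hypothetical, and ranking by selection frequency is only one simple way such traces might be exploited.

```python
from collections import Counter, defaultdict


class FootprintLog:
    """Hypothetical log of the documents experts selected per problem type."""

    def __init__(self):
        # problem type -> Counter of document ids selected by experts
        self._selections = defaultdict(Counter)

    def record(self, problem_type, document_id):
        """Record that an expert consulted a document for this kind of problem."""
        self._selections[problem_type][document_id] += 1

    def suggest(self, problem_type, limit=3):
        """Return up to `limit` document ids, most frequently selected first."""
        counts = self._selections.get(problem_type, Counter())
        # Sort by descending count, then by id so that ties are deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [doc_id for doc_id, _ in ranked[:limit]]


# Illustrative data only: invented document ids and problem types.
log = FootprintLog()
for doc in ["lab-report-17", "discharge-summary-3", "lab-report-17", "radiology-note-8"]:
    log.record("chest-pain", doc)
log.record("medication-review", "pharmacy-list-2")

suggestions = log.suggest("chest-pain", limit=2)
print(suggestions)
```

A later problem solver facing the same kind of problem would see the most frequently consulted documents first, rather than the whole record.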



Trusted Image Dissemination (TID)


Since modern computing has greatly facilitated the use of information in the form of images, filtering images in addition to text has become essential. To help extend security and privacy protection to multimedia databases, researchers at Stanford University are focusing on TID technologies to provide image-filtering capabilities that complement traditional means of checking the contents of multimedia documents. TID will be used to restrict or screen information contained in images that are part of electronic patient records to avoid violations of security or privacy. TID research efforts focus on:

  • Further development of an existing wavelet-based algorithm for searching medical image databases and development of techniques to retrieve digital images and relevant textual information from multimedia medical databases
  • Extracting textual information from retrieved images
  • Developing techniques to edit digital medical images automatically and adapting and developing tools to manually edit the images to omit identifying information
  • Defining rules for protecting the privacy of medical images and implementing them in a security mediator
  • Developing a Web customer interface for the security mediator
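As a rough illustration of the rule-based screening described above, the sketch below checks text already extracted from an image against a few privacy rules before deciding whether to release it. It is an assumption-laden toy, not the Stanford security mediator: the `ImageRecord` fields, the role name, the identifier patterns, and the `mediate` function are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ImageRecord:
    image_id: str
    extracted_text: str  # text assumed to be already recovered from the image
    requester_role: str  # hypothetical role label supplied by the caller


# Hypothetical rules: patterns suggesting identifying information.
IDENTIFIER_PATTERNS = [
    re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),  # medical record number
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # SSN-like number
]


def mediate(record):
    """Return 'release' or 'hold-for-editing' for one image request."""
    # Assumed rule 1: the treating physician may see the unedited image.
    if record.requester_role == "treating-physician":
        return "release"
    # Assumed rule 2: any identifier in the extracted text blocks release
    # until the image has been edited to omit it.
    if any(p.search(record.extracted_text) for p in IDENTIFIER_PATTERNS):
        return "hold-for-editing"
    return "release"


labelled = ImageRecord("img-001", "Chest X-ray MRN: 4471923", "researcher")
clean = ImageRecord("img-002", "Chest X-ray, lateral view", "researcher")
print(mediate(labelled), mediate(clean))
```

Held images would then pass to the automatic or manual editing step before being released.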



Automatic reference librarians for the World Wide Web


At the University of Washington, researchers are developing more powerful automatic reference tools to help people retrieve high-quality information efficiently from the Web. The central objective is to create software agents that possess "reference intelligence," which requires only a limited understanding of complex topics but relies on a sophisticated understanding of how and where to find information. This reference intelligence would emulate human reference librarians, who are often not specialists in a topic of inquiry (for example, computational fluid dynamics) but are experts in identifying relevant resources on the topic (such as The International Journal of Fluid Dynamics).



Medical Informatics

In FY 2000, NLM plans to fund DL research in medical informatics, including projects that benefit healthcare consumers. The UMLS Knowledge Sources and the Visible Human dataset will be made available by NLM for use and testing in DL Phase Two projects.



International digital libraries

To help avoid duplication of effort, prevent the development of fragmented digital systems, and encourage productive interchange of scientific knowledge and scholarly data around the world, NSF supports a program on International Digital Libraries Collaborative Research in FY 1999. This program will contribute to creating information systems that can operate in multiple languages, formats, media, and social and organizational contexts.
 
The program's goal is to enable users to access digital collections easily, regardless of location, language, or format, and to enable broad use in research, education, and commerce. A global information environment requires research on:

  • Methods and standards to ensure long-term interoperability among distributed and separately administered databases for advanced retrieval of many kinds of information; for worldwide self-organizing databases and data mining; and for organizing and preserving domain-specific content
  • The development of linked, compatible databases with inherently regional information, such as databases of geographic, botanic, agricultural, demographic, and economic data
  • Technology for intellectual property protection in a global marketplace

NSF will fund U.S. participation in multi-country, multi-team projects to help foster long-term, sustainable relationships between U.S. and non-U.S. researchers and research organizations. Specific research areas include:

  • Multi-lingual information systems, cross-language retrieval systems, language translation, and language teaching software
  • Multinational digital libraries including sound, data, image, multimedia, software, and other forms of content
  • Interoperability and scalability technologies for extremely large worldwide collections
  • Metadata (data about data) techniques and tools
  • Geospatial, environmental, biological, historical, and other information systems in which location is highly relevant
  • Preservation and archiving of digital scholarly information, including technologies and procedures for long-term information asset management
  • Social aspects of digital libraries and cross-cultural context studies
  • Use of digital libraries in educational technology at all levels
  • Economic and copyright issues, including authentication, payment, rights formalism, trust, and fair use
  • Electronic publishing and scholarly communication technologies, including collaboratories, online repositories, and new methods for distributing scientific knowledge
