NIH NLM Biotechnology Informatics Budget Code:  
The NLM's National Center for Biotechnology Information (NCBI) has been given the legislative mandate to create automated systems for storing and analyzing the vast and growing volume of data related to molecular biology, biochemistry, and genetics. This information science is an essential component of genome research, protein engineering, and drug design. It seeks to develop analytical and predictive methods for identifying key molecular patterns associated with health and disease, as represented in a large number of related data banks encoding DNA, RNA, proteins, and other biologically important molecules. NCBI is implementing an integrated database of molecular sequences and structures that contains key linkages to the scientific literature and to existing biological data banks; the database is available to researchers nationally via the Internet, through special-purpose client-server software as well as the World Wide Web. Research on efficient and expressive data representation techniques for molecular sequence objects is conducted within NCBI's Computational Biology Branch and has led to the development of new sequence analysis and retrieval methods. Within this distributed database architecture, the Center builds and provides access to GenBank, the NIH DNA sequence databank, which is a key data resource of the Human Genome Project. The Biotechnology Informatics program administered through NLM's Extramural Program also supports investigator-initiated research in computational biology via peer-reviewed grants.
Budget ($ M)
FY 95 Act     4.32
FY 96 Pres    4.35
FY 96 Est     5.67
FY 97 Rqst    6.17
Program Component Areas
        FY 96   FY 97
HECC
LSN     4.54    4.92
HCS     0.68    0.75
HuCS    0.45    0.50
ETHR
Agency Ties
DARPA  
NSF  
DOE  
NASA  
NIH  
NSA  
NIST  
NOAA  
EPA  
ED  
AHCPR  
VA  
Milestone Changes  
FY 1995 Actual Milestones

Tenfold increase in linkage of gene sequences to related scientific literature. Linkage of sequence data to three-dimensional protein structure data.

Coordinated effort with Washington University and Merck to add over 200,000 human cDNA sequences to GenBank.

Completed implementation of network-based sequence submission software.

FY 1996 Estimated Milestones

Automated sequencing technology accelerates the pace of input to the database beyond the current rate of doubling every 20 months. High-throughput cDNA sequencing is expected to double the size of the database in less than one year.

NCBI begins distribution of the human genetic disease database produced at Johns Hopkins.

Increased user and network demands for sequence search and retrieval, now approaching 30,000 queries per day, will require upgrading server software and hardware.

FY 1997 Agency Requested Milestones

Increase in the number of funded genome centers will increase sequence data output to greater than 500,000 sequences per year.

Increased demand for retrieval services due to growth of the Internet and the World Wide Web and the availability of complete genomes from several organisms.

NCBI will continue participation with publishers of online journals in a WWW project linking the GenBank sequences and MEDLINE abstracts to the full text of scientific articles.

Immensity and complexity of genomic data will drive development of tools for synthesizing and summarizing data into higher level interactive views.
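
The FY 1996 milestones compare two doubling times for the database: the current 20 months, and less than one year under high-throughput cDNA sequencing. As a rough illustration of what that difference implies, the sketch below converts a doubling time into a growth factor, assuming steady exponential growth. That growth model, the function names, and the use of 12 months as an upper bound for "less than one year" are assumptions made for illustration; none of them come from NCBI.

```python
import math


def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Factor by which a quantity grows over elapsed_months,
    assuming steady exponential growth with the given doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


def months_to_reach(factor: float, doubling_months: float) -> float:
    """Months needed to grow by the given factor at a fixed doubling time."""
    if factor <= 0 or doubling_months <= 0:
        raise ValueError("factor and doubling_months must be positive")
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    # 20 months is the rate quoted in the text; 12 months is an assumed
    # upper bound standing in for "less than one year".
    for label, doubling in (("20-month doubling", 20.0),
                            ("12-month doubling", 12.0)):
        yearly = growth_factor(doubling, 12.0)
        tenfold = months_to_reach(10.0, doubling)
        print(f"{label}: x{yearly:.2f} per year, "
              f"tenfold growth in {tenfold:.1f} months")
```

Under these assumptions, a 20-month doubling time corresponds to roughly 1.5-fold growth per year, while a 12-month doubling time corresponds to 2-fold growth per year.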