Molecular Design and Process Optimization
Biological Applications of Quantum Chemistry
Ab initio quantum mechanical methods are being developed for massively parallel computing systems. Unlike empirical force field (or molecular mechanics) or semi-empirical methods, ab initio methods are not parameterized and therefore can be used to describe previously unknown chemical substances with a good degree of accuracy. Because they are computationally intensive, to date these methods have been applied to small chemical systems that generally have fewer than 30 atoms. Systems of biological interest usually have more than 100 atoms and will require the gigaflops speed of massively parallel systems.
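To give a rough sense of why systems of more than 100 atoms call for much faster machines, the sketch below assumes that the cost of a Hartree-Fock calculation grows with the fourth power of the number of basis functions (the formal scaling of the two-electron integrals) and that the number of basis functions is proportional to the number of atoms. Both are simplifying assumptions made for illustration, not figures from this project; practical codes often scale better because many integrals are negligible.

```python
def relative_cost(n_atoms_large, n_atoms_small, exponent=4):
    """Cost ratio between two system sizes under an assumed power-law scaling.

    The exponent of 4 is the formal Hartree-Fock scaling and is an assumption
    of this sketch, not a measured value from the project.
    """
    return (n_atoms_large / n_atoms_small) ** exponent


if __name__ == "__main__":
    # Compare a 100-atom system with a 30-atom system.
    ratio = relative_cost(100, 30)
    print(f"Estimated cost ratio: about {ratio:.0f}x")
```

Under these assumptions, moving from 30 atoms to 100 atoms raises the cost by roughly two orders of magnitude.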
The Hartree-Fock Self-Consistent Field (SCF) approximation to the time-independent Schroedinger equation has been implemented on a MIMD distributed-memory parallel system. In the conventional approach, the computational bottleneck is calculating a large number of integrals, saving the large data set of results on disk, and reading a subset of the data at every iteration of the approximation. The alternative direct SCF method, which recalculates a smaller number of integrals at each iteration and keeps the results in memory, lends itself to parallelization. New software called "mpqc", together with its underlying library written in C, performs direct-method calculations. One feature of the library is that matrices are distributed across the processors, so matrix size is no longer limited by the memory of a single processor.
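The idea of distributing a matrix across processors can be illustrated with a simple block-row layout, in which each processor holds a contiguous band of rows. This is a minimal sketch of one common scheme; the text does not say which layout the mpqc library uses, and the function names here are hypothetical.

```python
def block_row_range(n_rows, n_procs, rank):
    """Return the half-open [start, stop) range of rows held by `rank`.

    Rows are split as evenly as possible; the first `n_rows % n_procs`
    ranks each hold one extra row.
    """
    base, extra = divmod(n_rows, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def owner_of_row(row, n_rows, n_procs):
    """Return the rank that holds global row index `row`."""
    base, extra = divmod(n_rows, n_procs)
    boundary = extra * (base + 1)  # first row held by a rank without an extra row
    if row < boundary:
        return row // (base + 1)
    return extra + (row - boundary) // base


if __name__ == "__main__":
    n_rows, n_procs = 10, 4
    for rank in range(n_procs):
        print(rank, block_row_range(n_rows, n_procs, rank))
```

With such a layout, each processor stores only about 1/P of the matrix, so the largest matrix that fits grows with the number of processors P rather than being fixed by a single processor's memory.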
The mpqc software, together with extensions and enhancements to it, provides the building blocks for optimizing molecular geometries. Such geometries can be used in determining other molecular properties and the energy of chemical reactions. Optimization methods written in C++ have been implemented in mpqc. This software has optimized the geometries of molecules documented in the scientific literature in 10 iterations, whereas other methods took 90 iterations or, in some cases, had still not converged after a similar number of iterations.
This is joint work involving DCRT and Sandia National Laboratories. Future efforts include converting the mpqc libraries to C++ and applying the software to other areas of computational chemistry.
Physical models and mathematical algorithms for numerical simulation of biomolecular systems are being developed at the University of Houston with NSF support. The research team combines expertise in chemistry, biophysics, biochemistry, chemical engineering, computer science, and mathematics.
The acetylcholinesterase dimer (AChE), partly shown in the figure below, is an enzyme responsible for degrading the neurotransmitter acetylcholine in species ranging from humans to insects. AChE is a target for many commonly used drugs and toxins; among the drugs that bind to AChE are therapeutic agents for Alzheimer's disease, myasthenia gravis, and glaucoma. Using numerical simulation, a second "back door" to the active site was recently discovered, making it likely that substrates can enter through one door and exit through the other. The Intel Touchstone Delta at Caltech and the Intel Paragon at SDSC will now be used for more extensive simulations that should explain how this back door works: what makes it open and close, and what can and cannot pass through it.
The computations are being done using EulerGromos, the scalable parallel molecular dynamics software developed at the University of Houston. Using 256 processors on the Delta, simulations of the full solvated dimer involving 131,653 dynamical atoms take approximately 20 seconds per time step and require on the order of 50,000 time steps.
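The figures above imply a substantial total run time. The short calculation below works it out from the quoted values (approximately 20 seconds per step, on the order of 50,000 steps, 256 processors); because those inputs are approximate, the results should be read as rough estimates only.

```python
SECONDS_PER_STEP = 20   # approximate value quoted in the text
TIME_STEPS = 50_000     # order-of-magnitude value quoted in the text
PROCESSORS = 256        # Delta processors used


def wall_clock_days(seconds_per_step, steps):
    """Total elapsed time in days for a run of `steps` time steps."""
    return seconds_per_step * steps / 86_400


def processor_hours(seconds_per_step, steps, processors):
    """Total processor-hours consumed across all processors."""
    return seconds_per_step * steps * processors / 3_600


if __name__ == "__main__":
    days = wall_clock_days(SECONDS_PER_STEP, TIME_STEPS)
    hours = processor_hours(SECONDS_PER_STEP, TIME_STEPS, PROCESSORS)
    print(f"Wall-clock time: about {days:.1f} days")
    print(f"Processor-hours: about {hours:,.0f}")
```

At these rates a single full simulation takes about a million seconds, or between 11 and 12 days of continuous running on the 256 processors.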
Recent work by the team and others on data parallel programming languages and techniques has resolved many of the core programming issues for multicomputers. The programming languages that have been developed require further refinement based on experience gained by implementing the sophisticated numerical algorithms the simulations require. Software engineering techniques have been developed to facilitate efficient porting of several codes to both distributed-memory and shared-memory processors. Furthermore, at least three different parallel programming paradigms are being used as appropriate for different applications, and in most cases automatic translation between them can be accomplished easily.
http://sina.tcamc.uh.edu/tcamc
Electrostatic field, shown in yellow, of the acetylcholinesterase enzyme. The known active site is shown in blue; the second 'back door' to the active site is thought to be at the spot where the field lines extend toward the top of the picture.
In this NSF-funded Grand Challenge, the Theoretical Biology Group at the University of Illinois at Urbana-Champaign, in collaboration with Duke University, Yale University, and New York University, is developing tools for using high performance computing systems for research in structural biology. Their MDScope product includes (1) molecular visualization software for interactive display of molecular systems, (2) molecular dynamics software designed for performance, scalability, modularity, and portability, and (3) a protocol and library that function as the unifying communication agent between the other two components.
MDScope allows scientists to explore the attributes of macromolecules in an immediate and visual way, and facilitates research into more complex systems than could be readily understood using traditional methods. It will have uses in computer-aided drug design and protein structure refinement and prediction. Ongoing projects include specific membrane proteins, protein-DNA complexes, muscle proteins, and virus coats.
This work was conducted on Silicon Graphics and HP workstations.
http://www.ks.uiuc.edu:1250/NSF_HPCC/
A portion of the Glucocorticoid Receptor bound to DNA; the receptor helps to regulate expression of the genetic code.
The so-called de novo protein structure prediction problem is important to both pharmaceutical drug designers and molecular biologists because three-dimensional protein structure essentially determines the protein's biological function. This DOE Grand Challenge has two goals: (1) the development and validation of algorithms and procedures for three-dimensional protein structure modeling and prediction, and (2) the implementation of a Protein Folding Workbench that allows biologists and biochemists to use large-scale parallel computing resources to explore various approaches to protein structure modeling and prediction. Biologically relevant models that can incorporate general experimental data and that provide interfaces oriented to the needs of the biology and biochemistry communities are emphasized. A hierarchical approach to the structure prediction problem moves from a "coarse grained" discrete lattice representation to a "fine grained" full atom representation.
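To make the "coarse grained" lattice representation concrete, the sketch below grows a self-avoiding chain on a face-centered cubic lattice, with each lattice site standing in for one residue. It uses plain random growth with restarts, which is an assumption chosen for brevity and is not the project's chain generation algorithm; the function names are hypothetical.

```python
import itertools
import random


def fcc_neighbors():
    """The 12 nearest-neighbor steps of an fcc lattice: permutations of (+-1, +-1, 0)."""
    steps = set()
    for a, b in itertools.product((-1, 1), repeat=2):
        for zero_axis in range(3):
            step = [a, b]
            step.insert(zero_axis, 0)  # place the zero component on each axis in turn
            steps.add(tuple(step))
    return sorted(steps)


def grow_chain(length, rng, max_attempts=1000):
    """Grow a self-avoiding chain of `length` sites, restarting on dead ends."""
    steps = fcc_neighbors()
    for _ in range(max_attempts):
        chain = [(0, 0, 0)]
        occupied = {(0, 0, 0)}
        while len(chain) < length:
            x, y, z = chain[-1]
            free = [(x + dx, y + dy, z + dz) for dx, dy, dz in steps
                    if (x + dx, y + dy, z + dz) not in occupied]
            if not free:
                break  # dead end: abandon this chain and start again
            site = rng.choice(free)
            chain.append(site)
            occupied.add(site)
        if len(chain) == length:
            return chain
    raise RuntimeError("no self-avoiding chain found within max_attempts")


if __name__ == "__main__":
    chain = grow_chain(30, random.Random(0))
    print(len(chain), "sites, ending at", chain[-1])
```

A real prediction procedure would score each candidate chain against an energy function or experimental data and keep the best ones; this sketch only generates geometrically valid candidates.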
This work was conducted by researchers from Caltech, the University of Washington, Argonne, and CRPC. Resources included an Intel Paragon, an IBM SP-1, 12 networked IBM workstations, and an ATM fiber optic network funded by Pacific Bell.
http://www.compbio.caltech.edu
The upper figure shows the known structure of the protein crambin from the Brookhaven Protein Data Bank (PDB), and the lower figure is the best selection from a large ensemble of candidate chains, generated on an fcc (face-centered cubic) lattice using a guided replication Monte Carlo chain generation algorithm. Development of the algorithm and its serial and parallel implementations was funded by the HPCC Program. The three-dimensional structure prediction procedure was benchmarked at about 6 minutes on a 500-node Intel Paragon versus 24 hours on a single-processor IBM RS6000 workstation, a 225-fold speedup.
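The benchmark figures can be checked with a short calculation. Taking the quoted times at face value gives a ratio of 240; the reported 225-fold speedup corresponds to a parallel time of 6.4 minutes, which is consistent with "about 6 minutes". The parallel efficiency shown is derived here from the reported speedup and node count and is not a figure stated in the source.

```python
def speedup(serial_seconds, parallel_seconds):
    """Ratio of serial run time to parallel run time."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(speedup_factor, n_nodes):
    """Fraction of ideal linear speedup achieved on `n_nodes` nodes."""
    return speedup_factor / n_nodes


if __name__ == "__main__":
    serial = 24 * 3600       # 24 hours on the workstation, in seconds
    parallel = 6 * 60        # "about 6 minutes" on the Paragon, in seconds
    print("Speedup from rounded times:", speedup(serial, parallel))
    print("Parallel minutes implied by 225x:", serial / 225 / 60)
    print("Efficiency at 225x on 500 nodes:", parallel_efficiency(225, 500))
```

An efficiency near 45 percent on 500 nodes indicates that a little under half of the machine's ideal throughput was realized, compared across two different processor types.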
Modern genetic engineering provides methods for modifying biological molecules (biomolecules) such as proteins to give them properties suited to applications outside of living systems -- for example in chemical manufacturing, environmental remediation, and materials development. This project focuses on developing computational methods for relating the atomic structure of such proteins to their properties and function.
In biological systems, chemical reactions are usually catalyzed by protein molecules called enzymes. Accurate computational methods that use quantum-mechanical models have been developed for studying these enzymatic reactions. These models describe the making and breaking of chemical bonds in a reaction that is controlled by the enzymatic catalyst. Because these enzymes typically contain thousands of atoms, fully quantum methods are intractable even on the largest computing systems, and new computational methods have been developed. The new methods are hybrids in which the active portion of the enzyme, which is involved directly in the chemical reaction, is modeled with quantum mechanics while the bulk of the molecule and the solvent are treated by computationally less demanding classical methods.
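The hybrid scheme can be sketched as a partition of the atoms into a quantum region and a classical region, with the total energy written as the sum of a quantum term, a classical term, and a coupling term between the two. The distance cutoff used below to pick the quantum region is an assumption for illustration only; the source does not say how the active portion is selected, and the function names are hypothetical.

```python
import math


def partition_atoms(coords, center, qm_radius):
    """Split atom indices into a quantum (QM) and a classical (MM) region.

    Atoms within `qm_radius` of `center` are treated quantum mechanically;
    all others are treated classically. The cutoff rule is an assumption.
    """
    qm_region, mm_region = [], []
    for index, position in enumerate(coords):
        if math.dist(position, center) <= qm_radius:
            qm_region.append(index)
        else:
            mm_region.append(index)
    return qm_region, mm_region


def hybrid_energy(qm_energy, mm_energy, coupling_energy):
    """Total energy in an additive hybrid scheme: E = E_QM + E_MM + E_QM/MM."""
    return qm_energy + mm_energy + coupling_energy


if __name__ == "__main__":
    # Four placeholder atoms; coordinates are arbitrary illustrative values.
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 9.0, 0.0)]
    qm, mm = partition_atoms(coords, center=(0.0, 0.0, 0.0), qm_radius=2.0)
    print("QM atoms:", qm, "MM atoms:", mm)
```

Because the expensive quantum calculation covers only the small active region, the overall cost is driven by the size of that region rather than by the thousands of atoms in the full enzyme and solvent.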
The core of the computation is the GAMESS (Generalized Atomic and Molecular Electronic Structure System) quantum chemistry software, which has been adapted for parallel computing systems and allows complex enzyme active sites to be modeled for the first time. Special modifications of GAMESS allow the interfacing of quantum and classical models. This project is a joint effort of NIST and Iowa State University; Iowa State maintains the software and distributes it freely to hundreds of government, industrial, and academic research laboratories worldwide.
http://gams.cam.nist.gov/hpcc