Progress in Predicting Protein Folding

The structure of a folded (functional) protein is often described at three levels: primary (the amino acid sequence), secondary (local elements such as alpha helices and beta sheets), and tertiary (the final three-dimensional structure). The figures above illustrate the progress made during the last four years, enabled by advances in HPC hardware and algorithms, in predicting tertiary structure from a specified secondary structure.

The earliest effort, shown in the leftmost figure, gave good agreement (light strands are calculated, dark strands are experimental) for myohemerythrin, a relatively small four-helix bundle protein; this calculation took 30 minutes on an IBM RS/6000 workstation. The middle figure shows calculations performed a few years later on a 16-node CM-5 parallel computer to predict the tertiary structure of a relatively large protein, myoglobin (red strands are calculated, blue strands are experimental). The novel algorithms used on the CM-5 examined approximately 10 billion structures, a search that took 48 hours. Although promising, the calculated structure did not agree closely with the experimental one. The most recent simulations, shown in the rightmost figure, used improved algorithms and refined potential functions and achieved good agreement between the calculated and experimental structures of myoglobin, especially in the core region where the helices are packed. A near-term goal is to make this method capable of folding a large number of different proteins using information from existing databases.
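To put the CM-5 figures in perspective, the short sketch below converts "approximately 10 billion structures in 48 hours on 16 nodes" into evaluation rates. It uses only the approximate numbers quoted above and assumes, purely for this estimate, that the work was split evenly across the nodes; the text does not say how the search was actually distributed.

```python
# Back-of-the-envelope throughput for the CM-5 structure search.
# Inputs are the approximate figures quoted in the text. The even split
# of work across nodes is an assumption made only for this estimate.

def search_throughput(structures: float, hours: float, nodes: int) -> dict:
    """Return wall-clock seconds and aggregate/per-node evaluation rates."""
    seconds = hours * 3600.0
    total_rate = structures / seconds      # structures per second, whole machine
    per_node_rate = total_rate / nodes     # structures per second, per node (even split assumed)
    per_node_total = structures / nodes    # structures examined by each node over the run
    return {
        "seconds": seconds,
        "total_rate": total_rate,
        "per_node_rate": per_node_rate,
        "per_node_total": per_node_total,
    }


if __name__ == "__main__":
    r = search_throughput(structures=10e9, hours=48, nodes=16)
    print(f"wall-clock time:     {r['seconds']:.0f} s")
    print(f"aggregate rate:      {r['total_rate']:,.0f} structures/s")
    print(f"per-node rate:       {r['per_node_rate']:,.0f} structures/s")
    print(f"structures per node: {r['per_node_total']:,.0f}")
```

Under these assumptions the machine evaluated roughly 58,000 structures per second overall, or about 3,600 per second on each node, with each node covering on the order of 625 million structures during the 48-hour run.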

The objective is to predict protein structure directly from sequence information. To do this, the algorithms discussed here must be augmented by an accurate approach to secondary structure prediction. New methods for this are currently being developed by the National Center for Research Resources at NIH through its support of the Center for Theoretical Simulation of Biological Systems at Columbia University.