PDB Lite: Nature of 3D Structural Data

Origins and Limitations of 3D Structural Data

Most of the three-dimensional macromolecular structure data in the Protein Data Bank were obtained by one of three methods: X-ray crystallography, solution nuclear magnetic resonance (NMR), or theoretical modeling. The first two are experimental methods. The empirical results of these experimental methods accurately describe the 3D structure of the molecule in the state in which measurements were made (provided the authors did not misinterpret the data, which happens on rare occasions). Crystallization sometimes distorts portions of a structure due to contacts between neighboring molecules in the crystal (e.g. malate dehydrogenase/4MDH, see Rhodes, p. 34). However, protein crystals as used for diffraction studies are highly hydrated ("wet and gelatinous") so structures determined from crystals are not much different from the structures of soluble proteins in aqueous solution. Some molecules have been studied both by crystallography and by solution NMR, and in these cases the agreement has been excellent. An early example is thioredoxin from E. coli (3TRX vs. 1SRX or 1TDE).

Crystal diffraction cannot resolve the positions of hydrogen atoms, and usually cannot reliably distinguish nitrogen from oxygen from carbon. This leaves the assignment of the terminal side-chain atoms uncertain for Asn, Gln and Thr. Sometimes it is also uncertain whether an atom that is not part of a known amino acid is a water oxygen or a metal ion. Newer crystal diffraction PDB files contain hydrogen positions; these hydrogens were added by modeling.

Data obtained by theoretical modeling tend to be less accurate than those obtained by experimental methods. One kind of modeling, called homology modeling, involves fitting a known sequence to the experimentally determined 3D structure of a sequence-similar molecule. Results of homology modeling are more likely to be reliable than are results derived purely from theory (ab initio modeling).

X-Ray Crystallography

There are several experimental hurdles which must be crossed before the 3D structure of a macromolecule can be determined by X-ray diffraction from crystals. A good introduction to X-ray crystallography is the book Crystallography Made Clear by Gale Rhodes of the University of Southern Maine, Portland, USA.

First, the molecule must be crystallized, and the crystals must be singular (not 2 or more stuck together) and of sufficiently high quality. Countless attempts to determine molecular structures have failed at this stage. Protein crystallization is as much an art as a science. Many important molecules are absent from the PDB because attempts to produce suitable crystals failed. Molecules with highly hydrophobic portions usually cannot be crystallized, although in a few cases crystals have been obtained in the presence of detergent (e.g. porin/2POR). This accounts for the near absence of the transmembrane portions of proteins in the PDB (though there are a few examples from solution NMR in detergent, e.g. glycophorin/1AFO, bacteriorhodopsin/1BHA, or in organic solvents, e.g. 1BHB). Much success has been had, however, in crystallizing cloned and expressed extracellular domains or portions of membrane-anchored receptors, that is, portions lacking the hydrophobic membrane-anchoring regions.

Once a crystal is obtained, a diffraction pattern is produced by X-irradiation. This pattern consists of thousands of spots. The position and intensity of each spot are relatively easily determined, but the phases of the waves which formed each spot must also be determined in order to produce an electron density map. Solving the "phase problem" is the second hurdle. Often this has been accomplished by irradiating two or more derivatives of the same crystal which differ only in the presence of heavy metal ions. This "isomorphous replacement" method requires that metal ions be incorporated into a crystal without affecting its structure, which is sometimes difficult to achieve. A more recent solution to the phase problem involves using synchrotron radiation at multiple wavelengths. This has greatly accelerated the rate of solving crystal structures.
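Why the phases matter can be seen in a toy one-dimensional calculation. The sketch below is only an illustration, not a description of how crystallographic software works: the "density" values are invented, and the function names structure_factors and density_from are hypothetical. It computes Fourier terms from a small 1D density, then rebuilds the density twice, once with amplitudes and phases and once with amplitudes alone (all phases set to zero).

```python
import cmath

def structure_factors(density):
    """Fourier terms F(h) of a 1D density sampled on an evenly spaced grid."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(2j * cmath.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]

def density_from(factors):
    """Inverse transform; each term contributes both its amplitude and its phase."""
    n = len(factors)
    return [
        (sum(factors[h] * cmath.exp(-2j * cmath.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]

# Invented 1D "electron density" with three peaks of different heights.
density = [0.0, 5.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0]

factors = structure_factors(density)

# With amplitudes and phases, the density is recovered.
recovered = density_from(factors)

# With amplitudes only (what spot intensities give), phases are lost;
# here they are all set to zero and the result no longer matches.
amplitudes_only = density_from([abs(f) for f in factors])

for original, full, no_phase in zip(density, recovered, amplitudes_only):
    print(f"{original:6.2f} {full:6.2f} {no_phase:6.2f}")
```

The first two printed columns agree, while the third does not, which is the phase problem in miniature: the measured intensities fix the amplitudes but not the phases needed to compute the map.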

X-rays are diffracted by the electrons of the molecules in the crystal, so the result of successful crystallization and solution of the phase problem is a 3D image of the electron clouds of the molecule (an electron density map). A molecular model of the sequence of amino acids or nucleotides, which must be known independently, is then fitted into this electron density map, and a series of refinements is performed. The result is a set of X, Y, Z Cartesian coordinates for every non-hydrogen atom in the molecule.
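In a PDB-format atomic coordinate file, these coordinates appear one atom per line in fixed columns. The following is a minimal sketch of reading them, assuming the standard ATOM record column layout of the PDB file format. The names format_atom and parse_atom are hypothetical, and the two sample atoms are invented values built by the helper so that the columns line up.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Build one fixed-column ATOM line (helper for the invented sample only)."""
    return "".join([
        "ATOM".ljust(6),                 # columns 1-6: record name
        "%5d" % serial,                  # columns 7-11: atom serial number
        " ",                             # column 12: blank
        name.ljust(4),                   # columns 13-16: atom name
        " ",                             # column 17: alternate location
        res_name.rjust(3),               # columns 18-20: residue name
        " ",                             # column 21: blank
        chain,                           # column 22: chain identifier
        "%4d" % res_seq,                 # columns 23-26: residue number
        " " * 4,                         # columns 27-30: insertion code, blanks
        "%8.3f%8.3f%8.3f" % (x, y, z),   # columns 31-54: X, Y, Z in angstroms
        "%6.2f%6.2f" % (occ, b),         # columns 55-66: occupancy, temperature factor
    ])

def parse_atom(line):
    """Extract the main fields of an ATOM or HETATM line by column position."""
    return {
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }

# Two invented atoms, not taken from any real PDB entry.
sample = [
    format_atom(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67),
    format_atom(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842, 1.00, 10.38),
]

atoms = [parse_atom(line) for line in sample if line.startswith(("ATOM", "HETATM"))]
for atom in atoms:
    print(atom["name"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace is deliberate: adjacent fields in real files can run together with no space between them.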

One result of refinement is the assignment of a temperature factor to each atom in the final model. A high temperature factor suggests either disorder or thermal motion. Disorder means that the atom occupied different positions in different molecules in the crystal, while "thermal motion refers to vibration of an atom about its rest position" (Rhodes, p. 162). These possibilities cannot be distinguished solely from crystal diffraction data. If portions of a chain have high mobility or disorder, they produce low and fairly uniform electron density, making it impossible to assign positions to atoms in such portions. For this reason, it is not uncommon to find the ends of a protein chain, and perhaps a loop or two in the middle, missing from a crystallographic atomic coordinate file.
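Two small calculations make these points concrete. The sketch below assumes the standard relation between an isotropic temperature factor B and the mean-square displacement, B = 8 * pi^2 * <u^2>, and it treats gaps in residue numbering as a hint that part of a chain is missing. The function names and the residue numbers are invented for illustration. Gaps in numbering can also arise from the numbering scheme itself, so they are a hint and not proof of disorder.

```python
import math

def rms_displacement(b_factor):
    """Convert an isotropic temperature factor (square angstroms) to an
    RMS displacement (angstroms), using B = 8 * pi^2 * <u^2>."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))

def numbering_gaps(residue_numbers):
    """Return (first_missing, last_missing) ranges between observed residues.

    Residues missing before the first or after the last observed residue
    cannot be detected this way.
    """
    observed = sorted(set(residue_numbers))
    gaps = []
    for previous, current in zip(observed, observed[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps

# Invented residue numbers for one chain; residues 7 through 11 are absent.
observed_residues = [3, 4, 5, 6, 12, 13, 14]

print(numbering_gaps(observed_residues))
print(round(rms_displacement(20.0), 2), "angstroms for B = 20")
```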

RasMol and Chime have on their Color menus a color scheme designated "temperature". This assigns warm colors to high temperature factors, and cool colors to low ones.
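The general idea of such a color scheme can be sketched as a linear ramp from blue (low temperature factor) to red (high). This is an assumed, simplified mapping for illustration only; it is not the palette that RasMol or Chime actually uses, and the function name b_to_rgb and the sample values are hypothetical.

```python
def b_to_rgb(b_factor, b_min, b_max):
    """Map a temperature factor to an (R, G, B) color, blue = low, red = high."""
    if b_max <= b_min:
        fraction = 0.5  # no spread in the data; use the midpoint color
    else:
        fraction = (b_factor - b_min) / (b_max - b_min)
        fraction = min(1.0, max(0.0, fraction))  # clamp values outside the range
    red = round(255 * fraction)
    blue = round(255 * (1.0 - fraction))
    return (red, 0, blue)

# Invented temperature factors for a handful of atoms.
b_factors = [8.5, 12.0, 20.3, 41.7, 65.0]
low, high = min(b_factors), max(b_factors)

for b in b_factors:
    print(b, b_to_rgb(b, low, high))
```

Scaling to the minimum and maximum of the structure being viewed means the colors show relative mobility within one model; they are not comparable between different files.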

The reader is referred to Rhodes for an introduction to judging the quality of a crystallographic model. On page 183, Rhodes offers this caveat:

"All crystallographic models are not equal. ... The brightly colored stereo views of a protein model, which are in fact more akin to cartoons than to molecules, endow the model with a concreteness that exceeds the intentions of the thoughtful crystallographer. It is impossible for the crystallographer, with vivid recall of the massive labor that produced the model, to forget its shortcomings. It is all too easy for users of the model to be unaware of them. It is also all too easy for the user to be unaware that, through temperature factors, occupancies, undetected parts of the protein, and unexplained density, crystallography reveals more than a single molecular model shows."

Solution Nuclear Magnetic Resonance

A brief, readable introduction to the determination of protein structures by NMR is given by Creighton (pages 238-243). Solution nuclear magnetic resonance is performed on an aqueous solution of macromolecules, while the molecules tumble and vibrate with thermal motion. The result is "not as detailed and accurate as that obtained crystallographically" (Creighton, p. 238). The result of an NMR experiment is a series of similar models, rather than a single structure. Unlike X-ray crystallography, NMR resolves the positions of many hydrogen atoms. Portions of a chain with high mobility produce telltale signals in NMR. In contrast, by crystallography such portions would produce low and nearly uniform electron density and yield no information.
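One simple way to see where the models in an NMR result differ is to measure, for each atom, how far its position scatters across the models. The sketch below assumes the models are already superimposed and list the same atoms in the same order; the function name per_atom_spread and the coordinates are invented. A large spread shows that the position is poorly determined, which may reflect mobility but may also reflect a lack of experimental restraints for that region.

```python
import math

def per_atom_spread(models):
    """RMS distance of each atom from its mean position across models.

    models: list of models, each a list of (x, y, z) tuples, with atoms
    in the same order in every model.
    """
    n_models = len(models)
    spreads = []
    for positions in zip(*models):  # the same atom taken from every model
        mean = [sum(p[i] for p in positions) / n_models for i in range(3)]
        squared = [
            sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in positions
        ]
        spreads.append(math.sqrt(sum(squared) / n_models))
    return spreads

# Invented two-model ensemble with two atoms each (angstroms).
ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)],
]

spreads = per_atom_spread(ensemble)
print(spreads)
```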

Non-Redundant, Representative Structures

Over 80% of the atomic coordinate files in the PDB represent closely related or similar sequences, single amino acid mutations or other slight variations from the physiologic structure. For example, if you search for perhaps the most exhaustively studied protein of all time, the enzyme lysozyme, you find over 450 structures.

Uwe Hobohm and Chris Sander at the European Molecular Biology Laboratory in Heidelberg, Germany, developed a list of Representative Structures, which is less than 20% of the entire PDB. Based on sequence comparisons, these structures meet criteria for non-redundancy or unrelatedness, so each is considered to be the representative of a class of similar sequences.
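The idea of picking one representative per group of similar sequences can be sketched with a simple greedy pass. This is a deliberately naive illustration and not the procedure Hobohm and Sander used: the identity measure below compares sequences position by position without alignment, and the threshold, function names and sequences are all invented.

```python
def fraction_identical(seq_a, seq_b):
    """Naive identity: matching positions divided by the longer length.

    Assumes the sequences are already aligned from their first residue;
    a real comparison would align them first.
    """
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches / max(len(seq_a), len(seq_b))

def pick_representatives(sequences, threshold):
    """Keep a sequence only if it is no more than `threshold` identical
    to every representative already kept."""
    representatives = []
    for candidate in sequences:
        if all(fraction_identical(candidate, kept) <= threshold
               for kept in representatives):
            representatives.append(candidate)
    return representatives

# Invented sequences: the second is a single-residue variant of the first.
sequences = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWWWW"]

chosen = pick_representatives(sequences, threshold=0.25)
print(chosen)
```

In a greedy pass like this the result depends on the order of the input, so in practice the list would first be sorted by some measure of structure quality.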

If you check the box for non-redundant structures, and search for "lysozyme", you find only 5 instead of >450!


This page by Eric Martz, who gratefully acknowledges critical reading and suggestions from Gale Rhodes.