Preparing protein data sets
Protein specific features
Constructing data sets for polypeptides, proteins, and protein-like molecules
is difficult. Most sources of protein structures, such as the
Protein Data Bank, either do not include
hydrogen atoms or, when hydrogen atoms are present, place some of them incorrectly.
Misplaced hydrogen atoms usually result in sites being ionized when they should be neutral,
and vice versa. If hydrogen atoms are not present, add them using
any of the readily available utilities.
Before a meaningful calculation on a protein can be done, some checks need to be run to
ensure that the system is chemically sensible. These checks are:
- Ensure that the ionized sites, if any, are correct.
- Ensure that the net charge (the total ionization) is in the range -3 to +3.
- Ensure that the system is not a radical (unless it should be a radical).
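The second and third checks come down to simple arithmetic, which can be sketched in Python. This is only an illustration, not the method the program itself uses: the function name `check_charge_and_spin`, the small `ATOMIC_NUMBER` table, and the glycine example are all assumptions introduced here. A closed-shell system has an even number of electrons, so an odd count (sum of atomic numbers minus the net charge) indicates a radical.

```python
# Atomic numbers for the elements found in most proteins (illustrative subset).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}


def check_charge_and_spin(elements, net_charge, limit=3):
    """Return a list of problems found; an empty list means both checks pass.

    elements   -- list of element symbols, one per atom (hypothetical input)
    net_charge -- the value intended for CHARGE=n
    limit      -- largest acceptable magnitude of the net charge
    """
    problems = []
    if abs(net_charge) > limit:
        problems.append(f"net charge {net_charge} is outside -{limit} to +{limit}")
    # Total electrons = sum of nuclear charges minus the net charge.
    electrons = sum(ATOMIC_NUMBER[e] for e in elements) - net_charge
    if electrons % 2 == 1:
        problems.append(f"{electrons} electrons is odd, so the system is a radical")
    return problems


# Neutral glycine, C2H5NO2, used here only as a small test case.
glycine = ["C"] * 2 + ["H"] * 5 + ["N"] + ["O"] * 2
print(check_charge_and_spin(glycine, 0))
print(check_charge_and_spin(glycine, 1))
```

A wrongly assigned charge on an otherwise correct structure shows up as an odd electron count, which is why the charge check and the radical check are best done together.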
The sequence for running these checks is as follows:
- Add keywords MOZYME and CHARGE=500, and run the data set. It will run
for a very short time, then quit. Look at the end of the output file. It
will report that the charge is incorrect, which is expected because
CHARGE=500 is deliberately wrong. More importantly, it will
also list which sites are charged. Using a GUI, look at each of these
sites in turn, and decide whether it should be charged. Make
whatever changes are necessary.
- Make sure that the net charge is in the range -3 to +3. Edit the
keyword CHARGE=n to use the correct net charge.
- If the system is a protein, a useful next step is to re-sequence the atoms
to put them into the standard PDB order. To do this, run the data set
with keyword RESEQ. This will also identify each residue that is one of the
20 common amino acids. If any residues are not correct, edit the data set to correct
them. If you are using an unusual residue, its formula can be added
using keyword XENO.
- Before optimizing everything, first optimize the positions of all hydrogen
atoms. To do this, set all optimization flags to zero, then run an
optimization with keyword OPT-H.
- To optimize the entire geometry, use Cartesian coordinates. The
MOZYME technique is very efficient for geometry optimization. Once the
geometry has been optimized, a single-point calculation can be done using
conventional methods, if desired. This can be used to verify that MOZYME
gave the correct heat of formation.
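The first two steps of the sequence above both involve editing the keyword line, which can be sketched as a small Python helper. This is a minimal sketch under stated assumptions: it assumes all keywords sit on the first line of the data set, it does not handle keyword lines that continue onto further lines, and the function name `set_keywords`, the placeholder data set, and the net charge of -1 are hypothetical.

```python
def set_keywords(dataset_text, add=(), drop_prefixes=()):
    """Return a copy of dataset_text with its first (keyword) line edited.

    add           -- keywords to append if they are not already present
    drop_prefixes -- remove any existing keyword starting with one of these
    """
    lines = dataset_text.split("\n")
    keywords = lines[0].split()
    # Drop keywords matching any prefix (comparison is case-insensitive).
    kept = [k for k in keywords
            if not any(k.upper().startswith(p.upper()) for p in drop_prefixes)]
    # Append new keywords, skipping any that are already on the line.
    present = {k.upper() for k in kept}
    for k in add:
        if k.upper() not in present:
            kept.append(k)
            present.add(k.upper())
    lines[0] = " ".join(kept)
    return "\n".join(lines)


# Placeholder data set: keyword line, title line, blank line, then geometry.
original = "PM7\nSmall peptide\n\n(geometry lines omitted)\n"

# Step 1: deliberately wrong charge, so the run stops and reports charged sites.
probe = set_keywords(original, add=["MOZYME", "CHARGE=500"])

# Step 2: after inspecting the sites, replace it with the real net charge.
fixed = set_keywords(probe, add=["CHARGE=-1"], drop_prefixes=["CHARGE="])

print(probe.split("\n")[0])
print(fixed.split("\n")[0])
```

Only the first line is touched, so the title and geometry lines pass through unchanged. The later steps (RESEQ, OPT-H) could be prepared the same way by passing different keywords in `add`.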