Preparing protein data sets

Protein-specific features

Constructing data sets for polypeptides, proteins, and protein-like molecules is difficult.  Most sources of protein structures, such as the Protein Data Bank, either do not include hydrogen atoms or, where hydrogen atoms are present, place some of them incorrectly.  These errors usually result in sites being ionized when they should be neutral, and vice versa.  If hydrogen atoms are not present, add them using any of the readily available utilities.
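A quick way to find out whether a structure file already contains hydrogen atoms is to count them.  The following is a minimal sketch, not part of any official tool.  It assumes a PDB-format file in which the element symbol sits in columns 77-78 of each ATOM or HETATM record; the function name count_hydrogens and the fallback on the atom name (used only when the element field is blank) are illustrative assumptions, and the fallback is a heuristic that can misclassify unusual atom names.

```python
def count_hydrogens(pdb_text):
    """Return (n_hydrogen, n_atoms) for the ATOM/HETATM records in a PDB-format string."""
    n_hydrogen = 0
    n_atoms = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        n_atoms += 1
        # Columns 77-78 (1-based) hold the right-justified element symbol.
        element = line[76:78].strip().upper()
        if not element:
            # Assumption: when the element field is blank, guess from the atom
            # name in columns 13-16, ignoring any leading digits (e.g. "1HB").
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):  # count deuterium as hydrogen
            n_hydrogen += 1
    return n_hydrogen, n_atoms
```

If the first number returned is zero, the file has no hydrogen atoms and they need to be added before going further.  A non-zero count says only that hydrogen atoms are present, not that they are in the right places.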

Before a meaningful calculation on a protein can be done, some checks need to be run to ensure that the system is chemically sensible.  Run these checks in the following sequence:

  1. Add keywords MOZYME and CHARGE=500, and run the data set.  The job will run for a very short time, then quit.  Look at the end of the output file.  It will report that the charge is incorrect, which is expected because CHARGE=500 is deliberately wrong.  More importantly, it will also list which sites are charged.  Using a GUI, look at each of these sites in turn, and check whether it should be charged.  Make whatever changes are necessary.
  2. Make sure that the net charge is in the range -3 to +3.  Then edit the keyword CHARGE=n so that n is the correct net charge.
  3. If the system is a protein, a useful next step is to re-sequence the atoms to put them into the standard PDB format.  To do this, run the data set with keyword RESEQ.  This will also identify all the 20 common amino acid residues.  If any residues are not correct, edit the data set to correct them.  If you are using an unusual residue, its formula can be added using keyword XENO.
  4. Before optimizing everything, first optimize the positions of all hydrogen atoms.  To do this, set all optimization flags to zero, and run an optimization with keyword OPT-H.
  5. To optimize the entire geometry, use Cartesian coordinates.  The MOZYME technique is very efficient for geometry optimization.  Once the geometry has been optimized, a single-point calculation can be done using conventional methods, if desired.  This can be used to verify that MOZYME gave the correct heat of formation.
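
Most of the steps above differ only in the keywords used, so the edits to the data set can be scripted.  The following is a minimal sketch under stated assumptions: all keywords are on the first line of the data set, they are separated by spaces, and none of them contains embedded spaces (a quoted file name would break this simple splitting).  The helper names set_keyword and remove_keyword are hypothetical, and the charge of -2 in the example is only an illustration.

```python
def _name(keyword):
    """Keyword name without any '=value' part, upper-cased for comparison."""
    return keyword.split("=", 1)[0].upper()


def set_keyword(data_set, keyword):
    """Return data_set with `keyword` on the first (keyword) line.

    An existing keyword with the same name, e.g. an old CHARGE=n, is replaced.
    """
    lines = data_set.split("\n")
    kept = [k for k in lines[0].split() if _name(k) != _name(keyword)]
    lines[0] = " ".join(kept + [keyword])
    return "\n".join(lines)


def remove_keyword(data_set, name):
    """Return data_set with the keyword called `name` removed from the first line."""
    lines = data_set.split("\n")
    lines[0] = " ".join(k for k in lines[0].split() if _name(k) != _name(name))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data set: empty keyword line, title, comment, then geometry.
    data = "\nProtein title\n\n(geometry lines go here)\n"
    # Step 1: deliberately wrong charge, to make the job list the charged sites.
    check = set_keyword(set_keyword(data, "MOZYME"), "CHARGE=500")
    print(check.split("\n")[0])
    # Step 2: replace the dummy charge with the real net charge (here -2).
    fixed = set_keyword(check, "CHARGE=-2")
    print(fixed.split("\n")[0])
```

Only the first line is touched, so the title, comment, and geometry lines pass through unchanged.  Setting the optimization flags to zero for step 4 is not covered here, because that depends on the geometry format used in the data set.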