The objective of LOCATE-TS is to locate and refine a transition state joining two stationary points in an enzyme-catalyzed reaction. These points are more commonly referred to as reactants and products, but for convenience in this description these geometries will be referred to as A and B. A and B are used because during the calculation the geometries will be modified as they move towards each other, and would therefore no longer be stationary points. In LOCATE-TS, the geometry optimization is performed on both A and B simultaneously. For small systems, e.g., non-enzymic systems, consider using SADDLE.
The function being optimized is:
ΔHf' = ΔHfA + ΔHfB +cΣi(XAi - XBi)2
where ΔHfA and ΔHfB are the calculated heats of formation of A and B, respectively. "c" is a constant, in kcal/mol/Ångstrom2, and XAi and XBi are the coordinates of atom "i" in the A and B, respectively.
Both geometries used must be as good as possible: They must be stationary points on the Potential Energy Surface (PES); they should include PDB data, either by using a PDB file or, more commonly, by using the normal MOPAC data-set format, and having the atoms labeled with PDB data.
Have all the files involved in the same folder. Although not essential, this will make the job of defining GEO_DAT and GEO_REF easier.
Because of the large probability of introducing errors into the data sets, instead of preparing specific data-sets it is easier to define the geometries to be used using GEO_DAT and GEO_REF, and having these keywords point to ARC files. This results in a very small data set. For example, if the reactant geometry is in Step_1.arc and the product geometry is in Step_2.arc, the data set would be as follows:
Example of a complete data set for LOCATE-TS
LOCATE-TS GEO_DAT="Step_1.arc" GEO_REF="Step_2.arc" EPS=78.4
EPS is specified here, because using implicit solvation gives a more realistic model. Keyword LOCATE-TS causes the MOZYME function to be used, so there is no need to add keyword MOZYME, if it is supplied, it will be ignored.
WARNING: The quality of the transition state geometry is normally not very good. It's good enough to allow tests, such as FORCE and inspection using a GUI, to be run to verify that it is, in fact, a transition state, but the heat of formation is likely to be in error because the geometry is not fully optimized. Unless corrected, this fault would prevent any comparison of transition state heats of formation with heats of formation of ground-state systems. This correction is not done automatically - finding transition state geometries requires a lot of CPU time, and, often at the start of a project, many attempts will fail. So rather than perform a lengthy geometry optimization on a faulty transition state, a "quick and dirty" job is done. Once a good transition state is found, it can then be refined as follows:
Geometries resulting from transition state refinement could be regarded as being composed of two sets: those atoms directly involved in the reaction, and all other atoms. As a result of the refinement the positions of all atoms in the first set, typically 7 or fewer, would be highly optimized. By freezing the positions of these atoms and then running the second geometry optimization procedure, the geometry of the entire transition-state system could be optimized to the same precision as that used for stable intermediates. In test calculations, a gradient minimization was performed using the atoms in the first set to verify that this operation did not introduce spurious forces; in all cases, this resulted in only insignificant changes in the heat of formation and in the positions of the atoms. Use INVERT to quickly switch optimization flags.
If keyword LET is present, docking of the two systems will not be performed in a LOCATE-TSS job. This is useful in small systems if the atoms are in the same order in the reactants and products. Avoid using LET in large systems, instead identify any mis-matches and edit the reactants or product labels to correct the error.
Quite often, particularly with difficult reactions, e.g., hydride, H-, migration, the gradient minimizer might fail to produce a valid geometry. When this happens, have a look at the geometry in "<filename> Loop1.mop" If it looks reasonable, proceed as follows:
Edit the set of keywords to convert it into a normal transition-state optimization calculation. This will involve deleting the keywords GEO_REF and LOCATE-TS, and adding the keywords MOZYME, TS, GNORM=3. The geometry already has the flags set for the atoms involved in the reaction. Run that job. It will likely fail, but the resulting geometry will likely be nearer to the transition state, and if that geometry is run, the gradient minimization will likely work. Well, it did work on several difficult transition states.
At this point, a reasonable question is, why doesn't MOPAC do this itself. The short answer is "Yes, it should." The longer answer is, someone has to write the software to do this. An even longer answer is, there should be better gradient minimizers. Until work on this is done, the process just described is the best that exists in MOPAC.
A complete enzyme catalyzed mechanism for the hydrolysis of a peptide bond is given in Chymotrypsin Mechanism. The worked exercise involves determining the transition state for the first reaction step. Except for Step 3 => Step 4, the other steps are similar.
Download Step_1.arc and Step_2.arc and store them in a new folder. In the same folder, create a text file, Step_1_2_Transition_State.mop. Edit this file to add the text in the above example. Run the job using MOPAC. It should run for about one day. If an ARC file is generated, the test was successful. If it was not generated, please contact me at MrMOPAC@ATT.net, and send me the data set and output files.
The process that LOCATE-TS uses is as follows: The
reference data set geometry, defined using GEO_REF,
is rotated and translated to put it into the best overlap with the geometry from
the input data set. At this point, the two geometries are likely to be
quite different. This difference can be expressed as a distance, defined
as the square root of the sum in the above equation. Typical distances in
enzyme chemistry are
in the order of 50 - 300Å. Geometry optimization is then started, using a small
value for "c," by default this is 3 kcal/mol/Ångstrom2.
The first step consists of solving the SCF equations using the
MOZYME technique. (There is no need to specify
MOZYME, the presence of LOCATE-TS implies MOZYME.) All subsequent steps
use the wavefunction from this initial SCF calculation, and the pull exerted by
the "c" term. This causes the two geometries to move towards each other without
imposing a large stress. During this process, the distance will typically drop
by a large amount - the two geometries can be regarded as moving across an
almost level plane until they stop near the bottom of the activation barrier.
The value of "c" is then increased; the new default value is
An estimate of the transition state geometry, C, is obtained by averaging the two geometries. The next few steps involve a pair of operations, each pair being repeated a small number of times, typically two to five times. The first of this pair of operations involves geometry optimization of C, while holding fixed all atoms involved in bond making or bond breaking, i.e. optimizing the positions of all atoms except those in the active site. The second step involves transition state location, i.e., gradient minimization, but this time using only the atoms in the active site.
LOCATE-TS can be used on its own, i.e., without any of the terms in the square brackets; if that is done, then the default optimization procedure is used, and output is small. The default optimization can be reproduced using LOCATE-TS(C:3,30,30,30;SET:1)
About half the time the default LOCATE-TS does
not finish correctly. Almost always the first big step, moving the reactants and
products up the reaction barrier, runs correctly, and most of the time when
failures occur they occur in the refinement of the transition state. Because of
this behavior, at the end of the first big step three data sets are generated
that can then be used in attempting to refine the transition stae. These
data sets have the names <name>_30p0_first.mop,
and <name>_30p0_average.mop. <name>_30p0_first.mop is the
final geometry generated from the original data set (the reactant) or by the
file defined by GEO_DAT. <name>_30p0_second.mop is the final
geometry from GEO_REF. <name>_30p0_first.mop and
are thus the geometries on each side of the transition state, near the top of
the reaction barrier. An approximation to the transition state geometry is
the average of these two structures, this is given in <name>_30p0_average.mop.
The final value of "c" used is indicated by the text "30p0" This specific
case represents a value of "c" of
If LOCATE-TS does not finish correctly, examine the structures on each side of the transition state. If these look correct, then try to refine them by using this option. Do not start with a constraint lower than the last value used in the previous run - if the constraint is lowered, the two systems will move away from the transition state. Using options of the type shown here increases the possibility that the refinement would work correctly because the refinement would be started with a geometry that was nearer to the transition state.
If the structures on each side of the transition state do
not look correct, then start over with a more cautious set of constraints. An example of
such a cautious set would be LOCATE-TS(C:1,3,10,20,25,25;SET:1) This
starts the procedure with a small penalty function,
When this form of the keyword is used, the output will be larger, and intermediate files generated; these files can then be used in locating the transition state manually. The number of constraints used can be decreased to zero or increased up to 20. Two main options for the size of the active site are provided. SET:1 consists of only those atoms involved in bond making or bond breaking and SET:2 which consists of SET:1 plus nearest neighbors.
The two data sets used by LOCATE-TS are passed directly, i.e., without modification, to the transition state refinement operation. If the two geometries are near to the transition state, there is an increased probability that the process for recognizing bond-breaking and bond making operation will not work correctly. To avoid problems with this operation, use the next option, vis LOCATE-TS(SET:1).
This option uses only one data-set, usually the one generated by an earlier run, e.g., <name>_30p0_average.mop. Carefully define the atoms involved in bond-breaking and bond-making by setting their optimization flags set to "1", all other optimization flags being set to zero. When this option is used the option for "SET:2" is meaningless.
LOCATE-TS was developed and optimized for use with enzymes. It can be used for other species, but the probability of success is lower.
Three different gradient optimization methods can be used: Baker's Eigenfollowing method TS, the McIver-Komornicki gradient minimization method SIGMA, and Bartel's NLLSQ. If one of these keywords is present, that method will be used. Otherwise, Baker's Eigenfollowing method will be used by default.
LOCATE-TS was used to locate transition states in the chymotrypsin mechanism. To reproduce this mechanism, download the ARC files for the intermediates (Step 1 to Step 6), and use the LOCATE-TS keyword in the transition state stationary point arc files.