MOZYME

MOZYME

Main MOZYME function

Conventional semiempirical methods use matrix algebra methods, most of which scale as the third power of the number of atoms. A consequence of this is that modeling of systems of more than 1,000 atoms is impractical.

By using Localized Molecular Orbitals (LMO), the Self-Consistent Field equations can be solved in a time proportional to the size of the system. Increasing the speed of the SCF moved the slow step to other parts of the calculation, so changes were made to all the remaining time-consuming steps to make them more efficient. Changes were also made that reduced the memory demand. The result of all these changes is that routine modeling operations can now be carried out on over 90% of all the entries in the Protein Data Bank (PDB), previously only about 10% of all entries could be used.

Geometric functions

Most of the common modeling functions have been enhanced by the MOZYME function, these include:

Single-point calculations.

A general impression of the change in behavior can be obtained by comparing the resources required for various single-point, i.e., single SCF, calculations. Other operations, such as geometry optimization, consist of repeated SCF calculations. All calculations shown here were done using a Windows XP Pro computer with a 4000+ or 2.41 GHz Athlon processor.

The test systems used here were constructed from a single large protein, cut into pieces of the appropriate size.

Comparison of MOPAC and MOZYME computer resources required for a 1SCF calculation

No. of atoms	Time for 1SCF (minutes)		Memory (megabytes)		Ratio MOPAC/MOZYME
No. of atoms	MOZYME	MOPAC	MOZYME	MOPAC	time	memory
400	0.2	2.3	17	101	12	6
800	0.4	25.4	33	391	64	12
1000	0.9	59.0	43	607	65	14
1500	2.3	222.6	78	1,424	97	18
2000	2.9	(527)¹	102	(2,532)	(182)	(25)
3000	5.6	(1,781)	164	(5,696)	(318)	(35)
5000	15.8	(8,244)	367	(15,822)	(522)	(43)
10000	56.9	(65,956)	1,007	(63,288)	(1,159)	(63)
15000	230.3	(222,600)	1,026	(142,400)	(966.4)	(139)
18000	308.4	(384,653)	1,262	(205,056)	(1,247)	(162)

1: Numbers in parentheses are estimated

After the first SCF calculation is done, all subsequent SCF calculations run much faster:

Geometry optimization:

Transition state location.

Transition state refinement.

Transition state characterization.

FORCE

Intrinsic Reaction Coordinate.

Reaction paths, grids, etc.

Accuracy

Properties of proteins (structure, energetics, reactions, etc.) are reproduced with good accuracy using PM7. For details of PM7, see: http://link.springer.com/article/10.1007%2Fs00894-012-1667-x

For accuracy of PM7 compared to other methods, see:

http://openmopac.net/manual/index_accuracy.html

For accuracy of modeling protein properties, see: HTTP<add URL here>

Limitations

The current MOZYME function is restricted to closed-shell systems for which a Lewis structure can be generated. Only Restricted Hartree Fock methods can be used, ROHF and UHF are not available. This means that well-known radicals, such as ethyl, C₂H₅·, and triphenylmethyl, (C₆H₅)₃C·, cannot be modeled. An exception to this rule concerns biomolecules that contain transition metals. These are normally treated as open-shell systems, but if the main focus of interest is on geometries and energetics, they can be modeled using closed shell methods.

Electronic properties:

Structures:

Upper size limit for geometry optimization:

Upper size limit for single point calculation (usually for generating forces):

Upper size limit for transition state operations, the IRC and the DRC:

Utilities

Detection of errors in the starting geometry. Of their nature, proteins are large molecules, and making sure that every atom has the correct valency is tedious. Every program for adding hydrogen atoms that was tested has faults, and even one fault can ruin a simulation. All faults resulting in incorrect valences can be identified by keyword CHARGES. This prints out the atom numbers and types of all charged atoms. These can then be quickly checked using a GUI, and any corrective action then taken.

Identification of residues.

RESIDUES

XENO

CVB

Identification of ionized residues.

Ramachandran angles.

RAMA

Atom and residue gradients.

Restart option.

Read PDB files. Protein data bank files are automatically recognized and read in. Most of the PDB format is supported.

Write PDB files. When PDBOUT or RESEQ is used, the geometry is printed out in PDB format.

Re-define chains and residue numbers

Resequence. When hydrogen atoms are added, they are usually put at the end of the data set. This results in a non-standard PDB format. The standard PDB format can be generated using RESEQ. This puts all atoms in the same residue together. If two geometries, from entirely different PDB files are compared, the atoms in each file must first be put into the same sequence, so that there is an exact one-to-one correspondence. Unavoidable ambiguities in the PDB format definition require some extra re-arranging of atoms to ensure maximum coincidence. This is done using keyword GEO_REF.

Getting started using MOZYME

Because the issues involved in manipulating proteins are very different from those involving small molecules, all users are strongly recommended to spend some time familiarizing themselves with the MOZYME utilities.

General guidelines

The initial geometry should be uncharged. In many proteins there are charged sites such as carboxylate anions and ammonium cationic sites; often these form internal salt bridges. Nevertheless, in order to avoid any ambiguity about the electronic state of the system, all charges should be neutralized during the preparation of a starting geometry by adding or removing hydrogen atoms

. Even systems, such as bacteriorhodopsin, that have a definite net charge, should be first represented as the neutral species. The only exceptions are when a strong monovalent ion such as potassium or tetra-alkylammonium cation, or a halide anion is present, in which case the ion should be correctly represented.

Preconceived ideas of charge are often wildly inaccurate. These include generalizations such as "all acid functions should be ionized" and "all arginine residues should be protonated." If these ideas were implemented, the resulting net charge on a protein would likely be nonsense.

This policy of requiring the initial structure to be uncharged might appear to be restrictive, but during a geometry optimization salt bridges will often form spontaneously, so the initial uncharged species is quickly replaced by a more realistic system. After the geometry is optimized, any ions the user considers important can then be added. If the entire protein has to have a certain net charge, protons can be added or removed as necessary. Thus bacteriorhodopsin would have the retinal Schiff base nitrogen atom protonated to give bR⁺.

The end result is a well-defined protein with the correct charged sites.

Check hetero groups. Most utilities that add hydrogen atoms do a good job of handling the common residues, but a poorer job with hetero systems, such as the heme group. Because of this, almost all the errors in adding hydrogen atoms occur with hetero groups. Therefore, these groups should be checked carefully to verify that the degree of saturation is correct.

Identify and characterize uncommon residues. If a residue is not recognized by MOPAC, check that it is correct. If it should be one of the 20 common residues, either add the missing atoms, or use XENO to allow the residue to be recognized. If it is a modified residue, such as the retinal group in bR, use XENO to re-define it as the parent residue Do not ignore unrecognized residues they usually indicate a fault in the data set.

Trust that the charged sites printed are correct. The method for calculating charged sites is very robust. If it says that a site is charged, believe it. In older versions of MOPAC, charged sites were not printed, and about twenty users reported errors in MOPAC, saying that the net charge it wanted was incorrect. In every case, errors were found in the users data set, due to simple errors in assigning hydrogen atoms.
Although the charges reported by MOPAC are always correct, minor problems might be encountered in extended, delocalized,

p systems. Thus if the hydrogen on the phenol oxygen in tyrosine is missing, the charge might be placed on the Cg rather than on the oxygen, that is, the Lewis structure MOPAC generated was for the quinone rather than the phenolic resonance structure.