MOZYME

 

Main MOZYME function

Conventional semiempirical methods rely on matrix algebra operations, most of which scale as the third power of the number of atoms. As a consequence, modeling systems of more than 1,000 atoms is impractical.

By using Localized Molecular Orbitals (LMOs), the Self-Consistent Field (SCF) equations can be solved in a time proportional to the size of the system. Speeding up the SCF moved the slow step to other parts of the calculation, so all the remaining time-consuming steps were also made more efficient, and the memory demand was reduced. As a result of these changes, routine modeling operations can now be carried out on over 90% of all the entries in the Protein Data Bank (PDB); previously, only about 10% of all entries could be used.

Geometric functions

Most of the common modeling functions have been enhanced by the MOZYME function; these include:

A general impression of the change in behavior can be obtained by comparing the resources required for various single-point (i.e., single SCF) calculations. Other operations, such as geometry optimization, consist of repeated SCF calculations. All calculations shown here were run on a Windows XP Pro computer with an Athlon 4000+ (2.41 GHz) processor.

The test systems used here were constructed from a single large protein, cut into pieces of the appropriate size.

Comparison of MOPAC and MOZYME computer resources required for a 1SCF calculation

No. of atoms  Time for 1SCF (minutes) Memory (megabytes)      Ratio MOPAC/MOZYME
              MOZYME    MOPAC         MOZYME    MOPAC         time      memory
400           0.2       2.3           17        101           12        6
800           0.4       25.4          33        391           64        12
1000          0.9       59.0          43        607           65        14
1500          2.3       222.6         78        1,424         97        18
2000          2.9       (527)¹        102       (2,532)       (182)     (25)
3000          5.6       (1,781)       164       (5,696)       (318)     (35)
5000          15.8      (8,244)       367       (15,822)      (522)     (43)
10000         56.9      (65,956)      1,007     (63,288)      (1,159)   (63)
15000         230.3     (222,600)     1,026     (142,400)     (966.4)   (139)
18000         308.4     (384,653)     1,262     (205,056)     (1,247)   (162)

1: Numbers in parentheses are estimated.
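
The scaling behavior behind these figures can be checked with a rough calculation. The following is a minimal Python sketch, not part of MOPAC, that estimates an apparent scaling exponent p (resource ∝ N^p) from the smallest and largest measured systems in the comparison (400 and 1,500 atoms). The function and variable names are illustrative assumptions, and only the measured (non-parenthesized) values are used.

```python
import math

# Measured rows only (no estimated values):
# (atoms, MOZYME minutes, MOPAC minutes, MOZYME megabytes, MOPAC megabytes)
MEASURED = [
    (400, 0.2, 2.3, 17, 101),
    (800, 0.4, 25.4, 33, 391),
    (1000, 0.9, 59.0, 43, 607),
    (1500, 2.3, 222.6, 78, 1424),
]

LABELS = ("mozyme_time", "mopac_time", "mozyme_memory", "mopac_memory")


def scaling_exponent(n1, y1, n2, y2):
    """Two-point log-log slope: the p for which y grows as n**p."""
    return math.log(y2 / y1) / math.log(n2 / n1)


def endpoint_exponents(rows):
    """Apparent exponent for each resource, using the first and last rows."""
    first, last = rows[0], rows[-1]
    return {
        label: scaling_exponent(first[0], first[i], last[0], last[i])
        for i, label in enumerate(LABELS, start=1)
    }


if __name__ == "__main__":
    for label, p in endpoint_exponents(MEASURED).items():
        print(f"{label}: roughly N^{p:.1f}")
```

A two-point estimate from rounded timings is crude, so the exponents should be read only as a qualitative indication of how steeply each resource grows with system size.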

After the first SCF calculation is done, all subsequent SCF calculations run much faster:

Accuracy

Properties of proteins (structure, energetics, reactions, etc.) are reproduced with good accuracy using PM7. For details of PM7, see: http://link.springer.com/article/10.1007%2Fs00894-012-1667-x

For accuracy of PM7 compared to other methods, see:

http://openmopac.net/manual/index_accuracy.html

For accuracy of modeling protein properties, see: HTTP<add URL here>

 

Limitations

The current MOZYME function is restricted to closed-shell systems for which a Lewis structure can be generated. Only Restricted Hartree-Fock (RHF) methods can be used; ROHF and UHF are not available. This means that well-known radicals, such as ethyl, C2H5·, and triphenylmethyl, (C6H5)3C·, cannot be modeled. An exception to this rule concerns biomolecules that contain transition metals. These are normally treated as open-shell systems, but if the main focus of interest is on geometries and energetics, they can be modeled using closed-shell methods.
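
The closed-shell restriction can be illustrated with a simple electron-count check. The following is a minimal Python sketch and is not how MOZYME itself tests a system (MOZYME works from a generated Lewis structure). It only checks the necessary condition that the number of valence electrons is even. The table of valence electrons and the function names are assumptions made for this illustration, and an even count does not by itself guarantee a closed-shell ground state.

```python
# Valence electrons for a few main-group elements common in biomolecules.
# Illustrative subset only; extend as needed.
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6, "P": 5, "S": 6}


def valence_electron_count(composition, net_charge=0):
    """Total valence electrons for a composition such as {"C": 2, "H": 5}."""
    total = sum(VALENCE_ELECTRONS[element] * count
                for element, count in composition.items())
    # A positive net charge means electrons have been removed.
    return total - net_charge


def may_be_closed_shell(composition, net_charge=0):
    """True if the electron count is even (necessary, not sufficient)."""
    return valence_electron_count(composition, net_charge) % 2 == 0


if __name__ == "__main__":
    ethyl = {"C": 2, "H": 5}              # C2H5 radical
    triphenylmethyl = {"C": 19, "H": 15}  # (C6H5)3C radical
    ethane = {"C": 2, "H": 6}
    for name, formula in [("ethyl", ethyl),
                          ("triphenylmethyl", triphenylmethyl),
                          ("ethane", ethane)]:
        print(name, may_be_closed_shell(formula))
```

Both radicals named above have an odd number of valence electrons, so they fail this check, whereas ethane passes it.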

Utilities

  • Write PDB files. When PDBOUT or RESEQ is used, the geometry is printed out in PDB format.
  • Getting started using MOZYME

    Because the issues involved in manipulating proteins are very different from those involving small molecules, all users are strongly advised to spend some time familiarizing themselves with the MOZYME utilities.

     

    General guidelines

  • The initial geometry should be uncharged. Many proteins contain charged sites such as carboxylate anions and ammonium cations, and these often form internal salt bridges. Nevertheless, to avoid any ambiguity about the electronic state of the system, all charges should be neutralized during the preparation of a starting geometry by adding or removing hydrogen atoms. Even systems that have a definite net charge, such as bacteriorhodopsin, should first be represented as the neutral species. The only exceptions are when a strong monovalent ion, such as a potassium or tetra-alkylammonium cation or a halide anion, is present, in which case the ion should be correctly represented.

    Preconceived ideas of charge are often wildly inaccurate. These include generalizations such as "all acid functions should be ionized" and "all arginine residues should be protonated." If these ideas were implemented, the resulting net charge on a protein would likely be nonsense.

    This policy of requiring the initial structure to be uncharged might appear to be restrictive, but during a geometry optimization salt bridges will often form spontaneously, so the initial uncharged species is quickly replaced by a more realistic system. After the geometry is optimized, any ions the user considers important can then be added. If the entire protein has to have a certain net charge, protons can be added or removed as necessary. Thus bacteriorhodopsin would have the retinal Schiff base nitrogen atom protonated to give bR+.

    The end result is a well-defined protein with the correct charged sites.

    π systems. Thus, if the hydrogen on the phenol oxygen in tyrosine is missing, the charge might be placed on the Cγ rather than on the oxygen; that is, the Lewis structure MOPAC generated was for the quinone rather than the phenolic resonance structure.