MOZYME

 

Main MOZYME function

Conventional semiempirical methods rely on matrix algebra operations, most of which scale as the third power of the number of atoms. As a result, modeling systems of more than 1,000 atoms is impractical.

By using Localized Molecular Orbitals (LMOs), the Self-Consistent Field (SCF) equations can be solved in a time proportional to the size of the system. Speeding up the SCF moved the slow step to other parts of the calculation, so all the remaining time-consuming steps were also made more efficient. Further changes reduced the memory demand. As a result, routine modeling operations can now be carried out on over 90% of all the entries in the Protein Data Bank (PDB); previously, only about 10% of all entries could be used.

Geometric functions

Most of the common modeling functions have been enhanced by the MOZYME function.

A general impression of the change in behavior can be obtained by comparing the resources required for various single-point (i.e., single SCF) calculations. Other operations, such as geometry optimization, consist of repeated SCF calculations. All calculations shown here were run on a Windows XP Pro computer with a 2.41 GHz Athlon 4000+ processor.

The test systems used here were constructed from a single large protein, cut into pieces of the appropriate size.

Comparison of MOPAC and MOZYME computer resources required for a 1SCF calculation

No. of  Time for 1SCF (minutes)      Memory (megabytes)      Ratio MOPAC/MOZYME
atoms        MOZYME       MOPAC      MOZYME       MOPAC        time      memory
   400          0.2         2.3          17         101          12           6
   800          0.4        25.4          33         391          64          12
  1000          0.9        59.0          43         607          65          14
  1500          2.3       222.6          78       1,424          97          18
  2000          2.9      (527)¹         102     (2,532)       (182)        (25)
  3000          5.6     (1,781)         164     (5,696)       (318)        (35)
  5000         15.8     (8,244)         367    (15,822)       (522)        (43)
 10000         56.9    (65,956)       1,007    (63,288)     (1,159)        (63)
 15000        230.3   (222,600)       1,026   (142,400)     (966.4)       (139)
 18000        308.4   (384,653)       1,262   (205,056)     (1,247)       (162)

1: Numbers in parentheses are estimated.
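The ratio columns and the overall scaling behavior can be checked directly from the measured rows of the table (those without parentheses). The following Python sketch is illustrative only and is not part of MOPAC; the names `MEASURED`, `ratios`, and `scaling_exponent` are hypothetical. It recomputes the MOPAC/MOZYME ratios and fits a log-log slope to the 1SCF times as a rough empirical scaling exponent. Because only four sizes are used and the timings are quoted to one decimal place, the fitted exponents are indicative at best.

```python
import math

# (atoms, MOZYME minutes, MOPAC minutes, MOZYME MB, MOPAC MB).
# Only the measured rows of the table are used, not the estimates.
MEASURED = [
    (400, 0.2, 2.3, 17, 101),
    (800, 0.4, 25.4, 33, 391),
    (1000, 0.9, 59.0, 43, 607),
    (1500, 2.3, 222.6, 78, 1424),
]


def ratios(row):
    """Return the (time, memory) MOPAC/MOZYME ratios for one table row."""
    _, t_mozyme, t_mopac, m_mozyme, m_mopac = row
    return t_mopac / t_mozyme, m_mopac / m_mozyme


def scaling_exponent(sizes, values):
    """Least-squares slope of log(value) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


sizes = [row[0] for row in MEASURED]
mozyme_exp = scaling_exponent(sizes, [row[1] for row in MEASURED])
mopac_exp = scaling_exponent(sizes, [row[2] for row in MEASURED])

for row in MEASURED:
    t_ratio, m_ratio = ratios(row)
    print(f"{row[0]:>5} atoms: time ratio {t_ratio:6.1f}, memory ratio {m_ratio:5.1f}")
print(f"fitted exponent, MOZYME: {mozyme_exp:.2f}; MOPAC: {mopac_exp:.2f}")
```

The fit covers the whole 1SCF job rather than the SCF step alone, so it should not be read as a direct test of the linear-scaling claim for the SCF equations.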

After the first SCF calculation is done, all subsequent SCF calculations run much faster.

Accuracy

Properties of proteins (structure, energetics, reactions, etc.) are reproduced with good accuracy using PM6. For details of PM6, see: http://www.springerlink.com/content/ar33482301010477/fulltext.pdf

For accuracy of PM6 compared to other methods, see: http://openmopac.net/manual/index_accuracy.html

For accuracy of modeling protein properties, see: HTTP<add URL here>

 

Limitations

The current MOZYME function is restricted to closed-shell systems for which a Lewis structure can be generated. Only Restricted Hartree-Fock (RHF) methods can be used; ROHF and UHF are not available. This means that well-known radicals, such as ethyl, C2H5·, and triphenylmethyl, (C6H5)3C·, cannot be modeled. An exception to this rule concerns biomolecules that contain transition metals. These are normally treated as open-shell systems, but if the main focus of interest is on geometries and energetics, they can be modeled using closed-shell methods.
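A quick way to see why the two radicals above are excluded is to count electrons: a closed-shell system must have an even number of electrons. The Python sketch below is illustrative only and is not how MOPAC decides whether a system can be run; the names `ATOMIC_NUMBER`, `electron_count`, and `may_be_closed_shell` are hypothetical. An even count is a necessary condition, not a sufficient one, and MOZYME's requirement that a Lewis structure can be generated is stricter.

```python
import re

# Atomic numbers for the few elements used here; extend as needed.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}


def electron_count(formula, charge=0):
    """Total electrons for a simple formula such as 'C2H5' (no brackets)."""
    total = 0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_NUMBER[symbol] * (int(count) if count else 1)
    return total - charge


def may_be_closed_shell(formula, charge=0):
    """True if the electron count is even (necessary, not sufficient)."""
    return electron_count(formula, charge) % 2 == 0


print(may_be_closed_shell("C2H5"))    # ethyl radical
print(may_be_closed_shell("C19H15"))  # triphenylmethyl, (C6H5)3C
print(may_be_closed_shell("C2H6"))    # ethane
```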

Utilities

Getting started using MOZYME

Because MOZYME is a new function in MOPAC, and because the issues involved in manipulating proteins are very different from those involving small molecules, all users are strongly advised to spend some time familiarizing themselves with the new functions.

 

General guidelines

Starting Exercises

To help with this, the following exercises are suggested:

  1. Verify that MOZYME works: This is a simple example to show what a MOZYME output file looks like.  The system is Cys-Phe-Glu, but in the data set only the Cartesian coordinates are given.  By adding RESIDUES, the various amino-acid residues in the tripeptide are identified. Information on the residues is printed shortly after the empirical formula is written in the output file. Further information on the residues is also printed in the output.  This includes the total force acting on each residue (a measure of how far the residue is from being optimized) and the net charge on each residue.  Most of these charges will be small, the exception being ionized residues, where the charge will be close to +1 or -1.
  2. Generate a PDB file: MOPAC can convert a data set into PDB format.  The simplest way to do this is to add PDBOUT and 0SCF to the keyword list.  0SCF prevents any semiempirical calculations from being run, so this operation is very fast, even for large systems. If information on the secondary structure (helices and beta sheets, etc.) is available, add that information to the comments section of the data set (lines at the start of the data set that begin with an asterisk, "*").
  3. Read in a PDB file: MOPAC should automatically recognize and read in PDB files.  All the text in the PDB file up to the first atom (a line starting with the word "ATOM") should be deleted, and the standard three lines of a MOPAC input data set put there instead.  If the PDB header text is needed, for example if PDBOUT is present, then convert the header lines into MOPAC comments by adding an asterisk ("*") to the start of each line.
  4. Re-sequence a data set to conform to the PDB format:  If atoms are added or removed, so that the sequence of atoms in the data set does not follow that of the PDB format, the atoms can be re-sequenced, using keyword RESEQ.  This example is artificial, in that all atoms of each type are together.  When RESEQ is used, the job is stopped after resequencing.
  5. Recognize an uncommon residue: In this example, a modified lysine group is correctly recognized by specifying the modification using XENO.
  6. Identify a fault in a data set: A data set will not run because the charge specified (zero) is incorrect.  By using CHARGES, the ionized sites can be identified.  By analyzing these sites, any faults in the data set can be identified, and corrective action taken.  In this case, the only fault was an incorrect charge.
  7. Optimize a polypeptide: In general, optimizing geometries is best done in two stages.  First, optimize the positions of all hydrogen atoms, then optimize the positions of all atoms.  In this example, all atoms are initially marked for optimization, but by switching off all optimization flags (NOOPT) then turning on the flags for hydrogen (OPT-H), only the hydrogen atoms are optimized. Once that's done, all atoms can then be optimized by switching on all the optimization flags (OPT).
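The edit described in exercise 3 is mechanical and can be scripted. The following Python sketch is illustrative only and is not part of MOPAC; the function name `pdb_to_mopac` and its parameters are hypothetical. It assumes that the "standard three lines" are the keyword line followed by two free-text lines, and that commented header lines go at the very start of the data set, as the description of the comments section in exercise 2 suggests.

```python
def pdb_to_mopac(pdb_text, keywords, title="", note="", keep_header=False):
    """Turn PDB text into a MOPAC-style data set (illustrative helper)."""
    lines = pdb_text.splitlines()

    # Find the first atom record: a line starting with the word "ATOM".
    first_atom = next(
        (i for i, line in enumerate(lines) if line.startswith("ATOM")), None
    )
    if first_atom is None:
        raise ValueError("no line starting with 'ATOM' was found")

    header, body = lines[:first_atom], lines[first_atom:]

    # Header text is either dropped or kept as comments marked with "*".
    comments = ["*" + line for line in header] if keep_header else []

    # Assumed layout: comments, keyword line, two free-text lines, atoms.
    return "\n".join(comments + [keywords, title, note] + body) + "\n"


# Made-up PDB fragment, for illustration only.
example_pdb = (
    "HEADER    EXAMPLE\n"
    "REMARK    made-up fragment\n"
    "ATOM      1  N   CYS A   1       0.000   0.000   0.000\n"
    "END\n"
)
print(pdb_to_mopac(example_pdb, "PDBOUT 0SCF", "Exercise 2", keep_header=True))
```

The same helper could be used for the two-stage optimization in exercise 7 by calling it once with a keyword line containing NOOPT and OPT-H, and again with OPT in place of OPT-H for the second run.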