Use of MKL and Multi-Threading to reduce computation time

This description is based on materials provided by Prof Gerd B. Rocha, Universidade Federal da Paraíba, João Pessoa, Brazil.

The modifications in the semiempirical quantum chemistry MOPAC2016 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 4000 atoms) molecular systems using multithreaded shared-memory CPUs are now presented.

Computational aspects

The modifications consisted of using a combination of highly optimized linear algebra libraries (LAPACK and BLAS from Intel MKL (Math Kernel Library)) to reduce the computation time for parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling.

The SCF procedure in MOPAC is driven by the ITER subroutine which is called exclusively by the COMPFG subroutine. The most time-consuming steps in conventional semi-empirical calculations are the Fock matrix diagonalization and density matrix assembly, since these are O(N³) procedures, N being the number of basis functions. The rest of the procedures carried out in the SCF have complexity, such as the assembly of Fock Matrix. So, most of our efforts were to replace these subroutine calls for more-optimized similar versions from MKL.

The pseudodiagonalization in MOPAC is carried out in subroutine DIAG, and its algorithm starts by forming the occupied-virtual block of the Fock matrix, Fo−v, in the molecular orbital basis of the previous iteration:

Fo-v = CoTFCv

where Co is the occupied orbitals sub-matrix and Cv is the same for the virtual ones. Two matrix multiplications are necessary for the assembly of Fo−v in this first part.

Once Fo−v is built, the second part of DIAG performs a series of 2x2 vector rotations (unitary transformations) which approximately eliminates the elements of Fo−v. Let i be indexes for the occupied orbitals and
α be indexes for the virtual ones. The expressions to calculate the new eigenvectors (Cinew and Canew) from the old ones (Ci and Cα) by applying successive rotations of 2x2 vectors, and these are given by the following formulae:

Cinew = cCi - sCa    and    Canew = sCi + cCa

where

s = Fia/(

εi - εa )  and c = (1 - s2)½

In these equations,
εi and εα are the molecular orbital energies for the last conventional SCF iteration (using a full diagonalization procedure). This approach significantly reduces the SCF procedure, making it faster.

To hasten it, we have replaced both matrix multiplications for more-optimized MKL procedures (DGEMM), as well as the Jacobi rotations step. Also, we have replaced the full diagonalization procedure subroutine (using subroutines from EISPACK 3 library) with Intel MKL DSYEVD. At the density matrix assembly step, we have used DSYRK instead of DGEMM, since the product matrix is symmetric.

Benchmarks

Below, we present times, in seconds, for a single-point calculation run on a 2 x 2.93 GHz 6-Core Intel Xeon with 16Gb of 1333 DDR3 memory chips Mac Pro computer, using 12 of the 24 threads

Species

#atoms

Current MOPAC2016

MOPAC with MKL

MOPAC2016 with MKL and multi-threading

Naphthalene

18

0.02

0.15

0.02

Crambin 642 468 114 12

(H

2O)573

1719

3,082

584

79

1G6X

1455

8,612

1,240

142

1EZG (Thermal Hysteresis Protein) 2064 22,959 2,118 300

1RNB (Barnase)

2066

34,372

4,108

411

Bacteriorhodopsin (1BRX) 3352 141,773 11,192 1,394

 

Additional details

Multi-threading does not speed up MOZYME calculations because they use sparse matrices.

In MOPAC, proteins can be modeled using the MOZYME technique, however, that technique is limited to closed shell RHF calculations. This means that proteins with free radical sites, excited state proteins, and proteins containing iron, chromium or other transition metal atoms should not be modeled using MOZYME.

Such modifications reduce the computation time so that conventional MOPAC methods can now be used for many proteins. This means that it is now practical to use conventional MOPAC methods - RHF-CI and UHF - for modeling most of the smaller proteins, and by implication that it will be possible to model many systems that currently cannot or should not be modeled using MOZYME.

All technical details of the modifications to the MOPAC2016 code, as well as some results, benchmarks and an application in biomolecules, can be found in the publication:

Maia, J. D. C.; Urquiza Carvalho, G. A.; Mangueira, C. P.; Santana, S. R.; Cabral, L. A. F.; Rocha, G. B., "GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations" Journal of Chemical Theory and Computation 2012, 8, 3072–3081. See: http://pubs.acs.org/doi/abs/10.1021/ct3004645


 

In this publication you also can find all about the GPU-powered MOPAC that was released in January 2014. For details of the effect of using a GPU, see GPU.