Brandon CJ, Martin BP, McGee KJ, Stewart JJP, Braun-Sand SB (2015) An approach to creating a more realistic working model from a protein data bank entry. J Mol Model 21(1):11
An accurate model of three-dimensional protein structure is important in a variety of fields, such as structure-based drug design and mechanistic studies of enzymatic reactions. While the entries in the Protein Data Bank (http://www.pdb.org) provide valuable information about protein structures, a small fraction of PDB structures were found to contain anomalies not reported in the PDB file. The semiempirical PM7 method in MOPAC2012 was used to identify anomalously short hydrogen bonds, C–H…O/C–H…N interactions, non-bonding close contacts, and unrealistic covalent bond lengths in recently published Protein Data Bank files. It was also used to generate new structures with these faults removed. When the semiempirical models were compared with those of PDB_REDO (http://www.cmbi.ru.nl/pdb_redo/), the clashscores, as defined by MolProbity (http://molprobity.biochem.duke.edu/), were better in about 50% of the structures. In nearly all cases, the semiempirical models also had a lower root-mean-square deviation (RMSD) than those from PDB_REDO, indicating better conservation of the tertiary structure. Finally, the semiempirical models had lower clashscores than the initial PDB file in all but one case. Because this approach preserves as much of the original tertiary structure as possible while correcting anomalous interactions, it should be useful to theoreticians, experimentalists, and crystallographers investigating the structure and function of proteins.
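The two comparison metrics named in the abstract above can be sketched as follows. This is an illustrative sketch, not the authors' code and not MolProbity's implementation: `rmsd` assumes both coordinate sets list the same atoms in the same order and skips superposition (on the assumption that an optimized model started from the PDB coordinates stays in the same frame), and `simple_clashscore` is a simplified stand-in for MolProbity's clashscore (serious overlaps of at least 0.4 Å per 1000 atoms) that uses plain van der Waals spheres. The radii and the `bonded` pair set are hypothetical inputs the caller must supply, and, unlike MolProbity, hydrogen-bonded pairs are not excluded.

```python
import math
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD (Å) between two coordinate sets with identical atom order.

    No superposition is performed (assumption: both structures share a frame).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def clash_count(coords: Sequence[Coord], radii: Sequence[float],
                bonded: Set[Tuple[int, int]], overlap_cutoff: float = 0.4) -> int:
    """Count non-bonded atom pairs whose van der Waals spheres overlap by at
    least overlap_cutoff Å. `bonded` holds (i, j) index pairs with i < j."""
    clashes = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) in bonded:
                continue  # covalently bonded atoms are expected to overlap
            overlap = radii[i] + radii[j] - math.dist(coords[i], coords[j])
            if overlap >= overlap_cutoff:
                clashes += 1
    return clashes


def simple_clashscore(coords: Sequence[Coord], radii: Sequence[float],
                      bonded: Set[Tuple[int, int]],
                      overlap_cutoff: float = 0.4) -> float:
    """Clashes per 1000 atoms (simplified analogue of MolProbity's clashscore)."""
    return 1000.0 * clash_count(coords, radii, bonded, overlap_cutoff) / len(coords)
```

A lower clashscore means fewer unrealistic non-bonded contacts, while a lower RMSD against the deposited structure means the refinement moved the atoms less, which is how the abstract uses the two numbers together.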
Martin BP, Brandon CJ, Stewart JJP, Braun-Sand SB (2015) Accuracy issues involved in modeling in vivo protein structures using PM7. Proteins: Structure, Function, and Bioinformatics 83(8):1427–1435
Using the semiempirical method PM7, an attempt has been made to quantify the error in predicting the in vivo structure of proteins relative to X-ray structures. Three important contributory factors are the experimental limitations of X-ray structures, the difference between the crystal and solution environments, and errors in PM7 itself. The geometries of 19 high-accuracy Protein Data Bank structures (those with small R values) were optimized, and the resulting drop in heat of formation was calculated. Analysis of the changes showed that about 10% of this decrease in heat of formation was caused by faults in PM7, the balance being attributable to the X-ray structure and to the difference between the crystal and solution environments. A previously unknown fault in PM7 was revealed during tests to validate the PM7-generated geometries. Clashscores generated by the MolProbity molecular mechanics structure-validation program showed that PM7 predicted unrealistically close contacts between nonbonding atoms in regions where the local geometry is dominated by very weak noncovalent interactions. The origin of this fault was traced to an underestimation of the core-core repulsion between atoms at distances smaller than the equilibrium distance.
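The error budget described above can be written schematically as below. The symbols are introduced here for illustration only: $\Delta H_{\text{X-ray}}$ stands for the part of the drop due to experimental limitations of the X-ray structure, $\Delta H_{\text{env}}$ for the crystal-versus-solution difference, and $\Delta H_{\text{PM7}}$ for faults in PM7. Treating the three contributions as additive is an assumption of this sketch; the abstract itself states only the roughly 10% share attributed to PM7.

```latex
\begin{aligned}
\Delta H_f &= H_f(\text{PM7-optimized}) - H_f(\text{X-ray}) \\
\Delta H_f &\approx \Delta H_{\text{X-ray}} + \Delta H_{\text{env}} + \Delta H_{\text{PM7}} \\
\Delta H_{\text{PM7}} &\approx 0.1\,\Delta H_f,
\qquad
\Delta H_{\text{X-ray}} + \Delta H_{\text{env}} \approx 0.9\,\Delta H_f
\end{aligned}
```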
Modern computational chemistry methods provide a powerful tool for refining the geometry of proteins determined by X-ray crystallography. Specifically, they can be used to place the hydrogen atoms that this experimental method does not resolve and to improve the accuracy of bond geometries. Using the semiempirical method PM7, the structure of the nucleotide-sanitizing enzyme MTH1, complete with the hydrolyzed substrate 8-oxo-dGMP, was optimized, and the resulting geometry was compared with the original X-ray structure of MTH1. After the hydrogen atoms had been placed and the ionized sites identified, the charge distribution in the binding site was explored. Where comparison was possible, all the theoretical predictions were in good agreement with experimental observations. However, when these were combined with additional predictions for which no experimental observations were available, the result was a new, alternative description of the interaction between the substrate and the binding site. The strengths and weaknesses of PM7 for modeling proteins were estimated at scales ranging from overall structure to individual interatomic distances. An attempt to correct a known fault in PM7, the underestimation of steric repulsion, is also described. This work sheds light on the specificity of MTH1 toward the substrate 8-oxo-dGTP, information that could facilitate drug development involving MTH1.
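Identifying ionized sites is one step in the workflow described above. The abstract does not say how this was done beyond the use of PM7, so the sketch below swaps in a much simpler, commonly used first guess: the Henderson–Hasselbalch fraction of the charged form at a given pH. The `MODEL_PKA` table holds approximate textbook side-chain values and is an assumption of this sketch; in a real protein, the local environment can shift these pKa values substantially, which is exactly why a quantum-chemical treatment is attractive.

```python
# Approximate textbook side-chain pKa values (assumption; environment-dependent).
MODEL_PKA = {"ASP": 3.9, "GLU": 4.2, "HIS": 6.0, "CYS": 8.3,
             "TYR": 10.1, "LYS": 10.5, "ARG": 12.5}
# Side chains that become negatively charged when they lose a proton.
ACIDIC = {"ASP", "GLU", "CYS", "TYR"}


def fraction_charged(residue: str, ph: float) -> float:
    """Henderson-Hasselbalch fraction of the charged form of a side chain."""
    res = residue.upper()
    pka = MODEL_PKA[res]
    if res in ACIDIC:
        # HA <-> A-: fraction deprotonated (negative)
        return 1.0 / (1.0 + 10 ** (pka - ph))
    # BH+ <-> B: fraction protonated (positive)
    return 1.0 / (1.0 + 10 ** (ph - pka))


def likely_ionized(residue: str, ph: float = 7.0, threshold: float = 0.5) -> bool:
    """Flag a side chain as ionized if the charged form dominates."""
    return fraction_charged(residue, ph) > threshold
```

At pH 7 this heuristic marks Asp, Glu, Lys, and Arg as ionized and His, Cys, and Tyr as neutral; a calculation on the actual binding site may well disagree for individual residues.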
Stewart JJP (2016) J Mol Model 22:259
A new method for predicting the energy contributions to substrate binding and to specificity has been developed. Conventional global optimization methods do not allow the subtle effects responsible for these properties to be modeled precisely enough for the results to be trusted, but by making simple alterations to the model, the precision of the various energies involved can be improved from about ±2 kcal·mol⁻¹ to ±0.1 kcal·mol⁻¹. This technique was applied to the oxidized nucleotide pyrophosphohydrolase enzyme MTH1. MTH1 is unusual in that its binding and reaction sites are well separated, an advantage from a computational chemistry perspective because it allows the energetics of docking to be modeled without considering reaction mechanisms. In this study, two types of energy terms were investigated: the non-covalent interactions between the binding site and the substrate, and the terms responsible for discriminating between the oxidized nucleotide 8-oxo-dGTP and the normal nucleotide dGTP. Both were investigated using the semiempirical method PM7 in the program MOPAC. Individual contributions from each residue to both the binding energy and the specificity of MTH1 were calculated by simulating the effect of mutations. Where comparisons were possible, all calculated results agreed with experimental observations. This technique provides new insight into the mechanism that enzymes use to discriminate between possible substrates.
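The per-residue bookkeeping implied above (binding energies, simulated mutations, and specificity between 8-oxo-dGTP and dGTP) can be sketched with a few differences of energies. The function names and sign conventions are assumptions of this sketch, not the paper's definitions: binding energy is taken as E(complex) − E(enzyme) − E(substrate), so more negative means tighter binding, and a residue's contribution is the loss of binding energy when that residue is "mutated". The residue labels and energies in the usage example are hypothetical toy values, not results.

```python
from typing import Dict


def binding_energy(e_complex: float, e_enzyme: float, e_substrate: float) -> float:
    """Binding energy (kcal/mol); negative means favorable binding."""
    return e_complex - e_enzyme - e_substrate


def residue_contributions(wild_type_bind: float,
                          mutant_bind: Dict[str, float]) -> Dict[str, float]:
    """Contribution of each residue: binding energy of the mutant minus that of
    the wild type. A positive value means the mutation weakened binding, i.e.
    the residue stabilized the complex by that amount."""
    return {res: e - wild_type_bind for res, e in mutant_bind.items()}


def specificity_contributions(contrib_oxo: Dict[str, float],
                              contrib_normal: Dict[str, float]) -> Dict[str, float]:
    """Per-residue preference for 8-oxo-dGTP over dGTP (positive = favors 8-oxo)."""
    return {res: contrib_oxo[res] - contrib_normal[res]
            for res in contrib_oxo if res in contrib_normal}


if __name__ == "__main__":
    # Hypothetical toy energies in kcal/mol, for illustration only.
    wt = binding_energy(-100.0, -60.0, -30.0)  # -10.0
    oxo = residue_contributions(wt, {"RES_A": -5.0, "RES_B": -8.0})
    normal = residue_contributions(wt, {"RES_A": -7.0, "RES_B": -8.0})
    print(specificity_contributions(oxo, normal))
```

Because each contribution is a difference between two very similar calculations, the precision of these energies, rather than their absolute accuracy, governs whether the per-residue values can be trusted, which is the point the abstract makes about improving precision to about ±0.1 kcal·mol⁻¹.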
Stewart JJP (2017) J Mol Model 23:154. doi:10.1007/s00894-017-3326-8
The complete catalytic cycle of the serine protease α-chymotrypsin was investigated in an attempt to determine whether the semiempirical method PM7 in the program MOPAC is suitable for investigating enzyme-catalyzed reactions. All six classical intermediates were modeled using standard methods and were characterized as stable minima on the potential energy surface. Using a modified saddle-point optimization method, five transition states were located and verified by both vibrational and intrinsic reaction coordinate analysis. Some individual features, such as the hydrogen bonds in the oxyanion hole, the nature of various electrostatic interactions, and the role of Met192, were also examined. This involved designing and running computational experiments that modeled mutations, allowing features of interest to be isolated.
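The vibrational check mentioned above rests on a standard rule: a minimum has no imaginary frequencies and a first-order saddle point (transition state) has exactly one. The sketch below applies that rule to a list of wavenumbers, assuming imaginary modes are reported as negative numbers (a common convention in quantum-chemistry output). The `noise` threshold that ignores tiny spurious negative modes is an assumption of this sketch, not a value from the paper, and the IRC check (confirming that the transition state connects the intended minima) is not modeled here.

```python
from typing import Sequence


def count_imaginary(frequencies: Sequence[float], noise: float = 20.0) -> int:
    """Count imaginary modes, given as negative wavenumbers (cm^-1).
    Negative values smaller in magnitude than `noise` are treated as numerical noise."""
    return sum(1 for f in frequencies if f < -noise)


def classify_stationary_point(frequencies: Sequence[float], noise: float = 20.0) -> str:
    """Classify a stationary point by its number of imaginary modes."""
    n_imag = count_imaginary(frequencies, noise)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "transition state"
    return f"higher-order saddle point ({n_imag} imaginary modes)"
```

For example, an intermediate whose lowest mode is +35 cm⁻¹ classifies as a minimum, while a structure with a single large negative mode, as expected for a proton transfer, classifies as a transition state.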
Three features within the enzyme were examined in detail: the reaction site itself, where covalent bonds are made and broken; the electrostatic effects of the buried aspartate anion, a passive but essential component of the catalytic triad; and the oxyanion hole, where hydrogen bonds help stabilize charged intermediates.
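Whether the oxyanion-hole interactions described above count as hydrogen bonds is usually judged geometrically. The sketch below applies one common pair of criteria, an H···A distance of at most 2.5 Å and a D–H···A angle of at least 120°; these thresholds are assumptions of this sketch (published cutoffs vary), and the abstract does not state which criteria the study used.

```python
import math
from typing import Sequence


def angle_deg(a: Sequence[float], b: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hydrogen_bond(donor: Sequence[float], hydrogen: Sequence[float],
                     acceptor: Sequence[float], max_h_acc: float = 2.5,
                     min_angle: float = 120.0) -> bool:
    """Geometric hydrogen-bond test for a donor D, its hydrogen H, and an acceptor A."""
    return (math.dist(hydrogen, acceptor) <= max_h_acc
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)
```

Applied to each backbone N–H that points into the oxyanion hole, such a test gives a quick yes/no check on each intermediate, although the strength of the interaction still has to come from the energy calculation.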
With one minor exception, all phenomena investigated agreed with previously reported descriptions. This result, together with the fact that all the techniques used were relatively straightforward, leads to the recommendation that PM7 and similar methods, such as PM6-D3H4, are appropriate for modeling similar enzyme-catalyzed reactions.