Content-type: text/HTML
The perspectives of quantum chemistry as of a valuable tool in chemical research are generally attributed to its ability to predict the electronic structures and properties of molecular systems with high degree of accuracy for reasonable time intervals. Nowadays, modern ab initio methods allow to obtain very accurate results for small molecules []. The price for this accuracy is a very fast growth of the required computational resources with that of the system's size (N4¸N7, where N is the dimension of the basis set). It limits the area of applicability of these methods by small and medium-sized molecules. Although the development of computer technology pushes forward our computational abilities the problem of large-scale calculations remains actual, especially taking into account growing needs of biological chemistry and nanochemistry.
There are two solutions of this problem proposed in the literature. The first one is the construction of hybrid schemes where different parts of a molecule are treated using different methods [,]: typically, a reaction center is described by some quantum chemical methods while the environment is accounted by classical force fields. The second solution is based on the construction of schemes with linear or almost linear dependence of the required computational resources on the system's size (O(N)-methods) [,]. The localization of electronic degrees of freedom due to the exponential decay of the one-electron density matrix elements in the coordinate space [] and the "principle of nearsightedness" by Kohn [] provide physical grounds for these schemes. It is important that most of the O(N)-methods are targeted to the tight-binding model and the required scalability is achieved by a loss of accuracy.
In the semiempirical domain the growth of computational costs with the system's size scales as N3 due to necessity to diagonalize the Fock matrix. A series of approaches has been proposed to avoid this step: the divide-and-conquer scheme [,], the density matrix search method [] and methods based on the localized molecular orbitals in orthogonal [] and non-orthogonal [] formulations. The possibility to devise semiempirical O(N)-methods opens an access to the development of the QM/QM hybrid schemes, where the environment region is treated by semiempirical methods of quantum chemistry []. It allows to solve the problem of the intersubsystem junction in these methods by a sequential derivation based on the perturbation expansions [].
The main purpose of semiempirical methods is to provide a reliable quantum description for large and ultra-large molecular systems. The development perspectives for the semiempirical domain are usually considered [] from the viewpoint of improving the existing parameterization schemes, adding classical terms responsible for the dispersion interactions and introducing sophisticated orthogonalization correstions [] and effective core potentials [,]. These modifications are devised within the basic one-determinant approximation. An alternative route is based on the account of electron correlation given by several techniques employing perturbation expansions [], effective Hamiltonians [], valence bonds [], and multi-reference configuration interaction []. In the present paper we pursue the goal to improve both the wave function's structure and the scalability properties in the framework of the NDDO parameterization.
In a series of papers [,,] we proposed and developed a family of variational semiempirical methods based on the trial wave function in the form of the antisymmetrized product of strictly local geminals (SLG). It takes into account the intrabond ("left-right") electron correlation while the interbond electron transfers are totally neglected due to the strictly local (atom-centered) character of one-electron states. These methods (quite surprisingly) required only a minor re-parameterization of the standard parameters' sets and provide the numerical estimates of heats of formation and eqilibrium geometries [,], as well as vertical ionization potentials [], which are superior by accuracy with respect to the corresponding SCF procedures. It was also shown that the SLG methods have a great potential in derivation of classical force fields [] and development of hybrid quantum/classical schemes [,].
When the MINDO/3 semiempirical Hamiltonian [] is used, the SLG scheme leads to almost linear scaling of computational costs with the system's size (the quadratic term becomes dominating only for very large molecular systems) []. The NDDO type of the Hamiltonian assumes more detailed account of the two-center Coulomb interactions. In combination with the SLG wave function it leads to the necessity of numerous integral transformations induced by basis set transformations. This step dominates the whole procedure and predetermines the quadratic scalability of the SLG-NDDO methods [] with substantial coefficients at the quadratic terms. In the present paper we consider long-range Coulomb interactions in the standard NDDO methods (MNDO [], AM1 [], and PM3 []) and demonstrate that the scheme previously adopted for estimates of the two-center repulsion integrals is not optimal for a local description of electronic structure. In the next Section we describe the SLG-NDDO approximation for the electronic structures of molecules and show that a modified form of molecular integrals leads to significant simplification of the whole procedure. Then, we compare different schemes for the estimates of the two-center repulsion integrals and present numerical results demonstrating the scalability properties and the quality of the calculated heats of formation for the modified SLG-NDDO procedure. Finally, we draw several conclusions on the perspectives of fast semiempirical methods with a local description of both electron correlation and one-electron states.
In this Section we consider a sequence of basic steps leading to semiempirical
NDDO methods employing a trial wave function in the form of the antisymmetrized
product of strictly local geminals []. The first step in the wave
function construction consists of orthogonal 4×4 transformations of
atomic orbitals (AOs) for each atom A with an {sp} basis set producing
hybrid orbitals (HOs) tm:
| (0) |
The HOs are assigned to geminals representing chemical bonds and lone pairs:
each chemical bond is spanned by two orbitals r and l for the right and
left ends of the bond, each lone pair is described by only one HO (r). The
m-th geminal representing a lone pair is given by one configuration:
| (0) |
| (0) |
| (0) |
| (0) |
The electronic structure of geminals can be characterized by the elements of
one- and two-electron density matrices:
| (0) |
The total energy is the sum of the electronic energy and the core-core
repulsion. The form of the last term adopted in the MNDO, AM1, and PM3 methods
is given in respective Refs. [,,]. The electronic energy can be
expressed through the density matrices and molecular integrals. The
transformation of the one-electron basis set induces the transformation of all
molecular integrals (those of resonance, core attraction, intraatomic and
interatomic Coulomb repulsion). The electronic energy is a sum of five terms
according to types of interactions in the Hamiltonian:
| (0) |
| (0) |
| (0) |
| (0) |
| (0) |
| (0) |
The total energy is a function of two classes of electronic structure parameters (transformation matrices hA and geminal amplitudes um, vm, and wm) which are all determined variationally. It is important that the total number of these parameters is proportional to the system's size (for the traditional SCF scheme the number of electronic structure parameters - MO LCAO coefficients or elements of one-electron density scales as N2) and they describe local fragments of electronic structure. It is a necessary prerequisite for constructing a linearly scaling method. The limiting factor in calculations are summations over the two-center repulsion integrals, i.e., transformations of these integrals from the basis of AOs to the basis of HOs. Here we consider a direct way to accelerate this step and to construct a linearly scaling procedure for NDDO-calculations of molecular energies.
The energy of the Coulomb interaction for two non-bonded atoms A and B
can be extracted from the total SLG energy and re-written in a simple form:
| (0) |
| (0) |
The Coulomb interaction between non-bonded atoms given by Eq. (13) can
be formally re-written in the form:
| (0) |
| (0) |
| (0) |
| (0) |
| (0) |
The Poisson equation Df = 0 is, however, not valid for semiempirical
potentials f. Therefore, in the semiempirical context it is more convenient
to introduce a second order tensor with a non-vanishing trace instead of the
usual quadrupole moment:
| (0) |
| (0) |
The potentials acting between the multipoles are not those known from
electrostatics since they are based not on the Coulomb interaction of the form
of 1/R but rather on a semiempirical interaction potential approximating the
effective values of integrals in the region of intermediate interatomic
distances. The most popular semiempirical potential adopted in the NDDO methods
is that proposed by Dewar, Sabelli, and Klopman [,]:
| (0) |
| (0) |
| (0) |
The charge-charge, {l1l2} = {00}, contribution reads:
| (0) |
The charge-dipole, {l1l2} = {01} or {10}, interaction gives two
contributions:
| (0) |
The dipole-dipole, {l1l2} = {11}, interaction can be written as a sum of
contributions proportional to the scalar product of dipole moments and the
product of their projections on the line connecting atoms A and B:
| (0) |
The charge-quadrupole, {l1l2} = {02} or {20}, interactions for
semiempirical potentials are the following:
| (0) |
| (0) |
The dipole-quadrupole, {l1l2} = {12} or {21}, interaction in general
formulation can be written as:
| (0) |
| (0) |
The quadrupole-quadrupole, {l1l2} = {22}, interaction includes
derivatives of a semiempirical potential up to the fourth order:
| (0) |
| (0) |
The formulae given above allow for relatively fast estimates of the two-center contribution to the total energy. Moreover, derivatives of the total energy with respect to molecular geometry parameters can be easily calculated and used for an efficient optimization. We implemented the SLG-NDDO method employing the true multipole scheme for the two-center repulsion integrals. In the next Section we give some numerical results illustrating its potential in semiempirical electronic structure calculations.
In the present paper we develop a semiempirical method based on the SLG wave function. Special attention is paid to the two-center repulsion integrals and interactions between non-bonded atoms. These integrals are the most important part of the NDDO scheme. It has been shown more than 50 years ago [,] that multipole expansions can adequately reproduce analytical electron repulsion integrals for a wide range of interatomic distances. These ideas are used in the standard NDDO methods where point multipoles are modeled by appropriate charge configurations. The last approximation makes the integrals non-invariant with respect to rotations of the coordinate frame because the charge configurations have non-vanishing higher multipole moments. In actual calculations this non-invariance is masked by performing them in the diatomic coordinate frame (rotations of the coordinate axes induce rotation of the diatomic coordinate frame and the integrals are calculated identically). The formulae given in the previous Section restore the invariance because they are written in a tensor form. The change of the computation scheme leads to changes in numerical estimates although both schemes have the same asymptotic behaviour at R® ¥. We analyze the consequences of these modifications for the molecular integrals and the total energy.
First, we consider the two-center repulsion integrals on the example of two
carbon atoms. Eq. (22) shows that the Dewar-Sabelli-Klopman
semiempirical potential depends on three parameters (r0, r1, and
r2). In the standard formulation [] these parameters are
determined from the condition that at the limit R® 0 the one-center
integrals describing the interaction between two monopoles (gss), two
dipoles (hsp), and two quadrupoles (hpp) must be reproduced. This
condition is of course arbitrary and, for example, the one-center limit for the
integral (ss|zz) is not reproduced by this scheme. The reason is simple: in
the sp-basis there are five independent one-center integrals - Slater-Condon
parameters F0, F2, G1 - which in general case can not be reproduced
by only three parameters r. Moreover, these formulae are designed for
interatomic distances not approaching zero. In the case of small interatomic
distances the difference between integrals evaluated using the formulae based
on the charge configurations and based on the true multipoles becomes
significant. Therefore, when we change the computation scheme the one-center
limit values also change. Taking this into account we can propose two different
recipes: the first one is based on the standard values of r's while the
second one uses modified values of r's reproducing the one-center limits
(gss, hsp, and hpp). In the latter case the value of r0
is not modified ( = 2gss-1). The parameters r1 and r2 in
the standard scheme are determined numerically. When we use the point
multipoles and their interactions we can obtain analytical expressions for
these parameters. For example:
| (0) |
Figs. 1-3 illustrate the dependence of the integrals (ss|sz), (sz|sz), and (sx|sx) (z axis connects two atoms in the diatomic coordinate frame) for the scheme based on the charge configurations and for that based on the true multipoles with the standard and modified parameters r1. These figures show that the curves differ significantly in the region of small interatomic distances. By contrast, in the case of R > 2 Å all the curves are quite close, so that only the closest neighbors are touched by the modification of the procedure. The convergence rate strongly depends on the integral type. In general, the curves with the adjusted value of r1 seem to be a better approximation to the standard ones than those based on the point multipoles with r1 unchanged. At the same time, in the case of the integral (sz|sz) the better coincidence of the curve based on modified r1 with the standard curve in the region of small R leads to a worse agreement for larger values of R.
We tested the performance of two types of procedures tentatively possible for evaluating interactions involving quadrupoles. We considered the (ss|zz) and (ss|xx) integrals. The interaction of charges gives the main contribution to these integrals. It is not affected by our modifications of the computation scheme. Therefore, we analyze only the contribution having the form of the charge-quadrupole interactions (or, equivalently, the differences (ss|zz)-(ss|ss) and (ss|xx)-(ss|ss)). These interactions constitute about 5% of the integrals at the region of characteristic interatomic distances between bonded atoms and their contribution decreases as R-2 for larger R. Figs. 4 and 5 represent the dependence of these differences on the interatomic distance calculated using the standard formulae and also formulae based on Eqs. (28) and (29). For comparison we also plot these contributions for the usual Coulomb law 1/R. As expected the expressions which do not use the assumption of validity of the Poisson equation for the potential give a better approximation to the standard values (especially for small R). The difference between two approximations based on the multipole-multipole interactions converges to zero in both cases as R-2. In the case of the charge-quadrupole interactions given by Eq. (29) the one-center limit is exactly zero ((ss|pp) and (ss|ss) are equal). It illustrates the fact that all one-center integrals cannot be reproduced as limiting cases of the corresponding two-center ones. In other cases (standard scheme and that based on [^(S)]) the one-center limit for the difference between these integrals (which is equal to -0.76 eV for the one-center integrals used in the MNDO scheme) is significantly overestimated.
It is important to test how the proposed modification of the computation scheme affects the performance of the SLG-NDDO approach. The first step in this direction is to analyze the computation time as a function of the scheme and the system's size. As test objects we consider regular hydrocarbon chains CnH2n+2. Figs. 1-5 demonstrate that at larger R the change of the scheme can only slightly affect the values of the integrals. Therefore, if the scheme based on the atomic multipoles is applied only to well separated pairs of atoms, the modified SLG-NDDO method can be used even with the parameters given in Ref. [], i.e. without any additional re-parameterization. In subsequent calculations we adopt the atomic multipole scheme for R > 3 Å , while the short-range repulsion integrals are calculated using the charge configurations scheme. To further specify the computation scheme we use the MNDO semiempirical Hamiltonian with the standard values of the r parameters and the formulae based on the [^(S)] tensor for the interactions involving quadrupoles.
Fig. 6 demonstrates the ratio of computation times for two SLG-MNDO schemes (based on the charge configurations and multipoles, respectively) as a function of the number of carbon atoms n in CnH2n+2. It can be seen that the dependence has two regions: it is almost linear for smaller n's and it is close to a constant for larger n. It can be readily understood taking into account that in the case of the charge configurations scheme a qudratic contribution to the dependence of computation costs on the system's size dominates over linear contribution even for small molecules. At the same time the scheme proposed above significantly reduces namely the quadratic component of this dependence and for relatively small molecules a linear contribution dominates. It is important that in the limit of large n, where a quadratic contribution is again dominant, the true multipole scheme leads to more than 30-ple acceleration of SLG-NDDO calculations.
It is clear that simple replacement of the charge configurations by the atomic multipoles can not affect the scalability of the method. Indeed, even very fast estimation of the two-center Coulomb interactions leads to a quadratic dependence of computation costs on the system's size for larger molecules. To make the scheme truly linearly scaling it is necessary to neglect interactions between very distant atoms. Cut-off procedures of that sort are well justified only for local states. In the case of the SLG method it is particularly simple because one-electron states forming carrier spaces are atom-centered. Fig. 7 shows the dependence of the required computation time on the system's size n for the modified SLG-NDDO method where all interactions between atoms separated by more than 20 Å are totally neglected. This figure unequivocally demonstrates that the method belongs to the family of O(N)-methods. It is important that the cut-off procedure leads to very small modification of the calculated heats of formation (less than 0.03 kcal/mol per CH2 fragment). Of course, in the case of more polarized molecules with significant effective atomic charges the charge-charge interactions beyond 20 Å should be explicitly considered to obtain the same accuracy. We compare our results (9000 atoms for about 100 sec. on a 0.7 MHz Pentium-III computer) with those obtained in the framework of the LocalSCF method (120000 atoms for about 16000 sec. on a 2.4GHz Pentium-IV computer) [] and deduce that the modified SLG-NDDO method is about 2 orders of magnitude faster than the LocalSCF one.
A detailed discussion of the quality of the SLG-NDDO numerical estimates as compared with the SCF-NDDO ones is given in Ref. []. In the present paper we consider only the heats of formation on the same set of molecules as given in Ref. []. The difference between calculated and experimental values can be found for each of the molecules and for the calculation methods. This error can be considered as a random variable and empirical distribution functions of errors for both methods can be readily constructed. Figs. 8 and 9 represent these functions in the coordinates linearizing the normal distribution plotted for the SLG-MNDO and SCF-MNDO methods, respectively, as well as linear fits for both distributions. The assumption of a normal distribution law for the errors seems to be valid for both methods because the sets of points are close to the corresponding linear fits (values of R2 are 0.967 and 0.983, respectively). The abscissa for the crossing of a linear fit and the x axis gives the value of the a parameter (average of the error's distribution), while the slope of a linear fit is s-1. Our analysis shows that the methods have equal values of s (about 7.6 kcal/mol) but the values of the a parameter differ significantly. In the case of the SCF-MNDO method this value is -2.3 kcal/mol and certifies that heats of formation are significantly underestimated in this method while in the case of the SLG-MNDO method the average is positive and its magnitude is more than 5 times smaller demonstrating the practical absense of the systematic error in the SLG-MNDO calculations. According to the Student's criterion the average errors for these methods are statistically different with a probability larger than 90% .
In the present paper we further develop semiempirical methods based on the strictly local geminals form of the trial wave function contrasting to the traditional SCF-based semiempirical quantum chemistry. The main result is that using invariant multipolar forms of the two-center repulsion integrals leads to a significant (more than 30 times) acceleration of the computation procedure. As a consequence we propose the modified SLG-NDDO methods for electronic structure calculations of large molecular systems. These methods can be easily transformed into a linear scaling form by the applying a cut-off procedures for the two-center Coulomb interactions. It is particularly important that such a modification of calculation procedure does not imply any re-parameterization. The SLG-NDDO methods provide high-quality estimates for the heats of formation of molecules which are well described by two-electron two-center chemical bonds and by lone pairs.
An important component of the scheme proposed is the possibility to sequentially define and use in calculation the atomic multipoles. They can be further used to calculate electrostatic potentials inside and outside molecules []. The tentative prospects of this scheme can be considered from somewhat different point of view. The SLG-MINDO/3 method allowed to construct a route from semiempirical quantum chemistry to classical force fields []. A multipole scheme for two-center interactions opens an access to construction of an NDDO-based deductive molecular mechanics (DMM). The construction of a DMM procedure is only one of many ways to further develop the computation schemes of the SLG-NDDO method. It is possible to construct a method where all two-center repulsion interactions (including those between bonded atoms) are calculated using the multipole scheme. At the same time this modification will require some re-parameterization of the Hamiltonian. An important way to generalize this treatment is to implement a scheme with local electron groups of arbitrary form (numbers of electrons and orbitals). A step in this direction has been made in Ref. [], where the trial wave function in the form of the antisymmetrized product of strictly local geminals and molecular orbitals has been proposed and analyzed. On the other hand, it is clear that the SLG wave function totally neglects interbond electron transfers and dispersion interactions. They can be taken into consideration using perturbation expansions developed for the geminal-type wave functions [,]. These interactions are short-range and, therefore, their account will not destroy the linear scalability of the SLG-NDDO methods.
This work has been completed during the stay of A.M.T. at RWTH Aachen in the frame of the Alexander von Humboldt Postdoctoral Fellowship which is gratefully acknowledged as well as the kind hospitality of Prof. Dr. R. Dronskowski. This work was financially supported through the RFBR grants No. 04-03-32146, 05-03-08010, 05-03-08074, 05-03-33118, 05-07-90067.