This preprint can also be viewed on arXiv.
The utility of quantum chemistry is limited by the inevitable tradeoff between the runtime of a calculation and the accuracy of the results obtained. In many domains, the applicability of quantum chemical calculations is determined not by the intrinsic error of the simulation but by the speed at which these calculations can be conducted and the size of the systems addressable. Accordingly, developing new methods that balance speed and accuracy in a Pareto-efficient manner is a crucial challenge facing computational chemists today.
Molecular quantum chemical calculations typically describe electron density through the linear combination of atom-centered Gaussian basis functions, and the choice of this basis set is key to both the speed and accuracy of the resulting calculation.1 The size of a basis set is often described in terms of ζ (zeta; the symbol traditionally used to denote basis-function exponents): single-ζ "minimal" basis sets contain only a single basis function per atomic orbital, double-ζ basis sets contain two basis functions per atomic orbital, and so forth. Most basis sets today employ "contracted" Gaussians, in which a single basis function is actually a linear combination of individual "primitive" Gaussian functions designed to more closely mimic the shape of the true hydrogenic wavefunctions.1
Small basis sets typically suffer from various pathologies: the electron density can be poorly described (basis-set incompleteness error, or BSIE) and interaction energies are often overestimated as fragments "borrow" adjacent basis functions from each other (basis-set superposition error, or BSSE).2 These errors have been shown to cause dramatically incorrect predictions of thermochemistry, geometries, and barrier heights.3,4,5 Accordingly, conventional wisdom holds that triple-ζ basis sets or larger are required for accurate energy calculations, as demonstrated in this quote from a recent guide to best practices in computational chemistry:6
Therefore, DZ [double-ζ] basis sets (like 6-31G** or def2-SVP) are no longer sufficient [for high-quality results], and we strongly advise against using them, except if they are part of purpose-made composite schemes. However, even in combination with full counterpoise corrections... the residual BSSE and BSIE of DZ basis sets can be substantial. Thus, we generally recommend at least TZ [triple-ζ] basis sets, which often yield results reasonably close to the basis set limit.
Unfortunately, increasing the number of basis functions dramatically increases the runtime of the resulting calculation. Triple-ζ basis sets are substantially slower than double-ζ basis sets—in a recent benchmark study conducted by Folmsbee and Hutchison,7 increasing the basis set from double-ζ (def2-SVP) to triple-ζ (def2-TZVP) caused calculation runtimes to increase more than five-fold. As a result, many calculations run on large or conformationally flexible systems still employ double-ζ basis sets for practical reasons, despite the known loss in accuracy that results.
One resolution to this unfortunate dilemma is the development of "composite" density-functional-theory (DFT) methods, which use highly optimized combinations of functional, basis set, and empirical corrections to achieve significant speed increases relative to typical methods. Since 2013, Stefan Grimme and co-workers have developed a suite of these methods, which have seen widespread adoption by the computational chemistry community.8,9,10,11,12,13 While early composite methods featured numerous fine-tuned empirical corrections (including a short-range basis correction for electronegative elements, a geometric counterpoise correction to correct for BSSE, and reparameterization of the underlying functional), the latest composite DFT method, ωB97X-3c, employs only the D4 dispersion correction and a specially developed double-ζ basis set.13 This basis set, vDZP, extensively uses effective core potentials to remove core electrons and relies on deeply contracted valence basis functions optimized on molecular systems to minimize BSSE almost down to the triple-ζ level.14
We hypothesized that the benefits of the vDZP basis set might not be limited to the ωB97X-3c method, but instead could allow for efficient and low-cost calculations with a variety of other density functionals. In this work, we assess the general applicability of vDZP by combining it with four additional functionals: B3LYP, M06-2X, B97-D3BJ, and r2SCAN. In every case, we find that vDZP produces accurate methods without any reparameterization of the functional or additional corrections beyond the now-standard empirical dispersion correction. We then examine the B97-D3BJ and r2SCAN functionals in more depth, and show that vDZP-based methods have speed and accuracy similar to existing composite methods in a variety of different benchmarks, while substantially outperforming conventional double-ζ basis sets.
All computations were conducted with Psi4 1.9.1.15 The default settings in Psi4 were modified somewhat: a (99,590) integration grid with "robust" pruning and the Stratmann–Scuseria–Frisch quadrature scheme was employed,16 and an integral tolerance of 10⁻¹⁴ was used throughout. Density fitting was employed for all calculations, and a level shift of 0.10 Hartree was applied to accelerate SCF convergence. For the ROT34 benchmark, geometry optimizations were run using geomeTRIC 1.0.2.17
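The settings listed above can be gathered into a single options block. The following is a minimal sketch, not the authors' input file: the keyword names and values are assumptions based on Psi4's documented option names and should be checked against the manual for the Psi4 version in use. The helper `render_set_block` is hypothetical and only formats text, so it runs without Psi4 installed.

```python
# Sketch: collect the non-default settings described in the text and render
# them as a Psi4 input-file "set" block. Keyword names are ASSUMED from the
# Psi4 manual; verify them before use.
PSI4_OPTIONS = {
    "dft_radial_points": 99,            # radial part of the (99,590) grid
    "dft_spherical_points": 590,        # angular part of the (99,590) grid
    "dft_pruning_scheme": "robust",     # "robust" grid pruning
    "dft_nuclear_scheme": "stratmann",  # assumed mapping for Stratmann-Scuseria-Frisch
    "ints_tolerance": 1e-14,            # integral tolerance
    "scf_type": "df",                   # density fitting
    "level_shift": 0.10,                # level shift, in Hartree
}


def render_set_block(options):
    """Format a dict of options as a Psi4-style 'set { ... }' block (hypothetical helper)."""
    lines = ["set {"]
    for key, value in options.items():
        lines.append(f"  {key} {value}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_set_block(PSI4_OPTIONS))
```

Keeping the options in one dictionary makes it straightforward to reuse the same settings across every benchmark run, which matters when comparing basis sets on equal footing.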
Due to the documented absence of fluorine in Psi4's internal implementation of vDZP, a custom basis-set file was used which adds the missing basis functions for fluorine.
Timing studies were run on a dedicated "Premium CPU-Optimized" Digital Ocean Droplet with 8 Intel Cascade Lake processors and 16 GB of memory, and Psi4 was given 12 GB of memory.
To assess the general applicability of vDZP, we selected four commonly used functionals from the "Charlotte's Web" of possible combinations of exchange and gradient approximations, with dispersion corrections as applicable: B97-D3BJ (GGA), r2SCAN-D4 (meta-GGA), B3LYP-D4 (hybrid GGA), and M06-2X (hybrid meta-GGA). (We also included ωB97X-D4, the range-separated hybrid functional explored in the original ωB97X-3c paper.) We evaluated all functionals in combination with vDZP on the expansive GMTKN55 main-group thermochemistry benchmark set, which is now standard for quantifying the accuracy of new DFT methods, and compared these results to reference values obtained with the large (aug)-def2-QZVP basis set.18 (Due to certain documented errors in Psi4's effective-core-potential implementation, the subsets NBPRC, FH51, DC13, C60ISO, and HEAVY28 were omitted.)
Table 1: Weighted errors for various properties in GMTKN55.
Functional | Basis Set | Basic Properties | Isomerization | Barrier Heights | Inter-NCI | Intra-NCI | WTMAD2 |
---|---|---|---|---|---|---|---|
B97-D3BJ | def2-QZVP | 5.43 | 14.21 | 13.13 | 5.11 | 7.84 | 8.42 |
B97-D3BJ | vDZP | 7.70 | 13.58 | 13.25 | 7.27 | 8.60 | 9.56 |
r2SCAN-D4 | def2-QZVP | 5.23 | 8.41 | 14.27 | 6.84 | 5.74 | 7.45 |
r2SCAN-D4 | vDZP | 7.28 | 7.10 | 13.04 | 9.02 | 8.91 | 8.34 |
B3LYP-D4 | def2-QZVP | 4.39 | 10.06 | 9.07 | 5.19 | 6.18 | 6.42 |
B3LYP-D4 | vDZP | 6.20 | 9.26 | 9.09 | 7.88 | 8.21 | 7.87 |
M06-2X | def2-QZVP | 2.61 | 6.18 | 4.97 | 4.44 | 11.10 | 5.68 |
M06-2X | vDZP | 4.45 | 7.88 | 4.68 | 8.45 | 10.53 | 7.13 |
ωB97X-D4a | def2-QZVP | 3.18 | 6.04 | 3.75 | 2.84 | 3.62 | 3.73 |
ωB97X-D4a | vDZP | 4.77 | 7.28 | 5.22 | 5.44 | 5.80 | 5.57 |
The results of this study are shown in Table 1. In every case, the overall accuracy of methods employing vDZP is only moderately worse than the accuracy of methods using the much larger (aug)-def2-QZVP basis set, suggesting that vDZP is a generally applicable low-cost basis set. To assess whether vDZP was overly tailored for the ωB97X-3c composite method, as with many components of previous composite methods, we compared the difference in accuracy between vDZP and (aug)-def2-QZVP for each functional under study. We found that the difference in overall accuracy (WTMAD2) was in fact largest for ωB97X-D4, suggesting that vDZP is indeed well suited to use outside of the ωB97X-3c composite method.
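The comparison described above can be reproduced directly from the WTMAD2 column of Table 1. The sketch below copies those ten values and computes, for each functional, the increase in WTMAD2 on going from (aug)-def2-QZVP to vDZP; the function name `basis_set_penalty` is a label introduced here for illustration.

```python
# WTMAD2 values copied from Table 1 as (def2-QZVP, vDZP) pairs.
WTMAD2 = {
    "B97-D3BJ": (8.42, 9.56),
    "r2SCAN-D4": (7.45, 8.34),
    "B3LYP-D4": (6.42, 7.87),
    "M06-2X": (5.68, 7.13),
    "ωB97X-D4": (3.73, 5.57),
}


def basis_set_penalty(table):
    """Increase in WTMAD2 when replacing the large basis set with vDZP."""
    return {name: round(small - large, 2) for name, (large, small) in table.items()}


penalties = basis_set_penalty(WTMAD2)
worst = max(penalties, key=penalties.get)

if __name__ == "__main__":
    for name, delta in penalties.items():
        print(f"{name:10s} {delta:+.2f}")
    print("largest penalty:", worst)
```

The penalties fall between roughly 0.9 and 1.8 WTMAD2 units, with the largest value belonging to ωB97X-D4, the functional for which vDZP was originally developed.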
Although the above results suggest that vDZP is effective for main-group thermochemistry, we wanted to more rigorously compare the performance of vDZP-based methods to popular composite methods with bespoke features. Accordingly, we selected B97-D3BJ/vDZP and r2SCAN-D4/vDZP for detailed investigation, and compared the performance of these methods on GMTKN55 to the existing r2SCAN-3c and B97-3c composite methods.11,12 To benchmark vDZP against other double-ζ basis sets, we also evaluated three commonly employed double-ζ basis sets: 6-31G(d), def2-SVP, and pcseg-1.
Table 2: In-depth evaluation of basis-set effects for B97-D3BJ.
Basis Set | ζ | Basic Properties | Isomerization | Barrier Heights | Inter-NCI | Intra-NCI | WTMAD2 |
---|---|---|---|---|---|---|---|
def2-QZVP | 4 | 5.43 | 14.21 | 13.13 | 5.11 | 7.84 | 8.42 |
mTZVPa | 3 | 7.34 | 23.70 | 13.14 | 10.25 | 8.43 | 11.72 |
vDZP | 2 | 7.70 | 13.58 | 13.25 | 7.27 | 8.60 | 9.56 |
pcseg-1 | 2 | 9.71 | 15.18 | 17.31 | 18.71 | 19.78 | 15.58 |
6-31G(d) | 2 | 11.64 | 16.79 | 18.14 | 22.43 | 21.62 | 17.64 |
def2-SVP | 2 | 11.46 | 15.93 | 18.68 | 26.79 | 25.42 | 19.17 |
Table 3: In-depth evaluation of basis-set effects for r2SCAN-D4.
Basis Set | ζ | Basic Properties | Isomerization | Barrier Heights | Inter-NCI | Intra-NCI | WTMAD2 |
---|---|---|---|---|---|---|---|
def2-QZVP | 4 | 5.23 | 8.41 | 14.27 | 6.84 | 5.74 | 7.45 |
mTZVPPa | 3 | 6.44 | 6.85 | 13.86 | 6.41 | 5.57 | 7.36 |
vDZP | 2 | 7.28 | 7.10 | 13.04 | 9.02 | 8.91 | 8.34 |
pcseg-1 | 2 | 9.09 | 8.47 | 17.20 | 18.47 | 20.89 | 14.44 |
6-31G(d) | 2 | 10.63 | 12.37 | 18.42 | 19.62 | 21.97 | 16.16 |
def2-SVP | 2 | 10.89 | 11.40 | 18.77 | 21.77 | 24.73 | 17.12 |
The results of this study are shown in Tables 2 and 3. In every case, vDZP far outperforms the other double-ζ basis sets. While the overall error of the other double-ζ basis sets is approximately twice that of the underlying functional, vDZP closely approaches the underlying error of the functional. In general, the vDZP-based methods have similar performance to the fine-tuned composite methods: while r2SCAN-3c still outperforms r2SCAN-D4/vDZP, B97-D3BJ/vDZP is markedly superior to B97-3c, especially for large systems and intermolecular non-covalent interactions.
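The "approximately twice" statement can be checked by dividing each double-ζ WTMAD2 by the def2-QZVP value for the same functional. The sketch below does this with the WTMAD2 values copied from Tables 2 and 3; the helper name `ratio_to_reference` is introduced here for illustration only.

```python
# WTMAD2 values copied from Tables 2 and 3 (composite-method rows omitted).
WTMAD2 = {
    "B97-D3BJ": {
        "def2-QZVP": 8.42, "vDZP": 9.56, "pcseg-1": 15.58,
        "6-31G(d)": 17.64, "def2-SVP": 19.17,
    },
    "r2SCAN-D4": {
        "def2-QZVP": 7.45, "vDZP": 8.34, "pcseg-1": 14.44,
        "6-31G(d)": 16.16, "def2-SVP": 17.12,
    },
}


def ratio_to_reference(values, reference="def2-QZVP"):
    """WTMAD2 of each basis set divided by the large-basis reference value."""
    ref = values[reference]
    return {basis: value / ref for basis, value in values.items() if basis != reference}


ratios = {functional: ratio_to_reference(v) for functional, v in WTMAD2.items()}

if __name__ == "__main__":
    for functional, per_basis in ratios.items():
        for basis, ratio in per_basis.items():
            print(f"{functional:10s} {basis:10s} {ratio:.2f}x")
```

For both functionals the conventional double-ζ basis sets land at roughly 1.9 to 2.3 times the large-basis error, while vDZP stays below 1.2 times.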
Since GMTKN55 focuses exclusively on main-group elements, we next assessed the accuracy of vDZP-based methods on transition metals using the revMOBH35 benchmark set,19,20 which tests the ability of computational methods to predict barrier heights in organometallic systems (Table 4). We found that vDZP-based methods had comparable accuracy to that of the congeneric composite methods, with slightly increased errors observed in both cases.
Table 4: revMOBH35 benchmark results.
Method | MAE (kcal/mol) |
---|---|
GFN2-xTB | 11.2 |
B97-D3BJ/vDZP | 4.2 |
B97-3c | 3.6 |
r2SCAN-D4/vDZP | 2.9 |
r2SCAN-3c | 2.8 |
ωB97X-3c | 3.2 |
ωB97X-D4/def2-QZVPP | 2.4 |
ωB97X-V/def2-QZVPP | 2.3 |
To assess the accuracy of vDZP-based methods for geometry optimizations, we employed the ROT34 rotational constant benchmark set,21 which compares the rotational constants of optimized structures to experimental values (Table 5). vDZP-based methods matched or outperformed the corresponding composite methods in terms of both mean deviation (MD) and mean absolute deviation (MAD), and r2SCAN-D4/vDZP performed similarly to high-quality PBE0-D3/def2-TZVP results.
Table 5: ROT34 benchmark results. All values in %.
Method | MD | MAD | MAX | RMSD |
---|---|---|---|---|
GFN2-xTB | -1.5 | 2.9 | 24.8 | 6.6 |
B97-D3BJ/vDZP | 0.2 | 0.5 | 1.2 | 0.5 |
B97-3c | 0.4 | 0.5 | 1.7 | 0.6 |
r2SCAN-D4/vDZP | -0.2 | 0.3 | 0.7 | 0.3 |
r2SCAN-3c | 0.8 | 0.8 | 1.5 | 0.4 |
ωB97X-3c | 0.1 | 0.5 | 2.7 | 0.7 |
PBE0-D3/def2-TZVP | -0.2 | 0.2 | 0.8 | 0.3 |
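As an illustration of how summary statistics of this kind are obtained, the sketch below computes signed percentage deviations of computed rotational constants from reference values and then summarizes them. Several choices are assumptions made here rather than details from the text: the sign convention (computed minus reference), the use of a population standard deviation for the spread (labelled SD, since the text does not say whether the RMSD column of Table 5 is taken about zero or about the mean), and the three rotational constants, which are made-up numbers.

```python
import math


def relative_deviations(computed, reference):
    """Signed deviations in percent, assuming (computed - reference) / reference."""
    return [100.0 * (c - r) / r for c, r in zip(computed, reference)]


def summarize(devs):
    """Mean deviation, mean absolute deviation, maximum absolute deviation, spread."""
    n = len(devs)
    md = sum(devs) / n
    mad = sum(abs(d) for d in devs) / n
    max_abs = max(abs(d) for d in devs)
    sd = math.sqrt(sum((d - md) ** 2 for d in devs) / n)  # population SD (assumption)
    return {"MD": md, "MAD": mad, "MAX": max_abs, "SD": sd}


# Hypothetical rotational constants in MHz; not taken from ROT34.
reference_mhz = [1000.0, 2000.0, 4000.0]
computed_mhz = [1010.0, 1980.0, 4040.0]

stats = summarize(relative_deviations(computed_mhz, reference_mhz))

if __name__ == "__main__":
    for key, value in stats.items():
        print(f"{key}: {value:.2f} %")
```

A signed MD close to zero combined with a larger MAD indicates errors that scatter in both directions, whereas MD and MAD of similar size point to a systematic offset in the optimized geometries.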
We also investigated the accuracy of vDZP-based methods at computing torsional energy profiles for drug-like molecules, an important and well-studied task in computer-assisted drug design.22 We evaluated a variety of methods on the TorsionNet206 dataset,23 which scores energies against high-level CCSD(T)/def2-TZVP benchmarks (Table 6). vDZP-based methods gave mean absolute errors of 0.4–0.5 kcal/mol, comparable to composite methods and conventional hybrid functionals with triple-ζ basis sets. In contrast, a commonly used double-ζ DFT method (B3LYP-D3BJ/6-31G(d)) gave a higher mean absolute error of 0.58 kcal/mol, demonstrating the importance of high-quality basis sets even for "easy" calculations like torsional scans.
Table 6: TorsionNet206 benchmark results, ranked by MAE.
Method | MAE (kcal/mol) |
---|---|
GFN2-xTB | 0.78 |
B3LYP-D3BJ/6-31G(d) | 0.58 |
r2SCAN-D4/vDZP | 0.46 |
r2SCAN-3c | 0.43 |
B97-D3BJ/vDZP | 0.42 |
B3LYP-D3/def2-TZVP | 0.40 |
M06-2X/def2-TZVP | 0.39 |
B97-3c | 0.36 |
ωB97M-D3BJ/def2-TZVPPD | 0.15 |
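For readers unfamiliar with how a torsional-profile MAE is formed, the sketch below compares a method's relative energies along a scan to reference relative energies. The alignment convention (each profile shifted so that its own minimum is zero) is an assumption made here and may differ from the protocol used for TorsionNet206; the energies are made-up numbers.

```python
def relative_profile(energies):
    """Shift a torsion scan so that its lowest point is zero (assumed convention)."""
    lowest = min(energies)
    return [e - lowest for e in energies]


def torsion_mae(method_energies, reference_energies):
    """Mean absolute error between two relative torsion profiles, in input units."""
    method = relative_profile(method_energies)
    reference = relative_profile(reference_energies)
    return sum(abs(m - r) for m, r in zip(method, reference)) / len(reference)


# Hypothetical four-point scan in kcal/mol; not taken from TorsionNet206.
reference_scan = [0.0, 1.2, 3.0, 1.0]
method_scan = [5.0, 6.0, 8.5, 6.1]

mae = torsion_mae(method_scan, reference_scan)

if __name__ == "__main__":
    print(f"MAE = {mae:.2f} kcal/mol")
```

Because only relative energies enter the error, a constant offset in absolute energy between two methods (here 5 kcal/mol) does not affect the result.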
Since the utility of low-cost DFT methods arises not only from accuracy but also from computational efficiency, we also examined the relative speed of vDZP-based methods and composite DFT methods. Unlike energetic results, timing results are inherently hardware- and software-dependent and are thus less universal, but nevertheless useful trends can often still be divined. We compared the speed of B97-D3BJ/vDZP and r2SCAN-D4/vDZP to the composite methods B97-3c, r2SCAN-3c, and ωB97X-3c by measuring timing on a series of n-alkanes.
We found that the vDZP-based methods were on average 40% slower than the corresponding composite methods (r2SCAN-3c and B97-3c; Figure 1). This is somewhat surprising, given that vDZP contains fewer basis functions per atom than the triple-ζ mTZVP and mTZVPP basis sets used for B97-3c and r2SCAN-3c, respectively. Two factors are likely responsible for this. First, vDZP extensively uses ECPs to describe core electrons, which simplifies computation of two-electron integrals but complicates computation of the one-electron Hamiltonian; the overall effect on timing likely depends a great deal on the ECP implementation and the nature of the system under study. Second, vDZP minimizes BSSE by using deeply contracted Gaussian functions, such that the highest angular momentum is small but the degree of contraction is high, so the total number of primitive basis functions remains large.13 Existing two-electron-integral algorithms may be optimized for higher angular momentum and lower degrees of contraction, and different algorithms like the Pople–Hehre axis-switch method may prove optimal for vDZP and other deeply contracted basis sets.24,25
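The second factor can be made concrete by counting the work behind a single contracted shell quartet. In the sketch below, each contracted two-electron integral over shells with contraction degrees K_a, K_b, K_c, K_d is a sum over K_a·K_b·K_c·K_d primitive integrals, while the number of integrals produced per quartet is the product of the shell sizes (2l + 1 for spherical functions). The two example shells are hypothetical and are not taken from vDZP or any other published basis set.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Shell:
    ang_mom: int  # angular momentum l of the shell
    n_prim: int   # degree of contraction (number of primitive Gaussians)


def primitive_quartets(shells):
    """Primitive integrals summed to build each contracted integral of the quartet."""
    return prod(s.n_prim for s in shells)


def contracted_integrals(shells):
    """Contracted integrals in the quartet, assuming 2l + 1 spherical functions per shell."""
    return prod(2 * s.ang_mom + 1 for s in shells)


# Hypothetical shells chosen to contrast the two regimes.
deep_s = Shell(ang_mom=0, n_prim=8)   # deeply contracted, low angular momentum
loose_d = Shell(ang_mom=2, n_prim=1)  # uncontracted, higher angular momentum

if __name__ == "__main__":
    for label, shell in (("(ss|ss), K=8", deep_s), ("(dd|dd), K=1", loose_d)):
        quartet = [shell] * 4
        print(label, primitive_quartets(quartet), contracted_integrals(quartet))
    # (ss|ss), K=8 -> 4096 primitive quartets feeding 1 contracted integral
    # (dd|dd), K=1 -> 1 primitive quartet feeding 625 contracted integrals
```

In the first regime nearly all the effort goes into the contraction step, which is the situation that algorithms designed for high degrees of contraction aim to handle efficiently.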
Figure 1: Timings for single-point energies of n-alkanes.
We also note that the extensive use of ECPs in vDZP can lead to substantial rate accelerations for systems with large numbers of heavy elements, since substantially fewer electrons will be modeled. For the particularly dramatic case of perbromo-n-pentane, B97-3c is 3.4x slower than B97-D3BJ/vDZP, and r2SCAN-3c is 2.7x slower than r2SCAN-D4/vDZP. Overall, vDZP-based methods appear to have comparable efficiency to composite methods, and we anticipate that they can be made considerably faster if their use becomes commonplace.
Conventional wisdom in computational chemistry holds that existing basis sets are relatively optimal, and that the tradeoff between speed and accuracy can only be resolved by tight coupling of methods, basis sets, and empirical corrections. This work suggests that this is false. Here, we demonstrate that the recently reported vDZP basis set is not limited to the specific ωB97X-3c method for which it was originally reported, and instead can be combined with many different density functionals to produce fast and high-accuracy computational methods with Pareto efficiency comparable to bespoke composite methods.
More abstractly, the successes of vDZP detailed here suggest that there are considerable advances yet to come in basis-set optimization. The atypical features of vDZP—extensive use of large-core ECPs, deep contraction for valence orbitals, and parameter optimization on molecules, not atoms—are here demonstrated to create a robust and highly general solution for problems common to all electronic-structure-theory-based approaches. We anticipate that continued research into basis-set optimization will yield still faster and more accurate basis sets, allowing accurate quantum-chemical computations to scale to larger systems and timescales than ever before.
C.C.W. thanks Peter M. W. Gill and Todd Martínez for helpful discussions about quantum chemistry, many of which have indirectly percolated into this work.
The underlying data for the GMTKN55, ROT34, and revMOBH35 test sets, plus timing data on n-alkanes and perbromo-n-alkanes, is available in spreadsheet form on arXiv.
Nagy, B.; Jensen, F. Reviews in Computational Chemistry; John Wiley & Sons, Ltd, 2017; Chapter 3, pp 93–149.
Huzinaga, S. Basis sets for molecular calculations. Computer Physics Reports 1985, 2, 281–339.
Papajak, E.; Zheng, J.; Xu, X.; Leverentz, H. R.; Truhlar, D. G. Perspectives on Basis Sets Beautiful: Seasonal Plantings of Diffuse Basis Functions. Journal of Chemical Theory and Computation 2011, 7, 3027–3034, PMID: 26598144.
Boese, A. D.; Martin, J. M. L.; Handy, N. C. The role of the basis set: Assessing density functional theory. The Journal of Chemical Physics 2003, 119, 3005–3014.
Kruse, H.; Goerigk, L.; Grimme, S. Why the Standard B3LYP/6-31G* Model Chemistry Should Not Be Used in DFT Calculations of Molecular Thermochemistry: Understanding and Correcting the Problem. The Journal of Organic Chemistry 2012, 77, 10824–10834, PMID: 23153035.
Bursch, M.; Mewes, J.-M.; Hansen, A.; Grimme, S. Best-Practice DFT Protocols for Basic Molecular Computational Chemistry. Angewandte Chemie International Edition 2022, 61, e202205735.
Folmsbee, D.; Hutchison, G. Assessing conformer energies using electronic structure and machine learning methods. International Journal of Quantum Chemistry 2021, 121, e26381.
Sure, R.; Grimme, S. Corrected small basis set Hartree-Fock method for large systems. Journal of Computational Chemistry 2013, 34, 1672–1685.
Grimme, S.; Brandenburg, J. G.; Bannwarth, C.; Hansen, A. Consistent structures and interactions by density functional theory with small atomic orbital basis sets. The Journal of Chemical Physics 2015, 143, 054107.
Brandenburg, J. G.; Hochheim, M.; Bredow, T.; Grimme, S. Low-Cost Quantum Chemical Methods for Noncovalent Interactions. The Journal of Physical Chemistry Letters 2014, 5, 4275–4284, PMID: 26273974.
Brandenburg, J. G.; Bannwarth, C.; Hansen, A.; Grimme, S. B97-3c: A revised low-cost variant of the B97-D density functional method. The Journal of Chemical Physics 2018, 148, 064104.
Grimme, S.; Hansen, A.; Ehlert, S.; Mewes, J.-M. r2SCAN-3c: A “Swiss army knife” composite electronic-structure method. The Journal of Chemical Physics 2021, 154, 064103.
Müller, M.; Hansen, A.; Grimme, S. ωB97X-3c: A composite range-separated hybrid DFT method with a molecule-optimized polarized valence double-ζ basis set. The Journal of Chemical Physics 2023, 158, 014103.
Chan, B. Optimal Small Basis Set and Geometric Counterpoise Correction for DFT Computations. Journal of Chemical Theory and Computation 2023, 19, 3958–3965, PMID: 37288982.
Turney, J. M. et al. Psi4: an open-source ab initio electronic structure program. WIREs Computational Molecular Science 2012, 2, 556–565.
Stratmann, R.; Scuseria, G. E.; Frisch, M. J. Achieving linear scaling in exchange-correlation density functional quadratures. Chemical Physics Letters 1996, 257, 213–223.
Wang, L.-P.; Song, C. Geometry optimization made simple with translation and rotation coordinates. The Journal of Chemical Physics 2016, 144, 214108.
Goerigk, L.; Hansen, A.; Bauer, C.; Ehrlich, S.; Najibi, A.; Grimme, S. A look at the density functional theory zoo with the advanced GMTKN55 database for general main group thermochemistry, kinetics and noncovalent interactions. Physical Chemistry Chemical Physics 2017, 19, 32184–32215.
Iron, M. A.; Janes, T. Evaluating Transition Metal Barrier Heights with the Latest Density Functional Theory Exchange–Correlation Functionals: The MOBH35 Benchmark Database. The Journal of Physical Chemistry A 2019, 123, 3761–3781, PMID: 30973722.
Semidalas, E.; Martin, J. M. The MOBH35 Metal–Organic Barrier Heights Reconsidered: Performance of Local-Orbital Coupled Cluster Approaches in Different Static Correlation Regimes. Journal of Chemical Theory and Computation 2022, 18, 883–898, PMID: 35045709.
Risthaus, T.; Steinmetz, M.; Grimme, S. Implementation of nuclear gradients of range-separated hybrid density functionals and benchmarking on rotational constants for organic molecules. Journal of Computational Chemistry 2014, 35, 1509–1516.
Behara, P. K.; Jang, H.; Horton, J. T.; Gokey, T.; Dotson, D. L.; Boothroyd, S.; Bayly, C. I.; Cole, D. J.; Wang, L.-P.; Mobley, D. L. Benchmarking Quantum Mechanical Levels of Theory for Valence Parametrization in Force Fields. The Journal of Physical Chemistry B 2024, 128, 7888–7902, PMID: 39087913.
Xiao, J.; Chen, Y.; Zhang, L.; Wang, H.; Zhu, T. A machine learning-based high-precision density functional method for drug-like molecules. Artificial Intelligence Chemistry 2024, 2, 100037.
Pople, J. A.; Hehre, W. J. Computation of electron repulsion integrals involving contracted Gaussian basis functions. Journal of Computational Physics 1978, 27, 161–168.
Gill, P. M. In Molecular integrals Over Gaussian Basis Functions; Sabin, J. R., Zerner, M. C., Eds.; Advances in Quantum Chemistry; Academic Press, 1994; Vol. 25; pp 141–205.