Our group previously reported that combining single-precision arithmetic with consumer graphics processing units (GPUs) significantly speeds up CCSD (coupled-cluster theory within the singles and doubles approximation) and CCSD(T) (CCSD augmented by a perturbative treatment of triple excitations) calculations. However, because CCSD(T) is memory-intensive and consumer GPUs have limited memory, the previously developed GPU-accelerated CCSD(T) program could only treat systems with about 300 to 400 basis functions when spatial symmetry was not exploited. Treating the two-electron integrals with the density-fitting (DF) approximation significantly reduces the memory requirements of CCSD(T) calculations. In this work, a DF-CCSD(T) program based on the density-fitting approximation and single-precision arithmetic was developed. All matrix contractions are performed with GEMM routines from cuBLAS on the GPU or from Intel MKL on the CPU; other operations, such as matrix expansion and transposition, are performed with OpenACC on the GPU or OpenMP on the CPU. On a GPU with 24 GB of memory, the program can compute single-point energies for systems with around 700 basis functions without spatial symmetry and for molecules with about 1700 basis functions with symmetry. The compute node used in this work has an Intel Core i9-10900K CPU and an RTX 3090 GPU. On this node, relative to double-precision calculations on the CPU, single-precision calculations on the GPU are about 16 times faster for CCSD and about 40 times faster for the (T) part, and the error introduced by single precision is negligible. This work also reports a code library that performs matrix operations with spatial symmetry on either the GPU or the CPU, in either single or double precision. Spatial symmetry is handled with the direct product decomposition (DPD) method. The library significantly reduces the complexity of developing coupled-cluster codes with spatial symmetry.
The accuracy of single-precision DF-CCSD(T) was compared with that of CCSD(T)-F12a and DLPNO-CCSD(T). The results show that DF-CCSD(T) is more stable than the other two approaches in describing chemical properties.
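The memory argument for density fitting can be illustrated with a minimal NumPy sketch. It is not the authors' code: the sizes `n_occ`, `n_virt`, `n_aux` and the random three-index tensor `B` are hypothetical stand-ins for real fitted integrals, and NumPy's `@` operator stands in for the GEMM calls (cuBLAS or MKL) named in the abstract. The sketch assembles a four-index block as (ia|jb) ≈ Σ_P B[P, ia] B[P, jb] in single and double precision and compares the storage of the three-index and four-index objects.

```python
import numpy as np

def df_integrals(B):
    """Assemble (ia|jb) ~= sum_P B[P, ia] * B[P, jb] with one matrix product."""
    return B.T @ B

def storage_bytes(n_occ, n_virt, n_aux, dtype):
    """Bytes for the 3-index DF tensor and for the full 4-index (ia|jb) block."""
    itemsize = np.dtype(dtype).itemsize
    n_ov = n_occ * n_virt
    return n_aux * n_ov * itemsize, n_ov * n_ov * itemsize

# Hypothetical toy dimensions, chosen only so the example runs instantly.
n_occ, n_virt, n_aux = 8, 40, 120

rng = np.random.default_rng(0)
B64 = rng.standard_normal((n_aux, n_occ * n_virt)) / np.sqrt(n_aux)
B32 = B64.astype(np.float32)  # single-precision copy of the same tensor

eri64 = df_integrals(B64)     # double-precision reference
eri32 = df_integrals(B32)     # single-precision result
max_abs_diff = float(np.max(np.abs(eri64 - eri32.astype(np.float64))))

three_idx, four_idx = storage_bytes(n_occ, n_virt, n_aux, np.float32)
print(f"3-index tensor: {three_idx} bytes, 4-index block: {four_idx} bytes")
print(f"max |SP - DP| over all elements: {max_abs_diff:.2e}")
```

The three-index tensor grows linearly with the number of auxiliary functions, whereas the four-index block grows quadratically with the occupied-virtual pair dimension, which is why keeping only `B` in memory helps on a device with limited memory.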
Key words: coupled-cluster, density fitting, graphics processing units (GPU), single precision
Zhifan Wang, Bing He, Yanzhao Lu, Fan Wang. Single-precision CCSD and CCSD(T) Calculations with Density Fitting Approximations on Graphics Processing Units[J]. Acta Chimica Sinica, 2022, 80(10): 1401-1409.
Table 1. Correlation energies from DF-CCSD and from the (T) correction, calculated with single-precision (SP) data on the GPU and CPU and with double-precision (DP) data on the CPU (unit: a.u.)

| Molecule | CCSD, SP GPU | CCSD, SP CPU | CCSD, DP CPU | (T), SP GPU | (T), SP CPU | (T), DP CPU |
|---|---|---|---|---|---|---|
| (H2O)7 | -1.9878328 | -1.9878329 | -1.9878329 | -0.0585549 | -0.0585566 | -0.0585566 |
| C6H14 | -1.1398594 | -1.1398593 | -1.1398594 | -0.0440448 | -0.0440449 | -0.0440449 |
| C10H16 | -1.7801039 | -1.7801038 | -1.7801038 | -0.0798821 | -0.0798823 | -0.0798823 |
| AtF5 | -2.2048644 | -2.2048637 | -2.2048637 | -0.0941050 | -0.0940953 | -0.0941050 |
| I3- | -1.3032696 | -1.3032697 | -1.3032697 | -0.0498108 | -0.0498108 | -0.0498108 |
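The size of the single-precision error can be read directly off Table 1. The short script below is only an illustrative sketch: it copies the tabulated energies and reports the largest absolute difference between any SP entry and the corresponding DP CPU entry. The names `TABLE1` and `sp_deviations` are introduced here for illustration and do not come from the paper.

```python
# Correlation energies from Table 1 in a.u., ordered (SP GPU, SP CPU, DP CPU).
TABLE1 = {
    "(H2O)7": {"CCSD": (-1.9878328, -1.9878329, -1.9878329),
               "(T)": (-0.0585549, -0.0585566, -0.0585566)},
    "C6H14": {"CCSD": (-1.1398594, -1.1398593, -1.1398594),
              "(T)": (-0.0440448, -0.0440449, -0.0440449)},
    "C10H16": {"CCSD": (-1.7801039, -1.7801038, -1.7801038),
               "(T)": (-0.0798821, -0.0798823, -0.0798823)},
    "AtF5": {"CCSD": (-2.2048644, -2.2048637, -2.2048637),
             "(T)": (-0.0941050, -0.0940953, -0.0941050)},
    "I3-": {"CCSD": (-1.3032696, -1.3032697, -1.3032697),
            "(T)": (-0.0498108, -0.0498108, -0.0498108)},
}

def sp_deviations(table):
    """Yield (molecule, part, platform, |E_SP - E_DP|) for every SP entry."""
    for molecule, parts in table.items():
        for part, (sp_gpu, sp_cpu, dp_cpu) in parts.items():
            yield molecule, part, "SP GPU", abs(sp_gpu - dp_cpu)
            yield molecule, part, "SP CPU", abs(sp_cpu - dp_cpu)

# Entry with the largest deviation from the double-precision reference.
worst = max(sp_deviations(TABLE1), key=lambda row: row[3])
print(f"largest |SP - DP|: {worst[3]:.1e} a.u. "
      f"for {worst[0]} {worst[1]} ({worst[2]})")
```

Every tabulated deviation stays below 1e-5 a.u., which is the sense in which the single-precision error is described as negligible.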
Table 2. Wall-time of DF-CCSD and DF-CCSD(T) single-point calculations on (H2O)n clusters on the CPU and GPU with single-precision (SP) and double-precision (DP) data. Values in parentheses are wall-times with Cs spatial symmetry; values outside parentheses are wall-times without spatial symmetry (unit: s)

CCSD columns: wall-time of a single RI-CCSD iteration. (T) columns: wall-time of the (T) correction calculation.

| n | Number of basis functions | CCSD, SP GPU time | CCSD, SP CPU time | CCSD, SP CPU time / GPU time | CCSD, DP CPU time | (T), SP GPU time | (T), SP CPU time | (T), SP CPU time / GPU time | (T), DP CPU time |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 174 | 0.9 (0.4) | 5.46 (2.24) | 6.07 (5.6) | 12.4 (5.2) | 2.8 (1.8) | 22.4 (13.9) | 8 (7.7) | 40 (33) |
| 4 | 232 | 3.1 (1.4) | 17.7 (9.3) | 5.71 (6.64) | 37.2 (19.0) | 14 (6.7) | 336 (105) | 24 (15.7) | 672 (244) |
| 5 | 290 | 7.8 (3.7) | 50.9 (24.6) | 6.53 (6.65) | 106 (52.1) | 62 (23) | 1514 (492) | 24.4 (21.4) | 2698 (1035) |
| 6 | 348 | 18 (8) | 129 (58) | 7.17 (7.25) | 243 (122) | 189 (69) | 4866 (1555) | 25.7 (22.5) | 8476 (3174) |
| 7 | 406 | 35 (14) | 283 (111) | 8.09 (7.93) | 523 (226) | 542 (179) | 12588 (4359) | 23.2 (24.4) | 22312 (8399) |
| 8 | 464 | 66 (28) | 508 (220) | 7.7 (7.86) | 1012 (436) | 1514 (445) | 31404 (11527) | 20.7 (25.9) | 53604 (19188) |
| 9 | 522 | 131 (44) | 913 (389) | 6.97 (8.84) | 1815 (774) | 3601 (898) | 65888 (22203) | 18.3 (24.7) | 113226 (40485) |
| 10 | 580 | 529 (81) | 1768 (656) | 3.34 (8.1) | (1626) | 7608 (2313) | 119307 (46606) | 15.7 (20.2) | (84115) |
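Table 2 lists the SP CPU to SP GPU ratio explicitly, but not the ratio between DP on the CPU and SP on the GPU. The sketch below shows one way to form that ratio from the tabulated wall-times without spatial symmetry. It is an illustration only: the dictionary names are hypothetical, and n = 10 is omitted because the table gives no DP CPU time without symmetry for that cluster.

```python
# Wall-times in s from Table 2, without spatial symmetry: n -> (SP GPU, DP CPU).
ccsd_iteration = {
    3: (0.9, 12.4), 4: (3.1, 37.2), 5: (7.8, 106), 6: (18, 243),
    7: (35, 523), 8: (66, 1012), 9: (131, 1815),
}
triples = {
    3: (2.8, 40), 4: (14, 672), 5: (62, 2698), 6: (189, 8476),
    7: (542, 22312), 8: (1514, 53604), 9: (3601, 113226),
}

def speedup(times):
    """Return n -> (DP CPU time) / (SP GPU time) for each cluster size n."""
    return {n: dp_cpu / sp_gpu for n, (sp_gpu, dp_cpu) in times.items()}

ccsd_speedup = speedup(ccsd_iteration)
triples_speedup = speedup(triples)

for n in sorted(ccsd_speedup):
    print(f"(H2O){n}: CCSD x{ccsd_speedup[n]:.1f}, (T) x{triples_speedup[n]:.1f}")
```

The same function can be applied to the values in parentheses to obtain the corresponding ratios with Cs symmetry.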
Table 3. Wall-time of each step defined in Figure 1 in a single CCSD iteration for (H2O)9 and (H2O)10 without spatial symmetry (unit: s)

| Step | (H2O)9, GPU | (H2O)9, CPU | (H2O)10, GPU | (H2O)10, CPU |
|---|---|---|---|---|
| 1) Initio | 15.6 | 15.3 | 233.7 | 235 |
| 2) N^5 | 60 | 53.6 | 84.5 | 87.1 |
| 3) Iabci | 3.5 | 22 | 7 | 37.6 |
| 4) Iabcd | 37.7 | 623 | 66 | 1050 |
| 5) Wmbej | 10.79 | 169.9 | 320.2 | 319.2 |
| 6) left | 2.5 | 18.9 | 3.5 | 30 |
Table 4. Hydrocarbon reaction energies calculated with different basis sets and CC methods, based on the molecular structures of the Minnesota database (HC7) (unit: kJ/mol)

| Reaction | cc-pVTZ CCSD(T) | cc-pVTZ DF-CCSD(T) | cc-pVTZ DLPNO-CCSD(T) | cc-pVQZ DF-CCSD(T) | cc-pVQZ DLPNO-CCSD(T) | cc-pVDZ CCSD(T)-F12a | cc-pV5Z DF-CCSD(T) | cc-pV5Z DLPNO-CCSD(T) | cc-pVTZ CCSD(T)-F12a | Ref. a |
|---|---|---|---|---|---|---|---|---|---|---|
| (1) E22 → E1 | 56.028 | 55.823 | 56.551 | 53.145 | 57.618 | 85.713 |  | 59.865 | 66.35 | 59.999 |
| (2) E31 → E1 | 94.847 | 94.508 | 94.583 | 86.768 | 95.42 | 133.955 |  | 97.345 | 107.106 | 104.684 |
| (3) (CH3)3CC(CH3)3 → C8H18 | 8.694 | 8.715 | 6.904 | 7.736 | 6.464 | 9.301 |  | 6.209 | 8.556 | 7.95 |
| (4) C6H14 + 4CH4 → 5C2H6 | 35.204 | 35.179 | 33.359 | 36.53 | 34.623 | 35.539 | 38.777 | 35.585 | 35.514 | 41.045 |
| (5) C8H18 + 6CH4 → 7C2H6 | 53.099 | 53.057 | 49.773 | 55.078 | 51.953 | 53.321 | 58.501 | 53.597 | 53.48 | 62.091 |
| (6) adamantane → 3C2H4 + 2C2H2 | 819.855 | 818.754 | 817.298 | 821.675 | 820.608 | 890.255 | 822.089 | 825.968 | 840.009 | 811.654 |
| (7) bicyclo_octane → 3C2H4 + C2H2 | 530.996 | 530.251 | 527.661 | 532.824 | 530.464 | 581.706 | 531.841 | 535.088 | 547.024 | 532.288 |
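One simple way to compare the approximate methods in Table 4 is to measure how far each one deviates from canonical CCSD(T) in the same basis set. The sketch below does this for the three cc-pVTZ columns using the mean absolute deviation over the seven reactions. This metric and the variable names are choices made here for illustration, not an analysis taken from the paper.

```python
# Reaction energies from Table 4 in kJ/mol, cc-pVTZ basis, reactions (1) to (7).
canonical = [56.028, 94.847, 8.694, 35.204, 53.099, 819.855, 530.996]
df_ccsd_t = [55.823, 94.508, 8.715, 35.179, 53.057, 818.754, 530.251]
dlpno_ccsd_t = [56.551, 94.583, 6.904, 33.359, 49.773, 817.298, 527.661]

def mean_abs_dev(values, reference):
    """Mean absolute deviation of `values` from `reference`, element by element."""
    if len(values) != len(reference):
        raise ValueError("both lists must cover the same reactions")
    return sum(abs(v - r) for v, r in zip(values, reference)) / len(reference)

mad_df = mean_abs_dev(df_ccsd_t, canonical)
mad_dlpno = mean_abs_dev(dlpno_ccsd_t, canonical)

print(f"DF-CCSD(T) vs CCSD(T), cc-pVTZ:    {mad_df:.3f} kJ/mol")
print(f"DLPNO-CCSD(T) vs CCSD(T), cc-pVTZ: {mad_dlpno:.3f} kJ/mol")
```

For these seven reactions the DF-CCSD(T) column stays closer to the canonical cc-pVTZ values than the DLPNO-CCSD(T) column does.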