---
title: Configuration and Performance of the Cluster Servers
nav_order: 6
---
Configuration and Performance of the Cluster Servers
August 16, 2023, Xiang Li, Yuejia Zhang
CPU / Memory
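Benchmark program (source): PassMark Performance Testing Linux
Version: 10.2.1003
Parameters: `-i 3` (Test Iterations: 3), `-d 2` (Test Duration: Medium).
The results are for reference only and fluctuate somewhat between runs. Values marked (?) have not been reliably confirmed.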

| CPU Info | loginNode | bigMem0 | bigMem1 | bigMem2 | bigMem3 |
|---|---|---|---|---|---|
| CPU Brand | Intel Xeon CPU E5-2670 v3 @ 2.30GHz | Intel Xeon Gold 6226R CPU @ 2.90GHz | Intel Xeon Gold 6226R CPU @ 2.90GHz | Hygon C86 7285 32-core Processor | AMD EPYC 9754 128-Core Processor |
| Num of CPUs on Board | 2 | 2 | 2 | 2 | 2 |
| Total Threads | 24 | 64 | 64 | 128 | 512 |
| Base Clock Speed (GHz) | 2.30 | 2.90 | 2.90 | 2.00 | 2.25 |
| Boost Clock Speed (GHz) | 2.30 | 3.90 | 3.90 | 3.00(?) | 3.10 |
| CPU Cache (MiB/CPU) | 30 | 22 | 22 | 64 | 256 |
| Lithography (Nanometer) | 22 | 14 | 14 | 14(?) | 5 |

| CPU Speed | loginNode | bigMem0 | bigMem1 | bigMem2 | bigMem3 |
|---|---|---|---|---|---|
| Integer Math (Million Operations/s) | 62,275 | 206,066 | 209,673 | 328,027 | 2,054,471 |
| Floating Point Math (Million Operations/s) | 48,122 | 129,819 | 130,277 | 146,501 | 1,149,133 |
| Prime Numbers (Million Primes/s) | 188 | 209 | 230 | 289 | 1,728 |
| Sorting (Thousand Strings/s) | 41,063 | 90,967 | 106,039 | 172,684 | 783,396 |
| Encryption (MB/s) | 5,959 | 25,616 | 27,013 | 99,718 | 492,442 |
| Compression (MB/s) | 315 | 817 | 865 | 1,330 | 6,933 |
| CPU Single Threaded (Million Operations/s) | 1,414 | 2,421 | 2,407 | 1,478 | 2,444 |
| Physics (Frames/s) | 1,229 | 2,336 | 3,002 | 7,420 | 22,446 |
| Extended Instructions (SSE) (Million Matrices/s) | 21,430 | 46,552 | 47,889 | 45,259 | 428,652 |
| CPU Final Mark | 19,471 | 42,705 | 46,047 | 53,223 | 138,716 |

| Memory Info | loginNode | bigMem0 | bigMem1 | bigMem2 | bigMem3 |
|---|---|---|---|---|---|
| Total Available RAM (GiB) | 125.5 | 1,006.4 | 1,006.3 | 485.6 | 1,007.0 |
| Memory Frequency (MHz) | 2,133 | 2,666 | 2,933 | 3,200 | 4,800 |

| Memory Speed | loginNode | bigMem0 | bigMem1 | bigMem2 | bigMem3 |
|---|---|---|---|---|---|
| Memory Latency (Nanoseconds) | 50 | 53 | 52 | 62 | 70 |
| Memory Read Cached (MB/s) | 17,005 | 27,710 | 27,698 | 18,298 | 23,561 |
| Memory Read Uncached (MB/s) | 8,187 | 12,093 | 11,800 | 8,700 | 23,334 |
| Memory Write (MB/s) | 7,665 | 10,100 | 8,890 | 7,338 | 23,305 |
| Memory Threaded (MB/s) | 71,989 | 100,377 | 169,758 | 259,208 | 726,276 |
| Database Operations (Thousand Operations/s) | 10,649 | 18,742 | 19,791 | 16,254 | 29,555 |
| Memory Final Mark | 2,277 | 2,827 | 2,799 | 2,321 | 2,876 |
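
To compare the nodes at a glance, the CPU Final Mark row can be normalised against loginNode and divided by the Total Threads row. The following is a minimal sketch using only values copied from the tables above; the function names `relative_to` and `per_thread` are illustrative. Mark per thread is a rough indicator only, since the final mark is a composite score and is not expected to scale linearly with thread count.

```python
# Values copied from the "CPU Final Mark" and "Total Threads" rows above.
cpu_final_mark = {
    "loginNode": 19471,
    "bigMem0": 42705,
    "bigMem1": 46047,
    "bigMem2": 53223,
    "bigMem3": 138716,
}
total_threads = {
    "loginNode": 24,
    "bigMem0": 64,
    "bigMem1": 64,
    "bigMem2": 128,
    "bigMem3": 512,
}


def relative_to(scores, baseline="loginNode"):
    """Return each node's score divided by the baseline node's score."""
    base = scores[baseline]
    return {node: score / base for node, score in scores.items()}


def per_thread(scores, threads):
    """Return each node's score divided by its hardware thread count."""
    return {node: scores[node] / threads[node] for node in scores}


speedup = relative_to(cpu_final_mark)
mark_per_thread = per_thread(cpu_final_mark, total_threads)
for node in cpu_final_mark:
    print(f"{node:10s} {speedup[node]:5.2f}x  {mark_per_thread[node]:7.1f} mark/thread")
```

The same two helpers work unchanged on any other row, for example the Memory Final Mark.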

References:
Hygon IPO Specs, pp. 124-126
EPYC 9754 Specs (AMD)
EPYC 9754 Specs (TechPowerUp)
GPU / DCU Compute Accelerators
bigMem3 has no accelerator card.
Benchmark program (source): Mixbench

| GPU/DCU | loginNode | bigMem0 | bigMem1 | bigMem2 |
|---|---|---|---|---|
| Device | NVIDIA GeForce GTX 1080 Ti | Tesla T4 | NVIDIA A30 | HYGON Z100 |
| CUDA driver version | 11.60 | 11.60 | 11.60 | - |
| GPU clock rate (MHz) | 1721 | 1590 | 1440 | 1319 |
| Memory clock rate | 2752 MHz | 2500 MHz | 607 MHz | - |
| Memory bus width | 352 bits | 256 bits | 3072 bits | - |
| WarpSize | 32 | 32 | 32 | 64 |
| L2 cache size | 2816 KB | 4096 KB | 24576 KB | 8192 KB |
| Total global mem | 11178 MB | 14910 MB | 24068 MB | 32752 MB |
| ECC enabled | No | Yes | Yes | Yes |
| Compute Capability | 6.1 | 7.5 | 8.0 | - |
| Total SPs | 3584 (28 MPs x 128 SPs/MP) | 2560 (40 MPs x 64 SPs/MP) | 7168 (56 MPs x 128 SPs/MP) | 3840 (60 CUs x 64 SPs/CU) |
| Compute throughput (theoretical single precision FMAs) (GFlops) | 12336.13 | 8140.80 | 20643.84 | 10129.92 |
| Memory bandwidth (GB/sec) | 484.44 | 320.06 | 933.12 | 1024.00 |
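
The "Compute throughput" row is consistent with the other rows of the table: each value equals Total SPs × 2 floating-point operations per FMA × GPU clock rate. The sketch below reproduces that row from the table's own columns. The function name `fma_gflops` is illustrative, and the formula is a reading of the reported numbers, not a statement about how Mixbench computes them internally.

```python
def fma_gflops(total_sps, clock_mhz):
    """Theoretical single-precision FMA throughput in GFlops.

    Assumes one FMA (counted as 2 floating-point operations) per SP per cycle.
    """
    return total_sps * 2 * clock_mhz / 1000.0


# (Total SPs, GPU clock rate in MHz), copied from the table above.
devices = {
    "NVIDIA GeForce GTX 1080 Ti": (3584, 1721),
    "Tesla T4": (2560, 1590),
    "NVIDIA A30": (7168, 1440),
    "HYGON Z100": (3840, 1319),
}

for name, (sps, clock) in devices.items():
    print(f"{name:28s} {fma_gflops(sps, clock):10.2f} GFlops")
```

These are peak figures derived from the reported device properties, not measured throughput.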

Disk Read/Write Speed
Benchmark program: fio-3.7
Command: `fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename={} --bs=4ki --iodepth=64 --size=400Mi --readwrite={method} --rwmixread={rwrate}`,
where method = randrw | rw, rwrate = 0 | 75 | 100, and filename points to a location on the disk under test.
| Spec | r/w | loginNode:/ | loginNode:/home, /scratch |
|---|---|---|---|
| randrw, 75% read + 25% write | read | 1.5 MiB/s | 3.4 MiB/s |
| randrw, 75% read + 25% write | write | 0.5 MiB/s | 1.1 MiB/s |
| randrw, 100% read | read | 4.1 MiB/s | 12.5 MiB/s |
| randrw, 100% write | write | 1.4 MiB/s | 1.6 MiB/s |
| rw, 75% read + 25% write | read | 21.8 MiB/s | 27.5 MiB/s |
| rw, 75% read + 25% write | write | 7.6 MiB/s | 9.4 MiB/s |
| rw, 100% read | read | 176.0 MiB/s | 407.0 MiB/s |
| rw, 100% write | write | 161.0 MiB/s | 14.0 MiB/s |
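
The rows of this table correspond to the combinations of `method` and `rwrate` in the fio command given earlier. The sketch below enumerates those six command lines without running them. The helper names and the test file path `/scratch/fio_test.bin` are hypothetical; every fio option is copied verbatim from the command in this section.

```python
import itertools
import shlex

METHODS = ("randrw", "rw")   # values of {method}
RW_RATES = (0, 75, 100)      # values of {rwrate}, i.e. the percentage of reads


def fio_command(filename, method, rwrate):
    """Build the argv list for one fio run with the options used in this section."""
    return [
        "fio",
        "--randrepeat=1",
        "--ioengine=libaio",
        "--direct=1",
        "--gtod_reduce=1",
        "--name=test",
        f"--filename={filename}",
        "--bs=4ki",
        "--iodepth=64",
        "--size=400Mi",
        f"--readwrite={method}",
        f"--rwmixread={rwrate}",
    ]


def all_commands(filename):
    """Return one argv list per (method, rwrate) combination."""
    return [
        fio_command(filename, method, rwrate)
        for method, rwrate in itertools.product(METHODS, RW_RATES)
    ]


# Hypothetical target file; point it at the disk to be measured.
for argv in all_commands("/scratch/fio_test.bin"):
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` on a machine where fio is installed.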