Intel Core i9 (SKL-X) Review & Benchmarks – 4-channel @ 3200Mt/s Cache & Memory Performance

What is “SKL-X”?

“Skylake-X” (E/EP) is the server/workstation/HEDT version of desktop/mobile Skylake CPU – the 6-th gen Core/Xeon replacing the current Haswell/Broadwell-E designs. It naturally does not contain an integrated GPU but what does contain is more cores, more PCIe lanes and more memory channels (up to 6 64-bit) for huge memory bandwidth.

While it may seem an “old core”, the 7-th gen Kabylake core is not much more than a stepping update with even the future 8-th gen Coffeelake rumored again to use the very same core. But what it does do is include the much expected 512-bit AVX512 instruction set (ISA) that are are not enabled in the current desktop/mobile parts.

SKL-X does not only support DDR4 but also NVM-DIMMs (non-volatile memory DIMMs) and PMem (Persistent Memory) that should revolutionise future computing with no need for memory refresh or immediate sleep/resume (no need to save/restore memory from storage).

In this article we test CPU Cache and Memory performance; please see our other articles on:

Hardware Specifications

We are comparing the top-end desktop Core i9 with current competing architectures from both AMD and Intel as well as its previous version.

CPU Specifications Intel i9 7900X (Skylake-X) AMD Ryzen 1700X Intel i7 6700K (Skylake) Intel i7 5820K (Haswell-E) Comments
TLB 4kB pages
64 4-way / 64 8-way
1536 8-way
64 full-way
1536 8-way
64 4-way / 64 8-way
1536 6-way
64 4-way
1024 8-way
Ryzen has comparatively ‘better’ TLBs than all Intel CPUs.
TLB 2MB pages
8 full-way
1536 2-way
64 full-way
1536 2-way
8 full-way
1536 6-way
8 full-way
1024 8-way
Again Ryzen has ‘better’ TLBs than all Intel versions
Memory Controller Speed (MHz) 800-3300 600-1200 800-4000 1200-4000 Intel’s UNC clock runs higher than Ryzen
Memory Speed (Mhz) Max
3200 / 2667 2400 / 2667 2533 /2667 2133 / 2133 SKL-X can officially go as high as Ryzen and normal SKL @ 2667 but runs happily at 3200Mt/s.
Memory Channels / Width
4 / 256-bit (max 8 / 384-bit) 2 / 128-bit 2 / 128-bit 4 / 256-bit SKL-X has 2 memory controllers each with up to 3 channels each for massive memory bandwidth.
Memory Timing (clocks)
16-18-18-36 6-54-19-4 2T 14-16-16-32 7-54-18-9 2T 16-18-18-36 5-54-21-10 2T 14-15-15-36 4-51-16-3 2T SKL-X can run as tight timings as normal SKL or Ryzen.

Core Topology and Testing

Intel has dropped the (dual) ring bus(es) and instead opted for a mesh inter-connect between cores; on desktop parts this should not cause latency differences between cores (as with Ryzen) but on high-end server parts with many cores (up to 28) this may not be the case. The much increased L2 cache (1MB vs. old 256kB) should alleviate this issue – though the L3 cache seems to have been reduced quite a bit.

Native Performance

We are testing bandwidth and latency performance using all the available SIMD instruction sets (AVX, AVX2/FMA, AVX512) supported by the CPUs.

Results Interpretation: Higher values (GOPS, MB/s, etc.) mean better performance.

Environment: Windows 10 x64, latest AMD and Intel drivers. Turbo / Dynamic Overclocking was enabled on both configurations.

Native Benchmarks Intel i9 7900X (Skylake-X) AMD Ryzen 1700X Intel i7 6700K (Skylake) Intel i7 5820K (Haswell-E) Comments
CPU Multi-Core Benchmark Total Inter-Core Bandwidth – Best (GB/s) 87 [+85%] 47.7 39 46 With 10 cores SKL-X has massive aggregated inter-core bandwidth, almost 2x Ryzen or HSW-E.
CPU Multi-Core Benchmark Total Inter-Core Bandwidth – Worst (GB/s) 19 [+46%] 13 16 17 In worst-case pairs  SKL-X does well but not far away from normal SKL or HSW-E.
CPU Multi-Core Benchmark Inter-Unit Latency – Same Core (ns) 15.2 15.7 16 13.4 [-12%]
Within the same core all modern CPUs seem to have about 15-16ns latency.
CPU Multi-Core Benchmark Inter-Unit Latency – Same Compute Unit (ns) 80 45 [-43%] 49 58 Surprisingly we see massive latency increase almost 2x Ryzen or SKL.
CPU Multi-Core Benchmark Inter-Unit Latency – Different Compute Unit (ns) 131 Naturally Ryzen scores worst when going off-CCX.
It seems the mesh inter-connect between cores has decent bandwidth but much higher latency than the older HSW-E or even the current SKL.
Aggregated L1D Bandwidth (GB/s) 2200 [+3x] 727 878 1150 SKL-X has 512-bit data ports thus massive L1D bandwidth over 2x HSW-E and 3x over Ryzen.
Aggregated L2 Bandwidth (GB/s) 1010 [+81%] 557 402 500 The large L2 caches also have 2x more bandwidth than either HSW-E or Ryzen.
Aggregated L3 Bandwidth (GB/s) 289 392 [+35%] 247 205 The 2 Ryzen L3 caches have higher bandwidth than all Intel CPUs.
Aggregated Memory (GB/s) 69.3 [+2.4x] 28.5 31 42.5 With its 4 channels SKL-X reigns supreme with almost 2.5x more bandwidth than Ryzen.
The widened ports on the L1 and L2 caches allow SKL-X to demolish the competition with over 2x more bandwidth than either Ryzen or older HSW-E; only the smaller L3 cache falters. Its 4 channels running at 3200Mt/s yield huge memory bandwidth that greatly help streaming algorithms. SKL-X is a monster – we can only speculate what the server 6-channel version would score.
Data In-Page Random Latency (ns) 26 [1/2.84x] (4-13-33) 74 (4-17-36) 20 (4-12-21) 25 (4-12-26) SKL-X has comparable lantecy with SKL and HSW-E and much better than Ryzen.
Data Full Random Latency (ns) 75 [-21%] (4-13-70) 95 (4-17-37) 65 (4-12-34) 72 (4-13-52) Full random latencies are a bit higher than expected but on part with HSW-E and better than Ryzen.
Data Sequential Latency (ns) 5.4 [+28%] (4-11-13) 4.2 (4-7-7) 4.1 (4-12-13) 7 (4-12-13) Strangely SKL-X does not do as well as SKL here or Ryzen but at least it beats HSW-E.
If you were hoping SKL-E to match normal SKL that is sadly not the case even at similar Turbo clock they are higher across the board, even allowing Ryzen a win. Perhaps further platform optimisations are needed.
Code In-Page Random Latency (ns) 12 [-27%] (4-14-28) 16.6 (4-9-25) 10 (4-11-21) 15.8 (3-20-29) With code SKL-X performs better though not enough to catch normal SKL.
Code Full Random Latency (ns) 86 [-15%] (4-16-106) 102 (4-13-49) 70 (4-11-47) 85 (3-20-58) Out-of-page code latency takes a bigger hit but nothing to worry about.
Code Sequential Latency (ns) 6.5 [-27%] (4-7-12) 8.9 (4-9-18) 5.3 (4-9-20) 10.2 (3-8-16) Again nothing much changes here.
SKL-X again does not manage to match normal SKL but soundly trounces both Ryzen and its older HSW-E brother, delivering a good result overall. Code access seems to perform more consistently than data for some reason we need to investigate.
Memory Update Transactional (MTPS) 52.2 [+12x] HLE 4.23 32.4 HLE 7 SKL-X with working HLE is over 12-times faster than Ryzen and older HSW-E.
Memory Update Record Only (MTPS) 57.2 [+13.6x] HLE 4.19 25.4 HLE 5.47 SKL-X is king of the hill with nothing getting close.
Yes – Intel has finally fixed HLE/RTL which owners of HSW-E and BRW-E must feel very hard done by considering it was “working” before having it disabled due to the errata. Thus after so many years we have both HLE, RTL and AVX512! Great!

If there was any doubt, SKL-X does not disappoint – massive cache (L1D and L2) aggregate and memory bandwidths with server versions likely even more; the smaller L3 cache does falter though which is a bit of a surprise – the larger L2 caches must have forced some compromises to be made.

Latency is a bit disappointing compared to the “normal” SKL/KBL we have on desktop, but are still better than older HSW-E and also Ryzen competitor. Again the L1 and L2 caches (despite being 4-times bigger) clock latencies are OK with the L3 and memory controller being the source of the increased latencies.

SiSoftware Official Ranker Scores

Final Thoughts / Conclusions

After a strong CPU performance we did not expect the cache and memory performance to disappoint – and it does not. SKL-X is a big improvement over the older versions (HSW-E) and competition with few weaknesses.

The mesh interconnect does seem to exhibit higher inter-core latencies with small increase in bandwidth; perhaps this can be fixed.

The very much reduced L3 cache does disappoint both bandwidth and latency wise; the memory controllers provide huge bandwidth but at the expense of higher latencies.

All in all, if you can afford it, there is no question that SKL-X is worth it. But better wait to see what AMD’s Threadripper has in store before making your choice… 😉

Tagged , , , , , . Bookmark the permalink.

Comments are closed.