Zhaoxin’s 12- and 16-Core CPUs Tested: Centaur Lives On

Zhaoxin, a China-based CPU developer with an x86 license, has yet to formally introduce its next-generation KaiSheng KH-40000 datacenter processors with up to 16 cores, but it has already begun submitting benchmark results to the Geekbench 5 database. The new CPUs show noticeable microarchitectural performance improvements over their predecessors but still fall far short of modern CPUs from AMD and Intel.

Mysterious CPUs

Zhaoxin, co-owned by Via Technologies and the Shanghai Municipal Government, has been gradually leveraging microarchitectures designed by Via (or rather by Centaur) since the mid-2010s. Its upcoming KaiSheng KH-40000 series processors for datacenters are based on the CentaurHauls microarchitecture, which some claim resembles Intel’s Haswell microarchitecture from 2013.

The KaiSheng KH-40000/16 and KaiSheng KH-40000/12 CPUs run at 2.20 GHz, have 16 and 12 cores, and are equipped with 32MB and 24MB of L3 cache, respectively. In addition, the 16-core model appears to support simultaneous multithreading (SMT), so it can process up to 32 threads concurrently, assuming that Geekbench 5 reads its capabilities correctly. Judging by the specifications of the KaiSheng KH-40000/16 and KaiSheng KH-40000/12 published in the Geekbench 5 database, these CPUs look very similar to Centaur’s never-released CHA processor, which surfaced earlier this year.

There are differences, though: CHA had eight cores, did not support SMT, and was designed for TSMC’s N16 node, whereas KaiSheng KH-40000 has up to 16 cores, appears to support SMT, and is believed to be designed for TSMC’s N7 fabrication process. Furthermore, the processor IDs of both KH-40000 CPUs read ‘CentaurHauls Family 7 Model 11 Stepping 3,’ whereas the processor ID of Centaur’s CHA is ‘CentaurHauls Family 6 Model 71 Stepping 2,’ so the CPUs in question use different silicon.

What is odd, though, is that both CHA and KH-40000 operate at 2.20 GHz. If we did not know the CPU IDs, we could speculate that the KH-40000/16 uses two eight-core CHA dies produced on TSMC’s N16 node and linked by an interconnect, since two such dies would also add up to 16 cores and 32MB of L3 cache.

Mediocre Performance

For Zhaoxin, CentaurHauls should be a significant microarchitectural advancement over its LuJiazui microarchitecture from 2019. Furthermore, the higher core count should make KaiSheng KH-40000 CPUs more competitive in the server market. So, let’s look at the performance numbers submitted by the CPU developer.

| | Zhaoxin KH-40000/16 | Zhaoxin KH-40000/12 | Centaur CHA | Zhaoxin KX-U6780A | AMD FX-8350 | Intel Core i9-12900K | AMD Ryzen 9 5950X |
|---|---|---|---|---|---|---|---|
| General specifications | 16C/32T, 2.20 GHz, 32MB L3 | 12C/12T, 2.20 GHz, 24MB L3 | 8C/8T, 2.20 GHz, 16MB L3 | 8C/8T, 2.70 GHz, 8MB L3 | 4C/8T | 8P+8E, 3.20-5.10 GHz, 30MB L3 | 16C, 3.40-5.0 GHz, 64MB L3 |
| Microarchitecture | CentaurHauls | CentaurHauls | CentaurHauls | LuJiazui | Bulldozer/Piledriver | Golden Cove + Gracemont | Zen 3 |
| OS | UnionTech OS DT 20 Pro | Windows 10 Pro | Windows 10 Pro | Windows 10 Pro | ? | Windows 11 Pro | Windows 10 Pro |
| Single-Core Integer | 450 | 439 | 476 | 366 | 670 | 1830 | 1435 |
| Single-Core Float | 559 | 538 | 541 | 318 | 607 | 2189 | 1881 |
| Single-Core Crypto | 1039 | 934 | 782 | 583 | 1040 | 6064 | 4089 |
| Single-Core Score | 512 | 493 | 511 | 362 | 670 | 2149 | 1702 |
| Multi-Core Integer | 9293 | 3452 | 3307 | 2364 | 3570 | 20631 | 16695 |
| Multi-Core Float | 11875 | 4176 | 3723 | 2089 | 3563 | 23205 | 18695 |
| Multi-Core Crypto | 5233 | 2119 | 4825 | 3390 | 2431 | 17413 | 8145 |
| Multi-Core Score | 9915 | 3603 | 3508 | 2333 | 3511 | 21242 | 16868 |
| Link | https://browser.geekbench.com/v5/cpu/15706425 | https://browser.geekbench.com/v5/cpu/16875254 | https://browser.geekbench.com/v5/cpu/12878360 | https://browser.geekbench.com/v5/cpu/12878360 | https://browser.geekbench.com/v5/cpu/15900997 | https://browser.geekbench.com/v5/cpu/15911328 | https://browser.geekbench.com/v5/cpu/9506672 |

When it comes to single-threaded performance, Zhaoxin’s (or Centaur’s) CentaurHauls microarchitecture significantly outpaces the company’s previous-generation LuJiazui microarchitecture in both integer (by 22%) and floating-point (by 75%) workloads, even though the new CPU runs at 2.20 GHz while the older one runs at 2.70 GHz. The FPU performance uplift seems rather dramatic, but one should remember that we are dealing with a synthetic benchmark.
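The uplift figures can be reproduced from the table with a few lines of Python. This is a minimal sketch: the dictionary names are hypothetical, and the "per GHz" column is a naive normalization that assumes scores scale linearly with the listed clock speed, which real CPUs only approximate. The raw uplifts for the KH-40000/16 come out at about 23% (integer) and 76% (float), in line with the figures above within rounding.

```python
# Geekbench 5 single-core sub-scores and listed clocks, copied from the table above.
kh40000_16 = {"integer": 450, "float": 559, "clock_ghz": 2.20}
kx_u6780a = {"integer": 366, "float": 318, "clock_ghz": 2.70}


def uplift_pct(new: float, old: float) -> float:
    """Return how many percent `new` is above `old`."""
    return (new / old - 1.0) * 100.0


results = {}
for metric in ("integer", "float"):
    raw = uplift_pct(kh40000_16[metric], kx_u6780a[metric])
    # Naive clock normalization: score per GHz of the listed clock.
    # Assumes linear scaling with frequency, which is only an approximation.
    per_ghz = uplift_pct(
        kh40000_16[metric] / kh40000_16["clock_ghz"],
        kx_u6780a[metric] / kx_u6780a["clock_ghz"],
    )
    results[metric] = (raw, per_ghz)
    print(f"{metric}: +{raw:.1f}% raw, +{per_ghz:.1f}% per GHz")
```

Because the new chip runs at a lower clock, the per-GHz gap (roughly +51% integer and +116% float) is larger than the raw one; treat it as indicative only.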

While the new microarchitecture is significantly better than the preceding one, KaiSheng KH-40000 CPUs with 12 and 16 cores cannot compete with modern CPUs. Moreover, their single-threaded performance is even lower than that of AMD’s ill-fated Bulldozer/Piledriver architecture from 2012.
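A quick way to size the single-threaded gap is to express the KH-40000/16’s overall single-core score as a fraction of each rival’s. The sketch below uses only the table’s numbers; the variable names are illustrative.

```python
# Geekbench 5 overall single-core scores from the table above.
single_core = {
    "Zhaoxin KH-40000/16": 512,
    "AMD FX-8350": 670,
    "AMD Ryzen 9 5950X": 1702,
    "Intel Core i9-12900K": 2149,
}

baseline = single_core["Zhaoxin KH-40000/16"]
# Fraction of each rival's single-core score that the KH-40000/16 reaches.
relative = {
    name: baseline / score
    for name, score in single_core.items()
    if name != "Zhaoxin KH-40000/16"
}
for name, ratio in relative.items():
    print(f"KH-40000/16 reaches {ratio:.0%} of the {name} single-core score")
```

By this measure the chip reaches roughly three quarters of the FX-8350’s score and under a third of either modern part’s.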

As for multi-threaded performance, we see a rather odd advantage that Zhaoxin’s 16-core KaiSheng KH-40000/16 with SMT has over the 12-core KaiSheng KH-40000/12. In theory, the 16C/32T chip can process 2.67 times as many threads as its 12C/12T sibling, but achieving a 2.67X speedup would require every SMT thread to perform like a full core, a level of SMT efficiency we have never seen from any well-known CPU microarchitecture. Yet its actual performance advantage exceeds even this hypothetical 2.67X (2.69X in integer, 2.84X in float). Since one CPU has only four more cores than its rival, yet its performance is almost three times higher, we believe that factors beyond the number of cores are at play.
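The scaling argument can be made concrete with a short Python sketch built on a simple model: both chips are listed at 2.20 GHz, so core count and thread count are treated as the only variables. The core ratio is the expected speedup without SMT gains, and the thread ratio is an idealized ceiling that assumes every SMT thread performs like a full core. Variable names are illustrative.

```python
# Core/thread counts and Geekbench 5 multi-core sub-scores from the table above.
# Both chips are listed at 2.20 GHz, so clock speed drops out of the comparison.
kh16 = {"cores": 16, "threads": 32, "integer": 9293, "float": 11875, "score": 9915}
kh12 = {"cores": 12, "threads": 12, "integer": 3452, "float": 4176, "score": 3603}

# Expected speedup if performance scaled only with physical cores.
core_ratio = kh16["cores"] / kh12["cores"]
# Idealized ceiling if every SMT thread were as fast as a full core.
thread_ratio = kh16["threads"] / kh12["threads"]
observed = {m: kh16[m] / kh12[m] for m in ("integer", "float", "score")}

print(f"core ratio: {core_ratio:.2f}x, thread ratio: {thread_ratio:.2f}x")
for metric, ratio in observed.items():
    print(f"{metric}: {ratio:.2f}x, above thread ceiling: {ratio > thread_ratio}")
```

All three observed ratios (about 2.69X, 2.84X, and 2.75X overall) land above the 2.67X ceiling, which is why the gap cannot be explained by cores and threads alone.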

Keeping in mind that the Windows 10/11 scheduler does not always handle unfamiliar multi-core CPUs optimally, and that the 16-core chip was tested under UnionTech OS DT 20 Pro rather than Windows, we believe that the results of the 12-core KaiSheng KH-40000/12 obtained under Windows 10 Pro do not reflect its true potential.

Yet, even under Windows 10 Pro and without SMT, CentaurHauls is substantially faster than LuJiazui: comparing the eight-core Centaur CHA with the eight-core KX-U6780A, the new microarchitecture is 40% faster in multi-threaded integer and 78% faster in multi-threaded floating-point workloads. The problem is that the absolute performance numbers demonstrated by both the KaiSheng KH-40000 and Centaur CHA CPUs are low by today’s standards.
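The 40% and 78% figures correspond to an eight-core versus eight-core comparison between Centaur CHA and the KX-U6780A, both listed as 8C/8T under Windows 10 Pro. A minimal sketch reproducing them from the table (names are illustrative):

```python
# Multi-core Geekbench 5 sub-scores from the table above (both chips 8C/8T, Windows 10 Pro).
cha = {"integer": 3307, "float": 3723}        # Centaur CHA, 2.20 GHz
kx_u6780a = {"integer": 2364, "float": 2089}  # Zhaoxin KX-U6780A, 2.70 GHz

# Percentage by which CHA's score exceeds the KX-U6780A's for each metric.
gains = {m: (cha[m] / kx_u6780a[m] - 1.0) * 100.0 for m in cha}
for metric, gain in gains.items():
    print(f"multi-core {metric}: +{gain:.0f}%")
```

Using equal core counts keeps the comparison about the microarchitecture rather than the number of cores, although the clock speeds still differ.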

Interestingly, the multi-threaded performance of Zhaoxin’s 12-core KaiSheng KH-40000/12 under Windows and without SMT is comparable to that of AMD’s FX-8350 (four modules, eight threads), which AMD once marketed as an eight-core CPU. Performance on par with a decade-old processor can hardly be called competitive today, at least in Geekbench 5, which is admittedly not the best benchmark.
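To check how close the two chips really are, a short sketch can divide each KH-40000/12 multi-core sub-score by the FX-8350’s. The overall scores differ by only about 3%, while the individual sub-scores diverge in both directions. Names are illustrative.

```python
# Multi-core Geekbench 5 sub-scores from the table above.
kh12 = {"integer": 3452, "float": 4176, "crypto": 2119, "score": 3603}
fx8350 = {"integer": 3570, "float": 3563, "crypto": 2431, "score": 3511}

# Ratio > 1 means the KH-40000/12 scores higher than the FX-8350.
ratios = {metric: kh12[metric] / fx8350[metric] for metric in kh12}
for metric, ratio in ratios.items():
    print(f"{metric}: KH-40000/12 at {ratio:.2f}x of FX-8350")
```

The Zhaoxin part leads in floating point but trails in integer and crypto, which roughly cancels out in the overall score.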

Some Thoughts

While 12 and 16 cores are reasonable configurations for desktops and entry-level servers, Zhaoxin’s 12- and 16-core CPUs do not deliver performance comparable to that of AMD and Intel processors with the same core counts. Judging only by Geekbench 5 scores under Windows, Zhaoxin seems to be about a decade behind AMD and Intel in performance. Even if Zhaoxin enables SMT on its upcoming CentaurHauls-based CPUs for client and server applications and Windows ‘learns’ how to use those cores properly, the KaiSheng KH-40000/16 will still deliver only about half the multi-threaded performance of current 16-core processors from AMD and Intel (47% of the Core i9-12900K’s multi-core score and 59% of the Ryzen 9 5950X’s).
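To put the remaining gap in numbers, a short Python sketch can compare the KH-40000/16’s overall multi-core score with those of the two 16-core rivals in the table. It uses only the listed Geekbench 5 results; names are illustrative.

```python
# Overall Geekbench 5 multi-core scores from the table above.
kh16_multi = 9915
rivals = {"Intel Core i9-12900K": 21242, "AMD Ryzen 9 5950X": 16868}

shares = {}
for name, score in rivals.items():
    shares[name] = kh16_multi / score  # fraction of the rival's score
    gap = score / kh16_multi           # how many times higher the rival scores
    print(f"{name}: KH-40000/16 reaches {shares[name]:.0%} of its score ({gap:.2f}x gap)")
```

The result is about 47% of the Core i9-12900K’s score and about 59% of the Ryzen 9 5950X’s, that is, gaps of roughly 2.14X and 1.70X.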