Zhaoxin, a China-based CPU developer with an x86 license, has yet to formally introduce its next-generation KaiSheng KH-40000 processors with up to 16 cores for datacenters. However, it has already started to submit benchmark results to the Geekbench 5 database. The new CPUs show noticeable microarchitecture-related performance improvements over their predecessors but still trail far behind modern CPUs from AMD and Intel.
Zhaoxin, co-owned by Via Technologies and the Shanghai Municipal Government, has been gradually leveraging microarchitectures designed by Via (or rather by Centaur) since the mid-2010s. Its upcoming KaiSheng KH-40000 series processors for datacenters are based on the CentaurHauls microarchitecture, which some claim resembles Intel's Haswell microarchitecture from 2013.
The KaiSheng KH-40000/16 and KaiSheng KH-40000/12 CPUs run at 2.20 GHz, have 16 and 12 cores, and are equipped with 32MB and 24MB of L3 cache, respectively. In addition, the 16-core model seems to support simultaneous multithreading (SMT), so it can process up to 32 threads concurrently, assuming that Geekbench 5 reads its capabilities correctly. Judging by the specifications listed in the Geekbench 5 database, both CPUs look very similar to Centaur's never-released CHA processor unearthed earlier this year.
There are differences though: CHA had eight cores, did not support SMT, and was architected for TSMC’s N16 node, whereas KaiSheng KH-40000 has up to 16 cores, seems to feature SMT, and is believed to be designed for TSMC’s N7 fabrication process. Furthermore, processor IDs of both KH-40000 CPUs read ‘CentaurHauls Family 7 Model 11 Stepping 3’ (1, 2), whereas the processor ID of Centaur’s CHA is ‘CentaurHauls Family 6 Model 71 Stepping 2,’ so the CPUs in question use different silicon.
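The reasoning above boils down to comparing the family and model fields of the two processor IDs. Below is a minimal Python sketch, assuming the IDs always follow the "vendor Family N Model N Stepping N" layout quoted above; `parse_cpu_id` and `same_design` are hypothetical helpers written for illustration, not part of Geekbench or any other tool, and treating matching family/model values as "same silicon" is only a heuristic. For reference, "CentaurHauls" in these strings is the CPUID vendor identification string reported by Centaur-derived processors.

```python
import re
from typing import NamedTuple


class CpuId(NamedTuple):
    vendor: str
    family: int
    model: int
    stepping: int


# Assumed layout of the Geekbench processor ID strings quoted above,
# e.g. "CentaurHauls Family 7 Model 11 Stepping 3".
_ID_PATTERN = re.compile(r"^(\S+) Family (\d+) Model (\d+) Stepping (\d+)$")


def parse_cpu_id(text: str) -> CpuId:
    """Split a processor ID string into vendor, family, model, and stepping."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized processor ID: {text!r}")
    vendor, family, model, stepping = match.groups()
    return CpuId(vendor, int(family), int(model), int(stepping))


def same_design(a: CpuId, b: CpuId) -> bool:
    """Heuristic: same vendor, family, and model suggest the same die design.

    Stepping is ignored because it usually marks a revision of one design.
    """
    return (a.vendor, a.family, a.model) == (b.vendor, b.family, b.model)


kh40000 = parse_cpu_id("CentaurHauls Family 7 Model 11 Stepping 3")
cha = parse_cpu_id("CentaurHauls Family 6 Model 71 Stepping 2")
print(same_design(kh40000, cha))  # False: both family and model differ
```

Ignoring the stepping is a deliberate choice here: two chips that differ only in stepping would normally be revisions of one design, whereas the KH-40000 and CHA differ in both family and model.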
What is odd, though, is that both CHA and KH-40000 operate at 2.20 GHz, so if we did not know the CPU IDs, we could speculate that the KH-40000/16 uses two eight-core CHA dies produced on TSMC's N16 node and glued together using an interconnect.
For Zhaoxin, CentaurHauls should be a significant microarchitectural advancement over its LuJiaZui microarchitecture from 2019. Furthermore, the higher core count should make KaiSheng KH-40000 CPUs more competitive in the server market. So, let's look at the performance numbers submitted by the CPU developer.
| | Zhaoxin KH-40000/16 | Zhaoxin KH-40000/12 | Centaur CHA | Zhaoxin KX-U6780A | AMD FX-8350 | Core i9-12900K | Ryzen 9 5950X |
|---|---|---|---|---|---|---|---|
| General specifications | 16C/32T, 2.20 GHz, 32MB L3 | 12C/12T, 2.20 GHz, 24MB L3 | 8C/8T, 2.20 GHz, 16MB L3 | 8C/8T, 2.70 GHz, 8MB L3 | 4C/8T | 8P + 8E, 3.20-5.10 GHz, 30MB L3 | 16C, 3.40-4.90 GHz, 64MB L3 |
| Microarchitecture | CentaurHauls | CentaurHauls | CentaurHauls | LuJiaZui | Bulldozer/Piledriver | Golden Cove + Gracemont | Zen 3 |
| OS | UnionTech OS DT 20 Pro | Windows 10 Pro | Windows 10 Pro | Windows 10 Pro | Unknown | Windows 11 Pro | Windows 10 Pro |
| Single-Core Integer | 450 | 439 | 476 | 366 | 670 | 1830 | 1435 |
| Single-Core Float | 559 | 538 | 541 | 318 | 607 | 2189 | 1881 |
| Single-Core Crypto | 1039 | 934 | 782 | 583 | 1040 | 6064 | 4089 |
| Single-Core Score | 512 | 493 | 511 | 362 | 670 | 2149 | 1702 |
| Multi-Core Integer | 9293 | 3452 | 3307 | 2364 | 3570 | 20631 | 16695 |
| Multi-Core Float | 11875 | 4176 | 3723 | 2089 | 3563 | 23205 | 18695 |
| Multi-Core Crypto | 5233 | 2119 | 4825 | 3390 | 2431 | 17413 | 8145 |
| Multi-Core Score | 9915 | 3603 | 3508 | 2333 | 3511 | 21242 | 16868 |
When it comes to single-threaded performance, Zhaoxin's (or Centaur's) CentaurHauls microarchitecture significantly outpaces the company's previous-generation LuJiaZui microarchitecture in both integer (by about 23%) and floating-point (by about 76%) workloads, even though the new CPU runs at 2.20 GHz while the older one runs at 2.70 GHz. The FPU performance uplift seems rather dramatic, but one should remember that we are dealing with a synthetic benchmark.
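The uplift figures above come straight from the single-core subscores in the table, and the clock gap can be folded in with a crude per-GHz normalization. Here is a minimal Python sketch; `uplift_percent` is a hypothetical helper, and the per-GHz numbers assume that scores scale linearly with nominal clock speed, which real CPUs only approximate.

```python
# Geekbench 5 single-core subscores from the table above.
KH40000_16 = {"integer": 450, "float": 559}  # CentaurHauls, 2.20 GHz
KX_U6780A = {"integer": 366, "float": 318}   # LuJiaZui, 2.70 GHz


def uplift_percent(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for workload in ("integer", "float"):
    raw = uplift_percent(KH40000_16[workload], KX_U6780A[workload])
    # Crude per-clock estimate: divide each score by its nominal frequency.
    # Assumes linear scaling with clock speed, which is only an approximation.
    per_ghz = uplift_percent(KH40000_16[workload] / 2.20, KX_U6780A[workload] / 2.70)
    print(f"{workload}: {raw:.0f}% raw, {per_ghz:.0f}% per GHz")
```

With these inputs, the raw uplift works out to roughly 23% (integer) and 76% (float), and the per-GHz estimate to roughly 51% and 116%, though the latter leans heavily on the linear-scaling assumption.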
While the new microarchitecture is significantly better than its predecessor, KaiSheng KH-40000 CPUs with 12 and 16 cores cannot compete with any modern CPU. Moreover, their single-threaded performance is even lower than that of AMD's ill-fated Bulldozer-family FX-8350 from 2012.
As for multi-threaded performance, the 16-core KaiSheng KH-40000/16 with SMT has a rather odd advantage over the 12-core KaiSheng KH-40000/12. In theory, the 16C/32T chip can process 2.67 times as many threads as its 12C/12T sibling, and even that would require perfect SMT scaling, which we have never seen from any well-known CPU microarchitecture. Yet its actual performance advantage exceeds even this hypothetical 2.67X: 2.69X in integer and 2.84X in floating-point workloads. Since one CPU has only four more cores than the other yet performs almost three times better, we believe that factors beyond core count are at play.
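The comparison above can be reproduced by setting the ideal thread ratio against the measured multi-core ratios. A minimal Python sketch using only the table's numbers; the "ideal" here is the hypothetical best case in which every SMT thread counts as a full core.

```python
# Multi-core Geekbench 5 subscores from the table above.
kh_16 = {"threads": 32, "integer": 9293, "float": 11875}  # 16C/32T, UnionTech OS
kh_12 = {"threads": 12, "integer": 3452, "float": 4176}   # 12C/12T, Windows 10 Pro

# Best case if every hardware thread counted as a full core (perfect SMT scaling).
ideal = kh_16["threads"] / kh_12["threads"]
print(f"ideal thread ratio: {ideal:.2f}x")

for workload in ("integer", "float"):
    measured = kh_16[workload] / kh_12[workload]
    verdict = "exceeds" if measured > ideal else "stays within"
    print(f"{workload}: {measured:.2f}x ({verdict} the ideal)")
```

Both measured ratios land above the 2.67X ceiling, which is why core and thread counts alone cannot explain the gap.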
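Keeping in mind that the two chips were tested under different operating systems (UnionTech OS DT 20 Pro for the 16-core model, Windows 10 Pro for the 12-core one) and that the Windows 10/11 scheduler does not always handle unfamiliar multi-core CPUs optimally, we believe that the KaiSheng KH-40000/12 results obtained on Windows 10 Pro do not reflect its true potential.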
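One way to see why the KH-40000/12 result looks off is to divide each multi-core integer score by the physical core count. Assuming, as the table suggests, that CHA and the KH-40000/12 use comparable cores at the same 2.20 GHz, one would expect similar per-core figures. This Python sketch is only an illustration; `per_core` is a hypothetical helper, and it ignores SMT threads entirely.

```python
# Multi-core integer subscores and physical core counts from the table above.
chips = {
    "KH-40000/16 (UnionTech OS, SMT)": (9293, 16),
    "KH-40000/12 (Windows 10 Pro)": (3452, 12),
    "Centaur CHA (Windows 10 Pro)": (3307, 8),
}


def per_core(score: int, cores: int) -> float:
    """Multi-core score divided by physical core count (SMT threads ignored)."""
    return score / cores


for name, (score, cores) in chips.items():
    print(f"{name}: {per_core(score, cores):.0f} points per core")
```

The KH-40000/12 lands at about 288 points per core, below the eight-core CHA's roughly 413 and the 16-core model's roughly 581 (the latter also benefiting from SMT), which is consistent with the 12-core chip being held back by something other than its cores.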
Yet, even under Windows 10 Pro and without SMT, CentaurHauls is substantially faster than LuJiaZui in multi-threaded workloads: the eight-core Centaur CHA outperforms the eight-core KX-U6780A by 40% in integer and 78% in floating-point tests, while the 12-core KaiSheng KH-40000/12 leads by about 46% and 100%, respectively. The problem is that the absolute performance numbers demonstrated by both KaiSheng KH-40000 and Centaur CHA CPUs are low by today's standards.
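The multi-threaded gains over LuJiaZui can be computed from the table's multi-core subscores, using the KX-U6780A as the baseline. A minimal Python sketch; `gains_over_baseline` is a hypothetical helper for illustration.

```python
# Multi-core Geekbench 5 subscores from the table above; all three ran Windows 10 Pro without SMT.
BASELINE = {"integer": 2364, "float": 2089}  # Zhaoxin KX-U6780A, 8C/8T, LuJiaZui

CENTAURHAULS_CHIPS = {
    "Centaur CHA (8C/8T)": {"integer": 3307, "float": 3723},
    "KH-40000/12 (12C/12T)": {"integer": 3452, "float": 4176},
}


def gains_over_baseline(result: dict[str, int]) -> dict[str, float]:
    """Percentage improvement of each multi-core subscore over the LuJiaZui baseline."""
    return {w: (result[w] / BASELINE[w] - 1.0) * 100.0 for w in BASELINE}


for chip, result in CENTAURHAULS_CHIPS.items():
    gains = gains_over_baseline(result)
    print(f"{chip}: integer +{gains['integer']:.0f}%, float +{gains['float']:.0f}%")
```

This yields roughly +40% and +78% for CHA and +46% and +100% for the KH-40000/12. Keep in mind that the latter also has 50% more cores than the eight-core KX-U6780A, so the like-for-like comparison is the CHA one.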
Interestingly, the multi-threaded performance of Zhaoxin's 12-core KaiSheng KH-40000/12 under Windows and without SMT is comparable to that of AMD's FX-8350 processor (four modules, eight threads), which the company once marketed as an eight-core CPU. We can hardly call the performance of a decade-old processor competitive by today's standards, at least in Geekbench 5, which is far from the best benchmark.
While 12-core and 16-core configurations seem adequate for desktops and entry-level servers, Zhaoxin's 12 and 16 cores do not deliver performance comparable to that of 12-core or 16-core processors from AMD and Intel. Judging only by Geekbench 5 scores, Zhaoxin seems to be a decade behind AMD and Intel in performance. Even if Zhaoxin enables SMT on its upcoming CentaurHauls-based CPUs (for client and server applications) and Windows 'learns' how to use those cores properly, the KaiSheng KH-40000/16, whose SMT-enabled result under UnionTech OS is already in the table, reaches only about 47% to 59% of the multi-core score of 16-core processors such as the Core i9-12900K and Ryzen 9 5950X.
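To put a number on that gap, we can divide the 16-core rivals' multi-core scores by the KH-40000/16's. This Python sketch only uses the Multi-Core Score row from the table; `gap` is a hypothetical helper, and a single synthetic benchmark says little about real server workloads.

```python
KH40000_16_MULTI = 9915  # KaiSheng KH-40000/16, 16C/32T, UnionTech OS
RIVALS_16_CORE = {"Core i9-12900K": 21242, "Ryzen 9 5950X": 16868}


def gap(rival_score: int, own_score: int = KH40000_16_MULTI) -> tuple[float, float]:
    """Return (rival score as a multiple of ours, our score as a % of the rival's)."""
    return rival_score / own_score, own_score / rival_score * 100.0


for name, score in RIVALS_16_CORE.items():
    times, share = gap(score)
    print(f"{name}: {times:.2f}x the KH-40000/16 score; KH-40000/16 reaches {share:.0f}%")
```

With these scores, the Core i9-12900K comes out about 2.1 times ahead and the Ryzen 9 5950X about 1.7 times ahead.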