128-core CPU arrives: 12-channel DDR5 + PCIe 5.0, claimed to outperform Intel’s top Xeon

Tachyum, a Slovakian startup founded only in 2016, made a splash today by announcing a new 128-core Prodigy processor, which it claims has “performance that surpasses Intel’s fastest Xeon, while power consumption is only one tenth”.

First, some background on Tachyum: the company was established in Slovakia and received $17 million in investment from the Slovak government, but several of its founders are Americans with extensive industry experience.

In particular, CEO Radoslav Danilak has 25 years of experience in the semiconductor industry. He founded SandForce, the once-dominant SSD controller vendor, and also served as its CEO; the company was later acquired and its business ultimately ended up at Seagate. He then founded Skyera to continue his work on SSD controller technology, and Skyera was acquired by Western Digital in 2014.

Tachyum bills Prodigy as the world’s first “universal processor” because a single chip covers general-purpose computing, high-performance computing (HPC), artificial intelligence (AI), deep machine learning (DML), explainable AI, and bio AI. All of these are based on a parallel multi-processor environment, which Tachyum says simplifies the programming model and environment.


The current top model is the Prodigy T6128, which integrates 128 physical cores on a single chip. It uses an out-of-order execution architecture handling 4 instructions per clock cycle, and supports 64-bit addressing, 512-bit vector operations, AI/ML vector and matrix multiplication acceleration, virtualization, and advanced RAS, at clock speeds of up to 4GHz.

In terms of cache, each core has a 32KB L1 instruction cache and a 32KB L1 data cache, both with ECC, and the shared last-level cache is 64MB with DECTED (double-error-correcting, triple-error-detecting) ECC.

In terms of memory, it supports 12 channels of DDR4 and DDR5 at speeds up to DDR5-4800, but only one DIMM per channel. Each DIMM can be up to 512GB, for a maximum total of 6TB, and advanced error correction and RAS are supported.
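The memory figures above can be checked with simple arithmetic. The following Python sketch is only illustrative: the channel count, DIMM size, and DDR5-4800 speed come from the article, while the 8-byte channel width (64 data bits, ECC excluded) is an assumption based on standard DDR DIMMs rather than anything Tachyum has published.

```python
# Sanity check of the quoted memory figures (a sketch, not vendor data).
CHANNELS = 12            # from the article
DIMMS_PER_CHANNEL = 1    # from the article
MAX_DIMM_GB = 512        # from the article
DDR5_MT_S = 4800         # DDR5-4800: mega-transfers per second
BYTES_PER_TRANSFER = 8   # assumption: 64 data bits per channel, ECC bits excluded


def max_capacity_tb(channels=CHANNELS, dimms=DIMMS_PER_CHANNEL, dimm_gb=MAX_DIMM_GB):
    """Total memory capacity in TB (1 TB = 1024 GB)."""
    return channels * dimms * dimm_gb / 1024


def peak_bandwidth_gb_s(channels=CHANNELS, mt_s=DDR5_MT_S, width=BYTES_PER_TRANSFER):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * mt_s * width / 1000


if __name__ == "__main__":
    print(f"Max capacity: {max_capacity_tb():.1f} TB")
    print(f"Peak bandwidth: {peak_bandwidth_gb_s():.1f} GB/s")
```

Under these assumptions, 12 × 512GB works out to the quoted 6TB, and the theoretical peak bandwidth is roughly 460.8 GB/s; real-world throughput would be lower.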

In terms of I/O, it integrates up to 36 PCIe 5.0 controllers with a maximum of 48 lanes, as well as two 400G (400 Gigabit) Ethernet controllers.
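To put the I/O numbers in perspective, the sketch below estimates theoretical throughput. The lane and port counts are taken from the article; the 32 GT/s per-lane rate and 128b/130b encoding are the standard PCIe 5.0 parameters, and the function names are made up for illustration. Protocol overhead beyond line encoding is ignored, so these are upper bounds.

```python
# Rough upper bounds on I/O throughput (a sketch; ignores protocol overhead).
PCIE5_GT_S = 32.0              # PCIe 5.0 raw rate per lane, gigatransfers/s
PCIE_ENCODING = 128 / 130      # 128b/130b line encoding efficiency


def pcie_gb_s(lanes, gt_s=PCIE5_GT_S):
    """Theoretical one-direction PCIe bandwidth in GB/s for a lane count."""
    return lanes * gt_s * PCIE_ENCODING / 8


def ethernet_gb_s(ports, gbit_per_port):
    """Aggregate one-direction Ethernet line rate in GB/s."""
    return ports * gbit_per_port / 8


if __name__ == "__main__":
    print(f"48 PCIe 5.0 lanes: {pcie_gb_s(48):.1f} GB/s per direction")
    print(f"2 x 400G Ethernet: {ethernet_gb_s(2, 400):.1f} GB/s per direction")
```

With the quoted maximum of 48, PCIe tops out at about 189 GB/s in each direction, and the two 400G Ethernet controllers together add up to 100 GB/s of line rate.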

Even more impressive, with all these specifications and thanks to TSMC’s 7nm process, the package measures only 85 × 85 mm, said to be slightly larger than Intel’s LGA2066 Core processors but smaller than AMD’s SP3 Threadripper.

Tachyum has not disclosed Prodigy’s specific architecture, so it is unknown whether it is based on RISC-V, MIPS, or ARM, or was developed in-house. The company says only that it surpasses Intel Xeon in both single-threaded and multi-threaded performance while being smaller than ARM cores.

According to reports, the Prodigy T6128 processor is suited to large-scale supercomputers, big data, and large-scale AI applications, and can deliver 262 TFLOPS of AI training and inference performance and 16 TFLOPS of HPC performance.
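One way to read these throughput claims is to divide them by the core count and clock speed. The Python sketch below does that, assuming all 128 cores run at the full 4GHz. The article does not state the numeric precision behind either figure, and the `fma_units=2` configuration is purely hypothetical, included only to show what a 512-bit vector design could reach on 64-bit floats.

```python
# Per-core view of the quoted throughput claims (a sketch under stated assumptions).
CORES = 128        # from the article
CLOCK_GHZ = 4.0    # from the article; assumes every core sustains peak clock


def ops_per_core_per_cycle(total_tflops, cores=CORES, ghz=CLOCK_GHZ):
    """Operations each core must complete per cycle to reach the total."""
    return total_tflops * 1e12 / (cores * ghz * 1e9)


def vector_flops_per_cycle(vector_bits=512, element_bits=64, fma_units=2):
    """Peak FLOPs per cycle for a core with `fma_units` vector FMA pipes.

    fma_units is a hypothetical value; Tachyum has not disclosed it.
    Each fused multiply-add counts as two floating-point operations.
    """
    lanes = vector_bits // element_bits
    return lanes * 2 * fma_units


if __name__ == "__main__":
    print(f"HPC: {ops_per_core_per_cycle(16):.2f} FLOPs/core/cycle")
    print(f"AI:  {ops_per_core_per_cycle(262):.2f} ops/core/cycle")
    print(f"Hypothetical 2 x 512-bit FMA: {vector_flops_per_cycle()} FLOPs/cycle")
```

The 16 TFLOPS figure implies about 31 FLOPs per core per cycle, which is in the same range as the hypothetical two-pipe 512-bit configuration (32), while the 262 TFLOPS AI figure implies roughly 512 operations per core per cycle and so presumably relies on lower-precision matrix acceleration.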

For development, Tachyum also provides a set of tools, including an FPGA emulator, a software simulator, a binary translator, C/C++/Fortran compilers, a debugger and profiler, and a TensorFlow compiler, all running on Linux.

For customers who do not need a specification as high-end as 128 cores, Tachyum also offers configurations with 64, 32, 24, or 16 cores.

There are two 64-core models. The first is the T864, which supports eight-channel DDR4/DDR5 memory, 72 PCIe 5.0 lanes, two 400G Ethernet controllers, two sets of HBM3 (optional), and 32MB of fully coherent L2/L3 cache. It runs at 4GHz with a core voltage of 0.8V and a thermal design power of 180W, has a die area of 290 square millimeters and a package size of 66 × 66 mm, and can replace single-socket and dual-socket Xeon E7 and Xeon E5 processors.
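The T864’s power and area figures allow a quick density estimate. The sketch below uses only the numbers quoted above (180W TDP, 290 square millimeter die, 66 × 66 mm package) and treats TDP as if it were dissipated entirely by the die, which is a simplifying assumption.

```python
# Power density and die-to-package ratio for the T864 (a rough sketch).
TDP_W = 180.0          # from the article
DIE_MM2 = 290.0        # from the article
PACKAGE_SIDE_MM = 66.0  # from the article (square package)


def power_density_w_per_mm2(tdp_w=TDP_W, die_mm2=DIE_MM2):
    """Average power density, assuming all of the TDP is dissipated by the die."""
    return tdp_w / die_mm2


def die_fraction_of_package(die_mm2=DIE_MM2, side_mm=PACKAGE_SIDE_MM):
    """Share of the package footprint occupied by the die."""
    return die_mm2 / (side_mm * side_mm)


if __name__ == "__main__":
    print(f"Power density: {power_density_w_per_mm2():.2f} W/mm^2")
    print(f"Die share of package: {die_fraction_of_package():.1%}")
```

This gives roughly 0.62 W per square millimeter, with the die occupying under 7% of the package footprint.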

The other is the TH24, dedicated to AI/HPC, with quad-channel DDR5 and/or 32GB of HBM3; the HBM3 can serve as either a cache or independent memory. This model requires high-precision water cooling.

The 32-core model is the T432, with four-channel DDR4, 32 PCIe 4.0 lanes, and two 100G Ethernet controllers. The 16-core model is the T216, with dual-channel DDR4, 32 PCIe 4.0 lanes, and two 50G Ethernet controllers. Both come in small, low-cost packages and are suited to replacing the Xeon E5, Xeon E3, and Xeon D series.

Of course, most of these products are still on paper. At present, only the 64-core T864 has been successfully taped out, and it is expected to be put into mass production this year.
