The Plus Things Change, The Plus Things Stay The Same
October 23, 2023 Timothy Prickett Morgan
Like nearly all of you, I have never read the autobiographical romance novels of Jean-Baptiste Alphonse Karr, or read the back issues of the satirical Le Figaro newspaper – now nearly two centuries old – from the years when Karr was editor of that paper, which today is aimed at the upper middle class and is still one of the papers of record for France.
But like nearly all of you, I am very familiar with one of the witticisms that came out of Karr’s pen: “Plus ça change, plus c’est la même chose.” Which translates into American as something akin to: “The more things change, the more things stay the same.” I like a more literal translation myself, using my high school and college French: “The more it changes, the more it’s the same thing.” There is more of an exasperated tone in that one, which I am fairly sure, based on gut alone, was the feeling that Karr was trying to evoke.
But that is just a guess made mostly for the sake of humor and conversation.
In French, plus means “more,” and it is the word that has come to mean addition in mathematics as well as sizes for perfectly normal looking people. Clothing sizes are insane and are seemingly designed to cause women grief that they neither warrant nor deserve. Men get called “big” and “tall,” and while it is often true that some men are taller than others or bigger than others – or both at the same time – there is no connotation of the “F” word in these terms. As a man with a wife and three daughters, I object to this nonsense. And to the fact that women’s clothing is made of thinner material and often doesn’t have pockets. (I get that latter bit – women want things to fit a certain way and pockets interrupt the lines.) And I am, by the way, thin and tall and that can be as annoying as being big and tall must be. The assumption that you have to be big as well as tall has meant that my clothes have never properly fit for most of my life.
In the Power Systems arena, “plus” has a very important and useful connotation. A Power processor designated with a “+” after it – Power4+, Power5+, Power6+, Power7+, the ill-fated Power8+, and the never seen Power9’ after IBM shifted from “plus” to “prime” as a designator – means a half step between CPU generations that nonetheless brings some price/performance advantages to Power Systems customers halfway through the product cycle.
Historically, the “plus” versions of the Power CPUs were a designation that a process shrink, as well as perhaps a tiny tweak to the microarchitecture, had happened – what Intel called “ticking” and “tocking” in its two-cycle method of updating its Xeon server chips for the past couple of decades, before it forgot how to tick and got stuck tocking. IBM – constrained by its foundry partners as well as the relatively small market for Power chips in terms of unit volumes – got stuck ticking and tocking at the same time, as chip makers often did in the 1980s and 1990s. You can see it all in this OpenPower roadmap from 2015:
With the PowerPC AS/400s from the 1990s as well as the RS/6000 workstation and server processors from the same time, IBM usually had a big change in CPU design and a process shrink at the same time. This entailed a certain amount of risk, particularly at a cadence of one or two years. In 2001, with the launch of the converged AS/400-RS/6000 Power4 processor, IBM did a lateral move with its own 180 nanometer manufacturing processes used with the PowerPC AS “S-Star” design (which is incorrectly called the RS64IV chip, giving credit to IBM Austin that is IBM Rochester’s due) and did a completely new implementation of the instruction set and a new core – and also put two of them on a die. The Power4+ was a shrink to 130 nanometers and a boost in clock speed from 1.1 GHz with the Power4 chip to 1.3 GHz with the Power4+ chip. Power5 was a lateral move to 130 nanometers with a whole new Power5 core, followed by a Power5+ shrink to 90 nanometers and a doubling up to two chips per socket to provide a huge increase in performance.
Things got weird with the Power6 and Power6+ at the end of the 2000s, as we explained here in April 2009 in Come On Out, Power6+, You Win and IBM Launches Power6+ Servers–Again and in May 2009 in New Power6+ Iron: The Feeds and Speeds. IBM had a plan for a Power6 and then a process shrink for Power6+, but some features were pulled into Power6 and the expected performance bump and process shrink never happened as planned. It looks to us like IBM did not do a lateral move at 90 nanometers from Power5+ to Power6, but instead tried to do an architecture change and a process change at the same time, and then only got 70 percent to 80 percent of the clock speed it expected from the 65 nanometer process it used for both Power6 and Power6+ chips. That’s ancient history, but it does illustrate that IBM was trying to push performance to coincide with the dwindling sales of RISC/Unix boxes by Sun Microsystems and Hewlett Packard and the rise of the Intel Xeon processors in the datacenter. The roadmap above does not say Power6+, but we can assure you that a lot of what was called Power6 was really Power6+.
With the Power7 lineup, IBM’s chip architects went nuts and made full use of the shrink to 45 nanometers, changing the architecture pretty radically – 4X the number of cores, more simultaneous multithreading per core, lots more L3 cache implemented in embedded DRAM brought onto the die – to just crush Sun and HP and keep pace more or less with Intel’s X86 processors. Power7+ was a shrink to 32 nanometer processes with modest tweaks to the architecture. This is something that Intel chief executive officer Pat Gelsinger – who ran Intel’s Data Center Group in the 2000s, was the company’s first chief technology officer, and invented the tick-tock method of chip evolution at Intel – would have approved of.
With the Power8 and Power9, IBM did new designs and process shrinks and also had plus-style upgrades in the works but never really put them on the roadmaps so as to make a formal commitment to them. Power8+ was supposed to have enhanced GPU acceleration support and Power9’ was supposed to have an early edition of the differential memory used with the Power10 chip to radically boost memory throughput on the Power9 design. Both of these plus steps were engineered but never made it into the field.
With the Power10 chip, IBM had a lot of time to make architectural changes because of the failure of GlobalFoundries to deliver a working 10 nanometer or 7 nanometer process, which forced Big Blue to switch to Samsung as its foundry to get a 7 nanometer process (and gave Samsung its first complex, high-performance transistor design suitable for a server chip). And so IBM once again did a reimplementation of the ISA as well as the cores and has delivered excellent performance.
But Power10 was launched in July 2022, and that was more than a year ago, and we have all of 2024 and heaven only knows how much of 2025 to get through before there will be a Power11 chip that delivers new features and, importantly, better bang for the buck compared to Power10 and thus keeps pace with the ever-improving X86 and now Arm architectures out there in the datacenters of the world. (And thank you once again for calling it Power10 and not POWER10. There is no need to shout.)
Simply put, three or four years between processor generations is too long. We need to give customers something they want to buy and partners something they want to sell without disrupting the Power Systems server lineup. No one is suggesting anything crazy like requiring new sockets or new systems.
We think there are some interesting options. First, to make a Power10+ chip, IBM could sort through the bins and get higher core count processors out the door. There are 16 actual cores on a Power10 die, and it is an excellently designed chip, as we said here, and depending on which processor feature you are talking about, there is anywhere from 1 to 12 of them activated. In a lot of cases, it is only 4, 8, or 10 cores out of 16 cores, which is pretty low yield the way we do fractions. Say what you will, but this implies that the yield on Samsung’s 7 nanometer extreme ultraviolet (EUV) process could not have been very good in 2022, but still better than the non-existent 7 nanometer EUV processes from GlobalFoundries or Intel. If IBM could sell a 16-core Power10, you bet it would be doing so. As it is, Big Blue only talked about 15 cores in its roadmap, and it is delivering a max of 12 cores. IBM can put two of these chips in a single socket, or use a coreless second socket as an I/O extender, which is very clever indeed. But by the summer of next year, Samsung will have refinements to its very mature 7 nanometer 7LPP process and a mature 5 nanometer process (SF5E), as well as 4 nanometer (SF4E and SF4X), 3 nanometer (SF3E and SF3X), 2 nanometer, and 1.4 nanometer processes in the works for Power11 and Power12 chips, should they materialize. We think IBM will use SF3X for Power11, and cram a lot more stuff into the cores.
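To put rough numbers on those core-count fractions, here is a minimal back-of-envelope sketch in Python. It assumes the only thing that matters is how many of the 16 physical cores on a Power10 die end up activated; the real binning math, which IBM does not publish, also involves clock speed and voltage bins, so treat this as illustrative only.

```python
# Back-of-envelope look at Power10 core activation as a fraction of the die,
# assuming 16 physical cores per die (per the discussion above) and treating
# "yield" as simply activated cores divided by physical cores. The real
# binning calculus is more involved than this.
PHYSICAL_CORES = 16

for activated in (4, 8, 10, 12, 15, 16):
    fraction = activated / PHYSICAL_CORES
    print(f"{activated:2d} of {PHYSICAL_CORES} cores active = {fraction:6.1%} of the die doing work")
```

Even the 12-core top bin leaves a quarter of the die dark, which is the point we are making about yields.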
But we need a bridge between Power10 and Power11, and that is a Power10+ chip. So, IBM, sort through the bins and find the cores that work between 12 and 16. And where customers had four cores, now give them six. Where they had six, now give eight or ten. If you want to be really generous, like Intel had to be during its “Cascade Lake” Xeon SP processor phase when it was stuck at 14 nanometers two and three years ago, give away the extra cores for free. If you can add cores and increase clock speeds, and still stay at 7 nanometers, do that. What is the real cost? It is not like spending a few hundred million dollars on new masks. It is like asking Samsung to do the job that it is supposed to be doing and letting customers benefit from whatever is left of Moore’s Law.
Make companies want to buy new iron. Make them an offer they can’t pass up. (Instead of making them an offer they can’t refuse.) Be generous and show the IBM i and AIX bases some love. There are on the order of maybe 160,000 IBM i and AIX customers in the world, and maybe close to 600,000 machines and something on the order of 2 million logical partitions. And at any given time, probably a sixth to a seventh of them want to do an upgrade. Make the money that is on the table for what is probably around 20 months before the Power11 launch. There is probably something on the order of 140,000 machines that need to be upgraded between now and then, or consolidated down to bigger iron with LPARs.
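For what it is worth, here is the back-of-envelope arithmetic behind that upgrade opportunity, as a minimal sketch in Python. The installed base figures are the rough estimates above, and the assumption that the sixth-to-seventh share represents roughly a year of upgrade demand that scales linearly over a 20-month window is ours.

```python
# Rough sizing of the Power10+ upgrade opportunity, using the estimates above:
# roughly 600,000 IBM i and AIX machines in the field, with a sixth to a
# seventh of the base representing about a year's worth of upgrade demand,
# stretched over the roughly 20 months before a Power11 launch.
# Every number here is an estimate, not IBM data.
machines_installed = 600_000
months_to_power11 = 20

for share in (1 / 7, 1 / 6):
    upgrades_per_year = machines_installed * share
    upgrades_in_window = upgrades_per_year * (months_to_power11 / 12)
    print(f"upgrade share {share:.3f}: about {upgrades_in_window:,.0f} machines over {months_to_power11} months")
```

That works out to somewhere around 143,000 to 167,000 machines, which is right in the ballpark of the 140,000 we are talking about.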
Next, add support for the CXL 3.0 protocol (largely from Intel) and the NVLink 4 protocol (from Nvidia) to the “BlueLink” I/O ports on the Power10 design.
Right now, the NVLink 4.0 ports might be the most important feature to add, given the craze over generative AI and the scarcity of Nvidia GPUs. Anything that can help improve the utilization of GPUs means companies will need fewer of them, and at $30,000 to $40,000 a pop for the “Hopper” H100 GPUs, this is big money. And the utilization of these GPUs is not as good as it could be because the GPUs have only 80 GB or 96 GB of HBM3e memory with somewhere north of 3 TB/sec of aggregate memory bandwidth. Nvidia has tightly coupled its “Grace” 72-core Arm server CPU with Hopper essentially to provide it with a DRAM memory buffer, but that only weighs in at 480 GB (of a total of 512 GB on the CPU card). IBM Power10 chips support 4 TB of memory. With DDR4 memory running at 3.2 GHz, IBM can deliver 410 GB/sec of memory bandwidth across the OpenCAPI Memory Interface (OMI) differential memory ports. If IBM shifted to DDR5 memory – which it can do because the protocol is on the memory stick, not in the OMI controller – running at 6.4 GHz, then it would be able to drive 820 GB/sec per socket. If IBM shifted a Power10 to use GDDR6 graphics memory, which it showed off in some Power10 presentations back in 2020, it could drive 800 GB/sec of bandwidth with memory that is a lot cheaper than HBM3e.
The Grace chip can only push 546 GB/sec out of its 480 GB of LPDDR5X memory.
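The memory bandwidth comparison above is simple arithmetic, and here is a minimal sketch of it in Python. The DDR4 baseline is the published per-socket Power10 figure cited above; the DDR5 number assumes, as we did, that bandwidth over the same OMI ports scales linearly with the DRAM transfer rate, which is a simplification.

```python
# Per-socket memory bandwidth comparison using the figures cited above.
# The DDR4-3200 baseline (410 GB/sec) is the published Power10 number; the
# DDR5-6400 estimate assumes bandwidth over the same OMI ports scales
# linearly with the DRAM transfer rate.
ddr4_rate_gt_s = 3.2                 # DDR4-3200 behind the OMI buffers
ddr4_bw_gb_s = 410                   # GB/sec per Power10 socket

ddr5_rate_gt_s = 6.4                 # hypothetical DDR5-6400 on the same ports
ddr5_bw_gb_s = ddr4_bw_gb_s * (ddr5_rate_gt_s / ddr4_rate_gt_s)

grace_bw_gb_s = 546                  # Nvidia Grace LPDDR5X, as cited above

print(f"Power10 + DDR4-3200: {ddr4_bw_gb_s:7.0f} GB/sec per socket")
print(f"Power10 + DDR5-6400: {ddr5_bw_gb_s:7.0f} GB/sec per socket (estimate)")
print(f"Grace LPDDR5X:       {grace_bw_gb_s:7.0f} GB/sec")
```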
You will remember that the Power9 chip supported NVLink 2, providing coherent memory across the CPUs and GPU complexes and allowing IBM to win big supercomputing deals at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory. With NVLink 4 support, IBM could tightly couple the Power10 CPU and the Nvidia H100 GPU into a shared memory compute complex – perhaps with four GPUs and one CPU – and then use the other BlueLink ports on the Power10 chip and its “memory inception” memory area network to tightly couple these units together to create a GPU cluster that doesn’t use InfiniBand or Ethernet networking at all and that has a lot more aggregate bandwidth and lower latency – possibly at a lower cost and a higher utilization of those GPUs that are so expensive.
It is time for IBM to start thinking about how it can beat Nvidia at its own game and help Nvidia – and possibly AMD – at the same time by also helping AI customers.
Last thing: A Power10+ chip should support the CXL 3.0 memory extension and memory pooling protocol. Those PCI-Express 5.0 ports on the Power10 chip can do this, and the BlueLink ports can probably be made to do it, too, which would allow more memory to be added to the socket (not all that important, given the 4 TB capacity) or for lower-capacity DIMMs to be used to add more capacity for a lot less money (which is very important).
IBM has lots of opportunities here to make some money with the systems architecture foundation it has already laid, and to make existing IBM i and AIX customers happy, too.
RELATED STORIES
It’s A Good Thing For IBM That Samsung Makes Chips And Also Runs A Foundry
The Power10 Machines That Will Take IBM i To 2025 (July 2022)
IBM Reveals Power10 Rollout Plan, Begins Power11
IBM’s Possible Designs For Power10 Systems
Drilling Down Into The Power10 Chip Architecture
Power9 Prime Previews Future Power10 Memory Boost (September 2019)
The Road Ahead For Power Is Paved With Bandwidth
At Long Last, IBM i Finally Gets Power9 (February 2018)
IBM Readies Power8+ For OpenPower Push (The Next Platform, July 2015)
A Peek Inside IBM’s Merchant Power8 Processors
Invader II: New Power7+ Machines Take On Entry X86 Iron (February 2013)
IBM Power7+ Chips Give Servers A Double Whammy
Power7+ Chips Juiced With Faster Clocks, Memory Compression
Some Insight Into Those Future Power7+ Processors
New Power6+ Iron: The Feeds and Speeds (May 2009)