Power11 Takes Memory Bandwidth Up To, Well, Eleven
December 2, 2024 Timothy Prickett Morgan
Last week, we went over the roadmaps for the future Power11 processor from IBM and its follow-on, the Power Next chip that we presume will be called Power12 because, you know, history. This week we want to take a little bit of a deeper dive into the Power11 strategy and what this might mean for the systems that Big Blue will be building in the future.
If IBM’s change in strategy with Power Systems since the Power9 generation has not been obvious to you, it perhaps bears pointing out. In the Power8 and Power9 generations, IBM was trying to preserve and extend its supercomputer business, get Power-based processors installed by hyperscalers and cloud builders for infrastructure and database workloads, as well as compete with future X86 and Arm machines in core count and throughput while also stretching out big NUMA boxes to support traditional relational database backends as well as newfangled SAP HANA in-memory databases.
That is a lot of “as wells,” and it is no wonder IBM’s system architects felt they were being pulled in a dozen different directions. IBM had to switch foundries for Power processors from GlobalFoundries (which had bought IBM’s own Microelectronics business) to Samsung at the same time that the HPC market became financially unattractive for Big Blue. That gave the company a chance to wipe the slate clean with Power10, and it did just that, creating a new implementation of the Power instruction set architecture from scratch while also reworking its memory subsystem and adding matrix math units to support AI workloads and other HPC jobs that are amenable to such math.
And so, we got a Power10 chip that had 16 cores instead of 48 cores, and the cores do a lot more stuff, and do it more efficiently, than their predecessors, with a lot more memory bandwidth and memory capacity than any other processor can deliver with standard DDR memories – and with OpenCAPI Memory Interface (OMI) signaling that does not require DDR memory to spin at white-hot, upper-bin clock frequencies to reach that high bandwidth.
When talking about the Power11 processors, perhaps it is best to focus on memory, which is at least as important as compute in the Power architecture, and has become increasingly so with the past several generations. Bill Starke, distinguished engineer at Power Systems and the chief architect of the Power10 and Power11 processors, shared the following chart with The Four Hundred that shows the growth in compute from the Power4 generation in 2001 to the Power10 processor unveiled in 2021. (Both Power4 and Power10 started out in big iron machines first and then were rolled into entry and midrange machines a year later.)
Now, what this chart does not show is the memory bandwidth increase, and we are going to help with that. The Power4 processor had 10 GB/sec of bandwidth into an L3 cache that fed directly into DDR main memory one for one. These Power4 chips were put onto a quad-chip SMP package made with ceramic, and some of the I/O coming out of the Power4 chip was used for memory and the rest was used for NUMA links to other chips on that package and to reach across multiple packages creating a larger shared memory system.
With the Power10 processor in 2021, the memory bandwidth out of each OMI controller on the Power10 chip was 51.2 GB/sec, and sixteen of these links were on the package for a total peak bandwidth of 819.2 GB/sec into and out of the OMI interfaces, which translated to 400 GB/sec of memory bandwidth for the Power10 socket. That is a 40X increase in bandwidth over the Power4.
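The arithmetic behind those Power10 figures is easy to check. What follows is only a back-of-the-envelope sketch in Python using the numbers quoted above; the function and constant names are ours for illustration and do not come from IBM.

```python
# Back-of-the-envelope check of the Power10 figures quoted in the article.
# All names here are illustrative; only the numbers come from the text.

def aggregate_omi_bandwidth(links: int, gb_per_sec_per_link: float) -> float:
    """Peak bandwidth across all OMI links, in GB/sec."""
    return links * gb_per_sec_per_link

POWER4_GBS = 10.0            # Power4 bandwidth into L3 cache and DDR memory
POWER10_SOCKET_GBS = 400.0   # Power10 memory bandwidth per socket

power10_omi_peak = aggregate_omi_bandwidth(16, 51.2)
power10_vs_power4 = POWER10_SOCKET_GBS / POWER4_GBS

print(f"Power10 OMI peak:  {power10_omi_peak:.1f} GB/sec")
print(f"Power10 vs Power4: {power10_vs_power4:.0f}X")
```

The gap between the 819.2 GB/sec OMI peak and the 400 GB/sec socket figure is the difference between what the links can carry and what the memory behind the buffers delivers.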
With the Power11 chip, memory bandwidth is going to start catching up with compute as IBM moves from its “Explorer” DDR4 OMI memory cards with one-port differential DIMM memories to the future “Odyssey” DDR5 OMI memory cards that have two-port differential DIMMs and that have OMI ports running at a much faster 76.8 GB/sec of bandwidth out of the next-gen OMI controllers. (We talked about the Explorer and Odyssey memory cards for Power10 and Power11 processors without knowing their code names back in August when the Odyssey cards using DDR5 memory were made available for Power10 machines.)
“If you look forward to Power11 with the new memory, you see a 50 percent increase in the speed of the OMI channels,” Starke tells The Four Hundred. “And once you get out to the buffer, instead of one port out the back side of the buffer, we have two ports out. So all told, you get a 3X bandwidth increase just from going from DDR4 to DDR5.”
Across 16 OMI controllers, that is an aggregate of 1,228.8 GB/sec of bandwidth across the OMI ports, and IBM is able to boost the memory bandwidth per socket to 1,200 GB/sec, which is a factor of 3X higher than what was available on Power10.
That, therefore, is a factor of 120X higher per-chip memory bandwidth compared to the Power4, which is more in line with the performance jump we expect to see moving from Power4 to Power11 processors at the chip level. With Power11 being etched using a more refined 7 nanometer process from Samsung, we do not expect a huge performance increase per core or per socket moving from Power10 to Power11, but it might be as high as 20 percent.
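Here is how the 3X and 120X factors fall out of the numbers above, as a rough Python sketch. The variable names are ours, and the decomposition into a link speedup multiplied by a port count simply follows Starke's description rather than any published IBM formula.

```python
# Rough decomposition of the Power10 -> Power11 bandwidth gain, using only
# figures quoted in the article. Variable names are illustrative.

OMI_LINKS = 16
POWER10_LINK_GBS = 51.2      # per OMI controller, Power10
POWER11_LINK_GBS = 76.8      # per OMI controller, Power11
PORTS_BEHIND_BUFFER_OLD = 1  # one-port DDR4 differential DIMM
PORTS_BEHIND_BUFFER_NEW = 2  # two-port DDR5 differential DIMM

link_speedup = POWER11_LINK_GBS / POWER10_LINK_GBS            # about 1.5
port_factor = PORTS_BEHIND_BUFFER_NEW / PORTS_BEHIND_BUFFER_OLD
combined_gain = link_speedup * port_factor                    # about 3

power11_omi_peak = OMI_LINKS * POWER11_LINK_GBS               # GB/sec
power11_vs_power4 = 1200.0 / 10.0                             # socket vs Power4

print(f"OMI link speedup:   {link_speedup:.2f}X")
print(f"Combined gain:      {combined_gain:.1f}X")
print(f"Power11 OMI peak:   {power11_omi_peak:.1f} GB/sec")
print(f"Power11 vs Power4:  {power11_vs_power4:.0f}X")
```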
(By the way, ignore the fact that there are only fifteen OMI ports shown on the chart above. It is just a mistake in the graphics. There are sixteen OMI ports on both Power10 and Power11.)
If you assume rPerf is a better proxy for raw performance than CPW, then the performance jump from Power4 to Power11 will be on the order of 135X against memory bandwidth increases of 120X. If anything, Power11 is what Power10 should have looked like if you wanted something akin to balanced performance.
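To see why Power11 looks more balanced, it helps to divide the bandwidth gain by the compute gain for each generation. The sketch below does that in Python. Note the loud assumption: the article gives no rPerf-based compute factor for Power10, so we back one out by applying the "as high as 20 percent" upper bound in reverse, which is our guess and not an IBM figure.

```python
# Bandwidth gain per unit of compute gain, both measured against Power4.
# Names are illustrative; the Power10 compute factor is an ASSUMPTION
# derived from the article's "as high as 20 percent" upper bound.

COMPUTE_GAIN_POWER11 = 135.0      # rPerf-based estimate from the article
BANDWIDTH_GAIN_POWER11 = 120.0    # 1,200 GB/sec vs 10 GB/sec
BANDWIDTH_GAIN_POWER10 = 40.0     # 400 GB/sec vs 10 GB/sec
ASSUMED_P10_TO_P11_UPLIFT = 1.20  # assumption, not a measured number

compute_gain_power10 = COMPUTE_GAIN_POWER11 / ASSUMED_P10_TO_P11_UPLIFT

def balance(bandwidth_gain: float, compute_gain: float) -> float:
    """Ratio of bandwidth growth to compute growth; 1.0 means in lockstep."""
    return bandwidth_gain / compute_gain

balance_power10 = balance(BANDWIDTH_GAIN_POWER10, compute_gain_power10)
balance_power11 = balance(BANDWIDTH_GAIN_POWER11, COMPUTE_GAIN_POWER11)

print(f"Power10 balance: {balance_power10:.2f}")
print(f"Power11 balance: {balance_power11:.2f}")
```

Under that assumption, Power10 comes in at roughly a third of lockstep and Power11 at close to nine-tenths, which is the gap the paragraph above is pointing at.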
To put that 1,200 GB/sec of bandwidth per Power11 socket using 3.2 GHz DDR5 memory into perspective, AMD’s “Genoa” Epyc 9004 processors deliver 460 GB/sec of memory bandwidth per socket using 4.8 GHz DDR5 memory. Intel’s “Granite Rapids” Xeon 6 processors deliver 614 GB/sec using 6.4 GHz DDR5 memory chips. By shifting to multiplexed combined rank (MCR) DDR5 memory, which is only available on Intel Xeon 6 processors and which runs at a very high 8.8 GHz, Intel can push that up to 844 GB/sec of memory bandwidth per socket.
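The X86 figures above can be reproduced from first principles, and the sketch below does so in Python. It rests on assumptions the article does not state: twelve memory channels per socket on both the Genoa and Granite Rapids parts, 8 bytes moved per transfer on each channel, and the quoted "GHz" speeds treated as gigatransfers per second. The function name is ours.

```python
# Peak DDR5 bandwidth per socket for the X86 parts mentioned in the article.
# ASSUMPTIONS (not from the article): 12 channels per socket, 8 bytes per
# transfer per channel, quoted speeds read as gigatransfers per second.

def peak_ddr_bandwidth(channels: int, gigatransfers_per_sec: float,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/sec."""
    return channels * gigatransfers_per_sec * bytes_per_transfer

POWER11_SOCKET_GBS = 1200.0  # figure quoted in the article

x86_sockets = {
    "Genoa, DDR5 at 4.8": peak_ddr_bandwidth(12, 4.8),
    "Granite Rapids, DDR5 at 6.4": peak_ddr_bandwidth(12, 6.4),
    "Granite Rapids, MCR at 8.8": peak_ddr_bandwidth(12, 8.8),
}

for name, gbs in x86_sockets.items():
    advantage = POWER11_SOCKET_GBS / gbs
    print(f"{name}: {gbs:.1f} GB/sec, Power11 advantage {advantage:.2f}X")
```

The results land within a fraction of a GB/sec of the 460, 614, and 844 GB/sec figures quoted above, and they put the Power11 socket at somewhere between roughly 1.4X and 2.6X the X86 parts.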
“We don’t have to run DDR5 at the fastest, most screaming rates that drive unreliable things that others in the industry are trying to do coming off their processor socket,” brags Starke. “We get the fan out that gives us the luxury of saying, just use more ports. We are putting 32 DDR5 ports behind a single processor socket. I mean, who else in the industry can touch that? There is the constant struggle as compute grows faster than memory bandwidth – there’s this increasing gap. But during the same time that we went from Power4 to Power10 and Power11, we have kept up increasing our memory bandwidth at the same rate we increased our compute.”
You would be hard pressed to find another architecture that has managed this and not sacrificed capacity, as HBM stacked DRAM memory does to jack up the bandwidth.
By the way, if you buy Odyssey DDR5 memory cards for Power10 machines now, you will be able to plug them into Power11 machines in the future, and they will be a better match for Power11, which is tuned for DDR5 (and we presume perhaps DDR6).
Next up, we will discuss IBM’s approach to optimizing the Power11 system stack, top to bottom, to boost performance and applicability of these systems for modern workloads.