Thanks For The Cheaper, Faster Memories

June 9, 2014 Timothy Prickett Morgan

The new Power8 systems started shipping last Friday, and this week, in our ongoing coverage of the new hardware and software technologies embodied in the systems, we are going to take a look at the memory subsystems in the new machines. There are a slew of new technologies that have been added to the Power8 machines so the memory can keep the processor cores well fed and humming along through their work.

The Power8 chip has two DDR3 main memory controllers, one on either side of the chip. Some of the DDR interfaces that were previously on the Power7+ controller have been moved out to a memory buffer chip called Centaur (because it is half main memory and half L4 cache). By moving the DDR interfaces out to the buffer chip, IBM can change the memory card and support DDR4 memory on these systems without having to change the processor. (This is smart, and eventually, all chip makers will probably go this route as they disaggregate components in the system.)

Each Power8 chip has eight memory channels, and one of these Centaur chips links to each one through a 9.6 GB/sec interface. The Centaur buffer chip has 16 MB of L4 cache implemented on it, and on a fully scaled socket with 12 cores, that yields 128 MB of L4 cache for the socket to play with. In the initial machines, IBM is supporting main memory cards with 16 GB, 32 GB, and 64 GB capacities, and the Centaur chip is welded onto these cards thus:

The memory cards for the Power8 systems do not mount on the processor cards, as they did with Power7 and Power7+ machines, or through riser cards, but rather have their own dedicated slots in the machine. Just because IBM has chopped the Power8 chips in half and created a dual chip module (DCM) instead of a real 12-core chip (presumably because of yield issues with its 22 nanometer processes and the relatively large Power8 chip) does not change the memory configuration of each Power8 socket. Architecturally, IBM is able to support 128 GB memory sticks on the machines, pushing memory capacity up to 1 TB per socket, but for the first release of the Power8 machines, memory is capped at 512 GB per socket with 64 GB memory sticks. The memory cards run at 1.6 GHz, which is considerably faster than the 1.07 GHz memory used in Power7+ systems.
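The cache and capacity figures above all fall out of the eight memory channels per socket. Here is a minimal sketch of that arithmetic; the constant and function names are mine, and it assumes one memory card (with its Centaur chip) per channel, which is what the 512 GB and 1 TB figures imply.

```python
# Figures come from the article; names are illustrative only.
CHANNELS_PER_SOCKET = 8   # assumption: one Centaur-equipped card per channel
L4_PER_CENTAUR_MB = 16    # L4 cache on each Centaur buffer chip


def l4_per_socket_mb():
    """Total L4 cache visible to one fully populated socket, in MB."""
    return CHANNELS_PER_SOCKET * L4_PER_CENTAUR_MB


def max_memory_per_socket_gb(card_gb):
    """Memory capacity of one socket when every channel holds a card_gb card."""
    return CHANNELS_PER_SOCKET * card_gb


print(l4_per_socket_mb())             # 128 MB of L4
print(max_memory_per_socket_gb(64))   # 512 GB, the launch cap
print(max_memory_per_socket_gb(128))  # 1024 GB, the architectural 1 TB
```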

The way the math works out, with two bytes read and one byte written per channel per clock, the memory channels can sustain about 24 GB/sec of memory bandwidth per channel between the processors and the Centaur buffer chip, which works out to an aggregate of 192 GB/sec across the eight channels. The links between the L4 cache on the Centaur chip and the 32 total DDR3 memory ports have an aggregate of 410 GB/sec of bandwidth. This is a truly large amount of memory bandwidth, and it is one of the features that IBM brags about when comparing the Power8 processors to X86 alternatives.
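As a sanity check, the two aggregates can be divided back down by the channel and port counts. This is only a sketch of the arithmetic, with names of my own choosing. The final comparison assumes each DDR3 port is 8 bytes wide, which is the standard DDR3 width and not something IBM states here.

```python
CHANNELS = 8                # memory channels per socket
DDR3_PORTS = 32             # DDR3 ports behind the Centaur chips, per socket
AGG_CHANNEL_GB_SEC = 192.0  # processor-to-Centaur aggregate
AGG_DDR3_GB_SEC = 410.0     # L4-to-DDR3 aggregate


def per_channel_gb_sec():
    """Sustained bandwidth of one processor-to-Centaur channel."""
    return AGG_CHANNEL_GB_SEC / CHANNELS


def ports_per_centaur():
    """DDR3 ports hanging off each Centaur buffer chip."""
    return DDR3_PORTS // CHANNELS


def per_port_gb_sec():
    """Bandwidth of one DDR3 port implied by the aggregate."""
    return AGG_DDR3_GB_SEC / DDR3_PORTS


def ddr3_port_peak_gb_sec(clock_ghz=1.6, bytes_wide=8):
    """Peak of one DDR3 port; the 8-byte width is an assumption."""
    return clock_ghz * bytes_wide


print(per_channel_gb_sec())       # 24.0
print(ports_per_centaur())        # 4
print(per_port_gb_sec())          # 12.8125
print(ddr3_port_peak_gb_sec())    # about 12.8
```

The per-port figure lands almost exactly on what a 1.6 GHz, 8-byte port would deliver, so the 410 GB/sec number looks like a peak rather than a sustained rate.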

The Power8 chips continue to support the Active Memory Expansion compression that was available on the Power7 and Power7+ processors, and as has been the case in the past, this feature is only being made available on processors configured to run IBM’s AIX variant of Unix. IBM typically sees somewhere around a 2:1 compression ratio with Active Memory Expansion, and with the Power7+ chips the memory compression algorithm was etched into the circuits of the chip to make it run even faster. The Power8 chip has memory compression circuits as well, but what it does not have is support for memory compression for IBM i and Linux workloads. This is annoying because IBM i customers could sure use a break on memory prices, and IBM needs memory compression as a competitive advantage for the Power platform running Linux as it comes up against X86 servers running Linux.

The Power8 memory cards also have memory sparing, which helps with the resiliency of the system in the event a memory chip throws a rod.

Those Power8 memory controllers also support what is called transactional memory. This transactional memory made its debut with the zEnterprise EC12 mainframes announced in the fall of 2012. The BlueGene/Q massively parallel supercomputer created by IBM also supports transactional memory.

With regular memory, resources are locked down to avoid contention when transactions are pumped through the system. But with transactional memory, the processor does its work on the assumption (usually correct) that there is no contention; if contention does turn up, it backs out, waits, and redoes the work. On mainframes, shifting to this transactional memory model gave the System z EC12 running DB2 databases and virtualized server workloads as much as a 45 percent performance boost. Here is the inside dope I could find thus far on transactional memory support with the Power8 systems:

I have looked high and low to find out what operating systems support transactional memory. By default, AIX must support transactional memory, but it is unclear if IBM i or Linux do. It is also not clear what, if anything, needs to be changed to make use of transactional memory in either systems software or application code.
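To make the difference between locking up front and working optimistically a little more concrete, here is a software analogy. It is emphatically not how the Power8 hardware does it, and nothing in it is an IBM interface. The VersionedCell class and its methods are hypothetical names I made up. Each worker reads a value, does its work without holding a lock, and commits only if nobody else got there first. If someone did, it backs out and retries.

```python
import threading


class VersionedCell:
    """A shared value whose commits succeed only if no one else committed first."""

    def __init__(self, value=0):
        self._value = value
        self._version = 0
        self._guard = threading.Lock()  # held only for the brief read/commit steps

    def read(self):
        with self._guard:
            return self._value, self._version

    def try_commit(self, new_value, seen_version):
        with self._guard:
            if self._version != seen_version:
                return False          # contention: someone committed since our read
            self._value = new_value
            self._version += 1
            return True


def optimistic_add(cell, delta):
    """Add delta to the cell, retrying on conflict; returns the retry count."""
    retries = 0
    while True:
        value, version = cell.read()
        if cell.try_commit(value + delta, version):
            return retries
        retries += 1                  # back out and redo the work


def run_demo(threads=4, increments=500):
    """Hammer one cell from several threads and return its final value."""
    cell = VersionedCell()

    def worker():
        for _ in range(increments):
            optimistic_add(cell, 1)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return cell.read()[0]


print(run_demo())  # 2000: no update is lost, despite no long-held lock
```

The payoff depends on conflicts being rare. When they are, almost every commit succeeds on the first try and nobody waits on a lock they did not need.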

On midrange and high-end systems, IBM usually has capacity on demand upgrades that allow for memory to be activated in a utility fashion as needed, but on the entry Power8 machines, all of the memory capacity is on by default in the cards. So you don’t have to price the card and memory activations separately any more. On the single-socket machine, you plug in memory as you see fit, but on the two-socket boxes, you have to plug memory cards into the system in pairs. (This is standard practice to keep systems balanced and well behaved.) Because the chip is split into two inside the socket, IBM strongly recommends that each half of the Power8 DCM be given its own memory stick, even on a single-socket machine. The memory in the 2U machines is packaged differently from that used in the 4U machines, because the former doesn’t have as much vertical space. You cannot mix different memory capacities within a pair of cards, but you can mix pairs of different capacities on a single server. The more memory sticks you add to the system, the more memory bandwidth is available to the applications, and thus IBM recommends that at least half the memory slots be populated in any configuration. This leaves room to grow with fatter memory sticks later.
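The plugging rules above are simple enough to write down as a checker. This is a rough sketch and not an IBM configurator. The function name, the eight-slots-per-socket constant, and the convention that consecutive list entries form a pair are all my own assumptions.

```python
SLOTS_PER_SOCKET = 8          # assumption: one card slot per memory channel
CARD_SIZES_GB = {16, 32, 64}  # capacities offered in the initial machines


def check_memory_config(sockets, cards_gb):
    """Return (errors, warnings) for a list of card capacities in GB.

    On two-socket machines, consecutive entries are treated as a pair.
    """
    errors, warnings = [], []
    total_slots = sockets * SLOTS_PER_SOCKET

    for size in cards_gb:
        if size not in CARD_SIZES_GB:
            errors.append(f"unsupported card size: {size} GB")
    if len(cards_gb) > total_slots:
        errors.append(f"{len(cards_gb)} cards exceed {total_slots} slots")

    if sockets == 2:
        # Hard rules: cards go in pairs, and a pair cannot mix capacities.
        if len(cards_gb) % 2 != 0:
            errors.append("two-socket machines need cards in pairs")
        for i in range(0, len(cards_gb) - 1, 2):
            if cards_gb[i] != cards_gb[i + 1]:
                errors.append(
                    f"pair {i // 2 + 1} mixes {cards_gb[i]} GB and {cards_gb[i + 1]} GB"
                )
    elif len(cards_gb) < 2:
        # Recommendation only: one card for each half of the DCM.
        warnings.append("give each half of the DCM its own card")

    if len(cards_gb) * 2 < total_slots:
        warnings.append("fewer than half the slots are populated")
    return errors, warnings


print(check_memory_config(2, [16, 16, 64, 64]))  # valid, but under half populated
print(check_memory_config(2, [16, 32]))          # mixed pair is an error
```

Recommendations come back as warnings rather than errors, since the machine will still run if you ignore them.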

So that is the new memory, and we will be digging in to see how it affects system performance. For now, we can tell you all about the memory prices for the features on the new machines. Take a look at the list price comparisons between the Power7+ and Power8 machines:

There are a couple of things to note in the table. First, the entry 16 GB card costs more per gigabyte than the fatter 32 GB and 64 GB cards on the Power8 machines. That Centaur buffer chip is presumably not cheap, and that is very likely the reason why a 16 GB DDR3 card for the Power8 systems costs 47.1 percent more than a 16 GB card on Power7 and Power7+ machines. But at $78 per gigabyte, it is still less expensive than 64 GB cards (on a per gigabyte basis) on the older iron, so that is good. The fatter cards come in at $53 per gigabyte, the same price IBM was charging for less dense cards on Power7+ machines. You will also note at the bottom of the table that the memory was a little less than half the price on the PowerLinux machines based on Power7+ processors, but with the Linux-only Power S812L and S822L machines based on the Power8 chips, the memory costs the same as on the plain vanilla servers that can support IBM i or AIX or both.
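The per-gigabyte figures quoted above hang together, which you can check with a few lines. This sketch uses only the rounded $78 and $53 numbers, so the results are approximate. It also assumes the older 16 GB card is one of the "less dense" cards that sold for $53 per gigabyte. The function names are mine.

```python
POWER8_16GB_PER_GB = 78.0   # dollars per GB, Power8 16 GB card
FAT_CARD_PER_GB = 53.0      # dollars per GB, Power8 32 GB and 64 GB cards
# Assumption: the Power7+ 16 GB card is one of the "less dense" $53/GB cards.
POWER7P_16GB_PER_GB = 53.0


def premium_pct(new_per_gb, old_per_gb):
    """How much more the new price is than the old one, in percent."""
    return (new_per_gb / old_per_gb - 1.0) * 100.0


def approx_card_price(per_gb, capacity_gb):
    """Approximate card price rebuilt from a rounded per-GB figure."""
    return per_gb * capacity_gb


print(round(premium_pct(POWER8_16GB_PER_GB, POWER7P_16GB_PER_GB), 1))  # about 47
print(approx_card_price(POWER8_16GB_PER_GB, 16))  # roughly $1,248
print(approx_card_price(FAT_CARD_PER_GB, 64))     # roughly $3,392
```

The premium comes out a hair above 47 percent, which squares with the 47.1 percent in the table once you allow for the per-gigabyte prices being rounded to whole dollars.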

So memory prices are back in lockstep again, and you are paying the same per gigabyte for fat sticks as you were paying for skinny sticks on the earlier systems. While it would be better to have memory prices come down across the board, having the IBM i customer get the same price as the AIX and Linux customer is at least more fair. Of course, AIX shops have memory compression, which effectively cuts their memory prices in half–at least. It would be great if memory compression were brought to IBM i, of course.
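For what it is worth, here is the back-of-the-envelope math behind that "cuts their memory prices in half" remark. It takes the roughly 2:1 ratio IBM typically sees at face value and ignores the processing overhead of compression. Real ratios depend on the workload, and the function names are my own.

```python
def effective_price_per_gb(list_price_per_gb, compression_ratio=2.0):
    """Price per GB of usable memory when data compresses at the given ratio."""
    return list_price_per_gb / compression_ratio


def effective_capacity_gb(physical_gb, compression_ratio=2.0):
    """Usable memory when physical memory compresses at the given ratio."""
    return physical_gb * compression_ratio


print(effective_price_per_gb(53.0))  # 26.5 dollars per effective GB on fat cards
print(effective_capacity_gb(512))    # 1024.0 GB effective on a maxed-out socket
```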