Ryzen and AM4
It is pronounced ‘Rye-zen.’
So far AMD has released specifications for only its highest-end Ryzen SKU. Pricing was not touched upon, but we did get confirmation that the chip is an eight-core, 16-thread part running at a base clock of at least 3.4 GHz. AMD has not yet revealed the maximum turbo clock. AMD also announced that the chip will have a TDP of 95 W. The CPU features 4 MB of L2 cache and 8+8 MB of L3 victim cache.
We also know that Ryzen will use a new platform, AM4, which it shares with the OEM-only Bristol Ridge. AM4 uses a split I/O design between the CPU and the chipset, such that a chipset is not needed for minimal function. However, AMD has pointed out that with Ryzen, AM4 with the right chipset will support USB 3.1 Gen 2 (10 Gbps), NVMe SSDs and SATA Express, and will offer ‘ultimate upgradability’. The ultimate upgradability most probably refers to the chipset offering numerous PCIe lanes.
AMD has not yet revealed the number of PCIe lanes on Ryzen, the size of the micro-op cache in the core, whether there are any limitations in using an L3 victim cache, the quality of the DDR4 controller on the chip, the power consumption, or the single-core performance/IPC of the chip. AMD did, however, give a sneak peek at some of these.
Power, Performance and Pre-Fetch: AMD SenseMI
AMD used a HandBrake video transcode, a multi-threaded, CPU-intensive workload, to demonstrate the power of its new Ryzen part. The Ryzen CPU, without boost, beat a retail Core i7-6900K, a $1100 CPU, in the test by about five seconds. Next, AMD fired up Blender once again, and again the two CPUs were neck and neck, with Ryzen narrowly taking the win over the 6900K. AMD also brought out power meters, showing that Ryzen’s power consumption in this test was a few watts lower than that of the Intel part, implying that AMD is meeting its targets for power, performance and, as a result, efficiency. The 40%+ improvement in IPC/efficiency is still being quoted, and AMD seems confident that this target has been surpassed.
Next, AMD showed what Ryzen can do in gaming. Two otherwise identical systems, one pairing Titan XP SLI with the 6900K and the other pairing the same Titan XP SLI with the Ryzen CPU, were booted into Battlefield 1 at 4K Ultra settings. Both systems performed the same, as expected.
Mark Papermaster, CTO of AMD, explained that during the Zen design stages, up to 300 engineers were working on the core engine with an aggressive mantra of higher IPC for no power gain. This is not an uncommon strategy for core designs. Part of this will be down to two new power modes that adjust and extend the power/frequency curve and form part of AMD’s new five-stage ‘SenseMI’ technology.
SenseMI Stage 1: Pure Power
Some recent microprocessor launches have revolved around silicon-optimized power profiles. We have now moved away from the ‘one DVFS curve fits all’ approach for high-end silicon, and AMD’s solution in Ryzen will be called Pure Power. The short explanation is that distributed embedded sensors in the design (first introduced in bulk with Carrizo) monitor temperature, speed and voltage, so that the control center can manage power consumption in real time. The glue behind this technology comes in the form of AMD’s new ‘Infinity Fabric’.
‘What is this new Infinity Fabric?’ I hear you say. It was explained only in the sense that it provides control, and that through the Infinity System Management Unit it can adjust power consumption while keeping in mind everything else that’s happening. The fact that it’s described as a fabric suggests that it runs through the entire processor, connecting various parts together as part of that control. Whether this is something wildly different from what we saw in Carrizo, aside from being the next-gen power adjustment under a new name, is hard to determine at this point, but we are probing for more details.
The upshot of Pure Power is that the DVFS curve is lower and more optimized for a given piece of silicon than a generic DVFS curve, which results in lower power at various/all levels of performance. This, in turn, benefits the next part of SenseMI, Precision Boost.
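To put rough numbers on why a per-chip DVFS curve matters, here is a minimal sketch using the textbook approximation that dynamic power scales as C·V²·f. Every figure in it (the voltages, the effective capacitance and both curves) is invented for illustration and is not AMD data; the function names are hypothetical as well.

```python
# Illustrative only: all voltages and the capacitance are made-up numbers,
# not AMD data. Dynamic power is approximated as P = C * V^2 * f.

def dynamic_power(c_eff_nf, volts, freq_ghz):
    """Approximate dynamic power in watts.

    Capacitance is in nF and frequency in GHz, so the 1e-9 and 1e9
    factors cancel and the product is already in watts.
    """
    return c_eff_nf * volts ** 2 * freq_ghz

# Frequency (GHz) -> voltage (V) needed to run stably at that frequency.
GENERIC_CURVE = {2.0: 1.00, 3.0: 1.15, 3.4: 1.25}   # margin for the worst die
PER_CHIP_CURVE = {2.0: 0.95, 3.0: 1.08, 3.4: 1.18}  # hypothetical tuned die

def power_saving(freq_ghz, c_eff_nf=15.0):
    """Watts saved at freq_ghz by using the per-chip curve instead of the generic one."""
    generic = dynamic_power(c_eff_nf, GENERIC_CURVE[freq_ghz], freq_ghz)
    tuned = dynamic_power(c_eff_nf, PER_CHIP_CURVE[freq_ghz], freq_ghz)
    return generic - tuned

if __name__ == "__main__":
    for f in sorted(GENERIC_CURVE):
        print(f"{f:.1f} GHz: {power_saving(f):.1f} W saved")
```

Because voltage enters as a square, even a few tens of millivolts shaved off at each frequency adds up, and that spare power is what a boost mechanism can then spend.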
SenseMI Stage 2: Precision Boost
For almost a decade now, most commercial PC processors have invoked some form of boost technology to enable processors to use less power when idle and to take full advantage of the power budget when only a few elements of the core design are needed. We see processors that sit at 2.2 GHz and boost to 2.7 GHz when only one thread is needed, for example, because the whole chip still remains under the power limit. AMD is implementing Precision Boost for Ryzen, raising the DVFS curve for better performance thanks to Pure Power, but also offering frequency jumps in 25 MHz steps, which is new.
Precision Boost relies on the same Infinity Control Fabric that Pure Power does, but allows for adjustments of core frequency based on performance requirements and suitability/power given the rest of the core. The fact that it offers 25 MHz steps is surprising, however.
Current turbo control systems, on both AMD and Intel, are invoked by adjusting the CPU frequency multiplier. With the 100 MHz base clock on all modern CPUs, one step in the frequency multiplier gives a 100 MHz jump for the turbo modes, and only whole-number multipliers can be used.
With AMD moving to 25 MHz jumps in their turbo, this means either:
- The base frequency has been reduced to 25 MHz, and AMD can implement a 136x multiplier to reach 3.4 GHz, or
- AMD can implement fractional multipliers, similar to how processors in the early 2000s were able to negotiate 0.5x multiplier jumps, or
- Precision Boost only applies to internal clocks that the user doesn’t see or control but can assist with performance.
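The arithmetic behind the first two options is easy to check. The sketch below is only that, arithmetic: the helper names are hypothetical, and nothing here reflects how AMD actually generates its clocks. It uses exact fractions so that quarter-step multipliers do not pick up floating-point noise.

```python
from fractions import Fraction

def core_clock_mhz(base_mhz, multiplier):
    """Core clock = base clock x multiplier, computed exactly."""
    return Fraction(base_mhz) * Fraction(multiplier)

def turbo_steps(base_mhz, start_mult, step_mult, count):
    """Return `count` successive clocks, raising the multiplier by step_mult each time."""
    return [core_clock_mhz(base_mhz, Fraction(start_mult) + i * Fraction(step_mult))
            for i in range(count)]

if __name__ == "__main__":
    # Today's scheme: 100 MHz base, whole multipliers -> 100 MHz jumps.
    print([int(c) for c in turbo_steps(100, 34, 1, 3)])
    # Option 1: 25 MHz base with a 136x multiplier and whole-number steps.
    print([int(c) for c in turbo_steps(25, 136, 1, 3)])
    # Option 2: 100 MHz base with quarter-step fractional multipliers.
    print([int(c) for c in turbo_steps(100, 34, Fraction(1, 4), 3)])
```

Options 1 and 2 produce the same 3400, 3425, 3450 MHz ladder, so the externally visible clocks alone would not tell us which approach AMD has taken.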
Without additional information, the second point in that list seems more in line with what would be possible. If we consider that Zen’s original chief designer was Jim Keller (and his team), known for some older generations of AMD processors, a similar technology might be in play here. If/when we get more information on it, we will let you know.
SenseMI Stage 3: Extended Frequency Range (XFR)
The main marketing points of on-the-fly frequency adjustment are typically down to low idle power and higher performance when needed. The current processors on the market have rated speeds on the box which are fixed frequency settings that can be chosen by the processor/OS depending on what level of performance is possible/required. AMD’s new XFR mode seems to do away with this, offering what sounds like an unlimited bound on performance.
The concept here is that, beyond the rated turbo mode, if there is sufficient cooling then the CPU will continue to increase the clock speed and voltage until a cooling limit is reached. This is somewhat murky territory, though AMD claims that a multitude of different environments can be catered for by the feature. AMD was not clear on whether this limit is determined by power consumption or by temperature, or on whether it can protect against issues such as a bad frequency/voltage setting.
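As a thought experiment, the behaviour described could look something like the toy loop below. To be clear about what is assumed: AMD has not said whether the limit is thermal or power-based, so this sketch simply picks a temperature limit; the cubic power model, the use of the 95 W TDP as the power at the rated clock, the thermal resistances and the 25 MHz step are all stand-ins chosen for illustration, and `xfr_clock_mhz` is a hypothetical name.

```python
def xfr_clock_mhz(rated_turbo_mhz, cooler_c_per_watt, ambient_c=25.0,
                  temp_limit_c=75.0, rated_power_w=95.0, step_mhz=25,
                  max_steps=1000):
    """Toy model: step the clock up from the rated turbo until the modelled
    steady-state temperature of the next step would exceed temp_limit_c.

    Assumptions (not from AMD): power grows with the cube of frequency
    (voltage rising with clock), and temperature is
    ambient + power * thermal resistance of the cooler.
    """
    def steady_temp_c(freq_mhz):
        power_w = rated_power_w * (freq_mhz / rated_turbo_mhz) ** 3
        return ambient_c + power_w * cooler_c_per_watt

    freq = rated_turbo_mhz
    for _ in range(max_steps):
        if steady_temp_c(freq + step_mhz) > temp_limit_c:
            break  # the next step would cross the cooling limit
        freq += step_mhz
    return freq  # never drops below the rated turbo in this sketch

if __name__ == "__main__":
    for label, r in [("poor cooler", 1.0), ("decent air", 0.5), ("big liquid", 0.3)]:
        print(label, xfr_clock_mhz(3400, r), "MHz")
```

Even in this crude model the interesting property shows up: the same chip lands on a different final clock depending purely on the cooler attached to it.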
By the sounds of it, this is a dynamic adjustment rather than just another embedded look-up table such as P-states. AMD states that XFR is a fully automated system with no user intervention, although I suspect it will still have an on/off switch in the BIOS. It also somewhat negates overclocking if your cooling can support it, which then brings up the issue for overclocking in general: casual users may not ever need to step into the overclocking world if the CPU does it all automatically.
I imagine that a manual overclock will still be king, especially for extreme overclockers competing with liquid nitrogen, as being able to personally fine tune a system might be better than letting the system do it. It can especially be true in those circumstances, as sensors on hardware can fail, report the wrong temperature, or may only be calibrated within a certain range.
It does raise the question as to how overclockable Ryzen will be, how many SKUs will be unlocked, or if XFR may only be on certain processors. As the Zen microarchitecture is destined for server and mobile as well, XFR will have different connotations for both of those markets (some of which might not be welcome).
SenseMI Stage 4+5: Neural Net Prediction and Smart Prefetch
Every generation of CPUs from the big companies comes with promises of better prediction and better pre-fetch models. These are both important for hiding latency within a core, which might be created by instruction decode, queuing, or more usually, moving data between caches and main memory to be ready for the instructions. With Ryzen, AMD is introducing its new Neural Net Prediction hardware model along with Smart Pre-Fetch.
AMD is announcing this as a ‘true artificial network inside every Zen processor that builds a model of decisions based on software execution’. This can mean one of several things, ranging from actual physical modelling of instruction workflow to identify critical paths to be accelerated (unlikely) to statistical analysis of what is coming through the engine, attempting to do work during downtime that might accelerate future instructions (such as inserting an instruction to decode into an idle decoder in preparation for when it actually comes through, so that it ends up being served from the micro-op cache, making it quicker).
Modern processors already do a decent job when the work is repetitive, such as identifying when every 4th element in a memory array is being accessed, and can pull that data in earlier to be ready in case it is used. The danger of smart predictors, however, is being overly aggressive: pulling in so much data that older data is ditched in favor of data that is never used (over prediction), pulling in data so early that it has already been evicted by the time it is needed (aggressive prediction), or simply wasting excess power with bad predictions (stupid prediction…).
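The ‘every 4th element’ case is the classic stride prefetcher, and a minimal software sketch of one makes the aggression trade-off concrete. This is the generic textbook technique, not a description of Zen’s hardware; the class name and the `confirmations` knob are my own for illustration.

```python
class StridePrefetcher:
    """Textbook single-stream stride detector (not AMD's design)."""

    def __init__(self, confirmations=2):
        self.last_addr = None
        self.last_stride = None
        self.streak = 0                     # consecutive accesses with the same stride
        self.confirmations = confirmations  # lower = more aggressive

    def access(self, addr):
        """Record an access; return the address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                self.streak += 1
            else:
                # New (or zero) stride: start counting again.
                self.streak = 1 if stride != 0 else 0
            self.last_stride = stride
            if self.streak >= self.confirmations:
                prediction = addr + stride
        self.last_addr = addr
        return prediction

if __name__ == "__main__":
    pf = StridePrefetcher()
    print([pf.access(a) for a in (0, 4, 8, 12, 13)])
    # prints [None, None, 12, 16, None]
```

Setting `confirmations` to 1 would start prefetching after a single observed stride, which catches patterns sooner but also fires on coincidences, exactly the over-prediction risk described above.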
AMD states that Zen implements algorithmic learning models for both instruction prediction and prefetch, and it will no doubt be interesting to see whether AMD has found the right balance of prefetch aggression and extra work in prediction.
It is worth noting here that AMD will likely draw upon the increased L3 bandwidth in the new core as a key element in assisting the prefetch, especially as the shared L3 cache is a victim cache, designed to hold data that has already been used and evicted so that it can be used again at a later date.
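For readers unfamiliar with the term, a victim cache is filled by evictions from the level above rather than by fetches from memory. The sketch below models that idea in the simplest possible form. It assumes LRU replacement at both levels and that a line moves back up to L2 on an L3 hit; neither detail has been confirmed by AMD for Zen, and the class is hypothetical.

```python
from collections import OrderedDict

class VictimHierarchy:
    """Toy L2 + victim-L3 model with LRU replacement (assumed, not Zen's policy)."""

    def __init__(self, l2_lines, l3_lines):
        self.l2 = OrderedDict()   # oldest entry first
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, addr):
        """Return 'L2', 'L3' or 'MEM' according to where the line was found."""
        if addr in self.l2:
            self.l2.move_to_end(addr)   # refresh LRU position
            return "L2"
        if addr in self.l3:
            del self.l3[addr]           # line moves back up to L2
            source = "L3"
        else:
            source = "MEM"              # memory fills L2 directly, bypassing L3
        self._fill_l2(addr)
        return source

    def _fill_l2(self, addr):
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)
            self.l3[victim] = True      # the evicted line becomes the L3 'victim'
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)

if __name__ == "__main__":
    h = VictimHierarchy(l2_lines=2, l3_lines=2)
    print([h.access(a) for a in "ABCAC"])
    # prints ['MEM', 'MEM', 'MEM', 'L3', 'L2']
```

Note that the L3 here only ever receives lines pushed out of L2, which is why the earlier open question about the limitations of an L3 victim cache matters for how prefetched data would be placed.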
AMD did confirm that the launch for Ryzen is still Q1, and Naples (the server counterpart for the Zen microarchitecture) is still on for Q2.