The Road to RDNA 4! A brief overview of RDNA 3!


Rejoice, gamers! RDNA 4, the latest advancement in Radeon technology, is almost upon us. However, before we tackle this new giant, we need to set the stage for it, and for that we need to discuss RDNA 3! This is the architecture at the heart of the RX 7000 series of GPUs, and versions of it can also be found in integrated and mobile solutions.

If you want a refresher, check out our previous articles covering GCN, RDNA 1, and RDNA 2. They will also help prepare you for RDNA 4!

RDNA 3 – Navi generation 3.0!

Seen in: Navi 31, Navi 32, Navi 33

RX 7900 XTX, RX 7900 XT, RX 7900 GRE, RX 7800 XT, RX 7700 XT, RX 7600 XT, RX 7600

The Sapphire NITRO+ AMD Radeon™ RX 7900 XTX Vapor-X 24GB is one very fast GPU!

Let us start with the big kahuna: the most important change in several RDNA 3 models is the introduction of chiplets to mainstream, gaming-oriented graphics processors! But what does that even mean?

In the case of the biggest chip, Navi 31 (used in the RX 7900 XTX), it means a single central Graphics Compute Die (GCD) built on TSMC's modern 5nm node, flanked by six smaller Memory Cache Dies (MCDs, usually referred to as chiplets) built on the still potent but much cheaper 6nm node. Each MCD holds a slice of the Infinity Cache along with a memory controller.
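To put numbers on this layout, here is a small tally. It assumes AMD's published per-MCD figures of 16 MB of Infinity Cache and a 64-bit GDDR6 interface per chiplet (these figures are not stated in this article), and the `Package` class is purely illustrative. Cut-down cards simply enable fewer MCDs.

```python
from dataclasses import dataclass

# Assumed per-MCD resources for RDNA 3 chiplet parts (AMD's published specs).
MCD_CACHE_MB = 16   # Infinity Cache per Memory Cache Die
MCD_BUS_BITS = 64   # GDDR6 interface width per Memory Cache Die


@dataclass
class Package:
    """Hypothetical helper: a GPU package described by its active MCD count."""
    name: str
    active_mcds: int

    def infinity_cache_mb(self) -> int:
        # Total Infinity Cache scales linearly with the number of MCDs.
        return self.active_mcds * MCD_CACHE_MB

    def bus_width_bits(self) -> int:
        # Each MCD also contributes its own slice of the memory bus.
        return self.active_mcds * MCD_BUS_BITS


for card in (Package("RX 7900 XTX", 6), Package("RX 7900 XT", 5), Package("RX 7800 XT", 4)):
    print(f"{card.name}: {card.infinity_cache_mb()} MB Infinity Cache, "
          f"{card.bus_width_bits()}-bit bus")
```

This is also why cache size and bus width move in lockstep across the chiplet-based RX 7000 cards.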

Graphics processors are ultimately made of billions of transistors. The problem is that when you move to a smaller process node, not every part of the chip shrinks equally. Caches do benefit from smaller nodes, but the improvement is minor compared to how well the computational logic scales. This is why it makes sense to build the caches on a cheaper, more mature node: they still perform well on the older process, while the logic that actually does most of the work gets all the benefits of the smaller one.

A peek at the beast's insides. Credit to Locuza and Nemez!

Separating these chiplets from the main die of course comes with tradeoffs. The biggest is that a chiplet design will always pay some penalty compared to an equivalent monolithic chip: the fabric links the dies use to communicate consume extra power and add latency, and the manufacturing and packaging process is more involved. There are ways to counteract these drawbacks. For example, if the chiplets make the whole package less thermally dense, that thermal headroom can be spent on higher clocks, which in effect reduces the latency penalty. Still, like everything in engineering, it is a matter of pros and cons!

For the big RDNA 3 chips, the tradeoff appears to be worth it. Products such as the RX 7900 XTX, 7900 XT, and 7900 GRE, as well as Navi 32 based products like the RX 7800 XT, would have cost more to make as traditional monolithic designs, so the economic advantage of chiplets matters a lot in these segments. For the relatively small Navi 33 chip that serves as the foundation for the RX 7600 and 7600 XT, however, it's best to keep things monolithic.
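Why would a monolithic design cost more? One common first-order way to reason about it is the Poisson yield model, in which the chance that a die has zero defects falls exponentially with its area. The sketch below uses hypothetical round numbers for the defect density and die areas, and it ignores the price difference between 5nm and 6nm wafers as well as packaging costs, so treat it as an illustration rather than AMD's actual economics.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-defects_per_cm2 * area_cm2)


def silicon_per_good_unit(die_areas_mm2, defects_per_cm2: float) -> float:
    """Wafer area consumed per working product, assuming every die is tested
    before assembly, so a defect only discards the die it lands on."""
    return sum(a / poisson_yield(a, defects_per_cm2) for a in die_areas_mm2)


D0 = 0.1                 # hypothetical defect density, defects per cm^2
GCD, MCD = 300.0, 37.0   # hypothetical die areas in mm^2

monolithic = silicon_per_good_unit([GCD + 6 * MCD], D0)   # one big die
chiplet = silicon_per_good_unit([GCD] + [MCD] * 6, D0)    # 1 GCD + 6 MCDs
print(f"monolithic: {monolithic:.0f} mm^2, chiplet: {chiplet:.0f} mm^2")
# prints: monolithic: 880 mm^2, chiplet: 635 mm^2
```

With these made-up inputs, the chiplet approach consumes noticeably less wafer area per working GPU, because a defect only throws away the small die it hits instead of the whole chip.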

Chiplets are not magic, but they sure can be useful!

It remains to be seen how well this approach scales when the chiplets carry compute logic rather than just caches and memory controllers. RDNA 3 does not do this yet, but it is one of the next steps for high-end graphics!

Latency hits can be mitigated!

Memory bandwidth is always important, especially now with the rise of high-resolution gaming and ray tracing. The new generation from AMD tackles this problem in a few ways. The most obvious is the use of newer, faster memory chips. RDNA 1 used 14 Gbps GDDR6, and RDNA 2 launched with 16 Gbps chips, with 18 Gbps arriving on the refreshes. RDNA 3 ships with 18 to 20 Gbps memory right out of the gate! That is a welcome boost for sure, but it won't be enough by itself.

The second big change is the new generation of Infinity Cache. It has inherently lower latency than the cache in the previous architecture, and it delivers roughly 10-15% more bandwidth at equal capacity. For example, in testing, the RX 6600 XT with 32 MB of first-generation Infinity Cache reaches around 880 GB/s of cache bandwidth, while the RX 7600 with 32 MB of second-generation cache reaches 980-990 GB/s, with slightly lower latency to boot.

An interesting test showcasing the RX 7600's improved bandwidth versus the RX 6600 XT and other competitors! From Chips and Cheese and Chester Lam!

But even this is not enough: bandwidth requirements are sky high these days at higher resolutions. For the Navi 31 and 32 chips, AMD also widened the memory bus. The RX 7900 XTX has a massive 384-bit memory bus, 50% wider than the 256-bit bus of the 6900 XT and 6950 XT. Combined with the faster memory, this nearly doubles the VRAM bandwidth available compared to the 6900 XT: 960 GB/s versus 512 GB/s (the 6950 XT, with 18 Gbps memory, reaches 576 GB/s). To top things off, VRAM capacity grows from 16 GB to 24 GB, and the L2 cache grows from 4 MB to 6 MB while its latency drops slightly. These upgrades help keep the 7900 XTX well fed even at 4K, 5K, and 8K, giving it more overall effective bandwidth than the already prodigious 6900 XT and 6950 XT.
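The VRAM bandwidth figures above follow from a simple rule: peak bandwidth equals the bus width times the per-pin data rate, divided by eight to convert bits to bytes. A minimal sketch, assuming 20 Gbps memory on the 7900 XTX, 16 Gbps on the 6900 XT, and 18 Gbps on the 6950 XT; the function name is illustrative.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Every pin of the bus moves data_rate_gbps gigabits per second,
    and 8 bits make a byte.
    """
    return bus_width_bits * data_rate_gbps / 8


rx_7900_xtx = peak_bandwidth_gbs(384, 20)   # 960.0
rx_6900_xt = peak_bandwidth_gbs(256, 16)    # 512.0
rx_6950_xt = peak_bandwidth_gbs(256, 18)    # 576.0
print(f"7900 XTX vs 6900 XT: {rx_7900_xtx / rx_6900_xt:.2f}x")
```

Note that this is raw DRAM bandwidth only; the Infinity Cache sits on top of it and raises the effective bandwidth the shaders see.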

Lastly, the L0 and L1 caches were doubled in size and their latency reduced, further increasing RDNA 3's capability. It is important to note, though, that the smaller RX 7600 does not get all of the same cache upgrades as the bigger RDNA 3 parts, although it is still superior to its RDNA 2 predecessor.

One of the major changes in the RDNA 3 compute unit (CU) is its ability to dual-issue FP32 instructions! For instructions that qualify, this in effect doubles throughput, meaning twice the compute of a comparable RDNA 1 or RDNA 2 part. In real-world tasks, however, it will very rarely scale that high. Not all instructions can be dual-issued, and those that cannot simply run as they would on RDNA 2. Tasks that can be sped up may still be limited by bandwidth or by other parts of the GPU pipeline. Still, an average advantage of around 15%, all else being equal (and it isn't), is nothing to scoff at!
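A simple Amdahl-style model shows why dual issue rarely delivers the full 2x: if only a fraction of the FP32 work can be dual-issued, only that fraction runs twice as fast. The fractions below are hypothetical, and the model ignores bandwidth and other pipeline limits, so real games land lower still.

```python
def dual_issue_speedup(dual_issuable_fraction: float) -> float:
    """Amdahl-style ALU speedup when only part of the FP32 work runs 2x faster.

    Simplified model: ignores memory bandwidth and every other pipeline limit.
    """
    if not 0.0 <= dual_issuable_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Time left after halving the dual-issuable share of the work.
    remaining_time = (1.0 - dual_issuable_fraction) + dual_issuable_fraction / 2.0
    return 1.0 / remaining_time


for f in (0.0, 0.25, 0.5, 1.0):  # hypothetical shares of dual-issuable work
    print(f"{f:.0%} dual-issuable -> {dual_issue_speedup(f):.2f}x ALU throughput")
```

Only when every instruction qualifies does the model reach the theoretical 2x; at a quarter of the work, it is closer to 1.14x.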

The PULSE AMD Radeon™ RX 7600 XT 16GB from Sapphire is a good mainstream choice!

FP16 is seeing more and more use in 3D graphics, and here the 7900 XTX can reach up to 123 teraflops! That is incredible performance, though it should be tempered a bit, because the figure is only reachable with very specific FP16 instructions.
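Where do numbers like 123 teraflops come from? A back-of-the-envelope sketch, assuming the RX 7900 XTX's 96 compute units, 64 FP32 lanes per CU, and 2.5 GHz boost clock (AMD's published specifications, not figures from this article). Each fused multiply-add counts as two operations, dual issue doubles that, and packing two FP16 values into each 32-bit lane doubles it again. The function name is illustrative.

```python
def peak_tflops(compute_units: int, clock_ghz: float, lanes_per_cu: int = 64,
                dual_issue: bool = True, packed_fp16: bool = False) -> float:
    """Theoretical peak throughput in TFLOPS (never reached in real games)."""
    flops_per_clock = compute_units * lanes_per_cu * 2  # one FMA = 2 FLOPs
    if dual_issue:
        flops_per_clock *= 2  # two FP32 instructions issued per clock
    if packed_fp16:
        flops_per_clock *= 2  # two FP16 values packed into each 32-bit lane
    return flops_per_clock * clock_ghz / 1000.0  # GFLOPS -> TFLOPS


print(f"FP32: {peak_tflops(96, 2.5):.1f} TFLOPS")                    # ~61.4
print(f"FP16: {peak_tflops(96, 2.5, packed_fp16=True):.1f} TFLOPS")  # ~122.9
```

The FP16 figure only materializes when the code is both dual-issuable and uses packed FP16 math, which is exactly why it should be taken with a grain of salt.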

Work smarter, not faster! Ray Tracing is complicated so extracting as much value from each operation as possible is paramount!

Ray tracing and path tracing are the future of 3D graphics. Rasterization is not going anywhere soon, but over the coming decades it will surely be increasingly supplanted by these techniques, especially in gaming. RDNA 2 brought hardware-accelerated ray tracing to AMD Radeon and proved capable of using it sparingly in games. It was a good first attempt, but a lot more was needed. This is where RDNA 3 makes one of its biggest leaps over the old architecture: new instructions implemented in hardware allow ray intersections to be processed faster. Together with the improved caches and compute, this helps alleviate GPU bottlenecks, especially in heavier ray-traced scenes.
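To give a feel for what ray intersection work involves, here is the classic "slab" test that checks whether a ray passes through an axis-aligned bounding box, the kind of test performed over and over while walking a BVH (bounding volume hierarchy). This is a generic textbook method written in Python, not AMD's hardware implementation, and the function names are illustrative.

```python
import math


def inverse_direction(direction):
    # Precompute 1/d per axis; an axis-parallel ray gets +inf so the slab
    # math below still works (an origin lying exactly on a slab plane of a
    # zero-direction axis would produce NaN and is not handled here).
    return tuple(1.0 / d if d != 0 else math.inf for d in direction)


def ray_hits_box(origin, inv_dir, box_min, box_max) -> bool:
    """Slab test: clip the ray's [0, inf) parameter range against each axis."""
    t_near, t_far = 0.0, math.inf
    for axis in range(3):
        # Distances along the ray to the two planes bounding this axis.
        t1 = (box_min[axis] - origin[axis]) * inv_dir[axis]
        t2 = (box_max[axis] - origin[axis]) * inv_dir[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    # The ray hits the box only if the clipped range is not empty.
    return t_near <= t_far


box = ((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
print(ray_hits_box((0.0, 0.0, -5.0), inverse_direction((0.0, 0.0, 1.0)), *box))
```

During BVH traversal, every box the ray misses lets the GPU skip the entire subtree inside it, which is why speeding up these tests pays off so much in heavy scenes.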

It should be noted that the actual performance jump here varies. In general, the heavier the scene, the further RDNA 3 pulls ahead of RDNA 2. For example, when testing Metro Exodus Enhanced Edition, the RX 7600 Pulse was 56% faster than the RX 6600 XT Nitro. This is in stark contrast to their difference in rasterization, which is usually around 10%.

One interesting detail of the RDNA 3 architecture is that the front end of the GPU and its shader units run on decoupled clocks. This lets the GPU eke out as much performance as possible from the variable workloads seen in games, because many of these workloads are limited by the front end rather than by what the shader engines can produce. This granularity allows the GPU to run parts of its pipeline faster or slower than others, saving power in the process.

The Navi 32 based Sapphire PURE AMD Radeon™ RX 7800 XT 16GB achieves almost 6900 XT level performance with far fewer resources!

Do remember, though, that RDNA 3 parts generally clock higher than their predecessors! While this depends somewhat on the specific SKU and its power limits, you can expect higher clocks, or the same clocks at lower power, from RDNA 3 products.

What about geometry processing? Here the 7900 XTX offers an over 50% improvement on the already damn fast 6900 XT, thanks to higher clocks, extra geometry engines, and superior culling. RDNA 2 was a geometry cruncher, but RDNA 3 is outright monstrous. If only more developers used the advanced DirectX 12 Ultimate (feature level 12_2) features!

The media engine has seen a big upgrade compared to previous Radeon parts, allowing simultaneous encoding and decoding of both AVC and HEVC. It can also handle AV1 at 8K/60. Overall, a good step up in capability.

AI is a big deal, and RDNA 3 brings massive improvements here over RDNA 2. New mixed-precision capabilities are added and matrix throughput is doubled, with support for BF16 (bfloat16) and INT4 Wave Matrix Multiply Accumulate (WMMA) operations. Add the general compute improvements (FP32 and FP16), and RDNA 3 is massively stronger in these workloads than any previous Radeon GPU!
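At its core, a matrix multiply-accumulate computes D = A × B + C on small tiles, taking low-precision inputs and accumulating at higher precision. The sketch below only illustrates those semantics in plain Python; on RDNA 3 the WMMA instructions work on 16×16 tiles spread across a wavefront, and nothing here reflects how the hardware schedules that work. Inputs are rounded to FP16 with the standard `struct` module's half-precision format, while the accumulator is a regular 64-bit Python float standing in for the wider accumulator. The function names are illustrative.

```python
import struct


def to_fp16(x: float) -> float:
    # Round a Python float to the nearest IEEE 754 half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mma_tile(a, b, c):
    """D = A x B + C for square n x n tiles given as lists of lists.

    A and B are rounded to FP16 first; the running sum is a 64-bit Python
    float, i.e. wider than the inputs, like a higher-precision accumulator.
    """
    n = len(a)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = float(c[i][j])
            for k in range(n):
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            d[i][j] = acc
    return d


print(mma_tile([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]))
```

A real 16×16 tile works the same way, just with far more multiply-adds per instruction, which is where dedicated matrix hardware earns its keep.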

And last but definitely not least: support for the new DisplayPort 2.1 standard is here at last! It is not quite perfect, since the RX 7000 cards top out at 54 Gbps (UHBR13.5) rather than the standard's 80 Gbps maximum, but it is still a notable step up from previous DisplayPort standards!
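A rough way to see what the 54 Gbps limit means in practice: UHBR13.5 runs four lanes at 13.5 Gbps, and DisplayPort 2.x UHBR links use 128b/132b encoding, leaving about 52.4 Gbps for pixel data. The check below ignores blanking intervals and Display Stream Compression (DSC), so it is optimistic and only a sketch; the function name is illustrative.

```python
UHBR13_5_RAW_GBPS = 4 * 13.5     # four lanes at 13.5 Gbps = 54 Gbps
ENCODING_EFFICIENCY = 128 / 132  # 128b/132b channel coding on UHBR links


def fits_uncompressed(width: int, height: int, refresh_hz: int,
                      bits_per_pixel: int) -> bool:
    """Optimistic check: ignores blanking overhead and assumes no DSC."""
    needed_gbps = width * height * refresh_hz * bits_per_pixel / 1e9
    usable_gbps = UHBR13_5_RAW_GBPS * ENCODING_EFFICIENCY  # ~52.4 Gbps
    return needed_gbps <= usable_gbps


# 10-bit RGB color = 30 bits per pixel.
print(fits_uncompressed(3840, 2160, 144, 30))
print(fits_uncompressed(3840, 2160, 240, 30))
```

By this estimate, 4K at 144 Hz with 10-bit color fits comfortably, while 4K at 240 Hz (about 59.7 Gbps of raw pixel data) does not fit uncompressed and would need DSC.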

The future is bright – bring on RDNA 4!

An overview of RDNA 3 at a glance!

In the next article we will cover RDNA 3.5 and the brand new RDNA 4! One thing to remember about GPU architectures is that there is no such thing as a finished architecture. They are constantly evolving: every change or upgrade begets further tweaks and enhancements, which eventually spawn new technology and new routes for developing the architecture. In this grand story, RDNA 3 is just one more step for Radeon, but it is an important step indeed!

The article's content, opinions, beliefs and viewpoints expressed in SAPPHIRE NATION are the author's own and do not necessarily represent the official policy or position of SAPPHIRE Technology.

Alexander Yordanov
My name is Alexander and I am an enthusiastic PC gamer from Sofia, Bulgaria. Video games have been my go-to hobby for as long as I can remember. I started with good old DOOM and Warcraft 1 and also had a Terminator console. Over time, my often outdated hardware made me read up on tech guides and try to understand what goes on within a game, as well as how to appreciate and understand it better.
