Following our previous articles covering both GCN and RDNA 1, we are now going to look into AMD’s latest and greatest gaming architecture – RDNA 2. This is the architecture at the heart of the current RX 6000 series of GPUs, and versions of it also power the current generation of gaming consoles. Suffice it to say – this is one important piece of engineering!
RDNA 2 – Navi 2.0 is here!
Seen in: Navi 21, Navi 22, Navi 23
RX 6900 XT (BIG NAVI), RX 6800 XT, RX 6800, RX 6700 XT; modified for PlayStation 5, Xbox Series X and S
The pinnacle of performance – Sapphire’s AMD Radeon RX 6900 XT TOXIC Extreme Edition
Remember how Hawaii (GCN 2.0) was one hell of a jump over Tahiti (GCN 1.0) on the same node? Well, RDNA 2 is perhaps an even bigger jump than that!
Starting with the process, all current RDNA 2 GPUs use the same 7nm process as the RDNA 1-based RX 5700 XT. Production processes do mature over time, but those gains never fully equate to a new node. This meant that AMD had to improve performance per watt without relying on a more efficient process. This is no easy task, yet AMD delivered a monumental 65% performance-per-watt increase from the Navi 10 XT to the Navi 21 XTX design!
Improving performance per watt on the same process is tricky. The features listed later in the article all help toward that goal; the rest comes down mostly to architecture, smart area optimization and careful transistor usage.
This slide shows some of the perf-per-watt improvements over RDNA 1: 54% if we use the 6800 XT and 65% if we use the 6900 XT.
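As a rough sanity check, a performance-per-watt uplift can be turned into an implied raw performance ratio once power draw is known. The sketch below assumes reference board power figures of 225 W for the RX 5700 XT and 300 W for the RX 6800 XT and RX 6900 XT; those wattages are not stated in this article and partner cards differ, so treat the result as an illustration only.

```python
# Assumed reference board power in watts (not from this article; partner cards differ).
BOARD_POWER_W = {"RX 5700 XT": 225, "RX 6800 XT": 300, "RX 6900 XT": 300}

# Performance-per-watt uplift over RDNA 1, as quoted on the slide.
PERF_PER_WATT_UPLIFT = {"RX 6800 XT": 0.54, "RX 6900 XT": 0.65}


def implied_speedup(card: str, baseline: str = "RX 5700 XT") -> float:
    """Return the performance ratio implied by a perf-per-watt uplift.

    perf = (perf / watt) * watt, so the ratio between two cards is the
    perf-per-watt ratio multiplied by the power ratio.
    """
    ppw_ratio = 1.0 + PERF_PER_WATT_UPLIFT[card]
    power_ratio = BOARD_POWER_W[card] / BOARD_POWER_W[baseline]
    return ppw_ratio * power_ratio


if __name__ == "__main__":
    for card in PERF_PER_WATT_UPLIFT:
        print(f"{card}: about {implied_speedup(card):.2f}x the RX 5700 XT")
```

Under these assumptions the RX 6900 XT lands at roughly 2.2 times the RX 5700 XT, which is in line with the "over twice the speed" figure quoted further down.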
The basic level of the architecture is largely unchanged from RDNA 1, but there are some key differences. Each RDNA 2 Compute Unit (CU) can clock noticeably higher than an RDNA 1 CU, or use a lot less power at the same clocks. It also has increased mixed-precision capabilities, something that will come in handy with future complex reconstruction techniques as well as low-level APIs. However, there is a catch – each SIMD32 unit inside a CU can keep track of slightly fewer wavefronts in flight, only 16 rather than 20. This change to the queue depth was very likely made to help increase the clock speed of the machine and is almost never a problem in the real world. The higher clock speeds more than offset the minor architectural loss, and 16 wavefronts is usually more than enough for the majority of 2D and 3D workloads. The wavefront width remains unchanged at 32/64 though.
On paper, geometry performance has not improved much: the RX 6900 XT has the same quad-geometry engine design as the 5700 XT. In practice, the improved NGG pipeline and higher clock speeds add up to yet another notable jump in tessellation and geometry performance. Make no mistake – RDNA 1 was a monster at geometry. However, RDNA 2 is even better. In fact, the new paths are so efficient that the RX 6700 XT generally matches the 5700 XT in 3D geometry throughput despite having only roughly 60% of the theoretical geometry capability of the RDNA 1 GPU.
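To see why the drop from 20 to 16 wavefronts rarely matters, it helps to remember that register pressure often limits occupancy before the hardware slot count does. The sketch below is a deliberately simplified model: it assumes a 128 KB vector register file per SIMD32 (1024 wave32 registers), and it ignores allocation granularity, LDS usage and scalar registers. The function name and the example register counts are hypothetical.

```python
# Simplified occupancy model for one SIMD32 running wave32 shaders.
# Assumption: 128 KB of vector registers per SIMD, i.e. 1024 registers
# of 32 lanes x 4 bytes each. Allocation granularity is ignored.
VGPR_BUDGET_PER_SIMD = 1024

MAX_WAVES = {"RDNA 1": 20, "RDNA 2": 16}  # wavefront slots per SIMD32


def waves_in_flight(arch: str, vgprs_per_wave: int) -> int:
    """Wavefronts a SIMD can hold: the slot cap or the register limit."""
    if vgprs_per_wave <= 0:
        raise ValueError("a wavefront needs at least one vector register")
    register_limit = VGPR_BUDGET_PER_SIMD // vgprs_per_wave
    return min(MAX_WAVES[arch], register_limit)


if __name__ == "__main__":
    for vgprs in (32, 64, 96, 128):
        r1 = waves_in_flight("RDNA 1", vgprs)
        r2 = waves_in_flight("RDNA 2", vgprs)
        print(f"{vgprs:>3} VGPRs per wave: RDNA 1 = {r1}, RDNA 2 = {r2}")
```

In this model the two generations only differ for shaders light enough to fit more than 16 wavefronts in the register file, which is roughly 60 registers per wavefront or fewer. Heavier shaders hit the register limit first on both architectures.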
For all intents and purposes, the WGP is the new real workhorse within RDNA 1 and RDNA 2 – it is made up of 2 CUs.
The render output units (ROPs) have also doubled on the high-end parts, at last! That is not the only upgrade though, since the GPU's rasterizer is now capable of more efficient batching, so it can pack workloads in a manner that results in fewer internal stalls. This helps keep utilization high.
At the heart of the biggest of big Navi GPUs is the monstrous Navi 21 XTX(H) chip – an 80 CU behemoth at almost 520 mm^2, fueling the RX 6900 XT. This is much bigger than the comparatively small RX 5700 XT from RDNA 1. To top it all off, the monster (and all other RDNA 2 chips) clocks noticeably higher too! The 6900 XT is over twice the speed of the top RDNA 1 part, and this absolutely staggering increase in performance requires bandwidth to be driven – lots of bandwidth. Alas, due to overall costs, HBM2 would be a sub-optimal choice for most RDNA 2 products. This means that AMD was mostly stuck with the more traditional GDDR6. Of course, moving to the improved higher-speed 16 Gbps GDDR6 chips that are now readily available is an obvious upgrade over RDNA 1, but it would not be enough alone.
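The reason faster GDDR6 alone is not enough becomes clear with a little arithmetic. The sketch below uses the standard peak-bandwidth formula (bus width times per-pin data rate, divided by 8) and assumes both the RX 5700 XT and the RX 6900 XT use a 256-bit bus, with 14 Gbps and 16 Gbps memory respectively; the bus widths and the 14 Gbps figure come from public specifications rather than from this article.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins times per-pin rate, over 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    # Assumed configurations: both cards on a 256-bit bus, 14 Gbps vs 16 Gbps GDDR6.
    rdna1 = peak_bandwidth_gb_s(256, 14.0)
    rdna2 = peak_bandwidth_gb_s(256, 16.0)
    print(f"RX 5700 XT: {rdna1:.0f} GB/s")
    print(f"RX 6900 XT: {rdna2:.0f} GB/s ({rdna2 / rdna1 - 1:.0%} more)")
```

That works out to 448 GB/s against 512 GB/s, a gain of about 14% in raw VRAM bandwidth for a GPU that is more than twice as fast.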
The RX 6800 XT NITRO+ is excellent at 1440p and 4K gaming! It has 64 CUs available for pixel shading operations and 72 for compute operations!
A different way to solve the bandwidth situation would be a wider memory bus. The problem with having a very wide GDDR6 bus is that it consumes a fair bit more power, plus the extra costs on the PCB, the GPU die, and the memory chips themselves do add up. Having to upgrade the cooler and general electrical capabilities of the design adds yet another complexity to the whole product too. This is why AMD decided on something a tad more innovative. Inspired by the awesome performance of Zen 2 and Zen 3’s L3 caches, AMD engineers created a massive GPU L3 cache and dubbed it “Infinity Cache”.
Saving overall power and increasing bandwidth at the same time!
The implementation of this new L3 cache (alongside the VRAM and the L0, L1 and L2 caches, of course) on high-end RDNA 2 products like the RX 6900 XT is truly impressive. This technology is a game changer since it has lower latency and much higher bandwidth than standard GDDR memory. Even with “low” hit rates by cache standards, say 50% at 5K, it still allows for effective bandwidth competitive with what the massive 4096-bit bus of the Radeon VII provides!
For another example, in terms of effective bandwidth, the 6700 XT actually has more of it than the 5700 XT, despite having a smaller L2 cache and lower VRAM bandwidth than the RDNA 1 chip. Its potent 96 MB of Infinity Cache is more than enough to blow past the aforementioned limitations even at 4K. As for the highest-end monsters like the 6900 XT and 6800 XT – even at 4K and 5K resolutions they are almost never actually bandwidth-limited. The rest of the GPU runs out of steam way before the caches and VRAM run out of bandwidth.
One overlooked but notable improvement is the upgraded display engine! Native HDMI 2.1 support is a great addition that allows more bandwidth than even DisplayPort 1.4. It finally allows for full 120 Hz HDR gameplay on the current best 4K TVs. Oh, and let’s not forget that RDNA 2 also features AV1 decode!
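The Radeon VII comparison can be reproduced with a first-order model: if a fraction of memory requests is served by the cache, only the misses reach VRAM, so the GPU sees the VRAM bandwidth divided by the miss rate. The sketch below assumes the cache itself is never the bottleneck, takes 512 GB/s for a 256-bit, 16 Gbps GDDR6 setup, and assumes 2.0 Gbps per pin for the Radeon VII's HBM2; none of this is AMD's official methodology.

```python
VRAM_GB_S = 256 * 16.0 / 8         # 256-bit GDDR6 at 16 Gbps -> 512 GB/s
RADEON_VII_GB_S = 4096 * 2.0 / 8   # 4096-bit HBM2, assumed 2.0 Gbps per pin


def effective_bandwidth_gb_s(vram_gb_s: float, hit_rate: float) -> float:
    """Bandwidth seen by the GPU when only cache misses reach VRAM.

    First-order model: assumes the cache is never the limiting factor.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in the range [0, 1)")
    return vram_gb_s / (1.0 - hit_rate)


if __name__ == "__main__":
    for hit_rate in (0.0, 0.25, 0.5):
        eff = effective_bandwidth_gb_s(VRAM_GB_S, hit_rate)
        print(f"hit rate {hit_rate:.0%}: about {eff:.0f} GB/s effective")
    print(f"Radeon VII reference: {RADEON_VII_GB_S:.0f} GB/s")
```

With a 50% hit rate the model gives 1024 GB/s, the same figure as the Radeon VII's HBM2 under these assumptions. Higher hit rates at lower resolutions push the effective number up further, until the cache's own bandwidth becomes the limit.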
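To put the HDMI 2.1 advantage into numbers, the sketch below compares the uncompressed data rate of a 4K, 120 Hz, 10-bit RGB signal with the usable link rate of each connector. The link rates are the specification maximums after line coding, not necessarily what a given card and TV negotiate, and blanking intervals are ignored, so real requirements are somewhat higher.

```python
# Usable link rates in Gbit/s after line coding (specification maximums).
DP_1_4_HBR3 = 32.4 * 8 / 10    # 4 lanes x 8.1 Gbps, 8b/10b coding
HDMI_2_1_FRL = 48.0 * 16 / 18  # 4 lanes x 12 Gbps, 16b/18b coding


def video_rate_gbps(width: int, height: int, refresh_hz: int,
                    bits_per_channel: int) -> float:
    """Uncompressed RGB pixel data rate in Gbit/s, ignoring blanking intervals."""
    bits_per_pixel = 3 * bits_per_channel
    return width * height * refresh_hz * bits_per_pixel / 1e9


if __name__ == "__main__":
    need = video_rate_gbps(3840, 2160, 120, 10)  # 4K, 120 Hz, 10-bit HDR
    print(f"4K120 10-bit needs about {need:.2f} Gbit/s")
    print(f"DisplayPort 1.4 carries {DP_1_4_HBR3:.2f} Gbit/s: fits = {need <= DP_1_4_HBR3}")
    print(f"HDMI 2.1 carries {HDMI_2_1_FRL:.2f} Gbit/s: fits = {need <= HDMI_2_1_FRL}")
```

DisplayPort 1.4 can still reach this mode with Display Stream Compression or chroma subsampling, which the sketch does not model.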
These features are a very big deal for the future of 3D graphics!
Among the new features within DirectX 12 Ultimate we have: Sampler Feedback (helps with VRAM utilization), Real-time Ray Tracing (advances general visuals), Variable Rate Shading (performance optimization), and Mesh Shading (superior geometry processing). These new features are the key to even better and/or more efficient future graphics. We will cover what they and other advanced Vulkan or DirectX functions can offer in a different article though.
The final new additions with RDNA 2 are the yet to be demonstrated DirectStorage feature and the new Smart Access Memory (SAM) feature. SAM allows the CPU to access the GPU’s memory more easily, boosting performance slightly in both GPU- and CPU-limited scenarios. It is based on a previously unused but actually quite neat PCIe feature called Resizable BAR. Unfortunately, this feature requires some OS and driver maturation to fully shine, but that is to be expected. In the here and now, some games show reasonable gains while others may actually lose a bit of performance. However, every driver release from AMD improves the situation little by little, and future games will be made with this feature already in mind. There is already a stark difference between SAM on the old day-one drivers and SAM with the current drivers! I can personally recommend the feature now, and it thankfully works with Ryzen 3000 and 5000 CPUs, as well as Intel’s 10th and 11th gen processors.
As for DirectStorage – its ramifications are major too. It should change the way developers make games long term. Obviously, we are in the early days of this feature on PC, so we will cover it in a separate article once it is out. To give you the gist of this tech – it allows for even faster loading times in games and much bigger and more detailed game worlds! The only caveat is that you need an NVMe Gen 3 or 4 SSD to enable it.
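For readers on Linux who want to see whether a large BAR is actually exposed, the sketch below parses the kernel's sysfs `resource` file for a PCI device and reports the size of each region. It relies on a heuristic, not an official check: without Resizable BAR the CPU-visible VRAM window is typically 256 MiB, so anything larger suggests the feature is active. The PCI address in the example is hypothetical and must be replaced with your own.

```python
from pathlib import Path

LEGACY_APERTURE_BYTES = 256 * 1024 * 1024  # typical window without Resizable BAR


def bar_sizes(resource_text: str) -> list[int]:
    """Parse the sysfs 'resource' format: one 'start end flags' hex triple per line."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if end > start:  # unused regions are listed as all zeros
            sizes.append(end - start + 1)
    return sizes


def looks_resized(resource_text: str) -> bool:
    """Heuristic: any region larger than 256 MiB suggests a resized BAR."""
    return any(size > LEGACY_APERTURE_BYTES for size in bar_sizes(resource_text))


if __name__ == "__main__":
    # Hypothetical PCI address; find the real one with `lspci`.
    path = Path("/sys/bus/pci/devices/0000:03:00.0/resource")
    if path.exists():
        print("Large BAR visible:", looks_resized(path.read_text()))
    else:
        print(f"{path} not found; adjust the PCI address for your system")
```

On Windows, the driver control panel reports the SAM state directly, so a script like this is only a convenience for Linux setups.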
The RX 6700 XT Pulse is a sensible and fast product!
RDNA 2 proves that AMD’s focus on making architectures that are smarter and more efficient is the right move to make. After all, the saying goes “work smarter, not harder” and that is the mentality at the heart of the RDNA architecture(s). If we compare the powerful 5700 XT to its replacement, we can observe that the new RDNA 2-fueled RX 6700 XT achieves a lot more with specifications that are at times actually weaker than its predecessor’s, at least on paper. It also achieves that at similar power draw and with 50% more VRAM to boot!
The Future of Gaming graphics
Currently we do not know much about the upcoming architecture called RDNA 3. We do know it is coming for sure and that AMD are going for another 50% improvement in performance-per-watt. Knowing their recent success, they may even go above that, or so we hope!
Either way, expect another article covering RDNA 3 when it comes out, as well as CDNA eventually.
I want to thank @Nemez for helping with some architectural details for this article.
The article’s content, opinions, beliefs and viewpoints expressed in SAPPHIRE NATION are the authors’ own and do not necessarily represent official policy or position of SAPPHIRE Technology.