AMD’s Graphics Core Next (shortened to GCN) architecture has been a mainstay for several years now. It has proven itself to be quite the tough and scalable design, and versions of it are used in the PlayStation 4 and Xbox One as well as in many computers.
Obviously, though, no single architecture can survive without refinements and changes being made over time. So today we will be taking a look at just such improvements, with node shrinks and process optimizations left aside. While those matter too, they aren’t strictly architectural per se.
A good overview of the general improvements GCN has received.
GCN 1 – The Original
Seen in: Oland, Cape Verde, Tahiti, Pitcairn
Basically the entire HD 7000 series from the HD 7730 up, barring the HD 7790, plus the R9 280 and 280X.
Debuting with the first 28nm video card, the AMD HD 7970, this was a pivotal moment for AMD and GPUs in general. The chip was manufactured on a new process and featured AMD’s first entirely new architecture since 2006. It was also the first DX11.1 GPU (which makes it work fine with Vulkan and DX12, might I add) and AMD’s entrance into the compute market. The very core of GCN 1 is still in use in refined form, and can be found in all subsequent versions.
The generation that preceded GCN, VLIW, was a graphics powerhouse, but it had real issues with compute-heavy workloads. Sure, it could do some of them proficiently, but the more serial or complex tasks would run very slowly. It also had underutilization issues with some newer game titles (partially solved in its last iteration). Graphics Core Next, however, fully addressed the compute issues whilst being a great gaming design as well. On release, it was faster and more efficient in every way than Cayman’s VLIW4, but that speed came with a price: drivers couldn’t keep pace, which lessened the anticipated gains. GCN is a very complex and powerful architecture, so software-wise it took (and is still taking with each new release) a long time to fully mature.
GCN 2
Seen in: Bonaire, Hawaii
HD 7790, R7 260, R7 260X, R9 290, 290X, R9 295X2, R9 390, R9 390X
This is the optimization phase for GCN. Many features were added or revamped, chip leakage was reduced and die-size optimization reached its peak. One of the many additions was TrueAudio, which might as well be called ‘Awesome audio’, even if it is, unfortunately, seeing limited use in current games. Theoretical geometry performance was almost doubled with four geometry engines (instead of two in GCN 1), and the number of render output units (ROPs) was also increased. Software and BIOS updates, together with improved caching, also help tessellation performance somewhat and extend the staying power of second-generation GCN cards. Other, more minor changes include a modified ALU design that allows the FP64 ratio to be raised to 1:2, while the memory subsystem is no longer decoupled from the ROP partitions, which helps boost compute workloads and fully utilize the GPU in gaming and other 2D/3D scenarios.
AMD’s PowerTune also received an update with better, fine-grained control, which helps keep the clocks up and power consumption in check.
Finally, GCN 2 marks the introduction of support for FreeSync, as well as for the as yet unreleased FreeSync 2, which will be supported on GCN 2 and up. All later versions of Graphics Core Next also support this technology, including the current RX 500 series Polaris refresh and the future Vega.
GCN 3
Seen in: Fiji and Tonga
R9 285, R9 380, R9 380X, R9 Fury, R9 Fury X, R9 Nano, Radeon Pro Duo
Featuring all the advancements of the previous generations, the newest and probably biggest addition to GCN to date is the new Lossless Delta Color Compression technology. For a small penalty to die size, it improves the effective memory bandwidth of the GPU by around 20% on average, allowing for lower power usage and smaller, more efficient memory buses as well as better-fed ROPs. Along with this technology, better caching and improved tessellation were introduced. By reusing certain primitives in a scene, GCN 3 GPUs see a reasonable uplift in some tough-on-tessellation scenarios, such as Fallout 4 with its God Rays. God ray simulation is tessellation-heavy, but by reusing certain information the GPU can more easily deal with the tiny triangles while losing no fidelity. The Unified Video Decoder was also enhanced.
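To give a feel for why delta compression saves bandwidth, here is a minimal Python sketch. It is not AMD's actual hardware scheme, which is not described here; the function names, the eight-value tile and the simple "anchor plus fixed-width deltas" layout are all assumptions made for illustration. The idea is only that neighbouring pixels tend to be similar, so storing small differences takes fewer bits than storing every full value, and nothing is lost on decode.

```python
def delta_encode(tile):
    """Keep the first value as an anchor, then store each value as a difference to the previous one."""
    if not tile:
        return []
    deltas = [tile[0]]
    for prev, cur in zip(tile, tile[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the differences (lossless)."""
    out = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        out.append(total)
    return out


def bits_needed(deltas, anchor_bits=8):
    """Bits for the anchor plus one fixed signed width for all deltas (hypothetical layout)."""
    if len(deltas) <= 1:
        return anchor_bits * len(deltas)
    widest = max(abs(d) for d in deltas[1:])
    width = widest.bit_length() + 1  # +1 for the sign bit
    return anchor_bits + width * (len(deltas) - 1)


# A made-up run of similar 8-bit channel values, e.g. a patch of sky.
tile = [200, 201, 201, 203, 202, 202, 204, 205]
encoded = delta_encode(tile)

raw_bits = 8 * len(tile)
packed_bits = bits_needed(encoded)
print(encoded, raw_bits, packed_bits)
```

On smooth image regions the deltas stay tiny and the saving is large; on noisy regions they do not, which is why the real-world gain is quoted as an average.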
GCN 3’s crowning achievement, though, is the world’s first use of High Bandwidth Memory (HBM) in a consumer-grade product. Debuting with the Fury X and also used on the premium R9 Fury and R9 Nano, it is a revolutionary technology. It offers unprecedented amounts of raw bandwidth with lower access latency and much improved power draw. Its only problem was that in its first generation it was limited to 4 GB in total, something that later generations of this memory technology have fixed (as can be seen on the upcoming RX Vega and Nvidia’s GP100). Memory capacity is usually not an issue though, thanks to AMD’s good memory allocation, which became a priority for the company after Fiji’s release.
GCN 4 (Polaris)
Seen in: Polaris 10, 11, 12, 20
RX 460, RX 470, RX 480, RX 540, RX 550, RX 560, RX 570, RX 580
This is currently the most advanced version of GCN. It features an improved Delta Color Compression system (around 36% extra effective memory bandwidth) and a better L2 cache, which help the smaller RX 480 beat the powerful R9 290/290X despite having a much narrower memory bus, fewer cores and only half the ROPs.
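As a rough back-of-the-envelope check on that claim, the sketch below compares raw and "effective" memory bandwidth. The card specifications used (a 256-bit bus at 8 Gbps for the 8 GB RX 480 and a 512-bit bus at 5 Gbps for the R9 290X) are reference figures assumed for this example and are not taken from the text above, and applying the ~36% figure as a straight multiplier is a simplification.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width times per-pin data rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


# Assumed reference specifications (not stated in the article).
rx480_raw = bandwidth_gb_s(256, 8.0)      # 256-bit GDDR5 at 8 Gbps
r9_290x_raw = bandwidth_gb_s(512, 5.0)    # 512-bit GDDR5 at 5 Gbps

# Treat the ~36% compression gain as a simple multiplier (a simplification).
rx480_effective = rx480_raw * 1.36

print(rx480_raw, r9_290x_raw, round(rx480_effective, 1))
```

Under these assumptions the narrower bus ends up with roughly as much usable bandwidth as the older, much wider one, which fits the picture painted above.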
The other major addition is the so-called Primitive Discard Accelerator, which discards zero-area triangle primitives and greatly helps with some anti-aliasing methods and extreme tessellation scenarios. Things such as HairWorks in The Witcher 3 run better on Polaris than they did on previous GCN versions.
Last but certainly not least, Polaris has full support for DisplayPort 1.4 and HDMI 2.0b. This makes it a very viable choice for both home theater PCs and high refresh-rate gaming at high resolutions.
With the Polaris refresh seen in the 500 series, AMD included a neat little update: an extra power state for memory, which can help a bit with overall everyday power consumption.
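The zero-area test itself is easy to picture in software. The Python sketch below is only an illustration of the idea, not a description of how the hardware block is built; the function names and the integer screen-space coordinates are assumptions. A triangle whose three vertices are collinear or coincide covers no pixels, so it can be dropped before any further work is spent on it.

```python
def signed_area_2x(a, b, c):
    """Twice the signed area of triangle abc, via the 2D cross product of two edges."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def discard_zero_area(triangles):
    """Keep only triangles that actually cover some area."""
    return [t for t in triangles if signed_area_2x(*t) != 0]


# Hypothetical screen-space triangles with integer (snapped) coordinates.
triangles = [
    ((0, 0), (4, 0), (0, 3)),   # a normal triangle
    ((1, 1), (2, 2), (3, 3)),   # three collinear points: zero area
    ((5, 5), (5, 5), (9, 1)),   # two identical vertices: zero area
]

kept = discard_zero_area(triangles)
print(len(triangles), "in,", len(kept), "kept")
```

Heavy tessellation tends to produce many such degenerate slivers, which is why throwing them away early pays off in exactly the scenarios mentioned above.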
GCN 5, the future with RX Vega
Seen in: Vega 11, Vega 10
Vega Frontier Edition, RX Vega and more!
This new design is built around the ‘Next-Generation Compute Unit’ (NCU) and is expected to increase instructions per clock as well as offer higher clock speeds in general. The NCU is a refinement of GCN’s compute units and not a completely new design. Vega also supports High Bandwidth Memory 2 (HBM2), which can offer (if used with 4 stacks!) a mind-boggling 1 terabyte per second of pure bandwidth and 32 GB of VRAM. Whether its full capacity will be used on new cards isn’t yet known. A new, larger memory address space and the High Bandwidth Cache Controller are also slated to appear.
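The four-stack figures can be reproduced with some simple arithmetic. The per-stack numbers in this Python sketch (a 1024-bit interface, up to 2 Gbps per pin and up to 8 GB per stack) are the HBM2 maximums assumed for the example; actual cards may run fewer stacks, lower speeds or smaller stacks.

```python
# Assumed per-stack HBM2 maximums (not figures for any specific card).
BUS_BITS_PER_STACK = 1024
PIN_RATE_GBPS = 2.0
CAPACITY_GB_PER_STACK = 8


def hbm2_totals(stacks):
    """Return (peak bandwidth in GB/s, capacity in GB) for a given number of stacks."""
    bandwidth = stacks * BUS_BITS_PER_STACK * PIN_RATE_GBPS / 8  # 8 bits per byte
    capacity = stacks * CAPACITY_GB_PER_STACK
    return bandwidth, capacity


for stacks in (2, 4):
    bandwidth, capacity = hbm2_totals(stacks)
    print(stacks, "stacks:", bandwidth, "GB/s,", capacity, "GB")
```

Four stacks give 1024 GB/s and 32 GB under these assumptions, while a two-stack configuration would land at half of both.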
Additionally, the new GPUs are expected to include improvements to the rasterization and render output units. This should in theory greatly help Virtual Reality as well as high-resolution and high-refresh-rate gaming. The stream processors are heavily modified from previous generations to support Rapid Packed Math for 8-bit, 16-bit and 32-bit numbers. With this, there is a significant performance advantage when lower precision is acceptable (for example, processing two half-precision numbers at the same rate as one single-precision number). Used correctly, it can almost double the effective throughput for such workloads and lead to tremendous gains in performance!
This technology is for now unused in games, but it will be of great service to gamers down the line as well as content creators.
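To picture what "packed" means here, the Python sketch below stores two half-precision numbers in a single 32-bit word and adds two such words lane by lane. It only emulates the data layout in software using the standard struct module; the function names and the choice of which value sits in the low 16 bits are assumptions, and none of this reflects how the hardware is exposed to programmers.

```python
import struct


def pack_two_halves(a, b):
    """Pack two FP16 values into one 32-bit word (a in the low 16 bits, b in the high 16 bits)."""
    raw = struct.pack("<2e", a, b)        # "e" is IEEE 754 half precision
    return struct.unpack("<I", raw)[0]


def unpack_two_halves(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(w1, w2):
    """Add two packed words lane by lane; packed-math hardware would do this as one operation."""
    a1, b1 = unpack_two_halves(w1)
    a2, b2 = unpack_two_halves(w2)
    return pack_two_halves(a1 + a2, b1 + b2)


x = pack_two_halves(1.5, 2.0)
y = pack_two_halves(0.25, -1.0)
result = unpack_two_halves(packed_add(x, y))
print(hex(x), result)
```

Because both values travel through the same 32-bit slot that would otherwise hold a single full-precision number, twice as many of them can be processed per operation, which is where the "almost double" figure comes from.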
A look at the future
There are many unknowns about Vega, which is set to be the biggest revision GCN has ever gotten. We also have almost no idea what AMD will be introducing with Navi. GCN is a storied architecture with impressive engineering prowess. Sure, the core behind the beast is the same, but many enhancements and improvements have added up over time, making for a faster, cheaper, more efficient and feature-rich product.