What is cache, and why do CPUs, GPUs, and other kinds of processors have it? Here’s everything you need to know
Although cache isn’t talked about as much as cores, RAM (Random Access Memory), or VRAM, you’ve probably heard of it before, especially recently. AMD proudly attributes the gaming performance of its Ryzen CPUs with 3D V-Cache to that extra cache, and one of Intel’s biggest improvements with its 13th-generation Raptor Lake CPUs was adding more of it.
But how can cache improve performance when it’s measured in mere megabytes? Even the cheapest RAM kits come with 16GB these days, so how can a few extra megabytes of cache make such a big difference to performance? Well, cache isn’t your normal type of memory.
Cache: A tiny amount of high-speed memory
Cache is an older idea than you might think: mainframes used it as far back as the 1960s, and it became a standard part of consumer processors in the late 1980s and 1990s. It exists because of RAM. RAM is a key component in computers that stores a significant amount of data that processors (like CPUs and GPUs) are expected to need fairly often. For a long time, improvements in RAM performance kept pace with improvements in CPU performance, but by the 1990s, it was becoming obvious that RAM wouldn’t be able to keep up with the latest CPUs. RAM had lots of capacity, but its transfer speeds and latency just couldn’t keep up.
Here’s where cache comes in. It’s not nearly as large as RAM either physically or by capacity, but it’s inside the processor itself and can transfer data very quickly and at very low latency. As long as cache is storing the data the processor actually needs, it can save time because asking RAM for the same data is many times slower. It was a great solution to the RAM problem and allowed CPU designers to continue making faster CPUs and RAM designers to keep making larger capacities of RAM without needing to worry as much about performance. Today, cache is in pretty much every kind of processor.
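To make that speed gap concrete, here’s a minimal C sketch (the 64MB buffer size and page-sized stride are illustrative assumptions, not tied to any particular CPU). It reads the same buffer twice: once sequentially, so every cache line fetched from RAM gets fully used, and once with a large stride, so almost every read misses the cache. Both passes do the same number of reads, but on typical hardware, the strided pass takes several times longer.

```c
/*
 * locality_demo.c — a toy demonstration of why cache hits matter.
 * Build: cc -O2 locality_demo.c -o locality_demo   (Linux/macOS)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N (64UL * 1024 * 1024)  /* 64MB buffer, far larger than typical caches */
#define STRIDE 4096             /* jump a page at a time to defeat spatial locality */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    unsigned char *buf = malloc(N);
    if (!buf) return 1;
    memset(buf, 1, N);  /* touch every page up front so page faults don't skew timing */

    unsigned long sum = 0;
    double t0;

    /* Sequential pass: each cache line (typically 64 bytes) is loaded once
       from RAM and then fully used before moving on. */
    t0 = seconds();
    for (size_t i = 0; i < N; i++) sum += buf[i];
    printf("sequential: %.3f s\n", seconds() - t0);

    /* Strided pass: same total reads, but each access lands on a cache line
       that has almost certainly been evicted since it was last touched. */
    t0 = seconds();
    for (size_t s = 0; s < STRIDE; s++)
        for (size_t i = s; i < N; i += STRIDE)
            sum += buf[i];
    printf("strided:    %.3f s\n", seconds() - t0);

    printf("checksum: %lu\n", sum);  /* stops the compiler from deleting the loops */
    free(buf);
    return 0;
}
```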
You might be wondering why cache is so tiny, though. It mostly comes down to space and money. Even 32MB of cache can take up quite a bit of space on a processor, and modern consumer chips top out at roughly 600mm² of total area, which has to be used wisely. That means dedicating more area to cache gets expensive, and the situation is actually getting worse, not better. The latest manufacturing processes are delivering smaller and smaller improvements in cache density, and TSMC reportedly failed to shrink SRAM at all in the first iteration of its 3nm process.
Cache levels and the memory hierarchy
The invention of cache added a new layer to the data storage devices in a computer. Together, these layers form what’s called the memory hierarchy, which describes where data lives in a typical system, from the smallest, fastest memory inside the processor out to the largest, slowest storage (other kinds of processors look very similar). Today, the memory hierarchy doesn’t just span cache, RAM, and permanent storage; there’s also a memory hierarchy within the cache itself.
Most processors have different levels of cache for different purposes. The first and smallest level is L1, which is private to each individual core and holds the data that core needs immediately. L1 cache is usually measured in kilobytes, with the latest Ryzen 7000 CPUs having 64KB of L1 cache per core. Modern L1 caches are also typically split into L1I (for instructions) and L1D (for data).
Next up is L2, which, depending on the design, is either private to a single core or shared by a small group of them. L2 is naturally larger than L1, often by an order of magnitude or more, but being bigger and serving more traffic means it’s slower and has higher latency. Some processors, particularly GPUs and low-end CPUs, stop at L2 cache.
The next step is L3, which is generally shared by all the cores on the chip. Its size varies from a few times bigger than L2 to more than an order of magnitude larger, depending on the processor, which makes it slower than L2 but still far faster than RAM. The L3 cache also often acts as a “victim cache,” meaning data evicted from the L1 and L2 caches lands there, and it can be evicted from L3 in turn if it goes unused (see the toy model below). Today, the L3 cache is particularly important to AMD on account of its chiplet technology: Ryzen 3D V-Cache chips stack an extra 64MB of L3 cache on top of the CPU die, and each of the RX 7000’s Memory Cache Dies (or MCDs) contains 16MB of cache.
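To make “victim cache” less abstract, here’s a toy model in C. It’s an illustrative sketch, not how real hardware is built: the cache sizes, the direct-mapped layout, and the FIFO replacement policy are all assumptions chosen to keep the demo tiny. A small main cache evicts a line whenever two addresses collide, and instead of discarding the evicted line, it parks it in a little victim buffer that gets checked on later misses.

```c
/* victim_toy.c — a toy model of a victim cache (illustrative only). */
#include <stdio.h>
#include <stdbool.h>

#define MAIN_SLOTS 4    /* direct-mapped: address % MAIN_SLOTS picks the slot */
#define VICTIM_SLOTS 2  /* fully associative, FIFO replacement */

static long main_cache[MAIN_SLOTS];
static bool main_valid[MAIN_SLOTS];
static long victim[VICTIM_SLOTS];
static bool victim_valid[VICTIM_SLOTS];
static int victim_next = 0;  /* FIFO cursor for the victim buffer */

/* Simulate one access; returns where it was satisfied. */
static const char *access_line(long addr) {
    int slot = (int)(addr % MAIN_SLOTS);

    if (main_valid[slot] && main_cache[slot] == addr)
        return "hit (main cache)";

    /* Miss in the main cache: check the victim buffer before "going to RAM". */
    for (int i = 0; i < VICTIM_SLOTS; i++) {
        if (victim_valid[i] && victim[i] == addr) {
            /* Swap: promote the requested line, demote the conflicting one. */
            long demoted = main_cache[slot];
            bool had_line = main_valid[slot];
            main_cache[slot] = addr;
            main_valid[slot] = true;
            victim[i] = demoted;
            victim_valid[i] = had_line;
            return "hit (victim cache)";
        }
    }

    /* True miss: fetch from "memory" and park any evicted line in the buffer. */
    if (main_valid[slot]) {
        victim[victim_next] = main_cache[slot];
        victim_valid[victim_next] = true;
        victim_next = (victim_next + 1) % VICTIM_SLOTS;
    }
    main_cache[slot] = addr;
    main_valid[slot] = true;
    return "miss (memory)";
}

int main(void) {
    /* Addresses 0 and 4 collide in a 4-slot direct-mapped cache; the victim
       buffer turns the resulting ping-pong into cheap victim hits. */
    long pattern[] = {0, 4, 0, 4, 1, 0, 4};
    for (size_t i = 0; i < sizeof pattern / sizeof pattern[0]; i++)
        printf("access %ld -> %s\n", pattern[i], access_line(pattern[i]));
    return 0;
}
```

Run it, and the repeated 0/4 accesses flip between the main cache and the victim buffer instead of going all the way out to memory each time.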
The highest level of cache you’ll see on a processor is L4, which is often so big that it’s effectively a pool of RAM. In fact, the latest CPUs to use an L4 cache are Intel’s Sapphire Rapids Xeon chips, whose top-end models can use their on-package HBM2e as a giant cache. AMD, on the other hand, has never used an L4 cache and is instead content to grow its L3 to high capacities by adding more CPU and V-Cache chiplets. An L4 cache has typically benefited integrated GPUs most, since an on-package pool of memory can share data between the CPU and the integrated GPU.
In some chips, primarily mobile SoCs, there’s another type of cache: the system-level cache (SLC). This cache is shared across the entire chip, so the CPU, GPU, NPU, and other blocks all draw from it. Because any cache cuts down on requests to main memory, an SLC benefits the whole SoC at once.
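If you’re running Linux, you can inspect your own machine’s cache hierarchy, since the kernel exposes it under /sys/devices/system/cpu/. Here’s a minimal C sketch that walks those entries; the sysfs attribute names it reads (level, type, size, shared_cpu_list) are standard, though not every system exposes all of them, and the program simply prints blanks where they’re missing.

```c
/* show_caches.c — print the cache hierarchy as Linux reports it (Linux-only). */
#include <stdio.h>
#include <string.h>

/* Read one sysfs attribute for cpu0's cache `index`; returns 0 if absent. */
static int read_attr(int index, const char *attr, char *out, size_t len) {
    char path[256];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu0/cache/index%d/%s", index, attr);
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    if (!fgets(out, (int)len, f)) out[0] = '\0';
    fclose(f);
    out[strcspn(out, "\n")] = '\0';  /* strip the trailing newline */
    return 1;
}

int main(void) {
    /* cpu0's caches are representative; sibling cores usually mirror them. */
    for (int i = 0; ; i++) {
        char level[16] = "", type[32] = "", size[32] = "", shared[128] = "";
        if (!read_attr(i, "level", level, sizeof level))
            break;  /* no more index directories: we've seen every level */
        read_attr(i, "type", type, sizeof type);
        read_attr(i, "size", size, sizeof size);
        read_attr(i, "shared_cpu_list", shared, sizeof shared);
        printf("L%-2s %-12s %-8s shared with CPUs %s\n", level, type, size, shared);
    }
    return 0;
}
```

On a Ryzen 7000 chip, for example, you’d typically see a 32K data cache and a 32K instruction cache per core (the 64KB of L1 mentioned earlier), a unified L2 private to each core, and a large L3 shared across the whole core complex.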
Cache is necessary but doesn’t improve performance on its own
Despite all the hype surrounding recent innovations in cache, it’s not a silver bullet for performance. After all, cache has no processing capability of its own; it just stores data. Although virtually every processor benefits from having more cache, it’s often too expensive to add much more than the amount workloads actually need, and depending on the workload, extra cache may not improve performance at all, which is a further incentive not to pile it onto a processor.
That being said, a large amount of cache can be very desirable in certain situations. CPUs with lots of cache tend to perform better in games, for example. AMD’s Ryzen CPUs with 3D V-Cache are very fast for gaming despite running at lower frequencies than chips without V-Cache, and Intel’s 13th-generation CPUs are significantly faster than 12th-generation chips, with larger caches among the biggest improvements.
Ultimately, cache exists so processors can bypass RAM as often as possible and run as unrestrained as possible. CPU designers have to balance cache capacity against die area and, by extension, cost, a balance that gets harder with each new manufacturing process. But even as new ways of adding cache keep appearing decades after cache was invented, it’s hard to imagine the purpose of this key component ever changing.