CACHE MEMORY EXPLAINED: A DEEP DIVE INTO L1, L2, AND L3 CACHES

Introduction

In the race for faster computing, the CPU needs quick access to frequently used data. However, retrieving data directly from RAM (main memory) is too slow compared to the CPU’s processing speed. This is where cache memory steps in as the high-speed bridge between the CPU and RAM.

Cache Memory

Cache memory is a small, high-speed memory that resides inside the CPU and stores frequently used data to speed up processing. The CPU checks the cache before accessing RAM, which is very slow by comparison.

A cache typically consists of three levels: L1, L2, and L3. Of these, L1 is the fastest, with a latency of 1 to 5 cycles (latency is the time delay between when an operation is requested and when it is completed; lower latency means faster processing and higher performance). L2 has a latency of 5 to 20 cycles, and L3 has a latency of 30 to 100 cycles.

RAM, by comparison, has a latency of about 100 to 200 cycles. You might be wondering what a cycle is. A cycle is one tick of the CPU clock, like a pulse of the processor. A CPU operates at a speed measured in hertz (Hz):

\[1\ \text{Hz} = 1\ \text{cycle per second}\]

In our example, an Intel Core i9 processor typically runs at 5 GHz (5 billion cycles per second), so

\[1\ \text{cycle} = \frac{1}{5{,}000{,}000{,}000}\ \text{seconds} = 0.2\ \text{nanoseconds}\]

where a nanosecond is one billionth of a second (\(10^{-9}\) seconds). Now suppose an L1 cache access takes 4 cycles. Since \(1\ \text{second} = 10^9\ \text{nanoseconds}\),

\[4\ \text{cycles} = 4 \times \frac{1}{5{,}000{,}000{,}000}\ \text{seconds} \times 10^9\ \frac{\text{ns}}{\text{s}} = 0.8\ \text{nanoseconds}\]

In other words, a 5 GHz CPU completes 5 cycles in 1 nanosecond. In general, to convert CPU cycles to nanoseconds:

\[\text{time (ns)} = \frac{\text{cycles} \times 10^9}{\text{CPU frequency in Hz}}\]
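
As a sanity check, here is a minimal C sketch of this conversion. The 5 GHz frequency and the cycle counts are simply the example figures above, not measurements:

```c
#include <stdio.h>

/* Convert a latency in CPU cycles to nanoseconds:
 * time (ns) = cycles * 1e9 / frequency (Hz) */
double cycles_to_ns(double cycles, double freq_hz) {
    return cycles * 1e9 / freq_hz;
}

int main(void) {
    const double freq = 5e9; /* 5 GHz, the example frequency above */
    printf("1 cycle   = %.2f ns\n", cycles_to_ns(1, freq));   /* 0.20 ns */
    printf("L1 (4)    = %.2f ns\n", cycles_to_ns(4, freq));   /* 0.80 ns */
    printf("L3 (100)  = %.2f ns\n", cycles_to_ns(100, freq)); /* 20.00 ns */
    return 0;
}
```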

L1 Cache

L1 cache, or Level 1 cache, is the fastest level but typically provides only 64 KB per core. It is divided into two parts: L1D (the L1 data cache) and L1I (the L1 instruction cache). L1D stores the actual program data, while L1I stores the instructions the CPU is currently executing. Each provides 32 KB per core, for a total of 64 KB of L1 cache per core.

L2 Cache

L2 cache, or Level 2 cache, is slower than L1 but larger: typically 2 MB per performance core (P-core) and 4 MB shared per cluster of four efficiency cores (E-cores). It stores recent data and instructions that are likely to be used again. A core is simply an independent processing unit inside a CPU that can execute instructions. Modern CPUs have multiple cores to handle demanding workloads such as gaming, high-end workstations, and servers smoothly and quickly. The Intel Core i9 processor in our example has 24 cores, of which 8 are performance cores (P-cores) and 16 are efficiency cores (E-cores).

L3 Cache

L3, or Level 3 cache, acts as a backup for the L1 and L2 caches. It stores commonly used data from all cores. In our Intel Core i9 example, the L3 cache is 36 MB, shared across all 8 P-cores and 16 E-cores.

When the CPU needs data, it first checks the L1 cache. If the data is found, this is called a cache hit, and the data is used immediately. If the data is not in L1, the CPU checks L2; if it is still not found, it checks L3; and if it misses there as well, it falls back to RAM and finally to the SSD/HDD.
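
Below is a minimal C sketch of that fallthrough order. The latency figures and hard-coded hit results are illustrative stand-ins, and simply summing probe latencies is a simplification of what real hardware does:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative lookup order: each level is checked in turn,
 * and the first hit wins. */
typedef struct {
    const char *name;
    int latency_cycles; /* typical figures from the sections above */
    bool hit;           /* pretend result of looking up one address */
} Level;

int main(void) {
    Level levels[] = {
        {"L1",  4,   false},
        {"L2",  14,  false},
        {"L3",  50,  true},   /* suppose the data is found in L3 */
        {"RAM", 150, true},
    };
    int total = 0;
    for (int i = 0; i < 4; i++) {
        total += levels[i].latency_cycles; /* each probe costs time */
        if (levels[i].hit) {
            printf("hit in %s after %d cycles\n", levels[i].name, total);
            return 0;
        }
        printf("miss in %s\n", levels[i].name);
    }
    return 0;
}
```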

Cache Mapping

Cache mapping techniques decide where in the cache each block of data from RAM should be stored so that the CPU can access it efficiently. There are three main types of mapping that define how a memory block from RAM is placed in the cache:
1. Direct-mapped cache
2. Fully associative cache
3. Set-associative cache

Direct Mapped Cache

In this technique, each block of RAM can map to only one fixed location in the cache. The cache index is the remainder when the memory block address is divided by the number of cache blocks. If multiple memory blocks map to the same location, cache thrashing occurs as they repeatedly evict each other. Direct mapping is fast and provides a low-cost cache design, but it is inefficient because only one block can exist per cache index.
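
A minimal C sketch of the index calculation; the 8-block cache size and the block addresses are made up for illustration:

```c
#include <stdio.h>

int main(void) {
    const unsigned num_blocks = 8; /* hypothetical 8-block cache */
    /* Memory blocks 3, 11, and 19 all leave remainder 3 when
     * divided by 8, so they fight over the same cache index:
     * this conflict is what causes thrashing. */
    unsigned addrs[] = {3, 11, 19, 4};
    for (int i = 0; i < 4; i++) {
        printf("block %2u -> cache index %u\n",
               addrs[i], addrs[i] % num_blocks);
    }
    return 0;
}
```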

Fully Associative Cache

Here, any block from RAM can map into any cache block. Since there is no fixed mapping, fewer cache conflicts occur, and the hit rate is higher than in a direct-mapped cache. However, the CPU has to check all the blocks on every lookup, so lookups are slower and the hardware is expensive to implement. This design is used in some L1 caches.
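
Here is a minimal C sketch of a fully associative lookup, where the requested tag must be compared against every block; the tags and the 4-block cache size are invented for illustration:

```c
#include <stdio.h>

#define NUM_BLOCKS 4

int main(void) {
    /* In a fully associative cache the requested tag is compared
     * against every block. Hardware does this in parallel with one
     * comparator per block, which is why the design is expensive. */
    unsigned tags[NUM_BLOCKS] = {0x12, 0x7f, 0x30, 0x05}; /* invented */
    unsigned wanted = 0x30;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (tags[i] == wanted) {
            printf("hit in block %d\n", i);
            return 0;
        }
    }
    printf("miss\n");
    return 0;
}
```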

Set-Associative Cache

It is a hybrid of the direct-mapped and fully associative designs. The cache is divided into sets, and each set contains multiple blocks. A RAM block can be stored in any block within one particular set, but not anywhere in the entire cache. For example, in a 4-way set-associative cache, each set has 4 blocks: a RAM block can go into any of the 4 blocks in its set, and the set is determined by the remainder when the memory block address is divided by the number of sets. This reduces cache conflicts compared to direct mapping and is faster than a fully associative cache. It is typically used in L2 and L3 caches.
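
A minimal C sketch of a 4-way set-associative lookup; the set count, block address, and cached tag are invented for illustration:

```c
#include <stdio.h>

#define NUM_SETS 4
#define WAYS     4  /* 4-way set-associative */

int main(void) {
    /* sets[s][w] holds the tag cached in way w of set s (invented). */
    unsigned sets[NUM_SETS][WAYS] = {{0}};
    unsigned block_addr = 13;             /* hypothetical RAM block */
    unsigned set = block_addr % NUM_SETS; /* 13 % 4 = 1 */
    sets[set][2] = block_addr;            /* pretend it was cached earlier */

    /* Only the WAYS blocks of one set are searched, not the whole cache. */
    for (int w = 0; w < WAYS; w++) {
        if (sets[set][w] == block_addr) {
            printf("block %u: hit in set %u, way %d\n", block_addr, set, w);
            return 0;
        }
    }
    printf("block %u: miss in set %u\n", block_addr, set);
    return 0;
}
```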

Cache Organization

A cache entry's address is divided into three parts, which are used to check whether the data is valid, determine its location, and find the exact data. The TAG is a unique identifier that tells which RAM block a cached block came from; in a 32-bit address, the TAG spans bits 31 down to 12. Bits 11 to 6 form the INDEX, which identifies which set the data belongs to, i.e. the location of the set in the cache. Bits 5 to 0 form the OFFSET, which identifies the exact byte within the block.

If a new memory block needs to be stored and there is no space available in the cache, the CPU must evict an existing block. The most common policy is LRU (least recently used), which replaces the block that has not been used for the longest time. Other techniques such as FIFO and RANDOM exist, but most CPUs use LRU because it gives better performance.

There are two types of cache write policies: write-through and write-back. In write-through, data is written to both the cache and RAM simultaneously. In write-back, data is written only to the cache first and copied to RAM later. Most modern CPUs use the write-back policy.
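
Here is a minimal C sketch of splitting a 32-bit address into these three fields with shifts and masks, using the exact bit ranges above; the example address is arbitrary:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0xDEADBEEF;            /* arbitrary example address */

    /* Bit layout from the text: offset = bits 5..0 (64-byte blocks),
     * index = bits 11..6 (64 sets), tag = bits 31..12. */
    uint32_t offset = addr & 0x3Fu;        /* low 6 bits  */
    uint32_t index  = (addr >> 6) & 0x3Fu; /* next 6 bits */
    uint32_t tag    = addr >> 12;          /* top 20 bits */

    printf("addr   = 0x%08" PRIX32 "\n", addr);
    printf("tag    = 0x%05" PRIX32 "\n", tag);
    printf("index  = %" PRIu32 "\n", index);
    printf("offset = %" PRIu32 "\n", offset);
    return 0;
}
```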

Cache Metrics

\[\text{Cache hit ratio} = \frac{\text{cache hits}}{\text{total accesses}} \times 100\]

A high hit ratio means better performance.
Cache miss penalty = the time wasted fetching from RAM when a cache miss occurs.
Lower latency = a faster cache.
Higher bandwidth = more data transferred per second.
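
A minimal C sketch of the hit-ratio formula with made-up counters:

```c
#include <stdio.h>

int main(void) {
    /* Made-up counters for illustration. */
    long hits = 950, total_accesses = 1000;

    /* Cache hit ratio = hits / total accesses * 100 */
    double hit_ratio = (double)hits / total_accesses * 100.0;
    printf("hit ratio = %.1f%%\n", hit_ratio); /* prints 95.0% */
    return 0;
}
```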

Conclusion

A high cache hit rate improves speed, and a larger cache reduces dependence on RAM. In practice this means higher frame rates in games, faster matrix multiplication in machine learning, and quicker 3D rendering. If you found this article helpful, please write to us.
