# Cache performance and associativity

# Write through vs write back

Write through: every write to the cache is also written to the next lower level of the hierarchy

Pro: Cache and lower level stay consistent (so an evicted block never has to be written back)

Con: Potentially slower (every store generates a write to main memory)

Write back: a write updates only the cache; the lower level is updated only when the modified block is evicted

Pro: Potentially fewer writes to main memory

Con: Consistency/complexity issues (cache and memory can disagree; each block needs a dirty bit, and dirty blocks must be written back on eviction)
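To make the trade-off concrete, the following is a minimal sketch that counts how many writes reach main memory under each policy. The model is an assumption for illustration only: a tiny fully associative cache with LRU eviction in which every address is its own block. The function name `count_memory_writes` is hypothetical.

```python
from collections import OrderedDict

def count_memory_writes(stores, capacity, policy):
    """Count writes that reach main memory for a sequence of store addresses.

    Assumed toy model: fully associative cache of `capacity` blocks,
    LRU eviction, one address per block.
    Dirty blocks still in the cache at the end are not counted.
    """
    cache = OrderedDict()  # address -> dirty flag, least recently used first
    memory_writes = 0
    for addr in stores:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        elif len(cache) >= capacity:
            _, dirty = cache.popitem(last=False)  # evict the LRU block
            if policy == "write-back" and dirty:
                memory_writes += 1  # dirty block is flushed on eviction
        if policy == "write-through":
            cache[addr] = False  # never dirty: memory is updated right away
            memory_writes += 1
        else:
            cache[addr] = True  # write-back: only the cache copy changes
    return memory_writes

stores = [0, 0, 0, 0, 1, 2, 0, 0]
print(count_memory_writes(stores, 2, "write-through"))  # 8
print(count_memory_writes(stores, 2, "write-back"))     # 2
```

Repeated stores to the same block are where write back saves the most: the four initial stores to address 0 cost four memory writes under write through but only one (on eviction) under write back.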

# Cache controller

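Figure: direct-mapped cache with 1024 entries (index 0 to 1023). The 64-bit address (showing bit positions) is split into a 52-bit tag (bits 63 to 12), a 10-bit index (bits 11 to 2), and a 2-bit offset (bits 1 to 0). Each entry holds a valid bit, a 52-bit tag, and 32 bits of data.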

P&H fig. 5.10
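The address split in the figure can be sketched with shifts and masks. This assumes the field widths shown there (52-bit tag, 10-bit index, 2-bit offset); the function name `split_address` and its default parameters are hypothetical.

```python
def split_address(addr, index_bits=10, offset_bits=2):
    """Split an address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # selects the entry
    tag = addr >> (offset_bits + index_bits)                 # remaining high bits
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)  # 0x12345 414 0
```

A lookup hits when the entry selected by `index` is valid and its stored tag equals `tag`.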

What happens to the pipeline when a cache miss occurs?

#### Write buffers

A way to hide the cost of writing to lower level of hierarchy

Example: instruction `sw t1, 4(a0)` in a write-through cache

1. The write to the cache and the write to the write buffer happen simultaneously (1 cycle)
2. The rest of execution proceeds while the buffer completes the write to main memory

What happens if data that has been evicted from the cache is waiting in the write buffer and a read instruction for that address executes?
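One common answer (an assumption here, since the notes leave the question open) is that a read must check the write buffer before going to memory, because memory may still hold a stale value; an alternative is to drain the buffer before servicing the read. A minimal sketch of the first option follows, with a hypothetical `WriteBuffer` class and a plain dict standing in for main memory.

```python
from collections import deque

class WriteBuffer:
    """Toy FIFO write buffer holding (address, value) pairs."""

    def __init__(self):
        self.pending = deque()  # oldest write first

    def push(self, addr, value):
        self.pending.append((addr, value))

    def drain_one(self, memory):
        """Complete the oldest pending write, if any."""
        if self.pending:
            addr, value = self.pending.popleft()
            memory[addr] = value

    def lookup(self, addr):
        """Return (True, value) for the newest pending write to addr."""
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return True, value
        return False, None

def read(addr, memory, write_buffer):
    """Read that forwards from the write buffer before falling back to memory."""
    found, value = write_buffer.lookup(addr)
    if found:
        return value
    return memory.get(addr, 0)

memory = {0x100: 1}          # stale value still in main memory
buffer = WriteBuffer()
buffer.push(0x100, 42)       # newer value waiting in the buffer
print(read(0x100, memory, buffer))  # 42, not the stale 1
```

Without the buffer check, the read would return 1 until `drain_one` completes the pending write.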

How many physical bits of space do we need to store our 1KB cache?
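The count can be sketched as follows, assuming a direct-mapped cache in which each entry stores its data, a tag, and one valid bit. The example parameters are those of the 1024-entry cache in P&H fig. 5.10 (64-bit addresses, 32-bit blocks, 2-bit offset); whether that is exactly the cache the question refers to is an assumption. The function name `cache_total_bits` is hypothetical.

```python
def cache_total_bits(num_blocks, data_bits_per_block, address_bits, offset_bits):
    """Total storage bits for a direct-mapped cache (data + tag + valid)."""
    index_bits = num_blocks.bit_length() - 1  # log2, num_blocks is a power of two
    tag_bits = address_bits - index_bits - offset_bits
    valid_bits = 1
    return num_blocks * (data_bits_per_block + tag_bits + valid_bits)

total = cache_total_bits(num_blocks=1024, data_bits_per_block=32,
                         address_bits=64, offset_bits=2)
print(total)              # 87040
print(total / (1024 * 32))  # 2.65625 times the data bits alone
```

With 64-bit addresses and one-word blocks, the tag is larger than the data it describes, so the overhead exceeds the payload.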

How do we measure the performance of a processor that uses caching?

### Formulas from P&H 4.3

$$\text{CPU time} = (\text{CPU execution clock cycles} + \text{Memory-stall clock cycles}) \times \text{Clock cycle time}$$

$$\text{Memory-stall clock cycles} = \text{Read-stall cycles} + \text{Write-stall cycles}$$

$$\text{Read-stall cycles} = \frac{\text{Reads}}{\text{Program}} \times \text{Read miss rate} \times \text{Read miss penalty}$$

$$\text{Write-stall cycles} = \left( \frac{\text{Writes}}{\text{Program}} \times \text{Write miss rate} \times \text{Write miss penalty} \right) + \text{Write buffer stalls}$$

If write buffer stalls are negligible, reads and writes can be combined using a single miss rate and miss penalty:

$$\text{Memory-stall clock cycles} = \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty}$$
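As a worked instance of the formulas above, here is a minimal sketch using the combined form (single miss rate and miss penalty). All numbers in the example are made up for illustration, and the function names are hypothetical.

```python
def memory_stall_cycles(accesses, miss_rate, miss_penalty):
    """Memory-stall clock cycles = accesses x miss rate x miss penalty."""
    return accesses * miss_rate * miss_penalty

def cpu_time(exec_cycles, stall_cycles, cycle_time):
    """CPU time = (execution cycles + memory-stall cycles) x clock cycle time."""
    return (exec_cycles + stall_cycles) * cycle_time

# Hypothetical program: 3 million execution cycles, 1 million memory
# accesses, 2% miss rate, 100-cycle miss penalty, 1 ns clock cycle.
stalls = memory_stall_cycles(1_000_000, 0.02, 100)
seconds = cpu_time(3_000_000, stalls, 1e-9)
print(stalls, seconds)
```

In this made-up case the stalls add about 2 million cycles, so roughly 40% of the total CPU time is spent waiting on memory even though only 2% of accesses miss.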



#### Effect of algorithm on CPU time

Figure, panel b: x-axis is Size (K items to sort), from 128 to 4096.



P&H fig. 5.19

## Increasing block size has limited effects



### Set-associative caches

#### One-way set associative (direct mapped)

| Block | Tag | Data |
|-------|-----|------|
| 0     |     |      |
| 1     |     |      |
| 2     |     |      |
| 3     |     |      |
| 4     |     |      |
| 5     |     |      |
| 6     |     |      |
| 7     |     |      |

#### Two-way set associative

| Set | Tag | Data | Tag | Data |
|-----|-----|------|-----|------|
| 0   |     |      |     |      |
| 1   |     |      |     |      |
| 2   |     |      |     |      |
| 3   |     |      |     |      |

#### Four-way set associative

| Set | Tag | Data | Tag | Data | Tag | Data | Tag | Data |
|-----|-----|------|-----|------|-----|------|-----|------|
| 0   |     |      |     |      |     |      |     |      |
| 1   |     |      |     |      |     |      |     |      |

#### Eight-way set associative (fully associative)

| Tag | Data | Tag | Data | Tag | Data | Tag | Data | Tag | Data | Tag | Data | Tag | Data | Tag | Data |
|-----|------|-----|------|-----|------|-----|------|-----|------|-----|------|-----|------|-----|------|
|     |      |     |      |     |      |     |      |     |      |     |      |     |      |     |      |

P&H fig. 5.15
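The four organizations above differ only in how many sets the eight blocks are divided into. A minimal sketch of the mapping, plus a small miss counter, is given below. It assumes LRU replacement within a set (the notes do not specify a policy), and the function names and the short access trace are illustrative, not taken from the notes.

```python
def set_index(block_address, num_blocks, ways):
    """Set that a block address maps to: block address mod number of sets."""
    num_sets = num_blocks // ways
    return block_address % num_sets

def count_misses(trace, num_blocks, ways):
    """Count misses for a trace of block addresses, with LRU per set (assumed)."""
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]  # each list: least recently used first
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]
        if block in lines:
            lines.remove(block)      # hit: re-appended below as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)         # set is full: evict the LRU block
        lines.append(block)
    return misses

# Block address 12 in the 8-block cache shown above.
for ways in (1, 2, 4, 8):
    print(ways, set_index(12, 8, ways))  # sets 4, 0, 0, 0

# Illustrative trace on a 4-block cache.
trace = [0, 8, 0, 6, 8]
print([count_misses(trace, 4, ways) for ways in (1, 2, 4)])  # [5, 4, 3]
```

Higher associativity removes the conflict between blocks 0 and 8, which both map to the same entry in the direct-mapped case.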