# Caches: Coherence, Synchronization, Consistency

## **Outline for Today**

- Revisiting the cache hierarchy
- Describing a snooping coherence protocol
- Implementing fences in hardware
- Highlighting relaxed memory models







### Chat with your neighbor!

- What about this cache hierarchy works?
- What about this cache hierarchy doesn't work?


















### Definitions

- Coherence: what value should be returned by a read?
- Synchronization: how can software reason about the state of data?
- Consistency: in what order can memory operations occur?

*Figure (animation): Cache A and Cache B, each holding address 0x1000, connected by a shared bus to Memory (super slow!).*


- Scheme is called *write-invalidate*
- The consistency model from this scheme is called *write serialization*
- "Snoops" are messages across the shared bus
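A minimal sketch of write-invalidate, for intuition only. The `Bus` and `Cache` classes and their methods are hypothetical names, and the caches are assumed to be write-through so that memory always holds the latest value. Real hardware tracks per-line state instead, as in the protocol tables that follow.

```python
class Bus:
    """Shared bus: broadcasts snoop messages to every attached cache."""
    def __init__(self):
        self.caches = []
        self.memory = {}

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.lines = {}  # addr -> value; a present key means a valid copy
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from (slow) memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Invalidate every other copy before the write becomes visible.
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through (assumption, for simplicity)

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)


bus = Bus()
a, b = Cache("A", bus), Cache("B", bus)
a.read(0x1000)
b.read(0x1000)       # both caches now hold a copy of 0x1000
a.write(0x1000, 42)  # B's copy is invalidated by the snoop
```

After the write, Cache B no longer holds 0x1000, so its next read misses and fetches the new value instead of returning stale data.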



## **Snooping Coherence (from CPU)**

| Request (from CPU) | modified | shared | invalid |
|--------------------|----------|--------|---------|
| read hit | read data in local cache | read data in local cache | n/a |
| read miss | place read miss on bus; replace data; writeback on bus | place read miss on bus; replace data | place read miss on bus |
| write hit | write data in local cache | place invalidate on bus; transition to modified state | n/a |
| write miss | writeback block; place write miss on bus | place write miss on bus | place write miss on bus |

## **Snooping Coherence (from bus)**

| Request (from bus) | modified | shared | invalid |
|--------------------|----------|--------|---------|
| read miss (snoop) | read data and place on bus; writeback data; change to shared state | allow shared cache or memory to service miss | n/a |
| invalidate | n/a | change to invalid state | n/a |
| write miss (snoop) | abort other memory operation; writeback data; change to invalid state | change to invalid state | n/a |


- Protocol called MSI (Modified, Shared, Invalid)
- Less bus traffic in MESI (Modified, Exclusive, Shared, Invalid) protocol
- Even less bus traffic in MOESI (Modified, Owned, Exclusive, Shared, Invalid) protocol
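The MSI protocol can be sketched as two lookup tables, one for CPU requests and one for snooped bus messages. This is a simplified model under stated assumptions: it tracks a single block, so replacement misses (a miss that evicts a different block) are left out, and the names `CPU_TRANSITIONS`, `SNOOP_TRANSITIONS`, and `access` are hypothetical.

```python
# (state, event) -> (bus message or None, next state), for ONE block.
CPU_TRANSITIONS = {
    ("M", "read_hit"):   (None, "M"),
    ("M", "write_hit"):  (None, "M"),
    ("S", "read_hit"):   (None, "S"),
    ("S", "write_hit"):  ("invalidate", "M"),
    ("I", "read_miss"):  ("read_miss", "S"),
    ("I", "write_miss"): ("write_miss", "M"),
}

# (state, snooped message) -> (reply or None, next state).
SNOOP_TRANSITIONS = {
    ("M", "read_miss"):  ("writeback", "S"),
    ("M", "write_miss"): ("writeback", "I"),
    ("S", "read_miss"):  (None, "S"),
    ("S", "invalidate"): (None, "I"),
    ("S", "write_miss"): (None, "I"),
}


def access(states, cache, op):
    """Apply a CPU 'read' or 'write' from `cache`; return the bus traffic."""
    state = states[cache]
    hit = state in ("M", "S")
    event = f"{op}_{'hit' if hit else 'miss'}"
    message, states[cache] = CPU_TRANSITIONS[(state, event)]
    log = []
    if message is not None:
        log.append(message)
        for other in states:
            if other != cache:
                # Invalid copies ignore snoops, hence the default.
                reply, states[other] = SNOOP_TRANSITIONS.get(
                    (states[other], message), (None, states[other]))
                if reply is not None:
                    log.append(reply)
    return log


states = {"A": "I", "B": "I"}
log1 = access(states, "A", "read")   # A: I -> S
log2 = access(states, "B", "read")   # B: I -> S
log3 = access(states, "A", "write")  # A: S -> M, B: S -> I
log4 = access(states, "B", "read")   # A: M -> S (writeback), B: I -> S
```

Every entry in each `log` is one bus message, which makes the traffic cost of the protocol visible: the write hit in the shared state still costs an invalidate on the bus.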



## Chat with your neighbor!

- What are the advantages of snooping?
- What are the disadvantages of snooping?

How would you describe the consistency model of snooping coherence?

### Coherence, Synchronization, Consistency

- We have a protocol to maintain multiple copies of data coherently
- We have described how to use coherence to implement write serialization
- We do not have a way for the processor to use the memory hierarchy to implement synchronization features (fences, atomic ops, etc.)
- We do not have any other way to reason about consistency in the memory hierarchy

### **Synchronization: Fences**

sw x0, 0(x10) # dcache miss, l2 miss, l3 hit

sw x0, 0(x11) # dcache hit

sw x0, 0(x12) # dcache miss, l2 hit

### If we use coherence for consistency, in what order are these operations going to appear in memory?

### **Synchronization: Fences**

sw x0, 0(x10) # dcache miss, l2 miss, l3 hit

sw x0, 0(x11) # dcache hit

**FENCE**

sw x0, 0(x12) # dcache miss, l2 hit

Ensures that all memory operations before the fence happen before the memory operations after the fence
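A toy timing model of the three stores, to show what the fence changes. The latencies in `LATENCY` are made-up cycle counts, and `visible_order` is a hypothetical helper; the only assumption that matters is that a deeper hit takes longer to complete.

```python
LATENCY = {"dcache": 1, "l2": 10, "l3": 40}  # hypothetical cycle counts


def visible_order(program):
    """Return store names in the order they become visible in memory.

    `program` is a list of ("store", name, level) or ("fence",) tuples.
    Stores issue as early as allowed; a fence delays every later store
    until all earlier stores have completed.
    """
    issue_time = 0
    completions = []  # (completion time, store name)
    for op in program:
        if op[0] == "fence":
            if completions:
                issue_time = max(t for t, _ in completions)
        else:
            _, name, level = op
            completions.append((issue_time + LATENCY[level], name))
    return [name for _, name in sorted(completions)]


no_fence = visible_order([
    ("store", "x10", "l3"),
    ("store", "x11", "dcache"),
    ("store", "x12", "l2"),
])  # ['x11', 'x12', 'x10']

with_fence = visible_order([
    ("store", "x10", "l3"),
    ("store", "x11", "dcache"),
    ("fence",),
    ("store", "x12", "l2"),
])  # ['x11', 'x10', 'x12']
```

Without the fence, the store to `x12` becomes visible before the store to `x10`. With the fence, the two stores before it may still reorder with each other, but both complete before the store to `x12`.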




## Synchronization

- We can leverage protocols in the memory hierarchy to expose high-level programming semantics to the application
  - Fences
  - Conditional operations
  - Atomic operations
  - Increment operations
  - etc...

## Synchronization: Fences (Summary)

- Memory operations can appear out of order in the cache hierarchy due to the cache state (e.g., miss then hit, etc.)
- Fences are instructions (software) that provide order to memory operations (hardware)
- We can use coherence with the store buffer to implement fences! Not how it's done in gem5...
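One way to picture the store-buffer idea is the sketch below. It is an assumed design for illustration, not a description of any real core or of gem5: stores wait in a buffer until coherence grants write permission, and a fence may retire only once the buffer is empty. `StoreBuffer` and its methods are hypothetical names, and the string addresses stand in for the contents of registers x10 to x12.

```python
class StoreBuffer:
    """Holds stores that are waiting for write permission from coherence."""

    def __init__(self):
        self.pending = {}       # addr -> value, still awaiting ownership
        self.memory_order = []  # addresses in the order they became visible

    def store(self, addr, value):
        self.pending[addr] = value

    def ownership_granted(self, addr):
        """Coherence reply: the line is writable, so the store drains."""
        self.pending.pop(addr)
        self.memory_order.append(addr)

    def fence_may_retire(self):
        """A fence retires only when no earlier store is still pending."""
        return not self.pending


sb = StoreBuffer()
sb.store("x10", 0)            # misses down to l3, slow
sb.store("x11", 0)            # dcache hit, fast
sb.ownership_granted("x11")   # the hit drains first
blocked_at_fence = not sb.fence_may_retire()  # x10 still pending
sb.ownership_granted("x10")
ready_after_drain = sb.fence_may_retire()
sb.store("x12", 0)            # issued only after the fence retires
sb.ownership_granted("x12")
```

The fence itself sends nothing on the bus here; it only waits for coherence replies that the earlier stores were already going to receive.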

## **Memory Consistency**

- Programs may not be written with the most efficient memory ordering...
- What if the hardware could just finish operations whenever they were ready?
- Software could do all coordination (i.e., fences everywhere they're needed, etc.)

- Maybe this is a bit extreme...
- Relaxed consistency defines different ways in which hardware may reorder operations!
- Software developers will write architecture-specific applications based on consistency models

## Memory Consistency

- Orderings: W->R, R->W, R->R, W->W
- What if we allow some of these orderings to occur out of order?
- Total store order (TSO): W->R may appear as R->W
- Partial store order (PSO): TSO, plus W1->W2 may appear as W2->W1
- Weak ordering: all four orderings may be relaxed; only explicit synchronization (e.g., fences) enforces order
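The effect of relaxing W->R can be checked with the classic two-thread "store buffering" litmus test. The sketch below enumerates every interleaving and collects the possible register values. It models TSO simply as permission to swap a write with a later read to a different address, which is an approximation (real store buffers also forward values to local reads); all function names are hypothetical.

```python
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
T0 = [("W", "x", 1), ("R", "y", "r0")]
T1 = [("W", "y", 1), ("R", "x", "r1")]


def allowed_orders(thread, relax_w_to_r):
    """Orders in which the hardware may perform a two-operation thread."""
    first, second = thread
    orders = [[first, second]]
    if (relax_w_to_r and first[0] == "W" and second[0] == "R"
            and first[1] != second[1]):
        orders.append([second, first])  # W->R appears as R->W
    return orders


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(relax_w_to_r):
    """Return the set of reachable (r0, r1) pairs."""
    results = set()
    for order0 in allowed_orders(T0, relax_w_to_r):
        for order1 in allowed_orders(T1, relax_w_to_r):
            for schedule in interleavings(order0, order1):
                memory = {"x": 0, "y": 0}
                regs = {}
                for kind, addr, arg in schedule:
                    if kind == "W":
                        memory[addr] = arg
                    else:
                        regs[arg] = memory[addr]
                results.add((regs["r0"], regs["r1"]))
    return results


strict = outcomes(relax_w_to_r=False)  # no reordering within a thread
tso = outcomes(relax_w_to_r=True)
extra = tso - strict                   # {(0, 0)}
```

With no reordering, at least one thread always sees the other's write. Allowing W->R to reorder adds the outcome where both reads return 0, which is why software on such hardware needs a fence between the store and the load when it depends on that ordering.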




### Summary

- We covered the cache hierarchy for multiprocessors
- We defined coherence
- We described snooping, and built a snooping-based coherence protocol
- We used fences as a case study for how the processor and memory hierarchy allow software to implement synchronization
- We defined different memory consistency models, which allow for varying degrees of reordering