WTF?/AHA! ratio < 1. Let me know in the issue tracker of this article's repository how we did.
- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes (with some exceptions).
- Reads may be reordered with older writes to different locations but not with older writes to the same location.
We'll get back to the `SeqCst` memory ordering later on; just make a note of it for now.
`Shared`, how do the other cores know that their L1 cache is invalid?
`Shared` state) and is modified there. To actually inform the other cores that their cached data is invalid, there must be some way for the cores to communicate with each other, right?
`Atomic` for example), I find it useful to imagine an observer-core. This observer-core is interested in the same memory as we perform atomic operations on for some reason, so we'll divide each model in two: how it looks on the core it's running on, and how it looks from an observer-core.
`Relaxed` load/stores with each other. The observer-core might observe operations in a different order than the "program order" (the order we wrote them in). It will, however, always see a `Relaxed` operation-A before a `Relaxed` operation-B if we wrote them in that order.
`Relaxed` is therefore the weakest of the possible memory orderings. It implies that the operation does not do any specific synchronization with the other CPUs.
Any memory operation written after the `Acquire` access stays after it. It's meant to be paired with a `Release` memory ordering flag, forming a sort of "memory sandwich". All memory access between the load and the store will be synchronized with the other CPUs.
`Acquire` operation, which forces the current core to process all messages in its mailbox (many CPUs have both serializing and memory-ordering instructions). A memory fence will most likely also be involved to prevent the CPU from reordering memory accesses to before the `Acquire` load. The `Acquire` operation will therefore be synchronized with modifications to memory done on the other CPUs.
`load` operation, it doesn't modify memory, so there is nothing to observe.
`Acquire` load, it will be able to see all memory operations happening from the `Acquire` load to the `Release` store (including the store). This means there must be some global synchronization going on. We'll discuss this a bit more when we talk about `SeqCst`.
`Acquire` is often used to write locks, where some operations need to stay after the successful acquisition of the lock. For this exact reason, `Acquire` only makes sense in load operations. Most `atomic` methods in Rust that involve stores will panic if you pass in `Acquire` as the memory ordering of a store.
`Acquire`, any memory operation written before the `Release` memory ordering flag stays before it. It's meant to be paired with an `Acquire` memory ordering flag.
`Release` operation so that they happen after it. There is also a guarantee that all other cores which do an `Acquire` must see all memory operations after the `Acquire` and before this `Release`.
`Acquire` access, or after a `Release` store. This basically leaves us with two choices:
- An `Acquire` load must ensure that it processes all messages, and if any other core has invalidated any memory we load, it must fetch the correct value.
- A `Release` store must be atomic and invalidate all other caches holding that value before it modifies it.
`SeqCst`, but also more performant.
`Acquire` load of the memory. If it does, it will see all memory which has been modified between the `Acquire` and the `Release`, including the store itself.
`Release` is often used together with `Acquire` to write locks. For a lock to function, some operations need to stay after the successful acquisition of the lock, but before the lock is released. For this reason, and opposite of `Acquire`, `Release` only makes sense in store operations. The `load` methods in Rust will panic if you pass in a `Release` memory ordering.
`AtomicBool::compare_and_swap` is one such operation. Since this operation both loads and stores a value, this could matter on weakly ordered systems, in contrast to
`SeqCst` stands for sequential consistency, and gives the same guarantees that `Acquire`/`Release` does, but also promises to establish a single total modification order.
`SeqCst` might fail in upholding its guarantees:
`SeqCst`, and not the theoretical foundations of it, from now on.
`Acquire`/`Release` will cover most of your challenges, at least on a strongly ordered CPU.
`SeqCst`, in contrast to an `Acquire`/`Release`, will look like this:
`Release` memory ordering is `movb %al, example::X.0.0(%rip)`. We know that on a strongly ordered system, this is enough to make sure that this is immediately set as `Invalid` in other caches if they contain this data.
> The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.
`Release` re-iterates on that and states:
> 8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations:
> The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location. However, loads are not reordered with stores to the same location.
`Acquire`/`Release` doesn't prevent this.
`xchgb %al, example::X.0.0(%rip)`. This is an atomic operation (`xchg` has an implicit `lock` prefix). Since the `xchg` instruction is a locked instruction (when it refers to memory), it will make sure that all cache lines on other cores referring to the same memory are locked when the memory is fetched and then invalidated after the modification. In addition, it works as a full memory fence, which we can derive from section 8.2.3.9 in the Intel Developer Manual:
> 8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions:
> The memory-ordering model prevents loads and stores from being reordered with locked instructions that execute earlier or later. The examples in this section illustrate only cases in which a locked instruction is executed before a load or a store. The reader should note that reordering is prevented also if the locked instruction is executed after a load or a store.
`SeqCst` is the strongest of the memory orderings. It also has a slightly higher cost than the others.
The `std::sync::atomic` module gives access to some important CPU instructions we normally don't see in Rust:
`AtomicUsize`, the compiler actually changes the instructions it emits for the addition of the two numbers on the CPU. The assembly would instead look something like (in AT&T dialect) `lock addq ..., ...` instead of the `addq ..., ...` we'd normally expect.
What does the `lock` instruction prefix do?
`Modified` already when the memory is fetched from the cache.
`Modified`. The processor uses its cache coherence mechanism to make sure the state is updated to `Invalid` on all other caches where it exists, even though they've not yet processed all of the messages in their mailboxes.
`locked` instruction (and other memory-ordering or serializing instructions) involves a more expensive and more powerful mechanism which bypasses the message passing, locks the cache lines on the other caches (so no load or store operation can happen while it's in progress) and sets them as invalid accordingly, which forces those caches to fetch an updated value from memory.
`std::sync::atomic` module at all in your daily life.