It's no E=MC^2, but the success criterion for this article is a simple formula: WTF?/AHA! < 1. Let me know in the issue tracker of the repository for this article how we did.
These are some of the ordering guarantees a strongly ordered CPU gives (see chapter 8.2.2 of the Intel Developer Manual):

- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes (with some exceptions).
- Reads may be reordered with older writes to different locations, but not with older writes to the same location.
We'll get back to the SeqCst memory ordering later on; just make a note of it for now.

In Rust we have & shared references and &mut exclusive references. It helps to stop thinking of them as mutable and immutable references, since this is not true all the time: atomics and types which allow for interior mutability do break this mental model. Think of them as exclusive and shared instead. This maps nicely onto E and S, two of the possible states data can have in the L1 cache. Modelling this in the language can (and does) provide the possibility of optimizations which languages without these semantics can't do. In Rust, only memory you have exclusive access to can be modified by default. As long as we don't mutate data behind shared references, all Rust programs can assume that the L1 cache on the core they're running on is up to date and does not need any synchronization.
But if a value in the Shared state is modified on one core, how do the other cores know that their L1 cache is invalid? A cache line is invalidated if the data exists in a different cache (for example in the Shared state) and is modified there. To actually inform the other cores that their cached data is invalid, there must be some way for the cores to communicate with each other, right? Yes, there is: when a core modifies a Shared cache line, it forces the other cores to invalidate their corresponding cache line before the value is actually modified. Such CPUs often have a cache coherency mechanism which changes the state of the caches on each core.

When trying to model how atomic operations work (using Relaxed for example), I find it useful to imagine an observer-core. This observer core is interested in the same memory as we perform atomic operations on for some reason, so we'll divide each model in two: how it looks on the core it's running on, and how it looks from an observer-core.
(For observing the memory in the following examples, using a pointer is easier.)

Relaxed: the CPU might reorder Relaxed load/stores with each other. The observer core might observe operations in a different order than the "program order" (the order we wrote them in). It will, however, always see Relaxed operation-A before Relaxed operation-B if we wrote them in that order. Relaxed is therefore the weakest of the possible memory orderings. It implies that the operation does not do any specific synchronization with the other CPUs.

On a strongly ordered CPU you might be able to use Relaxed and get the same result as you do with Acquire/Release; this is the reason such code can appear correct. However, it's important to try to understand the "abstract machine" model and not only rely on experience you get by running experiments on a strongly ordered CPU. Your code might break on a different CPU.
Acquire: any memory operation written after the Acquire access stays after it. It's meant to be paired with a Release memory ordering flag, forming a sort of "memory sandwich": all memory access between the load and the store will be synchronized with the other CPUs.

On a weakly ordered CPU, this might be implemented by a special instruction emitted as part of the Acquire operation, which forces the current core to process all messages in its mailbox (many CPUs have both serializing and memory ordering instructions). A memory fence will most likely also be used to prevent the CPU from reordering memory access to before the Acquire load. The Acquire operation will therefore be synchronized with modifications to memory done on the other CPUs.

The Intel Developer Manual describes such instructions:

"Program synchronization can also be carried out with serializing instructions (see Section 8.3). These instructions are typically used at critical procedure or task boundaries to force completion of all previous instructions before a jump to a new section of code or a context switch occurs. Like the I/O and locking instructions, the processor waits until all previous instructions have been completed and all buffered writes have been drained to memory before executing the serializing instruction."

"The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way of ensuring load and store memory ordering between routines that produce weakly-ordered results and routines that consume that data."

"MFENCE: Serializes all store and load operations that occurred prior to the MFENCE instruction in the program instruction stream."

MFENCE is such an instruction. You'll find documentation for it throughout chapters 8.2 and 8.3 of Intel's Developer Manual. You won't see these instructions too often on strongly ordered processors since they're rarely needed, but on weakly ordered systems they're critical for implementing the Acquire/Release model used in the abstract machine.
Since Acquire is a load operation, it doesn't modify memory, so from the observer core there is nothing to observe. But if the observer core itself does an Acquire load, it will be able to see all memory operations happening from the Acquire load to the Release store (including the store). This means there must be some global synchronization going on. We'll discuss this a bit more when we talk about Release.

Acquire is often used to write locks, where some operations need to stay after the successful acquisition of the lock. For this exact reason, Acquire only makes sense in load operations; it does not allow any later memory operation to happen before the Acquire operation. Most atomic methods in Rust that involve stores will panic if you pass in Acquire as the memory ordering of a store operation.
In contrast to Acquire, any memory operation written before the Release memory ordering flag stays before it; the CPU is not allowed to move those operations past the Release operation so that they happen after it. It's meant to be paired with an Acquire memory ordering flag. There is also a guarantee that all other cores which do an Acquire must see all memory operations after the Acquire and before this Release. In other words, nothing moves to before an Acquire access, or to after a Release store. This basically leaves us with two choices:

- An Acquire load must ensure that it processes all messages in its mailbox, and if any other core has invalidated any memory we load, it must fetch the correct value.
- A Release store must be atomic and invalidate all other caches holding that value before it modifies it.
These guarantees are weaker than SeqCst but also more performant.

The observer core will only be synchronized with this if it does an Acquire load of the memory. If it does, it will see all memory which has been modified between the Acquire and the Release, including the Release store itself.
Release is often used together with Acquire to write locks. For a lock to function, some operations need to stay after the successful acquisition of the lock, but before the lock is released. For this reason, and opposite of the Acquire ordering, most load methods in Rust will panic if you pass in a Release ordering.

On a strongly ordered CPU, cache lines holding a Shared value are invalidated in all L1 caches where the value is present before it is modified. That means that the Acquire load will already have an updated view of all relevant memory, and a Release store will instantly invalidate any cache lines which contain the data on other cores.
AcqRel is meant for operations that both load and store a value; AtomicBool::compare_and_swap is one such operation. Since this operation both loads and stores a value, the ordering could matter on weakly ordered systems, in contrast to Relaxed operations. To model how it behaves, see the Acquire and Release paragraphs; the same applies here.
SeqCst stands for Sequential Consistency, and it gives the same guarantees that Acquire/Release does, but it also promises to establish a single total modification order.

SeqCst has been critiqued for being promoted as the recommended ordering, and I've seen several objections to using it, since it seems hard to prove that you have a good reason for using it. It has also been criticized for being slightly broken, and there are examples of how SeqCst might fail in upholding its guarantees. We'll focus on the practical use of SeqCst and not its theoretical foundations from now on; Acquire/Release will cover most of your challenges, at least on a strongly ordered CPU.

Let's look at what a SeqCst operation compiles down to, in contrast to an Acquire/Release operation.
With Acquire/Release, the instruction emitted for the store with Release memory ordering is movb %al, example::X.0.0(%rip). We know that on a strongly ordered system, this is enough to make sure the value is immediately set as Invalid in other caches if they contain this data. However, as the C++ reference documentation points out: "The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads."

Rust's documentation for Release reiterates that and states: "... In particular, all previous writes become visible to all threads that perform an Acquire (or stronger) load of this value."
However, there is one reordering that even a strongly ordered CPU allows, from chapter 8.2.3.4 of the Intel Developer Manual:

"8.2.3.4 Loads May Be Reordered with Earlier Stores to Different Locations. The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location. However, loads are not reordered with stores to the same location."

Acquire/Release doesn't prevent this. With SeqCst, the store is instead emitted as xchgb %al, example::X.0.0(%rip). This is an atomic operation (xchg has an implicit lock prefix). Since the xchg instruction is a locked instruction (when it refers to memory), it will make sure that all cache lines on other cores referring to the same memory are locked when the memory is fetched, and then invalidated after the modification. In addition, it works as a full memory fence, which we can derive from chapter 8.2.3.9 in the Intel Developer Manual:

"8.2.3.9 Loads and Stores Are Not Reordered with Locked Instructions. The memory-ordering model prevents loads and stores from being reordered with locked instructions that execute earlier or later. The examples in this section illustrate only cases in which a locked instruction is executed before a load or a store. The reader should note that reordering is prevented also if the locked instruction is executed after a load or a store."
Why does this matter? If a load happens after, for example, the release of a flag, we could observe that it actually changed before the Release operation if we used Acquire/Release semantics. At least in theory. In addition to the Acquire/Release guarantees, a locked instruction also guarantees that no other memory operations, reads or writes, will happen in between.

SeqCst also gives some guarantees which we get by default on a strongly ordered CPU, most importantly a single total modification order: all cores observe all SeqCst operations in the same order. Acquire/Release does not give this guarantee. Observer-1 could see two changes in a different order than observer-2 (remember the mailbox analogy). Imagine that core-1 does a compare_and_swap on flag X using Acquire ordering, and core-2 does the same on flag Y. Both do the same operations and then change the flag value back using a Release store. Observers are then free to disagree on which flag changed first; SeqCst prevents this from happening. On a strongly ordered CPU, every store is immediately visible on all other cores, so the modification order is not a real issue there.

SeqCst is the strongest of the memory orderings. It also has a slightly higher cost than the others.
The std::sync::atomic module gives access to some important CPU instructions we normally don't see in Rust. When we call fetch_add on an AtomicUsize, the compiler actually changes the instructions it emits for the addition of the two numbers on the CPU. The assembly would look something like (in AT&T dialect) lock addq ..., ... instead of the addq ..., ... which we'd normally expect. Without the lock prefix, the operation would be three separate steps: load data, modify it, and store data.

lock cmpxchgb %cl, example::LOCKED(%rip) is the atomic operation we do in compare_and_swap. lock cmpxchgb is a locking operation; it reads a flag and changes its value if a certain condition is met. The Intel Developer Manual describes it like this:

"Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (see Section 8.1.2, 'Bus Locking')."
So what exactly does the lock instruction prefix do? The cache line state is set to Modified already when the memory is fetched from the cache. The processor then uses its cache coherence mechanism to make sure the state is updated to Invalid on all other caches where it exists, even though they haven't yet processed all of the messages in their mailboxes. A locked instruction (and other memory-ordering or serializing instructions) involves a more expensive and more powerful mechanism which bypasses the message passing, locks the cache lines on the other caches (so no load or store operation can happen while it's in progress) and sets them as Invalid accordingly, which forces those caches to fetch an updated value from memory.

Chances are that you won't use the std::sync::atomic module at all in your daily life.