Defined in header
<atomic> | ||
---|---|---|
enum memory_order { memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, memory_order_seq_cst }; | (since C++11) |
std::memory_order
specifies how regular, non-atomic memory accesses are to be ordered around an atomic operation. Absent any constraints on a multi-core system, when multiple threads simultaneously read and write to several variables, one thread can observe the values change in an order different from the order another thread wrote them. Indeed, the apparent order of changes can even differ among multiple reader threads. Some similar effects can occur even on uniprocessor systems due to compiler transformations allowed by the memory model.
The default behavior of all atomic operations in the library provides for sequentially consistent ordering (see discussion below). That default can hurt performance, but the library's atomic operations can be given an additional std::memory_order
argument to specify the exact constraints, beyond atomicity, that the compiler and processor must enforce for that operation.
Defined in header <atomic> |
|
---|---|
Value | Explanation |
memory_order_relaxed | Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation (see Relaxed ordering below) |
memory_order_consume | A load operation with this memory order performs a consume operation on the affected memory location: no reads or writes in the current thread dependent on the value currently loaded can be reordered before this load. Writes to data-dependent variables in other threads that release the same atomic variable are visible in the current thread. On most platforms, this affects compiler optimizations only (see Release-Consume ordering below) |
memory_order_acquire | A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below) |
memory_order_release | A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below). |
memory_order_acq_rel | A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable. |
memory_order_seq_cst | Any operation with this memory order is both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order (see Sequentially-consistent ordering below) |
Inter-thread synchronization and memory ordering determine how evaluations and side effects of expressions are ordered between different threads of execution. They are defined in the following terms:
Within the same thread, evaluation A may be sequenced-before evaluation B, as described in evaluation order.
Within the same thread, evaluation A that is sequenced-before evaluation B may also carry a dependency into B (that is, B depends on A), if any of the following is true.
std::kill_dependency
&&
, ||
, ?:
, or ,
operators.All modifications to any particular atomic variable occur in a total order that is specific to this one atomic variable.
The following four requirements are guaranteed for all atomic operations:
After a release operation A is performed on an atomic object M, the longest continuous subsequence of the modification order of M that consists of.
is known as release sequence headed by A.
Between threads, evaluation A is dependency-ordered before evaluation B if any of the following is true.
Between threads, evaluation A inter-thread happens before evaluation B if any of the following is true.
Regardless of threads, evaluation A happens-before evaluation B if any of the following is true:
The implementation is required to ensure that the happens-before relation is acyclic, by introducing additional synchronization if necessary (it can only be necessary if a consume operation is involved, see Batty et al).
If one evaluation modifies a memory location, and the other reads or modifies the same memory location, and if at least one of the evaluations is not an atomic operation, the behavior of the program is undefined (the program has a data race) unless there exists a happens-before relationship between these two evaluations.
The side-effect A on a scalar M (a write) is visible with respect to value computation B on M (a read) if both of the following are true:
If side-effect A is visible with respect to the value computation B, then the longest contiguous subset of the side-effects to M, in modification order, where B does not happen-before it is known as the visible sequence of side-effects. (the value of M, determined by B, will be the value stored by one of these side effects).
Note: inter-thread synchronization boils down to defining which side effects become visible under what conditions.
Atomic load with memory_order_consume
or stronger is a consume operation. Note that std::atomic_thread_fence
imposes stronger synchronization requirements than a consume operation.
Atomic load with memory_order_acquire
or stronger is an acquire operation. The lock() operation on a Mutex
is also an acquire operation. Note that std::atomic_thread_fence
imposes stronger synchronization requirements than an acquire operation.
Atomic store with memory_order_release
or stronger is a release operation. The unlock() operation on a Mutex
is also a release operation. Note that std::atomic_thread_fence
imposes stronger synchronization requirements than a release operation.
Atomic operations tagged memory_order_relaxed
are not synchronization operations; they do not impose an order among concurrent memory accesses. They only guarantee atomicity and modification order consistency.
For example, with x
and y
initially zero,
// Thread 1: r1 = y.load(memory_order_relaxed); // A x.store(r1, memory_order_relaxed); // B // Thread 2: r2 = x.load(memory_order_relaxed); // C y.store(42, memory_order_relaxed); // D
is allowed to produce r1 == r2 == 42
because, although A is sequenced-before B within thread 1 and C is sequenced before D within thread 2, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x. The side-effect of D on y could be visible to the load A in Thread 1 while the side effect of B on x could be visible to the load C in Thread 2.
Even with relaxed memory model, out-of-thin-air values are not allowed to circularly depend on their own computations, for example, with // Thread 1: r1 = x.load(memory_order_relaxed); if (r1 == 42) y.store(r1, memory_order_relaxed); // Thread 2: r2 = y.load(memory_order_relaxed); if (r2 == 42) x.store(42, memory_order_relaxed); is not allowed to produce | (since C++14) |
Typical use for relaxed memory ordering is incrementing counters, such as the reference counters of std::shared_ptr
, since this only requires atomicity, but not ordering or synchronization (note that decrementing the shared_ptr counters requires acquire-release synchronization with the destructor).
#include <vector> #include <iostream> #include <thread> #include <atomic> std::atomic<int> cnt = {0}; void f() { for (int n = 0; n < 1000; ++n) { cnt.fetch_add(1, std::memory_order_relaxed); } } int main() { std::vector<std::thread> v; for (int n = 0; n < 10; ++n) { v.emplace_back(f); } for (auto& t : v) { t.join(); } std::cout << "Final counter value is " << cnt << '\n'; }
Output:
Final counter value is 10000
If an atomic store in thread A is tagged memory_order_release
and an atomic load in thread B from the same variable is tagged memory_order_acquire
, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.
The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.
On strongly-ordered systems (x86, SPARC TSO, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire). On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions have to be used.
Mutual exclusion locks (such as std::mutex
or atomic spinlock) are an example of release-acquire synchronization: when the lock is released by thread A and acquired by thread B, everything that took place in the critical section (before the release) in the context of thread A has to be visible to thread B (after the acquire) which is executing the same critical section.
#include <thread> #include <atomic> #include <cassert> #include <string> std::atomic<std::string*> ptr; int data; void producer() { std::string* p = new std::string("Hello"); data = 42; ptr.store(p, std::memory_order_release); } void consumer() { std::string* p2; while (!(p2 = ptr.load(std::memory_order_acquire))) ; assert(*p2 == "Hello"); // never fires assert(data == 42); // never fires } int main() { std::thread t1(producer); std::thread t2(consumer); t1.join(); t2.join(); }
The following example demonstrates transitive release-acquire ordering across three threads.
#include <thread> #include <atomic> #include <cassert> #include <vector> std::vector<int> data; std::atomic<int> flag = {0}; void thread_1() { data.push_back(42); flag.store(1, std::memory_order_release); } void thread_2() { int expected=1; while (!flag.compare_exchange_strong(expected, 2, std::memory_order_acq_rel)) { expected = 1; } } void thread_3() { while (flag.load(std::memory_order_acquire) < 2) ; assert(data.at(0) == 42); // will never fire } int main() { std::thread a(thread_1); std::thread b(thread_2); std::thread c(thread_3); a.join(); b.join(); c.join(); }
If an atomic store in thread A is tagged memory_order_release
and an atomic load in thread B from the same variable is tagged memory_order_consume
, all memory writes (non-atomic and relaxed atomic) that are dependency-ordered-before the atomic store from the point of view of thread A, become visible side-effects within those operations in thread B into which the load operation carries dependency, that is, once the atomic load is completed, those operators and functions in thread B that use the value obtained from the load are guaranteed to see what thread A wrote to memory.
The synchronization is established only between the threads releasing and consuming the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.
On all mainstream CPUs other than DEC Alpha, dependency ordering is automatic, no additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from performing speculative loads on the objects that are involved in the dependency chain).
Typical use cases for this ordering involve read access to rarely written concurrent data structures (routing tables, configuration, security policies, firewall rules, etc) and publisher-subscriber situations with pointer-mediated publication, that is, when the producer publishes a pointer through which the consumer can access information: there is no need to make everything else the producer wrote to memory visible to the consumer (which may be an expensive operation on weakly-ordered architectures). An example of such scenario is rcu_dereference.
See also std::kill_dependency
and [[carries_dependency]] for fine-grained dependency chain control.
Note that currently (2/2015) no known production compilers track dependency chains: consume operations are lifted to acquire operations.
The specification of release-consume ordering is being revised, and the use of | (since C++17) |
This example demonstrates dependency-ordered synchronization for pointer-mediated publication: the integer data is not related to the pointer to string by a data-dependency relationship, thus its value is undefined in the consumer.
#include <thread> #include <atomic> #include <cassert> #include <string> std::atomic<std::string*> ptr; int data; void producer() { std::string* p = new std::string("Hello"); data = 42; ptr.store(p, std::memory_order_release); } void consumer() { std::string* p2; while (!(p2 = ptr.load(std::memory_order_consume))) ; assert(*p2 == "Hello"); // never fires: *p2 carries dependency from ptr assert(data == 42); // may or may not fire: data does not carry dependency from ptr } int main() { std::thread t1(producer); std::thread t2(consumer); t1.join(); t2.join(); }
Atomic operations tagged memory_order_seq_cst
not only order memory the same way as release/acquire ordering (everything that happened-before a store in one thread becomes a visible side effect in the thread that did a load), but also establish a single total modification order of all atomic operations that are so tagged.
Formally,
Each memory_order_seq_cst
operation B that loads from atomic variable M, observes one of the following:
memory_order_seq_cst
and does not happen-before A memory_order_seq_cst
If there was a memory_order_seq_cst
std::atomic_thread_fence
operation X sequenced-before B, then B observes one of the following:
memory_order_seq_cst
modification of M that appears before X in the single total order For a pair of atomic operations on M called A and B, where A writes and B reads M's value, if there are two memory_order_seq_cst
std::atomic_thread_fence
s X and Y, and if A is sequenced-before X, Y is sequenced-before B, and X appears before Y in the Single Total Order, then B observes either:
For a pair of atomic modifications of M called A and B, B occurs after A in M's modification order if.
memory_order_seq_cst
std::atomic_thread_fence
X such that A is sequenced-before X and X appears before B in the Single Total Order memory_order_seq_cst
std::atomic_thread_fence
Y such that Y is sequenced-before B and A appears before Y in the Single Total Order memory_order_seq_cst
std::atomic_thread_fence
s X and Y such that A is sequenced-before X, Y is sequenced-before B, and X appears before Y in the Single Total Order. Note that the means that:
memory_order_seq_cst
enter the picture, the sequential consistency is lostSequential ordering may be necessary for multiple producer-multiple consumer situations where all consumers must observe the actions of all producers occurring in the same order.
Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems. This may become a performance bottleneck since it forces the affected memory accesses to propagate to every core.
This example demonstrates a situation where sequential ordering is necessary. Any other ordering may trigger the assert because it would be possible for the threads c
and d
to observe changes to the atomics x
and y
in opposite order.
#include <thread> #include <atomic> #include <cassert> std::atomic<bool> x = {false}; std::atomic<bool> y = {false}; std::atomic<int> z = {0}; void write_x() { x.store(true, std::memory_order_seq_cst); } void write_y() { y.store(true, std::memory_order_seq_cst); } void read_x_then_y() { while (!x.load(std::memory_order_seq_cst)) ; if (y.load(std::memory_order_seq_cst)) { ++z; } } void read_y_then_x() { while (!y.load(std::memory_order_seq_cst)) ; if (x.load(std::memory_order_seq_cst)) { ++z; } } int main() { std::thread a(write_x); std::thread b(write_y); std::thread c(read_x_then_y); std::thread d(read_y_then_x); a.join(); b.join(); c.join(); d.join(); assert(z.load() != 0); // will never happen }
volatile
Within a thread of execution, accesses (reads and writes) to volatile objects cannot be reordered past observable side-effects (including other volatile accesses) that are sequenced-before or sequenced-after within the same thread, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.
In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).
One notable exception is Visual Studio, where, with default settings, every volatile write has release semantics and every volatile read has acquire semantics (MSDN), and thus volatiles may be used for inter-thread synchronization. Standard volatile
semantics are not applicable to multithreaded programming, although they are sufficient for e.g. communication with a std::signal
handler.
C documentation for memory order |
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
http://en.cppreference.com/w/cpp/atomic/memory_order