Why does atomic operation need exclusive cache access?
My understanding is that an atomic operation (a C++ std::atomic operation, for example) first locks the cache line and then performs the operation. I have two questions: 1. If, say, compare-and-swap is itself atomic in hardware, why does the cache line need to be locked at all? 2. While the cache line is locked, how does another CPU wait for it? Does it use spin-lock-style waiting?
Thanks.
c++ multithreading cpu atomic
Have you read that one: fgiesen.wordpress.com/2014/08/18/atomics-and-contention
– Klaus
Mar 9 at 9:11
With multiple cores, the atomic change has to become effective for every core. As another core may have cached the same storage, that copy has to be invalidated to guarantee the other core will "see" the change. IMHO, how this is done in detail is a hardware issue; I'm not sure C++ is relevant here, except that it provides std::atomic to give access to that hardware feature where available. Please note that std::atomic may fall back to other locking if a hardware lock is not available for the stored type.
– Scheff
Mar 9 at 9:13
That mental model is a bit too simplistic to really make headway, but processor manufacturers treat their memory controllers as a trade secret so it's not like you have many ways to make it more accurate. Every processor has a way to atomically update memory with a specific set of instructions. Which is all that std::atomic does, using those instructions. Other cores certainly can be stalled when such an update is in progress, you'd have to be a bit unlucky. Or write non-optimal code.
– Hans Passant
Mar 9 at 10:25
I just wanted to know why exclusive cache line access is needed when the atomic operation is atomic in itself for hardware ...
– shota silagadze
Mar 10 at 11:03
asked Mar 9 at 9:03
shota silagadze
1 Answer
First of all: It depends!
1.) Whether a system locks a cache line has nothing to do with C++. It is a question of how the cache is organized, and especially of how the assembler instructions interact with the cache. That is a question of CPU architecture!
2.) How a compiler performs an atomic operation is implementation-dependent. Which assembler instructions are generated for an atomic operation can vary from compiler to compiler, and even between versions of the same compiler.
3.) As far as I know, a full lock of a cache line is only the fallback solution when no "more clever" notification/synchronization of the other cores accessing the same cache line can be performed. And typically there is not just a single cache involved: think of a multi-level cache architecture, where some caches are visible only to a single core. So more memory-system operations than just locking a line have to be performed, and data also has to move between cache levels when multiple cores are involved.
4.) From the C++ perspective, an atomic operation is not just a single operation. What really happens depends on the memory-ordering option chosen for the atomic operation. As atomic operations are often used for inter-thread synchronization, a lot more work may be needed for a single atomic RMW operation. To get an idea of everything that has to be done, give https://www.cplusplusconcurrencyinaction.com/ a chance; it goes into the details of memory barriers and memory ordering.
5.) Locking a cache line (if that really happens) should not result in spin locks or anything similar on the other cores, as access to the cache line itself takes only a few clock cycles. Depending on the architecture, it simply "holds" the other core for some cycles. The waiting core may even be able to do other work in parallel in a different pipe. But that is very hardware-specific.
As already given in a comment: take a look at https://fgiesen.wordpress.com/2014/08/18/atomics-and-contention/; it gives some hints about what can happen with cache coherency and locking.
There is much more than locking going on under the hood. I believe your question only scratches the surface!
For practical usage: don't worry about it! Compiler vendors and CPU architects have done a very good job. As a programmer, you should measure your code's performance. From my perspective there is no need to think about what happens when cache lines are locked. Write good algorithms, think about good memory organization of your program's data, and keep the interrelationships between threads to a minimum.
edited Mar 9 at 9:35
answered Mar 9 at 9:30
Klaus