
Why does an atomic operation need exclusive cache access?


In my understanding, an atomic operation (C++ std::atomic, for example) first locks the cache line and then performs the atomic operation. I have two questions: 1. If, say, compare-and-swap is itself an atomic operation in hardware, why do we need to lock the cache line? 2. While the cache line is locked, how does another CPU wait for it? Does it use spin-lock style waiting?



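For concreteness, here is a minimal sketch (not from the original post) of the kind of compare-and-swap loop the question is about; the CAS itself is the hardware-atomic step, and the loop only retries when another core changed the value in between:

```cpp
#include <atomic>

// Illustrative lock-free increment built on compare-and-swap.
int fetch_increment(std::atomic<int>& counter) {
    int expected = counter.load();
    // compare_exchange_weak may fail (spuriously or because another
    // core intervened); on failure 'expected' is reloaded with the
    // current value and we retry with a freshly computed desired value.
    while (!counter.compare_exchange_weak(expected, expected + 1)) {
        // retry
    }
    return expected;  // value before the increment
}
```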









































c++ multithreading cpu atomic






asked Mar 9 at 9:03









shota silagadze


  • Have you read this one: fgiesen.wordpress.com/2014/08/18/atomics-and-contention

    – Klaus
    Mar 9 at 9:11

  • With multiple cores, the atomic change has to become effective for every core. Since another core may have cached the same storage, that storage has to be invalidated to guarantee the other core will "see" the change. How this is done in detail is a hardware issue, IMHO; I'm not sure C++ is relevant here, except that it provides std::atomic to give access to that hardware feature where available. Note that std::atomic may fall back to other locking if no hardware lock is available for the given type.

    – Scheff
    Mar 9 at 9:13

  • That mental model is a bit too simplistic to really make headway, but processor manufacturers treat their memory controllers as a trade secret, so it's not as though you have many ways to make it more accurate. Every processor has a way to atomically update memory with a specific set of instructions, which is all that std::atomic does, using those instructions. Other cores can certainly be stalled while such an update is in progress; you'd have to be a bit unlucky, or write non-optimal code.

    – Hans Passant
    Mar 9 at 10:25

  • I just wanted to know why exclusive cache-line access is needed when the atomic operation is atomic in itself for the hardware ...

    – shota silagadze
    Mar 10 at 11:03












1 Answer






































First of all: It depends!



1.) Whether a system locks a cache line has nothing to do with C++. It is a question of how the cache is organized and, especially, of how the assembler instructions interact with the cache. That is a question of CPU architecture!



2.) How a compiler implements an atomic operation is implementation-dependent. Which assembler instructions are generated to perform an atomic operation can vary from compiler to compiler, and even between versions of the same compiler.



3.) As far as I know, a full lock of a cache line is only the fallback solution when no "cleverer" notification/synchronization of the other cores accessing the same cache line can be performed. And there is typically not just a single cache involved: think of a multi-level cache architecture, where some caches are visible only to a single core. So more memory-system operations than just locking a line have to be performed, and data may also have to be moved between cache levels when multiple cores are involved.



4.) From the C++ perspective, an atomic operation is not just a single operation. What really happens depends on the memory-ordering options given for the atomic operation. Since atomic operations are often used for inter-thread synchronization, a lot more must be done for a single atomic RMW operation! To get an idea of everything involved, give https://www.cplusplusconcurrencyinaction.com/ a chance; it goes into the details of memory barriers and memory ordering.
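As a sketch of that point: the ordering argument changes what an atomic access promises about *other* memory, not just the atomic variable itself. The classic acquire/release pairing looks like this (illustrative example, not from the answer):

```cpp
#include <atomic>

std::atomic<int> ready{0};
int payload = 0;  // plain, non-atomic data

void producer() {
    payload = 42;                               // ordinary store
    ready.store(1, std::memory_order_release);  // "publishes" payload
}

int consumer() {
    // The acquire load pairs with the release store: once we see
    // ready == 1, we are guaranteed to also see payload == 42.
    while (ready.load(std::memory_order_acquire) == 0) { /* spin */ }
    return payload;
}
```

With `memory_order_relaxed` on both sides, the same code would still update `ready` atomically, but the guarantee about `payload` would disappear.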



5.) Locking a cache line (if it really happens) should not result in spin locks or similar on the other cores, as access to the cache line itself takes only a few clock cycles. Depending on the architecture, it simply "holds" the other core for some cycles. It may be that the waiting core can do other things in parallel in a different pipe. But that is very hardware-specific.



As already mentioned in a comment: take a look at https://fgiesen.wordpress.com/2014/08/18/atomics-and-contention/; it gives some hints about what can happen with cache coherency and locking.



There is much more than locking going on under the hood; I believe your question only scratches the surface!



For practical usage: don't overthink it! Compiler vendors and CPU architects have done a very good job. As a programmer you should measure your code's performance. From my perspective there is no need to think about what happens when cache lines are locked; you have to write good algorithms, think about good memory organization of your program's data, and minimize interrelationships between threads.































        edited Mar 9 at 9:35

























        answered Mar 9 at 9:30









Klaus





























