How does Linux perf calculate the cache-references and cache-misses events


I am confused by the perf events cache-misses and L1-icache-load-misses, L1-dcache-load-misses, LLC-load-misses. When I tried to perf stat all of them, the results didn't seem consistent:

%$: sudo perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations,L1-dcache-load-misses,L1-dcache-loads,L1-dcache-stores,L1-icache-load-misses,LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetches ./my_app

 523,288,816 cache-references (22.89%)
 205,331,370 cache-misses # 39.239 % of all cache refs (31.53%)
 10,163,373,365 cycles (39.62%)
 13,739,845,761 instructions # 1.35 insn per cycle (47.43%)
 2,520,022,243 branches (54.90%)
 20,341 faults
 147 migrations
 237,794,728 L1-dcache-load-misses # 6.80% of all L1-dcache hits (62.43%)
 3,495,080,007 L1-dcache-loads (69.95%)
 2,039,344,725 L1-dcache-stores (69.95%)
 531,452,853 L1-icache-load-misses (70.11%)
 77,062,627 LLC-loads (70.47%)
 27,462,249 LLC-load-misses # 35.64% of all LL-cache hits (69.09%)
 15,039,473 LLC-stores (15.15%)
 3,829,429 LLC-store-misses (15.30%)
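The percentage in parentheses after most counts is perf's multiplexing ratio: the fraction of the run during which that event actually occupied a hardware counter. With this many events requested, perf time-slices them across the available counters and scales each raw count up to an estimate of the full run. A minimal Python sketch of that extrapolation, assuming perf's usual rule estimate = raw × time_enabled / time_running and using only the numbers printed above (the helper names `raw_count` and `scaled_count` are hypothetical, not perf APIs):

```python
# Sketch: approximate perf's multiplexing scaling for a few events from the
# output above. Assumes the "(NN.NN%)" column is time_running / time_enabled.

def raw_count(scaled: int, pct_running: float) -> float:
    """Approximate raw hardware count before perf extrapolated it."""
    return scaled * pct_running / 100.0

def scaled_count(raw: float, pct_running: float) -> float:
    """perf-style estimate: raw * time_enabled / time_running."""
    return raw * 100.0 / pct_running

# (scaled count, running percentage) as printed by perf stat above
events = {
    "cache-references": (523_288_816, 22.89),
    "cache-misses": (205_331_370, 31.53),
    "LLC-stores": (15_039_473, 15.15),
    "LLC-store-misses": (3_829_429, 15.30),
}

for name, (scaled, pct) in events.items():
    raw = raw_count(scaled, pct)
    print(f"{name:>18}: ~{raw:,.0f} raw events, extrapolated x{100 / pct:.1f}")
```

Because these events were counted in different time slices, sums and ratios across them carry extra sampling error; requesting only the few events you want to compare in a single run avoids (or reduces) multiplexing.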

The L1-* and LLC-* events are easy to understand, as I can tell they are read from the hardware counters in the CPU.

But how does perf calculate the cache-misses event? From my understanding, if cache-misses counts the number of memory accesses that cannot be served by the CPU cache, then shouldn't it be equal to LLC-load-misses + LLC-store-misses? Clearly, in my case, cache-misses is much higher than the last-level cache miss count.

The same confusion applies to cache-references: it is much lower than L1-dcache-loads and much higher than LLC-loads + LLC-stores.
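The two observations can be checked directly against the reported counts. This sketch (the `counts` dict and `pct` helper are just illustrative names) also recomputes perf's "#" annotations, which turn out to be plain ratios of two counters; note that the L1 ratio perf labels "of all L1-dcache hits" is actually computed against L1-dcache-loads.

```python
# Recompute the question's comparisons from the counts perf reported.
counts = {
    "cache-references": 523_288_816,
    "cache-misses": 205_331_370,
    "L1-dcache-loads": 3_495_080_007,
    "L1-dcache-load-misses": 237_794_728,
    "LLC-loads": 77_062_627,
    "LLC-load-misses": 27_462_249,
    "LLC-stores": 15_039_473,
    "LLC-store-misses": 3_829_429,
}

def pct(num: str, den: str) -> float:
    """Derived ratio the way perf annotates it: 100 * num / den."""
    return 100.0 * counts[num] / counts[den]

llc_misses = counts["LLC-load-misses"] + counts["LLC-store-misses"]
llc_refs = counts["LLC-loads"] + counts["LLC-stores"]

print(f"cache-misses / cache-references   = {pct('cache-misses', 'cache-references'):.3f}%")
print(f"LLC-load-misses / LLC-loads       = {pct('LLC-load-misses', 'LLC-loads'):.2f}%")
print(f"L1-dcache-load-misses / loads     = {pct('L1-dcache-load-misses', 'L1-dcache-loads'):.2f}%")
print(f"cache-misses vs LLC load+store misses: {counts['cache-misses']:,} vs {llc_misses:,}")
print(f"cache-references vs LLC loads+stores : {counts['cache-references']:,} vs {llc_refs:,}")
```

cache-misses (about 205M) is roughly 6.5 times LLC-load-misses + LLC-store-misses (about 31M), so the two cannot be counting the same set of requests.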

My Linux kernel and CPU info:

%$: uname -r

4.10.0-22-generic

%$: lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i5-7600K CPU @ 3.80GHz
Stepping: 9
CPU MHz: 885.754
CPU max MHz: 4200.0000
CPU min MHz: 800.0000
BogoMIPS: 7584.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

edited Mar 7 at 22:47

asked Mar 7 at 2:51

LouisYe


Stack Overflow is for programming questions, not questions about using or configuring Unix and its utilities. Unix & Linux or Super User would be better places for questions like this.

– Barmar
Mar 7 at 2:53


@Barmar The question is not about configuring anything. perf is a tool for measuring performance-related metrics, and the question is about what some of these metrics mean. The Linux tag may not be very relevant to the question, but perf is still a Linux tool, so it's at least marginally relevant.

– Hadi Brais
Mar 7 at 4:04

@HadiBrais I said "using or configuring Unix and its utilities", and he's "using its utilities" (it's a canned comment, I don't tailor it to each question). Actually, the question seems to be more about the design of Linux. But it's not about programming (he didn't post any code).

– Barmar
Mar 7 at 15:34

@Barmar thanks for providing the links. But I don't think Stack Overflow should be limited to just "programming questions". My question here is about CPU architecture and related tools. It is about how programmers collect performance metrics, and Linux just happens to be the most popular platform. I believe any good programmer, especially one who programs in C/C++, should be aware of the features provided by the CPU, especially the CPU cache, in order to produce programs with good performance. It is definitely worth posting if any of the related tools is confusing.

– LouisYe
Mar 7 at 19:00

BTW, I made this post cuz I couldn't find the answer in any other related Stack Overflow post.

– LouisYe
Mar 7 at 19:00

The built-in perf events that you are interested in map to the following hardware performance monitoring events on your processor:

  523,288,816 cache-references       (architectural event: LLC Reference)
  205,331,370 cache-misses           (architectural event: LLC Misses)
  237,794,728 L1-dcache-load-misses  L1D.REPLACEMENT
3,495,080,007 L1-dcache-loads        MEM_INST_RETIRED.ALL_LOADS
2,039,344,725 L1-dcache-stores       MEM_INST_RETIRED.ALL_STORES
  531,452,853 L1-icache-load-misses  ICACHE_64B.IFTAG_MISS
   77,062,627 LLC-loads              OFFCORE_RESPONSE (MSR bits 0, 16, 30-37)
   27,462,249 LLC-load-misses        OFFCORE_RESPONSE (MSR bits 0, 17, 26-29, 30-37)
   15,039,473 LLC-stores             OFFCORE_RESPONSE (MSR bits 1, 16, 30-37)
    3,829,429 LLC-store-misses       OFFCORE_RESPONSE (MSR bits 1, 17, 26-29, 30-37)
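One way to confirm a mapping like this, rather than trusting the generic alias, is to ask perf for the native event as well. The two architectural events in the table have fixed Intel encodings: event select 0x2E with umask 0x4F (LLC Reference) or 0x41 (LLC Misses). The sketch below only builds the strings you would pass to `perf stat -e`, in perf's `cpu/event=...,umask=.../` form and its `rUUEE` raw form; the helper names and the `native_*` event labels are hypothetical.

```python
# Build perf event strings for Intel's architectural LLC events.
# Encodings: event select 0x2E; umask 0x4F = LLC Reference, 0x41 = LLC Misses.
ARCH_EVENTS = {
    "cache-references": (0x2E, 0x4F),
    "cache-misses": (0x2E, 0x41),
}

def pmu_syntax(event: int, umask: int, name: str) -> str:
    """perf's symbolic PMU form, e.g. cpu/event=0x2e,umask=0x41,name=x/."""
    return f"cpu/event={event:#04x},umask={umask:#04x},name={name}/"

def raw_syntax(event: int, umask: int) -> str:
    """perf's raw form on Intel: 'r' + hex(umask << 8 | event)."""
    return f"r{(umask << 8) | event:x}"

for generic, (ev, um) in ARCH_EVENTS.items():
    label = "native_" + generic.replace("-", "_")
    print(generic, pmu_syntax(ev, um, label), raw_syntax(ev, um))
```

If the mapping holds, running something like `perf stat -e cache-misses,r412e ./my_app` would be expected to report matching counts for the two entries, apart from any multiplexing error.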

All of these events are documented in the Intel manual Volume 3. For more information on how to map perf events to native events, see: Hardware cache events and perf and How does perf use the offcore events?.

> But how does perf calculate cache-misses event? From my understanding, if the cache-misses counts the number of memory accesses that cannot be served by the CPU cache, then shouldn't it be equal to LLC-loads-misses + LLC-store-misses? Clearly in my case, the cache-misses is much higher than the Last-Level-Cache-Misses number.

LLC-load-misses and LLC-store-misses count only demand requests, but they count both cacheable and uncacheable ones. cache-misses, on the other hand, counts both demand and speculative requests, but only cacheable ones. So neither count is necessarily larger than the other.
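To see why neither count bounds the other, here is a toy model built from made-up request records (the `Request` type and the two sample workloads are hypothetical, not anything perf exposes). It applies the two counting rules just described: the LLC-*-misses events count demand requests regardless of cacheability, while cache-misses counts demand and speculative requests but only cacheable ones.

```python
from dataclasses import dataclass

@dataclass
class Request:
    demand: bool      # False = speculative (e.g. a hardware prefetch)
    cacheable: bool   # False = uncacheable access
    llc_miss: bool    # True if the request was not satisfied by the L3

def llc_miss_events(reqs):
    """Rule for LLC-load-misses + LLC-store-misses: demand only, any cacheability."""
    return sum(r.llc_miss and r.demand for r in reqs)

def cache_misses_event(reqs):
    """Rule for cache-misses: demand or speculative, cacheable only."""
    return sum(r.llc_miss and r.cacheable for r in reqs)

# Prefetch-heavy toy workload: speculative misses inflate cache-misses.
prefetchy = [Request(False, True, True)] * 5 + [Request(True, True, True)] * 2
# Toy workload with uncacheable demand accesses: only the LLC-* events see them.
uncached = [Request(True, False, True)] * 4 + [Request(True, True, True)] * 1

print(cache_misses_event(prefetchy), llc_miss_events(prefetchy))  # 7 2
print(cache_misses_event(uncached), llc_miss_events(uncached))    # 1 5
```

In the first workload cache-misses is larger; in the second the LLC-* sum is larger, which is the point of the paragraph above.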

> The same confusion goes to cache-reference. It is much lower than L1-dcache-loads and much higher then LLC-loads+LLC-stores

The only guarantee is that cache-references is at least as large as cache-misses, because the former counts requests irrespective of whether they miss the L3. It's normal for L1-dcache-loads to be larger than cache-references, because core-originated loads usually occur only when you execute load instructions, and because many programs exhibit enough cache locality that most loads are served before they reach the L3. But that is not always the case, because hardware prefetches also contribute to cache-references.
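A second toy sketch, under loudly simplified assumptions stated in its comments: a demand load is looked up in L1, then L2, then L3, and is seen by the LLC events only after missing both L1 and L2; a hardware prefetch skips the L1 and is looked up directly in the L3. The access traces are invented for illustration. The model shows why cache-misses can never exceed cache-references, while the relation between L1-dcache-loads and cache-references depends on locality and prefetching.

```python
from collections import Counter

# Each access: (source, level that served it). Levels: "L1", "L2", "L3", "DRAM".
# Simplifying assumptions (not real hardware behavior in every detail):
#   - demand loads check L1 -> L2 -> L3 in order;
#   - prefetches bypass the L1 and are looked up in the L3.
def count_events(accesses):
    c = Counter()
    for source, served_by in accesses:
        if source == "load":
            c["L1-dcache-loads"] += 1
        if served_by in ("L3", "DRAM"):   # reached the L3 lookup
            c["cache-references"] += 1
            if served_by == "DRAM":       # and also missed the L3
                c["cache-misses"] += 1
    return c

good_locality = [("load", "L1")] * 90 + [("load", "L2")] * 6 + [("load", "DRAM")] * 4
heavy_prefetch = [("load", "L1")] * 10 + [("prefetch", "L3")] * 30 + [("prefetch", "DRAM")] * 20

for name, trace in [("good_locality", good_locality), ("heavy_prefetch", heavy_prefetch)]:
    c = count_events(trace)
    assert c["cache-misses"] <= c["cache-references"]  # holds by construction
    print(name, dict(c))
```

With good locality, L1-dcache-loads (100) dwarfs cache-references (4); with heavy prefetching, cache-references (50) exceeds L1-dcache-loads (10).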

> The L1-* and LLC-* events are easy to understand, as I can tell they are read from the hardware counters in CPU.

No, it's a trap. They are not easy to understand.

answered Mar 7 at 19:40

Hadi Brais


Thank you for the answer, now I understand why cache-references is higher than LLC-loads + LLC-stores, as the former counts both demand and speculative requests. It looks like you suggest that cache-references doesn't count any L1 cache access, am I right?

– LouisYe
Mar 8 at 0:00

@LouisYe If a cacheable memory access missed in the L1 and the L2, then it will be counted by cache-references. Otherwise, if it hits in the L1, then, no, it will not be counted by cache-references.

– Hadi Brais
Mar 8 at 0:05


Note that there are also longest_lat_cache.miss and longest_lat_cache.reference, which, at least on my system, count exactly the same as cache-misses and cache-references, and offcore_response.demand_data_rd.any_response, which corresponds to LLC-loads.
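To check a correspondence like this on your own machine, you could run perf stat with CSV output (`-x,`) listing both the generic and the native names, for example `perf stat -x, -e cache-misses,longest_lat_cache.miss ./my_app`, provided your perf build knows the native event names. The minimal parser below assumes only that the first three comma-separated fields are value, unit and event name (later fields vary between perf versions) and that perf writes its counts to stderr; `parse_perf_csv` and `compare` are hypothetical helper names.

```python
import subprocess

def parse_perf_csv(text: str) -> dict:
    """Map event name -> count from `perf stat -x,` output.

    Assumes the fields start with: value, unit, event. Blank lines, lines
    starting with '#', and '<not counted>' / '<not supported>' values are skipped.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3 or fields[0].startswith("<"):
            continue
        counts[fields[2]] = float(fields[0])
    return counts

def compare(cmd, pairs):
    """Run perf stat on cmd; return {(generic, native): (count_a, count_b)}.

    Requires perf on PATH. The program's own stderr is mixed into the parsed
    text, so a noisy program may need perf's -o option instead.
    """
    events = ",".join(e for pair in pairs for e in pair)
    proc = subprocess.run(["perf", "stat", "-x,", "-e", events, *cmd],
                          capture_output=True, text=True)
    counts = parse_perf_csv(proc.stderr)
    return {p: (counts.get(p[0]), counts.get(p[1])) for p in pairs}
```

A call such as `compare(["./my_app"], [("cache-misses", "longest_lat_cache.miss"), ("cache-references", "longest_lat_cache.reference")])` would then put each generic count next to its native counterpart.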

– Zulan
Mar 8 at 11:01

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55035313%2fhow-does-linux-perf-calculate-the-cache-references-and-cache-misses-events%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The built-in perf events that you are interested in are mapping to the following hardware performance monitoring events on your processor:

 523,288,816 cache-references (architectural event: LLC Reference) 
 205,331,370 cache-misses (architectural event: LLC Misses) 
 237,794,728 L1-dcache-load-misses L1D.REPLACEMENT
3,495,080,007 L1-dcache-loads MEM_INST_RETIRED.ALL_LOADS
2,039,344,725 L1-dcache-stores MEM_INST_RETIRED.ALL_STORES 
 531,452,853 L1-icache-load-misses ICACHE_64B.IFTAG_MISS
 77,062,627 LLC-loads OFFCORE_RESPONSE (MSR bits 0, 16, 30-37)
 27,462,249 LLC-load-misses OFFCORE_RESPONSE (MSR bits 0, 17, 26-29, 30-37)
 15,039,473 LLC-stores OFFCORE_RESPONSE (MSR bits 1, 16, 30-37)
 3,829,429 LLC-store-misses OFFCORE_RESPONSE (MSR bits 1, 17, 26-29, 30-37)

But how does perf calculate cache-misses event? From my understanding,
if the cache-misses counts the number of memory accesses that cannot
be served by the CPU cache, then shouldn't it be equal to
LLC-loads-misses + LLC-store-misses? Clearly in my case, the
cache-misses is much higher than the Last-Level-Cache-Misses number.

The same confusion goes to cache-reference. It is much lower than
L1-dcache-loads and much higher then LLC-loads+LLC-stores

The L1-* and LLC-* events are easy to understand, as I can tell they
are read from the hardware counters in CPU.

No, it's a trap. They are not easy to understand.

answered Mar 7 at 19:40

Hadi Brais

11k22244

thank you for the answer, now I understand why cache-references is higher than llc-loads+llc-stores, as the former counts both demand and speculative requests. It looks like you suggeset that cache-reference doesn't count any L1 cache access, am I right?

– LouisYe
Mar 8 at 0:00

@LouisYe If a cacheable memory access missed in the L1 and the L2, then it will be counted by cache-references. Otherwise, if it hits in the L1, then, no, it will not be counted by cache-references.

– Hadi Brais
Mar 8 at 0:05

1

Note that ther's also longest_lat_cache.miss and longest_lat_cache.reference - which, at least on my system, count exactly the same as cache-misses and cache-references and offcore_response.demand_data_rd.any_response corresponding to LLC-loads.

– Zulan
Mar 8 at 11:01

add a comment |

The built-in perf events that you are interested in are mapping to the following hardware performance monitoring events on your processor:

 523,288,816 cache-references (architectural event: LLC Reference) 
 205,331,370 cache-misses (architectural event: LLC Misses) 
 237,794,728 L1-dcache-load-misses L1D.REPLACEMENT
3,495,080,007 L1-dcache-loads MEM_INST_RETIRED.ALL_LOADS
2,039,344,725 L1-dcache-stores MEM_INST_RETIRED.ALL_STORES 
 531,452,853 L1-icache-load-misses ICACHE_64B.IFTAG_MISS
 77,062,627 LLC-loads OFFCORE_RESPONSE (MSR bits 0, 16, 30-37)
 27,462,249 LLC-load-misses OFFCORE_RESPONSE (MSR bits 0, 17, 26-29, 30-37)
 15,039,473 LLC-stores OFFCORE_RESPONSE (MSR bits 1, 16, 30-37)
 3,829,429 LLC-store-misses OFFCORE_RESPONSE (MSR bits 1, 17, 26-29, 30-37)

But how does perf calculate cache-misses event? From my understanding,
if the cache-misses counts the number of memory accesses that cannot
be served by the CPU cache, then shouldn't it be equal to
LLC-loads-misses + LLC-store-misses? Clearly in my case, the
cache-misses is much higher than the Last-Level-Cache-Misses number.

The same confusion goes to cache-reference. It is much lower than
L1-dcache-loads and much higher then LLC-loads+LLC-stores

The L1-* and LLC-* events are easy to understand, as I can tell they
are read from the hardware counters in CPU.

No, it's a trap. They are not easy to understand.

answered Mar 7 at 19:40

Hadi Brais

11k22244

thank you for the answer, now I understand why cache-references is higher than llc-loads+llc-stores, as the former counts both demand and speculative requests. It looks like you suggeset that cache-reference doesn't count any L1 cache access, am I right?

– LouisYe
Mar 8 at 0:00

@LouisYe If a cacheable memory access missed in the L1 and the L2, then it will be counted by cache-references. Otherwise, if it hits in the L1, then, no, it will not be counted by cache-references.

– Hadi Brais
Mar 8 at 0:05

1

Note that ther's also longest_lat_cache.miss and longest_lat_cache.reference - which, at least on my system, count exactly the same as cache-misses and cache-references and offcore_response.demand_data_rd.any_response corresponding to LLC-loads.

– Zulan
Mar 8 at 11:01

add a comment |

The built-in perf events that you are interested in are mapping to the following hardware performance monitoring events on your processor:

 523,288,816 cache-references (architectural event: LLC Reference) 
 205,331,370 cache-misses (architectural event: LLC Misses) 
 237,794,728 L1-dcache-load-misses L1D.REPLACEMENT
3,495,080,007 L1-dcache-loads MEM_INST_RETIRED.ALL_LOADS
2,039,344,725 L1-dcache-stores MEM_INST_RETIRED.ALL_STORES 
 531,452,853 L1-icache-load-misses ICACHE_64B.IFTAG_MISS
 77,062,627 LLC-loads OFFCORE_RESPONSE (MSR bits 0, 16, 30-37)
 27,462,249 LLC-load-misses OFFCORE_RESPONSE (MSR bits 0, 17, 26-29, 30-37)
 15,039,473 LLC-stores OFFCORE_RESPONSE (MSR bits 1, 16, 30-37)
 3,829,429 LLC-store-misses OFFCORE_RESPONSE (MSR bits 1, 17, 26-29, 30-37)
