Need x86_64 hardware transactional memory benchmarks #8

Open
mtak- opened this issue Mar 30, 2019 · 6 comments
Labels: good first issue, help wanted

@mtak-

mtak- commented Mar 30, 2019

To create heuristics for determining when HTM is likely to speed up a transaction, benchmarks are necessary.

If anyone reading this has a few free moments and a CPU that supports transactional memory (some Haswell processors, or later), I would be grateful if you would help out.

I need your cpuinfo.

Linux:

$ cat /proc/cpuinfo
... output here ...
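
If you are not sure whether your CPU exposes transactional memory at all, a quick check of the cpuinfo flags can help. The following is a minimal sketch, not part of swym: it assumes the Linux `flags` line lists `rtm` and `hle` when TSX is available (as in the i7-6820HQ output later in this thread), and the helper name `htm_flags` is made up for illustration.

```python
def htm_flags(cpuinfo_text):
    """Return the subset of {'rtm', 'hle'} listed in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return {"rtm", "hle"} & set(value.split())
    return set()  # no 'flags' line found (e.g. not Linux x86_64 output)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            found = htm_flags(f.read())
    except OSError:
        found = set()  # /proc/cpuinfo is unavailable on this system
    print("HTM flags:", sorted(found) or "none found")
```

Only the first `flags` line is inspected, on the assumption that all logical processors report the same feature set.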

macOS (use System Profiler):

Hardware Overview:

  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro13,3
  Processor Name:	Intel Core i7
  Processor Speed:	2.9 GHz
  Number of Processors:	1
  Total Number of Cores:	4
  L2 Cache (per Core):	256 KB
  L3 Cache:	8 MB

In the swym-htm project, there's an x.py script that can be run to create benchmarks.
Path: swym/swym-htm/x.py

My output

$ ./x.py bench
test bench_abort  ... bench:  49,516,424 ns/iter (+/- 2,894,024)
test bench_tx0000 ... bench:  12,223,724 ns/iter (+/- 832,878)
test bench_tx0001 ... bench:  12,501,338 ns/iter (+/- 1,302,260)
test bench_tx0002 ... bench:  12,575,727 ns/iter (+/- 1,187,649)
test bench_tx0004 ... bench:  12,615,560 ns/iter (+/- 1,315,751)
test bench_tx0008 ... bench:  12,570,609 ns/iter (+/- 1,079,593)
test bench_tx0016 ... bench:  12,489,782 ns/iter (+/- 1,333,362)
test bench_tx0024 ... bench:  20,527,809 ns/iter (+/- 2,620,641)
test bench_tx0032 ... bench:  21,912,395 ns/iter (+/- 2,646,718)
test bench_tx0040 ... bench:  22,725,125 ns/iter (+/- 2,352,237)
test bench_tx0048 ... bench:  23,449,303 ns/iter (+/- 2,910,261)
test bench_tx0056 ... bench:  24,562,059 ns/iter (+/- 2,760,177)
test bench_tx0064 ... bench:  14,486,932 ns/iter (+/- 767,444)
test bench_tx0072 ... bench:  14,776,765 ns/iter (+/- 1,522,511)
test bench_tx0080 ... bench:  15,262,921 ns/iter (+/- 1,356,767)
test bench_tx0112 ... bench:  17,002,269 ns/iter (+/- 1,317,870)
test bench_tx0120 ... bench:  17,277,812 ns/iter (+/- 1,579,734)
test bench_tx0128 ... bench:  17,917,132 ns/iter (+/- 2,082,900)
test bench_tx0256 ... bench:  26,625,425 ns/iter (+/- 1,667,125)
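
To compare results from different machines, it may help to turn this output into numbers. The sketch below is an illustration only and not part of swym: it assumes the standard `test NAME ... bench: N ns/iter (+/- M)` line format shown above, and the names `parse_bench` and `step_ratios` are hypothetical. It reports how each `bench_tx` size compares with the previous one, which makes steps such as the jump at bench_tx0024 and the drop at bench_tx0064 easy to spot.

```python
import re

# Matches lines such as:
# test bench_tx0016 ... bench:  12,489,782 ns/iter (+/- 1,333,362)
BENCH_RE = re.compile(
    r"^test\s+(\S+)\s+\.\.\.\s+bench:\s+([\d,]+) ns/iter \(\+/- ([\d,]+)\)"
)


def parse_bench(output):
    """Map benchmark name -> (ns_per_iter, deviation) for each bench line."""
    results = {}
    for line in output.splitlines():
        m = BENCH_RE.match(line.strip())
        if m:
            name, ns, dev = m.groups()
            results[name] = (int(ns.replace(",", "")), int(dev.replace(",", "")))
    return results


def step_ratios(results, prefix="bench_tx"):
    """For each size, return (size, time / time of the previous size)."""
    sized = sorted(
        (int(name[len(prefix):]), ns)
        for name, (ns, _) in results.items()
        if name.startswith(prefix)
    )
    return [(size, ns / prev_ns) for (_, prev_ns), (size, ns) in zip(sized, sized[1:])]


if __name__ == "__main__":
    sample = """\
test bench_tx0016 ... bench:  12,489,782 ns/iter (+/- 1,333,362)
test bench_tx0024 ... bench:  20,527,809 ns/iter (+/- 2,620,641)
test bench_tx0056 ... bench:  24,562,059 ns/iter (+/- 2,760,177)
test bench_tx0064 ... bench:  14,486,932 ns/iter (+/- 767,444)
"""
    for size, ratio in step_ratios(parse_bench(sample)):
        print(f"tx{size:04d}: {ratio:.2f}x the previous size")
```

A ratio above 1 means the larger transaction was slower than the previous size, and a ratio below 1 means it was faster.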

Additionally, running the capacity test with --nocapture will give a feel for the capacity of your CPU:

$ ./x.py test capacity --release -- --nocapture
Capacity: 30016
mtak- added the help wanted and good first issue labels Mar 30, 2019
@glaebhoerl

$ ./x.py bench
   Compiling libc v0.2.51
   Compiling fs_extra v1.1.0
   Compiling cc v1.0.32
   Compiling scopeguard v0.3.3
   Compiling lazy_static v1.3.0
   Compiling cfg-if v0.1.7
   Compiling lock_api v0.1.5
   Compiling crossbeam-utils v0.6.5
   Compiling swym v0.1.0-preview (/tmp/swym)
error[E0432]: unresolved import `swym_htm`
   --> src/lib.rs:109:9
    |
109 | pub use swym_htm as htm;
    |         ^^^^^^^^^^^^^^^ no `swym_htm` external crate

error: aborting due to previous error

For more information about this error, try `rustc --explain E0432`.
error: Could not compile `swym`.
warning: build failed, waiting for other jobs to finish...
error: build failed

(I assume this needs some kind of trivial fix but don't have spare energy to figure it out.)

@mtak-

mtak- commented Apr 7, 2019

Thank you for trying!

I got Travis going to help keep me honest. These issues should be fixed now.

I'm particularly interested in how the new i9s perform, since the caching architecture is different. I'm also a little curious about the sudden increase in performance at bench_tx0064.

@glaebhoerl

Don't have one of those unfortunately... I do have an i7-6820HQ.

My benchmark results don't have the same test names as the ones you pasted, is that just because you've changed them in the meantime?

./x.py test capacity --release -- --nocapture didn't print anything resembling the line from your comment (just a bunch of stuff like "running 0 tests").

cpuinfo
# cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.169
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.184
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 1
cpu cores       : 4
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.394
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 2
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.073
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 6
initial apicid  : 6
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.009
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.007
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 1
cpu cores       : 4
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.047
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 2
cpu cores       : 4
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 94
model name      : Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
stepping        : 3
microcode       : 0x9e
cpu MHz         : 1000.000
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 7
initial apicid  : 7
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips        : 5424.00
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:
results
$ ./x.py bench
    Finished release [optimized] target(s) in 0.02s
     Running target/release/deps/swym-66445be1c48c7d2d

running 11 tests
test internal::frw_lock::test::is_send_sync ... ignored
test memory::leak_multi ... ignored
test memory::leak_single ... ignored
test memory::overaligned ... ignored
test memory::zero_sized ... ignored
test memory::zero_sized_drop ... ignored
test panic::nest_fail ... ignored
test panic::simple ... ignored
test panic::write_log ... ignored
test tcell::test::publish_3x ... ignored
test tcell::test::publish_retry ... ignored

test result: ok. 0 passed; 0 failed; 11 ignored; 0 measured; 0 filtered out

     Running target/release/deps/get_one-1e1e24fc97fae4aa

running 6 tests
test get_one::read::boxed::run        ... bench:   1,115,066 ns/iter (+/- 146,217)
test get_one::read::usize::run        ... bench:   1,115,047 ns/iter (+/- 84,515)
test get_one::rw_logged::boxed::run   ... bench:   3,944,137 ns/iter (+/- 409,589)
test get_one::rw_logged::usize::run   ... bench:   3,939,644 ns/iter (+/- 420,033)
test get_one::rw_unlogged::boxed::run ... bench:   2,770,712 ns/iter (+/- 185,353)
test get_one::rw_unlogged::usize::run ... bench:   2,591,859 ns/iter (+/- 204,150)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out

     Running target/release/deps/increment-0ecf5e5e010d9aa5

running 2 tests
test increment::logged   ... bench:  31,409,770 ns/iter (+/- 6,070,916)
test increment::unlogged ... bench:  26,880,634 ns/iter (+/- 1,777,773)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

     Running target/release/deps/read-dcd145b937362b8c

running 2 tests
test read::standard_key ... bench:     590,354 ns/iter (+/- 7,064)
test read::try_key      ... bench:     579,851 ns/iter (+/- 65,860)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

     Running target/release/deps/rw-3693148031e66b02

running 2 tests
test rw::standard_key ... bench:   1,471,890 ns/iter (+/- 141,891)
test rw::try_key      ... bench:   1,473,867 ns/iter (+/- 127,016)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

     Running target/release/deps/set_one-c4920b7abd39b76d

running 3 tests
test set_one::boxed::run ... bench:  63,468,766 ns/iter (+/- 5,239,124)
test set_one::drop::run  ... bench:  47,207,950 ns/iter (+/- 9,818,289)
test set_one::usize::run ... bench:  27,447,694 ns/iter (+/- 2,541,328)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out

     Running target/release/deps/single_threaded_scaling-7c844fee57222055

running 39 tests
test single_threaded_scaling::atomic_write_001 ... bench:           5 ns/iter (+/- 0)
test single_threaded_scaling::atomic_write_002 ... bench:          10 ns/iter (+/- 1)
test single_threaded_scaling::atomic_write_004 ... bench:          19 ns/iter (+/- 2)
test single_threaded_scaling::atomic_write_008 ... bench:          40 ns/iter (+/- 2)
test single_threaded_scaling::atomic_write_016 ... bench:          85 ns/iter (+/- 4)
test single_threaded_scaling::atomic_write_032 ... bench:         170 ns/iter (+/- 17)
test single_threaded_scaling::atomic_write_064 ... bench:         411 ns/iter (+/- 17)
test single_threaded_scaling::atomic_write_065 ... bench:         413 ns/iter (+/- 31)
test single_threaded_scaling::atomic_write_066 ... bench:         424 ns/iter (+/- 31)
test single_threaded_scaling::atomic_write_067 ... bench:         408 ns/iter (+/- 32)
test single_threaded_scaling::atomic_write_068 ... bench:         422 ns/iter (+/- 42)
test single_threaded_scaling::atomic_write_128 ... bench:         639 ns/iter (+/- 48)
test single_threaded_scaling::atomic_write_256 ... bench:       1,286 ns/iter (+/- 92)
test single_threaded_scaling::lock_write_001   ... bench:          16 ns/iter (+/- 0)
test single_threaded_scaling::lock_write_002   ... bench:          34 ns/iter (+/- 0)
test single_threaded_scaling::lock_write_004   ... bench:          64 ns/iter (+/- 8)
test single_threaded_scaling::lock_write_008   ... bench:         129 ns/iter (+/- 0)
test single_threaded_scaling::lock_write_016   ... bench:         264 ns/iter (+/- 16)
test single_threaded_scaling::lock_write_032   ... bench:         532 ns/iter (+/- 23)
test single_threaded_scaling::lock_write_064   ... bench:       1,066 ns/iter (+/- 33)
test single_threaded_scaling::lock_write_065   ... bench:       1,084 ns/iter (+/- 105)
test single_threaded_scaling::lock_write_066   ... bench:       1,095 ns/iter (+/- 64)
test single_threaded_scaling::lock_write_067   ... bench:       1,180 ns/iter (+/- 85)
test single_threaded_scaling::lock_write_068   ... bench:       1,129 ns/iter (+/- 42)
test single_threaded_scaling::lock_write_128   ... bench:       2,117 ns/iter (+/- 95)
test single_threaded_scaling::lock_write_256   ... bench:       4,256 ns/iter (+/- 258)
test single_threaded_scaling::write_001        ... bench:          31 ns/iter (+/- 0)
test single_threaded_scaling::write_002        ... bench:          40 ns/iter (+/- 6)
test single_threaded_scaling::write_004        ... bench:          71 ns/iter (+/- 0)
test single_threaded_scaling::write_008        ... bench:         125 ns/iter (+/- 11)
test single_threaded_scaling::write_016        ... bench:         234 ns/iter (+/- 1)
test single_threaded_scaling::write_032        ... bench:         509 ns/iter (+/- 18)
test single_threaded_scaling::write_064        ... bench:       1,004 ns/iter (+/- 143)
test single_threaded_scaling::write_065        ... bench:       1,476 ns/iter (+/- 129)
test single_threaded_scaling::write_066        ... bench:       2,061 ns/iter (+/- 42)
test single_threaded_scaling::write_067        ... bench:       2,572 ns/iter (+/- 167)
test single_threaded_scaling::write_068        ... bench:       3,082 ns/iter (+/- 37)
test single_threaded_scaling::write_128        ... bench:      44,172 ns/iter (+/- 5,512)
test single_threaded_scaling::write_256        ... bench:     225,326 ns/iter (+/- 17,103)

test result: ok. 0 passed; 0 failed; 0 ignored; 39 measured; 0 filtered out

     Running target/release/deps/thread_key-5484fee832b3a90f

running 1 test
test thread_key::thread_key ... bench:   2,015,138 ns/iter (+/- 256,289)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out

@mtak-

mtak- commented Apr 7, 2019

There's an x.py script per project. Looks like you're running the root swym/x.py. The one I'm interested in is in swym/swym-htm/x.py. Thanks again!

@glaebhoerl

Oops, sorry. Third time's the charm...

test bench_abort     ... bench:  49,708,538 ns/iter (+/- 3,006,367)
test bench_tx0000    ... bench:  13,160,964 ns/iter (+/- 917,229)
test bench_tx0001    ... bench:  13,910,020 ns/iter (+/- 1,064,743)
test bench_tx0002    ... bench:  13,478,335 ns/iter (+/- 1,041,851)
test bench_tx0004    ... bench:  13,727,176 ns/iter (+/- 1,057,050)
test bench_tx0008    ... bench:  13,657,537 ns/iter (+/- 921,114)
test bench_tx0016    ... bench:  12,993,921 ns/iter (+/- 963,741)
test bench_tx0024    ... bench:  19,727,981 ns/iter (+/- 1,841,748)
test bench_tx0032    ... bench:  20,865,940 ns/iter (+/- 1,619,074)
test bench_tx0040    ... bench:  22,240,479 ns/iter (+/- 1,797,312)
test bench_tx0048    ... bench:  23,911,298 ns/iter (+/- 2,156,623)
test bench_tx0056    ... bench:  24,660,521 ns/iter (+/- 1,692,822)
test bench_tx0064    ... bench:  15,083,699 ns/iter (+/- 1,112,182)
test bench_tx0072    ... bench:  16,249,445 ns/iter (+/- 1,326,942)
test bench_tx0080    ... bench:  15,920,539 ns/iter (+/- 1,043,853)
test bench_tx0112    ... bench:  18,044,455 ns/iter (+/- 1,295,858)
test bench_tx0120    ... bench:  18,167,374 ns/iter (+/- 1,563,842)
test bench_tx0128    ... bench:  18,451,934 ns/iter (+/- 1,422,192)
test bench_tx0256    ... bench:  28,358,658 ns/iter (+/- 3,042,093)
Capacity: 24896

(That last one was showing pretty large variance from run to run... as low as 18K, as high as 30K.)

I set the CPU scaling governor to performance and SIGSTOPped other applications. HT and Turbo Boost are both enabled.
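
For anyone else posting numbers, recording these settings next to the results makes runs easier to compare. A minimal sketch follows, assuming a Linux system that exposes the usual sysfs files for the scaling governor, SMT state, and the intel_pstate turbo switch. The exact paths depend on the kernel and frequency driver, so missing files are reported as None; the function names are hypothetical.

```python
from pathlib import Path


def read_setting(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def bench_environment(sysfs_cpu="/sys/devices/system/cpu"):
    """Collect the settings most likely to affect benchmark timings."""
    root = Path(sysfs_cpu)
    return {
        # e.g. "performance" or "powersave"
        "governor": read_setting(root / "cpu0/cpufreq/scaling_governor"),
        # "1" if hyper-threading (SMT) is active; needs a recent kernel
        "smt_active": read_setting(root / "smt/active"),
        # "0" means Turbo Boost is allowed; only present with intel_pstate
        "no_turbo": read_setting(root / "intel_pstate/no_turbo"),
    }


if __name__ == "__main__":
    for key, value in bench_environment().items():
        print(f"{key}: {value}")
```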

@mtak-

mtak- commented Apr 7, 2019

Much appreciated! That drop at 64 is also present in your benchmarks.

The capacity variation is normal. There's an upper bound it won't exceed, which is supposed to be the size of the L1 data cache (32KB), and it's nice to see your benchmarks confirm that assumption.
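
As a rough check of that bound, the two capacities reported in this thread can be compared against a 32KB L1 data cache. This sketch assumes the capacity figure is in bytes and that cache lines are 64 bytes (the `cache_alignment : 64` value in the cpuinfo above); neither is stated explicitly by the test output, and `capacity_summary` is a hypothetical helper.

```python
L1D_BYTES = 32 * 1024  # 32KB L1 data cache, per the comment above
LINE_BYTES = 64        # assumed cache line size, from "cache_alignment : 64"


def capacity_summary(capacity_bytes):
    """Return (whole cache lines covered, fraction of L1d), assuming bytes."""
    return capacity_bytes // LINE_BYTES, capacity_bytes / L1D_BYTES


if __name__ == "__main__":
    for reported in (30016, 24896):  # the two "Capacity:" values in this thread
        lines, fraction = capacity_summary(reported)
        print(f"{reported} bytes = {lines} lines, {fraction:.0%} of L1d")
```

Both reported values are whole multiples of 64 and stay below the 32768-byte limit, which is consistent with the L1 data cache being the bound.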

It'll be interesting to see how Intel's post-Meltdown/Spectre HTM performs (i9). I'll have access to an i9 in a few days.
