
[do not merge] rhealstone benchmark #1240

Draft · wants to merge 6 commits into base: master
Conversation

lukileczo (Member)

Description

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (refactoring, style fixes, git/CI config, submodule management, no code logic changes)

How Has This Been Tested?

  • Already covered by automatic testing.
  • New test added: (add PR link here).
  • Tested by hand on: (list targets here).

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing linter checks and tests passed.
  • My changes generate no new compilation warnings for any of the targets.

Special treatment

  • This PR needs additional PRs to work (list the PRs, preferably in merge-order).
  • I will merge this PR by myself when appropriate.

#define FTMCTRL_BASE 0xff903000


#define RAM_ADDR 0x07000000


[clang-format-pr] reported by reviewdog 🐶
suggested fix

Suggested change
#define RAM_ADDR 0x07000000
#define RAM_ADDR 0x07000000


ROOTFS="$PREFIX_BOOT/rootfs.jffs2"

local erase_sz=$(image_builder.py query --nvm "$NVM_CONFIG" '{{ nvm.flash0._meta.block_size }}')


⚠️ [shellcheck] reported by reviewdog 🐶
Declare and assign separately to avoid masking return values. SC2155
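The SC2155 warning can be illustrated with a small sketch (the helper names are hypothetical, not from this PR): with `local var=$(cmd)`, `$?` reflects the `local` builtin itself, so a failing command substitution goes unnoticed; declaring and assigning separately preserves the exit status.

```shell
#!/bin/bash
# Hypothetical helpers illustrating SC2155 (not code from the PR)

f_masked() {
    local out=$(false)   # exit status of `false` is masked by `local`
    echo "masked:$?"     # prints masked:0
}

f_split() {
    local out
    out=$(false)         # $? now reflects the command substitution
    echo "split:$?"      # prints split:1
}

f_masked
f_split
```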

mv "$ROOTFS.tmp" "$ROOTFS"
fi

local FS_OFFS=$(image_builder.py query --nvm "$NVM_CONFIG" '{{ nvm.flash0.rootfs.offs }}')


⚠️ [shellcheck] reported by reviewdog 🐶
Declare and assign separately to avoid masking return values. SC2155

fi

local FS_OFFS=$(image_builder.py query --nvm "$NVM_CONFIG" '{{ nvm.flash0.rootfs.offs }}')
local FS_SZ=$(image_builder.py query --nvm "$NVM_CONFIG" '{{ nvm.flash0.rootfs.size }}')


⚠️ [shellcheck] reported by reviewdog 🐶
Declare and assign separately to avoid masking return values. SC2155

@@ -63,10 +61,11 @@ b_image_target() {

ROOTFS="$PREFIX_BOOT/rootfs.jffs2"

local erase_sz=$(image_builder.py query --nvm "$NVM_CONFIG" '{{ nvm.flash0._meta.block_size }}')


⚠️ [shellcheck] reported by reviewdog 🐶
Declare and assign separately to avoid masking return values. SC2155

[ "${BASH_SOURCE[0]}" -ef "$0" ] && echo "You should source this script, not execute it!" && exit 1


FLASH_SZ=$((0x8000000))


⚠️ [shellcheck] reported by reviewdog 🐶
FLASH_SZ appears unused. Verify use (or export if used externally). SC2034



FLASH_SZ=$((0x8000000))
ROOTFS_SZ=$((0x800000))


⚠️ [shellcheck] reported by reviewdog 🐶
ROOTFS_SZ appears unused. Verify use (or export if used externally). SC2034

@agkaminski (Member) left a comment

General remark - watch out for atomics: these might have heavy overheads (including an unexpected mutex in some cases). Perhaps it's OK, but we can't be sure without inspecting the resulting binary.


bool deadBrk;
atomic_int count = 0;
atomic_bool done = false;
Member

I think this should be volatile, but I'm not sure what exact effect that has on atomics.

Comment on lines 117 to 125
benchStart = getCntr();

priority(4);

usleep(0);

threadJoin(tid1, 0);

benchEnd = getCntr();
Member

We introduce the overhead of 2 syscalls here, which skews the result. Perhaps we could measure the time in task1 instead?


#define BENCHMARKS 10000

atomic_uint_least64_t benchEnd = 0;
Member

Atomics might give unwanted overhead, especially 64-bit ones - since we trigger the interrupt via software, it should be fine to use plain volatile.


int irqHandler(unsigned int n, void *arg)
{
if (n == IRQ_UNUSED) {
Member

This if is not strictly needed - we provide the n argument as a way for one handler to serve more than one interrupt; a check on n is not enforced.

{
BENCH_NAME("Interrupt latency");

uint32_t *irqCtrl = mmap(NULL, _PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_DEVICE | MAP_PHYSMEM | MAP_ANONYMOUS, -1, (uintptr_t)INT_CTRL_BASE);
Member

Should be volatile; actually, it's kind of weird that the accesses to irqCtrl haven't been optimized out. Are we sure we compile with -O2?

Comment on lines 149 to 159
uint64_t loopOverhead = getCntr();

for (volatile int i = 0; i < BENCHMARKS; i++) {
}

for (volatile int i = 0; i < BENCHMARKS; i++) {
}

loopOverhead = getCntr() - loopOverhead;

uint64_t joinOverhead = threadJoinOverhead();
Member

For performance calibration, I assume. A nop version might be better - we use a non-volatile iterator in the real use case.

Comment on lines 172 to 180
uint64_t benchStart = getCntr();

priority(4);
usleep(0);

threadJoin(tid1, 0);
threadJoin(tid2, 0);

uint64_t benchEnd = getCntr();
Member

Perhaps we could store the start time in task1 and the end time in task2? That would eliminate the overhead of 3 syscalls and a context switch from the test.

Comment on lines 69 to 78
for (volatile int cnt2 = 0; cnt2 < MAX_LOOPS; cnt2++) {
}
Member

I guess this one is for task2; then the nop version is more adequate.

Comment on lines 90 to 98
benchStart = getCntr();
priority(4);

usleep(0);

threadJoin(tid1, 0);
threadJoin(tid2, 0);

benchEnd = getCntr();
Member

Same as the other cases - perhaps time the execution in the threads instead.

Comment on lines 68 to 65
for (volatile unsigned int i = 0; i < MAX_LOOPS; i++) {
/* usleep(0); */
}
for (volatile unsigned int i = 0; i < MAX_LOOPS; i++) {
/* usleep(0); */
}
Member

imho volatile -> nop

2 participants