a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 #332

chr-chr · 2023-10-15T21:47:32Z

Hi,

My aim was simply to pretty up the gpu frequencies table. That raised bugs.

update MSM8916 gpu frequency table (dtsi)
fix msm_gpu PM clk handling
fix gcc-msm8916 clk
fix a3xx gpu busy counter
new msm_gpu_devfreq load meter (simple avg)

Need more testing on other devices. May uncover or fix bugs on other devices:

adreno 1c00000.gpu: _opp_config_clk_single: failed to set clock rate: -22 EINVAL

<...>: rcg didn't update its configuration

Please check if your device gpu ever uses the between frequencies:

cd /sys/class/devfreq/1c00000.gpu
watch -d grep . *

Apply devfreq patch, for the load meter!

The gpu busy counter was broken for me, wrong reg.
I fixed it for a3xx
a4xx may be broken aswell?
a5xx ... pls test
a6xx ... pls test

Before I got:

available_frequencies:19200000 400000000
available_governors:userspace powersave performance simple_ondemand
cur_freq:400000000
governor:simple_ondemand
max_freq:400000000
min_freq:19200000
name:1c00000.gpu
polling_interval:50
target_freq:400000000
timer:delayed
trans_stat:     From  :   To
trans_stat:           :  19200000 400000000   time(ms)
trans_stat:   19200000:         0         2         0
trans_stat:* 400000000:         2         0    925683
trans_stat:Total transition : 4

after:

available_governors:userspace powersave performance simple_ondemand
cur_freq:200000000
governor:simple_ondemand
load:65.08 %
max_freq:465000000
min_freq:19200000
name:1c00000.gpu
polling_interval:50
target_freq:200000000
timer:delayed
trans_stat:     From  :   To
trans_stat:           :  19200000 100000000 200000000 310000000 400000000 465000000   time(ms
)
trans_stat:   19200000:         0         0         0         0         0         1       623
trans_stat:  100000000:         0         0         0         0         0         3      4036
trans_stat:* 200000000:         0         3         0         0         0         2     11750
trans_stat:  310000000:         0         0         6         0         0         0     22790
trans_stat:  400000000:         0         0         0         5         0         0     10583
trans_stat:  465000000:         0         0         0         1         5         0     19733
trans_stat:Total transition : 26

The overclocking is optional ;)

testing cheat sheet:

full fps test:
vblank_mode=1 glxgears -fullscreen

and in vsync mode we want 300 frames/5s 60pfs:
glxgears -fullscreen
works well with 200MHz only (X11).

posh/wayland wants 400MHz for 60fps
note about a possible frame drop to 30fps/200MHz in wayland:
fine tune in /sys/kernel/debug/dri/0/devfreq *)
On the other hand: power cons. dropps ~150mW (!)

*) downdifferential < upthreshold < 100 (%)

Setting bus speed "opp-peak-kBps" needs interconnect frame work, which is broken. I dropped the settings as comments.

stephan-gh · 2023-10-16T08:28:12Z

AFAIK the 465 MHz GPU is basically only meant for MSM8916v2 (aka Snapdragon 412). Which is pretty much only used in the BQ Aquaris X5 (longcheer-l8910). I doubt you will find the speedbin set on any other device.
Can you clarify why the opp-suspend is necessary? I don't quite understand it yet.

chr-chr · 2023-10-16T16:02:27Z

AFAIK the 465 MHz GPU is basically only meant for MSM8916v2 (aka Snapdragon 412). Which is pretty much only used in the BQ Aquaris X5 (longcheer-l8910). I doubt you will find the speedbin set on any other device.

Well, at least it works. My devices shows up:

adreno 1c00000.gpu: speed-bin version: 1 value: 0

I was looking at an LineageOS-custom ROM I 'm using and I realized they are happily overclocking CPU and GPU .... wtf?
However my intention was to engage the in between values to get power savings. However, actually they don't get picked: gpu+devfreq/governor is broken. To test set max value.

* Can you clarify why the `opp-suspend` is necessary? I don't quite understand it yet.

It's a hint for devfreq_suspend_device / resume handlers, they will call then devfreq_set_target()

And I must hotfix that ... bug: with my patch the gpu may not ramp up at all from suspend

stephan-gh · 2023-10-16T17:18:22Z

Note that implementing/extending frequency scaling does not necessarily result in power saving. It's always a balance between powering off [the GPU] completely when it is idle vs. keeping it on at a lower frequency. When [the GPU] is powered off (suspended), the frequency shouldn't matter because the clock should get disabled entirely. In powered off/suspended state it should also consume the least amount of power (ideally none).

This means there are two possibilities if you consider a lightly loaded system (with not much to do for the CPU or GPU):

You can run the GPU at maximum frequency and try to finish the work quickly so that you can quickly power off the GPU entirely again.
You can reduce the GPU frequency. This means the GPU will consume less power, but it will also take longer to finish the work.

It's hard to generalize which one of these is better. I would expect (2) is better if you have a lightly loaded system with frequent small periodic tasks, which makes it too expensive to power off the GPU inbetween. Otherwise (1) could also be better in many cases.

If you want to be sure that you are actually improving power consumption I would strongly recommend that you try to measure the power consumption with and without your changes. Otherwise you might end up making power consumption worse. :')

When the GPU is powered off (suspended), the frequency shouldn't matter because the clock should get disabled entirely.

BTW, given ^, I'm still not sure why you need the opp-suspend. The frequency of the GPU shouldn't matter if it gets powered off/suspended. I would expect that you are more likely to slow down the suspend/resume procedure which would waste power.

chr-chr · 2023-10-16T19:24:44Z

Hi, thanks for the comments

If you want to be sure that you are actually improving power consumption I would strongly recommend that you try to measure the power consumption with and without your changes. Otherwise you might end up making power consumption worse. :')

True, this is why I had to make brightness control working because it's consuming the most power:
Starting glxgears in fullscreen surprisingly dropped the power consumption ... which reminds me that's an OLED display :)

Actually lowering freq seems to reduce power consumption a bit: 200MHz are ok for 60fps glxgears (synced).
Still there's the option to switch the bus speed, that call is actually a nop because this interconnect framework isn't enabled .... and it's broken :/

I'm still not sure why you need the opp-suspend. The frequency of the GPU shouldn't matter if it gets powered off/suspended.

True, but even beforehand the clock was set without opp-suspend in suspend/resume handlers. So I just kept it that way. And the mistake was that setting the freq on a disabled clock locks up.

Remember, this is a draft, I'm still not 100% happy with that ;)

chr-chr · 2023-10-18T17:59:42Z

Good news!

The busy counter is broken and I found out why and got it fixed!
Pushing soon ... there's still just one issue ;)

Meanwhile my 2 cents on this:

You can run the GPU at maximum frequency and try to finish the work quickly so that you can quickly power off the GPU entirely again.

Thats the way it's implemented.
Why? My assumtions:
No other way for powersaving! Why?
Because there were just 2 steps available: min and max. Why??
Well, I smell they were facing the same lockups and OOpses I got while trying ...

Also the governor on_demand wasn't made for this. To cheat around is ugly.
I'd use the passive instead or:

You can reduce the GPU frequency. This means the GPU will consume less power, but it will also take longer to finish the work.

And I assume this will make a better power saving. My rought measures seems to prove -99mW.
Actually 1) turns /sys/class/power_supply/sm5703_fg/current_now into a great source of entropy.

Signed-off-by: Christoph Rudorff <chris@rudorff.com>

On my device that calc. was always >100% load got this from upstream, and a4xx may be bugged aswell the /sys/kernel/debug/dri/0/perf does work ... Signed-off-by: Christoph Rudorff <chris@rudorff.com>

Signed-off-by: Christoph Rudorff <chris@rudorff.com>

adreno 1c00000.gpu: _opp_config_clk_single: failed to set clock rate: -22 EINVAL

a3xx_set_supported_hw() set opp-supported-hw from fuse value, defaults to 0x1 in case the value wasn't set. Signed-off-by: Christoph Rudorff <chris@rudorff.com>

gfx3d_clk_src rcg didn't update its configuration. set clk_enable before set_rate set "suspend rate" before clk_disable Signed-off-by: Christoph Rudorff <chris@rudorff.com>

Signed-off-by: Christoph Rudorff <chris@rudorff.com>

for fine tuning: /sys/kernel/debug/dri/0/devfreq/ Signed-off-by: Christoph Rudorff <chris@rudorff.com>

chr-chr · 2023-10-21T16:54:00Z

Ok, here is my state of the art, tested and works very well on my device, finally.

So, the impact is greater than I expected.
Looking at history, I'm not the first one making this more crazy. oO Am I?

All in all this reverts msm_gpu_devfreq pretty down to some "more cleverness", according to the logs. Am I?

stephan-gh · 2023-10-21T18:09:06Z

Nice! I'm afraid at this point you would need to convince the upstream freedreno people about this though, since the changes are a bit too extensive to maintain downstream. There are some minor cosmetic things I could point out but aside from that I don't have enough experience with the Adreno GPU drivers to review this properly.

chr-chr · 2023-10-21T18:21:43Z

Nice! I'm afraid at this point you would need to convince the upstream freedreno people about this though

Well, most convincing would be having more positive test reports.

I do see power savings.

Add a test case which replaces an active ingress qdisc while keeping the miniq in-tact during the transition period to the new clsact qdisc. # ./vmtest.sh -- ./test_progs -t tc_link [...] ./test_progs -t tc_link [ 3.412871] bpf_testmod: loading out-of-tree module taints kernel. [ 3.413343] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #332 tc_links_after:OK #333 tc_links_append:OK #334 tc_links_basic:OK #335 tc_links_before:OK #336 tc_links_chain_classic:OK #337 tc_links_chain_mixed:OK #338 tc_links_dev_chain0:OK #339 tc_links_dev_cleanup:OK #340 tc_links_dev_mixed:OK #341 tc_links_ingress:OK #342 tc_links_invalid:OK #343 tc_links_prepend:OK #344 tc_links_replace:OK #345 tc_links_revision:OK Summary: 14/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240708133130.11609-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

chr-chr force-pushed the J5-hacking branch from 5184df5 to c0b8642 Compare October 16, 2023 16:51

chr-chr changed the title ~~SM-J5: gpu fixes~~ a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 Oct 17, 2023

chr-chr force-pushed the J5-hacking branch from c0b8642 to c48aba5 Compare October 17, 2023 17:39

Christoph Rudorff added 8 commits October 21, 2023 18:24

devfreq: show load in sysfs

bbe2ad3

Signed-off-by: Christoph Rudorff <chris@rudorff.com>

a3xx/...: does a3xx_gpu_busy() work?!

28ebf3f

On my device that calc. was always >100% load got this from upstream, and a4xx may be bugged aswell the /sys/kernel/debug/dri/0/perf does work ... Signed-off-by: Christoph Rudorff <chris@rudorff.com>

dts: msm8916 update gpu_opp_table

2bb5663

Signed-off-by: Christoph Rudorff <chris@rudorff.com>

gcc-msm8916: fix clk setup

8c589f7

adreno 1c00000.gpu: _opp_config_clk_single: failed to set clock rate: -22 EINVAL

drm/msm/a3xx: use speed-bin to setup opp

0be406c

a3xx_set_supported_hw() set opp-supported-hw from fuse value, defaults to 0x1 in case the value wasn't set. Signed-off-by: Christoph Rudorff <chris@rudorff.com>

drm/msm/gpu: fix clk lockup PM suspend/resume handler

a40b139

gfx3d_clk_src rcg didn't update its configuration. set clk_enable before set_rate set "suspend rate" before clk_disable Signed-off-by: Christoph Rudorff <chris@rudorff.com>

revert msm_gpu_devfreq to something simple

3afa529

Signed-off-by: Christoph Rudorff <chris@rudorff.com>

msm_gpu_devfreq: report average load

43c0e98

for fine tuning: /sys/kernel/debug/dri/0/devfreq/ Signed-off-by: Christoph Rudorff <chris@rudorff.com>

chr-chr force-pushed the J5-hacking branch from c48aba5 to 43c0e98 Compare October 21, 2023 16:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 #332

a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 #332

chr-chr commented Oct 15, 2023 •

edited

Loading

stephan-gh commented Oct 16, 2023

chr-chr commented Oct 16, 2023

stephan-gh commented Oct 16, 2023

chr-chr commented Oct 16, 2023

chr-chr commented Oct 18, 2023

chr-chr commented Oct 21, 2023 •

edited

Loading

stephan-gh commented Oct 21, 2023

chr-chr commented Oct 21, 2023

a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 #332

Are you sure you want to change the base?

a2xx/a3xx/a4xx: gpu fixes / opp table Samsung J5 #332

Conversation

chr-chr commented Oct 15, 2023 • edited Loading

stephan-gh commented Oct 16, 2023

chr-chr commented Oct 16, 2023

stephan-gh commented Oct 16, 2023

chr-chr commented Oct 16, 2023

chr-chr commented Oct 18, 2023

chr-chr commented Oct 21, 2023 • edited Loading

stephan-gh commented Oct 21, 2023

chr-chr commented Oct 21, 2023

chr-chr commented Oct 15, 2023 •

edited

Loading

chr-chr commented Oct 21, 2023 •

edited

Loading