High System Load with Low CPU Utilization on Linux? | Tanel Poder Consulting #14

Open
utterances-bot opened this issue Dec 3, 2020 · 13 comments

Comments

@utterances-bot

High System Load with Low CPU Utilization on Linux? | Tanel Poder Consulting

https://tanelpoder.com/posts/high-system-load-low-cpu-utilization-on-linux/

Copy link

Thanks for your awesome post!
Could you let me translate it into Chinese? Thanks anyway.

Owner

Yes, please do - and link to my post as the source. Please send me your translation's link too, so I can add it here! This is great, thanks.


chestack commented Aug 4, 2021

@wanghenshui, have you finished the Chinese translation? Looking forward to it.

Copy link

That is so funny! Today I investigated a slow fsync() and used ftrace to find that xfs_log_force_lsn was the culprit!

Copy link

Lately I've been looking into a high load issue (high sys CPU with many blocked processes too, mainly sshd) that seems to implicate xlog_grant_head_wait(). I'm still working to explain it further, but it seems somewhat related.

@tanelpoder
Owner

What OS kernel version? This seems familiar from the past (I think I just read about it and didn't see it myself), and there are some known bugs/race conditions causing "retries" around it.


raydenz commented May 4, 2022

Hello,

Thank you.
In my case the load average is 60.11, 59.19, 58.73,
but when I run sudo ps -eo s,user,cmd | grep ^[RD] | sort | uniq -c | sort -nbr
I find only 10 threads :-(


5 R oracle
3 R root
2 D oracle


I ran it multiple times and always get ~10 threads.
My database server has a load average of ~60 all day according to my monitoring tool (Zabbix).

I don't know why I can't list all 60 threads.

regards

Owner

Let's drill down a little. To rule out a scenario where the monitoring tool is doing something wrong, let's use standard Linux commands and also go directly to the source of the system load average info. Please post the output of this:

w
cat /proc/loadavg 
ps -eo s,user | sort | uniq -c | sort -nbr
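
For anyone following along, here is a minimal sketch of what the fields in /proc/loadavg mean, based on the proc(5) man page. The function name describe_loadavg is made up for illustration. Note that the fourth field only counts currently runnable scheduling entities, so tasks in uninterruptible (D) state contribute to the load averages but not to that counter.

```sh
#!/bin/sh
# Describe one /proc/loadavg-formatted line read from stdin.
# describe_loadavg is a hypothetical helper name, not part of any tool.
describe_loadavg() {
    read -r load1 load5 load15 tasks last_pid
    runnable=${tasks%/*}   # part before the slash
    total=${tasks#*/}      # part after the slash
    echo "load averages (1/5/15 min): $load1 $load5 $load15"
    echo "runnable scheduling entities right now: $runnable of $total"
    echo "most recently created PID: $last_pid"
}

# On a live Linux system, read the real file.
if [ -r /proc/loadavg ]; then
    describe_loadavg < /proc/loadavg
fi
```

Comparing the three averages with the instantaneous runnable count shows quickly whether the load is made up mostly of runnable tasks or of tasks waiting in D state.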

@raydenz

raydenz commented May 9, 2022

Hello,
Sorry for the late reply.

dosiadm@xxxx# w
09:18:41 up 236 days, 18:42, 1 user, load average: 34.59, 36.01, 37.98
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
dosiadm pts/1 snt1lpstd06-s.sn 09:18 0.00s 0.08s 0.06s w

dosiadm@xxxx# cat /proc/loadavg
34.56 35.72 37.75 9/1668 36329

dosiadm@xxxx# # ps -eo s,user | sort | uniq -c | sort -nbr
1075 S root
201 S oracle
14 D oracle
10 R oracle
5 S zabbix
5 S dosiadm
3 S ibox
2 S postfix
1 S sysload
1 S rpcuser
1 S rpc
1 S ntp
1 S nagios
1 S guardium
1 S dbus
1 S USER
1 R dosiadm

dosiadm@xxx# ps -eo s,user,cmd | grep ^[RD] | sort | uniq -c | sort -nbr
17 D oracle oraclePSBL10R0 (LOCAL=NO)
4 R oracle oraclePSBL10R0 (LOCAL=NO)
1 R dosiadm ps -eo s,user,cmd
1 D oracle ora_mmon_PSBL10R0

regards
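
A single ps run is a point-in-time snapshot, while the load average is smoothed over time and counts every task (thread) in R or D state. Below is a hedged sketch of one way to compare the two: take a few snapshots and include threads. The -L option (list threads) and the count_rd helper are additions for illustration, not commands from the post, and -L assumes the procps version of ps.

```sh
#!/bin/sh
# Count lines whose first column is R or D in ps-style output on stdin.
# count_rd is a hypothetical helper name used only for this sketch.
count_rd() {
    awk '$1 == "R" || $1 == "D" { n++ } END { print n + 0 }'
}

# Take three snapshots one second apart. -L lists every thread rather
# than one line per process; the header line is ignored by count_rd
# because its first column is "S", not "R" or "D".
if command -v ps >/dev/null 2>&1; then
    i=0
    while [ "$i" -lt 3 ]; do
        ps -eLo s,user | count_rd
        sleep 1
        i=$((i + 1))
    done
fi
```

If the per-snapshot counts stay well below the reported load average over a longer sampling period, that would suggest short-lived bursts between samples rather than a constant number of active threads.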

Copy link

Hi Tanel,
We've hit a strange issue: the MySQL server load climbs every two hours, but CPU/memory/IO usage is normal and QPS/TPS does not change much.

The following is the stack info. Could you please take a look?

Thanks

samples | avg_threads | comm | state | syscall | wchan | kstack

   8 |        0.20 | (mysqld)      | Running (ON CPU)       | [running]       | 0                     | -                                                                                                                                                                             
   7 |        0.17 | (mysqld)      | Disk (Uninterruptible) | pwrite64        | do_blockdev_direct_IO | system_call_fastpath()->SyS_pwrite64()->vfs_write()->do_sync_write()->xfs_file_aio_write()->xfs_file_dio_aio_write()->__blockdev_direct_IO()->do_blockdev_direct_IO()         
   7 |        0.17 | (ossutil*)    | Running (ON CPU)       | futex           | futex_wait_queue_me   | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          
   5 |        0.12 | (ksoftirqd/*) | Running (ON CPU)       | [running]       | 0                     | ret_from_fork_nospec_end()->kthread()->smpboot_thread_fn()                                                                                                                    
   5 |        0.12 | (mysqld)      | Disk (Uninterruptible) | fsync           | xfs_log_force_lsn     | system_call_fastpath()->SyS_fsync()->do_fsync()->xfs_file_fsync()->xfs_log_force_lsn()                                                                                        
   5 |        0.12 | (mysqld)      | Running (ON CPU)       | futex           | futex_wait_queue_me   | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          
   4 |        0.10 | (mysqld)      | Running (ON CPU)       | [running]       | 0                     | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          
   2 |        0.05 | (mysqld)      | Disk (Uninterruptible) | fsync           | blkdev_issue_flush    | system_call_fastpath()->SyS_fsync()->do_fsync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                             
   1 |        0.02 | (ksoftirqd/*) | Running (ON CPU)       | [kernel_thread] | smpboot_thread_fn     | ret_from_fork_nospec_end()->kthread()->smpboot_thread_fn()                                                                                                                    
   1 |        0.02 | (kswapd*)     | Running (ON CPU)       | [running]       | 0                     | -                                                                                                                                                                             
   1 |        0.02 | (mysqld)      | Disk (Uninterruptible) | fsync           | 0                     | system_call_fastpath()->SyS_fsync()->do_fsync()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()                                                             
   1 |        0.02 | (mysqld)      | Disk (Uninterruptible) | fsync           | wait_on_page_bit      | system_call_fastpath()->SyS_fsync()->do_fsync()->xfs_file_fsync()->filemap_write_and_wait_range()->filemap_fdatawait_range()->__filemap_fdatawait_range()->wait_on_page_bit() 
   1 |        0.02 | (mysqld)      | Disk (Uninterruptible) | futex           | 0                     | system_call_fastpath()->SyS_pwrite64()->vfs_write()->do_sync_write()->xfs_file_aio_write()->xfs_file_dio_aio_write()->__blockdev_direct_IO()->do_blockdev_direct_IO()         
   1 |        0.02 | (mysqld)      | Disk (Uninterruptible) | pread64         | do_blockdev_direct_IO | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()              
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | [running]       | 0utex_wait_queue_me   | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | [running]       | hrtimer_nanosleep     | system_call_fastpath()->SyS_nanosleep()->hrtimer_nanosleep()                                                                                                                  
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | futex           | 0                     | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | futex           | futex_wait_queue_me   | do_futex()->futex_wait()->futex_wait_queue_me()                                                                                                                               
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | futex           | futex_wait_queue_me   | system_call_fastpath()->SyS_pread64()->vfs_read()->do_sync_read()->xfs_file_aio_read()->xfs_file_dio_aio_read()->__blockdev_direct_IO()->do_blockdev_direct_IO()              
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | io_getevents    | read_events           | system_call_fastpath()->SyS_io_getevents()->read_events()                                                                                                                     
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | io_submit       | 0                     | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | nanosleep       | 0                     | -                                                                                                                                                                             
   1 |        0.02 | (mysqld)      | Running (ON CPU)       | pread64         | 0                     | system_call_fastpath()->SyS_futex()->do_futex()->futex_wait()->futex_wait_queue_me()                                                                                          

@tanelpoder
Owner

How high did the load go? Adding up these "average active threads" numbers (and probably a long tail of other output lines) doesn't seem to add up to a high load (like 50+ or hundreds).
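
The addition can be sketched with a small awk filter over the pipe-separated output. This is only an illustration: sum_avg_threads is a made-up helper name, and it assumes the column order of the table posted above (avg_threads second, state fourth). Applied to the 23 rows posted, the avg_threads values add up to roughly 1.35.

```sh
#!/bin/sh
# Sum the avg_threads column of pipe-separated psn-style output on stdin.
# Only rows in "Running" or "Disk (Uninterruptible)" state are counted,
# since R and D are the states that feed the Linux load average.
# sum_avg_threads is a hypothetical helper name used only for this sketch.
sum_avg_threads() {
    awk -F'|' '
        $2 ~ /^ *[0-9.]+ *$/ && ($4 ~ /Running/ || $4 ~ /Disk/) { total += $2 }
        END { printf "%.2f\n", total }
    '
}

# Excerpt of the first three rows from the table (kstack shortened to "-").
sum_avg_threads <<'EOF'
samples | avg_threads | comm | state | syscall | wchan | kstack
   8 |        0.20 | (mysqld)   | Running (ON CPU)       | [running] | 0                     | -
   7 |        0.17 | (mysqld)   | Disk (Uninterruptible) | pwrite64  | do_blockdev_direct_IO | -
   7 |        0.17 | (ossutil*) | Running (ON CPU)       | futex     | futex_wait_queue_me   | -
EOF
# prints 0.54
```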

Copy link

Hi Tanel,
On the MySQL master server the load goes from 2 to 10, and on the slave it goes from 1 to 5. We are wondering why the load climbs every two hours; we don't have any cron jobs at all.

Copy link

Awesome read! I stumbled upon this article while researching a similar problem and it helped me drill down, although I haven't solved my problem yet.
I have two identical servers, one with Ubuntu 16 and the other with Ubuntu 20. Under a similar fio load, which emulates the kind of workload they handle, the Ubuntu 20 server performs about 2-3x worse. I'm still trying to figure out what is going on with it.
