Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default thread heuristic no longer a good choice in Mac OSX Sequoia #2925

Open
1 task done
jchien14 opened this issue Nov 7, 2024 · 6 comments
Open
1 task done

Comments

@jchien14
Copy link

jchien14 commented Nov 7, 2024

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

x86_64 ripgrep 14.1.1

How did you install ripgrep?

Homebrew

What operating system are you using ripgrep on?

macOS 15.1

Describe your bug.

I upgraded from macOS 14.7 to 15.1 last night. Since then, I noticed ripgrep on my repo regressed from ~2s a search to ~10s

rg foo 1.28s user 108.10s system 1104% cpu 9.901 total
rg foo 1.40s user 85.94s system 882% cpu 9.898 total

From empirically testing, it appears that --threads 3 provides the ideal behavior (my CPU has 14 cores)

rg --threads 4 foo  0.78s user 7.55s system 360% cpu 2.307 total
rg --threads 4 foo  0.76s user 7.64s system 392% cpu 2.143 total
rg --threads 4 foo  0.77s user 7.38s system 389% cpu 2.092 total
rg --threads 3 foo  0.76s user 5.48s system 286% cpu 2.179 total
rg --threads 3 foo  0.76s user 5.98s system 290% cpu 2.321 total
rg --threads 3 foo  0.75s user 5.78s system 289% cpu 2.255 total
rg --threads 3 foo  0.78s user 5.89s system 293% cpu 2.270 total

It looks like the current default is 12. Not sure what changed in 15.1 but two other people reproed this as well upon upgrading, one with a laptop with most system extensions uninstalled, so I don't think it's a bad interaction with a piece of third-party software on 15.1.

What are the steps to reproduce the behavior?

Run any ripgrep search without specifying the number of threads on 15.1

What is the actual behavior?

Slow query, 10s in a relatively large repository I tested

What is the expected behavior?

~1-2 seconds on OSX. For reference, ripgrepping this repo on Ubuntu is ~600ms

@BurntSushi
Copy link
Owner

Can you provide a benchmark I can run please?

@BurntSushi
Copy link
Owner

2s -> 10s feels like a mammoth change. Feels like something else is going on here personally.

I doubt the default heuristic is going to change any time soon, so you'll definitely want to make a ripgrep config file or something with your preferred default.

@jchien14
Copy link
Author

jchien14 commented Nov 7, 2024

Not sure what you mean by a benchmark -- unfortunately, I don't have a framework or anything, just a few reports amongst coworkers.

Here's a process sample (16 threads)
rg_2024-11-07_101925_9S3I.sample.txt
I wonder if 15.1 has worse max parallelism at opening and closing file handles?

Currently working around it with a config file to get the previous performance with the config file, but wanted to document this in case anyone else hits it

@BurntSushi
Copy link
Owner

This is your benchmark:

I noticed ripgrep on my repo regressed from ~2s a search to ~10s

But I can't run this. Can you share a benchmark that I can run please? For example, consider trying on an open source repository that we can both access. Like maybe the Linux kernel or Chromium.

The first step in diagnosing performance problems is having a shared reality. Sometimes that's not possible. But if the problem here is truly as general as "more threads is dramatically slower," then it should ideally reproduce on another repo.

@BurntSushi
Copy link
Owner

And yes, thank you for reporting this! I appreciate it.

@jchien14
Copy link
Author

jchien14 commented Nov 8, 2024

Following the instructions here for a fresh checkout of Chromium on Mac: https://chromium.googlesource.com/chromium/src/+/main/docs/mac_build_instructions.md#Install

Results are quite different:

rg --threads 12 testtesttest  4.67s user 263.67s system 893% cpu 30.029 total
rg --threads 11 testtesttest  5.53s user 194.23s system 709% cpu 28.156 total
rg --threads 10 testtesttest  5.32s user 105.07s system 524% cpu 21.043 total
rg --threads 9 testtesttest  5.02s user 70.38s system 404% cpu 18.660 total
rg --threads 8 testtesttest  4.83s user 57.36s system 327% cpu 18.968 total
rg --threads 7 testtesttest  4.71s user 51.42s system 272% cpu 20.628 total
rg --threads 6 testtesttest  4.60s user 47.50s system 230% cpu 22.561 total
rg --threads 5 testtesttest  4.41s user 43.28s system 184% cpu 25.837 total
rg --threads 4 testtesttest  4.58s user 42.68s system 146% cpu 32.208 total
rg --threads 3 testtesttest  5.20s user 46.57s system 117% cpu 43.994 total

Don't have much time until the weekend to do more samples than n=1, and also investigate the differences in file size/# files that might account for it (the chromium repo is much bigger than the one I was testing on).

It is quite interesting that there's appears there's an optimum in terms of both wall-time that's different than the optimum for CPU usage (for the repo I was testing on they were the same). Performance starts going down significantly after the optimum. Unfortunately, I no longer can test on 14.7 to compare.

ddeville added a commit to ddeville/scripts that referenced this issue Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants