Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Injecting Unwanted Timeout errors is simply unreasonable #867

Closed
duaneking opened this issue Nov 7, 2024 · 7 comments
Closed

Injecting Unwanted Timeout errors is simply unreasonable #867

duaneking opened this issue Nov 7, 2024 · 7 comments

Comments

@duaneking
Copy link

Describe the bug
I'm using CLOC to run reports on the open source chromium code base. My system in question has the fastest SSD we can buy, and I have over 128gb of RAM, so that's not a problem either.

My first scan was:

cloc --report-file=cloc.md --md .

That created errors:

This did not: cloc --timeout 10 --report-file=cloc.md --md .

The fact it created errors when I did not enable timeouts is the core issue. The software should simply not be injecting faults unless its asked to. I did not ask it to.

I did not use the --timeout param because I did not know I needed it; only to have a long running scan destroyed because the injected timeout is dumb and uses an algorithm seemingly designed to make builds fail for no reason. I don't expect any kind of algorithm for timeouts to be enabled unless I enable it explicitly.

  1. The --timeout stuff is poorly documented and enforces additional constraints that a reasonable developer would not expect to exist. This feature should not exist unless its turned on, as injecting timeout errors I never expected to have happen on a dedicated build machine with the most modern hardware possible should not be happening... unless I turn it on. I did not turn it on.

  2. A fast scan of the help output should have given me the knowledge I needed to mitigate having to run a full scan of the chromium code base over several hours only to get errors back as part of a fully wasted report that was full of time; Instead it acts like it's not a big deal and directs me to a different param I'm not using. The docs for that parameter are even more confusing, And while they definitely seem related - But very separately worded- the cold hard fact is this is a different type of algorithm that should also not be running unless I configure it so enabling it and then expecting me to turn it off using poorly explained flags is simply unreasonable.

  3. The default timeout, based in my reading of the docs, should be 0 / zero to turn this off, with the user allowed to override as they see fit to enable the features; As it stands right now this feature is turned on by default, and the default value of what looks like 10 isn't even being respected, and that's not documented and that's creating wasted time and wasted money. Yes, my use of the 10 second value was intentional, I did that to show that the documented default timeout is not being respected, either. The feature is clearly broken.

  4. Number 3 above is also the most in line with software engineering excellence best practices. Don't make my build fail for no reason other than because you think I have too many new lines in the file. That's dumb, And the constant issues that this feature seems to have based on my little bit of research - for JSON files and other things especially - kind of proves my point.

cloc; OS; OS version

Most recent windows/ WinGetEXE from download page downloaded today on a Win11Pro Windows Build machine.

> cloc --version
2.02

To Reproduce

I suspect this will reproduce in any large code base, but I was using the Chromium Codebase. Download the open source chromium code base, fetch, sync with gclient, etc. to get that code.

Refer to the Chromium Docs: https://chromium.googlesource.com/chromium/src/+/HEAD/docs/windows_build_instructions.md

Once you have the code, cd to that directory and run:

cloc --report-file=cloc.md --md .
  401403 text files.
  333417 unique files.
  114995 files ignored.

5 errors:
Line count, exceeded timeout:  ./chrome/browser/file_select_helper_unittest.cc
Line count, exceeded timeout:  ./chrome/updater/constants.h
Line count, exceeded timeout:  ./third_party/chromevox/third_party/sre/sre_browser.js
Line count, exceeded timeout:  ./third_party/webxr_test_pages/webxr-samples/js/webxr-polyfill.js
Line count, exceeded timeout:  ./third_party/webxr_test_pages/webxr-samples/js/webxr-polyfill.module.js
Wrote cloc.md

These errors simply should not be happening. This is a defect. I never asked for these errors to be injected. My expectation is that it would read the file no matter how big it was leverage the 128 gigs of RAM in the system to do what it needed to do and then give me the report. I did not ask it to respect any kind of timeout.

Expected result

I expected the following:

cloc --report-file=cloc.md --md .
  401403 text files.
  333417 unique files.
  114995 files ignored.
Wrote cloc.md

Instead the tool added extra errors that don't really exist, changed the output from what I was expecting, broke a build system, wasted hours of my time, changed the report output, and honestly just created a lot of problems that shouldn't exist on sane systems with same defaults that follow best engineering practices.

Additional context

If I had used the timeout parameters in any way, then yeah let's enable that feature and make sure it works correctly... but because I wasn't using any of the timeout parameters in any way, shape, or form, that should not impact my scan at all and it is a DEFECT for those features to impact my scan if I do not turn them on.

@AlDanial
Copy link
Owner

AlDanial commented Nov 9, 2024

Lots going on here. To start, built-in timeouts were added to cloc long ago to protect users from runaway regular expressions from effectively causing cloc to freeze. Here's a simple example from the chromium code base: first run cloc on just the file chromium/chrome/updater/tag_unittest.cc . Takes a tiny fraction of a second. Now edit the file and insert as the first line char y[] = "*/a/*"; and rerun cloc against the file. Takes 200x longer.

The point is that some combinations of characters drastically alter the count time. No user is expected to know that, though,hence the protective measure of the timeouts.

When you said "having to run a full scan of the chromium code base over several hours" are you saying your cloc run of the chromium code base took several hours? On my Dell Alienware R16 running Ubuntu 24.04, cloc takes less than three minutes to run (and comes up with similar file counts as you show above). It's harder for me to test on Windows but I'll try that later.

If you have wording suggestions on improving the documentation I'll certainly consider updating it.

@loreb
Copy link

loreb commented Nov 9, 2024

Just my 2 cents... rather than rewording I would suggest spelling out the suggested workaround, ie if you detect any timeouts add a recommendation at the end along the lines of "timeouts are needed to protect from pathological cases, see bugs 536 & 867; consider clock --timeout 123... or --timeout -1 to disable timeouts altogether".

@AlDanial
Copy link
Owner

@loreb : I like your idea and will implement that.

Regarding cloc on Windows: I'm running on a 2015 Dell XPS 13 laptop, Windows 11 Home, and the cloc 2.02 exe. It is painfully slow on the chromium code base. I can see how this will take hours. Hard to say if the old hardware, the Windows OS, or cloc is to blame.

@duaneking
Copy link
Author

Lots going on here. To start, built-in timeouts were added to cloc long ago to protect users from runaway regular expressions

Then with all due respect, I would consider the runaway usage of regular expressions to be a problem as well. Well written Regular Expressions should not get "stuck", so that would be a legitimate defect, correct? This injection of faults to cover up and event after a regular expression that might have a defect in it breaks and gets stuck, sounds like a legitimate issue that should also be considered to me.

Now edit the file and insert as the first line char y[] = "*/a/*"; and rerun cloc against the file. Takes 200x longer.

Thank you for showing that this is so brittle. That's an excellent test case, I vote it be part of unit tests if its not already.

The point is that some combinations of characters drastically alter the count time. No user is expected to know that, though,hence the protective measure of the timeouts.

With respect, That is in itself a problem, but I also think perf is secondary to functionality in terms of priority. The timeouts are not perceived by myself or others I have spoken to as protective or mitigating problems, they are perceived as unreasonable injection of artificial errors to mitigate the side effects of other errors, and worse per your comments above, they also hide defects under it and make the application more brittle. I respectfully understand that that may not be your perspective, but keep in mind as a user without your perspective as a core developer for cloc, that none of this is documented and even the help output requires as part of Unix that only parameters you use impact your output. This is fundamental in unix, which you seem to want to respect, so I'm calling it out.

When you said "having to run a full scan of the chromium code base over several hours" are you saying your cloc run of the chromium code base took several hours? On my Dell Alienware R16 running Ubuntu 24.04, cloc takes less than three minutes to run (and comes up with similar file counts as you show above). It's harder for me to test on Windows but I'll try that later.

I can only work with the hardware I have. As an act of good faith:

  • Having only 128gb of RAM should be sufficient for CLOC, right? I can also confirm I have used it on systems with less, on smaller code bases, with no issue. This includes systems with 2gb, 4gb, 8gb, 16gb, 32gb, and 64gb of RAM, So I suspect this is not my issue.

The drive in question is on Windows 11 Pro and is a NVMe PC811 SK hynix 1024GB right from Dell. I don't think this is my issue.

For clarity: I was also running types of multiple scans. WSL or Linux is not my current goal, and is never used as this pipeline is Windows 11 Pro Native. That could be the issue, I do not know.

If you have wording suggestions on improving the documentation I'll certainly consider updating it.

The default is documented is 10 seconds but if I use 10 seconds as the argument for the parameter I get different output.

So what does it actually do?

@AlDanial
Copy link
Owner

The default is documented is 10 seconds

You're misreading the documentation. --diff-timeout defaults to 10 seconds. --timeout has no hardcoded default. Instead, timeout durations are computed on a per-file basis (basically the number of lines in the file divided by a constant).

Then with all due respect, I would consider the runaway usage of regular expressions to be a problem as well.

I'm sure you're right. Most of cloc's regular expression heavy lifting is done by Regexp::Common::Comment. Redoing that is beyond me.

Since you work with large code bases you'll be better off with a faster counter. scc offers many advantages over cloc: much faster, no timeouts, true code parsing rather than using regular expressions. I'm sure it will serve you better than cloc.

AlDanial added a commit that referenced this issue Nov 18, 2024
extend timeout duration for files with extremely long lines
@duaneking
Copy link
Author

The default is documented is 10 seconds

You're misreading the documentation. --diff-timeout defaults to 10 seconds. --timeout has no hardcoded default. Instead, timeout durations are computed on a per-file basis (basically the number of lines in the file divided by a constant).

The algorithm used to define that ttl should only happen if the user asks for it. If I don't enable it, it should not be turned on.

Having this happen without being turned on manually physically hurts.

Respectfully, Having it turned on without being enabled violates the UNIX Koans as well.

Then with all due respect, I would consider the runaway usage of regular expressions to be a problem as well.

I'm sure you're right. Most of cloc's regular expression heavy lifting is done by Regexp::Common::Comment. Redoing that is beyond me.

Should this be called out as a defect in that Dependency if the interface isn't allowing you to error out on a "stuck" regex? You would think there would be an exception or something thrown, or some kind of error result that you can then use.

Since you work with large code bases you'll be better off with a faster counter. scc offers many advantages over cloc: much faster, no timeouts, true code parsing rather than using regular expressions. I'm sure it will serve you better than cloc.

Thank you for the recommendation; But I was hoping that good, simple cloc would work without choosing to fail randomly based on the size of a file. Thats the problem here. It should not be failing randomly without being asked to,

I get that the size of a file can be an issue if your throwing it into RAM and holding it into memory to act on it as part of a O(N) scan of a directory tree.. but like I said, I have 128gb of RAM, and any delays are simply going to be due to running 30+ cores at full throttle.

Like I said, the default should be that the fault injection be turned off, respectfully.

@AlDanial
Copy link
Owner

The algorithm used to define that ttl should only happen if the user asks for it. If I don't enable it, it should not be turned on.

I disagree and will therefore close this ticket without additional work beyond the new end-of-run comment suggested by @loreb and implemented with fd8d789

Consider the flip side: without the timeouts I'll get wrathful issues from users whose CI/CD pipelines froze, or took too long, or racked up unacceptable AWS fees because cloc got tripped up by a regex. That is a worse outcome than timeout errors.

In your shoes I'd ditch cloc for a more suitable tool, one that satisfies on grounds of software engineering best practices and Unix koans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants