Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect issue grouping on Windows #940

Closed
supervacuus opened this issue Jan 28, 2024 · 5 comments
Closed

Incorrect issue grouping on Windows #940

supervacuus opened this issue Jan 28, 2024 · 5 comments

Comments

@supervacuus
Copy link
Collaborator

Probably unrelated, but there is a topic https://forum.sentry.io/t/sentry-crashpad-crashing/9007/8, that definitely leads to incorrect issues grouping on our board. Appreciate for any advices how to handle it.

Originally posted by @stima in #933 (comment)

@supervacuus
Copy link
Collaborator Author

I can't share the code, but I will try to investigate myself. Probably unrelated, but there is a topic https://forum.sentry.io/t/sentry-crashpad-crashing/9007/8, that definitely leads to incorrect issues grouping on our board. Appreciate for any advices how to handle it.

I responded in Discord, but for completeness and because it is way easier to find, I'll post it here, too:

Just to be clear: you don't want HandleAbortSignal to appear as the title of your issue, right? Because neither the screenshot nor the forum gives away what should be different.

Crashpad retrieves the context from the SIGABRT handler on Windows (which is not invoked when raising exceptions, by the way, because exceptions are always handled via SEH on Windows), and that means the stack trace will include all stack frames from the signal-handler down to the line that called abort().

Your problem is probably because none of your stack-frames are classified as "in-app". If there are no "in-app" frames, then Sentry will select the top-most of the system frames as crash-cause in the title. You can check which grouping rule affects which frame in your issue when you scroll down to the "Event Grouping Information" and open the section with "Show Details".

If you only have "By Exception Stack-trace" and no "By In-app Exception Stack-trace" contributing variant, then that means that no stack-trace rule sets any in-app frames. Setting up "in-app" frame stack rules in your project is particularly important on Windows because there are no pre-defined module paths set up due to the locale-dependent user directories.

You can find all about the possible options here: https://docs.sentry.io/product/data-management-settings/event-grouping/stack-trace-rules/

In your case, I would start with a simple

stack.abs_path:**/your_project/**.cpp +app

As soon as frames from your translation units are assigned to be in-app, Sentry will automatically select the top-most "in-app" frame for the title.

Be aware that grouping typically uses the right stack frames, even without defining "in-app" in the stack rules, because we exclude all the frames above and below from the grouping, but this is not reflected in the title.

@supervacuus
Copy link
Collaborator Author

supervacuus commented Jan 28, 2024

We should continue our discussion here because responding to specific points on Discord leads me to reach the maximum message length. It's also better to move this here for further processing if the issue turns out to be something we can fix (which will probably be in another repo).

There are quite a few things that we need to unwind before we can make this an actionable issue so please bear with me.

Originally posted by @stima in Discord#cpp

Hey @supervacuus thanks for such detailed response, unfortunetly that does not help.
I set stack.abs_path:**/AbortCrashpad/**.cpp +app and "in app" rules are not contributing variant for grouping.
grafik

Can you also provide a screenshot of "All values" (vs. only "Contributing values") with all the frames expanded? Or could you give us a link (mail to karl.struggl@sentry.io, for instance) to an event so some employees can look at the details of this event? The grouping you see is a bit surprising, but it might be easier to explain with all the frames.

In any case, you are correct that "in-app" rules will only fix the issue in your example when there are frames in the stack trace where that rule can be applied. This would be easier to verify with the whole event (or at least a full stack trace or "All values" from the grouping) so we can compare.

Additionally I would like to say that unhandled exceptions lead to abort, that leads to abort handler execution before it is propagated to SEH (or fastfail)

This is slightly the crux of this topic. Typically, abort() will not be called when a C++ exception is raised in Windows because a C++ throw will be translated to a _CxxThrowException/RaiseException call and, if unhandled, will end up in our (or in this case crashpad's) UnhandledExceptionFilter independent from which thread this exception comes (because the filter is defined process-global, but executed per-thread).

In your case, you throw from a thread function that runs in a noexcept context (because std::thread() is noexcept), and in that case, a program must terminate() (which leads to an abort()) if an exception escapes. This overrules the usual C++ exception handling on Windows and completely bypasses SEH and thus ends up in crashpad's SIGABRT handler, which can only produce a stack trace from the handler down to the cause (rather than access the crash context directly as an SEH filter could).

just for reference, minidump correctly displayed in VS, that as for me, says that exception information is correctly written.
grafik

That is also curious. If I run your code and load the generated *.dmp file into Visual Studio, it will point to the CaptureContext() invocation in HandleAbortSignal(), which is the same as the one we are producing (and also the expected location, except if VS has no access to the debug-symbols of sentry.dll):

minidump pointer in visual studio:
minidump pointer in visual studio

stack trace of the minidump pointer in visual studio:
stack trace of the minidump pointer in visual studio

the stack trace as rendered in sentry.io:
the stack trace as rendered in sentry.io

All values considered before grouping:
All values considered before grouping

The final grouping:
The final grouping

So, I think we are pretty close to what Visual Studio does because the _Invoke and AbortedFunc are the only remaining contributing frames after applying the default rules. We cannot automatically mark these frames as "in-app" because that is not true in the general case, although we might check if it could make sense to apply this. With user-defined stack rules for in-app frames, we could select the proper application frame for the issue title.

I would expect that grouping logic will group by AbortFunction that are stack trace rules
grafik

The grouping (not the title selection) should work without additional user-defined stack rules (as shown above). Let's find out why it diverges.

CC: @Swatinem, @kahest: I wonder how much sentry-native can improve the experience for this use case since it seems to be mostly about grouping rather than the data the client SDK produces.

@stima
Copy link
Contributor

stima commented Jan 29, 2024

Hey thanks for that update. I would like to point that for some reason I do not have same stack in sentry. I sent an email with all requested and available information to karl.struggl@sentry.io. Providing here some screenshots for a reference:

the stack trace as rendered in sentry.io:
image

All values considered before grouping:
image

The final grouping:
image

I would also like to point that, VS during opening a dump (by double clicking) a pointer to a next instruction with a line which causes abort, that I am interested as user:
image

(from my understanding that is from exception information and RIP instruction that is collected by crashpad as return address during abort handler exection).

To summarize, from user perspective it would be more valuable to understand the line that is actually cause abort.

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Jan 29, 2024
@supervacuus
Copy link
Collaborator Author

Hi @stima!

Hey thanks for that update. I would like to point that for some reason I do not have same stack in sentry. I sent an email with all requested and available information to karl.struggl@sentry.io.

Thanks for the additional information. The stack trace shows that you haven't uploaded all the required image artifacts: concretely, sentry.dll and AbortCrashpad.exe are missing artifacts AFAICS. Be sure to upload both *.exe/*.dll and *.pdb.

If artifacts are missing, the issue page will generally warn you both at the top and again under "Images Loaded", where you can check Status/Processing or get more image information in the corresponding details view.

To summarize, from user perspective it would be more valuable to understand the line that is actually cause abort.

I agree, but first, you need a working stack trace; for this, we need the remaining missing artifacts. As a result, the grouping will reduce to the throw site of AbortedFunc(), and if you have set up the stack-rule, then this will be the sole frame highlighted in the stack trace and the title of your issue.

@stima
Copy link
Contributor

stima commented Jan 31, 2024

@supervacuus thanks, really appreciate your help. I need to admit that I need to read documentation more carefully. That issue may be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Archived in project
Development

No branches or pull requests

2 participants