Improve diagnostics and stability during BUGCHECK #7270
…or is invalid (like it's done in PIO_header) for better diagnostics
…se there is always a risk of crashing
…is invalid and I/O functions were not called before
…ugcheck_msg because otherwise a precedence relationship can be ruined for other threads that may still be active
2d175b4 to d42db21
src/jrd/cch.cpp
@@ -2144,7 +2144,7 @@ void CCH_shutdown(thread_db* tdbb)
     bcb_repeat* tail = bcb->bcb_rpt;
     const bcb_repeat* const end = tail + bcb->bcb_count;

-    if (tail && tail->bcb_bdb)
+    if (tail && tail->bcb_bdb && !bugcheck)
It also avoids the flush; is that intended? If so, is it really well thought out?
CCH_flush wouldn't be called anyway, because the DBB_bugcheck flag is already set at that moment and LongJump::raise() is called in this case.
Ok. In this case, is the bugcheck argument really necessary? Could the DBB_bugcheck flag be used instead?
Also, it is not clear what happens with the backup lock if clear_dirty_flag_and_nbak_state() is not called.
> Could the DBB_bugcheck flag be used instead?
I guess not, because CCH_shutdown will be called again later in JRD_shutdown_database, this time with bugcheck == false, and clear_dirty_flag_and_nbak_state() will finally be called.
> BTW, could you explain what problem with the precedence relationship you are trying to fix here?
BDB_db_dirty and BDB_dirty are cleared but the pages are not flushed. In a concurrent environment other threads may want to write some pages according to the precedence, and they may assume that these pages have been written when actually they have not.
> Did you test it explicitly? For example, if the backup state remains locked in CS, it could hang all other attachments.
I did, and clear_dirty_flag_and_nbak_state() is certainly called during JRD_shutdown_database (both CS and SS). Is there any case where we can get better results from calling clear_dirty_flag_and_nbak_state() earlier? The old code seems good for CS, but for SS I don't see how it can be safely called before other attachments/threads are stopped.
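To make the precedence concern concrete, here is a simplified, single-threaded toy model. ToyPage, writeFirst and writeInOrder are hypothetical names and do not mirror the real BDB or precedence structures; the sketch only assumes that a writer flushes dirty prerequisites first and treats a non-dirty prerequisite as already written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical page model; the dependency graph is assumed to be acyclic.
struct ToyPage
{
    bool dirty = false;                   // in-memory version differs from disk
    bool onDisk = false;                  // current version has reached disk
    std::vector<std::size_t> writeFirst;  // pages that must be written before this one
};

// Writes page `id` after its dirty prerequisites.
// Returns false if a prerequisite was neither dirty nor on disk, i.e. its
// dirty flag was cleared without a flush and the ordering was silently broken.
bool writeInOrder(std::vector<ToyPage>& pages, std::size_t id)
{
    bool ordered = true;
    for (std::size_t dep : pages[id].writeFirst)
    {
        if (pages[dep].dirty)
            ordered = writeInOrder(pages, dep) && ordered;
        else if (!pages[dep].onDisk)
            ordered = false;  // the writer assumes "not dirty" means "written"
    }
    pages[id].dirty = false;
    pages[id].onDisk = true;
    return ordered;
}
```

If page 0 is dirty and page 1 depends on it, writing page 1 flushes page 0 first. If page 0's dirty flag is cleared without a flush, page 1 still reaches disk while page 0 does not, which is the situation described in the reply above.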
Thanks, this made me look at the issue from another side.
AFAIU, the goal of the immediate call of CCH_shutdown() when the database is bug-checked is to stop all database IO ASAP and not allow anything to be written. Note that the page cache is not flushed and the database file is closed. Thus, even in a concurrent environment, no other thread should be able to write any page. So the real problem, IMHO, is how to effectively disable IO after a bugcheck (without spamming firebird.log with messages about an invalid file handle).
On Windows, I would try to use jrd_file::fil_ext_lock to block all database IO. Of course, PIO should add a check for the DBB_bugcheck flag after acquiring jrd_file::fil_ext_lock.
The POSIX implementation has no such lock, but I see no problem with adding it there.
Probably, PIO_read() should be allowed after a bugcheck, and the database file should not be closed immediately (as we already prohibit any writes).
Looks like this is out of scope of this ticket, BTW. But it is always better to fix the root issue, IMO.
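The idea of gating IO on a lock plus a bugcheck flag could look roughly like the following C++17 sketch. ToyFile and its members are hypothetical; the shared mutex merely stands in for something like jrd_file::fil_ext_lock, and nothing here reflects the actual PIO code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical file wrapper: writes are refused once the bugcheck flag is set.
class ToyFile
{
public:
    enum class Result { Done, RefusedAfterBugcheck };

    Result write()
    {
        // Shared mode: ordinary IO requests do not block each other.
        std::shared_lock<std::shared_mutex> guard(extLock);
        if (bugchecked.load())
            return Result::RefusedAfterBugcheck;  // check the flag after taking the lock
        ++writes;
        return Result::Done;
    }

    Result read()
    {
        // Reads stay allowed after a bugcheck, as suggested in the comment.
        std::shared_lock<std::shared_mutex> guard(extLock);
        ++reads;
        return Result::Done;
    }

    void markBugchecked()
    {
        // Exclusive mode: waits until in-flight IO has released the lock.
        std::unique_lock<std::shared_mutex> guard(extLock);
        bugchecked.store(true);
    }

    int writeCount() const { return writes.load(); }
    int readCount() const { return reads.load(); }

private:
    std::shared_mutex extLock;
    std::atomic<bool> bugchecked{false};
    std::atomic<int> writes{0};
    std::atomic<int> reads{0};
};
```

Taking the lock exclusively when setting the flag means no write can be halfway through when the flag flips, so a later close of the file would not race with a writer in this model.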
> So, the real problem is - how to effectively disable IO after bugcheck (and don't spam firebird.log with messages about invalid file handle), IMHO.
How about abort()? A core dump will be a bonus.
> > So, the real problem is - how to effectively disable IO after bugcheck (and don't spam firebird.log with messages about invalid file handle), IMHO.
> How about abort()? Core dump will be a bonus.
abort() is called when BugcheckAbort is set to true.
> AFAIU, the goal of the immediate call of CCH_shutdown() when database is bug-checked is to stop all database IO ASAP and not allow to write anything. Note, page cache is not flushed and database file is closed.
- I wonder why the database file is closed after releasing the locks. Can we change the order in the case of a BUGCHECK and close it before?
- If we close a database file without synchronizing with other threads, can we consider it safe? Especially when other threads are executing PIO_write at that moment.
> POSIX implementation have no this lock, but I see no problem to add it there.
If we really need this lock, then the changes from #8146 may help with it.
On 8/27/24 17:50, Vlad Khorsun wrote:
> So, the real problem is - how to effectively disable IO after bugcheck (and don't spam firebird.log with messages about invalid file handle), IMHO.
Maybe simply do not put the message about the invalid file handle into the log when the DBB is bugchecked?
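As a rough illustration of that suggestion, a filter in front of the logger might look like this. ToyLog and logIoError are hypothetical names and are not part of Firebird's logging API; the sketch only shows dropping the invalid-handle message once the database is flagged as bugchecked.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory log used only for this sketch.
struct ToyLog
{
    std::vector<std::string> lines;
};

// Writes the message unless it is an invalid-handle error on a bugchecked database.
// Returns true if the message was written.
bool logIoError(ToyLog& log, bool dbbBugchecked, bool invalidHandle, const std::string& text)
{
    if (dbbBugchecked && invalidHandle)
        return false;  // expected after a bugcheck, so it is not logged
    log.lines.push_back(text);
    return true;
}
```

Other IO errors, and invalid-handle errors on a healthy database, still reach the log in this model.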