Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve diagnostics and stability during BUGCHECK #7270

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ilya071294
Copy link
Contributor

No description provided.

…or is invalid (like it's done in PIO_header) for better diagnostics
…is invalid and I/O functions were not called before
…ugcheck_msg because otherwise a precedence relationship can be ruined for other threads that may still be active
src/jrd/cch.cpp Outdated
@@ -2144,7 +2144,7 @@ void CCH_shutdown(thread_db* tdbb)
bcb_repeat* tail = bcb->bcb_rpt;
const bcb_repeat* const end = tail + bcb->bcb_count;

if (tail && tail->bcb_bdb)
if (tail && tail->bcb_bdb && !bugcheck)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also avoids flush, is it intended ? If yes, is it really good thought out ?

Copy link
Contributor Author

@ilya071294 ilya071294 Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCH_flush wouldn't be called anyway because DBB_bugcheck flag is already set at the moment, and LongJump::raise() is called in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.
In this case - is bugcheck agrument really necessary ?
Does DBB_bugcheck flag might be used instead ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it is not clear what happens with backup lock if clear_dirty_flag_and_nbak_state() not called

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does DBB_bugcheck flag might be used instead ?

I guess no because later CCH_shutdown will be called again in JRD_shutdown_database. This time with bugcheck == false, and clear_dirty_flag_and_nbak_state() will be finally called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, could you explain - what problem with precedence relationship you trying to fix here ?

BDB_db_dirty and BDB_dirty are cleared but pages are not flushed. In concurrent environment other threads may want to write some pages according to the precedence and they may assume that these pages are written but actually they are not.

Did you test it explicitly ?
For example, if backup state remains locked in CS - it could hung all other attachments.

I did and clear_dirty_flag_and_nbak_state() is certainly called during JRD_shutdown_database (both CS and SS). Is there any case where we can get better results from calling clear_dirty_flag_and_nbak_state() earlier? The old code seems good for CS but for SS I don't see how it can be safely called before other attachments/threads are stopped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this made me to look at the issue from another side.

AFAIU, the goal of the immediate call of CCH_shutdown() when database is bug-checked is to stop all database IO ASAP and not allow to write anything. Note, page cache is not flushed and database file is closed. Thus, even in concurrent environment, no other thread should be able to write any page. So, the real problem is - how to effectively disable IO after bugcheck (and don't spam firebird.log with messages about invalid file handle), IMHO.

On Windows, I would try to use jrd_file::fil_ext_lock to block all database IO. Of course, PIO should add check for DBB_bugcheck flag after acquiring jrd_file::fil_ext_lock.
POSIX implementation have no this lock, but I see no problem to add it there.
Probably, PIO_read() should be allowed after bugcheck and database file should not be closed immediately (as we already prohibit any writes).

Looks like this is out of scope of this ticket, btw. But it is always better to fix root issue, IMO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the real problem is - how to effectively disable IO after bugcheck (and don't spam firebird.log with messages about invalid file handle), IMHO.

How about abort()? Core dump will be a bonus.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the real problem is - how to effectively disable IO after bugcheck (and don't spam firebird.log with messages about invalid file handle), IMHO.

How about abort()? Core dump will be a bonus.

abort() is called when BugcheckAbort is set to true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, the goal of the immediate call of CCH_shutdown() when database is bug-checked is to stop all database IO ASAP and not allow to write anything. Note, page cache is not flushed and database file is closed.

  1. I wonder why a database file is closed after releasing the locks. Can we change the order in a case of BUGCHECK and close it before?
  2. If we close a database file without synchronizing with other threads, can we consider it safe? Especially when other threads are executing PIO_write at the moment.

POSIX implementation have no this lock, but I see no problem to add it there.

If we really need this lock then changes from #8146 may help with it.

@AlexPeshkoff
Copy link
Member

AlexPeshkoff commented Aug 28, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants