Fix error when changing force write mode #8146

TreeHunter9 · 2024-06-03T13:22:29Z

When we try to change force write mode on database that has connections, we can get various types of errors or asserts from PIO_ functions (Super Server mode only). These errors can be reproduced using TPCC load and changing force write mode.
This patch fixes cases where someone tries to change force write mode when database has connections. This is done via RWLock for file descriptor, so we can be sure that some other thread won't close this descriptor while we're using it.
For Classic Server we can change force write mode only with no connections.

…thread won't close it while we're using it. For Classic Server you can change forced write mode only with no connections.

hvlad · 2024-06-06T08:23:09Z

For Classic Server we can change force write mode only with no connections.

Why ?

hvlad · 2024-06-06T08:29:30Z

src/include/firebird/impl/msg/jrd.h

@@ -968,3 +968,4 @@ FB_IMPL_MSG(JRD, 965, ods_upgrade_err, -901, "HY", "000", "ODS upgrade failed wh
 FB_IMPL_MSG(JRD, 966, bad_par_workers, -924, "HY", "000", "Wrong parallel workers value @1, valid range are from 1 to @2")
 FB_IMPL_MSG(JRD, 967, idx_expr_not_found, -902, "42", "000", "Definition of index expression is not found for index @1")
 FB_IMPL_MSG(JRD, 968, idx_cond_not_found, -902, "42", "000", "Definition of index condition is not found for index @1")
+FB_IMPL_MSG(JRD, 969, forced_write_change_err, -902, "42", "000", "Cannot change forced write mode for classic server while it has connections")


This error code is already used in master

hvlad · 2024-06-06T08:42:21Z

src/jrd/os/posix/unix.cpp

@@ -751,13 +764,13 @@ bool PIO_read(thread_db* tdbb, jrd_file* file, BufferDesc* bdb, Ods::pag* page,
 	SINT64 bytes;
 	FB_UINT64 offset;

+	EngineCheckout cout(tdbb, FB_FUNCTION, EngineCheckout::UNNECESSARY);
+	ReadLockGuard readGuard(file->fil_desc_lock, FB_FUNCTION);
+
 	if (file->fil_desc == -1)
 		return unix_error("read", file, isc_io_read_err, status_vector);

 	Database* const dbb = tdbb->getDatabase();


Shouldn't it be before EngineCheckout ? At least from logical POV

hvlad · 2024-06-06T08:43:06Z

src/jrd/os/posix/unix.cpp

@@ -803,13 +816,13 @@ bool PIO_write(thread_db* tdbb, jrd_file* file, BufferDesc* bdb, Ods::pag* page,
 	SINT64 bytes;
 	FB_UINT64 offset;

+	EngineCheckout cout(tdbb, FB_FUNCTION, EngineCheckout::UNNECESSARY);
+	ReadLockGuard readGuard(file->fil_desc_lock, FB_FUNCTION);
+
 	if (file->fil_desc == -1)
 		return unix_error("write", file, isc_io_write_err, status_vector);

 	Database* const dbb = tdbb->getDatabase();


Same question as in PIO_read above

hvlad · 2024-06-06T08:46:52Z

src/jrd/os/posix/unix.cpp

@@ -858,6 +871,8 @@ static jrd_file* seek_file(jrd_file* file, BufferDesc* bdb, FB_UINT64* offset,
 			break;
 	}

+	ReadLockGuard readGuard(file->fil_desc_lock, FB_FUNCTION);
+


Looks as not necessary when seek_file() is called from PIO_read() and PIO_write() - the most often cases.
The exception is when caller is PIO_init_data(), so there should be some smart solution.

TreeHunter9 · 2024-06-07T12:19:53Z

For Classic Server we can change force write mode only with no connections.

Why ?

I tried to implement this via AST, looking at LCK_monitor as an example, but I couldn't figure it out where and when I should relock my lock after LCK_downgrade in AST. So I stop on restriction for classic server.
Maybe you can give me some advice on this issue?

hvlad · 2024-06-07T12:24:54Z

For Classic Server we can change force write mode only with no connections.

Why ?

I tried to implement this via AST, looking at LCK_monitor as an example, but I couldn't figure it out where and when I should relock my lock after LCK_downgrade in AST. So I stop on restriction for classic server. Maybe you can give me some advice on this issue?

My question was - why different CS processes must run with the same FW mode ?
I.e. why force same mode here ? I highly doubt it can produce

various types of errors or asserts from PIO_ functions

TreeHunter9 · 2024-06-07T12:59:57Z

why different CS processes must run with the same FW mode ?

I think because it's more expected, if someone changes FW mode, they will expect all connections to change FW mode too. This works with Super, but not with Classic.
But this is how I see it, currently if we change FW mode on Classic, it will not change for already connected users, maybe I should keep this behavior.

I.e. why force same mode here ? I highly doubt it can produce

various types of errors or asserts from PIO_ functions

Yes, these errors can only be reproduced on Super, I should have mentioned that in the PR description, added it now.

hvlad · 2024-06-07T13:22:21Z

The more I think about it the more I consider that solution is not fully correct and excessive, sorry.

There is no (direct) harm when different CS processes use different FW mode. It is not perfect, I agree, but we lived with it many years with no complains.

As for SS, where the real problem happens, I think it is not good to re-open database file with a (lot of) user connections - it could lead to a significant performance loss for a visible time, as OS flushes the file buffers when file is closing and it could take a time. Note, all user connections the require disk IO will wait for this process to finish.

So, instead, I offer to save new FW flag value to the header page (all architectures) and re-open file only if there is no other connections (required for SS only, but could be discussed). This is similar to how change of page buffers number works, btw.

aafemt · 2024-06-07T13:37:54Z

So, instead, I offer to save new FW flag value to the header page (all architectures) and re-open file only if there is no other connections (required for SS only, but could be discussed).

I would expect that while file open mode changes are delayed in this case for SS, changes in page cache flush strategy would be immediate. Perhaps file re-opening also may be immediate if the change is done in sole connection.

hvlad · 2024-06-07T14:35:09Z

So, instead, I offer to save new FW flag value to the header page (all architectures) and re-open file only if there is no other connections (required for SS only, but could be discussed).

I would expect that while file open mode changes are delayed in this case for SS, changes in page cache flush strategy would be immediate.

Could you show what you means by 'flush strategy', how it works now and how you would expect it to work ?

Perhaps file re-opening also may be immediate if the change is done in sole connection.

This is what I said above.

aafemt · 2024-06-07T14:46:50Z

Could you show what you means by 'flush strategy', how it works now and how you would expect it to work ?

I expect that when I set FW OFF, dirty pages won't be flushed immediately on commit for all connections. Because that's why I do it.

aafemt · 2024-06-07T15:01:22Z

My last comment is for SS. CS can do whatever it wants.

hvlad · 2024-06-07T15:15:15Z

Could you show what you means by 'flush strategy', how it works now and how you would expect it to work ?

I expect that when I set FW OFF, dirty pages won't be flushed immediately on commit for all connections. Because that's why I do it.

When engine works with FW=ON (FIL_force_write flag is set) is doesn't call PIO_flush(), because it makes no sence - OS does it for us on every write().

When engine works with FW=OFF (FIL_force_write flag is not set) is does call PIO_flush() (as often as it allowed by MaxUnflushedWrites and MaxUnflushedWriteTime settings) - this is necessary, as OS doesn't do it.

Now, imagine database with FW=ON and admin changed FW to OFF. File mode is not changed and OS still writes every page to disk immediately. You offer to clear FIL_force_write flag and make engine to call PIO_flush() in addition to the OS immediate writes. In best case it will change nothing, in worst case - adds performance penalty. In any case your expectation about perf boost is wrong.

In opposite case, when FW is changed from OFF to ON, engine will never call PIO_flush() and OS also will not write data to disk immediately. This is very bad for on-disk consistency, obviously.

aafemt · 2024-06-07T15:19:00Z

as often as it allowed by MaxUnflushedWrites and MaxUnflushedWriteTime settings

Which by default are -1, i.e. "never".

hvlad · 2024-06-07T15:44:56Z

as often as it allowed by MaxUnflushedWrites and MaxUnflushedWriteTime settings

Which by default are -1, i.e. "never".

Not on Windows.
Either way, that doesn't make your expectation/proposal correct.

ilya071294 · 2024-06-10T08:14:46Z

As for SS, where the real problem happens, I think it is not good to re-open database file with a (lot of) user connections - it could lead to a significant performance loss for a visible time, as OS flushes the file buffers when file is closing and it could take a time. Note, all user connections the require disk IO will wait for this process to finish.

I guess the main point was when someone changes the FW mode, he usually wants to get the effect immediately. And there are 2 ways to do this: either force all user connections to close or reopen the file with relatively less time and performance penalty. The second one looks better from this point. Also, if we allow the actual FW mode change to be delayed then we cannot trust its status showed by gstat.

This is similar to how change of page buffers number works, btw.

Similarity is good but changing the FW mode is more critical, I believe.

dyemanov · 2024-06-10T08:36:11Z

What I don't like in the "delayed option" is it being unpredictable. DBA turns FW on and expects that crashes will not lead to corruptions after that. The same for CS - if some connections will still be using FW=OFF after that and crash and corrupt the database this is not what DBA expects after changing FW to ON. But they have no idea when the change will be actually applied. So to ensure that they need also to shutdown the database to force everybody to reconnect. This is hardly pleasant under user load. This is the difference between BUFFERS and FW -- one is just about performance but the other is about reliability which is more important, IMO.

hvlad · 2024-06-10T08:47:00Z

What I don't like in the "delayed option" is it being unpredictable. DBA turns FW on and expects that crashes will not lead to corruptions after that. The same for CS - if some connections will still be using FW=OFF after that and crash and corrupt the database this is not what DBA expects after changing FW to ON.

Same for SC.

But they have no idea when the change will be actually applied.

Document it. Issue warning.

So to ensure that they need also to shutdown the database to force everybody to reconnect. This is hardly pleasant under user load.

Explain, please - why reasonable DBA could think about changing FW under real load ever ?
What is real necessity for this and how often it happens ?

PS this is again flame about nothing. The original intent was to fix errors on SS when FW is changing under load.
But it was not about making any uneducated "want-to-be-dba" happy.

TreeHunter9 · 2024-06-14T06:56:37Z

So, we stop at the realization proposed by @hvlad?

romansimakov · 2024-06-14T16:04:40Z

So to ensure that they need also to shutdown the database to force everybody to reconnect. This is hardly pleasant under user load.

Explain, please - why reasonable DBA could think about changing FW under real load ever ? What is real necessity for this and how often it happens ?

Why to change FW at all? Loading, indexing, or even just because someone wants. At least untill we don't require EX lock on DB for this.
So it should either predictbaly work or not.
Moreover, an ability to change FW on the fly looks not so bad option for me, and might be useful for some heavy regilar operations, in combination with nbackup as "global transaction".

Create RWLock for file descriptor, so we can be sure that some other …

e94879e

…thread won't close it while we're using it. For Classic Server you can change forced write mode only with no connections.

hvlad reviewed Jun 6, 2024

View reviewed changes

ilya071294 mentioned this pull request Sep 5, 2024

Improve diagnostics and stability during BUGCHECK #7270

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix error when changing force write mode #8146

Fix error when changing force write mode #8146

TreeHunter9 commented Jun 3, 2024 •

edited

Loading

hvlad commented Jun 6, 2024

hvlad Jun 6, 2024

hvlad Jun 6, 2024

hvlad Jun 6, 2024

hvlad Jun 6, 2024

TreeHunter9 commented Jun 7, 2024

hvlad commented Jun 7, 2024

TreeHunter9 commented Jun 7, 2024 •

edited

Loading

hvlad commented Jun 7, 2024

aafemt commented Jun 7, 2024

hvlad commented Jun 7, 2024

aafemt commented Jun 7, 2024

aafemt commented Jun 7, 2024

hvlad commented Jun 7, 2024

aafemt commented Jun 7, 2024

hvlad commented Jun 7, 2024

ilya071294 commented Jun 10, 2024

dyemanov commented Jun 10, 2024

hvlad commented Jun 10, 2024

TreeHunter9 commented Jun 14, 2024

romansimakov commented Jun 14, 2024

Fix error when changing force write mode #8146

Are you sure you want to change the base?

Fix error when changing force write mode #8146

Conversation

TreeHunter9 commented Jun 3, 2024 • edited Loading

hvlad commented Jun 6, 2024

hvlad Jun 6, 2024

Choose a reason for hiding this comment

hvlad Jun 6, 2024

Choose a reason for hiding this comment

hvlad Jun 6, 2024

Choose a reason for hiding this comment

hvlad Jun 6, 2024

Choose a reason for hiding this comment

TreeHunter9 commented Jun 7, 2024

hvlad commented Jun 7, 2024

TreeHunter9 commented Jun 7, 2024 • edited Loading

hvlad commented Jun 7, 2024

aafemt commented Jun 7, 2024

hvlad commented Jun 7, 2024

aafemt commented Jun 7, 2024

aafemt commented Jun 7, 2024

hvlad commented Jun 7, 2024

aafemt commented Jun 7, 2024

hvlad commented Jun 7, 2024

ilya071294 commented Jun 10, 2024

dyemanov commented Jun 10, 2024

hvlad commented Jun 10, 2024

TreeHunter9 commented Jun 14, 2024

romansimakov commented Jun 14, 2024

TreeHunter9 commented Jun 3, 2024 •

edited

Loading

TreeHunter9 commented Jun 7, 2024 •

edited

Loading