Segmentation fault error in version 1.6 #340

Open
slothyrulez opened this issue Jul 24, 2024 · 8 comments

@slothyrulez

Hi!

We are running PostgreSQL 14.10 on AWS RDS and recently installed the pg_cron 1.6 extension, following this guide:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_pg_cron.html

We have only one job configured: it runs a VACUUM VERBOSE ANALYZE on the whole database once a day, outside peak hours.
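
For context, a minimal sketch of how such a job could be scheduled; the job name and the exact cron expression below are illustrative assumptions, not our actual configuration:

	-- illustrative only: job name and schedule are assumptions
	SELECT cron.schedule(
	    'nightly-vacuum',          -- hypothetical job name
	    '0 3 * * *',               -- assumed off-peak slot: every day at 03:00 UTC
	    'VACUUM VERBOSE ANALYZE'
	);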

I have found 2 crashes related to pg_cron:


	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: background worker "pg_cron" (PID 24839) was terminated by signal 11: Segmentation fault
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:DETAIL: Failed process was running: VACUUM VERBOSE ANALYZE
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: terminating any other active server processes /etc/rds/dbbin/pgscripts/rds_wal_archive: line 29: 29367 Quit (core dumped) usleep 50000
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[2788]:FATAL: archive command failed with exit code 131
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[2788]:DETAIL: The failed archive command was: /etc/rds/dbbin/pgscripts/rds_wal_archive pg_wal/0000000100004C6600000027 skipTimeCheckForWalSegmentAndReadyFile
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: all server processes terminated; reinitializing
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[29368]:LOG: database system was interrupted; last known up at 2024-07-24 07:16:46 UTC
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC:10.2.0.249(58010):rdsrepladmin@[unknown]:[29369]:FATAL: the database system is in recovery mode

and

2024-07-21 07:18:59 UTC::@:[723]:LOG: cron job 1: INFO: vacuuming "pg_toast.pg_toast_5855542"
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: background worker "pg_cron" (PID 30526) was terminated by signal 11: Segmentation fault
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:DETAIL: Failed process was running: VACUUM VERBOSE ANALYZE
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: terminating any other active server processes
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: all server processes terminated; reinitializing
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: database system was interrupted; last known up at 2024-07-21 07:18:13 UTC
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC:10.2.0.249(39436):rdsrepladmin@[unknown]:[2583]:FATAL: the database system is in recovery mode
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: database system was not properly shut down; automatic recovery in progress
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: redo starts at 4C55/52048718
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: invalid record length at 4C55/5956D5B0: wanted 24, got 0
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: redo done at 4C55/5956D588 system usage: CPU: user: 0.19 s, system: 0.06 s, elapsed: 0.27 s
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: checkpoint starting: end-of-recovery immediate
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC:172.31.21.40(48308)::[2584]:FATAL: the database system is in recovery mode

Do you have any hints?

@marcoslot
Collaborator

marcoslot commented Jul 25, 2024

That's an interesting one. It's possible that the VACUUM VERBOSE ANALYZE crashes on its own (e.g. due to another extension), though I wonder if the number of notices produced by VERBOSE is causing an issue with the background worker mode used by RDS. Can you try running without VERBOSE to see whether the issue goes away? If it does, that would be a strong indication that pg_cron is indeed the problem.
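
As a side note, the execution mode can be checked from SQL; cron.use_background_workers is the pg_cron setting that controls it (on RDS it is managed through the DB parameter group):

	-- shows whether pg_cron launches jobs as background workers or as regular client connections
	SHOW cron.use_background_workers;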

@slothyrulez
Author

Hey, thanks for your time.

Yes, I think it's possible that the cause is not pg_cron itself but VACUUM.
OK, I'll reconfigure the job to run without VERBOSE and let you know.
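
A minimal sketch of that reconfiguration, assuming the job id is 1 (as the "cron job 1" line in the log suggests) and using cron.alter_job, which should be available on pg_cron 1.6:

	-- keep the same schedule, just drop VERBOSE from the command
	SELECT cron.alter_job(
	    job_id  := 1,               -- assumed id; verify with: SELECT jobid, command FROM cron.job;
	    command := 'VACUUM ANALYZE'
	);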

Thanks again.

@slothyrulez
Author

@marcoslot update here: we have been running VACUUM ANALYZE for 5 days without any issue so far.

marcoslot added the bug label on Aug 1, 2024
@slothyrulez
Author

Checked again and it is running as expected. I am not sure how else I can help you.
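
In case it helps, this is the kind of query one could use to keep an eye on the recent runs (a sketch; cron.job_run_details should be available on pg_cron 1.6):

	-- sketch: list the most recent pg_cron runs and their outcome
	SELECT jobid, command, status, return_message, start_time, end_time
	FROM cron.job_run_details
	ORDER BY start_time DESC
	LIMIT 10;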

@rinshadka

Hi, I am also getting this issue, in an Aurora PostgreSQL RDS database, while refreshing a materialized view via pg_cron.

2024-08-21 06:47:00 UTC::@:[554]:LOG: background worker "pg_cron" (PID 22225) was terminated by signal 11: Segmentation fault
2024-08-21 06:47:00 UTC::@:[554]:DETAIL: Failed process was running: SELECT ingress.refresh_interactions_materialized_view();
2024-08-21 06:47:00 UTC::@:[554]:LOG: terminating any other active server processes
2024-08-21 06:47:00 UTC::@:[554]:FATAL: Aurora Runtime process unexpectedly exited
2024-08-21 06:47:00 UTC::@:[554]:LOG: database system is shut down

When I execute this manually, it takes around 4 minutes to finish.

Thanks.

@marcoslot
Collaborator

@bdrouvotAWS do you happen to have any way of getting a stack trace?

@bdrouvotAWS
Collaborator

@rinshadka @slothyrulez could you please ping me at bdrouvot@amazon.com?

@Politta

Politta commented Oct 2, 2024

I'm having a similar issue using PostgreSQL RDS:

2024-10-02 13:00:00.000,"2024-10-02 13:00:00 UTC::@:[554]:LOG:  background worker ""pg_cron"" (PID 25883) was terminated by signal 11: Segmentation fault",xxx-xxx-xxx-xxxx-xxxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,"2024-10-02 13:00:00 UTC::@:[554]:DETAIL:  Failed process was running: Select * from dbo.""archiveFormInstances""(2* 370, 25000)",xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:LOG:  terminating any other active server processes,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:FATAL:  Aurora Runtime process unexpectedly exited,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:LOG:  database system is shut down,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql

Has anyone had any luck debugging issues like this?
