Segmentation fault error in version 1.6 #340

Open
slothyrulez opened this issue Jul 24, 2024 · 8 comments

@slothyrulez

Hi!

We are running PostgreSQL 14.10 on AWS RDS and recently installed the pg_cron 1.6 extension, following this guide:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_pg_cron.html

We have only one job configured: it runs a VACUUM VERBOSE ANALYZE on the whole database once a day, outside peak hours.
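
For context, a minimal sketch of how such a job could be scheduled; the job name and the exact cron expression below are illustrative assumptions, not our actual configuration:

	-- illustrative only: job name and schedule are assumptions
	SELECT cron.schedule(
	    'nightly-vacuum',          -- hypothetical job name
	    '0 3 * * *',               -- assumed off-peak slot: every day at 03:00 UTC
	    'VACUUM VERBOSE ANALYZE'
	);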

I have found 2 crashes related to pg_cron:


	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: background worker "pg_cron" (PID 24839) was terminated by signal 11: Segmentation fault
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:DETAIL: Failed process was running: VACUUM VERBOSE ANALYZE
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: terminating any other active server processes /etc/rds/dbbin/pgscripts/rds_wal_archive: line 29: 29367 Quit (core dumped) usleep 50000
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[2788]:FATAL: archive command failed with exit code 131
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[2788]:DETAIL: The failed archive command was: /etc/rds/dbbin/pgscripts/rds_wal_archive pg_wal/0000000100004C6600000027 skipTimeCheckForWalSegmentAndReadyFile
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[703]:LOG: all server processes terminated; reinitializing
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC::@:[29368]:LOG: database system was interrupted; last known up at 2024-07-24 07:16:46 UTC
	2024-07-24T07:18:48.000Z	2024-07-24 07:18:48 UTC:10.2.0.249(58010):rdsrepladmin@[unknown]:[29369]:FATAL: the database system is in recovery mode

and

2024-07-21 07:18:59 UTC::@:[723]:LOG: cron job 1: INFO: vacuuming "pg_toast.pg_toast_5855542"
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: background worker "pg_cron" (PID 30526) was terminated by signal 11: Segmentation fault
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:DETAIL: Failed process was running: VACUUM VERBOSE ANALYZE
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: terminating any other active server processes
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[703]:LOG: all server processes terminated; reinitializing
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: database system was interrupted; last known up at 2024-07-21 07:18:13 UTC
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC:10.2.0.249(39436):rdsrepladmin@[unknown]:[2583]:FATAL: the database system is in recovery mode
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: database system was not properly shut down; automatic recovery in progress
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: redo starts at 4C55/52048718
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: invalid record length at 4C55/5956D5B0: wanted 24, got 0
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: redo done at 4C55/5956D588 system usage: CPU: user: 0.19 s, system: 0.06 s, elapsed: 0.27 s
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC::@:[2582]:LOG: checkpoint starting: end-of-recovery immediate
	2024-07-21T07:18:59.000Z	2024-07-21 07:18:59 UTC:172.31.21.40(48308)::[2584]:FATAL: the database system is in recovery mode

Do you have any hints?

@marcoslot
Collaborator

marcoslot commented Jul 25, 2024

That's an interesting one. It's possible that the VACUUM VERBOSE ANALYZE crashes on its own (e.g. due to another extension), though I wonder if the number of notices produced by VERBOSE is causing an issue with the background worker mode used by RDS. Can you try running without VERBOSE to see whether the issue goes away? If it does, that would be a strong indication that pg_cron is indeed the problem.
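
As a side note, the execution mode can be checked from SQL; cron.use_background_workers is the pg_cron setting that controls it (on RDS it is managed through the DB parameter group):

	-- shows whether pg_cron launches jobs as background workers or as regular client connections
	SHOW cron.use_background_workers;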

@slothyrulez
Author

Hey, thanks for your time.

Yes, I think it's possible that the cause is not pg_cron itself but VACUUM.
OK, I'll reconfigure the job to run without VERBOSE and let you know.
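
A minimal sketch of that reconfiguration, assuming the job id is 1 (as the "cron job 1" line in the log suggests) and using cron.alter_job, which should be available on pg_cron 1.6:

	-- keep the same schedule, just drop VERBOSE from the command
	SELECT cron.alter_job(
	    job_id  := 1,               -- assumed id; verify with: SELECT jobid, command FROM cron.job;
	    command := 'VACUUM ANALYZE'
	);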

Thanks again.

@slothyrulez
Author

@marcoslot update here: we have been running VACUUM ANALYZE for 5 days without any issue so far.

marcoslot added the bug label on Aug 1, 2024
@slothyrulez
Author

Checked again and it is running as expected. I am not sure how else I can help you.
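
In case it helps, this is the kind of query one could use to keep an eye on the recent runs (a sketch; cron.job_run_details should be available on pg_cron 1.6):

	-- sketch: list the most recent pg_cron runs and their outcome
	SELECT jobid, command, status, return_message, start_time, end_time
	FROM cron.job_run_details
	ORDER BY start_time DESC
	LIMIT 10;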

@rinshadka

Hi, I am also getting this issue, in an Aurora PostgreSQL RDS database, while refreshing a materialized view via pg_cron.

2024-08-21 06:47:00 UTC::@:[554]:LOG: background worker "pg_cron" (PID 22225) was terminated by signal 11: Segmentation fault
2024-08-21 06:47:00 UTC::@:[554]:DETAIL: Failed process was running: SELECT ingress.refresh_interactions_materialized_view();
2024-08-21 06:47:00 UTC::@:[554]:LOG: terminating any other active server processes
2024-08-21 06:47:00 UTC::@:[554]:FATAL: Aurora Runtime process unexpectedly exited
2024-08-21 06:47:00 UTC::@:[554]:LOG: database system is shut down

When I execute this manually, it takes around 4 minutes to finish.

Thanks.

@marcoslot
Collaborator

@bdrouvotAWS do you happen to have any way of getting a stack trace?

@bdrouvotAWS
Collaborator

@rinshadka @slothyrulez could you please ping me at bdrouvot@amazon.com?

@Politta

Politta commented Oct 2, 2024

I'm having a similar issue using PostgreSQL RDS:

2024-10-02 13:00:00.000,"2024-10-02 13:00:00 UTC::@:[554]:LOG:  background worker ""pg_cron"" (PID 25883) was terminated by signal 11: Segmentation fault",xxx-xxx-xxx-xxxx-xxxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,"2024-10-02 13:00:00 UTC::@:[554]:DETAIL:  Failed process was running: Select * from dbo.""archiveFormInstances""(2* 370, 25000)",xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:LOG:  terminating any other active server processes,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:FATAL:  Aurora Runtime process unexpectedly exited,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql
2024-10-02 13:00:00.000,2024-10-02 13:00:00 UTC::@:[554]:LOG:  database system is shut down,xxxx-xxx-xx-xxxx-xxxx:/aws/rds/cluster/xxxxxxx/postgresql

Has anyone had any luck debugging issues like this?
