call cudaDeviceReset in sirius finalize #61

Draft: wants to merge 1 commit into base: ristretto

Conversation

simonpintarelli
Collaborator

This seems to fix the crash in MPI_Finalize, although it is not entirely clear why. The sirius miniapp did call cudaDeviceReset and did not crash.

GTL_DEBUG: [3] cudaHostUnregister: pointer does not correspond to a registered memory region
MPICH ERROR [Rank 3] [job id 19001.0] [Thu Jul  4 15:30:14 2024] [nid006079] - Abort(808544770) (rank 3 in comm 0): Fatal error in PMPI_Finalize: Invalid count, error stack:
PMPI_Finalize(214)...........................: MPI_Finalize failed
PMPI_Finalize(161)...........................: 
MPID_Finalize(713)...........................: 
MPIDI_SHMI_mpi_finalize_hook(87).............: 
MPIDI_POSIX_mpi_finalize_hook(151)...........: 
MPIDU_genq_shmem_pool_destroy_unsafe(327)....: 
MPIDU_genq_shmem_pool_gpu_mem_unregister(128): 
(unknown)(): Invalid count
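
The teardown order the PR title describes (device reset inside the library's finalize, ahead of MPI_Finalize) can be sketched as follows. This is only an illustration, not the SIRIUS implementation: `library_finalize`, `TeardownLog`, `fake_cuda_device_reset`, and `fake_mpi_finalize` are hypothetical names, and the two `fake_*` functions are stand-ins that only record when the real `cudaDeviceReset()` and `MPI_Finalize()` would be called, so the sketch compiles without CUDA or MPI.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which teardown steps run, so it can be inspected.
struct TeardownLog {
    std::vector<std::string> steps;
};

// Stand-in (assumption): a real build would call cudaDeviceReset() here.
inline void fake_cuda_device_reset(TeardownLog& log)
{
    log.steps.push_back("cudaDeviceReset");
}

// Stand-in (assumption): a real build would call MPI_Finalize() here.
inline void fake_mpi_finalize(TeardownLog& log)
{
    log.steps.push_back("MPI_Finalize");
}

// Hypothetical library finalize: release library-owned resources first,
// then reset the device, and only afterwards shut down MPI.
inline void library_finalize(TeardownLog& log, bool reset_device, bool call_mpi_finalize)
{
    log.steps.push_back("release library resources");
    if (reset_device) {
        fake_cuda_device_reset(log);
    }
    if (call_mpi_finalize) {
        fake_mpi_finalize(log);
    }
}

// Runs the full teardown and returns the recorded order of steps.
inline TeardownLog run_teardown()
{
    TeardownLog log;
    library_finalize(log, /*reset_device=*/true, /*call_mpi_finalize=*/true);
    return log;
}
```

Calling `run_teardown()` returns a log in which the device reset entry comes before the MPI shutdown entry. Whether this ordering is sufficient on a given system is a separate question that the sketch does not answer.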

@simonpintarelli
Collaborator Author

Actually, it is still crashing in MPI_Finalize when spfft+gpu_direct is used.

simonpintarelli marked this pull request as draft July 8, 2024 20:11