Segmentation faults with Particle Injectors #611

Closed
Tissot11 opened this issue Mar 17, 2023 · 2 comments

Description

I am getting segmentation faults with a setup that uses Particle Injectors. I see the segmentation faults on two machines, but on a third machine the same setup surprisingly ran (a short run, at least). I attach the setup file together with the stdout and stderr files from the Raven (4464174) and Justus (8714049) machines. On Raven I used the modules intel/21.2.0, impi/2021.2, mkl/2021.2, anaconda/3/2020.02, and hdf5-mpi/1.12.0, while on Justus I used the modules lib/hdf5/1.12.1-intel-19.1.2-impi-2019.8 and numlib/python_scipy/1.5.0_numpy-1.19.0_python-3.8.3.

A colleague at the Justus cluster administration compiled the current Smilei (fetched from GitHub) with the debug option

module load lib/hdf5/1.12.1-intel-19.1.2-impi-2019.8
HDF5 1.12.1 has been loaded
$ export PYTHONEXE=python3
$ export HDF5_ROOT_DIR=$HDF5_HOME
$ make config=debug env
VERSION : 4.7-248-ge563595d9-master
SMILEICXX : mpicxx
OPENMP_FLAG : -fopenmp -D_OMP
HDF5_ROOT_DIR : /opt/bwhpc/common/lib/hdf5/1.12.1-intel-19.1.2-impi-2019.8
FFTW3_LIB_DIR :
SITEDIR : /home/xx/xx_xxxxxx/xx_xx/.local/lib/python3.6/site-packages
PYTHONEXE : python3
PY_CXXFLAGS : -I/usr/include/python3.6m -I/usr/include/python3.6m -I/usr/lib64/python3.6/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
PY_LDFLAGS : -lpython3.6m -lpthread -ldl -lutil -lm -Xlinker -export-dynamic
CXXFLAGS : -D__VERSION=\"4.7-248-ge563595d9-master\" -std=c++11 -Wall -I/opt/bwhpc/common/lib/hdf5/1.12.1-intel-19.1.2-impi-2019.8/include -Isrc -Isrc/Profiles -Isrc/MultiphotonBreitWheeler -Isrc/ElectroMagnSolver -Isrc/ParticleBC -Isrc/MovWindow -Isrc/Radiation -Isrc/DomainDecomposition -Isrc/Collisions -Isrc/SmileiMPI -Isrc/Patch -Isrc/PartCompTime -Isrc/Tools -Isrc/ElectroMagnBC -Isrc/ParticleInjector -Isrc/Field -Isrc/Merging -Isrc/Diagnostic -Isrc/Particles -Isrc/Python -Isrc/Pusher -Isrc/ElectroMagn -Isrc/Interpolator -Isrc/Projector -Isrc/Ionization -Isrc/Params -Isrc/Species -Isrc/Checkpoint -Isrc/picsar_interface -Ibuild/src/Python -I/usr/include/python3.6m -I/usr/include/python3.6m -I/usr/lib64/python3.6/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -g -pg -D__DEBUG -O0 -fopenmp -D_OMP
LDFLAGS : -L/opt/bwhpc/common/lib/hdf5/1.12.1-intel-19.1.2-impi-2019.8/lib -lhdf5 -lpython3.6m -lpthread -ldl -lutil -lm -Xlinker -export-dynamic -lm -fopenmp -D_OMP
$ make config=debug -j8

and collected the following backtrace of the crashed simulation run

(gdb) bt
#0  raise (sig=11) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00000000012f56f7 in backward::SignalHandling::sig_handler (signo=11, info=0x150c97ff7f70, _ctx=0x150c97ff7e40) at src/Tools/backward.hpp:2260
#2  <signal handler called>
#3  0x000000000064621f in std::vector<double, std::allocator<double> >::data (this=0x0) at /usr/include/c++/8/bits/stl_vector.h:1056
#4  0x000000000107b5c3 in Particles::getPtrPosition (this=0x150c88007eb0, idim=0) at src/Particles/Particles.h:429
#5  0x00000000010e9253 in VectorPatch::injectParticlesFromBoundaries (this=0x7ffca0e7b908, params=..., timers=..., itime=1) at src/Patch/VectorPatch.cpp:699
#6  0x000000000129c68e in L_main_534__par_region4_2_45 () at src/Smilei.cpp:544
#7  0x0000150d42f3e3f3 in __kmp_invoke_microtask () from /opt/bwhpc/common/compiler/intel/compxe.2020.2.254/compilers_and_libraries_2020.2.254/linux/compiler/lib/intel64_lin/libiomp5.so
#8  0x0000150d42ec2273 in __kmp_invoke_task_func (gtid=0) at ../../src/kmp_runtime.cpp:7515
#9  0x0000150d42ec121e in __kmp_launch_thread (this_thr=0x0) at ../../src/kmp_runtime.cpp:6109
#10 0x0000150d42f3e8cc in _INTERNAL8aaf6219::__kmp_launch_worker (thr=0x0) at ../../src/z_Linux_util.cpp:593
#11 0x0000150d4573e1cf in start_thread (arg=<optimized out>) at pthread_create.c:479
#12 0x0000150d42869e73 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
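For readers skimming the trace: frames #3–#5 show std::vector<double>::data() being called with this=0x0 from Particles::getPtrPosition inside VectorPatch::injectParticlesFromBoundaries, i.e. a position vector pointer that was apparently never set up for the injected particles. Below is a minimal, hypothetical C++ sketch of that failure mode only; FakeParticles and its members are made up for illustration and are not Smilei code:

```cpp
#include <vector>

// Hypothetical stand-in for the pattern seen in frames #3-#4: a particle
// container whose position storage was never allocated for the injector.
struct FakeParticles {
    std::vector<double>* position = nullptr;   // never initialised

    double* getPtrPosition( int /*idim*/ ) {
        // Calling data() through a null vector pointer is undefined behaviour
        // and typically raises SIGSEGV, matching "this=0x0" in frame #3.
        return position->data();
    }
};

int main() {
    FakeParticles p;
    return p.getPtrPosition( 0 ) != nullptr;   // crashes here
}
```

Built with -O0 -g (as in the debug config above), running this typically reproduces the same segfault-inside-data() signature that the backtrace reports.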

Could you please investigate this? I believe it might also be relevant for #593, #521, and #293, which I had raised in the past. Please let me know if you need more information.

d15_th75_mi25.py.txt
tjob_hybrid.err.4464174.txt
tjob_hybrid.out.4464174.txt
tjob_hybrid-8714049.err.txt
tjob_hybrid-8714049.out.txt

Tissot11 added the bug label Mar 17, 2023

mccoys (Contributor) commented Mar 17, 2023

I found the bug, and it could indeed be the same as #593, but I am not sure it would also affect the other issues you mentioned. It seems to have been caused by changes made elsewhere in the code not that long ago.

Anyway, it should be fixed in the develop branch -> 9f6da66

Please close this if it works for you

Tissot11 (Author) commented

Thanks for the quick reply and for resolving the issue! I have checked the simulation run, and it seems to be working fine now.
