Highlights

Improved performance using GPU offloading
Smilei's computational performance has recently been enhanced with GPU offloading support: in particular, the projection of current has been accelerated with a CUDA kernel, while other features have been accelerated using either OpenACC or OpenMP pragmas.
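As a rough illustration of what the "projection of current" step does in the PIC loop, the sketch below performs a minimal first-order (cloud-in-cell) current deposition in 1D with numpy. It only illustrates the scatter pattern that such GPU kernels accelerate; Smilei's actual kernel uses a charge-conserving scheme and a different data layout.

    # Minimal 1D cloud-in-cell current deposition: the kind of scatter loop
    # that GPU current-projection kernels accelerate. Illustration only,
    # not Smilei's actual (charge-conserving) deposition.
    import numpy as np

    def deposit_current_1d(x, v, w, dx, n_cells):
        """Deposit the particle current w*v onto a periodic 1D grid with linear weights."""
        j = np.zeros(n_cells)
        cell = np.floor(x / dx).astype(int)      # index of the grid point to the left
        frac = x / dx - cell                     # distance to it, in cell units
        np.add.at(j, cell % n_cells, (1.0 - frac) * w * v)   # scatter to the left node
        np.add.at(j, (cell + 1) % n_cells, frac * w * v)     # scatter to the right node
        return j

    rng = np.random.default_rng(0)
    n_p, n_cells, dx = 100_000, 128, 0.1
    x = rng.uniform(0.0, n_cells * dx, n_p)      # macro-particle positions
    v = rng.normal(0.0, 0.05, n_p)               # macro-particle velocities
    w = np.full(n_p, 1e-3)                       # macro-particle weights
    print(deposit_current_1d(x, v, w, dx, n_cells).sum())

Many particles scatter into the same grid cells concurrently, which is why such kernels typically rely on atomic additions or per-block reductions on the GPU; np.add.at plays that role in this sketch.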
Smilei can be tested on recent supercomputers such as Adastra, where it shows excellent weak scaling efficiency even with diagnostics enabled, as shown in the figure below:
[Figure: weak scaling efficiency of Smilei with GPU offloading on the Adastra supercomputer]
Mitigation of numerical artifacts with relativistic particles
The staggering in space and time of the electromagnetic fields, and the numerical dispersion of the finite-difference time-domain electromagnetic solvers typically used in Particle-in-Cell codes, create numerical artifacts that become increasingly detrimental in the presence of relativistic macro-particles.
A typical example of simulation where these artifacts can become relevant and alter the numerical results is Laser Wakefield Acceleration. In this kind of simulation, a numerical artifact called Numerical Cherenkov Radiation is generated by the spurious interaction between relativistic macro-particles and electromagnetic waves that are numerically slowed down by the dispersion of the Finite Difference Time Domain solver for Maxwell's equations. This artifact can significantly alter the dynamics of relativistic electron beams accelerated in plasma waves, quickly increasing their divergence as they propagate in the plasma.
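The root cause can be seen in the textbook dispersion relation of the 1D Yee (FDTD) scheme, \(\sin(\omega\Delta t/2) = (c\Delta t/\Delta x)\,\sin(k\Delta x/2)\): on the grid, short-wavelength light travels slower than \(c\), so an ultra-relativistic macro-particle can outrun its own fields. The short sketch below (standard FDTD theory, not specific to Smilei) evaluates the resulting numerical phase velocity:

    # Numerical phase velocity of the 1D Yee (FDTD) scheme:
    #   sin(w*dt/2) = (c*dt/dx) * sin(k*dx/2)
    # Textbook relation, shown here only to illustrate why "grid light" is slow.
    import numpy as np

    c, dx = 1.0, 1.0                      # normalized units
    dt = 0.95 * dx / c                    # time step below the CFL limit
    k = np.linspace(0.05, np.pi / dx, 200)                        # resolved wavenumbers
    omega = (2.0 / dt) * np.arcsin((c * dt / dx) * np.sin(k * dx / 2.0))
    v_phase = omega / k
    print("minimum numerical phase velocity: %.3f c" % v_phase.min())

Macro-particles moving at \(v \simeq c\) therefore travel faster than these high-frequency numerical waves, which is the Cherenkov-like resonance at the origin of the artifact.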
Recently, an interpolation scheme called B-TIS3 has been implemented in the code CALDER and published in P.-L. Bourgeois and X. Davoine (2023). This scheme can significantly reduce the effects of Numerical Cherenkov Radiation and of the staggering of the electromagnetic fields, as shown in the following figure:
Both panels show a field proportional to the y component of the Lorentz force acting on the electron macro-particles in a Smilei simulation of Laser Wakefield Acceleration. Top panel: simulation without the B-TIS3 interpolation. Bottom panel: simulation with the B-TIS3 interpolation. The high-frequency modulations due to the Numerical Cherenkov Radiation are evident in the top panel, while they are considerably reduced in the bottom panel. These two simulations were run in the AMcylindrical geometry (1 azimuthal mode) with the Laser Envelope model.
Laser Envelope model and averaged tunnel ionization
Laser-plasma interaction in underdense plasmas can, under certain conditions, be accurately simulated with a laser envelope model, in which the high-frequency oscillations of the laser pulse do not need to be resolved. In this model, the effect of the laser on the plasma is described through the ponderomotive force, and the effect of the presence of the plasma on the laser pulse is described through the plasma susceptibility. Both terms can be expressed as functions of the laser's complex envelope alone.
This technique relaxes the constraints on the minimum spatial and temporal scales to be resolved and can yield speed-ups of several orders of magnitude, especially if coupled with a cylindrical geometry (where only one azimuthal mode is taken into account).
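A minimal sketch of the key idea follows: once the laser is represented by its complex envelope \(\tilde{A}\), the cycle-averaged (ponderomotive) force on a macro-particle can be computed from the envelope amplitude and its gradient alone. The normalization factors below are only indicative, as conventions differ between codes; this is not Smilei's implementation.

    # Cycle-averaged ponderomotive force from a laser envelope (normalized units).
    # The numerical factors are indicative only; conventions vary between codes.
    import numpy as np

    x = np.linspace(-40.0, 40.0, 2001)                    # longitudinal coordinate
    A = 2.0 * np.exp(-x**2 / (2.0 * 10.0**2))             # envelope amplitude, peak a0 = 2
    gamma = np.sqrt(1.0 + A**2 / 2.0)                     # cycle-averaged Lorentz factor (momentum neglected)
    F_pond = -np.gradient(A**2 / 4.0, x) / gamma          # ponderomotive force on a macro-particle
    print("peak |F_pond| = %.3f (normalized units)" % np.abs(F_pond).max())

The important point is that no quantity oscillating at the laser frequency appears: the grid only needs to resolve the envelope and plasma scales.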
The envelope model is widely used for Laser Wakefield Acceleration, where the laser pulse envelope is often at least ten times longer than the laser carrier wavelength. Recently, a new averaged tunnel ionization model has been developed for the envelope model, allowing the simulation of Laser Wakefield Acceleration with ionization injection using this efficient technique.
The following figure compares the electron density from two LWFA simulations: one using the laser envelope model and the averaged tunnel ionization module (a), and one without the envelope model (b).
In these simulations, an intense laser pulse propagates in a plasma composed of helium and partially ionized nitrogen. The laser field near the pulse's center is intense enough to further ionize the nitrogen ions, releasing electrons that can be trapped and accelerated in the relativistic plasma wave behind the laser pulse.
Previous averaged tunnel ionization models could not accurately describe this LWFA scheme in relativistic regimes. In the new model, the longitudinal momentum of the electrons created by ionization is also initialized, following analytical derivations. Including this longitudinal momentum initialization allows the dynamics of these electrons to be described accurately.
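The "averaged" part of the model can be sketched schematically: instead of evaluating a tunnel-ionization rate along the fast laser oscillation, the rate is averaged over an optical cycle and becomes a function of the envelope amplitude only. The rate below is a schematic exponential placeholder, not the ADK coefficients actually used, and the longitudinal-momentum initialization mentioned above (which follows analytical derivations) is not reproduced here.

    # Cycle-averaging of a schematic tunnel-ionization rate: the averaged rate
    # depends only on the envelope amplitude E0. Not the ADK rate used in codes.
    import numpy as np

    def rate(E, E_a=8.0):
        """Schematic tunnel rate, strongly nonlinear in the field amplitude."""
        return np.exp(-E_a / np.maximum(np.abs(E), 1e-12))

    E0 = 2.5                                            # envelope amplitude (arbitrary units)
    phase = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
    resolved = rate(E0 * np.cos(phase))                 # rate along the fast oscillation
    averaged = resolved.mean()                          # cycle-averaged rate
    print("cycle-averaged rate: %.3e   peak instantaneous rate: %.3e" % (averaged, rate(E0)))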
The following figure compares the accelerated electron spectra at the end of these simulations.
The green line shows the result of the previously available averaged ionization model: without the longitudinal momentum initialization, few electrons created by ionization are trapped and accelerated in the plasma wave. The red line shows the result of the new averaged ionization model implemented in Smilei, which accurately reproduces the spectrum obtained with the simulation without an envelope model (blue line).
The envelope simulation required an amount of computing resources orders of magnitude smaller than the simulation without a laser envelope model.
More details on the envelope model and the averaged tunnel ionization model in Smilei can be found here.
Field initialization of a relativistic particle beam
In Plasma Wakefield Acceleration (PWFA), a plasma wave is generated behind a relativistic particle beam propagating in a plasma.
To simulate this phenomenon, it is necessary to self-consistently initialize the electromagnetic fields of a relativistic particle beam in vacuum, before its entrance into the plasma.
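The underlying physics can be sketched for the simple case of a long, round beam (illustrative parameters; this is not the field solver used in the code). The radial electric field follows from Gauss's law applied to the lab-frame charge density, the azimuthal magnetic field is \(B_\phi = \beta E_r/c\), and the two nearly cancel for a co-moving particle, leaving a net transverse force reduced by \(\gamma^2\).

    # Transverse self-fields of a long, round relativistic beam (Gaussian radial profile).
    # Illustrative parameters; simplified physics sketch, not the code's initialization.
    import numpy as np

    eps0, c, e = 8.854e-12, 3.0e8, 1.602e-19
    gamma = 1000.0                                   # beam Lorentz factor (assumed)
    beta = np.sqrt(1.0 - 1.0 / gamma**2)
    sigma_r = 5e-6                                   # rms beam radius [m] (assumed)
    line_density = 1e9 * e / 1e-3                    # 1e9 electrons per mm of beam (assumed) [C/m]

    r = np.linspace(1e-7, 5 * sigma_r, 400)
    enclosed = 1.0 - np.exp(-r**2 / (2 * sigma_r**2))        # charge fraction enclosed at radius r
    E_r = line_density * enclosed / (2 * np.pi * eps0 * r)   # radial electric field [V/m]
    B_phi = beta * E_r / c                                    # azimuthal magnetic field [T]
    print("peak E_r ~ %.2e V/m, peak B_phi ~ %.2e T" % (E_r.max(), B_phi.max()))
    print("electric-to-net transverse force ratio ~ %.1e (= gamma^2)"
          % (E_r.max() / (E_r - beta * c * B_phi).max()))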
The following image comes from a PWFA simulation that used this technique at its start.
The “driver” relativistic electron beam (in yellow) propagates through a plasma and drives a nonlinear Langmuir wave (in blue) that propagates at a velocity close to that of light in its wake. A “witness” relativistic electron bunch injected in this wave can be accelerated by electric fields orders of magnitude higher than those sustainable by metallic radio-frequency accelerating cavities.
Azimuthal Fourier decomposition
In Laser Wakefield Acceleration (LWFA), a plasma wave is generated behind an intense laser pulse propagating in an underdense plasma. The physics of this phenomenon cannot be accurately simulated with 2D Cartesian simulations. However, 3D Cartesian simulations can be computationally demanding, so preliminary studies for LWFA experiments, which typically consist of many PIC simulations, cannot realistically be carried out with 3D Cartesian simulations.
The azimuthal Fourier decomposition addresses this issue by using a cylindrical, hence 2D, grid, decomposing the fields in azimuthal harmonics to take the third spatial dimension into account, and treating the macro-particles in the full 6D phase space.
This technique can thus yield simulations with a computational cost comparable to that of 2D simulations, but with an accuracy comparable to that of a full 3D Cartesian simulation.
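A minimal sketch of the decomposition itself is given below (the principle only, not Smilei's internal data layout): a field sampled on \((r,\theta)\) is represented by a small number of complex 2D mode amplitudes \(\tilde{F}_m(x,r)\) and reconstructed as a truncated sum over \(\exp(im\theta)\).

    # Azimuthal Fourier decomposition: represent F(r, theta) with a few complex modes.
    # Illustration of the principle only (the x dimension is omitted for brevity).
    import numpy as np

    n_r, n_theta, n_modes = 64, 256, 3
    r = np.linspace(0.0, 1.0, n_r)
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    R, T = np.meshgrid(r, theta, indexing="ij")

    # Test field with azimuthal content up to m = 2 (a linearly polarized laser is mostly m = 1)
    F = 0.5 + R * np.cos(T) + 0.3 * R**2 * np.cos(2 * T)

    # Decompose: keep only the first n_modes complex amplitudes F_m(r)
    F_m = np.fft.fft(F, axis=1)[:, :n_modes] / n_theta

    # Reconstruct the full field from the truncated mode set
    recon = np.real(F_m[:, [0]] * np.ones_like(T))
    for m in range(1, n_modes):
        recon = recon + 2.0 * np.real(F_m[:, [m]] * np.exp(1j * m * T))
    print("max reconstruction error:", np.abs(recon - F).max())

Only the few 2D arrays \(\tilde{F}_m\) need to be stored and advanced, which is where the near-2D computational cost comes from.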
The following image comes from an LWFA simulation using the azimuthal Fourier decomposition, which made it feasible on a medium-scale laboratory cluster. An equivalent 3D Cartesian simulation would have required an amount of resources greater by at least an order of magnitude.
The laser (in red) propagates through a low-density plasma and drives a nonlinear Langmuir wave (in blue) that propagates at a velocity close to that of light in its wake. In this simulation, a moving window is used so we can follow the laser as it propagates through the plasma. We see electrons (in white) being self-injected in this wakefield, where they experience a strong electric field that accelerates them up to ultra-relativistic (GeV) energies.
An animation generated from the simulation data can be found here.
Improved performance using vectorization
Smilei's computational performance has recently been enhanced with vectorized operations, in particular the projection of currents and the interpolation of fields. Typically, the new algorithms are more efficient than the old ones above roughly 10 particles per cell, being up to 3 times faster. An adaptive switching technique ensures that the best version is used, dynamically and locally.
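A toy sketch of the adaptive switching logic follows: each patch evaluates its number of particles per cell and selects the operator family accordingly. The threshold of roughly 10 particles per cell is the one quoted above; the function and variable names are purely illustrative.

    # Toy model of adaptive scalar/vectorized operator selection per patch.
    # The ~10 particles-per-cell threshold is quoted above; names are illustrative.
    import numpy as np

    VECTO_THRESHOLD = 10.0   # particles per cell above which vectorized operators win

    def choose_operator(particles_in_patch, cells_in_patch):
        """Return the operator family a patch should use for the next iterations."""
        ppc = particles_in_patch / cells_in_patch
        return "vectorized" if ppc > VECTO_THRESHOLD else "scalar"

    rng = np.random.default_rng(1)
    particles_per_patch = rng.integers(low=0, high=4000, size=8)
    print([choose_operator(n, cells_in_patch=64) for n in particles_per_patch])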
This has been validated on large-scale simulations. An example of a mildly-relativistic collisionless shock simulation is provided in Fig. 1 (watch the video).
High-density regions are switched to vectorized operators, while low-density regions remain scalar (they have only 8 particles per cell). In this particular case, the treatment of particles can be sped up by a factor of 2.
For more details, check out the documentation and this arXiv paper.
Scalability in a wakefield acceleration simulation
Wakefield acceleration of electrons in an underdense plasma creates a hotspot of electrons, which makes the simulation strongly imbalanced. This spot represents a large part of the total computational load, so more computing power should be allocated to it.
Please refer to the documentation page Parallelization basics to learn the basics of the parallelization techniques employed in this section.
1. OpenMP
In a local area around this hotspot, OpenMP is able to manage the computing resources to make the overall simulation faster. The following figure shows the evolution of the time to calculate 100 iterations, as a function of time. Each line corresponds to a different partition of the box in terms of MPI processes and OpenMP threads: \(N\times M\), where \(N\) is the total number of MPI processes and \(M\) is the number of threads in each MPI process.
Using more OpenMP threads per MPI process (while keeping the total number of threads constant) clearly reduces the simulation time, because the computing load is balanced within each MPI process's patch collection.
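A toy model of why this works: with dynamic OpenMP scheduling inside a process, the cost of an MPI process is roughly the sum of its patch loads divided by its number of threads, and the iteration time is set by the slowest process. Grouping more patches under fewer processes therefore averages the hotspot out. The numbers below are purely illustrative.

    # Toy model: iteration time for different N (MPI) x M (OpenMP) partitions of the
    # same 64 patches with 64 threads in total. Illustrative loads only.
    import numpy as np

    rng = np.random.default_rng(2)
    loads = rng.exponential(1.0, 64)     # per-patch computational load
    loads[5] = 40.0                      # the wakefield "hotspot" patch

    for n_mpi, n_omp in [(64, 1), (16, 4), (4, 16)]:
        per_process = np.array_split(loads, n_mpi)              # contiguous patch ownership
        step_time = max(chunk.sum() / n_omp for chunk in per_process)
        print(f"{n_mpi:>2} MPI x {n_omp:>2} OpenMP -> step time ~ {step_time:5.1f}")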
2. Dynamic load balancing between MPI processes
At the global simulation scale, OpenMP cannot be used to smooth out the imbalance. Instead, a dynamic load balancing (DLB) algorithm periodically exchanges pieces of the simulation box (patches) between MPI processes, so that each MPI process owns a fair share of the simulation load. The following figure shows how this balancing reduces the simulation time.
The red curve is the best situation obtained in the previous section, while the black curve is obtained with the DLB algorithm enabled.
The portion of the box belonging to each MPI process varies when load balancing occurs. The following figure shows how each of these portions evolves with time.
The four panels correspond to four timesteps during the simulation. The colorscale represents the log-scaled load of each patch. The black lines show the borders of each MPI process's portion of the box. The MPI processes that are close to the hotspot tend to handle a smaller portion of the box.
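A minimal sketch of the balancing principle follows: patches are kept in a fixed one-dimensional order (Smilei orders them along a space-filling Hilbert curve) and the ownership boundaries are moved so that each MPI process carries roughly the same cumulative load. The patch loads below are illustrative.

    # Sketch of dynamic load balancing: move ownership boundaries along an ordered
    # list of patches so that every MPI process carries ~1/N of the total load.
    import numpy as np

    def balance(loads, n_procs):
        """Return, for each process, the (start, stop) slice of patches it should own."""
        target = np.cumsum(loads) / loads.sum() * n_procs     # cumulative load in "process units"
        owner = np.minimum(target.astype(int), n_procs - 1)   # process index of each patch
        return [(idx[0], idx[-1] + 1)
                for idx in (np.flatnonzero(owner == p) for p in range(n_procs))]

    rng = np.random.default_rng(3)
    loads = rng.exponential(1.0, 40)
    loads[20:24] = 15.0                                        # hotspot patches
    for p, (a, b) in enumerate(balance(loads, n_procs=4)):
        print(f"process {p}: patches {a:2d}-{b - 1:2d}, load {loads[a:b].sum():6.1f}")

Processes that own hotspot patches end up with few patches, exactly as the black borders in the figure shrink around the hotspot.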
High-harmonic generation
The interaction between an ultra-intense (\(I>10^{18}~{\rm W/cm^2}\)) femtosecond laser pulse and a solid target generates a dense “plasma mirror” at the target surface that reflects the laser in a strongly nonlinear manner. The temporal distortion of the reflected wave creates a train of ultra-short attosecond pulses, associated, in the frequency domain, with a comb of high-order harmonics.
We present a 2-dimensional Smilei simulation of laser-solid interaction with wavelength \(\lambda_0 = 0.8\) µm, peak intensity \(2\times10^{19}~{\rm W/cm^2}\), at 45° incidence with p-polarization, on an overdense plasma slab of constant electron density \(n_0=200\,n_c\) (\(n_c\) being the critical density), \(5\,\lambda_0\) thick, with an exponential pre-plasma of gradient length \(0.1\,\lambda_0\) down to a cut-off density of \(0.05\,n_c\). The full box size is \(80\,\lambda_0 \times 60\,\lambda_0\) and the simulation time \(150\,\lambda_0/c\), with a total of \(\sim 1.4\) billion quasi-particles in the box.
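For reference, the critical density quoted above follows from \(n_c = \varepsilon_0 m_e \omega_0^2/e^2\); a quick evaluation for \(\lambda_0 = 0.8\) µm gives the absolute densities involved (standard physical constants, values rounded):

    # Critical density for lambda0 = 0.8 um, and the target density n0 = 200 n_c.
    import numpy as np

    eps0, m_e, e, c = 8.854e-12, 9.109e-31, 1.602e-19, 3.0e8
    lambda0 = 0.8e-6                             # laser wavelength [m]
    omega0 = 2.0 * np.pi * c / lambda0           # laser angular frequency [rad/s]
    n_c = eps0 * m_e * omega0**2 / e**2          # critical density [m^-3]
    print("n_c = %.2e cm^-3" % (n_c * 1e-6))
    print("n_0 = %.2e cm^-3 (200 n_c)" % (200 * n_c * 1e-6))

This gives \(n_c \simeq 1.7\times10^{21}\) cm\(^{-3}\), i.e. a target density of a few \(10^{23}\) cm\(^{-3}\), typical of a solid.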
The following figure (top panel) shows half of the simulation box in the y-direction, with the laser field reported at three different times. The reflected laser pulse (at time \(t_2\)) shows a spectral content different from that of the incident pulse (at time \(t_0\)). The plasma electron density is shown in black. A close-up view of the interaction region is given in the bottom panel, illustrating the electron bunches being pulled out from the plasma surface.
Fourier analysis of the reflected laser field, in space and time, provides the angular distribution of the frequency spectrum of the reflected light, shown in the following figure (top panel). High harmonics appear up to order 16.
The bottom panel shows trajectories of accelerated electrons ejected from the target. The angular histogram shows that the momenta of the escaping energetic electrons (1 to 10 MeV) are mostly directed along two directions close to the reflected laser direction.
This simulation was run on the CINES/Occigen (Bullx) machine using 256 MPI processes x 14 OpenMP threads, for about 10700 CPU-hours. The characteristic computing time per particle (average time per PIC iteration divided by the number of particles) is of the order of 0.7 µs, including 25% for diagnostics.
Brillouin amplification
Coupling, in a plasma, a long energetic “pump” pulse of moderate intensity with a short counter-propagating “seed” pulse of initially low intensity can transfer energy from the pump to the seed thanks to the excitation of a plasma or ion-acoustic wave.
Here, we look specifically at stimulated Brillouin scattering (SBS) amplification, where the excited waves are ion-acoustic waves.
A pump with an intensity of \(10^{15}\) W/cm² (wavelength 1 µm) corresponds to the “strong-coupling” regime, which is particularly robust with respect to plasma inhomogeneities and seed frequency [Chiaramello2016].
A 2-dimensional simulation, in conditions close to actual experiments, was run on a box of 1024 µm x 512 µm for 10 ps with 25 billion quasi-particles. The following figure shows the evolution of the pump and seed intensities during their head-on collision, at three different times. The blue-yellow maps correspond to the plasma density, while the white-red maps correspond to the laser intensities.
The final seed intensity is nearly 5 times its initial value, while the spot size and phase front are well conserved, suggesting that such a beam could be further focused using plasma mirrors.
This simulation used the IDRIS/Turing (BlueGene/Q) supercomputer, consuming 1.8 million CPU-hours on 32768 MPI processes with 4 OpenMP threads per core. The average time to push a particle was 1.9 µs, including 5% for diagnostics. On the CINES/Occigen (Bullx) machine, we obtained an average time of 0.43 µs to push one particle (without diagnostics).
Magnetic reconnection at the Earth magnetopause
Magnetic reconnection at the Earth's magnetopause regulates the transport of matter, momentum and energy from the solar wind to the inner magnetosphere. The solar wind plasma temperature is typically one tenth that of the magnetospheric plasma, but its density is about ten times larger, and its magnetic field 2-3 times smaller. This asymmetry makes the reconnection dynamics vastly more complex than in symmetric environments, and it has only been studied for a decade, via numerical simulations and spacecraft observations.
Studying the impact of a plasmaspheric plume on magnetopause reconnection via kinetic numerical simulation is difficult. The simulation first needs to reach quasi-steady-state reconnection with a typical magnetopause asymmetry, then see the arrival of the plume, and then last long enough for a quasi-steady-state plume reconnection regime to settle. Due to the large particle density of plumes, the transition and final phases have substantially longer time scales than the early phase, which makes the simulation computationally heavy. The domain must also be long enough in the downstream direction for the plasma expelled during the early and transition phases to be evacuated from the reconnection region; otherwise, upstream plasma would not flow in, thereby stopping reconnection.
Three ion populations are present. The solar wind and magnetospheric populations have densities equal to \(n_0\) and \(n_0/10\), respectively, on their side of the current sheet, and fall to zero on the other side. The plume population increases from 0 to \(2\,n_0\) at \(20\,c/\omega_{pi}\) from the initial current sheet, on the magnetospheric side. The magnetic field amplitude goes from \(2\,B_0\) in the magnetosphere to \(B_0=m_e\omega_{pe}/e\) in the solar wind and lies entirely in the simulation plane. The temperature is initially isotropic and its profile is calculated to balance the total pressure.
The domain size is 1280 \(c/\omega_{pi} \times\) 256 \(c/\omega_{pi}\). The total simulation time is \(800\times\) the ion gyration time. We used a reduced ion-to-electron mass ratio \(m_i/m_e = 25\), and a ratio of 50 between the speed of light and the Alfvén velocity. There are initially 8.6 billion quasi-protons across the three populations, and 13 billion electrons.
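The pressure-balance initialization mentioned above can be sketched as follows: given the density and magnetic-field profiles across the current sheet, the temperature profile is chosen so that thermal plus magnetic pressure is uniform, \(n(x)T(x) + B(x)^2/2 = \mathrm{const}\) in normalized units. The tanh profiles and reference temperature below are illustrative, not the exact initial condition of this simulation.

    # Temperature profile from total (thermal + magnetic) pressure balance across an
    # asymmetric current sheet, in normalized units. Profiles are illustrative only.
    import numpy as np

    x = np.linspace(-20.0, 20.0, 401)              # distance from the sheet, in c/w_pi
    n = 0.55 + 0.45 * np.tanh(x / 2.0)             # ~0.1 n0 (magnetosphere, x<0) to n0 (solar wind, x>0)
    B = 1.5 - 0.5 * np.tanh(x / 2.0)               # ~2 B0  (magnetosphere)       to B0 (solar wind)

    T_sw = 2.0                                     # solar-wind temperature (illustrative)
    P_total = n[-1] * T_sw + B[-1]**2 / 2.0        # total pressure, fixed on the solar-wind side
    T = (P_total - B**2 / 2.0) / n                 # temperature profile balancing the total pressure
    print("T(magnetosphere) / T(solar wind) = %.1f" % (T[0] / T[-1]))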
This figure presents some of the simulation results: the electron density at three different times. In the top panel, reconnection is in a steady state between the solar wind plasma of density \(\simeq n_0\) and the magnetospheric plasma of density \(\simeq 0.1~n_0\). At this time, the exhaust is filled with mixed solar wind/hot magnetospheric plasma, as the plume (of density \(\simeq 2~n_0\)) is still located at \(\simeq 10~c/\omega_{pi}\) from the magnetospheric separatrix. The reconnection rate during this period has a typical value around \(0.1~\Omega_{ci}^{-1}\), with significant fluctuations caused by plasmoid formation. The plume, originally at \(20~c/\omega_{pi}\) from the magnetopause, is slowly advected towards the magnetospheric separatrix and finally reaches the reconnection site at about \(t=300~\Omega_{ci}^{-1}\). The second panel, at \(t=370~\Omega_{ci}^{-1}\), shows the plume starting to fill the exhaust after reaching the reconnection site and mixing with the solar wind plasma. At this time, the reconnection rate collapses to about half its previous value. The transition phase lasts for about \(100~\Omega_{ci}^{-1}\) before the plume reconnection regime reaches a quasi-steady state. The third panel shows the electron density at the end of the simulation, where the exhaust is filled with plume and solar wind plasma.
This large-scale simulation ran for a total of 14 million CPU-hours on 16384 cores of the CINES/Occigen (Bullx) supercomputer, within a GENCI-CINES special call. Overall, the characteristic (full) push time for a single particle was of the order of 1.6 µs (including 31% for diagnostics). No dynamic load balancing was used for this simulation.
Collisionless shock in pair plasmas
Relativistic collisionless shocks play a fundamental role in various astrophysical scenarios (active galactic nuclei, micro-quasars, pulsar wind nebulae and gamma-ray bursts), where they cause high-energy radiation and particle acceleration related to the cosmic-ray spectrum. In the absence of particle collisions, the shock is mediated by collective plasma processes produced by electromagnetic plasma instabilities at the shock front.
Specifically, the Weibel (or current filamentation) instability is observed in most astrophysical relativistic outflows interacting with the interstellar medium. It can be excited by counter-streaming unmagnetized relativistic flows, and dominates the instability spectrum for a wide range of parameters. The resulting strong magnetic turbulence can isotropize the incoming flow, hence stopping it and leading to compression of the downstream (shocked plasma) and to shock formation.
We present a 2-dimensional PIC simulation of such a shock, driven in an initially unmagnetized electron-positron plasma. The simulation relies on the “piston” method, which consists in initializing the simulation with a single cold electron-positron plasma drifting at a relativistic velocity \(v_0 \simeq 0.995\,c\). Reflecting boundary conditions at the right border create a counter-penetrating flow.
The simulation box size is 2048 \(\delta_e \times\) 128 \(\delta_e\) (\(\delta_e = c/\omega_p\) being the electron skin-depth of the initial flow), with a total of 2.15 billion quasi-particles. The following figure shows an unstable overlapping region of the incoming and reflected flows, resulting in the creation, ahead of the shock, of filamentary structures in both the magnetic field (panel a) and the total plasma density (panel b).
The magnetic field at the shock front becomes turbulent and strong enough to stop the incoming particles, leading to a pile-up of the plasma density (panel c).
The following figure demonstrates the build-up, at late times, of a supra-thermal tail in the downstream particle energy distribution. It is characteristic of first-order Fermi acceleration at the shock front, and appears to follow a \(\gamma^{-2.5}\) power law.
This simulation ran on the TGCC/Curie machine using 128 MPI x 8 OpenMP threads, for a total of 18800 CPU-hours over 49780 timesteps. The average push time for one quasi-particle was 0.63 µs (including 20% for diagnostics).
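The quoted push time can be recovered directly from these figures (CPU-hours, number of timesteps and number of quasi-particles):

    # Average push time per quasi-particle, from the run figures quoted above.
    cpu_hours = 18800
    timesteps = 49780
    quasi_particles = 2.15e9

    push_time = cpu_hours * 3600.0 / (timesteps * quasi_particles)     # seconds per particle push
    print("average push time: %.2f microseconds" % (push_time * 1e6))  # ~0.63 us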