
Enhance user experience with MPI #997

Merged: 5 commits, Oct 31, 2024

Conversation

@abussy (Contributor) commented on Aug 21, 2024

This PR aims to enhance the user experience when using MPI parallelization by not stopping execution when n_ranks > n_kpt.

The current solution of stopping execution with an error message is not optimal. Indeed, for an arbitrary system, it is not trivial to know the number of irreducible K-points in advance. This is particularly annoying when calculations are launched in an automated fashion, where the only safe bet is to not use MPI at all.

The solution proposed here is simple. When a calculation is started with more MPI ranks than K-points, a new MPI communicator is created. This communicator has as many ranks as there are K-points, while the remaining processes exit the program.
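For illustration, a minimal MPI.jl sketch of this kind of split (the helper name is made up here and this is not the actual DFTK code; the approach is revised later in this thread):

```julia
using MPI

# Minimal sketch: keep only the first n_kpt ranks of `comm` and let the
# surplus ranks leave the program, as described above. Illustrative only.
function shrink_communicator(n_kpt::Integer, comm=MPI.COMM_WORLD)
    MPI.Init()
    rank  = MPI.Comm_rank(comm)
    color = rank < n_kpt ? 0 : 1       # group ranks with / without a k-point
    small = MPI.Comm_split(comm, color, rank)
    if rank >= n_kpt
        MPI.Finalize()
        exit()                         # surplus ranks exit here
    end
    small                              # communicator with at most n_kpt ranks
end
```

On 4 ranks with 3 k-points, ranks 0-2 would continue with the returned 3-rank communicator and rank 3 would exit.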

While this may lead to idling CPU time, I believe that not crashing improves the experience. Moreover, a warning is printed describing the situation to the user, so that they can optimize their next run. I would also argue that this is not worse than under-parallelizing in instances where the number of k-points is not a multiple of the number of MPI ranks (maybe we should also issue a warning in such a case?).

I also took the opportunity to fix testing with the :mpi tag. The current check on parse(Bool, get(ENV, "CI", "false")) during PlaneWaveBasis creation, which essentially disables MPI testing on the CI, has been removed. Instead, all tests involving kgrids with a single k-point have received the :dont_test_mpi tag. The number of MPI ranks for MPI testing is also hardcoded to 2: because all tests are run as a single calculation, killing processes when n_ranks > n_kpt would make the tests hang. This also fixes local testing via Pkg.test("DFTK"; test_args = ["mpi"]).

@mfherbst (Member) left a comment

Great idea, thanks for the PR.

Two comments:

  • Calling exit from within PlaneWaveBasis is super unexpected and can easily lead to spurious bugs. I'm thinking of a case where one uses MPI both for DFTK and to run other things, and all of a sudden part of the processes in the global communicator are just gone and the program hangs or does undefined stuff.
  • Instead I propose to actually provide a mechanism to do the appropriate splitting of the communicator outside of a PlaneWaveBasis call. What I'm thinking is to essentially call build_kpoints internally to determine length(kcoords_global), plus a helper function to do the "shrinking" of the communicator, potentially calling exit from there. The user would then explicitly call 3 functions: one to determine the actual number of k-points, then the split-and-exit function, and then PlaneWaveBasis(...) with the communicator of the appropriate length.

Regarding the tests: I had in mind for a long time to actually switch the logic. Instead of "disabling" MPI tests where needed, I think it's better to annotate those which do support MPI instead (i.e. instead of a :dont_test_mpi have a simple :mpi on the counter-set). If it's too much work, leave it as is and we do a follow-up ...

@mfherbst (Member) commented on Aug 21, 2024

BTW the test failures in the "normal" tests seem fake (because of a too tightly chosen test tolerance, I think), but the cancelled MPI test points to an issue in your implementation (could be related to the exit() btw.)

@abussy (Contributor, Author) commented on Aug 22, 2024

> Calling exit from within PlaneWaveBasis is super unexpected and can easily lead to spurious bugs. I'm thinking of a case where one uses MPI both for DFTK and to run other things, and all of a sudden part of the processes in the global communicator are just gone and the program hangs or does undefined stuff.

Yes, I agree actually. In fact, I was facing this exact problem when trying to run the tests with more than 2 MPI ranks.

Ideally, for robust behavior, the extra MPI ranks should not exit the program, but wait at a synchronization barrier at the end of execution. I am not quite sure how to implement that in a lightweight way, without an arbitrarily large if block after the creation of the sub-communicator. I will give it some thought.
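To make the barrier idea concrete, a rough sketch (names are made up; this is essentially the if block mentioned above, so it does not yet solve the elegance problem):

```julia
using MPI

# Rough sketch of the barrier idea: surplus ranks skip the work and simply
# wait for the others at the end instead of calling exit().
function run_with_spare_ranks(do_work::Function, n_kpt::Integer, comm=MPI.COMM_WORLD)
    MPI.Init()
    if MPI.Comm_rank(comm) < n_kpt
        do_work()          # ranks owning k-points run the calculation
    end
    MPI.Barrier(comm)      # everybody synchronizes at the end of execution
end
```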

I think there are two important questions of form to be addressed:

  • Should an input script look any different when a parallel run is intended?
  • Should the user have to think about creating a sub-communicator or setting up parallelization when writing their input?

I would tend to say no to both of the above. I'll try to come up with something along these lines.

@abussy (Contributor, Author) commented on Aug 23, 2024

The latest commit brings major simplifications to this MPI issue. Instead of building sub-communicators and killing processes in a dangerous fashion, I propose to duplicate some k-points over the empty MPI ranks. The weights are adjusted so that the results remain exact. A warning is issued as well.

This is the lightest fix I could think of. It is also safe and robust, and the user does not have to think about parallelism in their script. The only potential issue I see: if the user queries the k-point mesh after the calculation, they could get a non-irreducible k-mesh with duplicates. One possible solution would be to add an extra field to the basis with the original mesh, or to write a function that always returns the irreducible mesh.
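A rough sketch of the duplication idea (a hypothetical helper, not DFTK's actual implementation): the original k-points stay at the front of the arrays, duplicates are appended at the end, and each duplicated k-point's weight is split evenly among its copies so that all weighted sums are unchanged.

```julia
# Sketch only: pad the k-point list so that every MPI rank owns at least one
# k-point, splitting weights of duplicated k-points evenly among their copies.
function pad_kpoints(kcoords::AbstractVector, kweights::AbstractVector, n_ranks::Integer)
    n_kpt   = length(kcoords)
    n_extra = n_ranks - n_kpt
    n_extra <= 0 && return collect(kcoords), collect(kweights)

    # Number of copies (original included) each k-point ends up with.
    counts = ones(Int, n_kpt)
    for i = 1:n_extra
        counts[mod1(i, n_kpt)] += 1
    end

    # Originals keep their position with reduced weight; duplicates go at the end.
    kcoords_new  = collect(kcoords)
    kweights_new = [kweights[j] / counts[j] for j = 1:n_kpt]
    for j = 1:n_kpt, _ = 2:counts[j]
        push!(kcoords_new,  kcoords[j])
        push!(kweights_new, kweights[j] / counts[j])
    end
    @assert sum(kweights_new) ≈ sum(kweights)   # weighted sums are preserved
    kcoords_new, kweights_new
end
```

For example, pad_kpoints([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]], [0.5, 0.5], 3) would return three k-points with weights [0.25, 0.5, 0.25], the first k-point being duplicated.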

> Regarding the tests: I had in mind for a long time to actually switch the logic. Instead of "disabling" MPI tests where needed, I think it's better to annotate those which do support MPI instead (i.e. instead of a :dont_test_mpi have a simple :mpi on the counter-set). If it's too much work, leave it as is and we do a follow-up ...

I don't mind changing from :dont_test_mpi to :mpi tags in the tests. However, if this PR is accepted, I believe all tests could actually be run in parallel (to be checked by me). In such a case, we could get rid of this tag completely.

> BTW the test failures in the "normal" tests seem fake (because of a too tightly chosen test tolerance, I think), but the cancelled MPI test points to an issue in your implementation (could be related to the exit() btw.)

In my experience, it seems that the anderson.jl test is particularly unstable.

@mfherbst (Member) commented:

I'm in favour of this. @abussy Please update and then we can merge this!

@antoine-levitt Objections ?

@mfherbst (Member) commented:

@abussy The only thing that we have to be careful with is unfolding the k-point mesh when not using symmetries and during IO operations.

@antoine-levitt (Member) commented:

I haven't looked at it in detail, but it looks reasonable, go ahead!

@abussy (Contributor, Author) commented on Oct 25, 2024

Notes:

  • Added an n_irreducible_kpoints field to the PlaneWaveBasis --> used in I/O so that only irreducible k-points are considered
  • Added irreducible_kcoords() and irreducible_kweights() functions for the same reasons
  • BZ unfolding is currently not implemented with MPI, so duplicated k-points won't affect it. Added an extra error message in case MPI support gets implemented
  • Rebased for compatibility with master branch

Comment on lines 286 to 287
@warn("Attempting to parallelize $n_kpt k-points over $n_procs MPI ranks. DFTK does " *
"not support processes empty of k-point. Some k-points were duplicated over the " *
Review comment (Member): That line is a bit long.

Comment on lines 581 to 582
for i_dupl in basis.n_irreducible_kpoints+1:length(basis.kweights_global)
for i_irr in 1:basis.n_irreducible_kpoints
Review comment (Member): We use = when loops are over integers.
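For context, the convention being referred to, as a generic example (not code from this PR):

```julia
for i = 1:3                 # loop over an integer range: use `=`
    println(i)
end
for label in ["Γ", "X"]     # loop over a collection: use `in`
    println(label)
end
```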

@@ -5,6 +5,7 @@ import MPI
Number of processors used in MPI. Can be called without ensuring initialization.
"""
mpi_nprocs(comm=MPI.COMM_WORLD) = (MPI.Init(); MPI.Comm_size(comm))
mpi_rankid(comm=MPI.COMM_WORLD) = (MPI.Init(); MPI.Comm_rank(comm))
Review comment (Member): Unused function ... remove or test.

src/symmetry.jl Outdated
@@ -414,6 +414,10 @@ function unfold_array(basis_irred, basis_unfolded, data, is_ψ)
if !(basis_irred.comm_kpts == basis_irred.comm_kpts == MPI.COMM_WORLD)
error("Brillouin zone symmetry unfolding not supported with MPI yet")
end
if basis_irred.n_irreducible_kpoints < mpi_nprocs(basis_irred.comm_kpts)
#Note: if this routine is ever generalised for MPI, need special care for duplicated KP
Review comment (Member): Space after #. May also be a bit too long.

test/PlaneWaveBasis.jl: resolved review thread
src/input_output.jl: resolved review thread
# Assume that duplicated k-points are appended at the end of the kcoords/kweights array
for i_dupl in basis.n_irreducible_kpoints+1:length(basis.kweights_global)
for i_irr in 1:basis.n_irreducible_kpoints
if maximum(abs.(basis.kcoords_global[i_dupl]-basis.kcoords_global[i_irr])) < eps(T)
Review comment (Member): A few comments:

  • maximum(abs, ... ) is better (no allocation).
  • Hmm. This mechanism may be a little brittle. I'd be much more defensive here, e.g. make sure each i_irr only matches a single k-point, e.g. by using only() in combination with findall() or so. Similarly, I'd put an assertion that the weights add up to the total electron count or so (see how it's done in other places; the correct number depends on the computational setup).

Also too long a line 😄.
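As an illustration of this suggestion, a rough sketch of what a more defensive version could look like (hypothetical helper name, not the merged implementation; see also the later discussion about band-structure k-paths, where genuine duplicates can make exact-match-only logic too strict):

```julia
# Sketch of the reviewer's suggestion: fold the weight of every duplicated
# k-point back onto exactly one irreducible k-point and sanity-check the total.
function folded_kweights_sketch(basis)
    n_irr   = basis.n_irreducible_kpoints
    T       = eltype(eltype(basis.kcoords_global))
    weights = copy(basis.kweights_global[1:n_irr])
    for i_dupl = n_irr+1:length(basis.kweights_global)
        # `only` throws unless exactly one irreducible k-point matches.
        i_irr = only(findall(k -> maximum(abs, k - basis.kcoords_global[i_dupl]) < eps(T),
                             basis.kcoords_global[1:n_irr]))
        weights[i_irr] += basis.kweights_global[i_dupl]
    end
    # Placeholder check; the appropriate total depends on the computational setup.
    @assert sum(weights) ≈ sum(basis.kweights_global)
    weights
end
```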

src/PlaneWaveBasis.jl: four resolved review threads
@abussy (Contributor, Author) commented on Oct 30, 2024

Addressed most of the concerns raised by @mfherbst during his review. I'd like to raise two points for discussion:

  1. I gave some thought to changing the :dont_test_mpi test tag to :mpi. I think that, by default, DFTK should be MPI compatible. Any method or function not implemented for MPI is then an exception, and using the :dont_test_mpi test flag makes that clear. I am afraid that changing to :mpi instead would lower the impact of the tag.

  2. When reworking the irreducible_kweights_global() function in PlaneWaveBasis.jl, I realised that in some cases duplicated k-points already exist (even in non-parallel runs), and I refrained from using functions such as findall(). In particular, when calculating band structures with the compute_bands() function, the k-point path might go through special points multiple times, on purpose. As a result, I kept the logic of looking for replicated k-points explicitly in basis.kcoords_global[basis.n_irreducible_kpoints+1:end]. I might change the nomenclature though, as the original set of k-points is not necessarily the irreducible one.

@mfherbst (Member) commented on Oct 30, 2024

> I gave some thought to changing the :dont_test_mpi test tag to :mpi. I think that, by default, DFTK should be MPI compatible. Any method or function not implemented for MPI is then an exception, and using the :dont_test_mpi test flag makes that clear. I am afraid that changing to :mpi instead would lower the impact of the tag.

Hmm. Yes, but I think that's ok, because it's fine if only the core is tested with MPI and not the rest. But let's think about this some more and leave as is for now.

> I realised that in some cases, duplicated k-points already exist (even in non-parallel runs), ... In particular, when calculating band structures with the compute_bands() function, ...

True, good point.

But still there is a problem: the coordinates can be the same if the spin is different. I'm not sure you handle this properly right now?

Update: Ah no, this is before the entire spin business.

src/PlaneWaveBasis.jl: two further review threads
@mfherbst merged commit ea0ffe4 into JuliaMolSim:master on Oct 31, 2024
7 of 8 checks passed