Looking ahead, implementing MPI domain decomposition might require moving some loops inside generic advection functions. Also, as we add more dimensions, the current structure (from #34), where we need to get the correct index range for each level of the loop, will become more complicated and fragile.
This issue is to discuss possible improvements that would make the code more maintainable (for example if we want to re-order array dimensions in future without it being as painful as #22 was).
NamedDims.jl
I think using NamedDims.jl (https://github.com/invenia/NamedDims.jl) would be a good start. NamedDimsArray is a zero-cost abstraction that adds names to the dimensions of an array.
It would let us index arrays in a more readable way, and because names are used the order of indices doesn't matter! E.g.
julia> using NamedDims
julia> f = NamedDimsArray{(:z,:y,:x)}(rand(4,3,2));
julia> f[x=2,z=3] # this works, giving a slice
3-element NamedDimsArray{(:y,), Float64, 1, Vector{Float64}}:
0.8249968213028751
0.9973308654206787
0.2462796811020469
julia> f[z=3,x=2] # this works too - index order doesn't matter!
3-element NamedDimsArray{(:y,), Float64, 1, Vector{Float64}}:
0.8249968213028751
0.9973308654206787
0.2462796811020469
Where to keep loop ranges?
It would be useful to have just one struct containing all the loop ranges. One option, keeping the choice from #34 to put the ranges in coordinate structs, would be to create a wrapper coordinates struct containing all the coordinates. An alternative would be to take the loop ranges out of coordinate and put them into a struct of their own. I think the second option might be best: as we go to more dimensions we will end up with a large number of ranges, so it would be good to hide them away somewhere.
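As a concrete sketch of the second option, a dedicated struct could gather one range per dimension per loop-nesting level. Everything below is hypothetical (the field names just mirror the loop example later in this issue, and the sizes are made up), not the actual code:

```julia
# Hypothetical sketch (not the actual code): one struct gathering every
# loop range, instead of storing ranges inside each coordinate struct.
struct LoopRanges
    species_range_level0::UnitRange{Int}
    x_range_level1::UnitRange{Int}
    y_range_level2::UnitRange{Int}
    z_range_level3::UnitRange{Int}
    vpa_range_level4::UnitRange{Int}
    # ...one field per dimension per nesting level; this list grows
    # quickly as dimensions are added, which is why hiding it all in
    # a single struct seems attractive.
end

# Constructed once at the start of a run and then treated as constant.
loop_ranges = LoopRanges(1:2, 1:4, 1:4, 1:8, 1:16)
```

Because every field is a concrete `UnitRange{Int}`, accessing the ranges through this struct should be type-stable and cost nothing compared with passing the ranges individually.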
Loop macros?
The final part, which I'm less sure how to design well, would be a macro (?) to create the for-loops. I'd like to replace something like
for is in loop_ranges.species_range_level0,
ix in loop_ranges.x_range_level1,
iy in loop_ranges.y_range_level2,
iz in loop_ranges.z_range_level3,
ivpa in loop_ranges.vpa_range_level4
@views update_f1!(pdf[vpa=ivpa,z=iz,y=iy,x=ix], ...)
end
block_synchronize()
for is in loop_ranges.species_range_level0,
ix in loop_ranges.x_range_level1,
iy in loop_ranges.y_range_level2,
ivpa in loop_ranges.vpa_range_level3,
ivperp in loop_ranges.vperp_range_level4
@views update_f2!(pdf[vperp=ivperp,vpa=ivpa,y=iy,x=ix], ...)
end
with maybe
@loop_s_x_y_z_vpa is ix iy iz ivpa begin
@views update_f1!(pdf[vpa=ivpa,z=iz,y=iy,x=ix], ...)
end
block_synchronize()
@loop_s_x_y_vpa_vperp is ix iy ivpa ivperp begin
@views update_f2!(pdf[vperp=ivperp,vpa=ivpa,y=iy,x=ix], ...)
end
Questions I have:
Is it better to pass loop_ranges (or equivalent) as a parameter to the macro, or to keep it as a 'global' (within some module namespace) variable? I'm leaning towards taking the loop ranges out of coordinate, putting them in a dedicated struct like loop_ranges, and then making that a pseudo-global: it will be constant for the whole of any run, and it will be much simpler not to have to pass it into every function.
Do we pass the loop variables as parameters to the macro (as in the example above), or hard-code them? I have slight doubts about how clear things would be if we do re-order array dimensions, but once we're using NamedDims.jl we can treat the memory layout as an implementation detail and write code using a fixed, logical order (like {species,x,y,z,vpa,vperp}). If we hard-code the loop variables there is less to type, but the loop variables appear 'by magic' out of the macro (which I don't like much), i.e. the loop would look like
@loop_s_x_y_z_vpa begin
@views update_f1!(pdf[vpa=ivpa,z=iz,y=iy,x=ix], ...)
end
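The hard-coded variant can be sketched too (again hypothetical, down to two dimensions for brevity): the macro must esc() the fixed loop-variable symbols so they become visible in the caller's scope, which is exactly the 'by magic' behaviour described above:

```julia
# Hypothetical sketch: loop variables hard-coded inside the macro.
# esc(:is) / esc(:ix) deliberately break macro hygiene so that `is` and
# `ix` are visible to the caller's loop body.
const loop_ranges = (s = 1:2, x = 1:3)

macro loop_s_x(body)
    quote
        for $(esc(:is)) in loop_ranges.s,
            $(esc(:ix)) in loop_ranges.x
            $(esc(body))
        end
    end
end

total = Ref(0)
@loop_s_x begin
    total[] += is * ix   # `is` and `ix` appear 'by magic' from the macro
end
# total[] == (1 + 2) * (1 + 2 + 3) == 18
```

The deliberate hygiene-breaking is the technical cost of this variant: nothing in the caller's code declares `is` or `ix`, so a reader has to know the macro's convention.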
Fancier generic looping?
I'd originally hoped we'd be able to do something generic like a derivative!() function that takes a dim argument and internally loops over all dimensions except dim while calculating a derivative along dim. Still thinking about whether this is possible...
One thing that doesn't work is getting the dimensions from an array inside a macro and operating on them: the dimensions are encoded in the type of the NamedDimsArray, but macros operate only on symbols and have no access to type information.
It might be possible to just hard-code a list of dimension names and have a macro select the appropriate loop (like @loop_s_x_y_z_vpa, etc.) using a dim parameter.
Another possibility might be to somehow use dynamic dispatch and pass in the 'kernel function' that should be called inside the loop. Something like (for inner loop over z)
function multidim_loop(inner_dim::Val{:z}, kernel::Function, args...)
@loop_s_x_y_vpa_vperp is ix iy ivpa ivperp begin
kernel((s=is, x=ix, y=iy, vpa=ivpa, vperp=ivperp), args...)
end
end
and duplicates (possibly auto-generated) for inner loops over the other dimensions. I'm not sure it's possible to pass the indices into the kernel like this: it seems not to be supported natively by NamedDims.jl at the moment, and I'm not sure whether it would be easy to add with a new method of to_index(), or whether it's a bad idea because it wouldn't compile away, so we'd pay the cost of creating a NamedTuple to pass into kernel() at every step of the loop... It might be possible to assume more about the structure of the arguments to kernel, and do something like
function multidim_loop(inner_dim::Val{:z}, kernel::Function, f::AbstractArray, args...)
@loop_s_x_y_vpa_vperp is ix iy ivpa ivperp begin
kernel(f[s=is, x=ix, y=iy, vpa=ivpa, vperp=ivperp], args...)
end
end
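To make the dynamic-dispatch idea concrete, here is a self-contained toy (plain arrays with made-up sizes instead of NamedDimsArray, and only two dimensions): the inner dimension is selected by a Val tag, and the kernel receives a view of the slice along that dimension, as in the second variant above.

```julia
# Self-contained toy of the dynamic-dispatch pattern (hypothetical, not
# the actual code): dispatch on Val{:z} selects the loop whose inner
# dimension is z, and the kernel gets a view of each z-slice.
function multidim_loop(::Val{:z}, kernel!::Function, f::AbstractArray)
    # Loop over every dimension except z; in this toy f has dims (x, z).
    for ix in axes(f, 1)
        kernel!(view(f, ix, :))
    end
end

# A trivial in-place kernel standing in for e.g. a derivative along z.
double!(fslice) = (fslice .*= 2)

f = ones(2, 3)
multidim_loop(Val(:z), double!, f)
# every element of f is now 2.0
```

Because the dimension tag is in the type (`Val{:z}`), the right loop is chosen at compile time; the remaining question is whether the call to `kernel!` itself is devirtualized, which may need a function barrier or a concretely typed kernel argument.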
Initializing NamedDimsArrays and structs containing them was annoyingly fiddly. It was tricky to get all our initialization code to be type-stable. The advection_info and semi_lagrange_info structs (which contain multi-dimensional arrays, but are created in specialized versions for each dimension in which advection is needed) needed quite a few dimension labels added so that all the types could be inferred at compile time.
Although I like the idea of the NamedDims.jl package, I think it is not worth using for us at the moment. Right now it seems like it would be more of a burden than a help for code maintenance, so it's definitely not worth paying even small potential performance costs.
If the Julia compiler can get even smarter at compile-time inference, and NamedDims.jl is developed further, the 'zero-cost' aim might be reached. Could be worth revisiting in a year or two...