
Rework parallel doc example using CalcJobs #288

Open · wants to merge 20 commits into main
Conversation

@agoscinski (Contributor)

The previous example relied on calcfunctions, which always run sequentially. This example now uses CalcJobs to achieve genuinely parallel execution.

@superstar54 I removed the old example because it used calcfunctions and its purpose was not clear to me, but I might be missing something you actually wanted to show with it.

@codecov-commenter

codecov-commenter commented Sep 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.64%. Comparing base (5937b88) to head (d28bb99).
Report is 79 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #288      +/-   ##
==========================================
+ Coverage   75.75%   80.64%   +4.89%     
==========================================
  Files          70       66       -4     
  Lines        4615     5147     +532     
==========================================
+ Hits         3496     4151     +655     
+ Misses       1119      996     -123     
Flag         Coverage      Δ
python-3.11  80.55% <ø>    (+4.88%) ⬆️
python-3.12  80.55% <ø>    (?)
python-3.9   80.60% <ø>    (+4.86%) ⬆️

Flags with carried forward coverage won't be shown.


@agoscinski agoscinski marked this pull request as ready for review September 2, 2024 08:52
@superstar54 (Member)

superstar54 commented Sep 3, 2024

Hi @agoscinski , thanks for your efforts on this.

Yes, using CalcJob is indeed the correct approach for running jobs in parallel.

The old example, which utilizes a for loop to create multiple tasks, reflects many real-world scenarios where parallel execution is essential. For instance:

  • Calculating the equation of state, where multiple SCF tasks are created and need to run in parallel.
  • Generating multiple surface slabs from a crystal structure.
  • Simulating molecule adsorption on the non-equivalent sites of a surface slab.
  • Performing XPS calculations on non-equivalent sites.

Besides, in most cases, after these tasks run in parallel, we need to gather and process the results.

The old example is quite valuable for realistic applications, as already mentioned in this post. Therefore, I’d like to suggest keeping the old example but updating it to replace the calcfunction with CalcJob. Additionally, the provenance graph is very informative and beneficial for users to understand the workflow, so I recommend retaining it as well.

@agoscinski (Contributor, Author)

I wanted to put the for loop back into the notebook, but compare it to executing WorkGraphs separately, so I can relate it to what is written here: https://aiida.discourse.group/t/run-only-one-job-on-local-machine/459/2
However, I am not sure how to actually get the time when the process has finished. I first used mtime, as I thought the node is modified after the execution has finished, but the resulting mtime - ctime is below the sleep time. Is there another way to time a process started with submit? I don't want to time it in the notebook, since I need to submit two workgraphs.

from aiida import load_profile
from aiida.calculations.arithmetic.add import ArithmeticAddCalculation
from aiida.common.exceptions import NotExistent
from aiida.orm import InstalledCode, load_code, load_computer, load_node
from aiida_workgraph import WorkGraph, task

load_profile()

# The ArithmeticAddCalculation needs to know where bash is stored
try:
    code = load_code("add@localhost")  # The computer label can also be omitted here
except NotExistent:
    code = InstalledCode(
        computer=load_computer("localhost"),
        filepath_executable="/bin/bash",
        label="add",
        default_calc_job_plugin="core.arithmetic.add",
    ).store()


@task.graph_builder()
def parallel_add(nb_it):
    wg = WorkGraph()
    code = load_code("add@localhost")
    for i in range(nb_it):
        add = wg.add_task(ArithmeticAddCalculation, name=f"add{i}", x=5, y=i, code=code)
        add.set({"metadata.options.sleep": 5})
    return wg


wg = WorkGraph("parallel_graph_builder")
t = wg.add_task(parallel_add, nb_it=2)

wg.submit(wait=True)
print(load_node(wg.pk).mtime - load_node(wg.pk).ctime)

@superstar54 (Member)

but the resulting mtime-ctime is below the sleeping time.

This shouldn't happen. Could you double-check?

@agoscinski (Contributor, Author)

Ah, it is because of caching. Even after I disabled caching with verdi, it is still somehow active; I am not sure why. I will try to reproduce it later and open an issue on aiida-core.

@agoscinski agoscinski force-pushed the parallel-rework branch 2 times, most recently from 45eb592 to c977986 on September 10, 2024 08:09
docs/source/conf.py (outdated, resolved)
@agoscinski
Copy link
Contributor Author

agoscinski commented Sep 10, 2024

I had to rename the parallel file so that no cached version is used by Read the Docs. I am still confused why this happened, since I enforced a rerun on the sphinx-gallery side using run_stale_examples; it might be due to some caching on the RTD side. When the PR has been reviewed, I will change the name back.

@superstar54 (Member) left a comment

Hi @agoscinski, thanks for the work. I do have one concern: the current execution time of 2 minutes and 4.786 seconds is quite long.
Here are my suggestions:

  • The "Parallelizing WorkGraphs" section is not needed. Take one use case as an example: I have a PwRelax workgraph and a large set of structures to relax. Users just need to submit multiple workgraphs directly in a simple loop, without waiting for each to finish. If users write another workgraph to manage parallel execution, they need to think about how to pass inputs into the sub-workgraph and how to handle potential interruptions of the top-level workgraph. While tracking the provenance of 100 workgraphs submitted together might be a potential benefit, it's not something most users would require.
  • Add a description of the workgraph execution mechanism at the beginning of the notebook: workgraphs use dependency-driven execution, meaning tasks run in parallel automatically when they have no dependencies on each other and there are sufficient resources. I recall you demonstrated this well in the EuroSciPy tutorial, with a workgraph showing the order in which processes execute in parallel. You could include that example here, and also show the GUI for better visualization.
  • Better to show the GUI for every workgraph: this will help users easily see which tasks will run in parallel.
  • Mention the result-gathering process: typically, users will want to gather the results after the parallel tasks are complete. You can add a link to the aggregate notebook; if it's not ready yet, include a comment that it will be added later.
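The dependency-driven execution described above can be illustrated without AiiDA at all: a task becomes eligible as soon as all of its dependencies have finished, so tasks at the same "level" of the dependency graph can run concurrently. A toy sketch (a hypothetical helper for illustration, not WorkGraph code):

```python
def parallel_levels(deps):
    """Group tasks into levels that could run concurrently.

    ``deps`` maps each task name to the set of tasks it depends on.
    Returns a list of sets; every task in a set has all of its
    dependencies satisfied by earlier sets, so the members of one set
    are mutually independent and could execute in parallel.
    """
    remaining = {t: set(d) for t, d in deps.items()}
    levels = []
    done = set()
    while remaining:
        # All tasks whose dependencies have already completed
        ready = {t for t, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("cyclic dependencies")
        levels.append(ready)
        done |= ready
        for t in ready:
            del remaining[t]
    return levels
```

For the thread's running example, two independent additions feeding a sum task give `[{"add0", "add1"}, {"sum"}]`: the two adds can run side by side, while the sum must wait for both.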

docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
@task.graph_builder(
inputs=[{"name": "integer"}], outputs=[{"name": "sum", "from": "sum_task.result"}]
)
def add10_wg(integer):
superstar54 (Member):

Why use this name add10_wg?

agoscinski (Contributor, Author):

add10 because it adds 10 to a number. The wg suffix, I think, reflects a general problem I have when referring to a graph builder: it can be a function as defined here, a task when integrated into a WorkGraph, and it is actually a WorkGraph that is executed. Because the whole tutorial is about parallelizing workgraphs, I added wg to the name, but we can also remove it.

agoscinski (Contributor, Author):

Renamed to add10; I think that is more consistent with the rest.

docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
docs/gallery/howto/autogen/parallel_wf.py (outdated, resolved)
@agoscinski (Contributor, Author)

agoscinski commented Sep 11, 2024

The "Parallelizing WorkGraphs" section is not needed. Take one use case as an example: I have a PwRelax workgraph and a large set of structures to relax. Users just need to submit multiple workgraphs directly in a simple loop, without waiting for each to finish. If users write another workgraph to manage parallel execution, they need to think about how to pass inputs into the sub-workgraph and how to handle potential interruptions of the top-level workgraph. While tracking the provenance of 100 workgraphs submitted together might be a potential benefit, it's not something most users would require.

I would not rely on any QE-dependent documentation pages, as I think we should move those to aiida-tutorials. Also, this example is more focused on parallel execution, measuring timings, and talking about daemons. If there is a user base for which both use cases might be useful, then we should keep it (at least until we get more feedback from outside). If the concern is the running time, then we should work on improving that. I will reduce the sleeps and the number of iterations at the end to bring the running time down to 1 minute.

Add a description of the workgraph execution mechanism at the beginning of the notebook: Workgraphs are based on dependency-driven execution, meaning parallel tasks run automatically if the tasks have no dependency on each others, and there are sufficient resources. I recall you demonstrated this well in the EuroScipy tutorial, with a workgraph showing how the process works in parallel (the execution order). You could include that example here, and also show the GUI for better visualization.

This is mentioned in the second sentence of the example. I will include the GUI and write comments in the first example emphasizing this. Maybe you can note in the code where I should also mention it.

Better to show the GUI for every workgraph: This will help users easily see which tasks will be run in parallel.

Included!

Mention the result-gathering process: Typically, users will want to gather the results after parallel tasks are complete. You can add a link to the aggregate notebook, and if it's not ready yet, you can include a comment that it will be added later.

Okay, I reference it at the end, but the Sphinx reference does not work for the moment, since we need to wait for #287 to be merged.

@agoscinski (Contributor, Author)

I decreased the time to 1 min 11 s by reusing the earlier graph builder runs for the daemon part. Also, I keep just 2 iterations, as this is enough to showcase the effect of daemons. I reduced the sleep time to 3 seconds.

@superstar54 (Member)

Hi @agoscinski, the docs failed to build and the pre-commit failed. Could you please fix them? I will review as soon as they pass.

@@ -0,0 +1,278 @@
"""
agoscinski (Contributor, Author):

Note that RTD caches the other file somehow, therefore I renamed it. I don't fully understand this.

The previous example did rely on calcfunctions that are always
run sequentially. This example now uses CalcJobs to actually
achieve parallel executions.
Comment on lines +266 to +267
# Be aware that for the moment AiiDA can only run 200 WorkGraphs at the same time.
# To increase that limit one can set this variable to a higher value.
superstar54 (Member):

Suggested change
# Be aware that for the moment AiiDA can only run 200 WorkGraphs at the same time.
# To increase that limit one can set this variable to a higher value.
# Be aware that for the moment, AiiDA can only run 200 processes (WorkGraph, CalcJob etc) at the same time.
# To increase that limit, one can set this variable to a higher value.
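For reference, the "variable" mentioned above is not named in the thread; my assumption is that in recent aiida-core versions this limit corresponds to the `daemon.worker_process_slots` config option (default 200 slots per daemon worker). A sketch of how one would raise it under that assumption:

```shell
# Assumption: the 200-process limit maps to this aiida-core config option;
# verify the option name and current value with `verdi config list`.
verdi config set daemon.worker_process_slots 400
verdi daemon restart
```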

Comment on lines +227 to +229
# Since each daemon worker can only manage one WorkGraph (handling the results)
# at a time, one can experience slow downs when running many jobs that can be
# run in parallel. The optimal number of workers depends highly on the jobs
superstar54 (Member):

Suggested change
# Since each daemon worker can only manage one WorkGraph (handling the results)
# at a time, one can experience slow downs when running many jobs that can be
# run in parallel. The optimal number of workers depends highly on the jobs
# One can experience slow downs when running many jobs (e.g., 100 jobs) that can be
# run in parallel. The optimal number of workers depends highly on the jobs
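For the worker comparison being discussed, the standard verdi CLI can start the daemon with a given number of workers or add workers to a running daemon; the counts below are only illustrative:

```shell
# Start the daemon with two workers
verdi daemon start 2
# Or add one more worker to an already running daemon
verdi daemon incr 1
# Shows daemon state, including the number of running workers
verdi daemon status
```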

Comment on lines 254 to 255
# The overhead time has shortens a bit as the handling of the CalcJobs and
# WorkGraphs could be parallelized. One can increase the number of iterations
superstar54 (Member):

Are you sure that the overhead time has shortened?

Looking at the time, there is no improvement.

Time for running parallelized graph builder 0:00:11.262496
Time for running parallelized graph builder with 2 daemons 0:00:11.264958

Comment on lines 272 to 273
# verdi daemon restart

superstar54 (Member):

It would be good to add a link to the performance page.

3 participants