fix: unable to find output_dir in multi-GPU during resume_from_checkpoint check (foundation-model-stack#352)

* fix: output_dir doesn't exist during resume_from_checkpoint

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

* fix: fmt

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>

---------

Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
Signed-off-by: Anh Uong <anh.uong@ibm.com>
Abhishek-TAMU authored Sep 26, 2024
1 parent 1350f8a commit 0c6a062
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions build/accelerate_launch.py
@@ -98,6 +98,8 @@ def main():
     #
     ##########
     output_dir = job_config.get("output_dir")
+    if not os.path.exists(output_dir):
+        os.makedirs(output_dir)
     try:
         # checkpoints outputted to tempdir, only final checkpoint copied to output dir
         launch_command(args)
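
Note on the change above: the check and the directory creation are two separate steps, and `job_config.get("output_dir")` returns `None` when the key is missing. Below is a minimal sketch of one possible alternative, not what this commit does. It assumes a hypothetical helper named `ensure_output_dir` (not part of this commit or of the repository) and uses a plain dict in place of `job_config`.

```python
import os
import tempfile


def ensure_output_dir(job_config: dict) -> str:
    """Hypothetical helper: create job_config["output_dir"] if needed and return it."""
    output_dir = job_config.get("output_dir")
    if not output_dir:
        # Fail early with a clear message instead of passing None to os functions.
        raise ValueError("job_config must define a non-empty 'output_dir'")
    # exist_ok=True makes the call idempotent, so no separate os.path.exists()
    # check is needed and an already-existing directory is not an error.
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "run", "output")
        ensure_output_dir({"output_dir": target})
        ensure_output_dir({"output_dir": target})  # second call is a no-op
        print(os.path.isdir(target))
```

Whether the explicit `ValueError` is wanted depends on how the launcher validates `job_config` elsewhere, which this diff does not show.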
