Complex regression tests fail when ADflow built with new intel compilers #357

Open
A-CGray opened this issue May 24, 2024 · 2 comments

A-CGray commented May 24, 2024

Description

A handful of the complex ADflow regression tests fail on the latest docker PR, which uses the new Intel ifx and mpiifx compilers. The mismatches are just outside the test tolerances, so most likely we need to re-train the tests.

Current behavior

 /home/***/repos/adflow/tests/reg_tests/test_adjoint.py:TestCmplxStep_2_laminar_tut_wing.cmplx_test_aero_dvs  ... FAIL (00:00:54.57, 1171 MB)
Traceback (most recent call last):
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 392, in multi_proc_exception_check
    yield
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 199, in root_add_dict
    self._add_dict(name, d, name, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 365, in _add_dict
    self._add_dict(key, d[key], full_name, rtol=rtol, atol=atol, db=db[dict_name])
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 367, in _add_dict
    self._add_values(key, d[key], rtol=rtol, atol=atol, db=db[dict_name], full_name=full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-08, atol=5e-10
Failed value for: Eval Functions Sens:: mdo_tutorial_cd: mdo_tutorial_cl: mdo_tutorial_cmz: mach_mdo_tutorial
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 1.13985595e-09
Max relative difference: 1.96705549e-08
 x: array(0.057947)
 y: array(0.057947)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/***/repos/adflow/tests/reg_tests/test_adjoint.py", line 372, in cmplx_test_aero_dvs
    self.handler.root_add_dict("Eval Functions Sens:", funcsSens, rtol=rtol, atol=atol)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 197, in root_add_dict
    with multi_proc_exception_check(self.comm):
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 409, in multi_proc_exception_check
    raise exc[0](msg).with_traceback(exc[2])
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 392, in multi_proc_exception_check
    yield
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 199, in root_add_dict
    self._add_dict(name, d, name, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 365, in _add_dict
    self._add_dict(key, d[key], full_name, rtol=rtol, atol=atol, db=db[dict_name])
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 367, in _add_dict
    self._add_values(key, d[key], rtol=rtol, atol=atol, db=db[dict_name], full_name=full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: Exception raised on rank 0: 
Not equal to tolerance rtol=1e-08, atol=5e-10
Failed value for: Eval Functions Sens:: mdo_tutorial_cd: mdo_tutorial_cl: mdo_tutorial_cmz: mach_mdo_tutorial
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 1.13985595e-09
Max relative difference: 1.96705549e-08
 x: array(0.057947)
 y: array(0.057947)
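
To see how narrowly this check fails, the sketch below redoes the tolerance arithmetic using only the numbers printed in the assertion message. It assumes the documented `np.testing.assert_allclose` criterion, `|actual - desired| <= atol + rtol * |desired|`, and recovers the reference magnitude from the reported absolute and relative differences because the printed arrays are rounded. The variable names are illustrative only.

```python
# Values copied from the assertion message in the traceback above.
rtol = 1e-08
atol = 5e-10
max_abs_diff = 1.13985595e-09
max_rel_diff = 1.96705549e-08

# The printed arrays are rounded to 0.057947, so recover the reference
# magnitude from the ratio of the two reported differences.
reference = max_abs_diff / max_rel_diff

# assert_allclose accepts a value when |actual - desired| <= atol + rtol * |desired|.
allowed = atol + rtol * abs(reference)

# How far past the allowed difference the observed difference is.
ratio = max_abs_diff / allowed

print(f"reference magnitude : {reference:.6f}")
print(f"allowed difference  : {allowed:.4e}")
print(f"observed difference : {max_abs_diff:.4e}")
print(f"observed / allowed  : {ratio:.3f}")
```

The observed difference exceeds the allowed one by only about 6%, which is consistent with a small compiler-dependent change in floating-point results rather than a broken derivative.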

Expected behavior

Tests should pass

Code versions

  • Operating System:
  • Python:
  • OpenMPI:
  • CGNS:
  • PETSc:
  • Compiler:
  • This repository:

A-CGray commented Oct 29, 2024

When I run the ADflow tests on the public:u22-intel-impi-latest-amd64 image from https://github.com/mdolab/docker/pull/266 on my machine, many of the tests fail with the following errors. Any idea what's going on here, @eirikurj?

(mpi) ./tests/reg_tests/test_functionals.py:TestFunctionals_2_euler_matrix_jst_tut_wing.test_forces_and_tractions  ... FAIL (00:00:0.00, 0 MB)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3826 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3827 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

(mpi) ./tests/reg_tests/test_functionals.py:TestFunctionals_2_euler_matrix_jst_tut_wing.test_functions  ... FAIL (00:00:0.00, 0 MB)

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 3837 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 3838 RUNNING AT 9dee14de4df5
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

@eirikurj
Contributor

You probably need to increase the shared memory size. You can add a flag when starting the container: docker run --shm-size=XX. The default is 64MB, but you can increase it significantly, e.g., add --shm-size=2G for 2GB. That is probably more than is needed in general (something on the order of 100MB, e.g., 256MB, is probably sufficient), but it should be fine since we should have plenty of RAM and not too many containers running. Feel free to experiment. If you don't want to bother with per-container settings, you can add the following to /etc/docker/daemon.json:

{
    "default-shm-size": "2G"
}

With a daemon-wide default, though, you might want to keep the value smaller. See if this resolves your immediate problem.
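
A quick way to confirm that a new shared memory size took effect is to query `/dev/shm` from inside the container before launching the tests. The following is a minimal sketch using only the standard library. The helper name `shm_report` is hypothetical, and the 256MB default threshold is an assumption taken from the rough estimate above, not a measured requirement.

```python
import os
import shutil


def shm_report(path="/dev/shm", min_bytes=256 * 1024**2):
    """Return (total_bytes, ok) for the filesystem mounted at `path`.

    `ok` is True when the filesystem is at least `min_bytes` in size.
    The default threshold is an assumed value, not a measured requirement.
    """
    total = shutil.disk_usage(path).total
    return total, total >= min_bytes


if __name__ == "__main__":
    # Fall back to the current directory on systems without /dev/shm.
    path = "/dev/shm" if os.path.isdir("/dev/shm") else "."
    total, ok = shm_report(path)
    status = "looks sufficient" if ok else "may be too small"
    print(f"{path}: {total / 1024**2:.0f} MiB total, {status}")
```

Run inside the container, this should report the value passed to --shm-size (or the daemon default) as the total size of `/dev/shm`.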
