From 175e6dc224e330fb18bfc6444e8fba53792ab500 Mon Sep 17 00:00:00 2001 From: ifilot Date: Thu, 24 Aug 2023 12:11:46 +0200 Subject: [PATCH 1/9] Improving installation instructions --- docs/installation.rst | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/docs/installation.rst b/docs/installation.rst index 27ab8e6..59278ea 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -11,6 +11,7 @@ available to you: * `Eigen3 `_ (matrix algebra) * `Boost `_ (common routines) * `TCLAP `_ (command line instruction library) +* `CMake `_ (build tool) On Debian-based operating systems, one can run the following:: @@ -21,6 +22,9 @@ On Debian-based operating systems, one can run the following:: is to use `Debian for Windows Subsystem for Linux (WSL) `_. The compilation instructions below can be readily used. +.. warning:: + In order to compile for GPU using CUDA, one needs Eigen3 version **3.4.0** or higher. + Compilation ----------- @@ -47,8 +51,8 @@ CUDA support version 2, you can use CUDA from the Linux environment under Windows. Detailed instructions are given `here `_. -The similarity analysis functionality of :program:`Bramble` significantly -benefits from the availability of a graphical card. To compile :program:`Bramble` +The similarity analysis functionality of :program:`Bramble` can +benefit from the availability of a graphical card. To compile :program:`Bramble` with CUDA support, run CMake with:: cmake ../src -DMOD_CUDA=1 -DCUDA_ARCH= @@ -56,7 +60,7 @@ with CUDA support, run CMake with:: wherein `` is replaced with the architecture of your graphical card. For example, if you use an RTX 4090, you would use ``-DCUDA_ARCH=sm_89``. To test that :program:`Bramble` can use your GPU, you can run the ``bramblecuda`` -tool:: +tool whose sole function is to test for the availability of a GPU on the system:: ./bramblecuda @@ -79,9 +83,12 @@ Typical output should look as follows:: Peak Memory Bandwidth (GB/s): 1008.1 .. 
note:: - There is currently no support for using multiple GPUs. :program:`Bramble` - automatically selects the first GPU available and executes the code on this - GPU. Multi-GPU support is however in development. + * There is currently no support for using multiple GPUs. :program:`Bramble` + automatically selects the first GPU available and executes the code on this + GPU. Multi-GPU support is however in development. + * The functionality of `bramblecuda` is only for showing information on the + GPUs available on your system. The actual GPU-accelerated calculation is + still handled by the `bramble` executable. Testing ------- From ca2039597c8e985b7f2b6d480afdb2c97f6cf17a Mon Sep 17 00:00:00 2001 From: Ivo Filot Date: Sat, 26 Aug 2023 21:32:43 +0200 Subject: [PATCH 2/9] Adding execution times --- docs/examples.rst | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/docs/examples.rst b/docs/examples.rst index bbfcef2..c86ac8f 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -163,3 +163,20 @@ to bulk atoms, :math:`\mu_{ij} \approx 36` is found. .. figure:: _static/img/similarity_analysis_co1121.png :align: center + +Execution times +*************** + +To get an impression of typical execution times and the benefit of GPU +acceleration, we refer to the Table as seen below. + +.. 
list-table:: Execution times for the Co HCP 11-21 + :widths: 50 50 + :header-rows: 1 + + * - System + - Execution time (averaged) + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + - 172.58s + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + RTX 4090 + - 90.18s From 22e7d17fc52fa96bf68b4700abd41b5b9ce9ca9a Mon Sep 17 00:00:00 2001 From: Ivo Filot Date: Sat, 26 Aug 2023 21:57:21 +0200 Subject: [PATCH 3/9] Migrating execution times --- .gitignore | 2 ++ docs/examples.rst | 35 ++++++++++++++++++----------------- 2 files changed, 20 insertions(+), 17 deletions(-) diff --git a/.gitignore b/.gitignore index 1219a46..4765fcd 100644 --- a/.gitignore +++ b/.gitignore @@ -39,3 +39,5 @@ examples/data # Sphinx docs docs/_build/ docs/userguide/build* +pa_*.txt +sa_*.txt diff --git a/docs/examples.rst b/docs/examples.rst index c86ac8f..b611b1c 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -61,10 +61,26 @@ between the surface and bulk atoms amounts to :math:`\mu_{ij} = 30.8`. .. figure:: _static/img/similarity_analysis_rh111.png :align: center +Execution times +*************** + +To get an impression of typical execution times and the benefit of GPU +acceleration, we refer to the Table as seen below. + +.. list-table:: Execution times for the Rh FCC111 example + :widths: 50 50 + :header-rows: 1 + + * - System + - Execution time (averaged) + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + - 172.58s + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + RTX 4090 + - 90.18s + Co HCP 11-21 ------------ - The following code is used to run this example:: ./build/bramble -p patterns/patterns.json -i src/test/data/POSCAR_Co1121 -o pa_co1121.txt @@ -146,7 +162,7 @@ atoms are automatically recognized. Continuing the study by performing a similarity analysis by running:: - ./build/bramble -s -i src/test/data/POSCAR_Rh111 -o sa_fcc111.txt + ./build/bramble -s -i src/test/data/POSCAR_Co1121 -o sa_fcc111.txt yields the result as shown in the image below. 
Comparing the image with the CNA pattern per atom above, we can readily interpret this result. The light @@ -164,19 +180,4 @@ to bulk atoms, :math:`\mu_{ij} \approx 36` is found. .. figure:: _static/img/similarity_analysis_co1121.png :align: center -Execution times -*************** - -To get an impression of typical execution times and the benefit of GPU -acceleration, we refer to the Table as seen below. - -.. list-table:: Execution times for the Co HCP 11-21 - :widths: 50 50 - :header-rows: 1 - * - System - - Execution time (averaged) - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz - - 172.58s - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + RTX 4090 - - 90.18s From 0a9315e8466d6795521cfabccacb87f40fd143c0 Mon Sep 17 00:00:00 2001 From: Ivo Filot Date: Sun, 27 Aug 2023 10:19:57 +0200 Subject: [PATCH 4/9] Adding calculation times for Co1121 example --- docs/examples.rst | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/docs/examples.rst b/docs/examples.rst index b611b1c..5d3705c 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -73,9 +73,9 @@ acceleration, we refer to the Table as seen below. * - System - Execution time (averaged) - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) - 172.58s - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz + RTX 4090 + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 - 90.18s Co HCP 11-21 @@ -180,4 +180,19 @@ to bulk atoms, :math:`\mu_{ij} \approx 36` is found. .. figure:: _static/img/similarity_analysis_co1121.png :align: center +Execution times +*************** + +To get an impression of typical execution times and the benefit of GPU +acceleration, we refer to the Table as seen below. +.. 
list-table:: Execution times for the Co HCP 11-21 example + :widths: 50 50 + :header-rows: 1 + + * - System + - Execution time (averaged) + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + - 2368.63s + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 + - 5207.93s From a00b16ffab4e3102ca5400376209b725db7562e4 Mon Sep 17 00:00:00 2001 From: Ivo Filot Date: Sun, 27 Aug 2023 10:22:07 +0200 Subject: [PATCH 5/9] Add notation execution times --- docs/examples.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/examples.rst b/docs/examples.rst index 5d3705c..cbb7856 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -193,6 +193,6 @@ acceleration, we refer to the Table as seen below. * - System - Execution time (averaged) * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) - - 2368.63s + - 2368.63s (39m28s) * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 - - 5207.93s + - 5207.93s (1h26m47s) From 0af2fc5b6862c3d54a7b5ae9bbc62558567cd45f Mon Sep 17 00:00:00 2001 From: ifilot Date: Mon, 4 Sep 2023 19:42:46 +0200 Subject: [PATCH 6/9] Expanding documentation --- docs/examples.rst | 8 +++++-- docs/execution_model.rst | 51 ++++++++++++++++++++++++++++++++++++++++ docs/index.rst | 1 + docs/publications.rst | 4 ++++ docs/user_interface.rst | 15 ++++++++++++ 5 files changed, 77 insertions(+), 2 deletions(-) create mode 100644 docs/execution_model.rst diff --git a/docs/examples.rst b/docs/examples.rst index cbb7856..99fbf2d 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -77,6 +77,10 @@ acceleration, we refer to the Table as seen below. - 172.58s * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 - 90.18s + * - Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz (6 threads) + - 311.84s + * - Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz (6 threads) + RTX 2070 + - 125.08s Co HCP 11-21 ------------ @@ -193,6 +197,6 @@ acceleration, we refer to the Table as seen below. 
* - System - Execution time (averaged) * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) - - 2368.63s (39m28s) - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 - 5207.93s (1h26m47s) + * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 + - 2368.63s (39m28s) diff --git a/docs/execution_model.rst b/docs/execution_model.rst new file mode 100644 index 0000000..f065dc2 --- /dev/null +++ b/docs/execution_model.rst @@ -0,0 +1,51 @@ +.. _execution_model: +.. index:: Execution model + +Execution model +=============== + +When :program:`Bramble` is compiled with the CUDA module, one can use GPU +acceleration to speed up the execution. This is especially beneficial when +performing a similarity analysis. :program:`Bramble` supports multi-GPU +setups, so one can use multiple GPUs if more than one GPU is available. + +When performing the similarity analysis, an inventory of all the jobs is made. +``N+1`` OpenMP threads are being spawned where ``N`` equals the number of GPUs. +Each GPU gets assigned a CPU thread and jobs are relayed to the GPU via the CPU +thread. The remaining OpenMP thread employs so-called nested parallelism and +executes another OpenMP parallel environment which uses all CPUs. + +Obviously, this implies that the ``N`` CPU threads which are involved in +managing the GPUs are also used for other parts of the calculation. Since the +computational load of managing the GPUs is however relatively minimal, this does +not come at a huge impact on performance. In fact, not using these CPUs is worse +than partially also using them to manage the GPUs. + +When no GPUs are available, :program:`Bramble` uses no nested parallelism and +simply executes all jobs concurrently wherein each job uses OpenMP parallelism +on a per-job basis. + +.. _memory_load: + +Memory load +----------- + +Performing calculations is quite memory expensive and as a rule of thumb, one +needs roughly 8GB of memory per execution thread. 
For example, if one uses +two GPUs, one needs roughly 24GB of memory. If memory is limited, one option +is to use swapping, however this comes at a great cost on performance. Nevertheless, +it might still be beneficial. + +Assuming the user has root privileges, one can use the following instructions +to increase the amount of swap memory:: + + sudo mkswap /swapfile + sudo chmod 600 /swapfile + sudo swapon /swapfile + sudo swapon --show + +Typical output would yield:: + + NAME TYPE SIZE USED PRIO + /dev/sdb3 partition 976M 976M -2 + /swapfile file 8G 196.3M -3 diff --git a/docs/index.rst b/docs/index.rst index 04a4846..c0ef303 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -50,6 +50,7 @@ requests are ideally submitted via the `github issue tracker installation background + execution_model gallery user_interface examples diff --git a/docs/publications.rst b/docs/publications.rst index e155b65..834bbe5 100644 --- a/docs/publications.rst +++ b/docs/publications.rst @@ -6,6 +6,10 @@ Publications The following publications make use of :program:`Bramble` +* *Unraveling the Role of Metal–Support Interactions on the Structure Sensitivity + of Fischer–Tropsch Synthesis*, van Etten, M.P.C., de Laat, M.E., Hensen, E.J.M., + Filot, I.A.W., J. Phys. Chem. C, **2023**, 127, 31, 15148-15156, + DOI: `10.1021/acs.jpcc.3c02240 `_ * *Enumerating Active Sites on Metal Nanoparticles: Understanding the Size Dependence of Cobalt Particles for CO Dissociation*, van Etten M.P.C., Zijlstra B., Hensen E.J.M., Filot, I.A.W., ACS Catal., **2021**, 11, 14, diff --git a/docs/user_interface.rst b/docs/user_interface.rst index 54570d4..fbaa066 100644 --- a/docs/user_interface.rst +++ b/docs/user_interface.rst @@ -13,6 +13,14 @@ validate the pattern library. line arguments as long as any instructions belonging to a specific argument are directly after that argument. +.. 
warning:: + * Note that :program:`Bramble` uses roughly 8GB per execution thread, where + the number of execution threads is ``N+1`` where ``N`` is the number of GPUs. + See also :ref:`this page `. + * For systems having **multiple** GPUs, one needs to explicitly set + ``--ngpu `` to make use of all GPUs. If not, only one of + the GPUs is being used. + Bramble ------- @@ -57,6 +65,13 @@ mandatory command line arguments:: * ``-o``, ``--output`` ```` Where to write the output to. +* ``-g``, ``--ngpu`` ```` + Number of GPUs to use. This option is only available when :program:`Bramble` + is compiled with the CUDA module. If more GPUs are allocated via this tag + than the number of GPUs available, the number is automatically lowered to + match the number of GPUs available. The default value is 1, so for multi-GPU + systems, the user needs to manually adjust this value. + *Example*: ``./bramble -p ../patterns/patterns.json -i ../src/test/data/co_np.geo -o result.txt`` Typical output looks as follows:: From 1dfeb18b133506ade91e6793a1ceab17009cc405 Mon Sep 17 00:00:00 2001 From: ifilot Date: Tue, 5 Sep 2023 12:44:50 +0200 Subject: [PATCH 7/9] Adding computational times --- docs/examples.rst | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/examples.rst b/docs/examples.rst index 99fbf2d..392ee66 100644 --- a/docs/examples.rst +++ b/docs/examples.rst @@ -73,13 +73,13 @@ acceleration, we refer to the Table as seen below. 
* - System - Execution time (averaged) - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + * - Intel(R) Core(TM) i9-10900K (20 threads) - 172.58s - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 + * - Intel(R) Core(TM) i9-10900K (20 threads) + RTX 4090 - 90.18s - * - Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz (6 threads) + * - Intel(R) Core(TM) i5-8400 (6 threads) - 311.84s - * - Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz (6 threads) + RTX 2070 + * - Intel(R) Core(TM) i5-8400 (6 threads) + RTX 2070 - 125.08s Co HCP 11-21 @@ -196,7 +196,15 @@ acceleration, we refer to the Table as seen below. * - System - Execution time (averaged) - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + * - Intel(R) Core(TM) i9-10900K (20 threads) - 5207.93s (1h26m47s) - * - Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (20 threads) + RTX 4090 + * - Intel(R) Core(TM) i9-10900K (20 threads) + RTX 4090 - 2368.63s (39m28s) + * - Intel(R) Core(TM) i5-8400 (6 threads) + RTX 2070 + - 2986.00s (49m46s) + * - Intel(R) Xeon(R) Gold 6234 (16 threads) + A5000 + - 3912.19s (65m12s) + * - Intel(R) Core(TM) i5-12400F (12 threads) + 1x GTX 1080 Ti + - 2759.49s (45m59s) + * - Intel(R) Core(TM) i5-12400F (12 threads) + 2x GTX 1080 Ti + - 2067.24s (34m27s) From 61d3bab4ca5544fce8b5c37b4651d166cc60f18b Mon Sep 17 00:00:00 2001 From: ifilot Date: Thu, 7 Sep 2023 08:41:01 +0200 Subject: [PATCH 8/9] Adding easybuild --- docs/installation.rst | 14 ++++++++++---- src/test/test_similarity.cpp | 12 +++++++++--- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/docs/installation.rst b/docs/installation.rst index 59278ea..e337c82 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -55,11 +55,9 @@ The similarity analysis functionality of :program:`Bramble` can benefit from the availability of a graphical card. 
To compile :program:`Bramble` with CUDA support, run CMake with:: - cmake ../src -DMOD_CUDA=1 -DCUDA_ARCH= + cmake ../src -DMOD_CUDA=1 -wherein `` is replaced with the architecture of your graphical card. For -example, if you use an RTX 4090, you would use ``-DCUDA_ARCH=sm_89``. To -test that :program:`Bramble` can use your GPU, you can run the ``bramblecuda`` +To test that :program:`Bramble` can use your GPU, you can run the ``bramblecuda`` tool whose sole function is to test for the availability of a GPU on the system:: ./bramblecuda @@ -127,3 +125,11 @@ Typical output should look as follows:: 100% tests passed, 0 tests failed out of 9 Total Test time (real) = 1.73 sec + +EasyBuild Installation +---------------------- + +For HPC infrastructure, there is also the option to install :program:`Bramble` using EasyBuild. +Make a copy of `bramble-1.1.0.eb` and run:: + + eb bramble-1.1.0.eb --minimal-toolchains --add-system-to-minimal-toolchains --robot diff --git a/src/test/test_similarity.cpp b/src/test/test_similarity.cpp index e9a6782..9882c07 100644 --- a/src/test/test_similarity.cpp +++ b/src/test/test_similarity.cpp @@ -22,6 +22,9 @@ #include #include "similarity_analysis.h" +#ifdef MOD_CUDA +#include "card_manager.h" +#endif // check that we can read .geo files BOOST_AUTO_TEST_CASE(test_similarity) { @@ -57,9 +60,12 @@ BOOST_AUTO_TEST_CASE(test_similarity) { BOOST_TEST(ans2 == ans3, boost::test_tools::tolerance(1e-7)); #ifdef MOD_CUDA - float ans4 = sa.calculate_distance_metric_cuda(dm3, dm4, &permvec[0]); - BOOST_TEST(ans2 == ans4, boost::test_tools::tolerance(1e-7)); - BOOST_TEST(ans3 == ans4, boost::test_tools::tolerance(1e-7)); + CardManager cm; + if(cm.get_num_gpus() > 0) { + float ans4 = sa.calculate_distance_metric_cuda(dm3, dm4, &permvec[0]); + BOOST_TEST(ans2 == ans4, boost::test_tools::tolerance(1e-7)); + BOOST_TEST(ans3 == ans4, boost::test_tools::tolerance(1e-7)); + } #endif // MOD_CUDA 
//------------------------------------------------------------------------- From 3d03a820bdd6507b1bc394981512cc981315af3d Mon Sep 17 00:00:00 2001 From: ifilot Date: Thu, 7 Sep 2023 09:07:14 +0200 Subject: [PATCH 9/9] Expanding on documentation --- docs/execution_model.rst | 6 ++++++ docs/installation.rst | 3 ++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/execution_model.rst b/docs/execution_model.rst index f065dc2..7cc4274 100644 --- a/docs/execution_model.rst +++ b/docs/execution_model.rst @@ -9,6 +9,12 @@ acceleration to speed up the execution. This is especially beneficial when performing a similarity analysis. :program:`Bramble` supports multi-GPU setups, so one can use multiple GPUs if more than one GPU is available. +.. warning:: + :program:`Bramble` requires a GPU with at least 8GB of memory. :program:`Bramble` + checks whether the GPU supports the calculation prior to execution and throws + an error when the GPU is not supported. You can also check the memory available + on your GPU by running ``bramblecuda``. + When performing the similarity analysis, an inventory of all the jobs is made. ``N+1`` OpenMP threads are being spawned where ``N`` equals the number of GPUs. Each GPU gets assigned a CPU thread and jobs are relayed to the GPU via the CPU diff --git a/docs/installation.rst b/docs/installation.rst index e337c82..54090e2 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -23,7 +23,8 @@ On Debian-based operating systems, one can run the following:: The compilation instructions below can be readily used. .. warning:: - In order to compile for GPU using CUDA, one needs Eigen3 version **3.4.0** or higher. + * In order to compile for GPU using CUDA, one needs Eigen3 version **3.4.0** or higher. + * Your GPU needs at least 8GB of memory in order to use Bramble. Compilation -----------