The following example programs introduce the main concepts of OpenMP step by step.
1. hello-world-parallel.c (the most basic hello world, executed in parallel)
2. hello-world-parallel-id.c (like 1., but also prints each thread's id and the total number of threads)
3. hello-world-parallel-id-func.c (same result as 2., but a function is called to print inside the parallel region)
4. hello-world-parallel-id-scope.c (same result as 2., but data scoping is used in the parallel region)
5. table-add1-manual.c (adds a number to each element of a table, using manual parallelization)
6. table-add1.c (same result as 5., but with automatic work scheduling)
7. table-add1-combined.c (same result as 6., but with a combined parallel region and for construct)
8. table-add1-wrong.c (similar to 6. and 7., but gives a wrong answer - find the error in the code!)
9. table-implicit-notpar.c (parallel execution gives a wrong answer - find out why!)
10. table-sum.c (computes the sum of all elements in a table)
11. table-sum-wrong.c (similar to 10., but gives a wrong answer - find the error in the code!)
2020 Homework on 2D wave equation
2019 Homework on Poisson solvers (see this figure)
poisson-SOR.c Example: ./poisson-SOR -N 400 -M 400 -a 1e-6 -o 1.9
2023/24 Homework:
Set #1:
- Run examples 1-10.
- Find and correct the mistakes in examples 8, 9 and 11.
Set #2:
- Write a program in C/C++ that calculates π using a Monte Carlo method. Use up to 10^10 points.
- Parallelize the code using OpenMP. Run the code using 1, 2, 4, etc. threads (up to twice the physical number of cores; try at least up to 8).
- Create a Jupyter Python notebook that automatically runs the C/C++ code with different numbers of threads (use a list) and plots the execution time as a function of the number of cores. Alternatively, run the C/C++ code from a script, save the results to a file, and load them into a Jupyter notebook to make the plot.
- In the same notebook, create a second plot of parallel speedup vs number of cores.
- Use Amdahl's law, S(n) = 1/((1-p) + p/n), to fit the resulting curve and find the proportion p of the code that benefits from parallelization and the maximum possible speedup in the limit of 10000 cores. (You can do the fit with e.g. scipy's curve_fit.)
- Run the code for 10^2, 10^3, 10^4, ..., 10^11 points and, in a second notebook, calculate the convergence rate of your Monte Carlo implementation. You can do this by fitting a line to a log-log plot of the error in calculating π vs. the number of points. Find the theoretical expectation and compare your result to it.
Use logarithmic scale wherever the numerical values change by orders of magnitude!
(Bonus track: repeat with Python+Numba and/or Julia)
- Tutorial by N. Trifonidis (part 1)
- Tutorial by N. Trifonidis (part 2)
- A brief introduction by A. Kiessling
- Tutorial by S.C. Huang
- Tutorial by Texas A&M
- Tutorial by T. Mattson and L. Meadows
- Tutorial by Y. W. Li (includes VTune examples)
- Online tutorial by B. Barney
- Online tutorial by Y. Yliluoma
- Online list of potential mistakes
- Video tutorial by C. Terboven (part 1)
- Video tutorial by C. Terboven (part 2)
- Video channel by PPCES
- Additional resources
- OpenMP 3.1 Quick Reference Card