Hello,

This issue follows #1381, in which the problem was not correctly identified.
I am using Bicgstab to solve a test problem where A is a 1000x1000 matrix (a band matrix of bandwidth 19 stored in CSR format, whose non-zero values in each row follow the pattern 1 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2 1). B is a 1000x1000 dense matrix filled with ones.
The number of columns in B (or X) is the batch size n_batch=1000 (but all individual systems in this batch share the same A, which is why I don't use the new BatchDense or BatchCsr classes).
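For reference, the setup looks roughly like the sketch below. It is simplified: the matrix assembly is omitted, the stopping criteria values are placeholders rather than my exact settings, and it assumes a recent Ginkgo release where apply accepts smart pointers directly (older releases need gko::lend).

```cpp
#include <ginkgo/ginkgo.hpp>

int main()
{
    // GPU executor with an OpenMP host executor as master
    auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());

    using csr = gko::matrix::Csr<double, int>;
    using dense = gko::matrix::Dense<double>;

    // A: 1000x1000 banded matrix (bandwidth 19) in CSR format
    // B, X: 1000x1000 dense multi-vectors (n_batch = 1000 columns)
    auto A = gko::share(csr::create(exec, gko::dim<2>{1000, 1000}));
    auto B = dense::create(exec, gko::dim<2>{1000, 1000});
    auto X = dense::create(exec, gko::dim<2>{1000, 1000});
    // ... fill A with the band pattern and B with ones ...

    // Bicgstab with a block-Jacobi preconditioner, max block size 32
    auto solver =
        gko::solver::Bicgstab<double>::build()
            .with_preconditioner(gko::preconditioner::Jacobi<double, int>::build()
                                     .with_max_block_size(32u)
                                     .on(exec))
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(10000u).on(exec),
                gko::stop::ResidualNorm<double>::build()
                    .with_reduction_factor(1e-8)
                    .on(exec))
            .on(exec)
            ->generate(A);

    // One apply solves all n_batch right-hand sides at once
    solver->apply(B, X);
}
```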
I compare the performance on GPU with and without the Jacobi preconditioner (block size 32):
Without Jacobi: 4700 iterations, 5 s total execution time.
With Jacobi: 490 iterations, 20 s total execution time.
So the execution time of one iteration is much longer with the preconditioner (~19 ms vs ~1 ms). This effect does not appear with n_batch=1 (a matrix-vector system, total execution time 0.8 s).
The reason is that n_batch calls to gko::kernels::cuda::jacobi::kernel::apply are launched sequentially. Could this be improved?
Regards
Perhaps, to sidestep the issue with the block Jacobi, you could create the Jacobi preconditioner with .with_max_block_size(1u). Maybe that already gets you a shorter runtime.
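Something along these lines (just a sketch; `exec` is whatever executor you already use):

```cpp
// Scalar Jacobi: every diagonal block has size 1, so the block-Jacobi
// apply path discussed above is avoided
auto precond_factory = gko::preconditioner::Jacobi<double, int>::build()
                           .with_max_block_size(1u)
                           .on(exec);
```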