This file gives an overview of the main features planned for addition to CLBlast. A first-order indication of the development time frame is provided:

Issue# | When | Who | Status | What |
---|---|---|---|---|
- | Oct '17 | CNugteren | ✔ | CUDA API for CLBlast |
#169 & #195 | Oct-Nov '17 | CNugteren | ✔ | Auto-tuning the kernel selection parameter |
#181 & #201 | Nov '17 | CNugteren | ✔ | Compilation for Android and testing on a device |
- | Nov '17 | CNugteren | ✔ | Integration of CLTune for easy testing on Android / fewer dependencies |
#128 & #205 | Nov-Dec '17 | CNugteren | ✔ | Pre-processor for loop unrolling and array-to-register-promotion for e.g. ARM Mali |
#207 | Dec '17 | CNugteren | ✔ | Tuning of the TRSM/TRSV routines |
#195 | Jan '18 | CNugteren | ✔ | Extra GEMM API with pre-allocated temporary buffer |
#95 & #237 | Jan '18 | CNugteren | ✔ | Implement strided batch GEMM |
#224 | Jan-Feb '18 | CNugteren | ✔ | Implement Hadamard product (element-wise vector-vector product) |
#233 | Feb '18 | CNugteren | ✔ | Add CLBlast to common package managers |
#223 | Feb '18 | CNugteren | ✔ | Python OpenCL interface |
#237 | Mar '18 | CNugteren | ✔ | Making tuning possible from the CLBlast API |
#228 | Mar-Apr '18 | CNugteren | ✔ | Improving performance for Qualcomm Adreno GPUs |
#267 | July '18 | CNugteren | | Merge im2col and GEMM into a direct kernel |
#270 | Aug '18 | CNugteren | | Implement col2im |
- | Aug '18 | CNugteren | | Add a SYCL interface to the library |
#136 | ?? | CNugteren | | Implement xAXPBY and xSET |
#169 | ?? | dividiti | | Problem-specific tuning parameter selection |