A curated list of Awesome LLM Inference Papers with code, such as FlashAttention, PagedAttention, Parallelism, etc.
Updated Nov 25, 2024
Tensor/CUDA Cores, 150+ CUDA Kernels, toy-hgemm library with WMMA, MMA and CuTe (99%~100%+ of cuBLAS TFLOPS).
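A "% of cuBLAS" figure like the one above is typically computed by timing a custom kernel and dividing its throughput by a cuBLAS baseline. A minimal sketch of how that baseline is measured in PyTorch, assuming `torch.matmul`'s FP16 CUDA path dispatches to cuBLAS; the `gemm_tflops` helper and the 4096-cubed shape are illustrative, not from the repo:

```python
import torch

def gemm_tflops(m, n, k, iters=50):
    # Time an FP16 GEMM on GPU; torch.matmul dispatches to cuBLAS on CUDA.
    a = torch.randn(m, k, device="cuda", dtype=torch.half)
    b = torch.randn(k, n, device="cuda", dtype=torch.half)
    torch.matmul(a, b)  # warm-up: triggers kernel selection / autotuning
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1e3 / iters  # elapsed_time is in ms
    return 2 * m * n * k / seconds / 1e12            # a GEMM costs 2*M*N*K FLOPs

if torch.cuda.is_available():
    print(f"cuBLAS baseline: {gemm_tflops(4096, 4096, 4096):.1f} TFLOPS")
```

A kernel under test would then be timed the same way and reported as its TFLOPS divided by this baseline.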
Toy Flash Attention implementation in PyTorch.
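For reference, a minimal sketch of the tiled online-softmax recurrence that Flash Attention is built on, in plain PyTorch; `toy_flash_attention`, the block size, and the test shapes are illustrative, not the repo's actual code:

```python
import torch

def toy_flash_attention(q, k, v, block_size=64):
    """Compute softmax(q @ k^T / sqrt(d)) @ v one key/value block at a time,
    keeping a running max and normalizer so the full (seq_len, seq_len)
    attention matrix is never materialized."""
    seq_len, d = q.shape
    scale = d ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq_len, 1), float("-inf"))
    row_sum = torch.zeros(seq_len, 1)
    for start in range(0, seq_len, block_size):
        kb = k[start:start + block_size]           # key block
        vb = v[start:start + block_size]           # value block
        scores = (q @ kb.T) * scale                # partial logits for this block
        block_max = scores.max(dim=-1, keepdim=True).values
        new_max = torch.maximum(row_max, block_max)
        # Rescale the previously accumulated output and normalizer to the new max.
        correction = torch.exp(row_max - new_max)
        p = torch.exp(scores - new_max)            # unnormalized block probabilities
        row_sum = row_sum * correction + p.sum(dim=-1, keepdim=True)
        out = out * correction + p @ vb
        row_max = new_max
    return out / row_sum

# Quick check against the naive reference.
q, k, v = (torch.randn(256, 64) for _ in range(3))
ref = torch.softmax((q @ k.T) * 64 ** -0.5, dim=-1) @ v
assert torch.allclose(toy_flash_attention(q, k, v), ref, atol=1e-5)
```

The key idea is that the running max/sum rescaling makes the blockwise result exactly equal to the full softmax, while memory stays O(seq_len * d) instead of O(seq_len^2).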