Deploying to gh-pages from @ f5040f7 πŸš€
shintaro-iwasaki committed Apr 19, 2024
1 parent 366e6c5 commit 3960ca3
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions 2024/program.html
@@ -91,10 +91,10 @@ <h4><b>Block-based GPU Programming with Triton</b></h4>
<h4><b>Philippe Tillet, OpenAI</b></h4>
<h4><b>Abstract:</b>
<font color="#FFFFFF"><img src="pics/PhilippeTillet.jpeg" alt="Philippe Tillet" border="1" align="right" class="right" width="30%" height="auto"/></font>
Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting to machine learning researchers in need of fast custom kernels. We'll shed light on alternative programming models capable of improving GPU programmability without too much of an impact on expressivity. Some such models have recently emerged (e.g., Exo, MLIR Affine), but these are rarely applicable beyond dense tensor algebra — making them a poor fit for workloads requiring (for example) custom data structures. We'll describe the design and implementation of Triton, a mid-level programming language that uses block-based abstractions to simplify kernel development and fusion for researchers without any GPU programming expertise.
Traditional single instruction, multiple threads (SIMT) programming with CUDA can be daunting to machine learning researchers in need of fast custom kernels. This can significantly slow down the evaluation of novel research ideas that cannot be neatly decomposed into a set of pre-built, vendor-optimized primitives. In this talk, we will shed light on an alternative programming model which -- while relatively high-level -- aims to be more expressive than common graph compilers (e.g., XLA, Torch-Inductor) and to enable the use of custom data structures (e.g., linked lists, block-sparse tensors). We will specifically discuss the design and implementation of Triton, a mid-level programming language that uses block-based abstractions to simplify kernel development for researchers without deep GPU programming expertise.
</h4>
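To make the block-based abstraction described in the abstract above concrete, here is a minimal vector-add sketch in the style of the public Triton tutorials; the kernel name, wrapper function, and BLOCK_SIZE of 1024 are illustrative choices for this page, not details taken from the talk. Each program instance operates on a whole block of elements at once, with a mask handling the ragged tail.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance owns one contiguous block of elements,
    # rather than each thread owning a single element as in SIMT CUDA.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the final, possibly partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # Launch one program instance per block of BLOCK_SIZE elements.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

In an equivalent SIMT CUDA kernel the indexing and bounds checks would be written per thread; here the tile-level loads, stores, and arithmetic are expressed once per block and left to the Triton compiler to map onto threads and the memory hierarchy.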
<h4><b>Bio:</b>
Philippe Tillet first began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020 with a dissertation on compilers for blocked algorithms on GPUs. He joined OpenAI full time in 2020 to pursue his work on the Triton compiler — a project he started in 2018 after being frustrated by the difficulty of writing auto-tuners for matrix multiplications in CUDA. Since then, he has grown the Triton language into a reference for block-based programming models and written all the training kernels used by GPT-4.
Philippe Tillet first began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020. He joined OpenAI full time in 2020 to pursue his work on the Triton compiler — a project he started in 2018 after being frustrated by the difficulty of writing auto-tuners for matrix multiplications in CUDA. Since then, he has grown the Triton language into a reference for block-based programming models and used it to write all the training kernels used by GPT-4.
</h4>

<h1>Session 2: Accelerating AI/ML Workloads</h1>
