From 3960ca368bb14754ac0b790432507cbd4feebe64 Mon Sep 17 00:00:00 2001
From: siwasaki
Date: Fri, 19 Apr 2024 02:19:51 +0000
Subject: [PATCH] Deploying to gh-pages from  @ f5040f77785d118aa3b00898411cf546bc7c6352 🚀
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
 2024/program.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/2024/program.html b/2024/program.html
index df20725..9004599 100644
--- a/2024/program.html
+++ b/2024/program.html
@@ -91,10 +91,10 @@

Block-based GPU Programming with Triton

Philippe Tillet, OpenAI

Abstract:
- Philippe Tillet - Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting to machine learning researchers in need of fast custom kernels. We'll shed light on alternative programming models capable of improving GPU programmability without too much of an impact on expressivity. Some such models have recently emerged (e.g., Exo, MLIR Affine), but these are rarely applicable beyond dense tensor algebra — making them a poor fit for workloads requiring (for example) custom data structures. We'll describe the design and implementation of Triton, a mid-level programming language that uses block-based abstractions to simplify kernel development and fusion for researchers without any GPU programming expertise.
+ Traditional single instruction, multiple threads (SIMT) programming with CUDA can be daunting to machine learning researchers in need of fast custom kernels. This can significantly slow down the evaluation of novel research ideas that cannot be neatly decomposed into a set of pre-built, vendor-optimized primitives. In this talk, we will shed light on an alternative programming model which -- while relatively high-level -- aims to be more expressive than common graph compilers (e.g., XLA, Torch-Inductor) and to enable the use of custom data structures (e.g., linked lists, block-sparse tensors, etc.). We will specifically discuss the design and implementation of Triton, a mid-level programming language that uses block-based abstractions to simplify kernel development for researchers without deep GPU programming expertise.
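For context on what "block-based" means in practice: in Triton, each program instance operates on a whole block (tile) of values, and the compiler handles the per-thread details that SIMT/CUDA exposes directly. The sketch below is purely illustrative and is not taken from the talk or the program page; the kernel name, tensor sizes, and BLOCK_SIZE are arbitrary choices.

```python
# Minimal, assumption-laden sketch of a block-based Triton kernel (vector add).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                       # which block this instance handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                       # guard the partial tail block
    x = tl.load(x_ptr + offsets, mask=mask)           # load a whole block at once
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)     # store the block result

# Example launch: one program instance per 1024-element block.
x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

The kernel body never mentions individual threads, warps, or shared memory; it describes what happens to one block of data, which is the programmability gain the abstract refers to.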

Bio:
- Philippe Tillet first began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020 with a dissertation on compilers for blocked algorithms on GPUs. He joined OpenAI full time in 2020 to pursue his work on the Triton compiler — a project he started in 2018 after being frustrated by the difficulty of writing auto-tuners for matrix multiplications in CUDA. Since then, he grew the Triton language into a reference for block-based programming model, and wrote all the training kernels that were used by GPT4.
+ Philippe Tillet first began working with GPUs in 2011 as a contributor to the ViennaCL library. He then received his B.S. from Telecom SudParis (France) in 2012, his M.S. from NCTU (Taiwan) in 2014, and his Ph.D. from Harvard University in 2020. He joined OpenAI full time in 2020 to pursue his work on the Triton compiler — a project he started in 2018 after being frustrated by the difficulty of writing auto-tuners for matrix multiplications in CUDA. Since then, he has grown the Triton language into a reference for block-based programming models and used it to write all the training kernels that were used by GPT-4.

Session 2: Accelerating AI/ML Workloads