🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.

thanhnguyen-moreh/tt-metal

 
 

TT-NN is a Python and C++ neural network operator (op) library.


LLMs

| Model | Batch | Hardware | ttft (s) | t/s/u | Target t/s/u | Release |
|---|---|---|---|---|---|---|
| Falcon7B-decode | 32 | e150 | | 4.2 | 4.4 | |
| Falcon7B | 32 | n150 | 0.07 | 16.7 | 26 | v0.52.0-rc2 |
| Mistral-7B | 32 | n150 | | 9.9 | 25 | v0.51.0-rc28 |
| Mamba-2.8B | 32 | n150 | 0.04 | 12.3 | 41 | v0.51.0-rc26 |
| LLaMA-3.1-8B | 1 | n150 | | 8.3 | 23 | v0.51.0-rc28 |
| Falcon7B (data parallel) | 256 | QuietBox | 0.11 | 13.4 | 26 | v0.51.0-rc36 |
| LLaMA-2-70B (tensor parallel) | 32 | QuietBox | | 10.4 | 20 | v0.52.0-rc14 |
| LLaMA-3.1-70B (tensor parallel) | 32 | QuietBox | | 10.4 | 20 | v0.52.0-rc14 |
| Falcon40B (tensor parallel) | 32 | QuietBox | | 5.3 | 36 | v0.52.0-rc12 |
| Mixtral7Bx8 (tensor parallel) | 32 | QuietBox | 0.19 | 13.6 | 33 | v0.51.0-rc33 |
| Falcon7B (data parallel) | 1024 | Galaxy | 0.27 | 4.1 | 26 | v0.52.0-rc14 |

Notes:

  • The reported LLM performance is for an input sequence length (number of rows filled in the KV cache) of 128 for all models except Mamba (which can accept any sequence length).
  • The reported t/s/u (tokens per second per user) is the throughput of the first token generated after prefill, i.e. 1 / inter-token latency.
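
The relationship in the note above, t/s/u = 1 / inter-token latency, can be turned around to read a table entry as a per-user latency. The sketch below is illustrative only: the function names are hypothetical, and the aggregate figure assumes that every entry in the batch corresponds to one concurrent user, so that total tokens per second is t/s/u multiplied by the batch size.

```python
"""Convert a reported t/s/u figure into latency and aggregate throughput."""


def inter_token_latency_ms(tokens_per_sec_per_user: float) -> float:
    """Return the per-user gap between decoded tokens, in milliseconds."""
    if tokens_per_sec_per_user <= 0:
        raise ValueError("t/s/u must be positive")
    # t/s/u = 1 / latency (in seconds), so latency = 1 / t/s/u.
    return 1000.0 / tokens_per_sec_per_user


def aggregate_tokens_per_sec(tokens_per_sec_per_user: float, batch: int) -> float:
    """Return total tokens/second, ASSUMING each batch entry is one user."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    return tokens_per_sec_per_user * batch


if __name__ == "__main__":
    # Falcon7B on n150 from the table above: batch 32, 16.7 t/s/u.
    tsu, batch = 16.7, 32
    print(f"inter-token latency: {inter_token_latency_ms(tsu):.1f} ms")
    print(f"aggregate throughput: {aggregate_tokens_per_sec(tsu, batch):.1f} t/s")
```

Under these assumptions, a higher batch size raises aggregate throughput without changing what an individual user experiences, which is why the tables report the per-user figure.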

CNNs

| Model | Batch | Hardware | fps | Target fps | Release |
|---|---|---|---|---|---|
| ResNet-50 (224x224) | 20 | e150 | 5,100 | 10,000 | |
| ResNet-50 (224x224) | 16 | n150 | 4,100 | 7,000 | |
| ResNet-50 (224x224) (data parallel) | 128 | QuietBox | 32,250 | 56,000 | |
| ResNet-50 (224x224) (data parallel) | 512 | Galaxy | 66,150 | 224,000 | |
| ResNet-50 (224x224) (data parallel) | 1024 | Two Galaxies | 128,800 | 448,000 | |
| ViT | 8 | e150 | 860 | 2,000 | |
| Stable Diffusion 1.4 (512x512) | 1 | n150 | 0.167 | 0.3 | |
| Unet (shallow) | 2 | n150 | 51 | 1,000 | |

NLPs

| Model | Batch | Hardware | sen/sec | Target sen/sec | Release |
|---|---|---|---|---|---|
| BERT-Large | 12 | e150 | 370 | 410 | |
| BERT-Large | 8 | n150 | 270 | 400 | |
| T5 small | | e150 | 140 | | |
| Bloom | | e150 | 70 | | |

Model Updates

For the latest model updates and features, please see MODEL_UPDATES.md

TT-NN Tech Reports



TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Getting started

Get started with simple kernels.

TT-Metalium Tech Reports

Languages

  • C++ 51.1%
  • Python 38.9%
  • Jupyter Notebook 5.1%
  • C 4.0%
  • Shell 0.5%
  • CMake 0.3%
  • Other 0.1%