v0.34.0

Metal

API Changes

CreateDevice: device_id type has changed from int to chip_id_t
CreateCircularBuffer: The three previous variants, which differed only in taking a CoreCoord, CoreRange, or CoreRangeSet parameter, have been consolidated into one user-facing CreateCircularBuffer function that takes a std::variant<CoreCoord,CoreRange,CoreRangeSet>. It now accepts a CircularBufferConfig, which specifies size, data format, and page size per buffer index. The return type has changed from a CircularBuffer object to a CircularBufferID (uintptr_t).
GetCircularBufferConfig: New function that returns a reference to the configuration of a CircularBuffer, which allows the config to be updated. Updates take effect on the next call to LaunchProgram.
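The variant-based entry point and the reference-returning config accessor described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the tt-metal implementation: every type here (CoreCoord, CoreRange, CoreRangeSet, CircularBufferConfig, Program) is a simplified stand-in, the argument order is an assumption, the ToRangeSet visitor and example_usage helper are hypothetical, and the ID is a plain counter.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <variant>
#include <vector>

// Stand-in types for illustration only; the real tt-metal definitions differ.
struct CoreCoord { std::uint32_t x, y; };
struct CoreRange { CoreCoord start, end; };             // inclusive rectangle
struct CoreRangeSet { std::vector<CoreRange> ranges; };

struct CircularBufferConfig {                           // hypothetical, much reduced
    std::uint32_t total_size = 0;
    std::uint32_t page_size = 0;
};

using CircularBufferID = std::uintptr_t;
using CoreSpec = std::variant<CoreCoord, CoreRange, CoreRangeSet>;

// Hypothetical visitor: normalizes any of the three alternatives to a CoreRangeSet.
struct ToRangeSet {
    CoreRangeSet operator()(const CoreCoord& c) const {
        CoreRangeSet out;
        out.ranges.push_back(CoreRange{c, c});          // a single core is a 1x1 range
        return out;
    }
    CoreRangeSet operator()(const CoreRange& r) const {
        CoreRangeSet out;
        out.ranges.push_back(r);
        return out;
    }
    CoreRangeSet operator()(const CoreRangeSet& s) const { return s; }
};

struct Program {                                        // stand-in program state
    struct Entry { CoreRangeSet cores; CircularBufferConfig config; };
    std::map<CircularBufferID, Entry> buffers;
    CircularBufferID next_id = 1;                       // assumption: IDs are a simple counter here
};

// One entry point replaces three overloads; std::visit selects the right conversion.
inline CircularBufferID CreateCircularBuffer(Program& program, const CoreSpec& cores,
                                             const CircularBufferConfig& config) {
    const CircularBufferID id = program.next_id++;
    program.buffers.emplace(id, Program::Entry{std::visit(ToRangeSet{}, cores), config});
    return id;
}

// Returns a mutable reference, so callers can edit the stored config in place.
inline CircularBufferConfig& GetCircularBufferConfig(Program& program, CircularBufferID id) {
    return program.buffers.at(id).config;
}

// Hypothetical usage: create buffers from two different core specs, then update one config.
inline std::uint32_t example_usage() {
    Program program;
    const CircularBufferConfig cfg{4096, 2048};
    const CircularBufferID a = CreateCircularBuffer(program, CoreCoord{0, 0}, cfg);
    const CircularBufferID b = CreateCircularBuffer(program, CoreRange{{0, 0}, {1, 1}}, cfg);
    assert(a != b);
    GetCircularBufferConfig(program, a).total_size = 8192;
    return program.buffers.at(a).config.total_size;
}
```

In this sketch the caller passes a CoreCoord or CoreRange directly and the variant's converting constructor wraps it, which is why a single function signature can serve all three call styles.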

Tools - Profiler

Tracy Python Support: Profile Python-side code with Tracy. As with cProfile, the standard Python profiler module, all Python function calls are picked up by Tracy. TT's bound C++ calls are also picked up automatically. Either the entire Python script or just selected parts of it can be profiled, at function or line level.

Extra features

Runtime Compute Args: Arguments can be sent to Compute Kernels at runtime. The kernel retrieves them with the same get_arg_val<type>(<index>) API, and the host sets them with the same tt_metal::SetRuntimeArgs(<program>, <compute_kernel_id>, <Core,CoreRange>, <vector of u32 runtime args>) call used for DataMovement Kernels.
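The host-sets, kernel-reads flow can be mocked on the host to show the idea. The sketch below is an illustration under stated assumptions, not tt-metal code: Program, KernelID, and CoreCoord are stand-ins, this SetRuntimeArgs accepts only a single core, get_arg_val takes the argument vector explicitly (the real kernel-side call takes only an index), and run_compute_kernel is a hypothetical placeholder for a compute kernel body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

// Stand-ins for illustration; not the tt-metal implementation.
using KernelID = std::uint32_t;

struct CoreCoord {
    std::uint32_t x, y;
    bool operator<(const CoreCoord& o) const { return x != o.x ? x < o.x : y < o.y; }
};

struct Program {
    // (kernel, core) -> 32-bit argument words for that kernel on that core
    std::map<std::pair<KernelID, CoreCoord>, std::vector<std::uint32_t>> runtime_args;
};

// Host side: record the argument words for one kernel on one core.
inline void SetRuntimeArgs(Program& program, KernelID kernel, CoreCoord core,
                           const std::vector<std::uint32_t>& args) {
    program.runtime_args[std::make_pair(kernel, core)] = args;
}

// Kernel side (mocked): reinterpret the 32-bit word at `index` as T.
template <typename T>
T get_arg_val(const std::vector<std::uint32_t>& args, std::size_t index) {
    static_assert(sizeof(T) == sizeof(std::uint32_t), "each runtime arg is one 32-bit word");
    T value;
    std::memcpy(&value, &args.at(index), sizeof(T));
    return value;
}

// Hypothetical compute kernel body: reads two args and combines them.
inline std::uint32_t run_compute_kernel(const Program& program, KernelID kernel, CoreCoord core) {
    const auto& args = program.runtime_args.at(std::make_pair(kernel, core));
    const auto num_tiles = get_arg_val<std::uint32_t>(args, 0);
    const auto scale = get_arg_val<std::uint32_t>(args, 1);
    return num_tiles * scale;
}
```

Because the arguments are keyed by core, the same kernel can receive different values on each core without recompilation, which is the practical point of runtime (as opposed to compile-time) arguments.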

Eager (Ops)

Notes not yet available.

Models

metal_BERT_large_15: the model implementation now uses the tt-DNN embedding operation, which executes on the GS device. Previously this model used the PyTorch embedding operation, which executed on the CPU.
Falcon7b: added an end-to-end demo that runs on the GS device. The demo takes a text prompt and returns text generated by the model to complete the prompt. It works by pre-filling the cache with decoded input prompts and then running decode for all users in parallel.

v0.33.0

Metal

Host API changes

void StartDebugPrintServer(Device *device, const std::vector<CoreCoord> & cores) no longer callable

Device *CreateDevice no longer requires arch parameter

New wrapper around the Buffer API, so that users no longer need to look inside buffer.hpp to figure out how to construct a Buffer object: Buffer CreateBuffer(Device *device, std::uint64_t size, std::uint64_t page_size, const BufferType buffer_type)

LaunchKernels renamed to LaunchProgram(Device *device, Program &program) to match EnqueueProgram; the obsolete stagger_start parameter has been removed

void WriteRuntimeArgsToDevice(Device *device, const Program &program) moved to detail namespace

bool CompileProgram(Device *device, Program &program) moved to detail namespace

bool ConfigureDeviceWithProgram(Device *device, const Program &program) moved to detail namespace

bool InitializeDevice(Device *device) removed
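To show how call sites might look after these changes, here is a self-contained mock. Everything lives in a hypothetical tt_metal_sketch namespace with stand-in Device, Program, Buffer, and BufferType types; only the function names and parameter lists are taken from the notes above. The void return type of LaunchProgram, the num_pages helper, and the function bodies are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace tt_metal_sketch {

// Stand-in types; the real definitions are in the tt-metal headers.
enum class BufferType { DRAM, L1 };
struct Device { int id = 0; };
struct Program { bool compiled = false; bool launched = false; };

struct Buffer {
    Device* device;
    std::uint64_t size;
    std::uint64_t page_size;
    BufferType type;
    // Hypothetical helper, assuming size is a multiple of a nonzero page_size.
    std::uint64_t num_pages() const { return size / page_size; }
};

// Free-function wrapper: callers no longer need to know how Buffer is constructed.
inline Buffer CreateBuffer(Device* device, std::uint64_t size, std::uint64_t page_size,
                           const BufferType buffer_type) {
    return Buffer{device, size, page_size, buffer_type};
}

namespace detail {
// Moved into the detail namespace: still reachable, but marked as internal.
inline bool CompileProgram(Device* /*device*/, Program& program) {
    program.compiled = true;
    return true;
}
}  // namespace detail

// Renamed from LaunchKernels; the stagger_start parameter is gone.
inline void LaunchProgram(Device* /*device*/, Program& program) {
    program.launched = true;
}

// Hypothetical call site written against the updated names.
inline std::uint64_t example_usage() {
    Device device;
    Program program;
    const Buffer buf = CreateBuffer(&device, 8192, 2048, BufferType::DRAM);
    const bool ok = detail::CompileProgram(&device, program);  // note the detail:: qualifier
    assert(ok);
    (void)ok;
    LaunchProgram(&device, program);
    assert(program.launched);
    return buf.num_pages();
}

}  // namespace tt_metal_sketch
```

Code that previously called CompileProgram, ConfigureDeviceWithProgram, or WriteRuntimeArgsToDevice unqualified would need the detail:: qualifier after this release, as the sketch shows for CompileProgram.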

Feature: Runtime Compute Args

Arguments can be sent to Compute Kernels at runtime in the same way as DataMovement Kernels.

The kernel uses the same get_arg_val<type>(<index>) API to retrieve them.

The host likewise uses the same tt_metal::SetRuntimeArgs(<program>, <compute_kernel_id>, <Core, CoreRange>, <vector of u32 runtime args>); call that it uses for DataMovement Kernels.

Releases: tenstorrent/tt-metal

v0.34.0

v0.33.0
