Metal

Wormhole

Host API changes

void StartDebugPrintServer(Device *device, const std::vector<CoreCoord> & cores) no longer callable
Device *CreateDevice no longer requires arch parameter
New CreateBuffer wrapper around the Buffer API, so users no longer need to read buffer.hpp to learn how to construct a buffer object: Buffer CreateBuffer(Device *device, std::uint64_t size, std::uint64_t page_size, const BufferType buffer_type)
LaunchKernels renamed to LaunchProgram(Device *device, Program &program) to match EnqueueProgram; the obsolete stagger_start parameter has been removed
void WriteRuntimeArgsToDevice(Device *device, const Program &program) moved to detail namespace
bool CompileProgram(Device *device, Program &program) moved to detail namespace
bool ConfigureDeviceWithProgram(Device *device, const Program &program) moved to detail namespace
bool InitializeDevice(Device *device) removed

Bug fix on device side to support new FW init process in fast and slow dispatch.
RISC FW cleanup to avoid unnecessary function wrappers.

Add more waypoints to watcher and add access methods to the SoC descriptor (e.g., for harvesting)
Add some NOC sanitization and checks
Some bug fixes: don't read registers during kernel run, don't include Wormhole (wh) headers on Grayskull (gs), allow 0-length transactions

Arguments can now be sent to Compute Kernels at runtime in the same way as to DataMovement Kernels.
The kernel retrieves them with the same get_arg_val<type>(<index>) API.
The host sets them with the same call used for DataMovement Kernels: tt_metal::SetRuntimeArgs(<program>, <compute_kernel_id>, <Core, CoreRange>, <vector of u32 runtime args>).
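
The host-to-kernel flow described above can be sketched with a small self-contained mock. This is not the tt_metal implementation: MockProgram, MockCore, MockKernelId, MockSetRuntimeArgs, MockKernelContext and RunMockComputeKernel are hypothetical stand-ins, and the argument meanings (num_tiles, tiles_per_block) are assumed purely for illustration. It only shows the shape of the contract: the host stores a vector of u32 values per kernel and core, and the kernel reads them back by index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT tt_metal types.
using MockCore = std::pair<int, int>;   // (x, y) core coordinate
using MockKernelId = std::uint32_t;

struct MockProgram {
    // Runtime args are stored per (kernel, core) as a flat vector of u32 slots.
    std::map<std::pair<MockKernelId, MockCore>, std::vector<std::uint32_t>> runtime_args;
};

// Host side: mirrors the argument order of
// SetRuntimeArgs(<program>, <kernel_id>, <core>, <vector of u32 runtime args>).
void MockSetRuntimeArgs(MockProgram &program, MockKernelId kernel_id,
                        const MockCore &core, const std::vector<std::uint32_t> &args) {
    program.runtime_args[std::make_pair(kernel_id, core)] = args;
}

// Kernel side: a view over the args delivered to one kernel on one core.
struct MockKernelContext {
    const std::vector<std::uint32_t> *args;

    // Mirrors the get_arg_val<type>(<index>) access pattern: one 32-bit slot per arg.
    template <typename T>
    T get_arg_val(std::size_t index) const {
        static_assert(sizeof(T) == sizeof(std::uint32_t),
                      "each runtime arg occupies one 32-bit slot");
        assert(args != nullptr);
        T value;
        std::memcpy(&value, &args->at(index), sizeof(T));  // at() throws if index is out of range
        return value;
    }
};

// A mock "compute kernel" body: reads two args by index and combines them.
// The meaning of each arg is an assumption made for this example.
std::uint32_t RunMockComputeKernel(const MockProgram &program, MockKernelId kernel_id,
                                   const MockCore &core) {
    MockKernelContext ctx{&program.runtime_args.at(std::make_pair(kernel_id, core))};
    std::uint32_t num_tiles = ctx.get_arg_val<std::uint32_t>(0);
    std::uint32_t tiles_per_block = ctx.get_arg_val<std::uint32_t>(1);
    return num_tiles * tiles_per_block;
}
```

In this mock, the index passed to get_arg_val is the position of the value in the vector supplied on the host, so the host and the kernel must agree on argument order. Real code would call tt_metal::SetRuntimeArgs with an actual compute kernel ID instead of these stand-ins.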