Releases · tenstorrent/tt-metal

07 Sep 02:15

v0.52.0-rc15

3a010a7

v0.52.0-rc15 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not the versions on the main branch. There may be differences between the latest main and the previous release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10747521160

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925: Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support large weight in moreh_nll_loss
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

06 Sep 22:49

github-actions

v0.52.0-rc14

81f9fae

v0.52.0-rc14 Pre-release

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10745849640

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925: Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support large weight in moreh_nll_loss
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

06 Sep 19:44

github-actions

v0.52.0-rc13

dc76271

v0.52.0-rc13 Pre-release

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10743792121

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925: Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support large weight in moreh_nll_loss
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

06 Sep 04:07

github-actions

v0.52.0-rc12

cb68490

v0.52.0-rc12 Pre-release

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10731919856

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925: Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support moreh_nll_loss support large weight
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

06 Sep 02:15

github-actions

v0.52.0-rc11

37bc0d3

v0.52.0-rc11 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10730882573

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925 Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support moreh_nll_loss support large weight
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

04 Sep 14:00

github-actions

v0.52.0-rc9

114ff02

v0.52.0-rc9 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10702489425

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925 Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support moreh_nll_loss support large weight
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

04 Sep 02:16

github-actions

v0.52.0-rc8

7fd2f78

v0.52.0-rc8 Pre-release

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10693502206

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925 Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support moreh_nll_loss support large weight
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109
#11990: Replace ttnn.experimental.tensor.* to ttnn.* in ttnn folder
- PR: #12005
#11132 Run Post-Commit Python Tests agai...

Assets 9

02 Sep 02:15

github-actions

v0.52.0-rc6

dcd47ef

v0.52.0-rc6 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10659227832

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925 Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch is if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113
Fused AllGather+Matmul
- PR: #11760
#12124: support moreh_nll_loss support large weight
- PR: #12126
[Bugfix] Fixed is allocated
- PR: #12109

Assets 9

31 Aug 02:15

github-actions

v0.52.0-rc5

2e14e61

v0.52.0-rc5 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10641311471

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925 Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch is if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062
#0: Fixed Conv2dConfig in broken tests
- PR: #12064
#0: Falcon40b T3K demo mismatch tokens fixed
- PR: #12105
#12069: Add catch and handling for device initialize exception, typic…
- PR: #12070
Point metal to new UMD main branch
- PR: #12097
Update CODEOWNERS
- PR: #12112
#11993: Fix offset calculation for uneven shard in reshard fast path
- PR: #12083
Update CODEOWNERS
- PR: #12114
#12117: Refactor DeviceMesh->MeshDevice, DeviceGrid->MeshShape
- PR: #12118
#11854: Move .umd that houses cluster descriptor to TT_METAL_HOME
- PR: #12113

Assets 9

30 Aug 13:30

github-actions

v0.52.0-rc4

2035fd0

v0.52.0-rc4 Pre-release

Note

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10632943556

📦 Uncategorized

#0: Remove run_operation from async_runtime.hpp
- PR: #11757
#11640: Include simulation device in tt_cluster
- PR: #11766
#11342: Replace tt_lib with ttnn function in experimental/functional
- PR: #11356
#11649: update tt_lib with ttnn support for non working folder
- PR: #11654
Perf dashboard and batching support for Mistral-7B and Llama3.1-8B
- PR: #11603
Adding fix for llama CI failure caused by ttnn.experimental.tensor.typecast
- PR: #11765
Fold sharded support
- PR: #11722
#9450: add env flag to skip recompiling and reloading FW
- PR: #11681
Move semaphores into kernel config ring buffer
- PR: #11764
#10874: Enable test cases for concurrent instances in CCL all gather
- PR: #10885
[Falcon7b] Remove hf reference files and import from transformers instead
- PR: #11758
#11768: Fix watcher pause feature
- PR: #11780
[Improvement] Added some graph names in the separate file
- PR: #11732
Migrate CB configs into kernel config ring buffer
- PR: #11778
#0: Feed more data to visualizer
- PR: #11400
#11490: ttnn and tt_metal shapes are mixed
- PR: #11723
Migrate sharded ops from TTL to TTNN
- PR: #11546
#8865: Port ttnn ops to dispatch profiling infra
- PR: #11698
#11700: update write_tensor with copy_host_to_device_tensor
- PR: #11701
TTNN sweep low pic unit tests
- PR: #11775
Add sweeps for ops: topk, frac, trunc, ceil to TTNN
- PR: #11771
LLK Test Coverage Follow-up
- PR: #11715
Llama3.1 70b Prefill - MLP and Attention
- PR: #11724
#10866: Read profiler buffer with EnqueueReadBuffer in fast dispatch mode
- PR: #11781
Lpremovic/0 expand llk ctest coverage
- PR: #11653
#11313: Migrate layernorm_distributed to ttnn
- PR: #11696
[Blackhole Bringup] Fixes for maxpool
- PR: #11761
#11850: Remove Llama3.1-8B output matching to avoid blocking CI
- PR: #11851
modify keys within device_info
- PR: #11852
#0: remove extra arch-wormhole labels for single-card workflows
- PR: #11785
#0: fix cloud-virtual-machine label
- PR: #11863
#11564: added test for generating sample data with many different use cases to the visualizer
- PR: #11862
#0: Remove llk_io.cc for WH and BH as well. GS was removed in 7b8e627
- PR: #11864
#9527: Moving bcast to operations/data_movement
- PR: #11599
#10332: Make ttnn::event_synchronize block only in the app thread
- PR: #11543
#11554: Replace tt_lib in sweeps, integration_tests
- PR: #11556
#11877: Make dispatch core order in the core descriptor match for E75 with 1 and 2 CQs
- PR: #11878
#11845: fix worker ring direction assignment in reduce scatter
- PR: #11846
FD Optimizations/Cleanup
- PR: #11872
#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18
- PR: #11882
Revert "#11881: Add -Wno-vla-cxx-extension to CMake to fix build on clang18"
- PR: #11887
#10163: Add backward support for remainder op
- PR: #9712
Added ttnn.hypot_bw unit test
- PR: #11843
#0: Add another codeowner for conv2d
- PR: #11849
#11334: Remove unnecessary code for previous ci/cd csvs
- PR: #11898
#0: Bump timeout for single-card perf tests to see if that helps with timeouts
- PR: #11893
Removed "" graph_consts.hpp
- PR: #11904
[Falcon7b] Re-enable decode perplexity test with seq len 2048
- PR: #11868
[Falcon7b] Fix duplicate loading of rotary embeddings in prefill/decode
- PR: #11871
[Falcon7b] Re-enable demo perf-mode tests on galaxy, update targets, prevent multinomial errors (during perf-mode) using nan-to-num
- PR: #11876
[Blackhole Bringup] Add pack_untilize tests & fixes
- PR: #11875
#0: Consolidate demo tests for single card and t3000 to use impls rather than copy
- PR: #11897
Collection of small dprint/watcher changes
- PR: #11906
#11917: disable test
- PR: #11918
#11706: Use new Conv2D API in UNet Shallow
- PR: #11902
#11925 Update ttnn.arange binding
- PR: #11926
#0: Remove test include from packet_demux
- PR: #11924
#7709: Fix exp like ops ttnn doc issues
- PR: #7879
#11126: Resnet Demo with new conv API
- PR: #11770
Added ttnn.argmax sweeps, API calls and unit tests
- PR: #11552
#10515: For matmul corner case, if CBs don't fit, choose different program config
- PR: #11892
[Mixtral8x7B] Increase demo max context length to 32k
- PR: #11777
Added ttnn.topk unit test
- PR: #11935
#0: (MINOR) Update to v0.52.0
- PR: #11946
#11847: Add tt-smi reset command environment variable for sweeps
- PR: #11901
#11000: Enable uint8 A2D and (un)pack reconfig
- PR: #11537
#0: Do not use mount-cloud-weka label because we may no longer need it as cloud fixed it
- PR: #11941
#0: fixed External Operation logging
- PR: #11958
#0: Update matmul_multi_core_reuse to support mixed precision
- PR: #11947
#11138: Move large global vars in prefetcher and dispatcher to the stack
- PR: #11922
Enabling BH L1 data cache
- PR: #11909
#0: Move Unary device operation to tmp
- PR: #11793
Moved tracked methods out of tensor
- PR: #11921
#11964: Only write branch is if the repo is not detached
- PR: #11965
#11622: add concat sweep
- PR: #11733
#0: Refactor Python dynamic modules creation
- PR: #11798
#0: Update resnet test infra to print total batch size for multi device
- PR: #11966
#11930: Increase status checks
- PR: #11945
Convs on BH
- PR: #11977
#9630: assert out concat when concatenating along padded dimensions
- PR: #11869
Use product codes for cards instead of arch for eager-package-main
- PR: #11976
#11929: Move work_split_tilize
- PR: #11932
#11693: Move DeviceModule bindings and replace ttnn.experimental APIs
- PR: #11820
#11247: Remove in-place flag in binary operations
- PR: #11604
#11591: Move hack delay from trisc.cc to trisck.cc before run_kernel
- PR: #11963
#8865: Optimize softmax dispatch time
- PR: #11889
#0: skip yolov4 failing sub_modules
- PR: #11959
#11519: Restore path reservation for mms and convs
- PR: #11520
#5337: Fix Mixtral total number of generated tokens in perf benchmark
- PR: #11994
#11883: use fixed_string.size() instead of sizeof to ensure compatibility with newer versions of reflect
- PR: #11896
#11559: Replace tt_lib in tests/ttnn files
- PR: #11822
#11915: Add sweep vector tagging and related infra changes
- PR: #11970
#0: fix fetch q write assert by using correct data offset for enqueue write buffer
- PR: #11983
update conv path in CODEOWNERS:
- PR: #11978
enable all enablable unit tests for convs with new api
- PR: #11981
Fix size_t compilation failure
- PR: #12003
Update perf and latest features for llm models (Aug 26)
- PR: #11905
Split up n300 demo tests into functionality and performance
- PR: #11969
#10718: Fix issue with negative pipeline queue times
- PR: #12010
#11642: demux ttnn::typecast into ttnn::experimental::typecast on gra…
- PR: #11985
#11569: Enable Conv2D WH unit tests for UNet shapes
- PR: #11589
#11591: Fix race by making only unpacker zero out RISCV_DEBUG_REG_DBG_FEATURE_DISABLE at start of kernel
- PR: #12011
Update CODEOWNERS
- PR: #12048
Add missing include to graph_trace_utils.hpp
- PR: #12050
#0: Always initialize l1_banking allocator even when size is 0
- PR: #12047
update slack notification include workflow run
- PR: #12054
#8868: Fixed conv for Stride>2
- PR: #11933
#11430: Refactoring moreh_mean
- PR: #11776
#11832: Remove tracking of writes per block and only track last block
- PR: #11999
#11644: Migrate AutoFormat to TTNN Experimental
- PR: #11823
Added ttnn.i0_bw unit test
- PR: #11891
#11938: Refactoring moreh_bmm
- PR: #12000
#11646: Replace ttnn.experimental.tensor.* in models/demos
- PR: #11943
Add support for cur_pos tensor arg in sdpa decode
- PR: #11788
#5659: Add Width Sharded support to Conv2d
- PR: #11582
Remove noinline attribute from sdpa_decode compute kernel
- PR: #12060
Updated sfpi compiler to address missing SFPNOP insertion
- PR: #12061
Move compute kernel config to TTNN
- PR: #11801
Add fold to resnet
- PR: #11940
[BugFix] Fixed tensor::is_allocated.
- PR: #12071
Revert "[BugFix] Fixed tensor::is_allocated."
- PR: #12082
#8598: sinh fix
- PR: #12056
#11646: Replace ttnn.experimental.tensor.* to ttnn.* in models/experimental, tests
- PR: #11821
#10754: Add data-parallel support for UNet Shallow on N300
- PR: #12062

Assets 9
