Releases: tenstorrent/tt-metal
v0.53.0-rc3
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11097804939
📦 Uncategorized
- #12883: Add initial unit tests for N300
- PR: #12922
- #12499: Migrate moreh_norm, moreh_norm_backward operations from tt_eager to ttnn
- PR: #12500
- #12321: Migrate moreh_bmm, moreh_bmm_backward operations from tt_eager to ttnn
- PR: #12322
- Add more eltwise sweeps, add new functions in sweep_framework/utils.py
- PR: #13003
- #12690: Port moreh_softmax and moreh_softmax_backward to ttnn
- PR: #12698
- #0: Bump falcon7b device perf test because we have a real bump
- PR: #13008
- Aliu/tech reports
- PR: #13010
- #11332: Move `ttnn/examples` to `ttnn/ttnn/examples` so we can enable directly calling them for users, but not meant to be part of ttnn API
- PR: #11612
- Add sweeps for sign, deg2rad, rad2deg, relu6
- PR: #12994
- Revert "#10016: jit_build: link substitutes, tdma_xmov, noc"
- PR: #13009
- #12952: Update test_ccl_on_tg.cpp to work on TGG as well as TG
- PR: #12982
- [skip ci] #0: ViT report edits
- PR: #13015
- #12879: Use () so that workflow_call actually captures the call when we trigger off completed workflow runs and add them to workflows to properly capture
- PR: #13012
- [skip ci] #13019 Create remove-stale-branches.yaml
- PR: #13020
- #13019 Update remove-stale-branches.yaml
- PR: #13021
- Add tiny tile support for Tensor, matmul
- PR: #12908
- [skip ci] #13019 Add default recipient
- PR: #13023
- build tt metal in docker in CI
- PR: #11923
- Revert "build tt metal in docker in CI"
- PR: #13027
- [skip ci] #0: ViT tech report
- PR: #13032
- Mchiou/11762 build tt metal in docker
- PR: #13033
- #13013: Added tests to run in TGG unit tests workflow
- PR: #13016
- [skip ci] #13019 Update remove-stale-branches.yaml
- PR: #13025
- Mchiou/0 fix docker build storage
- PR: #13042
- #11531: Autogenerate API rst stub files, add summary table on API page
- PR: #12075
- Add --no-advice to perf report, small fixes
- PR: #13048
- preserve fp32 precision
- PR: #12794
- #0: Remove unnecessary using declarations
- PR: #13056
- #12775: Cleanup docker run action
- PR: #12777
- #0: Update to gcc-12.x, take 2
- PR: #12999
- #12945: update galaxy/n150 eth dispatch cores
- PR: #13031
- #13070: fix SD
- PR: #13073
- Update Llama codeowners
- PR: #12116
- #0: fix uncaught edge case in page update cache and added it to the test suite
- PR: #13074
- #12754: Migrate moreh_nll_loss operations (reduced and unreduced) from tt_eager to ttnn
- PR: #12807
- #8633: Add TT_Fatal for full and ones op
- PR: #12921
- #12985: Expose `ttnn::ccl::Topology` at python level
- PR: #12988
- #12556: Add queue_id and optional output tensors to assign_bw
- PR: #12573
- Support for increasing 1-D row major int32 tensors by one
- PR: #12773
- #12828: update ttnn matmul doc string
- PR: #13071
- Llama 3.1 8b DRAM-sharded matmuls
- PR: #12869
- Update perf and latest features for llm models (Sept 23)
- PR: #13064
- Work around CSV reporting 64 cores for DRAM-sharded matmuls
- PR: #13108
- #0: Fix PCC to correct bound
- PR: #13110
- #0: Simplify llrt/memory API
- PR: #13067
- #0: Fix caching race
- PR: #13063
- #0: Fix merge error with 80d6e48
- PR: #13112
- #11004: moreh: use env var for kernel src search path
- PR: #12541
- #12328: Fix Llama3.1-8B MLP tests running out of L1
- PR: #13113
- #11769: extend support for transposing/permuting bfloat8 tensors on n…
- PR: #13018
- #12141: Fixed matmul shape validation issue
- PR: #12989
- #0: move BufferType to device kernel accessible location
- PR: #12984
- #12658: update sweep export script and create initial graph script
- PR: #13051
- #0: ViT on WH
- PR: #13072
- [skip ci] Update README.md (ViT on n150)
- PR: #13119
- #0: Bump resnet50 ttnn 2cq compile time because it regressed likely due to gcc risc-v upgrade
- PR: #13121
- #0: Update WH Resnet compile time threshold
- PR: #13115
- Flash decode improvements r2
- PR: #13028
- #0: added support for n_heads > 1 for page cache prefill
- PR: #13117
- #0: Bump mamba compile time as it's not that important and the model is still performant, need to unblock people…
- PR: #13130
- #0: move Layout enum to device accessible location
- PR: #13118
- #0: Bump distilbert compile time because it keeps failing on it
- PR: #13135
- #13088: Cleanup set-1 unary backward ops
- PR: #13096
- #10033: Add forward support for gcd and lcm
- PR: #10241
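As background on the identity these elementwise forward ops rely on: lcm can be derived from gcd. A minimal Python sketch using the standard library (illustrative only, not the ttnn implementation):

```python
import math

def lcm(a: int, b: int) -> int:
    # Standard identity: lcm(a, b) = |a * b| / gcd(a, b),
    # with lcm defined as 0 when either input is 0.
    if a == 0 or b == 0:
        return 0
    return abs(a * b) // math.gcd(a, b)

assert math.gcd(12, 18) == 6
assert lcm(12, 18) == 36
```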
- #13150: Cleanup LCM, GCD Macro
- PR: #13151
- Llama3.1 8b demo with tracing
- PR: #13153
- #13058: update matmul bias size validation
- PR: #13104
- #0: (MINOR) Update to v0.53.0
- PR: #13165
- #0: try with python 3.10
- PR: #13168
- #13145: Temporarily revert Resnet on Galaxy to use slower config for first conv to avoid hangs
- PR: #13146
- #0: Remove unnecessary ProgramDeleter
- PR: #13134
- #13127: Switch python get_legacy_shape to shape.with_tile_padding()
- PR: #13124
- Add sweeps for remainder, fmod, minimum, maximum, logical_and eltwise ops, rename eltwise sweeps
- PR: #13099
- Fix Yolo tests after updating weights shape in conv2d
- PR: #13163
- #13172: Use lower python version and cache dependencies
- PR: #13173
- #11830: Move l1/dram/pcie alignment into HAL
- PR: #12983
- #13014: optimize slice by adding a 4D uint32_t array implementation o…
- PR: #13125
- Add llk support for cumsum and transpose_wh_dest with relevant tests
- PR: #12925
- Add numeric stable option for softmax
- PR: #13068
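For context on the "numeric stable" option: the standard trick is to subtract the row maximum before exponentiating, which leaves the result mathematically unchanged but prevents overflow for large logits. A NumPy sketch of the idea (illustrative only, not the ttnn kernel):

```python
import numpy as np

def softmax_stable(x: np.ndarray) -> np.ndarray:
    # Subtracting the row max makes every exponent <= 0, so exp cannot
    # overflow; dividing by the sum restores the original softmax value.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

# A naive exp(x) overflows for logits this large; the stable form is finite.
logits = np.array([1000.0, 1001.0, 1002.0])
probs = softmax_stable(logits)
```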
- #12878: Add links to job and pipeline for CI/CD analytics
- PR: #13183
- #0: fix CCL nightly tests
- PR: #13164
- #12919: Cleanup set-2 Unary Backward ops
- PR: #13138
- #8865: Add sharded tensor support to dispatch profile infra
- PR: #12871
- #0: Update CODEOWNERS for ttnn/ttnn/operations/moreh.py
- PR: #13185
- #13137: Revise moreh_arange operation
- PR: #13139
- #13095: Refactor moreh_nll_loss operations
- PR: #13097
- #10439: ttnn implementation of vgg model
- PR: #12511
- #13175: Add new category to summary table in sweeps query tool
- PR: #13176
- #5174: Disable command buffer FIFOs on BH
- PR: #13079
- Update CODEOWNERS
- PR: #13209
- Fix demo_trace and add on-device argmax to test_llama_perf
- PR: #13201
- #0: fix program caching bug in post_all_gather
- PR: #13224
- Do not require test dispatch workflow to run on "in-service" runners
- PR: #12660
- Add description to describe typical labels one could use in test dispatch workflow
- PR: #13228
- Add an option to split dprint output by risc
- PR: #13131
- Add new "choose your own pipeline" workflow
- PR: #13230
- #11962: remove uint8 unpack reconfig code
- PR: #13218
- Add tg and tgg frequent tests to "Choose your pipeline" workflow
- PR: #13236
- Add options to select a subset of pipelines that a user would like to run
- PR: #13237
- Update names of perf-models and perf-device-models jobs
- PR: #13238
- #13086: Revising moreh_getitem
- PR: #13087
- Sweeps: log, log1p, log2, log10
- PR: #13045
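As background on why log1p is swept separately from log: for tiny x, 1 + x rounds to 1 in floating point, so log(1 + x) collapses to 0, while log1p(x) stays accurate. A standard-library illustration (independent of the sweep framework):

```python
import math

x = 1e-16
naive = math.log(1.0 + x)   # 1.0 + 1e-16 rounds to 1.0, so this is exactly 0.0
accurate = math.log1p(x)    # ~1e-16, correct to full double precision
assert naive == 0.0
assert abs(accurate - x) < 1e-20
```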
- #12721: Cleanup set-3 Unary Backward ops
- PR: #13207
- #13212: Cleanup set-4 Unary backward ops
- PR: #13214
- Add initial (very limited) support for line reduce scatter
- PR: #13133
- pack kernel binary memory spans into one
- PR: #12977
v0.53.0-rc2
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11079959460
📦 Uncategorized
- #12883: Add initial unit tests for N300
- PR: #12922
- #12499: Migrate moreh_norm, moreh_norm_backward operations from tt_eager to ttnn
- PR: #12500
- #12321: Migrate moreh_bmm, moreh_bmm_backward operations from tt_eager to ttnn
- PR: #12322
- Add more eltwise sweeps, add new functions in sweep_framework/utils.py
- PR: #13003
- #12690: Port moreh_softmax and moreh_softmax_backward to ttnn
- PR: #12698
- #0: Bump falcon7b device perf test because we have a real bump
- PR: #13008
- Aliu/tech reports
- PR: #13010
- #11332: Move `ttnn/examples` to `ttnn/ttnn/examples` so we can enable directly calling them for users, but not meant to be part of ttnn API
- PR: #11612
- Add sweeps for sign, deg2rad, rad2deg, relu6
- PR: #12994
- Revert "#10016: jit_build: link substitutes, tdma_xmov, noc"
- PR: #13009
- #12952: Update test_ccl_on_tg.cpp to work on TGG as well as TG
- PR: #12982
- [skip ci] #0: ViT report edits
- PR: #13015
- #12879: Use () so that workflow_call actually captures the call when we trigger off completed workflow runs and add them to workflows to properly capture
- PR: #13012
- [skip ci] #13019 Create remove-stale-branches.yaml
- PR: #13020
- #13019 Update remove-stale-branches.yaml
- PR: #13021
- Add tiny tile support for Tensor, matmul
- PR: #12908
- [skip ci] #13019 Add default recipient
- PR: #13023
- build tt metal in docker in CI
- PR: #11923
- Revert "build tt metal in docker in CI"
- PR: #13027
- [skip ci] #0: ViT tech report
- PR: #13032
- Mchiou/11762 build tt metal in docker
- PR: #13033
- #13013: Added tests to run in TGG unit tests workflow
- PR: #13016
- [skip ci] #13019 Update remove-stale-branches.yaml
- PR: #13025
- Mchiou/0 fix docker build storage
- PR: #13042
- #11531: Autogenerate API rst stub files, add summary table on API page
- PR: #12075
- Add --no-advice to perf report, small fixes
- PR: #13048
- preserve fp32 precision
- PR: #12794
- #0: Remove unnecessary using declarations
- PR: #13056
- #12775: Cleanup docker run action
- PR: #12777
- #0: Update to gcc-12.x, take 2
- PR: #12999
- #12945: update galaxy/n150 eth dispatch cores
- PR: #13031
- #13070: fix SD
- PR: #13073
- Update Llama codeowners
- PR: #12116
- #0: fix uncaught edge case in page update cache and added it to the test suite
- PR: #13074
- #12754: Migrate moreh_nll_loss operations (reduced and unreduced) from tt_eager to ttnn
- PR: #12807
- #8633: Add TT_Fatal for full and ones op
- PR: #12921
- #12985: Expose `ttnn::ccl::Topology` at python level
- PR: #12988
- #12556: Add queue_id and optional output tensors to assign_bw
- PR: #12573
- Support for increasing 1-D row major int32 tensors by one
- PR: #12773
- #12828: update ttnn matmul doc string
- PR: #13071
- Llama 3.1 8b DRAM-sharded matmuls
- PR: #12869
- Update perf and latest features for llm models (Sept 23)
- PR: #13064
- Work around CSV reporting 64 cores for DRAM-sharded matmuls
- PR: #13108
- #0: Fix PCC to correct bound
- PR: #13110
- #0: Simplify llrt/memory API
- PR: #13067
- #0: Fix caching race
- PR: #13063
- #0: Fix merge error with 80d6e48
- PR: #13112
- #11004: moreh: use env var for kernel src search path
- PR: #12541
- #12328: Fix Llama3.1-8B MLP tests running out of L1
- PR: #13113
- #11769: extend support for transposing/permuting bfloat8 tensors on n…
- PR: #13018
- #12141: Fixed matmul shape validation issue
- PR: #12989
- #0: move BufferType to device kernel accessible location
- PR: #12984
- #12658: update sweep export script and create initial graph script
- PR: #13051
- #0: ViT on WH
- PR: #13072
- [skip ci] Update README.md (ViT on n150)
- PR: #13119
- #0: Bump resnet50 ttnn 2cq compile time because it regressed likely due to gcc risc-v upgrade
- PR: #13121
- #0: Update WH Resnet compile time threshold
- PR: #13115
- Flash decode improvements r2
- PR: #13028
- #0: added support for n_heads > 1 for page cache prefill
- PR: #13117
- #0: Bump mamba compile time as it's not that important and the model is still performant, need to unblock people…
- PR: #13130
- #0: move Layout enum to device accessible location
- PR: #13118
- #0: Bump distilbert compile time because it keeps failing on it
- PR: #13135
- #13088: Cleanup set-1 unary backward ops
- PR: #13096
- #10033: Add forward support for gcd and lcm
- PR: #10241
- #13150: Cleanup LCM, GCD Macro
- PR: #13151
- Llama3.1 8b demo with tracing
- PR: #13153
- #13058: update matmul bias size validation
- PR: #13104
- #0: (MINOR) Update to v0.53.0
- PR: #13165
- #0: try with python 3.10
- PR: #13168
- #13145: Temporarily revert Resnet on Galaxy to use slower config for first conv to avoid hangs
- PR: #13146
- #0: Remove unnecessary ProgramDeleter
- PR: #13134
- #13127: Switch python get_legacy_shape to shape.with_tile_padding()
- PR: #13124
- Add sweeps for remainder, fmod, minimum, maximum, logical_and eltwise ops, rename eltwise sweeps
- PR: #13099
- Fix Yolo tests after updating weights shape in conv2d
- PR: #13163
- #13172: Use lower python version and cache dependencies
- PR: #13173
- #11830: Move l1/dram/pcie alignment into HAL
- PR: #12983
- #13014: optimize slice by adding a 4D uint32_t array implementation o…
- PR: #13125
- Add llk support for cumsum and transpose_wh_dest with relevant tests
- PR: #12925
- Add numeric stable option for softmax
- PR: #13068
- #12878: Add links to job and pipeline for CI/CD analytics
- PR: #13183
- #0: fix CCL nightly tests
- PR: #13164
- #12919: Cleanup set-2 Unary Backward ops
- PR: #13138
- #8865: Add sharded tensor support to dispatch profile infra
- PR: #12871
- #0: Update CODEOWNERS for ttnn/ttnn/operations/moreh.py
- PR: #13185
- #13137: Revise moreh_arange operation
- PR: #13139
- #13095: Refactor moreh_nll_loss operations
- PR: #13097
- #10439: ttnn implementation of vgg model
- PR: #12511
- #13175: Add new category to summary table in sweeps query tool
- PR: #13176
- #5174: Disable command buffer FIFOs on BH
- PR: #13079
- Update CODEOWNERS
- PR: #13209
- Fix demo_trace and add on-device argmax to test_llama_perf
- PR: #13201
- #0: fix program caching bug in post_all_gather
- PR: #13224
- Do not require test dispatch workflow to run on "in-service" runners
- PR: #12660
- Add description to describe typical labels one could use in test dispatch workflow
- PR: #13228
- Add an option to split dprint output by risc
- PR: #13131
- Add new "choose your own pipeline" workflow
- PR: #13230
v0.53.0-rc1
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11062971957
📦 Uncategorized
- #12883: Add initial unit tests for N300
- PR: #12922
- #12499: Migrate moreh_norm, moreh_norm_backward operations from tt_eager to ttnn
- PR: #12500
- #12321: Migrate moreh_bmm, moreh_bmm_backward operations from tt_eager to ttnn
- PR: #12322
- Add more eltwise sweeps, add new functions in sweep_framework/utils.py
- PR: #13003
- #12690: Port moreh_softmax and moreh_softmax_backward to ttnn
- PR: #12698
- #0: Bump falcon7b device perf test because we have a real bump
- PR: #13008
- Aliu/tech reports
- PR: #13010
- #11332: Move `ttnn/examples` to `ttnn/ttnn/examples` so we can enable directly calling them for users, but not meant to be part of ttnn API
- PR: #11612
- Add sweeps for sign, deg2rad, rad2deg, relu6
- PR: #12994
- Revert "#10016: jit_build: link substitutes, tdma_xmov, noc"
- PR: #13009
- #12952: Update test_ccl_on_tg.cpp to work on TGG as well as TG
- PR: #12982
- [skip ci] #0: ViT report edits
- PR: #13015
- #12879: Use () so that workflow_call actually captures the call when we trigger off completed workflow runs and add them to workflows to properly capture
- PR: #13012
- [skip ci] #13019 Create remove-stale-branches.yaml
- PR: #13020
- #13019 Update remove-stale-branches.yaml
- PR: #13021
- Add tiny tile support for Tensor, matmul
- PR: #12908
- [skip ci] #13019 Add default recipient
- PR: #13023
- build tt metal in docker in CI
- PR: #11923
- Revert "build tt metal in docker in CI"
- PR: #13027
- [skip ci] #0: ViT tech report
- PR: #13032
- Mchiou/11762 build tt metal in docker
- PR: #13033
- #13013: Added tests to run in TGG unit tests workflow
- PR: #13016
- [skip ci] #13019 Update remove-stale-branches.yaml
- PR: #13025
- Mchiou/0 fix docker build storage
- PR: #13042
- #11531: Autogenerate API rst stub files, add summary table on API page
- PR: #12075
- Add --no-advice to perf report, small fixes
- PR: #13048
- preserve fp32 precision
- PR: #12794
- #0: Remove unnecessary using declarations
- PR: #13056
- #12775: Cleanup docker run action
- PR: #12777
- #0: Update to gcc-12.x, take 2
- PR: #12999
- #12945: update galaxy/n150 eth dispatch cores
- PR: #13031
- #13070: fix SD
- PR: #13073
- Update Llama codeowners
- PR: #12116
- #0: fix uncaught edge case in page update cache and added it to the test suite
- PR: #13074
- #12754: Migrate moreh_nll_loss operations (reduced and unreduced) from tt_eager to ttnn
- PR: #12807
- #8633: Add TT_Fatal for full and ones op
- PR: #12921
- #12985: Expose `ttnn::ccl::Topology` at python level
- PR: #12988
- #12556: Add queue_id and optional output tensors to assign_bw
- PR: #12573
- Support for increasing 1-D row major int32 tensors by one
- PR: #12773
- #12828: update ttnn matmul doc string
- PR: #13071
- Llama 3.1 8b DRAM-sharded matmuls
- PR: #12869
- Update perf and latest features for llm models (Sept 23)
- PR: #13064
- Work around CSV reporting 64 cores for DRAM-sharded matmuls
- PR: #13108
- #0: Fix PCC to correct bound
- PR: #13110
- #0: Simplify llrt/memory API
- PR: #13067
- #0: Fix caching race
- PR: #13063
- #0: Fix merge error with 80d6e48
- PR: #13112
- #11004: moreh: use env var for kernel src search path
- PR: #12541
- #12328: Fix Llama3.1-8B MLP tests running out of L1
- PR: #13113
- #11769: extend support for transposing/permuting bfloat8 tensors on n…
- PR: #13018
- #12141: Fixed matmul shape validation issue
- PR: #12989
- #0: move BufferType to device kernel accessible location
- PR: #12984
- #12658: update sweep export script and create initial graph script
- PR: #13051
- #0: ViT on WH
- PR: #13072
- [skip ci] Update README.md (ViT on n150)
- PR: #13119
- #0: Bump resnet50 ttnn 2cq compile time because it regressed likely due to gcc risc-v upgrade
- PR: #13121
- #0: Update WH Resnet compile time threshold
- PR: #13115
- Flash decode improvements r2
- PR: #13028
- #0: added support for n_heads > 1 for page cache prefill
- PR: #13117
- #0: Bump mamba compile time as it's not that important and the model is still performant, need to unblock people…
- PR: #13130
- #0: move Layout enum to device accessible location
- PR: #13118
- #0: Bump distilbert compile time because it keeps failing on it
- PR: #13135
- #13088: Cleanup set-1 unary backward ops
- PR: #13096
- #10033: Add forward support for gcd and lcm
- PR: #10241
- #13150: Cleanup LCM, GCD Macro
- PR: #13151
- Llama3.1 8b demo with tracing
- PR: #13153
- #13058: update matmul bias size validation
- PR: #13104
- #0: (MINOR) Update to v0.53.0
- PR: #13165
- #0: try with python 3.10
- PR: #13168
- #13145: Temporarily revert Resnet on Galaxy to use slower config for first conv to avoid hangs
- PR: #13146
- #0: Remove unnecessary ProgramDeleter
- PR: #13134
- #13127: Switch python get_legacy_shape to shape.with_tile_padding()
- PR: #13124
- Add sweeps for remainder, fmod, minimum, maximum, logical_and eltwise ops, rename eltwise sweeps
- PR: #13099
- Fix Yolo tests after updating weights shape in conv2d
- PR: #13163
- #13172: Use lower python version and cache dependencies
- PR: #13173
- #11830: Move l1/dram/pcie alignment into HAL
- PR: #12983
- #13014: optimize slice by adding a 4D uint32_t array implementation o…
- PR: #13125
- Add llk support for cumsum and transpose_wh_dest with relevant tests
- PR: #12925
- Add numeric stable option for softmax
- PR: #13068
- #12878: Add links to job and pipeline for CI/CD analytics
- PR: #13183
- #0: fix CCL nightly tests
- PR: #13164
v0.52.1-rc1
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11043931400
📦 Uncategorized
- #12883: Add initial unit tests for N300
- PR: #12922
- #12499: Migrate moreh_norm, moreh_norm_backward operations from tt_eager to ttnn
- PR: #12500
- #12321: Migrate moreh_bmm, moreh_bmm_backward operations from tt_eager to ttnn
- PR: #12322
- Add more eltwise sweeps, add new functions in sweep_framework/utils.py
- PR: #13003
- #12690: Port moreh_softmax and moreh_softmax_backward to ttnn
- PR: #12698
- #0: Bump falcon7b device perf test because we have a real bump
- PR: #13008
- Aliu/tech reports
- PR: #13010
- #11332: Move `ttnn/examples` to `ttnn/ttnn/examples` so we can enable directly calling them for users, but not meant to be part of ttnn API
- PR: #11612
- Add sweeps for sign, deg2rad, rad2deg, relu6
- PR: #12994
- Revert "#10016: jit_build: link substitutes, tdma_xmov, noc"
- PR: #13009
- #12952: Update test_ccl_on_tg.cpp to work on TGG as well as TG
- PR: #12982
- [skip ci] #0: ViT report edits
- PR: #13015
- #12879: Use () so that workflow_call actually captures the call when we trigger off completed workflow runs and add them to workflows to properly capture
- PR: #13012
- [skip ci] #13019 Create remove-stale-branches.yaml
- PR: #13020
- #13019 Update remove-stale-branches.yaml
- PR: #13021
- Add tiny tile support for Tensor, matmul
- PR: #12908
- [skip ci] #13019 Add default recipient
- PR: #13023
- build tt metal in docker in CI
- PR: #11923
- Revert "build tt metal in docker in CI"
- PR: #13027
- [skip ci] #0: ViT tech report
- PR: #13032
- Mchiou/11762 build tt metal in docker
- PR: #13033
- #13013: Added tests to run in TGG unit tests workflow
- PR: #13016
- [skip ci] #13019 Update remove-stale-branches.yaml
- PR: #13025
- Mchiou/0 fix docker build storage
- PR: #13042
- #11531: Autogenerate API rst stub files, add summary table on API page
- PR: #12075
- Add --no-advice to perf report, small fixes
- PR: #13048
- preserve fp32 precision
- PR: #12794
- #0: Remove unnecessary using declarations
- PR: #13056
- #12775: Cleanup docker run action
- PR: #12777
- #0: Update to gcc-12.x, take 2
- PR: #12999
- #12945: update galaxy/n150 eth dispatch cores
- PR: #13031
- #13070: fix SD
- PR: #13073
- Update Llama codeowners
- PR: #12116
- #0: fix uncaught edge case in page update cache and added it to the test suite
- PR: #13074
- #12754: Migrate moreh_nll_loss operations (reduced and unreduced) from tt_eager to ttnn
- PR: #12807
- #8633: Add TT_Fatal for full and ones op
- PR: #12921
- #12985: Expose `ttnn::ccl::Topology` at python level
- PR: #12988
- #12556: Add queue_id and optional output tensors to assign_bw
- PR: #12573
- Support for increasing 1-D row major int32 tensors by one
- PR: #12773
- #12828: update ttnn matmul doc string
- PR: #13071
- Llama 3.1 8b DRAM-sharded matmuls
- PR: #12869
- Update perf and latest features for llm models (Sept 23)
- PR: #13064
- Work around CSV reporting 64 cores for DRAM-sharded matmuls
- PR: #13108
- #0: Fix PCC to correct bound
- PR: #13110
- #0: Simplify llrt/memory API
- PR: #13067
- #0: Fix caching race
- PR: #13063
- #0: Fix merge error with 80d6e48
- PR: #13112
- #11004: moreh: use env var for kernel src search path
- PR: #12541
- #12328: Fix Llama3.1-8B MLP tests running out of L1
- PR: #13113
- #11769: extend support for transposing/permuting bfloat8 tensors on n…
- PR: #13018
- #12141: Fixed matmul shape validation issue
- PR: #12989
- #0: move BufferType to device kernel accessible location
- PR: #12984
- #12658: update sweep export script and create initial graph script
- PR: #13051
- #0: ViT on WH
- PR: #13072
- [skip ci] Update README.md (ViT on n150)
- PR: #13119
- #0: Bump resnet50 ttnn 2cq compile time because it regressed likely due to gcc risc-v upgrade
- PR: #13121
- #0: Update WH Resnet compile time threshold
- PR: #13115
- Flash decode improvements r2
- PR: #13028
- #0: added support for n_heads > 1 for page cache prefill
- PR: #13117
- #0: Bump mamba compile time as it's not that important and the model is still performant, need to unblock people…
- PR: #13130
- #0: move Layout enum to device accessible location
- PR: #13118
- #0: Bump distilbert compile time because it keeps failing on it
- PR: #13135
v0.52.0
Note
This is a verified, real release; however, the release notes are under construction. Thank you for understanding.
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from the last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11036234439
📦 Uncategorized
- #12323: delete ctor of AllGatherFusedOpSignaler
- PR: #12324
- #0: Revert "#0: Update to gcc-12.x (#12332)"
- PR: #12522
- #12448: Update 1d matmul sweep test to use CoreRangeSet for core range parameters
- PR: #12523
- #0: [skip ci] Fix demo invocation in Llama README
- PR: #12526
- #0: Update device creations functions to use num_command_queues instead of num_hw_cqs to match mesh_device creation functions
- PR: #12517
- #12273: Move full wheel build on GitHub runners and 22.04 to scheduled job and fix related
- PR: #12535
- Update perf and latest features for llm models (Sept 11)
- PR: #12515
- #12532: Change sweep new vector checking to use the serialized vector…
- PR: #12534
- #12451: add negative ends support for slice with list splicing format
- PR: #12469
- fix llama t3k demo invoke in CI
- PR: #12537
- Yugao/doc
- PR: #12540
- #10855: Add single-device perf measurements to sweep infra
- PR: #12338
- #9340: Add optional output tensor support for assign
- PR: #12057
- #0: Add ccl multichip stack overview
- PR: #12551
- #12371: Migrate `moreh_getitem` operation from `tt_eager` to `ttnn`
- PR: #12372
- #11651: Remove type_caster
- PR: #11702
- #12375: Add qid and optional tensor output to ttnn.gelu_bw
- PR: #12509
- #8865: Optimized ttnn.bcast dispatch times
- PR: #12383
- #12196: Use split readers wherever possible in UNet Shallow
- PR: #12441
- Replace exact output match with tight pcc check in post-commit
- PR: #12446
- #12148: Add queue_id and optional output tensors to ttnn.mul_bw
- PR: #12162
- Fix start_pos in get_rot_mat() in llama galaxy model
- PR: #12493
- Yieldthought/llama31 8b/ttembed
- PR: #12560
- #8865: Fix non working ops in dispatch profiling infra
- PR: #12564
- #0: Remove myself from tt_lib/csrc codeowners
- PR: #12567
- Update workload theoretical ethernet numbers
- PR: #12570
- #12524: Update fmt and unify logging API
- PR: #12464
- #0: Update fmt and unify logging API
- PR: #12587
- #11133: Improve various things about the wheel, including removal of `patchelf` and linking runtime assets to cwd
- PR: #11884
- Support for initializing with 0s for SUM reduction WHB0
- PR: #12238
- #12376: Support for non-32 Height in Width Sharded Conv2d
- PR: #12382
- #0: Optimize context switch decision
- PR: #12545
- #0: Correct #!/bin script headers
- PR: #12582
- #12538: Separate out wheel tests from build so that other wheel-dependent jobs aren't blocked by the wheel smoke tests
- PR: #12594
- #0: Create Blackhole Bring-Up Programming Guide
- PR: #12610
- #12552: Fix indentation pybind files
- PR: #12543
- #0: Add FD nightly single-card pipeline to data pipeline
- PR: #12618
- #0: [skip_ci] Updating BH bring-up programming guide
- PR: #12620
- Update owner of T3K ttnn unit tests
- PR: #12622
- #0: change default reduce scatter num buffers per channel to 2
- PR: #12616
- #12436: port `moreh_sum` from `tt_dnn` to `ttnn`
- PR: #12437
- #12026: add permute sweep tests for trace
- PR: #12571
- #12514: port `moreh_mean` and `moreh_mean_backward` from `tt_dnn` to `ttnn`
- PR: #12519
- #12207: Port moreh_dot to ttnn
- PR: #12265
- #12259: Move moreh dot backward
- PR: #12261
- #12164: Add queue_id and optional output tensors to backward ops
- PR: #12255
- #12439: Migrate moreh_nll_loss_bwd operations (reduced and unreduced) from tt_eager to ttnn
- PR: #12494
- #12578: Update Mixtral t/s/u in README
- PR: #12629
- #12373: Add queue_id and optional output tensors to rsqrt_bw op
- PR: #12404
- remove todos from doc
- PR: #12636
- add code language formatting CclDeveloperGuide.md
- PR: #12639
- #0: Update multi-chip Resnet perf numbers after dispatch optimizations
- PR: #12621
- #0: Remove unused _init, _fini
- PR: #12593
- #0: remove unused variable
- PR: #12646
- Contiguous pages support in Reduce Scatter read/write
- PR: #12477
- #12628: Resolve arithmetic error in test_multi_cq_multi_dev causing T3K multi-CQ tests to fail
- PR: #12653
- #12619: Update matmul sweep timeout and core range set usage
- PR: #12655
- Run on custom dispatch commands on in-service runners only
- PR: #12659
- #12544: support wide channels (> 256) in maxpool
- PR: #12625
- #12605: Implement recommendations for Llama readme
- PR: #12657
- #0: Point UMD back to `main` instead of `metal-main`
- PR: #12478
- #0: ViT Trace+2CQ implementation
- PR: #12623
- #0: Add BH to custom test dispatch workflow
- PR: #12667
- Update ViT on GS perf
- PR: #12670
- LLama selfout specific optimizations for fused all_gather_matmul op
- PR: #12292
- #12520: Adding noc_async_writes_flushed between mcast writes and mcast semaphore sets for BH
- PR: #12627
- #11144: Upgrade `pip` version to `21.2.4` to get around 22.04 import error
- PR: #12673
- Remove duplicate from sfpu_split_includes.h
- PR: #12665
- #12250: port moreh_matmul from tt_dnn to ttnn
- PR: #12251
- #12297: Add queue_id and optional output tensors to add_bw op
- PR: #12358
- #12392: Use shallow convolution in upblock3 of UNet Shallow
- PR: #12562
- #0: Make CoreRangeSet thread safe
- PR: #12679
- mm_sfence->tt_driver_atomics::sfence();
- PR: #12617
- [New Op] Added dropout unary op
- PR: #12474
- #12392: Shallow conv UNet unit tests
- PR: #12568
- Pkeller/memmap profiler
- PR: #12067
- #0: Set WH_ARCH_YAML only if we have a wormhole machine
- PR: #12704
- All gather expose params
- PR: #12389
- Generalize nlp create head decode
- PR: #12663
- #0: Remove CCL stalls, since Fabric VC support is merged
- PR: #12720
- #0: Remove incorrect norelax option
- PR: #12717
- #12668: SWOC bugfix
- PR: #12674
- Fix start pos in get_rot_mat
- PR: #12728
- #0: Remove unused CRT_START label
- PR: #12722
- #12701: Split nightly tests into specific models for better reading
- PR: #12733
- #0: Relax host bound tg threshold for Resnet
- PR: #12708
- Rename tt::tt_metal::Shape to LegacyShape to not conflict with TTNN
- PR: #12742
- #12374: Add optional output tensor support for ttnn.full_like
- PR: #12689
- YoloV4 pipeline update
- PR: #12503
- #12425: Add queue_id and optional output tensors to zeros_like
- PR: #12561
- #12497: ttnn.empty to use create_device_tensor
- PR: #12542
- #12266: Cleanup ternary backward
- PR: #12691
- #0: Use absolute addressing in startup
- PR: #12723
- #12595: Run profiler gather after every sweep test regardless of status
- PR: #12606
- #12730: bert slice support unit tests
- PR: #12737
- Reduce scatter perf sweep
- PR: #12391
- #12778: Speed up sweeps parameter generation
- PR: #12780
- #0: DPrint bugfix for which dispatch cores are included in 'all'
- PR: #12745
- #12730: bert slice support unit tests correction
- PR: #12779
- #5783: Remove watcher dependency on generated headers
- PR: #12686
- #0: Update GS Resnet perf thresholds. Seeing large variation in CI
- PR: #12744
- Fix issue w/ CBs getting allocated on ETH cores
- PR: #12792
- #12802: add tracy option to build_metal.sh
- PR: #12803
- #12748: Cleanup clamp_bw op
- PR: #12762
- #12224: Add optional output tensor support for lt_bw
- PR: #12693
- #12387: Workaround to_layout for height sharded tensor
- PR: #12641
- #12196: Use split_reader and act db
- PR: #12769
- #12508: Skip failing test in CI
- PR: #12761
- #11512: Add frac, ceil and trunc sweeps
- PR: #12760
- #0: Don't overwrite CMake flags in `build_metal.sh`
- PR: #12824
- Add subtract, subalpha and rsub sweeps, interleaved
- PR: #12822
- Llama tg/sharded ccls
- PR: #12814
- Update peak dram speed to 288GB/s
- PR: #12528
- #11169: Watcher to report if eth link retraining occurred during teardown
- PR: #12801
- #0: adding jaykru-tt as codeowner for data_movement operations
- PR: #12139
- Mamba CI hanging on Untilize fix
- PR: #12677
- #12749: Update Test files
- PR: #12751
- #12799: Add handling for pytest errors, especially those at the beginning, and expose their messages
- PR: #12838
- #12529: Update comment of dataflow api for mcast loopback functions
- PR: #12825
- Fix failure in llama perf on CI
- PR: #12669
- fix typo - mention higher level multichip API above CCL ops
- PR: #12836
- Add Mamba unit tests to post-commit test suite
- PR: #12129
- #12529: Add check for in0_mcast_num_cores=1 for noc_async_write_multicast_loopback_src
- PR: #12796
- #0: Change all ops which support page_table to enable non-log2 shapes
- PR: #12842
- #12198: Add 2CQ and trace support for UNet Shallow
- PR: #12820
- Add support/examples for placing reads and writes on CQ1
- PR: #12821
- #9370: Workaround: replace WRCFG with RMWCIB instructions in reduce_revert_delta
- PR: #12832
- Remove UNet from landing page
- PR: #12856
- #12750: Replace zeros_like with empty_like in backward ops
- PR: #12766
- #12840: Add more handling for multiple attempts by restricting the space of `github_job_id`s we're looking at to only the ones in the workflow run attempt in questi...
v0.52.0-rc33
Note
If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.
The changelog will now follow, showing the changes from last release.
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11024689338
- no changes
v0.52.0-rc32
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11005335923
- no changes
v0.52.0-rc31
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10986217781
- no changes
v0.52.0-rc30
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10968559558
- no changes
v0.52.0-rc29
This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10951618664
- no changes