Releases: tenstorrent/tt-metal

v0.53.1-rc6

30 Nov 01:58
v0.53.1-rc6 Pre-release

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not on the main branch. There may be differences between the latest main and the previous release.

The changelog will now follow, showing the changes from last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12092456407

🚀 Features

📦 Uncategorized

  • [CCL] Add negative dim support
  • #12151: Replace avg_pool2d with global_avg_pool2d
  • Update Qwen README.md to remove Llama references
  • Optimized Llama 3.x perf with sharded residual
  • #15247: Add unit test to show segfault with sharded config problem
  • Fix num cores for dram sharded MM
  • #0: Dispatch RTAs early in some cases
  • #0: New RiscV architecture extension attributes
  • #15361: Conv2d width sharded fails with tilized input
  • #6659: remove dead code
  • CB Size Validation Fix Rollout
  • #12979: Merge erisc data & bss sections
  • Fold batches into channels and use grouped convolutions in UNet Shallow
  • [TT-Train] Added Yaml Configs support
  • #7493: Accidentally added two tests that should have been deleted durin…
  • #0: Add InsertBraces: true to .clang-format
  • Update unary doc examples set2
  • Update unary doc examples set3
  • Add all gather perf to pipeline for TG
  • #0: update ref links to eltwise pytorch2 sweeps
  • Release metadata blocks on allocator destruction
  • #15440: fix stack overflow in vc_packet_router
  • Revert changes to Falcon7b matmul configs to fix CI tests after default matmul configs were modified
  • Put the Git repo in a happy state before attempting to checkout
  • Publish Release Images
  • Rename Tutorial - Add Two Integers in a Baby RISC-V.md to Tutorial_Ad…
  • Create Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • UMD create_mock_cluster and fix PhysicalCoordinate
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • #15123 fix performance unnecessary value param
  • [tt-train] Add nanogpt tests with AdamW and MorehAdamW
  • [tt-train] Removed slow odd option for fp16 host->device conversion.
  • Remove dead param
  • Allow builds to run to their natural end
  • [skip ci] Update .clang-format
  • Formatting pass on ttnn directory, only files not in open PRs
  • Formatting pass on tt_metal directory, only files not in open PRs
  • #15171: Better parallelization strategy
  • [skip ci] Add clang-format to CI
  • Update .clang-format-ignore
  • Update ignore revs
  • Add perf report docs and slightly improve output
  • #13332: add ttnn implementation for Bert-Tiny model
  • #0: LLM Tech Report: Intro
  • Fix the test for whether to install the wheel, and also exit the script on the first error
  • Populate the version based on Git Describe
  • LLM tech report performance analysis
  • #13875: fix tilize and attn matmul on BH
  • Yolo Optimization
  • #0: Port eltwise and some misc ops to use TensorSpec
  • [tt-train] Implement composite LayerNorm
  • #14974: ttnn::{full,empty}_like Tensor creation API for MeshDevice
  • Add support to change FD cores from row to col placement
  • [skip ci] Fix Formatting of host_api.hpp tables
  • Use UMD's public API - no more fishing into private paths
  • Use the post-merge commit of UMD
  • [skip ci] Disable clang-format in pre-commit and move version ahead
  • Support for new matmul1d op with gather_in0
  • Remove dead includes that break without proper include paths
  • More formatting of ttnn
  • Make MEM_LOCAL_BASE accessible behind Hal
  • Add support for rank-n tensors to tilize and untilize
  • Add reduce scatter perf to tg
  • Fix Circular Buffer Allocation in untilize_with_halo
  • [skip ci] use git-clang-format in CI
  • #13676: i1 op kernel implementation and improve i0_bw pcc
  • Increase perf margin on unet
  • #12558: TTNN implementation of MNIST model
  • Remove unsupported shapes to make pipeline green
  • #13401: Add data parallel support for Bert-Tiny model
  • #15297: Allow MeshDevice to be initialized for chips without eth coordinates
  • #0: Disable clang-format precommit check once again due to errors
  • #15337: Fix incorrectly sized cb in remote cb microbenchmark
  • [skip ci] Update CONTRIBUTING.md with pre-commit info
  • Remove ClusterDescriptor path from constructor
  • Add performance and accuracy configurations to Llama 3
  • Disable upblock 3 and 4 unet unit tests
  • Re-enable git-clang-format for pre-commit again
  • Fix race condition in DRAM sharded MM
  • Fix to concat support for tensors with tile padding
  • Adjust perf test targets for Falcon7b-t3k-decode-noasync to account for CI instability

v0.53.1-rc5

29 Nov 01:59
5af6427
v0.53.1-rc5 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12077551995

🚀 Features

📦 Uncategorized

  • [CCL] Add negative dim support
  • #12151: Replace avg_pool2d with global_avg_pool2d
  • Update Qwen README.md to remove Llama references
  • Optimized Llama 3.x perf with sharded residual
  • #15247: Add unit test to show segfault with sharded config problem
  • Fix num cores for dram sharded MM
  • #0: Dispatch RTAs early in some cases
  • #0: New RiscV architecture extension attributes
  • #15361: Conv2d width sharded fails with tilized input
  • #6659: remove dead code
  • CB Size Validation Fix Rollout
  • #12979: Merge erisc data & bss sections
  • Fold batches into channels and use grouped convolutions in UNet Shallow
  • [TT-Train] Added Yaml Configs support
  • #7493: Accidentally added two tests that should have been deleted durin…
  • #0: Add InsertBraces: true to .clang-format
  • Update unary doc examples set2
  • Update unary doc examples set3
  • Add all gather perf to pipeline for TG
  • #0: update ref links to eltwise pytorch2 sweeps
  • Release metadata blocks on allocator destruction
  • #15440: fix stack overflow in vc_packet_router
  • Revert changes to Falcon7b matmul configs to fix CI tests after default matmul configs were modified
  • Put the Git repo in a happy state before attempting to checkout
  • Publish Release Images
  • Rename Tutorial - Add Two Integers in a Baby RISC-V.md to Tutorial_Ad…
  • Create Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • UMD create_mock_cluster and fix PhysicalCoordinate
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • #15123 fix performance unnecessary value param
  • [tt-train] Add nanogpt tests with AdamW and MorehAdamW
  • [tt-train] Removed slow odd option for fp16 host->device conversion.
  • Remove dead param
  • Allow builds to run to their natural end
  • [skip ci] Update .clang-format
  • Formatting pass on ttnn directory, only files not in open PRs
  • Formatting pass on tt_metal directory, only files not in open PRs
  • #15171: Better parallelization strategy
  • [skip ci] Add clang-format to CI
  • Update .clang-format-ignore
  • Update ignore revs
  • Add perf report docs and slightly improve output
  • #13332: add ttnn implementation for Bert-Tiny model
  • #0: LLM Tech Report: Intro
  • Fix the test for whether to install the wheel, and also exit the script on the first error
  • Populate the version based on Git Describe
  • LLM tech report performance analysis
  • #13875: fix tilize and attn matmul on BH
  • Yolo Optimization
  • #0: Port eltwise and some misc ops to use TensorSpec
  • [tt-train] Implement composite LayerNorm
  • #14974: ttnn::{full,empty}_like Tensor creation API for MeshDevice
  • Add support to change FD cores from row to col placement
  • [skip ci] Fix Formatting of host_api.hpp tables
  • Use UMD's public API - no more fishing into private paths
  • Use the post-merge commit of UMD
  • [skip ci] Disable clang-format in pre-commit and move version ahead
  • Support for new matmul1d op with gather_in0
  • Remove dead includes that break without proper include paths
  • More formatting of ttnn
  • Make MEM_LOCAL_BASE accessible behind Hal
  • Add support for rank-n tensors to tilize and untilize
  • Add reduce scatter perf to tg
  • Fix Circular Buffer Allocation in untilize_with_halo
  • [skip ci] use git-clang-format in CI
  • #13676: i1 op kernel implementation and improve i0_bw pcc
  • Increase perf margin on unet
  • #12558: TTNN implementation of MNIST model
  • Remove unsupported shapes to make pipeline green
  • #13401: Add data parallel support for Bert-Tiny model
  • #15297: Allow MeshDevice to be initialized for chips without eth coordinates
  • #0: Disable clang-format precommit check once again due to errors
  • #15337: Fix incorrectly sized cb in remote cb microbenchmark

v0.53.1-rc4

28 Nov 15:53
db6343e
v0.53.1-rc4 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12071750053

📦 Uncategorized

  • [CCL] Add negative dim support
  • #12151: Replace avg_pool2d with global_avg_pool2d
  • Update Qwen README.md to remove Llama references
  • Optimized Llama 3.x perf with sharded residual
  • #15247: Add unit test to show segfault with sharded config problem
  • Fix num cores for dram sharded MM
  • #0: Dispatch RTAs early in some cases
  • #0: New RiscV architecture extension attributes
  • #15361: Conv2d width sharded fails with tilized input
  • #6659: remove dead code
  • CB Size Validation Fix Rollout
  • #12979: Merge erisc data & bss sections
  • Fold batches into channels and use grouped convolutions in UNet Shallow
  • [TT-Train] Added Yaml Configs support
  • #7493: Accidentally added two tests that should have been deleted durin…
  • #0: Add InsertBraces: true to .clang-format
  • Update unary doc examples set2
  • Update unary doc examples set3
  • Add all gather perf to pipeline for TG
  • #0: update ref links to eltwise pytorch2 sweeps
  • Release metadata blocks on allocator destruction
  • #15440: fix stack overflow in vc_packet_router
  • Revert changes to Falcon7b matmul configs to fix CI tests after default matmul configs were modified
  • Put the Git repo in a happy state before attempting to checkout
  • Publish Release Images
  • Rename Tutorial - Add Two Integers in a Baby RISC-V.md to Tutorial_Ad…
  • Create Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • UMD create_mock_cluster and fix PhysicalCoordinate
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • #15123 fix performance unnecessary value param
  • [tt-train] Add nanogpt tests with AdamW and MorehAdamW
  • [tt-train] Removed slow odd option for fp16 host->device conversion.
  • Remove dead param
  • Allow builds to run to their natural end
  • [skip ci] Update .clang-format
  • Formatting pass on ttnn directory, only files not in open PRs
  • Formatting pass on tt_metal directory, only files not in open PRs
  • #15171: Better parallelization strategy
  • [skip ci] Add clang-format to CI
  • Update .clang-format-ignore
  • Update ignore revs
  • Add perf report docs and slightly improve output
  • #13332: add ttnn implementation for Bert-Tiny model
  • #0: LLM Tech Report: Intro
  • Fix the test for whether to install the wheel, and also exit the script on the first error
  • Populate the version based on Git Describe
  • LLM tech report performance analysis
  • #13875: fix tilize and attn matmul on BH
  • Yolo Optimization
  • #0: Port eltwise and some misc ops to use TensorSpec
  • [tt-train] Implement composite LayerNorm
  • #14974: ttnn::{full,empty}_like Tensor creation API for MeshDevice
  • Add support to change FD cores from row to col placement
  • [skip ci] Fix Formatting of host_api.hpp tables
  • Use UMD's public API - no more fishing into private paths
  • Use the post-merge commit of UMD
  • [skip ci] Disable clang-format in pre-commit and move version ahead
  • Support for new matmul1d op with gather_in0
  • Remove dead includes that break without proper include paths
  • More formatting of ttnn
  • Make MEM_LOCAL_BASE accessible behind Hal
  • Add support for rank-n tensors to tilize and untilize
  • Add reduce scatter perf to tg
  • Fix Circular Buffer Allocation in untilize_with_halo
  • [skip ci] use git-clang-format in CI
  • #13676: i1 op kernel implementation and improve i0_bw pcc
  • Increase perf margin on unet
  • #12558: TTNN implementation of MNIST model
  • Remove unsupported shapes to make pipeline green

v0.53.1-rc2

27 Nov 01:59
0390e0c
v0.53.1-rc2 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12042084031

📦 Uncategorized

  • [CCL] Add negative dim support
  • #12151: Replace avg_pool2d with global_avg_pool2d
  • Update Qwen README.md to remove Llama references
  • Optimized Llama 3.x perf with sharded residual
  • #15247: Add unit test to show segfault with sharded config problem
  • Fix num cores for dram sharded MM
  • #0: Dispatch RTAs early in some cases
  • #0: New RiscV architecture extension attributes
  • #15361: Conv2d width sharded fails with tilized input
  • #6659: remove dead code
  • CB Size Validation Fix Rollout
  • #12979: Merge erisc data & bss sections
  • Fold batches into channels and use grouped convolutions in UNet Shallow
  • [TT-Train] Added Yaml Configs support
  • #7493: Accidentally added two tests that should have been deleted durin…
  • #0: Add InsertBraces: true to .clang-format
  • Update unary doc examples set2
  • Update unary doc examples set3
  • Add all gather perf to pipeline for TG
  • #0: update ref links to eltwise pytorch2 sweeps
  • Release metadata blocks on allocator destruction
  • #15440: fix stack overflow in vc_packet_router
  • Revert changes to Falcon7b matmul configs to fix CI tests after default matmul configs were modified
  • Put the Git repo in a happy state before attempting to checkout
  • Publish Release Images
  • Rename Tutorial - Add Two Integers in a Baby RISC-V.md to Tutorial_Ad…
  • Create Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • UMD create_mock_cluster and fix PhysicalCoordinate
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • Update Tutorial_Add_Two_Integers_in_a_Compute_Kernel.md
  • #15123 fix performance unnecessary value param
  • [tt-train] Add nanogpt tests with AdamW and MorehAdamW
  • [tt-train] Removed slow odd option for fp16 host->device conversion.

v0.53.1-rc1

26 Nov 01:59
78075c6
v0.53.1-rc1 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12022082193

📦 Uncategorized

  • [CCL] Add negative dim support
  • #12151: Replace avg_pool2d with global_avg_pool2d
  • Update Qwen README.md to remove Llama references
  • Optimized Llama 3.x perf with sharded residual
  • #15247: Add unit test to show segfault with sharded config problem
  • Fix num cores for dram sharded MM
  • #0: Dispatch RTAs early in some cases
  • #0: New RiscV architecture extension attributes
  • #15361: Conv2d width sharded fails with tilized input
  • #6659: remove dead code
  • CB Size Validation Fix Rollout
  • #12979: Merge erisc data & bss sections
  • Fold batches into channels and use grouped convolutions in UNet Shallow
  • [TT-Train] Added Yaml Configs support
  • #7493: Accidentally added two tests that should have been deleted durin…
  • #0: Add InsertBraces: true to .clang-format

v0.53.0-rc51

25 Nov 19:05
v0.53.0-rc51 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12016640880

  • no changes

v0.53.0-rc50

25 Nov 01:59
v0.53.0-rc50 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12001584495

  • no changes

v0.53.0

25 Nov 19:09
v0.53.0 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/12016702477

📦 Uncategorized

  • #14773: Set default to true when getting active ethernet cores
  • #11795: Update test_pgm_dispatch and sweep
  • #14880: Ternary composite op clean up
  • #14928: Ternary backward clean up
  • #14930: Complex backward op clean up
  • #0: Update Mixtral target
  • #14665: add new moreh_clip_grad_norm and test in ttnn
  • #14730: Support unequal ranked inputs for eltwise binary
  • Fix double deallocate in llama3 attention
  • #14862: fp32 support in unary
  • Angle op fix
  • Fix a non-c-typedef-for-linkage error
  • Add experimental fused qk ROPE
  • [skip ci] #14001: Add an ALIAS target for consuming TTNN
  • #0: Disable llama test_model from all-post-commit CI pipeline
  • float32 tilize support
  • Move NUM_CIRCULAR_BUFFERS to hw/inc
  • Mchiou/14961 disable gs profiler ring buffer
  • #14990: Address feedback in Programming Mesh of Devices Tech Report
  • #11512: Add sweep test for ttnn.transformers.attention_softmax
  • #14826: Remove misoptimizations from init code
  • Use cluster desc yaml on BH and pass PCIe NoC endpoint to device
  • Increase packer precision for bfp8 formats
  • Revert "Angle op fix"
  • use do_crt1 like other cores
  • Fixed incorrect mem size for DebugIErisc
  • Dvartanians/mbahnas/yolov4 web demo traced
  • [skip ci] Update CODEOWNERS
  • Added tt-train to the tt-metal monorepo
  • #0: Disable Unity builds to detect bitrot
  • Update Resnet50 perf on n150
  • [skip ci] Add GEMM techreport to explain WH performance
  • Alignment fix for BH in I2S and S2I
  • [skip ci] Update README.md (MM FLOPS)
  • FD refactor + sub device support
  • #0: Provide script for installing system dependencies
  • Build with unity in build-artifact.yaml, don't use unity in build.yaml
  • Move NOC_0_X/Y behind Hal
  • Add reduce_scatter t3k perf to pipeline
  • add initial fabric erisc data mover (EDM) impl
  • Revert "Alignment fix for BH in I2S and S2I"
  • Revert "use do_crt1 like other cores"
  • Revert "#14826: Remove misoptimizations from init code"
  • Reduce dependence on ARCH_NAME in dev_msgs.h
  • graph trace update - extract_circular_buffers_peak_size_per_core
  • Llama-Vision: Enable tracing, refactor generation code
  • [tt-train] Added mesh support
  • #13655: Fix sub-device tests for BH
  • LlamaVision: Move xattn cache generation to text prefill forward
  • Revert "Reduce dependence on ARCH_NAME in dev_msgs.h"
  • Alignment fix for BH on I2S and S2I (fix after revert)
  • Update size.hpp
  • [skip ci] #0: update yolov4 READMEs
  • #15073: Fix use after move in ttnn run_operation
  • Restructure supported params table for ternary ops
  • Change tt_SiliconDevice to tt::umd::Cluster
  • #14546: Fix moreh_adamw power_tile reduce performance
  • Update documentation for LERP
  • Restructure supported params table for ternary backward ops
  • #14999: Update scatter golden function
  • Update ternary and backward ternary pybind examples
  • (REDO) Reduce dependence on ARCH_NAME in dev_msgs.h
  • #13521: New sweep for pytorch tracing - ttnn.add
  • #14590: Move sfpi off LFS
  • Add multi-block support for matmul_2d
  • Enable CCache for builds
  • Enable clang-tidy check for use after move
  • #14688 Scan the repo with clang-tidy as part of post-commit
  • Add tunneler tests to ci
  • #11795: Added tests that dispatch randomly-generated Programs and alternate between using trace and not using trace
  • #14895: enable gp-rel in kernels
  • Add entry to MM benchmark
  • Fix use-after-move
  • #15123 This check is clean
  • #5174: Uplifting microbenchmarks to run on BH
  • Relax Max Pool Requirement For C To Be Power Of 2
  • Remove LFS from tt-train
  • #14985: Update the examples for binary backward doc
  • Add integer support for eltwise ops
  • #0: Use logical shape in validation check
  • [CCL] Compute device utilization percentage
  • #15144: Increase trace region for yolo to fix
  • #15079: make ProgramCache::is_enabled_ initialized out-of-line
  • [skip ci] Update GEMM_FLOPS.md
  • [skip ci] Update README.md
  • [skip ci] Update GEMM_FLOPS.md
  • [skip ci] Add files via upload
  • #14474: Fix OoO issues for Llama3 tests on CI
  • #0: Revert "#14730: Support unequal ranked inputs for eltwise binary (#14803)"
  • Manually address an issue that local clang-tidy trips over
  • Add Qwen2-7B model on N150
  • Add support for new logical sharding + alignment in TensorLayout and tensor creation
  • Support dst_full_sync_en flag in the WH compute kernel config pybind
  • Revert "Add tunneler tests to ci"
  • #14634: Remove usage of ARCH_NAME sp constants MEM_L1_SIZE
  • tilize_op float32 access
  • Add build config struct to HAL with base FW and local init addrs
  • Update test_pgm_dispatch_script
  • #15123 Fix performance-for-range-copy
  • #0: Improve functional generality of ttnn.concat
  • #15167: explicitly check for rank 4 in reduce special cases
  • #14985: Update binary bw example, Use logical shape
  • Disable test from running on t3k
  • Update CODEOWNERS
  • #14985: Update bias_gelu_bw example, implementation
  • Update Lerp op
  • Update Qwen expected compile time
  • #14985: Update binary bw docs
  • #13676: Add unit tests for io_bw, tan_bw, and lerp
  • Move llama single-device demo tests to perf pipeline for dashboard support
  • #14826: reorganize crt startup
  • #13929: Update the input range for ldexp test
  • #0: Remove duplicate single-card demo llama3 tests
  • #0: Add eth dispatch to test_pgm_dispatch sweeps
  • Add a Debug preset
  • #13127: Add physical_shard_shape to ShardSpec attributes
  • #13720 Make reshape-view 0 cost when possible
  • Convert Hal into a Singleton
  • Add support for arrays in CoreRangeSet
  • #0: Fix typo causing spurious perf warnings for concat
  • Update perf and latest features for llm models (Nov 18)
  • #15145: Add support for multi-device tensors in grouped convolution weight preprocessing
  • [tt-train] Fix tt-train in main branch
  • #15144: Up timeout for mamba to an obscene number because we seem to take longer for some reason that I don't understand
  • #14985: Update examples for binary backward ops
  • #15228: Fix error message in BaseShape when index is out of bounds
  • Allow Concrete Hal Translation Units to have unique include paths
  • Update binary examples and supported params Set 2
  • Add TT-NN roadmap and overview
  • Add data formats to perf report
  • Mo/14961 remove op alignment check
  • Organize contributing docs in a subdir and add notes on clang-tidy
  • #13675: update supported range for tan_bw
  • Fix N150 llama3 demo CI tests to properly save perf information to superset
  • #0: Add sweep for rw bw test
  • [tt-train] Free graph during backward pass
  • Update binary examples
  • #14974: ttnn::empty Tensor creation API for MeshDevice
  • #14427: increase erisc kernel code size
  • Update remove-stale-branches.yaml
  • Consolidate action back into this repo
  • Fix usage of deleted branch
  • #15234: disable sharded tests on Blackhole until fix is introduced
  • #15140: Fix UAF error when MeshDevice.close_devices() not invoked
  • Fix s2i op when shard grid is larger than actual used grid
  • Add a padding-aware, interleaved, tiled transpose HC with a fused padding value parameter
  • Update examples of unary backward
  • Remove CMake variable UMD_HOME
  • #0: Remove alignment requirements for Row Major tensors
  • #15078: Update clamp_bw, clip_bw with min, max tensor
  • Add forward support for...

v0.53.0-rc49

23 Nov 01:59
0074d79
v0.53.0-rc49 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11982863618

  • no changes

v0.53.0-rc48

22 Nov 01:58
e7cd350
v0.53.0-rc48 Pre-release

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/11964798351

  • no changes