
v0.51.0-rc8

Pre-release
Released by github-actions on 20 Jul 02:17 · 2503 commits to main since this release · d75e0eb

Note

If you are installing from a release, please refer to the README, INSTALLATION instructions, and any other documentation packaged with the release, not to the documentation on the main branch. There may be differences between the latest main and this release.

The changelog follows, showing the changes since the last release.

This release was generated by the CI workflow https://github.com/tenstorrent/tt-metal/actions/runs/10016484977

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to as Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071 : Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for
  • Allow overloading of job name with user-defined name for new dispatch workflows
  • #10242: Migrate unary bw ops with a generalized structure to TTNN
  • #10322: commented out failing t3k tests
  • #9491: Add structure for ternary ops in ttnn
  • Move downsample from tt_eager to ttnn
  • #10250: Migrate unary backward ops with a generalized structure to TTNN
  • #10280: Mistral README update
  • #9911: Add structure and migrate 20 composite unary ops
  • #0: fix rn50 block padding
  • #10300: get the correct operation id on subsequent run
  • #0: Move host tensor construction for halo into create_program to only happen on uncached runs
  • Flash decode v2
  • #9751: Restructure ttnn transformers to new folder structure
  • #10181: Disable test_reduce_h due to sporadic failures in slow dispatch
  • #10181: Disable test_reduce_h
  • Update README.md
  • move groupnorm from ttlib to ttnn
  • Update README.md
  • Update README.md - missing footnote
  • #0: Update ttnn resnet 2cq bound due to variability
  • #7528: add new ethernet microbenchmark, cleanup and re-enable others
  • #10238: migrate 7 unary ops into ttnn
  • #10333: Migrate prod_bw to TTNN
  • #10320: Enable falcon40b tests again
  • Add fused layernorm to falcon40b
  • #8342: Add info to matmul that tensors need to be on device
  • #10254: Enable preserve_fp32_precision flag in moreh_sum op
  • #10305: Add INSTALLING.md to release assets and create new custom release notes with an installation and pipeline ID
  • #9901: Refactoring moreh norm
  • Ngrujic/profiling
  • #9747: Implement ttnn.tilize(_with_val_padding) Python bindings
  • Add fixture for checking if in CI env and invoke Falcon7b demo tests with only filename
  • Add native caching for Mamba convolution/hidden states
  • Add skip-first option to op perf results script
  • #10083: added unit tests for JSON serialization
  • #10323: Reenable Llama perf test in CI
  • #9747: Delete tilize ops from tt_eager
  • #8764: More docs changes for WH readiness, Part 5
  • Move Mamba embeddings onto device
  • #10257: Add ttnn binding for UnaryWithParam, UnaryOpType
  • #10224: Update offsets for GO signal commands to use sizeof prefetch/dispatch cmd rather than pcie aligned size
  • #10166: add device mesh apis to query by row and col
  • #10380: Migrate set 1,2 complex ops to TTNN
  • #9527: continue removing bcast
  • #10334: Migrate 7 Type 2 unary complex bw ops to TTNN
  • #9806: Migrate Complex binary backward ops to TTNN
  • Relu max sweep migration - TTLIB to TTNN
  • Refactor: common RMSNorm for Mixtral and Mistral
  • #9874: Update clamp_bw to match PyTorch API
  • [Falcon7b] Add perplexity tests to new pipeline and restructure pytests to invoke with filenames
  • #10322: Re-enable Mixtral CI tests due to corrupted cache in CI machine
  • Migration of relu_min from tt_eager to ttnn
  • #10147: Migrate addcmul_bw to ttnn
  • #10403: Use is_ci_env fixture instead of env variable
  • Optimized MLP with W/H fracturing, sharding and ReduceScatter
  • Minor refactoring
  • Revert "Migration of relu_min from tt_eager to ttnn"
  • Add falcon40b demo test with token matching
  • #10130: Delete scan op in favor of ssm_prefix_scan
  • Support any sequence length in Mamba prefill
  • #10200: Update umd for mmio flush array overrun bugfix
  • #10467: Move tt_eager folder content into ttnn/experimental
  • Mixtral prefill 128-32k
  • #8865: Update reference times for dispatch time measuring
  • Migrate unpad sweep to TTNN
  • Update demo token matching reference for falcon40b
  • TTNN fmod sweeps added
  • #10147: Migrated eltwise_relu_min to ttnn
  • #0: Upgrade WH and T3000 WH KMD and FW versions to v1.27.1 and v80.10.0.0 respectively
  • #0: Move concatenate heads into ttnn experimental
  • #9628: Update Test files with golden function
  • #9490: Remove eltwise_unary in tt_eager
  • #10137: Add structure for composite binary ops in ttnn
  • Fix/re-enable a few watcher tests
  • #10471: Fixed GCC13 compile time issue
  • #7887: remove deprecated device_pool
  • Add llama galaxy mlp to TG frequent tests
  • #0: Update CODEOWNERS
  • Move Mamba demo to models/demos/wormhole
  • #10180: Use last column for FD on BH
  • #10052: [Blackhole bringup] Add pack untilize