
v0.51.0-rc3

Pre-release
github-actions released this 16 Jul 02:20
e1835e2

📦 Uncategorized

  • Migrate Pad Device and All references
  • #0: Multi-CQ support for R-Chip
  • #10028: Remove skip and reduce test case for moreh_groupnorm test
  • #10005: Change input tensor parameter to optional in moreh_sum_backward
  • #10004: Revise bias tensor usage in moreh_linear_backward
  • #9663: support moreh_nll_loss_unreduced
  • #8865: Switch ported ops from tt_lib to ttnn for host dispatch time m…
  • #0: Update README.md grammar for idiomatic description of TT-NN
  • #9767: removed more no longer needed manually specified attributes for reflection
  • Add distributed layernorm kernel documentation
  • #10031: Fix -Werror=return-type error in composite_ops
  • #9492: update matmul path in CODEOWNERS
  • #9450: change silicon fixtures to session scope
  • Uplift UMD to grab support for configuring static TLBs and Hugepage for BH
  • #9441: add all typecasts to unit test
  • #9801: Add cb alignment fix for blackhole that was missed in rebase
  • #9973: Fix addrmod for reduce scalar, port over missing narrow tile c…
  • #10052: Add metal pack untilize test
  • Add ttnn matmul tests to TG unit tests
  • Add ssm_prefix_scan test coverage for N=16
  • Add PyBind to TTNN Slice (Formerly Referred to Unpad in TT Lib)
  • #8450: Cleanup items pending from PR #9068
  • #10030: fix moreh_nll_loss hang
  • #7736: Remove unused reduce dim & type from reduce_init*
  • #9871: Update backward files
  • #9874: Move Unary Backward ops to TTNN
  • Update op_perf_results
  • #9962: Enable flags for profiler globals in jit build
  • Added prefill mode for mamba modules
  • Increase timeout for Mamba full model tests
  • Support multiple user indices in paged_update_cache
  • #10085: Make ttnn::Buffer deallocate execute without querying a potentially destroyed buffer instance
  • Pack runtime arguments across brisc/ncrisc/trisc
  • Llama Demo Refactor
  • #5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions
  • #0: Move t3k demo tests to perf pipeline because it requires perf governor
  • #5424: Delegated sfpu reciprocal calls to gs submodule functions
  • Add trace and multi cq implementations/tests for WH Resnet
  • #0: (MINOR) Update to v0.51.0
  • #0: bump python3.8 venv versioning since apt repos updated
  • #10099: fix semaphores init for packet mux/demux
  • #10112: Drop hard pin for installation instructions for python3.8-venv in dependencies
  • Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions"
  • #0: Remove stray assert forcing single CQ on R-Chips
  • #9490: Replace tt_dnn op's usage in C++ with TTNN
  • #9874: Merge Next set of unary backward ops to TTNN
  • #10073: Move unary backward ops to TTNN
  • Unary backward op migration
  • #10087: update tt-umd submodule
  • #9959: Migrated pad to ttnn sweeps
  • Adding distributed layernorm to llama prefill
  • Add pytest xdist multiprocess to single-chip demo tests
  • Revert "Revert "#5424: Delegated sfpu reciprocal calls to wh_b0 submodule functions""
  • #10071: Move second set of Unary Backward ops to TTNN
  • #10083: added tt::stl::json::to_json and tt::stl::json::from_json
  • #10086: Add logic for splitting cmds that exceed the subcmd limit into separate cmds for semaphores
  • #5424: Delegated sqrt api call to thirdparty gs submodule sqrt call
  • #5424: Delegated sfpu api call to sqrt for wh to submodule sqrt call
  • #0: Fix galaxy eth dispatch init to only init the specified number of cqs (galaxy only supports single cq)
  • Fix undefined memory bug in ssm_prefix_scan
  • removed weight copies from DRAM to L1
  • fix syntax issues with test dispatch workflow
  • #9609: Reorganize libs into ttnn
  • #10165: Fix build error with g++-12
  • Adding support for dram sharded matmuls
  • #10076: Migrate Unary bw ops and replace tt_eager ops with ttnn ops
  • #10072: Move next set of Unary Backward ops to TTNN
  • #9082: ping individual falcon member since slack user group is not wo…
  • #8681: Add Floor, Trunc blocker ops
  • #9419: use memcpy to avoid mem misalignment
  • #10079: Move Unary Backward ops to TTNN
  • Migrate unary ops to TTNN
  • #9945: Skip SD for nightly FD, device perf tests, and single-card demos as it hangs on di/dt
  • #10045: use struct for matmul parameter passing and update doc string
  • #10045: remove use_1d_systolic_array from ttnn matmul
  • Ngrujic/profiling
  • #9319: Upload benchmark data for t3k falcon 7b tests
  • Aliu/build opt
  • #10107: Fix hangs w/ launch_msg size >32bytes
  • [CCL] Making buffer size dynamic to input slice
  • #7617: remove failing experimental model test
  • #7618: delete failing experimental model test
  • #0: fix prefill CI for mamba
  • Move Mamba tests to wh_b0_only_eth pipeline
  • #9747: Implement ttnn::tilize in C++
  • Aliu/prevent aho tanking
  • #10045: fix up missed parameter change in mamba block model
  • #9490: Added ttnn support for unary ops py file
  • #10101: [Blackhole Bringup] Revert Zeroacc to legacy behaviour
  • Update README.md
  • #0: Fix imports after tt_lib change
  • #10226: [Blackhole Bringup] Add new sfpu files
  • Suppress g++-12 build errors with -Wno flags
  • #0: Fix BH regression caused by unaligned L1_UNRESERVED_BASE
  • #10077: Migrate Unary comparison backward ops to TTNN with Overloading
  • #10175: Remove std::function and restructure ternary_bw
  • Falcon40b attn mask optimization
  • #10074: Move Unary backward ops to TTNN
  • Replace all TT Lib Unpad with TTNN Slice
  • #10082: Migrate unary bw ops to TTNN and remove std::function
  • #9715: Use build artifacts for profiler tests
  • #9021: adding resnet api into ci.
  • Update README.md
  • Move pad_on_host/unpad_on_host to host function in TTNN
  • #9874: Move polygamma_bw to TTNN
  • #5337: increase t3k frequent test timeout
  • Update falcon40b readme
  • #0: add layernorm rmsnorm pybind, move to ttnn
  • #0: Re-enable read cache in llama_model_optimized.
  • Update Mistral/Mixtral README files
  • #0: Update LLama2/3 readme with demo details
  • #0: resnet perf fix
  • Update Mamba README.md
  • OPT convs in RN50 to get better device perf
  • Increase timeout for N300 WH-only model pipeline
  • Prefill+Decode Demo Functional Implementation
  • [Falcon7b] Add wormhole demo perf mode and output verification tests
  • Update Falcon7/40b READMEs with details on model functionality and perf-mode
  • bump python 3.8 venv package version
  • Git bisect workflow on CI runners
  • #9613: scaffolding for weekly scheduled t3k perplexity tests
  • fix syntax issue with bisect script
  • #10231: Clean up t3k runs-on tags to minimum
  • #9490: Remove tt_eager unary ops and bindings
  • only build for arch that a dispatched workflow is running for