v0.48.0
📦 Uncategorized
#7744 : Add support for non-4D tensor in moreh_sum, moreh_sum_backward
#5544 : Add output tensors parameter to moreh_nll_loss op
#5544 : Add output tensors parameter to moreh_sgd op
#5544 : Fix package build error
#5544 : Add output tensors parameter to moreh_linear op
#5544 : Prevent eager unit test failures
#7997 : Support non-4D tensor in moreh_softmax
#7816 : Bump SD perf target
#8098 : Remove temp buffer copying when reading from hugepage to host buffer
#0: Specify DEBUG_STATUS as a string literal instead of multiple chars
#8212 : Fix uneven shards for interleaved_to_sharded op
#0: Refactor unpad tile to modify rt args in place and remove dynamic…
#7838 : Add support for non-4D tensor in moreh_linear OPs
#0: Use split_work_for_tilize in both tilize and untilize
#8131 : resnet-50 fix for b20.
Add support for multiple parameters in EltwiseUnary
#7625 : Enable multicore for tilize with padding by default
Trace Support
#0: Switch the set-runtime-args assertion checking whether the kernel was placed on a core to TT_ASSERT
#7179 : enabling test case. The issue was not reproducible on 8.12 dri…
#4625 : Multicore runs for untilize with unpadding on interleaved tensors
#0: Cache program cmds, convert cb configs from write linear to write packed
#0: Make skip and xfail optional in defining sweep tests
Shwetank tt/bcast op
#8364 : Disable implicit fallback for ttnn.pad
#8513 : Add slack notifications to several more pipelines
#0: Update common RT args to use no stride flag for packed cmd.
#0: Option to write compile_commands.json from CMake
#8718 : eltwise testing for bfloat8
Add support for bfloat8 input tensors in Mamba SSM block custom kernels
#8460 : Enable Clang-17
#0: Remove overhead in calling functions wrapped in tensor_impl_wrapper
#0: Update the perf threshold to incorporate the "Merge back uneven reshard" commit.
#6365 : Add ttnn host tests
#6365 : Revert "#6365 : Add ttnn host tests (#8210)"
#4382 : fix GH reported vulnerabilities
#0: bump C++ timeout limit to 45 minutes
Update unpad doc for slice generality
Convert Falcon7b tt_lib ops and tensors to ttnn.experimental
#6365 : Fix ttnn host wheel tests
Add git bisect script
#0: Move falcon40b ci unit tests to different pipeline
#8437 : remove default matmul program config
#0: Add myself to ttnn codeowners
#0: Update README.md to include mention of TTNN_CONFIG_OVERRIDES (see the sketch after this list)
#0: Fix typos and add TTNN_CONFIG_OVERRIDES parameter descriptions to README
#0: Add basic sanity checks during matmul program config creation
#8907 : Sweep tests for tilize/untilize
#8902 : Fixed program caching bug in nlp load slice op and added additional test cases for the op
#8917 : Add sweep test for the fold op
#0: Properly support trivial single core case for 1D matmuls
#6343 : updated test_perf with test for bloom causal_lm
#6343 : Add functional_bloom test_demo
Update README.md
Enable optimised attention by default in falcon prefill.
Replace FreeList shared_ptr with local_shared_ptr
Add dummy_weights mode for mixtral tests
Refactor operation calls: Replace operation::run() with operation::launch_op()
Use HiFi2 to bump Falcon7b prefill PCC
#8902 : add input and attn_mask deallocation
#8930 : Disable llama perf test
#0: Add third codeowner to matmul path
#0: Add create_venv.sh as environment option in installation instructions
#7083 : Composite conv fix for relu called after matmul
#7525 : Skip batch 7 metal BERT on WH B0 because it still hangs too often
#8871 : Add initial infra/support for dram sharding
#8531 : delete all makefiles
#0: Delete dead code from work_split.hpp
#8853 : Uplift SFPI to latest w/ BH support
#8725 : Warn user if kernel cache is enabled
#0: Minor test_prefetcher fixes
#5389 : Move ttnn.repeat to C++
#8131 : temp fix for PCC issue on W0.
Optimize Falcon40b e2e perf by modifying layernorm
#0: Relax Falcon7b perf target
#0: Resolve segfault in llama async mode
Resnet Optimizations
Create Falcon7b perplexity test and utility functions for text-gen datasets
Revert "#8131 : temp fix for PCC issue on W0."
Optimize dram sharded bmm
#8943 : Clean up profiler python_env build flow
#8904 : Add slack notifications for T3000 unit-tests
Add unet shallow functional, performance and demo test files
#8932 : Multi-Device Mixtral Argmax Support
#8264 : Worker thread optimizations:
TTNN tests for bf8 with mk tiled scalar
Ihamer/7468 inject noc delays
Support changed csv row orderings in Mixtral's op_perf_results.py
Correct merge issue in op_perf_results.py
#0: Add kernel groups to test_pgm_dispatch
#0: Add docs requirements to python env cache key because it can change the environment as well
#0: Add helper function to create CBs
#8973 : Remove TT_METAL_ENV because we don't need it anymore
#5773 : Move SD model to demo folder
#6938 : Implement softplus as a single kernel
Model team/rotary embeddings llama
#8735 : Fix hw/inc/blackhole files for compilation
Improve Mixtral perf with tt_lib
Update README.md
#3712 : fix old version of GN test
#0: Don't error on unused functions in compiler call
Revert " #8904 : Add slack notifications for T3000 unit-tests"
Rtawfik/bh llk api
#0: Added interactive demo
Move Falcon7b before Mixtral in demo pipeline to workaround issue
#8112 : Add support for ND tensors to matmul
#0: fix dram read benchmark
Fix bug in utility_functions::Profiler
Remove 1x1 matmul fallback on convolution and generalize convo…
#5389 : Remove ttnn.split
#8767 : decouple build folder name from build.cpp
#8735 : Update common flags for BH build after sfpi module update
#8895 : Fix ttnn.as_tensor(..) method for placing tensors on-device
#8539 : Add cq_id to run_operation function args
#8632 : Support fp32 dest acc en in moreh_sum and moreh_sum_backward
#5044 : Add optional output tensor and remove autoformat in eltwise binary ops
#8895 : Fix failing regression test in dump_tensor(...) API
More Resnet Optimizations
#4858 : add typecast fp32 to uint32 op
#8995 : refactoring moreh arange
#0: Add ccache option to build_metal.sh
Update Mixtral perf figures
#8349 : Use BFP4_B for attention mask in falcon7b optimised prefill.
#0: Add CODEOWNERS for build_metal.sh
Rtawfik/add binary reuse metal
Update watcher.rst - use double backticks
Falcon40b tt_lib to ttnn.experimental
#0: fix dram sharded program cache
#7083 : New halo fix for enabled program cache
#9051 : Enable Llama model perf test
#8764 : Single card WH demo tests
#8764 : Various docs fixes for WH release
#0: Correct script locations for nightly single card
#8764 : Use new device_l1_small_size fixture for SD demo interactive test
#9059 : Update matmul test pcc
#0: Ensure weka mount is active for demo tests, otherwise they won't run
#0: remove reserve to avoid bad alloc
#8764 : Separate n150/n300 demo tests to not run BERT 11 on N150
Remove unnecessary llk sfpu param files
#9059 : Add fallback for getting matmul program config
Add grouped convolution support
#8282 : Support non-4d tensor and fp32_dest_acc_en for moreh nllloss backward
#8976 : moreh_getitem receives signed integer index tensors
#9049 : fix moreh_sgd callback and add callback test
#0: Remove argmax multi-device test due to segfault
#7724 : Add prototype for autonomous streams for use in tunneller
#9036 : GS & BH --> Combine llk param files using variable args
#0: optimize allgather for small tensor sizes
Enable weight caching for long running Mamba tests
#5389 : removed early return from validate when enable_fast_runtime_mo…
Removed unnecessary ttnn.to_device() from Mixtral code
Add 2 cq implementation for Resnet
#9084 : Rename dockerfile and add virtualenv installation
#0: Watcher interval to not include polling time
#0: Revert "#8264 : Worker thread optimizations:"
#5389 : disabled failing moreh tests
#5389 : disabled failing moreh tests
#5389 : disabled failing moreh tests
#0: Update Resnet perf numbers
Split dispatcher commands into packets; prefetcher relay_linear bug fix and test improvements
#6448 : re-enable all-gather bidir for dim 0,1
#8890 : Reduce size of pack_src|dst_format constexprs
#0: merge all kernels into one group
#7724 : Disable a test to reduce runtime
ttnn multi-chip changes for galaxy support
#9026 : Fix FD dispatcher wait on wrapped value
#0: Add back Async Mode optimizations
Add support for bfloat8 activations in Mamba
#9118 : Fix moreh getitem, moreh nllloss validation error
Update ViT E2E number in README.md
#4858 : enable typecast fp16b to uint16
#8540 : Upgrade eltwise binary ops to support queue_id / output_tensor / uint output dtype (see the sketch after this list)
#9095 : implement callback helper function
#5044 : Add optional output to where op
#0: enable multi-device tensor support for moreh sum op
#5337 : Mixtral dense matmul after all-gather
Update Mamba decode performance metrics
#8683 : Add Unary right shift
Snijjar/issue 7724
#5044 : add optional output to BW ops EQ, add, addalpha, mul
Build UMD with the same compiler used to compile metal, and remove clang 6 as a dependency
#0: change silicon param to session scope
Mo/8223 fd2 dispatch core profiler support
#9006 : single-core topk extension to include larger width and height
#9088 : fix ttnn_falcon_7b single-device regression in decoder module
#7586 : Create unstable branch of WH single card nightly FD
#9143 : BH -> Remove unused reduce args
#8563 : sweep split_query_key_value_and_split_heads, split and concat
#8407 : Remove 1x1 matmul fallback on convolution and generalize convo…
#4252 : Update to C++20
#9110 : Move typecast to ttnn
Update TTNN sweeps - concatenate heads, embeddings
#9016 : adjust nightly t3000 demo test pipeline to run Mon/Wed/Fri
#9088 : fix ttnn_falcon_7b single-device regression in attention
#9167 : sped up compute program hash
#9109 : Add q_id to Eltwise binary EQ
#8662 : add initial argmax op single core kernel implementation
#8424 : Add new llk-wormhole-b0 commit: remove assert for fp32 zeroacc
#9059 : adjust matmul parameters for rounding up in some scenarios
#5389 : Move ttnn.repeat_interleave to C++ (see the sketch after this list)
#9167 : updated llama3 ops to not use attributes method and instead to use attribute_names + attributes_values
#8681 : Add Floor, Trunc dependent ops
Fuse Mamba block residual projection with activation
#9167 : sped up compute program hash
Add trace 2cq version of Resnet
#9167 : changed program cache to use unique_any as the value type
#8683 : Add Unary left shift
Mixtral: Add EoS token stop to demo
#0: Update Falcon7b CODEOWNERS
#8764 : Part 2 fixes for docs for wormhole readiness
Correctly block for the current EP when blocking=true
Applying Llama2 decode and prefill kernels to the experimental folder
#9198 : Fix minor regression in some nightly tests due to small packet optimization
Fix softmax sharded program cache hit
#0: add support for in1 dram sharded matmul2d
#0: Fix repack_weights.py script for llama, which was writing params.json contents using out_dir as a file
#8965 : deallocate all buffers on device when closing
#0: Update noc_async_read/write docs to not specify only dram coords
#9137 : clean target will now remove the entire build folder
#9142 : BH -> Fix pack api, add constant vector
Standardize llk sfpu inits
#0: Fix jupyterlab being pinned to two different versions
#4858 : add uint16 to fp16b typecast support
#0: pad subblock size, allowing Mixtral shapes to reach 240 GB/s
#7083 : conv config cleanup in python and c++ changes
#0: Add option to validate program binaries on device before enqueuing program in debug mode
#7822 : Fix conditionals for bmm multi core reuse optimized for when to update rt args
#8764 : Set TTNN_CONFIG_OVERRIDES if it exists in the ttnn workflow
#9270 : tracy linking error fix
#9200 : Use project paths in CMake
#0: Make numa node based binding opt-in
#5337 : Add extrapolation and skipping to op_perf_results
Update Mistral perf figures
Improve Mistral perf test for 1024 seqlen and on-device profiling
Fix log message typo (importting -> importing)
#7586 : Move current wh b0 only single-card nightly tests to the ln model
[Falcon7b] Add support for 2k kv-cache size for decode l1-sharded configuration
#0: Update Llama experimental readme
#8725 : Update warning for persistent kernel cache
[Falcon7b] Add option to run huggingface model in perplexity test, and add perplexity test to demo ci
#0: Skip failing resnet tests
#8658 : Migrate composite unary ops to C++
#5389 : updated ShardSpec to use attribute_names + attribute_values instead of attributes
#8764 : Run ttnn ipynb tutorials on N150/N300
#8837 : Fix Resnet trace 2cq version to write inputs on cq 1
#753 : Syncing device and host times for the Tracy profiler
#8940 : Get rid of source code directories in the local environment to ensure that the end-to-end environment is valid
Fix Mixtral ttnn.eq dtype
#8764 : ttnn examples in ci
Binary dest accumulation
Move program configs out of runtime codepath.
#0: Fix import error for skipping ttnn resnet tests
#0: optimize dram u-bench to 267 GB/s
Add ttnn argmax op
#0: Cleanup bmm multi core reuse optimized ORTAs
#9080 : Migrate pipeline owners
TTNN split removal fix
Update sweeps documentation
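
Several entries above reference the TTNN_CONFIG_OVERRIDES environment variable (the README updates and the #8764 workflow change). Below is a minimal sketch of how it is typically set, assuming the variable accepts a JSON object; the key shown is borrowed from the enable_fast_runtime_mo… wording in the #5389 entry and should be checked against the README rather than treated as authoritative.

```python
# Minimal sketch, assuming TTNN_CONFIG_OVERRIDES accepts a JSON object.
# Set the variable before importing ttnn so the override is visible at
# library initialization; the key name is an assumption taken from the
# #5389 entry above, so consult the README for the authoritative list.
import json
import os

os.environ["TTNN_CONFIG_OVERRIDES"] = json.dumps(
    {"enable_fast_runtime_mode": False}
)

import ttnn  # imported after setting the override on purpose
```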
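Several entries (#5044, #8540, #8539, #9109) describe adding optional output tensors and a queue id to eltwise binary ops. The sketch below shows the calling pattern those titles imply; the keyword names output_tensor and queue_id are assumptions inferred from the entry titles, not confirmed signatures.

```python
# Hedged sketch of the optional-output / queue-id pattern described in
# the #5044 / #8540 / #9109 entries; keyword names are assumptions.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

a = ttnn.from_torch(torch.ones(1, 1, 32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.ones(1, 1, 32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)

# Preallocating the output avoids an allocation inside the op; the queue
# id selects which command queue services the call.
out = ttnn.zeros_like(a)
ttnn.eq(a, b, output_tensor=out, queue_id=0)

ttnn.close_device(device)
```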
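The #5389 entries move ttnn.repeat and ttnn.repeat_interleave to C++. A usage sketch follows, assuming the Python-facing call keeps torch-like (tensor, repeats, dim) semantics; that signature is an assumption based on the torch op of the same name.

```python
# Sketch of ttnn.repeat_interleave after the move to C++ (#5389),
# assuming torch-like (tensor, repeats, dim) semantics.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

t = ttnn.from_torch(torch.rand(1, 1, 32, 32), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
r = ttnn.repeat_interleave(t, 2, dim=2)  # each element along dim 2 repeated twice

ttnn.close_device(device)
```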