
Merging the new op: nlp_kv_cache_unpad_to_sharded into main #8459

Merged 9 commits into main on May 16, 2024

Conversation

caixunshiren (Contributor)

Issue: #7379

This new op fuses the unpadding and the following i2s (interleaved-to-sharded) op, with optimized noc_async_read_barrier placement, speeding up the llama3 model's KV cache unpadding by 5-6x. The same op can be applied to many other transformer models.
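
For readers unfamiliar with the op, here is a minimal NumPy sketch of its logical semantics: unpad (slice) the cache along the sequence dimension, then shard the result across cores. The fused kernel does this on-device in a single pass instead of materializing an intermediate tensor. The function name, shapes, and sharding scheme below are illustrative assumptions, not the actual tt-metal API.

```python
# Logical semantics of the fused op, sketched in NumPy. The real kernel
# performs the slice and the interleaved-to-sharded conversion together
# on-device; everything here is an illustration, not the tt-metal binding.
import numpy as np

def kv_cache_load_slice(kv_cache, seq_len, num_cores):
    """Unpad a KV cache along the sequence dim, then shard it across cores.

    kv_cache: [batch, num_heads, max_seq_len, head_dim] padded cache
    seq_len:  number of valid (unpadded) positions
    returns:  list of per-core shards, split along the batch*heads dim
    """
    batch, num_heads, _, head_dim = kv_cache.shape
    # Step 1 (unpad): slice away the padded tail of the sequence dimension.
    unpadded = kv_cache[:, :, :seq_len, :]
    # Step 2 (shard): flatten batch and heads, then split across cores,
    # mirroring a height-sharded layout (assumed here for illustration).
    flat = unpadded.reshape(batch * num_heads, seq_len, head_dim)
    return np.array_split(flat, num_cores, axis=0)

# Example: slice the first 128 valid positions of a padded 2048-entry cache.
cache = np.zeros((1, 8, 2048, 128), dtype=np.float16)
shards = kv_cache_load_slice(cache, seq_len=128, num_cores=8)
print([s.shape for s in shards])  # 8 shards of shape (1, 128, 128)
```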

davorchap (Collaborator)

Perhaps a more generic name: kv_cache_load_slice?

caixunshiren self-assigned this on May 14, 2024
caixunshiren (Contributor, Author)

> Perhaps a more generic name: kv_cache_load_slice?

I think this is a great name! @cglagovichTT what do you think?

cglagovichTT (Contributor)

Looks good!

caixunshiren (Contributor, Author)

All post-commit runs passed. Good to merge. Could you approve it, @TT-BrianLiu @tt-aho?

TT-BrianLiu (Contributor) left a comment


Approved. Please address comments

caixunshiren merged commit 2399c77 into main on May 16, 2024. 77 checks passed.
caixunshiren (Contributor, Author)

All post-commit tests passed. Rebased onto main and merged.
