Could inference acceleration for LLaMA and BLOOM be supported? #502

HuiResearch · 2023-04-18T11:23:52Z

No description provided.

Taka152 · 2023-04-20T10:48:57Z

It is not supported currently.

hexisyztem · 2023-04-24T09:49:15Z

Support is planned for May, and deployment on a V100-32G is expected to be possible.

frankxyy · 2023-06-12T09:31:40Z

@hexisyztem Hi, can flash attention be used on V100?

hexisyztem · 2023-06-12T09:47:04Z

As you can see in https://github.com/HazyResearch/flash-attention, flash attention doesn't support V100.
