Releases: Nexesenex/croco.cpp
Kobold.CPP_Frankenstein_v1.65c_b2843
Frankenstein 1.65c "Fork" of KoboldCPP Experimental up to the 10/05/2024, 20h GMT+2.
Based on Llama.CPP b2843.
- SmartContext preserved: to force its use, pass the --smartcontext and --noshift flags together on the command line.
- More detailed benchmarks (better to rename your old benchmark results file).
- Jart's SILU/Softmax PR and JohannesGaessler's FP32 FA Vector Kernel PR merged.
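As a rough illustration of the SmartContext note above, here is a minimal Python sketch that assembles such a command line. Only the --smartcontext and --noshift flags come from the release note; the executable name, the model file name, and the helper `build_launch_command` are placeholders assumed for the example.

```python
import shlex


def build_launch_command(executable, model_path, force_smartcontext=True):
    """Return the argument list for a launch that forces SmartContext.

    `executable` and `model_path` are hypothetical values supplied by the caller.
    """
    argv = [executable, "--model", model_path]
    if force_smartcontext:
        # Per the release note, both flags are passed together.
        argv += ["--smartcontext", "--noshift"]
    return argv


if __name__ == "__main__":
    # Placeholder names, adjust to your own build and model file.
    cmd = build_launch_command("koboldcpp.exe", "model.gguf")
    print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run` as-is, which avoids quoting problems with model paths that contain spaces.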
All credits go to LostRuins and the other contributors to KoboldCPP, and to GGerganov and all the other contributors to LlamaCPP.
Both builds (Cublas 12.3 and standard) include OpenBLAS, CLBLAST, and Vulkan support provided by the devs.
What's Changed
- b2843 by @Nexesenex in #110
- ggml : rewrite silu and softmax for cpu by @Nexesenex in #111
- CUDA: add FP32 FlashAttention vector kernel by @Nexesenex in #113
Full Changelog: v1.65b_b2836...v1.65c_b2843
Kobold.CPP_Frankenstein_v1.65b_b2836
Frankenstein 1.65b "Fork" of KoboldCPP Experimental up to the 10/05/2024, 12h GMT+2.
Based on Llama.CPP b2836.
- SmartContext preserved: to force its use, pass the --smartcontext and --noshift flags together on the command line.
All credits go to LostRuins and the other contributors to KoboldCPP, and to GGerganov and all the other contributors to LlamaCPP.
Both builds (Cublas 12.3 and standard) include OpenBLAS, CLBLAST, and Vulkan support provided by the devs.
Full Changelog: v1.65a_b2824...v1.65b_b2836
Kobold.CPP_Frankenstein_v1.65a_b2824
Frankenstein 1.65a "Fork" of KoboldCPP Experimental up to the 9/05/2024, 13h GMT+2.
Based on Llama.CPP b2824.
All credits go to LostRuins and the other contributors to KoboldCPP, and to GGerganov and all the other contributors to LlamaCPP.
Both builds (Cublas 12.3 and standard) include OpenBLAS, CLBLAST, and Vulkan support provided by the devs.
Full Changelog: v1.64b_b2775...v1.65a_b2824
Kobold.CPP_Frankenstein_v1.64b_b2775_FlashAtt
Frankenstein 1.64b "Fork" of KoboldCPP Experimental up to the 1/05/2024, 6h GMT+2.
Based on Llama.CPP b2775, with Flash Attention merged.
I didn't test it yet, just sharing for the impatient folks like me.
Edit : Flash Attention works.
Ex: on a Llama 70b model used with BBS128 and FA, the blas buffer size is divided by 6.5 for the same performance as without FA.
At BBS256 with FA, 1.5x the performance for 1/3 of the size of the BBS128 blas buffer without FA.
At BBS512 with FA, 2x the performance, and the blas buffer is still smaller (around 2/3 of the size) than at BBS128 without FA.
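To put the three data points above side by side, the following small Python sketch normalizes everything to BBS128 without FA (performance 1.0, buffer size 1.0) and computes performance per unit of blas buffer. The ratios are the ones quoted in the note; the 2/3 figure is approximate, and the "performance per buffer" metric is only an assumption made for comparison, not something the release measures.

```python
# (relative performance, relative blas buffer size), both vs. BBS128 without FA.
# Values are taken from the release note; 2/3 is an approximation ("around 2/3").
configs = {
    "BBS128 no FA": (1.0, 1.0),
    "BBS128 FA": (1.0, 1 / 6.5),
    "BBS256 FA": (1.5, 1 / 3),
    "BBS512 FA": (2.0, 2 / 3),
}


def performance_per_buffer(performance, buffer_size):
    """Relative performance obtained per unit of relative buffer size."""
    return performance / buffer_size


if __name__ == "__main__":
    for name, (performance, buffer_size) in configs.items():
        ratio = performance_per_buffer(performance, buffer_size)
        print(f"{name}: perf x{performance:.1f}, buffer x{buffer_size:.2f}, "
              f"perf per buffer x{ratio:.1f}")
```

Read this way, BBS128 with FA is the most buffer-efficient setting, while BBS512 with FA gives the highest performance and still uses less buffer than the no-FA baseline.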
All credits go to LostRuins and the other contributors to KoboldCPP, and to GGerganov and all the other contributors to LlamaCPP.
Both builds (Cublas 12.3 and standard) include OpenBLAS, CLBLAST, and Vulkan support provided by the devs.
Full Changelog: v1.64a_b2749...v1.64b_b2775
Kobold.CPP_Frankenstein_v1.64a_b2749
Frankenstein 1.64a "Fork" of KoboldCPP Experimental up to the 27/04/2024, 17h GMT+2.
Based on LlamaCPP b2749.
Full Changelog: v1.63d_b2723...v1.64a_b2749
v1.63d_b2723
Full Changelog: v1.63c_b2716...v1.63d_b2723
Kobold.CPP_Frankenstein_v1.63c_b2716
Last release of KoboldCPP Experimental (23/04/2024, 20h GMT+2) with LCPP b2716 as a base.
Cuda version compiled with Cublas 12.3.
Full Changelog: v1.63b_b2699...v1.63c_b2716
Kobold.CPP_Frankenstein_v1.63b_b2699_FastMOE
Last release of KoboldCPP Experimental (19/04/2024, 20h GMT+2) with LCPP b2699 as a base.
Cuda version compiled with Cublas 12.3.
Kobold.CPP_Frankenstein_v1.63a_b2690_fastMOE
KCPP experimental 1.63a, with LCPP b2690, up to date on the 18/04/2024, 00h01.
With Slaren's PR accelerating MOE.
Cuda version compiled with Cublas 12.3.
Full Changelog: v1.62.2a_b2650...v1.63a_b2690
Kobold.CPP_Frankenstein_v1.62.2b_b2650_fastMOE
Requested release, compiled with Cublas 12.3.
Includes LlamaCPP b2650 and the last experimental version of KCPP 1.62.2 as of 11/04/2024 at 20h GMT+2.
Also includes Slaren's MOE speed-bump.
Untested; feedback about speed will be appreciated, to compare with the last Frankenstein version compiled with Cublas 12.3 (1.59d) released before this one.
Koboldcpp_nocuda.exe : standard script of LostRuins.
Koboldcpp.exe : psutil (high CPU priority mode) added.
psutil is also integrated in my Cublas build.
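For readers unfamiliar with what a psutil-based high priority mode involves, here is a minimal sketch of raising a process's CPU priority with psutil. It is not the code shipped in these builds; the helper name `raise_priority` and the chosen priority values are assumptions for illustration, and psutil must be installed separately.

```python
import os
import sys


def raise_priority():
    """Try to raise this process's CPU priority; return True on success."""
    try:
        import psutil  # third-party package, assumed to be installed

        process = psutil.Process(os.getpid())
        if sys.platform == "win32":
            # Windows exposes named priority classes.
            process.nice(psutil.HIGH_PRIORITY_CLASS)
        else:
            # On Unix a lower niceness means a higher priority;
            # negative values usually require elevated privileges.
            process.nice(-10)
        return True
    except Exception:
        # Missing psutil or insufficient permissions: keep the default priority.
        return False


if __name__ == "__main__":
    print("priority raised:", raise_priority())
```

Falling back silently keeps the program usable when the priority cannot be changed, at the cost of not reporting why.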
What's Changed
- Sl/moe rework 2 bis by @Nexesenex in #105
- b2651 by @Nexesenex in #107
Full Changelog: v1.62.1a_b2637...v1.62.2a_b2650