sync Prepare pr of minicpm v2.6 #26

Merged 370 commits on Aug 15, 2024

Commits
b81ba1f
finetune: Rename command name in README.md (#8343)
standby24x7 Jul 7, 2024
d39130a
py : use cpu-only torch in requirements.txt (#8335)
compilade Jul 7, 2024
b504008
llama : fix n_rot default (#8348)
ggerganov Jul 7, 2024
905942a
llama : support glm3 and glm4 (#8031)
youth123 Jul 7, 2024
f7cab35
gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#…
mofosyne Jul 7, 2024
f1948f1
readme : update bindings list (#8222)
andy-tai Jul 7, 2024
4090ea5
ci : add checks for cmake,make and ctest in ci/run.sh (#8200)
AlexsCode Jul 7, 2024
a8db2a9
Update llama-cli documentation (#8315)
dspasyuk Jul 7, 2024
3fd62a6
py : type-check all Python scripts with Pyright (#8341)
compilade Jul 7, 2024
04ce3a8
readme : add supported glm models (#8360)
youth123 Jul 8, 2024
ffd0079
common : avoid unnecessary logits fetch (#8358)
kevmo314 Jul 8, 2024
6f0dbf6
infill : assert prefix/suffix tokens + remove old space logic (#8351)
ggerganov Jul 8, 2024
470939d
common : preallocate sampling token data vector (#8363)
kevmo314 Jul 8, 2024
fde13b3
feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854)
balisujohn Jul 2, 2024
6847d54
tests : fix whitespace (#0)
ggerganov Jul 8, 2024
2ee44c9
sync : ggml
ggerganov Jul 8, 2024
3f2d538
scripts : fix sync for sycl
ggerganov Jul 8, 2024
2ec846d
sycl : fix powf call in device code (#8368)
Alcpz Jul 8, 2024
c4dd11d
readme : fix web link error [no ci] (#8347)
b4b4o Jul 8, 2024
a130ecc
labeler : updated sycl to match docs and code refactor (#8373)
Alcpz Jul 8, 2024
7fdb6f7
flake.lock: Update (#8342)
ggerganov Jul 8, 2024
7d0e23d
gguf-py : do not use internal numpy types (#7472)
compilade Jul 9, 2024
9beb2dd
readme : fix typo [no ci] (#8389)
daghanerdonmez Jul 9, 2024
9925ca4
cmake : allow external ggml (#8370)
iboB Jul 9, 2024
5b0b8d8
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)
Alcpz Jul 9, 2024
a03e8dd
make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)
JohannesGaessler Jul 9, 2024
e500d61
Deprecation warning to assist with migration to new binary names (#8283)
HanClinto Jul 9, 2024
fd560fe
Update README.md to fix broken link to docs (#8399)
andysalerno Jul 9, 2024
a59f8fd
Server: Enable setting default sampling parameters via command-line (…
HanClinto Jul 9, 2024
8f0fad4
py : fix extra space in convert_hf_to_gguf.py (#8407)
laik Jul 10, 2024
e4dd31f
py : fix converter for internlm2 (#8321)
RunningLeon Jul 10, 2024
a8be1e6
llama : add assert about missing llama_encode() call (#8400)
fairydreaming Jul 10, 2024
7a80710
msvc : silence codecvt c++17 deprecation warnings (#8395)
iboB Jul 10, 2024
cc61948
llama : C++20 compatibility for u8 strings (#8408)
iboB Jul 10, 2024
83321c6
gguf-py rel pipeline (#8410)
monatis Jul 10, 2024
0f1a39f
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)
Dibakar Jul 10, 2024
6b2a849
ggml : move sgemm sources to llamafile subfolder (#8394)
ggerganov Jul 10, 2024
f4444d9
[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)
Jul 10, 2024
dd07a12
Name Migration: Build the deprecation-warning 'main' binary every tim…
HanClinto Jul 10, 2024
278d0e1
Initialize default slot sampling parameters from the global context. …
HanClinto Jul 11, 2024
7a221b6
llama : use F32 precision in Qwen2 attention and no FA (#8412)
ggerganov Jul 11, 2024
9a55ffe
tokenize : add --no-parse-special option (#8423)
compilade Jul 11, 2024
a977c11
gitignore : deprecated binaries
ggerganov Jul 11, 2024
808aba3
CUDA: optimize and refactor MMQ (#8416)
JohannesGaessler Jul 11, 2024
b078c61
cuda : suppress 'noreturn' warn in no_device_code (#8414)
danbev Jul 11, 2024
3686456
ggml : add NVPL BLAS support (#8329) (#8425)
nicholaiTukanov Jul 11, 2024
b549a1b
[SYCL] fix the mul_mat_id ut issues (#8427)
ClarkChin08 Jul 12, 2024
370b1f7
ggml : minor naming changes (#8433)
ggerganov Jul 12, 2024
71c1121
examples : sprintf -> snprintf (#8434)
ggerganov Jul 12, 2024
5aefbce
convert : remove fsep token from GPTRefactForCausalLM (#8237)
jpodivin Jul 12, 2024
8a4441e
docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441)
kriation Jul 12, 2024
c3ebcfa
server : ensure batches are either all embed or all completion (#8420)
iamlemec Jul 12, 2024
f532262
llama : suppress unary minus operator warning (#8448)
danbev Jul 12, 2024
6af51c0
main : print error on empty input (#8456)
ggerganov Jul 12, 2024
4e24cff
server : handle content array in chat API (#8449)
ggerganov Jul 12, 2024
c917b67
metal : template-ify some of the kernels (#8447)
ggerganov Jul 13, 2024
17eb6aa
vulkan : cmake integration (#8119)
bandoti Jul 13, 2024
fa79495
llama : fix pre-tokenization of non-special added tokens (#8228)
compilade Jul 14, 2024
e236528
gguf_hash.py: Add sha256 (#8470)
mofosyne Jul 14, 2024
73cf442
llama : fix Gemma-2 Query scaling factors (#8473)
ggerganov Jul 14, 2024
aaab241
flake.lock: Update (#8475)
ggerganov Jul 14, 2024
090fca7
pydantic : replace uses of __annotations__ with get_type_hints (#8474)
compilade Jul 14, 2024
bda62d7
Vulkan MMQ Fix (#8479)
0cc4m Jul 15, 2024
3dfda05
llama : de-duplicate deepseek2 norm
ggerganov Jul 15, 2024
16bdfa4
[SYCL] add concat through dim 1/2 (#8483)
airMeng Jul 15, 2024
fc690b0
docs: fix links in development docs [no ci] (#8481)
NikolaiLyssogor Jul 15, 2024
9104bc2
common : add --no-cont-batching arg (#6358)
ggerganov Jul 15, 2024
f17f39f
server: update README.md with llama-server --help output [no ci] (#8472)
maruel Jul 15, 2024
8fac431
ggml : suppress unknown pragma 'GCC' on windows (#8460)
danbev Jul 15, 2024
4db8f60
fix ci (#8494)
ngxson Jul 15, 2024
97bdd26
Refactor lora adapter support (#8332)
ngxson Jul 15, 2024
7acfd4e
convert_hf : faster lazy safetensors (#8482)
compilade Jul 16, 2024
0efec57
llama : valign + remove unused ftype (#8502)
ggerganov Jul 16, 2024
37b12f9
export-lora : handle help argument (#8497)
sbonds Jul 16, 2024
1666f92
gguf-hash : update clib.json to point to original xxhash repo (#8491)
mofosyne Jul 16, 2024
5e116e8
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515)
JohannesGaessler Jul 16, 2024
d65a836
llama : disable context-shift for DeepSeek v2 (#8501)
ggerganov Jul 17, 2024
da3913d
batched: fix n_predict parameter (#8527)
msy-kato Jul 17, 2024
1bdd8ae
[CANN] Add Ascend NPU backend (#6035)
hipudding Jul 17, 2024
30f80ca
CONTRIBUTING.md : remove mention of noci (#8541)
mofosyne Jul 17, 2024
b328344
build : Fix docker build warnings (#8535) (#8537)
amochkin Jul 17, 2024
e02b597
lookup: fibonacci hashing, fix crashes (#8548)
JohannesGaessler Jul 17, 2024
3807c3d
server : respect `--special` cli arg (#8553)
RunningLeon Jul 18, 2024
672a6f1
convert-*.py: GGUF Naming Convention Refactor and Metadata Override R…
mofosyne Jul 18, 2024
0d2c732
server: use relative routes for static files in new UI (#8552)
EZForever Jul 18, 2024
705b7ec
cmake : install all ggml public headers (#8480)
65a Jul 18, 2024
a15ef8f
CUDA: fix partial offloading for ne0 % 256 != 0 (#8572)
JohannesGaessler Jul 18, 2024
3d0e436
convert-*.py: add general.name kv override (#8571)
mofosyne Jul 19, 2024
f299aa9
fix: typo of chatglm4 chat tmpl (#8586)
thxCode Jul 19, 2024
b57eb9c
ggml : add friendlier error message to fopen errors (#8575)
HanClinto Jul 19, 2024
be0cfb4
readme : fix server badge
ggerganov Jul 19, 2024
d197545
llama : bump max layers from 256 to 512 (#8530)
ggerganov Jul 19, 2024
57b1d4f
convert-*.py: remove add_name from ChatGLMModel class (#8590)
mofosyne Jul 19, 2024
87e397d
ggml : fix quant dot product with odd number of blocks (#8549)
slaren Jul 19, 2024
c3776ca
gguf_dump.py: fix markddown kv array print (#8588)
mofosyne Jul 20, 2024
69b9945
llama.swiftui: fix end of generation bug (#8268)
ho2103 Jul 20, 2024
9403622
llama : add support for Tekken pre-tokenizer (#8579)
m18coppola Jul 20, 2024
07283b1
gguf : handle null name during init (#8587)
ggerganov Jul 20, 2024
69c487f
CUDA: MMQ code deduplication + iquant support (#8495)
JohannesGaessler Jul 20, 2024
c69c630
convert_hf : fix Gemma v1 conversion (#8597)
compilade Jul 21, 2024
328884f
gguf-py : fix some metadata name extraction edge cases (#8591)
compilade Jul 21, 2024
22f281a
examples : Rewrite pydantic_models_to_grammar_examples.py (#8493)
maruel Jul 21, 2024
45f2c19
flake.lock: Update (#8610)
ggerganov Jul 21, 2024
b7c11d3
examples: fix android example cannot be generated continuously (#8621)
devojony Jul 22, 2024
04bab6b
ggml: fix compile error for RISC-V (#8623)
zqb-all Jul 22, 2024
6281544
server : update doc to clarify n_keep when there is bos token (#8619)
kaetemi Jul 22, 2024
50e0535
llama : add Mistral Nemo inference support (#8604)
iamlemec Jul 22, 2024
e093dd2
tests : re-enable tokenizer tests (#8611)
ggerganov Jul 22, 2024
6f11a83
llama : allow overrides for tokenizer flags (#8614)
ggerganov Jul 22, 2024
566daa5
*.py: Stylistic adjustments for python (#8233)
jpodivin Jul 22, 2024
d94c6e0
llama : add support for SmolLm pre-tokenizer (#8609)
Stillerman Jul 22, 2024
081fe43
llama : fix codeshell support (#8599)
hankeke303 Jul 22, 2024
063d99a
[SYCL] fix scratch size of softmax (#8642)
luoyu-intel Jul 23, 2024
e7e6487
contrib : clarify PR squashing + module names (#8630)
ggerganov Jul 23, 2024
46e4741
Allow all RDNA2 archs to use sdot4 intrinsic (#8629)
jeroen-mostert Jul 23, 2024
751fcfc
Vulkan IQ4_NL Support (#8613)
0cc4m Jul 23, 2024
938943c
llama : move vocab, grammar and sampling into separate files (#8508)
ggerganov Jul 23, 2024
64cf50a
sycl : Add support for non-release DPC++ & oneMKL (#8644)
joeatodd Jul 23, 2024
b841d07
server : fix URL.parse in the UI (#8646)
0x4139 Jul 23, 2024
de28008
examples : Fix `llama-export-lora` example (#8607)
ngxson Jul 23, 2024
b115105
add llama_lora_adapter_clear (#8653)
ngxson Jul 24, 2024
79167d9
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667)
joeatodd Jul 24, 2024
96952e7
llama : fix `llama_chat_format_single` for mistral (#8657)
ngxson Jul 24, 2024
3a7ac53
readme : update UI list [no ci] (#8505)
SommerEngineering Jul 24, 2024
f19bf99
Build Llama SYCL Intel with static libs (#8668)
joeatodd Jul 24, 2024
68504f0
readme : update games list (#8673)
MorganRO8 Jul 24, 2024
8a4bad5
llama: use sliding window for phi3 (#8627)
FanShupei Jul 25, 2024
4b0eff3
docs : Quantum -> Quantized (#8666)
Ujjawal-K-Panchal Jul 25, 2024
be6d7c0
examples : remove `finetune` and `train-text-from-scratch` (#8669)
ngxson Jul 25, 2024
eddcb52
ggml : add and use ggml_cpu_has_llamafile() (#8664)
ggerganov Jul 25, 2024
ed67bcb
[SYCL] fix multi-gpu issue on sycl (#8554)
ClarkChin08 Jul 25, 2024
88954f7
tests : fix printfs (#8068)
ggerganov Jul 25, 2024
bf5a81d
ggml : fix build on Windows with Snapdragon X (#8531)
AndreasKunar Jul 25, 2024
4226a8d
llama : fix build + fix fabs compile warnings (#8683)
ggerganov Jul 25, 2024
49ce0ab
ggml: handle ggml_init failure to fix NULL pointer deref (#8692)
DavidKorczynski Jul 25, 2024
41cd47c
examples : export-lora : fix issue with quantized base models (#8687)
ngxson Jul 25, 2024
01aec4a
server : add Speech Recognition & Synthesis to UI (#8679)
ElYaiko Jul 25, 2024
01245f5
llama : fix order of parameters (#8706)
foldl Jul 26, 2024
2b1f616
ggml : reduce hash table reset cost (#8698)
slaren Jul 27, 2024
bfb4c74
cann: Fix Multi-NPU execution error (#8710)
wangshuai09 Jul 27, 2024
9d03d08
common : add --no-warmup option for main/llama-cli (#8712)
danbev Jul 27, 2024
92090ec
llama : add function for model-based max number of graph nodes (#8622)
ggerganov Jul 27, 2024
b5e9546
llama : add support for llama 3.1 rope scaling factors (#8676)
jmorganca Jul 27, 2024
c12b6e8
ggml : remove unnecessary UNUSED macro call (ggml/880)
danbev Jul 8, 2024
d2b851b
cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (…
iboB Jul 12, 2024
203b7f1
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/…
neobrain Jul 20, 2024
9f77d89
ggml: add support for float16 input tensors in pooling operations (gg…
vanaka11 Jul 22, 2024
a05ca93
ggml : loop tiling optimizations for scalar path (ggml/898)
heshpdx Jul 25, 2024
ae7985c
sync : ggml
ggerganov Jul 27, 2024
345c8c0
ggml : add missing semicolon (#0)
ggerganov Jul 27, 2024
56f20aa
scripts : sync ggml-aarch64 sources
ggerganov Jul 27, 2024
5e2727f
scripts : sync vulkan-shaders (#0)
ggerganov Jul 27, 2024
e54c35e
feat: Support Moore Threads GPU (#8383)
yeahdongcn Jul 27, 2024
4c676c8
llama : refactor session file management (#8699)
compilade Jul 28, 2024
4730fac
chore : Fix vulkan related compiler warnings, add help text, improve …
teleprint-me Jul 28, 2024
6eeaeba
cmake: use 1 more thread for non-ggml in CI (#8740)
JohannesGaessler Jul 28, 2024
0832de7
[SYCL] add conv support (#8688)
airMeng Jul 29, 2024
439b3fc
cuda : organize vendor-specific headers into vendors directory (#8746)
yeahdongcn Jul 29, 2024
75af08c
ggml: bugfix: fix the inactive elements is agnostic for risc-v vector…
CarterLi999 Jul 29, 2024
c887d8b
[SYCL] Add `TIMESTEP_EMBEDDING` OP (#8707)
zhentaoyu Jul 30, 2024
6e2b600
cann: update cmake (#8765)
wangshuai09 Jul 30, 2024
140074b
flake.lock: Update (#8729)
ggerganov Jul 30, 2024
7c27a19
added android implementation of ggml_print_backtrace_symbols (#8751)
l3utterfly Jul 30, 2024
7e72aa7
py: add_array() will not add to kv store if value is an empty array (…
mofosyne Jul 30, 2024
268c566
nix: cuda: rely on propagatedBuildInputs (#8772)
SomeoneSerge Jul 30, 2024
44d28dd
cmake : fix use of external ggml (#8787)
iboB Jul 31, 2024
398ede5
Adding Gemma 2 2B configs (#8784)
pculliton Jul 31, 2024
ed9d285
Build: Fix potential race condition (#8781)
HanClinto Jul 31, 2024
afbbcf3
server : update llama-server embedding flag documentation (#8779)
okigan Jul 31, 2024
c8a0090
cann: support q8_0 for Ascend backend (#8805)
wangshuai09 Aug 1, 2024
7a11eb3
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800)
slaren Aug 1, 2024
b7a08fd
Build: Only include execinfo.h on linux systems that support it (#8783)
acon96 Aug 1, 2024
afbb4c1
ggml-cuda: Adding support for unified memory (#8035)
matteoserva Aug 1, 2024
0fbbd88
[SYCL] Fixing wrong VDR iq4nl value (#8812)
OuadiElfarouki Aug 2, 2024
e09a800
cann: Fix ggml_cann_im2col for 1D im2col (#8819)
MengqingCao Aug 2, 2024
b72c20b
Fix conversion of unnormalized BF16->BF16 weights (#7843)
CISC Aug 2, 2024
76614f3
ggml : reading the runtime sve config of the cpu (#8709)
jdomke Aug 3, 2024
4b77ea9
flake.lock: Update (#8847)
ggerganov Aug 4, 2024
01aae2b
baby-llama : remove duplicate vector include
danbev Aug 3, 2024
ecf6b7f
batched-bench : handle empty `-npl` (#8839)
cunnie Aug 4, 2024
978ba3d
Server: Don't ignore llama.cpp params (#8754)
ardfork Aug 4, 2024
0d6fb52
Install curl in runtime layer (#8693)
bsquizz Aug 4, 2024
c02b0a8
cann: support q4_0 model (#8822)
wangshuai09 Aug 5, 2024
655858a
ggml : move c parameter comment to ggml_rope_ext (ggml/901)
danbev Jul 29, 2024
a3738b2
vulkan : implement Stable Diffusion operators (ggml/904)
0cc4m Aug 4, 2024
5587e57
sync : ggml
ggerganov Aug 4, 2024
064cdc2
vulkan : fix Qantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855)
0cc4m Aug 5, 2024
f1ea514
llama : better replace_all (#8852)
ggerganov Aug 5, 2024
400ae6f
readme : update model list (#8851)
BarfingLemurs Aug 5, 2024
e31a4f6
cmake: fix paths for vulkan shaders compilation on Windows (#8573)
stduhpf Aug 5, 2024
d3f0c71
Stop the generation when <|eom_id|> token is encountered - needed for…
fairydreaming Aug 5, 2024
1ef14b3
py: Add more authorship metadata from model card (#8810)
mofosyne Aug 5, 2024
b9dfc25
ggml : fix overflows in elu function (#8866)
jart Aug 5, 2024
b42978e
readme : add ramalama to the availables UI (#8811)
ericcurtin Aug 5, 2024
bc0f887
cann: fix buffer_num and runtime speed slowly error (#8865)
wangshuai09 Aug 5, 2024
0a4ce78
common : Changed tuple to struct (TODO fix) (#8823)
Septa2112 Aug 5, 2024
d4ff847
[SYCL] correct cmd name (#8877)
arthw Aug 6, 2024
c21a896
[CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871)
MengqingCao Aug 6, 2024
cdd1889
convert : add support for XLMRoberta embedding models (#8658)
iamlemec Aug 6, 2024
2d5dd7b
ggml : add epsilon as a parameter for group_norm (#8818)
MollySophia Aug 6, 2024
0bf16de
contributing : add note about write access
ggerganov Aug 6, 2024
efda90c
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `…
MaggotHATE Aug 6, 2024
db20f50
cmake : Link vulkan-shaders-gen with pthreads (#8835)
Patater Aug 6, 2024
5f4dcb1
simple : update name of executable to llama-simple (#8885)
danbev Aug 6, 2024
641f5dd
CUDA: fix padding logic for FP16/FP32 (#8884)
JohannesGaessler Aug 6, 2024
1e6f655
server : add lora hotswap endpoint (WIP) (#8857)
ngxson Aug 6, 2024
3195854
typo correction (#8891)
Nexesenex Aug 6, 2024
725e3d9
quantize : update usage comment in quantize.cpp (#8889)
danbev Aug 6, 2024
506122d
llama-bench : add support for getting cpu info on Windows (#8824)
kylo5aby Aug 7, 2024
a8dbc6f
CUDA/HIP: fix tests/test-backend-ops (#8896)
JohannesGaessler Aug 7, 2024
0478174
[SYCL] Updated SYCL device filtering (#8901)
OuadiElfarouki Aug 7, 2024
be55695
ggml-backend : fix async copy from CPU (#8897)
slaren Aug 7, 2024
15fa07a
make : use C compiler to build metal embed object (#8899)
slaren Aug 7, 2024
ebd541a
make : clean llamafile objects (#8923)
DrDub Aug 8, 2024
85fca8d
metal : add abort callback (ggml/905)
conradev Aug 7, 2024
5b33ea1
metal : fix struct name (ggml/912)
ggerganov Aug 7, 2024
f93d49a
ggml : ignore more msvc warnings (ggml/906)
iboB Aug 7, 2024
e44a561
sync : ggml
ggerganov Aug 8, 2024
366d486
scripts : fix sync filenames (#0)
ggerganov Aug 8, 2024
afd27f0
scripts : sync cann files (#0)
ggerganov Aug 8, 2024
3a14e00
gguf-py : simplify support for quant types (#8838)
compilade Aug 8, 2024
345a686
llama : reduce useless copies when saving session (#8916)
compilade Aug 9, 2024
daef3ab
server : add one level list nesting for embeddings (#8936)
gelim Aug 9, 2024
6f6496b
llama : fix typo in llama_tensor_get_type comment [no ci] (#8937)
danbev Aug 9, 2024
5b2c04f
embedding : add --pooling option to README.md [no ci] (#8934)
danbev Aug 9, 2024
70c0ea3
whisper : use vulkan as gpu backend when available (whisper/2302)
mstephenson6 Jul 16, 2024
4305b57
sync : ggml
ggerganov Aug 9, 2024
3071c0a
llava : support MiniCPM-V-2.5 (#7599)
tc-mb Aug 9, 2024
45a55b9
llama : better replace_all (cont) (#8926)
ggerganov Aug 9, 2024
272e3bd
make : fix llava obj file race (#8946)
ggerganov Aug 9, 2024
6afd1a9
llama : add support for lora adapters in T5 model (#8938)
fairydreaming Aug 9, 2024
b72942f
Merge commit from fork
ggerganov Aug 9, 2024
911b437
gguf-py : fix double call to add_architecture() (#8952)
tarilabs Aug 10, 2024
ea0c828
modify convert
tc-mb Aug 10, 2024
fc1c860
Merge branch 'prepare-PR-of-minicpm-v2.6' into master
tc-mb Aug 10, 2024
ce0d1a6
Merge pull request #24 from OpenBMB/master
tc-mb Aug 10, 2024
6cad864
modify convert
tc-mb Aug 10, 2024
fe39ecc
add readme
tc-mb Aug 10, 2024
bffbe1c
add resampler of v2.6
tc-mb Aug 10, 2024
28d6a0f
modify clip
tc-mb Aug 10, 2024
4a87d1d
modify readme
tc-mb Aug 10, 2024
32b47f6
fix type-check
tc-mb Aug 10, 2024
662d4c1
fix type-check
tc-mb Aug 12, 2024
a945b3c
fix type-check
tc-mb Aug 12, 2024
89d378c
fix type-check
tc-mb Aug 12, 2024
1ec79f0
modify convert script and readme
tc-mb Aug 12, 2024
1123376
fix convert script and readme
tc-mb Aug 12, 2024
f30c5e1
fix convert
tc-mb Aug 12, 2024
47eb0a5
fix num in convert
tc-mb Aug 12, 2024
1ca3f06
fix type-check
tc-mb Aug 13, 2024
4 changes: 2 additions & 2 deletions .devops/full-cuda.Dockerfile
@@ -6,7 +6,7 @@ ARG CUDA_VERSION=11.7.1
 # Target the CUDA build image
 ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

-FROM ${BASE_CUDA_DEV_CONTAINER} as build
+FROM ${BASE_CUDA_DEV_CONTAINER} AS build

 # Unless otherwise specified, we make a fat build.
 ARG CUDA_DOCKER_ARCH=all
@@ -27,7 +27,7 @@ COPY . .
 # Set nvcc architecture
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
-ENV LLAMA_CUDA=1
+ENV GGML_CUDA=1
 # Enable cURL
 ENV LLAMA_CURL=1
4 changes: 2 additions & 2 deletions .devops/full-rocm.Dockerfile
@@ -6,7 +6,7 @@ ARG ROCM_VERSION=5.6
 # Target the CUDA build image
 ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

-FROM ${BASE_ROCM_DEV_CONTAINER} as build
+FROM ${BASE_ROCM_DEV_CONTAINER} AS build

 # Unless otherwise specified, we make a fat build.
 # List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
@@ -36,7 +36,7 @@ COPY . .
 # Set nvcc architecture
 ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
 # Enable ROCm
-ENV LLAMA_HIPBLAS=1
+ENV GGML_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++
2 changes: 1 addition & 1 deletion .devops/full.Dockerfile
@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION=22.04

-FROM ubuntu:$UBUNTU_VERSION as build
+FROM ubuntu:$UBUNTU_VERSION AS build

 RUN apt-get update && \
     apt-get install -y build-essential python3 python3-pip git libcurl4-openssl-dev libgomp1
6 changes: 3 additions & 3 deletions .devops/llama-cli-cuda.Dockerfile
@@ -6,7 +6,7 @@ ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VER
 # Target the CUDA runtime image
 ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

-FROM ${BASE_CUDA_DEV_CONTAINER} as build
+FROM ${BASE_CUDA_DEV_CONTAINER} AS build

 # Unless otherwise specified, we make a fat build.
 ARG CUDA_DOCKER_ARCH=all
@@ -21,11 +21,11 @@ COPY . .
 # Set nvcc architecture
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
-ENV LLAMA_CUDA=1
+ENV GGML_CUDA=1

 RUN make -j$(nproc) llama-cli

-FROM ${BASE_CUDA_RUN_CONTAINER} as runtime
+FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime

 RUN apt-get update && \
     apt-get install -y libgomp1
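The build-stage change above is part of the repo-wide LLAMA_* to GGML_* build-flag rename. As a point of reference, a minimal sketch of the equivalent native build outside Docker (the flag and target come from the diff; invoking make directly on the host is an assumption):

```sh
# Sketch: native CUDA build of the CLI using the renamed flag,
# mirroring the updated Dockerfile build stage.
make -j$(nproc) GGML_CUDA=1 llama-cli
```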
16 changes: 9 additions & 7 deletions .devops/llama-cli-intel.Dockerfile
@@ -1,23 +1,25 @@
 ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04

-FROM intel/oneapi-basekit:$ONEAPI_VERSION as build
+FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build

-ARG LLAMA_SYCL_F16=OFF
+ARG GGML_SYCL_F16=OFF
 RUN apt-get update && \
     apt-get install -y git

 WORKDIR /app

 COPY . .

-RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
-        echo "LLAMA_SYCL_F16 is set" && \
-        export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
+RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
+        echo "GGML_SYCL_F16 is set" && \
+        export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
     fi && \
-    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx ${OPT_SYCL_F16} && \
+    echo "Building with static libs" && \
+    cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
+    ${OPT_SYCL_F16} -DBUILD_SHARED_LIBS=OFF && \
     cmake --build build --config Release --target llama-cli

-FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
+FROM intel/oneapi-basekit:$ONEAPI_VERSION AS runtime

 COPY --from=build /app/build/bin/llama-cli /llama-cli
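A hedged sketch of how this image might be built with FP16 enabled: the Dockerfile path and the GGML_SYCL_F16 argument are taken from the diff, while the image tag is an illustrative assumption.

```sh
# Sketch: build the SYCL CLI image, passing the renamed build argument.
docker build -f .devops/llama-cli-intel.Dockerfile \
  --build-arg GGML_SYCL_F16=ON \
  -t llama-cli-intel .
```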
4 changes: 2 additions & 2 deletions .devops/llama-cli-rocm.Dockerfile
@@ -6,7 +6,7 @@ ARG ROCM_VERSION=5.6
 # Target the CUDA build image
 ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

-FROM ${BASE_ROCM_DEV_CONTAINER} as build
+FROM ${BASE_ROCM_DEV_CONTAINER} AS build

 # Unless otherwise specified, we make a fat build.
 # List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
@@ -36,7 +36,7 @@ COPY . .
 # Set nvcc architecture
 ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
 # Enable ROCm
-ENV LLAMA_HIPBLAS=1
+ENV GGML_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++
4 changes: 2 additions & 2 deletions .devops/llama-cli-vulkan.Dockerfile
@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION=jammy

-FROM ubuntu:$UBUNTU_VERSION as build
+FROM ubuntu:$UBUNTU_VERSION AS build

 # Install build tools
 RUN apt update && apt install -y git build-essential cmake wget libgomp1
@@ -14,7 +14,7 @@ RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key
 # Build it
 WORKDIR /app
 COPY . .
-RUN cmake -B build -DLLAMA_VULKAN=1 && \
+RUN cmake -B build -DGGML_VULKAN=1 && \
     cmake --build build --config Release --target llama-cli

 # Clean up
4 changes: 2 additions & 2 deletions .devops/llama-cli.Dockerfile
@@ -1,6 +1,6 @@
 ARG UBUNTU_VERSION=22.04

-FROM ubuntu:$UBUNTU_VERSION as build
+FROM ubuntu:$UBUNTU_VERSION AS build

 RUN apt-get update && \
     apt-get install -y build-essential git
@@ -11,7 +11,7 @@ COPY . .

 RUN make -j$(nproc) llama-cli

-FROM ubuntu:$UBUNTU_VERSION as runtime
+FROM ubuntu:$UBUNTU_VERSION AS runtime

 RUN apt-get update && \
     apt-get install -y libgomp1
84 changes: 0 additions & 84 deletions .devops/llama-cpp-clblast.srpm.spec

This file was deleted.

2 changes: 1 addition & 1 deletion .devops/llama-cpp-cuda.srpm.spec
@@ -32,7 +32,7 @@ CPU inference for Meta's Lllama2 models using default options.
 %setup -n llama.cpp-master

 %build
-make -j LLAMA_CUDA=1
+make -j GGML_CUDA=1

 %install
 mkdir -p %{buildroot}%{_bindir}/
10 changes: 6 additions & 4 deletions .devops/llama-server-cuda.Dockerfile
@@ -6,7 +6,7 @@ ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VER
 # Target the CUDA runtime image
 ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

-FROM ${BASE_CUDA_DEV_CONTAINER} as build
+FROM ${BASE_CUDA_DEV_CONTAINER} AS build

 # Unless otherwise specified, we make a fat build.
 ARG CUDA_DOCKER_ARCH=all
@@ -21,17 +21,19 @@ COPY . .
 # Set nvcc architecture
 ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
 # Enable CUDA
-ENV LLAMA_CUDA=1
+ENV GGML_CUDA=1
 # Enable cURL
 ENV LLAMA_CURL=1

 RUN make -j$(nproc) llama-server

-FROM ${BASE_CUDA_RUN_CONTAINER} as runtime
+FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime

 RUN apt-get update && \
-    apt-get install -y libcurl4-openssl-dev libgomp1
+    apt-get install -y libcurl4-openssl-dev libgomp1 curl

 COPY --from=build /app/llama-server /llama-server

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]
19 changes: 11 additions & 8 deletions .devops/llama-server-intel.Dockerfile
@@ -1,29 +1,32 @@
 ARG ONEAPI_VERSION=2024.1.1-devel-ubuntu22.04

-FROM intel/oneapi-basekit:$ONEAPI_VERSION as build
+FROM intel/oneapi-basekit:$ONEAPI_VERSION AS build

-ARG LLAMA_SYCL_F16=OFF
+ARG GGML_SYCL_F16=OFF
 RUN apt-get update && \
     apt-get install -y git libcurl4-openssl-dev

 WORKDIR /app

 COPY . .

-RUN if [ "${LLAMA_SYCL_F16}" = "ON" ]; then \
-        echo "LLAMA_SYCL_F16 is set" && \
-        export OPT_SYCL_F16="-DLLAMA_SYCL_F16=ON"; \
+RUN if [ "${GGML_SYCL_F16}" = "ON" ]; then \
+        echo "GGML_SYCL_F16 is set" && \
+        export OPT_SYCL_F16="-DGGML_SYCL_F16=ON"; \
     fi && \
-    cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
+    echo "Building with dynamic libs" && \
+    cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DLLAMA_CURL=ON ${OPT_SYCL_F16} && \
     cmake --build build --config Release --target llama-server

-FROM intel/oneapi-basekit:$ONEAPI_VERSION as runtime
+FROM intel/oneapi-basekit:$ONEAPI_VERSION AS runtime

 RUN apt-get update && \
-    apt-get install -y libcurl4-openssl-dev
+    apt-get install -y libcurl4-openssl-dev curl

 COPY --from=build /app/build/bin/llama-server /llama-server

 ENV LC_ALL=C.utf8

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]
8 changes: 5 additions & 3 deletions .devops/llama-server-rocm.Dockerfile
@@ -6,7 +6,7 @@ ARG ROCM_VERSION=5.6
 # Target the CUDA build image
 ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

-FROM ${BASE_ROCM_DEV_CONTAINER} as build
+FROM ${BASE_ROCM_DEV_CONTAINER} AS build

 # Unless otherwise specified, we make a fat build.
 # List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
@@ -36,15 +36,17 @@ COPY . .
 # Set nvcc architecture
 ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
 # Enable ROCm
-ENV LLAMA_HIPBLAS=1
+ENV GGML_HIPBLAS=1
 ENV CC=/opt/rocm/llvm/bin/clang
 ENV CXX=/opt/rocm/llvm/bin/clang++

 # Enable cURL
 ENV LLAMA_CURL=1
 RUN apt-get update && \
-    apt-get install -y libcurl4-openssl-dev
+    apt-get install -y libcurl4-openssl-dev curl

 RUN make -j$(nproc) llama-server

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/app/llama-server" ]
14 changes: 6 additions & 8 deletions .devops/llama-server-vulkan.Dockerfile
@@ -1,24 +1,20 @@
 ARG UBUNTU_VERSION=jammy

-FROM ubuntu:$UBUNTU_VERSION as build
+FROM ubuntu:$UBUNTU_VERSION AS build

 # Install build tools
 RUN apt update && apt install -y git build-essential cmake wget

-# Install Vulkan SDK
+# Install Vulkan SDK and cURL
 RUN wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
     wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
     apt update -y && \
-    apt-get install -y vulkan-sdk
-
-# Install cURL
-RUN apt-get update && \
-    apt-get install -y libcurl4-openssl-dev
+    apt-get install -y vulkan-sdk libcurl4-openssl-dev curl

 # Build it
 WORKDIR /app
 COPY . .
-RUN cmake -B build -DLLAMA_VULKAN=1 -DLLAMA_CURL=1 && \
+RUN cmake -B build -DGGML_VULKAN=1 -DLLAMA_CURL=1 && \
     cmake --build build --config Release --target llama-server

 # Clean up
@@ -28,4 +24,6 @@ RUN cp /app/build/bin/llama-server /llama-server && \

 ENV LC_ALL=C.utf8

+HEALTHCHECK CMD [ "curl", "-f", "http://localhost:8080/health" ]
+
 ENTRYPOINT [ "/llama-server" ]