sync: Prepare PR of minicpm v2.6 (#26)
Commits on Jul 7, 2024
- finetune: Rename command name in README.md (ggerganov#8343) [b81ba1f]
  Rename an old command name "finetune" to "llama-finetune" in README.md. (Signed-off-by: Masanari Iida <standby24x7@gmail.com>)
- Commit d39130a
- Commit b504008
- llama : support glm3 and glm4 (ggerganov#8031) [905942a]
  Squashed commit message:
  * add chatglm3-6b model support (huggingface model: https://hf-mirror.com/THUDM/chatglm3-6b)
  * remove .rotary_pos_emb.inv_freq and unused code for the chatglm3 model
  * fix lint error; optimize convert-hf-to-gguf.py for the chatglm model
  * support glm-4-9b-chat; fix eos tokens for glm4; remove unused log
  * add preprocess to chatglm3 and chatglm4
  * add eos_id_list to llama.cpp, later reverted ("Revert 'add eos_id_list to llama.cpp'", reverts commit 3a4d579); set <|endoftext|> as eos and <|user|> as eot instead
  * fix chat template bug; add comment to glm prefix and suffix; remove prefix and suffix; use the normal glm4 chat template & use LLM_FFN_SWIGLU in phi3
  * fix conflicts and add rope_ratio & ChatGLMForConditionalGeneration; modified the general name of the glm model
  * fix: resolve Flake8 errors in `convert-hf-to-gguf.py` (E302, E303, NP100)
  * fix rope ratio to solve incorrect answers; fix by comments
  (Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>; Co-authored-by: XingXing Qiao <qiaoxx@dingdao.com>; Co-authored-by: Umpire2018 <138990495+Umpire2018@users.noreply.github.com>)
- gguf-hash: model-wide and per-tensor hashing using xxhash and sha1 (ggerganov#8048) [f7cab35]
  CLI to hash GGUF files to detect differences at the per-model and per-tensor level. Supported hash types:
  - `--xxh64`: xxhash 64-bit mode (default)
  - `--sha1`: SHA-1
  - `--uuid`: UUID
  - `--sha256`: SHA-256
  While most POSIX systems already have hash-checking programs like sha256sum, those are designed to check entire files, which is not ideal when we want to verify the consistency of the tensor data even after the metadata in the GGUF KV store has been updated. This program hashes each tensor payload individually ('per tensor layer') in addition to producing an 'entire tensor model' hash, so the model-wide hash can be checked first and, if there is any detected inconsistency, the per-tensor hashes can narrow it down to the specific tensor layer. (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
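To illustrate the per-tensor idea, here is a minimal, self-contained C++ sketch that hashes each tensor payload separately and folds the results into a model-wide digest. It uses FNV-1a as a stand-in for xxhash, and the tensor names and payloads are made up for the example; the real tool reads them from a GGUF file.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// FNV-1a 64-bit as a stand-in for xxhash: hash each tensor's payload
// separately so a mismatch can be pinned to a specific layer.
static uint64_t fnv1a64(const uint8_t * data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; ++i) {
        h = (h ^ data[i]) * 0x100000001b3ULL;
    }
    return h;
}

struct tensor_view {
    std::string name;
    std::vector<uint8_t> payload;
};

int main() {
    // Hypothetical tensors; a real tool would read these from a GGUF file.
    const std::vector<tensor_view> tensors = {
        {"blk.0.attn_q.weight", {1, 2, 3, 4}},
        {"blk.0.attn_k.weight", {5, 6, 7, 8}},
    };
    uint64_t model_hash = 0xcbf29ce484222325ULL;
    for (const auto & t : tensors) {
        const uint64_t th = fnv1a64(t.payload.data(), t.payload.size());
        printf("%016llx  %s\n", (unsigned long long) th, t.name.c_str());
        // Fold each per-tensor hash into the model-wide digest, byte by byte.
        for (int i = 0; i < 8; ++i) {
            model_hash = (model_hash ^ ((th >> (i * 8)) & 0xff)) * 0x100000001b3ULL;
        }
    }
    printf("%016llx  model\n", (unsigned long long) model_hash);
    return 0;
}
```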
- readme : update bindings list (ggerganov#8222) [f1948f1]
  Adds guile_llama_cpp to the bindings list, plus formatting fixes.
- ci : add checks for cmake, make and ctest in ci/run.sh (ggerganov#8200) [4090ea5]
  Added checks for cmake, make and ctest; removed erroneous whitespace.
- Update llama-cli documentation (ggerganov#8315) [a8db2a9]
  Several README.md passes: fixed llama-cli/main and templates in some commands, added chat-template sections, and fixed typos in some areas.
- py : type-check all Python scripts with Pyright (ggerganov#8341) [3fd62a6]
  * py : type-check all Python scripts with Pyright
  * server-tests : use a trailing slash in the openai base_url; strip "chat" from base_url in oai_chat_completions; add more type annotations; model metadata is a dict
  * ci : disable the pip cache in the type-check workflow. The cache is not shared between branches and is 250MB in size, so it would become quite a big part of the repo's 10GB cache limit.
  * py : fix new type errors from the master branch
  * tests : fix test-tokenizer-random.py. Apparently gcc applies optimisations even when pre-processing, which confuses pycparser.
  * ci : only show warnings and errors in the Python type-check. The "information" level otherwise has entries from examples/pydantic_models_to_grammar.py which could confuse someone trying to figure out what failed, since these messages can safely be ignored even though they look like errors.
Commits on Jul 8, 2024
- Commit 04ce3a8
- Commit ffd0079
- Commit 6f0dbf6
- common : preallocate sampling token data vector (ggerganov#8363) [470939d]
  Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and inserting the data directly. Some rudimentary profiling with `chrono` shows this improves the block of code from ~500us/op to ~40us/op. Overall this slightly improves sampling performance, with a more substantial impact on the `examples/lookahead` implementation -- a ~10% performance boost in lookahead inference.
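A minimal sketch of the pattern described above (the struct and field names here are illustrative, not the actual llama.cpp types): size the vector once to the vocab size and assign elements in place, rather than growing it with repeated `emplace_back` calls.

```cpp
#include <cstdint>
#include <vector>

struct token_data {
    int32_t id;
    float   logit;
    float   p;
};

// Preallocate to n_vocab and assign in place: one allocation instead of
// repeated capacity growth (and element moves) from emplace_back.
std::vector<token_data> build_candidates(const float * logits, int32_t n_vocab) {
    std::vector<token_data> cur;
    cur.resize(n_vocab);
    for (int32_t id = 0; id < n_vocab; ++id) {
        cur[id] = token_data{id, logits[id], 0.0f};
    }
    return cur;
}

int main() {
    const float logits[4] = {0.1f, 0.2f, 0.3f, 0.4f};
    const auto cur = build_candidates(logits, 4);
    return cur.size() == 4 ? 0 : 1;
}
```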
- feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) [fde13b3]
  * conv transpose 1d passing test for 1d input and kernel
  * working for different input and output channel counts; added test for variable stride
  * initial draft appears to work with stride other than 1; working with all old and new conv1d tests
  * added a test for large tensors; added test to test-backend-ops
  * removed CUDA hardcoding; restored test-conv-transpose.c
  * removed unused arguments, and fixed a bug where a test failure would cause subsequent tests to fail
  * fixed accumulator bug; fixed includes; style and warning fixes; return failure when a test fails; fix supports_op
  (Co-authored-by: slaren <slarengh@gmail.com>)
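As a reference for what the CUDA kernel computes, here is a scalar, single-channel sketch of 1D transposed convolution (my own illustration of the op's semantics, not code from the PR): each input element scatters a kernel-weighted copy into the output at offset `i * stride + j`.

```cpp
#include <cstdio>
#include <vector>

// Scalar reference for 1D transposed convolution (single channel):
// output length is (n_in - 1) * stride + n_kernel.
std::vector<float> conv_transpose_1d(const std::vector<float> & x,
                                     const std::vector<float> & k,
                                     int stride) {
    std::vector<float> y((x.size() - 1) * stride + k.size(), 0.0f);
    for (size_t i = 0; i < x.size(); ++i) {
        for (size_t j = 0; j < k.size(); ++j) {
            y[i * stride + j] += x[i] * k[j];
        }
    }
    return y;
}

int main() {
    const auto y = conv_transpose_1d({1, 2, 3}, {1, 1}, /*stride=*/2);
    for (float v : y) printf("%g ", v); // prints: 1 1 2 2 3 3
    printf("\n");
    return 0;
}
```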
- Commit 6847d54
- Commit 2ee44c9
- Commit 3f2d538
- Commit 2ec846d
- Commit c4dd11d
- Commit a130ecc
- flake.lock: Update (ggerganov#8342) [7fdb6f7]
  Flake lock file updates: 'flake-parts' 2a55567 (2024-06-01) → 9227223 (2024-07-03); 'flake-parts/nixpkgs-lib' eb9ceca (2024-06-01) → 5daf051 (2024-07-01); 'nixpkgs' b2852eb (2024-06-27) → 9f4128e (2024-07-03). (Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>)
Commits on Jul 9, 2024
- Commit 7d0e23d
- readme : fix typo [no ci] (ggerganov#8389) [9beb2dd]
  Bakus-Naur --> Backus-Naur
- Commit 9925ca4
- sycl : Re-enabled mmvq path for the SYCL Nvidia Backend (ggerganov#8372) [5b0b8d8]
  Re-enabled the mmvq path for the SYCL Nvidia backend; reduced the verbosity of a comment.
- Commit a03e8dd
- Deprecation warning to assist with migration to new binary names (ggerganov#8283) [e500d61]
  * Adds a simple program that prints a deprecation warning, to help people notice the binary name change from ggerganov#7809 and migrate to the new filenames.
  * Build legacy replacement binaries only if they already exist; check for their existence every time so they are not ignored.
- Update README.md to fix broken link to docs (ggerganov#8399) [fd560fe]
  Corrects the "Performance troubleshooting" doc link - the file was moved into a directory called 'development'.
- Server: Enable setting default sampling parameters via command-line (ggerganov#8402) [a59f8fd]
  Load server sampling parameters from the server context by default; wordsmithing a comment.
Commits on Jul 10, 2024
- Commit 8f0fad4
- py : fix converter for internlm2 (ggerganov#8321) [e4dd31f]
  Updates internlm2, removes an unused file, and fixes lint.
- llama : add assert about missing llama_encode() call (ggerganov#8400) [a8be1e6]
  (Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>)
- Commit 7a80710
- Commit cc61948
- gguf-py release pipeline (ggerganov#8410) [83321c6]
  Updates the gguf-py readme and bumps the patch version for release.
- ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (ggerganov#5780) [0f1a39f]
  * Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0 and q8_0_q8_0 quantization, with optimized asm kernels for q4_0_q8_0, refactored over several passes to address review suggestions on pr#5780
  * add the copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h
  * split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
  * skip the pr#7433 vec_dot code for Arm CPUs with SVE VL not equal to 256 bits
  * remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
  * add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for the new types; add multithreaded quantization support for them
  * simplify the logic for calling gemm and gemv in ggml_compute_forward_mul_mat while minimizing changes to it
  * rebase on master commit 3fd62a6 and adapt to the new directory structure
  * add a pragma in ggml-aarch64.c to turn off the -Woverlength-strings warning; use an __aarch64__ check to guard the 64-bit NEON kernels
  * update docs/build.md to include compile-time flags for building the Q4_0_4_4 quant type
- Commit 6b2a849
- [SYCL] Use multi_ptr to clean up deprecated warnings (ggerganov#8256) [f4444d9]
- Name Migration: Build the deprecation-warning 'main' binary every time (ggerganov#8404) [dd07a12]
  * Modify the deprecation-warning 'main' binary to build every time, instead of only when a legacy binary is present. This helps users following tutorials and other instruction sets know what to do when the 'main' binary is missing.
  * Adjust the 'server' name-deprecation binary to build all the time as well.
Commits on Jul 11, 2024
- Commit 278d0e1
- Commit 7a221b6
- tokenize : add --no-parse-special option (ggerganov#8423) [9a55ffe]
  This should make it easier to explain how parse_special affects tokenization.
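For context, a hedged sketch of how the flag reaches the C API (the wrapper below is my own illustration, assuming a loaded model): with `parse_special = true`, special tokens written out in the text, e.g. `<|endoftext|>`, are matched and mapped to their single token ids; with `false` they are tokenized as plain text into several ordinary tokens.

```cpp
#include "llama.h"

#include <string>
#include <vector>

// Illustrative wrapper: the same call, differing only in parse_special.
static std::vector<llama_token> tokenize(const llama_model * model,
                                         const std::string & text,
                                         bool parse_special) {
    std::vector<llama_token> tokens(text.size() + 16); // generous upper bound
    const int32_t n = llama_tokenize(model, text.c_str(), (int32_t) text.size(),
                                     tokens.data(), (int32_t) tokens.size(),
                                     /*add_special=*/false, parse_special);
    tokens.resize(n < 0 ? 0 : n);
    return tokens;
}
```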
- Commit a977c11
- CUDA: optimize and refactor MMQ (ggerganov#8416) [808aba3]
  Optimizes and refactors MMQ; uses explicit q8_1 memory layouts and adds documentation.
- cuda : suppress 'noreturn' warn in no_device_code (ggerganov#8414) [b078c61]
  Suppresses the following warning when compiling with GGML_HIPBLAS=ON:
  ```console
  /ggml/src/ggml-cuda/template-instances/../common.cuh:346:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
  346 | }
      | ^
  ```
  The first attempt added a while(true) loop to the no_device_code function in common.cuh; the final version updates the __trap macro instead. (Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>)
- ggml : add NVPL BLAS support (ggerganov#8329) (ggerganov#8425) [3686456]
  Adds NVPL BLAS support and replaces `<BLASLIB>_ENABLE_CBLAS` with `GGML_BLAS_USE_<BLASLIB>`. (Co-authored-by: ntukanov <ntukanov@nvidia.com>)
Commits on Jul 12, 2024
- [SYCL] fix the mul_mat_id unit test issues (ggerganov#8427) [b549a1b]
  Fixes part of mul_mat_id and skips the bfloat16 SYCL unit test. (Signed-off-by: Chen Xi <xi2chen@intel.com>; Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>)
- ggml : minor naming changes (ggerganov#8433) [370b1f7]
  Minor naming changes; uses PRId64 [no ci]; reverts the FA K/Q names.
- examples : sprintf -> snprintf (ggerganov#8434) [71c1121]
  Replaces sprintf with snprintf and uses sizeof() instead of hardcoded buffer-size constants.
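The pattern in a nutshell (a generic example, not the actual examples/ code):

```cpp
#include <cstdio>

int main() {
    char buf[32];
    const int n = 42;
    // sprintf(buf, "value = %d", n);            // no bounds check: can overflow buf
    snprintf(buf, sizeof(buf), "value = %d", n); // sizeof(buf), not a hardcoded 32
    puts(buf);
    return 0;
}
```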
- convert : remove fsep token from GPTRefactForCausalLM (ggerganov#8237) [5aefbce]
  The <filename> token used by Refact doesn't serve the same purpose as the <file_separator> token from CodeGemma. (Signed-off-by: Jiri Podivin <jpodivin@redhat.com>)
- docker : fix filename for convert-hf-to-gguf.py in tools.sh (ggerganov#…) [8a4441e]
- server : ensure batches are either all embed or all completion (ggerganov#8420) [c3ebcfa]
  Makes sure batches are all embed or all non-embed; uses a non-embedding batch for sampled tokens; fixes an unused-params warning.
- llama : suppress unary minus operator warning (ggerganov#8448) [f532262]
  Updates the _try_copy lambda to move the unary minus operator to after the cast to int32_t. Currently the following warning is generated on Windows:
  ```console
  llama.cpp\src\llama.cpp(21147,30): warning C4146: unary minus operator applied to unsigned type, result still unsigned
  ```
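The fix pattern, distilled into a standalone example (not the actual `_try_copy` code):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t u = 5;
    // int32_t a = (int32_t) -u;         // C4146 on MSVC: '-' applied to unsigned,
    //                                   // the negation happens in unsigned arithmetic
    const int32_t b = -((int32_t) u);    // cast first, then negate: no warning
    printf("%d\n", b);                   // prints -5
    return 0;
}
```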
- Commit 6af51c0
- server : handle content array in chat API (ggerganov#8449) [4e24cff]
  (Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>)
Commits on Jul 13, 2024
- Commit c917b67
- vulkan : cmake integration (ggerganov#8119) [17eb6aa]
  * Add Vulkan, SYCL and OpenMP to the CMake pkg; forward GGML_EXTRA_LIBS to the CMake config pkg; link against ggml in the cmake pkg
  * Split the generated shader file into a separate translation unit; add CMake and make targets for Vulkan shaders; make glslc a required Vulkan component
  * Use pkg-config to locate the vulkan library; add cflags from pkg-config to fix the w64devkit build; update vulkan obj file paths
  * Add the Vulkan SDK dependency to the ubuntu-22-cmake-vulkan workflow; move sudo to the apt-key invocation
  * Add shaderc to the nix pkg; remove the Python dependency from the Vulkan build; remove clblast from the nix pkg
  * README updates plus whitespace and code-review cleanups
Commits on Jul 14, 2024
- llama : fix pre-tokenization of non-special added tokens (ggerganov#8228) [fa79495]
  * llama : fix the mpt and olmo pre-tokenizers; pre-tokenize non-special user-defined tokens first; fix detection of control-like user-defined tokens
  * convert_hf : identify which user-defined tokens are control tokens (only used in _set_vocab_gpt2() for now); identify more added control tokens for SPM tokenizers. This makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly, including HTML tags and consecutive spaces, but it unfortunately requires model re-conversion. There seems to be a weird behavior of the HF tokenizer for Gemma, which prefers to use the 16-space token over more lengthy space tokens, while the SentencePiece tokenizer does not do this (the implementation in llama.cpp has the same behavior as SentencePiece).
  * llama : fix wrong pre-tokenization of byte tokens; fix the Viking pre-tokenizer regex (the order was previously wrong, which caused errors in some tests); fix command-r detokenization; add UNKNOWN tokens in the special tokens cache
  * convert_hf : reduce usages of the UNKNOWN token type, including for InternLM2 (making the changes from ggerganov#8321 more consistent with the changes made here)
  * test-tokenizer-random : reduce potential conflicts with ggerganov#8379; add a failing edge case for falcon
- gguf_hash.py: Add sha256 (ggerganov#8470) [e236528]
  Adds sha256 and renames the string UUIDv5 --> uuid. (Co-authored-by: compilade <git@compilade.net>)
- llama : fix Gemma-2 Query scaling factors (ggerganov#8473) [73cf442]
  Gemma-2 9B should use query_pre_attn_scalar = 256, not 224 (self.config.hidden_size // self.config.num_attention_heads); see google/gemma_pytorch@03e6575. (Co-authored-by: Daniel Han <danielhanchen@gmail.com>)
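The arithmetic behind the fix, assuming the published Gemma-2 9B config values (hidden_size 3584, 16 attention heads, head_dim 256): the query scale must come from query_pre_attn_scalar, not from the usual hidden_size-over-heads shortcut.

```latex
\frac{\text{hidden\_size}}{\text{num\_heads}} = \frac{3584}{16} = 224,
\qquad
\text{query\_pre\_attn\_scalar} = \text{head\_dim} = 256,
\qquad
\text{attn\_scale} = \frac{1}{\sqrt{256}} = \frac{1}{16} \;\neq\; \frac{1}{\sqrt{224}}.
```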
- flake.lock: Update (ggerganov#8475) [aaab241]
  Flake lock file updates: 'nixpkgs' 9f4128e (2024-07-03) → 7e7c39e (2024-07-12). (Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>)
- pydantic : replace uses of __annotations__ with get_type_hints (ggerganov#8474) [090fca7]
  Replaces uses of __annotations__ with get_type_hints and fixes Python 3.9 and 3.10 support.
Commits on Jul 15, 2024
- Vulkan MMQ Fix (ggerganov#8479) [bda62d7]
  Fixes incoherence by adding a missing LOAD_VEC_A parameter; fixes a Vulkan op result checker build error.
- Commit 3dfda05
- [SYCL] add concat through dim 1/2 (ggerganov#8483) [16bdfa4]
- docs: fix links in development docs [no ci] (ggerganov#8481) [fc690b0]
  Fixes a few links within the repo that were broken by the documentation reorganization in ggerganov#8325.
- Commit 9104bc2
- server: update README.md with llama-server --help output [no ci] (ggerganov#8472) [f17f39f]
  The README.md had stale information. In particular, the --ctx-size "defaults to 512" claim confused me and I had to check the code to confirm it was false. Since the server is evolving rapidly, it's probably better to keep the source of truth in a single place (the source code) and generate the README.md from it. Did: `make llama-server`, `./llama-server --help > t.txt`, `vimdiff t.txt examples/server/README.md`. I copied the content inside a backquote block; I would have preferred proper text, but that would require a fair amount of surgery to make the current output markdown-compatible. A follow-up could automate this process with a script. No functional change.
- ggml : suppress unknown pragma 'GCC' on windows (ggerganov#8460) [8fac431]
  Adds a macro guard around pragma GCC to avoid the following warning on Windows:
  ```console
  C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068: unknown pragma 'GCC' [C:\llama.cpp\build\ggml\src\ggml.vcxproj]
  ```
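The guard pattern, as a minimal compilable example (the pragma shown is the one mentioned in the AArch64 commit above; any GCC pragma works the same way):

```cpp
#if defined(__GNUC__)
// Only GCC/Clang understand 'GCC' pragmas; the guard avoids MSVC
// warning C4068 ("unknown pragma 'GCC'").
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif

int main() { return 0; }
```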
- Commit 4db8f60
- Refactor lora adapter support (ggerganov#8332) [97bdd26]
  * lora : load to device buft; add a tensor-patch function; add llm_build_mm, llm_build_lora_mm and llm_build_lora_mm_id, and use llm_build_lora_mm in most model graphs
  * llama : llama_lora_adapter_apply; correct ggml_backend_tensor_copy; load and use alpha from LoRA adapters; do not disable mmap with lora; cuda: do not use dmmv if the tensor does not have enough cols
  * convert_lora : add a conversion script with f16 conversion, metadata and sanity checks, MoE LoRA conversion support, lazy conversion, no more transpose-A, prefer safetensors (similarly to convert_hf), only allow selected models, and use the GGUFWriter from Model instead of overwriting it; move add_type to main() in convert_hf
  * fixes for ftype, requirements, outfile and types; an "auto scale" experiment was reverted (reverts commit 42415a4)
  (Co-authored-by: slaren <slarengh@gmail.com>; Co-authored-by: Francis Couture-Harpin <git@compilade.net>)
Commits on Jul 16, 2024
- convert_hf : faster lazy safetensors (ggerganov#8482) [7acfd4e]
  This makes '--dry-run' much, much faster. Also fixes a memory leak in lazy MoE conversion: the '_lazy' queue was sometimes self-referential, which caused reference cycles of objects old enough to avoid garbage collection until potential memory exhaustion.
- Commit 0efec57
- export-lora : handle help argument (ggerganov#8497) [37b12f9]
  The --help option on export-lora wasn't accepted as valid. The help still got displayed by default, but the program exited with an error message and a nonzero status.
- gguf-hash : update clib.json to point to original xxhash repo (ggerganov#8491) [1666f92]
  Convinced Cyan4973 to add clib.json directly to his repo, so the clib package can now point directly at Cyan4973/xxHash; previously it pointed at a fork carrying the clib.json package metadata (see Cyan4973/xxHash#954). Also updates the gguf-hash readme to point to the Cyan4973 xxHash repo. [no ci]
- Commit 5e116e8
Commits on Jul 17, 2024
- Commit d65a836
- Commit da3913d
- [CANN] Add Ascend NPU backend (ggerganov#6035) [1bdd8ae]
  Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI. Follow-ups in the same PR: delete trailing whitespace, rename LLAMA_CANN to GGML_CANN, make ggml-common.h private, add a ggml_cann prefix for acl funcs, and add logging for the CANN backend. (Co-authored-by: wangshuai09 <391746016@qq.com>)
- Commit 30f80ca
- Commit b328344
- Commit e02b597
Commits on Jul 18, 2024
- Commit 3807c3d
- convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (ggerganov#7499) [672a6f1]
  The main thing is that the default output filename now takes the form {name}{parameters}{finetune}{version}{encoding}{kind}. In addition, this adds and removes some entries in the KV store and adds a metadata class with automatic heuristics capability to derive some values based on model card content.
  * No change (internal GGUF spec): general.architecture, general.quantization_version, general.alignment, general.file_type
  * No change (general model details): general.name, general.author, general.version, general.description
  * No change (licensing, converted repo, model source): general.license, general.url, general.source.url
  * Removed: general.source.huggingface.repository
  * Added (general model details): general.organization, general.finetune, general.basename, general.quantized_by, general.size_label
  * Added (licensing): general.license.name, general.license.link
  * Added (converted GGUF repo, unless made from scratch): general.doi, general.uuid, general.repo_url
  * Added (model source during conversion): general.source.doi, general.source.uuid, general.source.repo_url
  * Added (base model source): general.base_model.count and, per base model {id}: name, author, version, organization, url (model website/paper), doi, uuid, repo_url (model source repository: git/svn/etc.)
  * Added (array-based KV stores): general.tags, general.languages, general.datasets
  (Co-authored-by: compilade <git@compilade.net>; Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>)
- server: use relative routes for static files in new UI (ggerganov#8552) [0d2c732]
  Fixes api_url on non-index pages and uses relative routes for static files in the new UI.
- cmake : install all ggml public headers (ggerganov#8480) [705b7ec]
  (Co-authored-by: 65a <65a@65a.invalid>)
- Commit a15ef8f
Commits on Jul 19, 2024
- Commit 3d0e436
- fix: typo in the chatglm4 chat template (ggerganov#8586) [f299aa9]
  (Signed-off-by: thxCode <thxcode0824@gmail.com>)
- ggml : add friendlier error message to fopen errors (ggerganov#8575) [b57eb9c]
  Adds additional error information when model files fail to load, and extends this to most instances of fopen.
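The general shape of a friendlier fopen error, sketched independently of the actual ggml helper:

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>

// Report the OS-level reason (e.g. "No such file or directory"),
// not just that the open failed.
static FILE * open_checked(const char * path, const char * mode) {
    FILE * f = fopen(path, mode);
    if (f == nullptr) {
        fprintf(stderr, "failed to open '%s': %s\n", path, strerror(errno));
    }
    return f;
}

int main() {
    open_checked("does-not-exist.gguf", "rb");
    return 0;
}
```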
- Commit be0cfb4
- llama : bump max layers from 256 to 512 (ggerganov#8530) [d197545]
  Bumps the maximum layer count from 256 to 512 and replaces asserts with exceptions.
- Commit 57b1d4f
- ggml : fix quant dot product with odd number of blocks (ggerganov#8549) [87e397d]
  Fixes the iq4_nl dot product with an odd number of blocks, plus odd-block fixes for ARM_NEON (ggerganov#8556) covering q4_0, q4_1, q5_0, q5_1, q8_0 and iq4_nl (metal); removes the special Q4_0 code for the first 2 blocks and fixes a sumf redefinition. (Co-authored-by: slaren <slarengh@gmail.com>; Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
Commits on Jul 20, 2024
- gguf_dump.py: fix markdown kv array print (ggerganov#8588) [c3776ca]
  Fixes the markdown KV-array printing, refactors KV-array string handling, and escapes backticks inside strings via an inline-code markdown escape handler:
  >>> escape_markdown_inline_code("hello world")
  '`hello world`'
  >>> escape_markdown_inline_code("hello ` world")
  '``hello ` world``'
  Also handles the edge case of backticks at the start or end of a string. (Co-authored-by: compilade <git@compilade.net>)
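A C++ transliteration of the escaping rule shown in the doctests above (ignoring the start/end-of-string edge case the commit also handles): text containing a backtick must be fenced by a longer backtick run.

```cpp
#include <iostream>
#include <string>

// Inline code in markdown: plain text gets single backticks; text that
// itself contains a backtick gets a double-backtick fence.
static std::string escape_markdown_inline_code(const std::string & s) {
    if (s.find('`') == std::string::npos) {
        return "`" + s + "`";
    }
    return "``" + s + "``";
}

int main() {
    std::cout << escape_markdown_inline_code("hello world")   << "\n"; // `hello world`
    std::cout << escape_markdown_inline_code("hello ` world") << "\n"; // ``hello ` world``
    return 0;
}
```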
- llama.swiftui: fix end of generation bug (ggerganov#8268) [69b9945]
  Fixes the app continuing to generate blank lines after receiving an EOT or EOS token from the LLM; renames the flag to is_done (variable name suggested by ggerganov); minor whitespace fixes. (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
- llama : add support for Tekken pre-tokenizer (ggerganov#8579) [9403622]
  Adds support for the Tekken pre-tokenizer (ggerganov#8577): removes the unneeded `vocab.tokenizer_clean_spaces` assignment (Tekken no longer uses clean_up_tokenization_spaces), fixes the order of pre-tokenizers, and updates the chkhsh for the Tekken tokenizer. (Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
- Commit 07283b1
- CUDA: MMQ code deduplication + iquant support (ggerganov#8495) [69c487f]
  Deduplicates the MMQ code and adds iquant support; uses one less parallel job for the CI build.
Commits on Jul 21, 2024
- convert_hf : fix Gemma v1 conversion (ggerganov#8597) [c69c630]
  Fixes Gemma v1 conversion, allows renaming tokens (but with a warning), and fixes Gemma v1 not setting the BOS and EOS tokens.
- gguf-py : fix some metadata name extraction edge cases (ggerganov#8591) [328884f]
  * gguf-py : fix some metadata name extraction edge cases and add more name metadata extraction tests; multiple finetune versions are now joined together, and the removal of the basename annotation on trailing versions is more robust
  * gguf-py : do not use title case for the naming convention. Some models use acronyms in lowercase, which can't be title-cased like other words, so it's best to simply use the same case as in the original model name. Note that the size label still has an uppercased suffix to make it distinguishable from the context size of a finetune.
  * convert_lora : use the lora dir for the model card path; fix the default filename (previously hardcoded)
  * convert_hf : Model.fname_out can no longer be None
- examples : Rewrite pydantic_models_to_grammar_examples.py (ggerganov#8493) [22f281a]
  Changes:
  - Move each example into its own function, which makes the code much easier to read and understand.
  - Make it easy to run a single test by commenting out function calls in main(), and make the output easy to parse by indenting each example's output.
  - Add a shebang and the +x bit to make it clear it's an executable; make the host configurable via --host with a default of 127.0.0.1:8080.
  - Make the code look up the registered tool in the tools list instead of hardcoding the returned values, which makes the code more copy-pastable.
  - Add error checking so the program exits 1 if the LLM didn't return the expected values; this is super useful for checking correctness.
  Testing: tested with Mistral-7B-Instruct-v0.3 (F16 and Q5_K_M) and Meta-Llama-3-8B-Instruct (F16 and Q5_K_M). No failure was observed even once with Mistral-7B-Instruct-v0.3; Llama-3 failed about a third of the time in example_concurrent, returning one call instead of 3, even at F16. Potential follow-ups: fix the prompt encoding (surprisingly it mostly works even when the encoding is not model-optimized); add chained answer and response. Test-only change.
- Commit 45f2c19
Commits on Jul 22, 2024
- examples: fix android example cannot be generated continuously (ggerganov#8621) [b7c11d3]
  When generation ends, `completion_loop()` should return NULL rather than the empty string.
- Commit 04bab6b
- Commit 6281544
- Commit 50e0535
- tests : re-enable tokenizer tests (ggerganov#8611) [e093dd2]
  Removes the duplicated gpt-2 vocab and the old stablelm vocab, re-enables the MPT and DeepSeek tokenizer tests, and sorts the cmake ggml-ci entries.
- Commit 6f11a83
- *.py: Stylistic adjustments for python (ggerganov#8233) [566daa5]
  Superfluous parens in conditionals removed; unused function args removed; the unused `idx` var replaced with `_`; the file_format and format_version attributes initialized; a constant renamed to capitals; redefinition of the `f` var prevented. (Signed-off-by: Jiri Podivin <jpodivin@redhat.com>)
- llama : add support for SmolLm pre-tokenizer (ggerganov#8609) [d94c6e0]
  Adds the SmolLM pre-tokenizer, handles its regex, and removes the .inp and .out test ggufs. (Co-authored-by: compilade <git@compilade.net>)
- llama : fix codeshell support (ggerganov#8599) [081fe43]
  Fixes codeshell support and moves codeshell after smollm to respect the enum order.
Commits on Jul 23, 2024
- Commit 063d99a
- contrib : clarify PR squashing + module names (ggerganov#8630) [e7e6487]
  Clarifies PR squashing, fixes a typo, and adds a list of modules.
- Allow all RDNA2 archs to use sdot4 intrinsic (ggerganov#8629) [46e4741]
  The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, so use it.
- Vulkan IQ4_NL Support (ggerganov#8613) [751fcfc]
  Fixes Vulkan matmul test compile errors, adds Vulkan IQ4_NL support, and fixes Vulkan DeepSeek-Coder-V2-Lite MoE support.
- llama : move vocab, grammar and sampling into separate files (ggerganov#8508) [938943c]
  Moves the sampling code into llama-sampling, the grammar code into llama-grammar (pre-fetching rules), and the tokenizers into llama-vocab; deprecates llama_sample_grammar; updates the llama.cpp deps in make [no ci]; redirects the external API to internal APIs and suffixes the internal APIs with "_impl"; general clean-up.
- sycl : Add support for non-release DPC++ & oneMKL (ggerganov#8644) [64cf50a]
  Updates cmake to support Nvidia hardware & the open-source compiler. (Signed-off-by: Joe Todd <joe.todd@codeplay.com>)
- Commit b841d07
- examples : Fix llama-export-lora example (ggerganov#8607) [de28008]
  Fixes the export-lora example: adds more logging, rejects merging a subset, improves checks, and fixes a typo.
Commits on Jul 24, 2024
- Commit b115105
- Commit 79167d9
- llama : fix llama_chat_format_single for mistral (ggerganov#8657) [96952e7]
  Fixes `llama_chat_format_single` for mistral; fixes a typo; uses printf.
- Commit 3a7ac53
- Build Llama SYCL Intel with static libs (ggerganov#8668) [f19bf99]
  Ensures SYCL CI builds both static & dynamic libs for testing purposes. (Signed-off-by: Joe Todd <joe.todd@codeplay.com>)
- readme : update games list (ggerganov#8673) [68504f0]
  Added a link to a game the contributor made that depends on llama.
Commits on Jul 25, 2024
- llama: use sliding window for phi3 (ggerganov#8627) [8a4bad5]
  Uses a sliding window for phi3; fixes the "data_swa" -> "data" typo; adds the phi3 sliding window in convert_hf_to_gguf.py.
- docs : Quantum -> Quantized (ggerganov#8666) [4b0eff3]
  Replaces "quantum models" with "quantized models" in the imatrix and server readmes.
- examples : remove finetune and train-text-from-scratch (ggerganov#8669) [be6d7c0]
  Removes the finetune and train-text-from-scratch examples, fixes the build, updates the help message, and fixes a small typo for export-lora.
- Commit eddcb52
- [SYCL] fix multi-gpu issue on sycl (ggerganov#8554) [ed67bcb]
  (Signed-off-by: Chen Xi <xi2chen@intel.com>; Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>)
- Commit 88954f7
- ggml : fix build on Windows with Snapdragon X (ggerganov#8531) [bf5a81d]
  Improvements for Windows with Snapdragon X (initially reverted in bf21397, then reinstated), plus Windows-on-ARM and cmake build clarifications in docs/build.md. (Co-authored-by: AndreasKunar <andreaskmsn.com>; Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>)
- Commit 4226a8d
- ggml: handle ggml_init failure to fix NULL pointer deref (ggerganov#8692) [49ce0ab]
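The calling pattern the fix makes safe, sketched with the public ggml API (the scratch size below is chosen arbitrarily for the example):

```cpp
#include "ggml.h"

#include <cstdio>

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024, // arbitrary scratch size for the sketch
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);
    if (ctx == nullptr) { // allocation can fail; using ctx anyway derefs NULL
        fprintf(stderr, "ggml_init() failed\n");
        return 1;
    }
    // ... build tensors and graphs here ...
    ggml_free(ctx);
    return 0;
}
```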
- Commit 41cd47c
- server : add Speech Recognition & Synthesis to UI (ggerganov#8679) [01aec4a]
  Adds Speech Recognition & Synthesis to the UI, with follow-up fixes.
Commits on Jul 26, 2024
- llama : fix order of parameters (ggerganov#8706) [01245f5]
  The corrected usage of `aclrtGetMemInfo` is documented at https://www.hiascend.com/doc_center/source/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_03_0103.html. (Co-authored-by: Judd <foldl@boxvest.com>)
Commits on Jul 27, 2024
- ggml : reduce hash table reset cost (ggerganov#8698) [2b1f616]
  Reduces the hash table reset cost; fixes unreachable-code warnings after GGML_ASSERT(false); replaces GGML_ASSERT(false) with GGML_ABORT("fatal error") and makes GGML_ABORT take a format string.
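One standard way to make such resets cheap is a generation counter, shown below as an illustrative sketch only (the actual commit may use a different scheme): clearing the table becomes a single increment instead of a memset over every slot.

```cpp
#include <cstdint>
#include <vector>

// Generation-counter trick: a slot counts as "used" only if it was written
// during the current generation, so reset() is O(1).
struct hash_set {
    std::vector<uint32_t> slot_gen; // generation in which each slot was last set
    uint32_t gen = 1;

    explicit hash_set(size_t n) : slot_gen(n, 0) {}

    void reset() { ++gen; }                             // O(1) "clear everything"
    bool used(size_t i) const { return slot_gen[i] == gen; }
    void set(size_t i) { slot_gen[i] = gen; }
};

int main() {
    hash_set hs(64);
    hs.set(3);
    hs.reset();              // no per-slot work
    return hs.used(3) ? 1 : 0;
}
```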
- cann: Fix Multi-NPU execution error (ggerganov#8710) [bfb4c74]
  Fixes a multi-NPU execution error and updates the comment for ggml_backend_cann_supports_buft.
- common : add --no-warmup option for main/llama-cli (ggerganov#8712) [9d03d08]
  Adds a --no-warmup option for llama-cli; it can be convenient to skip the warmup llama_decode call when debugging. (Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>)
llama : add function for model-based max number of graph nodes (ggerganov#8622)
* llama : model-based max number of graph nodes
* llama : disable 405B max_nodes path due to lack of complaints
Commit 92090ec
llama : add support for llama 3.1 rope scaling factors (ggerganov#8676)
* Add llama 3.1 rope scaling factors to llama conversion and inference. This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py
* address comments
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
Commit b5e9546
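A hedged sketch of how per-dimension frequency factors reach the rope operation; the wrapper function and every scalar value below are illustrative placeholders, not the actual llama.cpp call site:

```cpp
#include "ggml.h"

// freq_factors may be NULL; when present it is a small 1D tensor produced at
// conversion time and stored in the GGUF file alongside the weights.
static struct ggml_tensor * apply_rope(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,           // activations to rotate
        struct ggml_tensor  * positions,     // I32 token positions
        struct ggml_tensor  * freq_factors,  // the Llama 3.1 scaling factors
        int n_rot, int rope_mode, int n_ctx_orig) {
    return ggml_rope_ext(ctx, cur, positions, freq_factors,
            n_rot, rope_mode, n_ctx_orig,
            /*freq_base  =*/ 500000.0f, // illustrative
            /*freq_scale =*/ 1.0f,
            /*ext_factor =*/ 0.0f,
            /*attn_factor=*/ 1.0f,
            /*beta_fast  =*/ 32.0f,
            /*beta_slow  =*/ 1.0f);
}
```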
ggml : remove unnecessary UNUSED macro call (ggml/880)
This commit removes an UNUSED macro call that is not needed as the variable n0 is used in the code and will not produce a warning. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Commit c12b6e8
Commit d2b851b
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893)
This prevents invalid frees when destroying a partially initialized vk_buffer_struct. For example, this could happen in ggml_vk_create_buffer when running out of device memory.
Co-authored-by: Tony Wasserka <neobrain@users.noreply.github.com>
Commit 203b7f1
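The pattern, sketched with illustrative member names (the real struct lives in the ggml Vulkan backend): default-initializing every handle means a partially constructed buffer can always be destroyed safely, because Vulkan destroy calls are defined to be no-ops on VK_NULL_HANDLE.

```cpp
#include <vulkan/vulkan.h>
#include <cstddef>

struct vk_buffer_struct {
    VkBuffer       buffer = VK_NULL_HANDLE; // never dangling, even on early-exit paths
    VkDeviceMemory memory = VK_NULL_HANDLE;
    size_t         size   = 0;
};

void destroy_buffer(VkDevice device, vk_buffer_struct & buf) {
    // safe even if allocation failed halfway: destroying VK_NULL_HANDLE is a no-op
    vkDestroyBuffer(device, buf.buffer, nullptr);
    vkFreeMemory(device, buf.memory, nullptr);
    buf.buffer = VK_NULL_HANDLE;
    buf.memory = VK_NULL_HANDLE;
    buf.size   = 0;
}
```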
ggml: add support for float16 input tensors in pooling operations (ggml/895)
* Add support for float16 tensors in 1d pooling operations
* Add support for float16 input tensors in 2d pooling operations
* code cleanup: remove unnecessary casting during srow ptr initialization
Co-authored-by: vanaka11 <vanaka1189@gmail.com>
Commit 9f77d89
ggml : loop tiling optimizations for scalar path (ggml/898)
Apply a loop tiling technique to the generic path, which provides performance upside for ISAs with enough registers to take advantage of it. Also helps the compiler optimize this path.
Commit a05ca93
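A generic illustration of the technique (not the actual ggml kernel): split the inner loop into fixed-size tiles so the compiler can keep several independent accumulators in registers.

```cpp
#include <cstddef>

float dot_tiled(const float * x, const float * y, size_t n) {
    constexpr size_t TILE = 4;
    float acc[TILE] = {};
    size_t i = 0;
    for (; i + TILE <= n; i += TILE) {
        for (size_t t = 0; t < TILE; ++t) {
            acc[t] += x[i + t] * y[i + t]; // independent chains hide FP latency
        }
    }
    float sum = acc[0] + acc[1] + acc[2] + acc[3];
    for (; i < n; ++i) {
        sum += x[i] * y[i]; // scalar tail for the leftover elements
    }
    return sum;
}
```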
Commit ae7985c
Commit 345c8c0
Commit 56f20aa
Commit 5e2727f
feat: Support Moore Threads GPU (ggerganov#8383)
* Update doc for MUSA
* Add GGML_MUSA in Makefile
* Add GGML_MUSA in CMake
* CUDA => MUSA
* MUSA adds support for __vsubss4
* Fix CI build failure
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Commit e54c35e
Commits on Jul 28, 2024
llama : refactor session file management (ggerganov#8699)
* llama : refactor session file management
* llama : saving and restoring state checks for overflow. The size of the buffers should now be given to the functions working with them, otherwise a truncated file could cause out-of-bound reads.
* llama : stream from session file instead of copying into a big buffer. Loading session files should no longer cause a memory usage spike.
* llama : llama_state_get_size returns the actual size instead of the max. This is a breaking change, but makes that function *much* easier to keep up to date, and it also makes it reflect the behavior of llama_state_seq_get_size.
* llama : share code between whole and seq_id-specific state saving. Both session file types now use a more similar format.
* llama : no longer store all hparams in session files. Instead, the model arch name is stored. The layer count and the embedding dimensions of the KV cache are still verified when loading. Storing all the hparams is not necessary.
* llama : fix uint64_t format type
* llama : various integer type cast and format string fixes. Some platforms use "%lu" and others "%llu" for uint64_t; errors are displayed with a cast to size_t instead.
* llama : remove _context suffix for llama_data_context
* llama : fix session file loading. llama_state_get_size cannot be used to get the max size anymore.
* llama : more graceful error handling of invalid session files
* llama : remove LLAMA_MAX_RNG_STATE. It's no longer necessary to limit the size of the RNG state, because the max size of session files is not estimated anymore.
* llama : cast seq_id in comparison with unsigned n_seq_max
Commit 4c676c8
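A conceptual sketch of the overflow-checked reading the refactor describes; this is not llama.cpp's actual API, just the shape of giving every read a known buffer size:

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>

struct state_reader {
    const uint8_t * data; // session file contents (or a mapped region)
    size_t size;          // total size, supplied by the caller
    size_t off = 0;

    void read(void * dst, size_t n) {
        if (n > size - off) { // every copy is bounds-checked against the real size
            throw std::runtime_error("truncated or invalid session file");
        }
        std::memcpy(dst, data + off, n);
        off += n;
    }
};
```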
chore : Fix vulkan related compiler warnings, add help text, improve CLI options (ggerganov#8477)
* chore: Fix compiler warnings, add help text, improve CLI options
* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions
* chore : Add ignore rule for vulkan shader generator
* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
* chore : Remove void and apply C++ style empty parameters
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
Co-authored-by: 0cc4m <picard12@live.de>
Commit 4730fac
Commit 6eeaeba
Commits on Jul 29, 2024
Commit 0832de7
cuda : organize vendor-specific headers into vendors directory (ggerganov#8746)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Commit 439b3fc
ggml: bugfix: fix handling of inactive elements in RISC-V vector code (ggerganov#8748)
In this code we want elements to retain the value they previously held when mask[i] is false, so the undisturbed policy should be used. With the default agnostic policy of the RVV intrinsics, those values may either be preserved or be overwritten with all 1s.
Co-authored-by: carter.li <carter.li@starfivetech.com>
Commit 75af08c
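A scalar model of the two mask policies, for intuition only; the actual fix selects the undisturbed-policy variants of the RVV intrinsics rather than anything like this loop:

```cpp
void masked_add_undisturbed(float * vd, const float * vs, const bool * mask, int n) {
    for (int i = 0; i < n; ++i) {
        if (mask[i]) {
            vd[i] += vs[i]; // active lanes are updated
        }
        // inactive lanes keep their previous value ("undisturbed");
        // under the agnostic policy the hardware may instead fill them
        // with an all-1s pattern, corrupting later reads
    }
}
```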
Commits on Jul 30, 2024
[SYCL] Add TIMESTEP_EMBEDDING OP (ggerganov#8707)
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
Commit c887d8b
Commit 6e2b600
Commit 140074b
added android implementation of ggml_print_backtrace_symbols (ggerganov#8751)
* added android implementation of ggml_print_backtrace_symbols
* Update ggml/src/ggml.c (several rounds of review suggestions)
Co-authored-by: slaren <slarengh@gmail.com>
Commit 7c27a19
py: add_array() will not add to kv store if value is an empty array (ggerganov#8774)
* gguf_writer.py: add_array() should not add to kv store if empty
* Apply suggestions from code review. I was wondering if there was a specific reason for `if val`, but good to hear we can safely use `len(val) == 0`.
Co-authored-by: compilade <git@compilade.net>
Commit 7e72aa7
nix: cuda: rely on propagatedBuildInputs (ggerganov#8772)
Listing individual outputs is no longer necessary to reduce the runtime closure size after NixOS/nixpkgs#323056.
Commit 268c566
Commits on Jul 31, 2024
Commit 44d28dd
Adding Gemma 2 2B configs (ggerganov#8784)
* Adding Gemma 2 2B configs. Updates to Q scaling and Gemma 2 model sizes to match the v2 2B model.
* Update src/llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
Commit 398ede5
Build: Fix potential race condition (ggerganov#8781)
* Fix potential race condition as pointed out by @fairydreaming in ggerganov#8776
* Reference the .o rather than rebuilding every time
* Adding in CXXFLAGS and LDFLAGS
* Removing unnecessary linker flags
Commit ed9d285
Commit afbbcf3
Commits on Aug 1, 2024
Commit c8a0090
cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (ggerganov#8800)
* cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X
* update asserts
* only use dmmv for supported types
* add test
Commit 7a11eb3
Build: Only include execinfo.h on linux systems that support it (ggerganov#8783)
* Only enable backtrace on GLIBC linux systems
* fix missing file from copy
* use glibc macro instead of defining a custom one
Commit b7a08fd
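A sketch of the guard, assuming the standard glibc `execinfo.h` API; the exact macro test in the commit may differ from this:

```cpp
#include <cstdio>
#if defined(__linux__) && defined(__GLIBC__)
#include <execinfo.h>

static void print_backtrace() {
    void * trace[32];
    int n = backtrace(trace, 32);                   // capture up to 32 return addresses
    backtrace_symbols_fd(trace, n, fileno(stderr)); // symbolize straight to stderr
}
#else
static void print_backtrace() {
    // execinfo.h is a glibc extension; musl and other libcs do not provide it
}
#endif
```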
ggml-cuda: Adding support for unified memory (ggerganov#8035)
* Adding support for unified memory
* adding again the documentation about unified memory
* refactoring: moved the unified memory code to the correct location
* fixed compilation error when using hipblas
* cleaning up the documentation
* updating the documentation
* adding one more case where the PR should not be enabled
Co-authored-by: matteo serva <matteo.serva@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Commit afbb4c1
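A hedged sketch of the idea: fall back to `cudaMallocManaged` so allocations can exceed dedicated VRAM and be paged by the driver. Upstream the behavior is opt-in (via an environment variable, if memory serves) rather than the function parameter shown here, and error handling is trimmed for brevity:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

void * alloc_device_buffer(size_t size, bool use_unified_memory) {
    void * ptr = nullptr;
    cudaError_t err;
    if (use_unified_memory) {
        // managed memory is visible to host and device and can oversubscribe VRAM
        err = cudaMallocManaged(&ptr, size, cudaMemAttachGlobal);
    } else {
        err = cudaMalloc(&ptr, size);
    }
    return err == cudaSuccess ? ptr : nullptr;
}
```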
Commits on Aug 2, 2024
Commit 0fbbd88
cann: Fix ggml_cann_im2col for 1D im2col (ggerganov#8819)
* fix ggml_cann_im2col for 1D im2col
* fix build warning
Commit e09a800
Fix conversion of unnormalized BF16->BF16 weights (ggerganov#7843)
* add truncate_bf16
* truncate intermediate fp32 if converting bf16 to bf16
* fix masking in __compute_fp32_to_bf16
* np.int16 no longer used
* missing cast and additional numpy 2.x fix
* ggml-impl : do not flush bf16 subnormals to zero
* ggml : add reference fp32 to bf16 conversion. The fast version is no longer equivalent for all platforms because of the handling of subnormal values.
* gguf-py : remove flush to zero for bf16 subnormals
* gguf-py : remove float32 truncation to bf16. Rounding achieves the same thing in the cases where this was used.
* missed prototype update in merge
* merge cleanup
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
Commit b72c20b
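The reference rounding the commit mentions, sketched from the usual fp32 -> bf16 round-to-nearest-even formulation; unlike a flush-to-zero fast path, it leaves subnormals intact:

```cpp
#include <cstdint>
#include <cstring>

uint16_t fp32_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if ((u & 0x7fffffff) > 0x7f800000) {
        return (u >> 16) | 64; // NaN: truncate but force a quiet-NaN bit
    }
    // round to nearest, ties to even, then keep the top 16 bits
    return (u + (0x7fff + ((u >> 16) & 1))) >> 16;
}
```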
Commits on Aug 3, 2024
ggml : reading the runtime sve config of the cpu (ggerganov#8709)
* ggml : reading the runtime sve config of the cpu
* change to one-time init to prevent performance drop
* prefix variable to avoid possible conflicts
* revert xxhash fix and add brackets
Co-authored-by: domke <673751-domke@users.noreply.gitlab.com>
Commit 76614f3
Commits on Aug 4, 2024
Commit 4b77ea9
Commit 01aae2b
batched-bench : handle empty `-npl` (ggerganov#8839)
* [example] batched-bench "segmentation fault": when `llama-batched-bench` is invoked _without_ setting `-npl`, "number of parallel prompts", it segfaults. The segfault is caused by invoking `max_element()` on a zero-length vector, `n_pl`. This commit addresses that by first checking whether the number of parallel prompts is zero, and if so sets the maximum sequence size to 1; otherwise, it uses the result of `max_element()`. Fixes the following, when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:
```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
  69   llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
  70
  71   // ensure enough sequences are available
  -> 72   ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```
* Update examples/batched-bench/batched-bench.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
Commit ecf6b7f
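The guard described above, as a minimal sketch (helper name is illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

uint32_t max_parallel_seqs(const std::vector<int> & n_pl) {
    // *std::max_element(...) on an empty range dereferences end(), which is UB;
    // with no -npl values provided, fall back to a single sequence
    return n_pl.empty() ? 1u : (uint32_t) *std::max_element(n_pl.begin(), n_pl.end());
}
```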
Server: Don't ignore llama.cpp params (ggerganov#8754)
* Don't ignore llama.cpp params
* Add fallback for max_tokens
Commit 978ba3d
Commit 0d6fb52
Commits on Aug 5, 2024
Commit c02b0a8
ggml : move c parameter comment to ggml_rope_ext (ggml/901)
This commit moves the comment for the c parameter from ggml_rope to ggml_rope_ext. The comment is currently incorrect as ggml_rope does not have a c parameter (freq_factors tensor). Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Commit 655858a
vulkan : implement Stable Diffusion operators (ggml/904)
* Fix Vulkan repeat op
* Implement Vulkan concat op
* Delete old Vulkan shader generator
* Implement Vulkan im2col op
* Implement Vulkan unary gelu_quick op
* Implement Vulkan group_norm op
* Implement Vulkan timestep_embedding op
* Implement Vulkan upscale op
* Fix Vulkan vk_context tensor extra index issue
* Fix Vulkan matmul shader parameter bug
* Properly fix Vulkan matmul shader parameter bug
* Add Vulkan ADD f16 + f32 -> f16 operator support
* Implement Vulkan tanh op
* Fix Vulkan group count too large validation error on non-Nvidia GPUs
* Throw error when too much memory is requested
* Fix another Vulkan group count too large validation error on non-Nvidia GPUs
* Fix matmul MMQ condition
* Implement Vulkan pad op
* Fix Vulkan crash when tensor is used multiple times in a compute graph
* Add Vulkan CONCAT f16 + f16 -> f16 op
* Add Vulkan LEAKY_RELU op
Commit a3738b2
Commit 5587e57
vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (ggerganov#8855)
* Fix Vulkan mul mat vec invalid results when ncols < warp size
* Only run backend ops mul mat vec block size test if block size not already covered
Commit 064cdc2
Commit f1ea514
Commit 400ae6f
cmake: fix paths for vulkan shaders compilation on Windows (ggerganov#8573)
* Vulkan-shaders: attempt to fix compilation on Windows
* fix mismatched parenthesis
Commit e31a4f6
Stop the generation when <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (ggerganov#8858)
* gguf-py, llama : add constants and methods related to the Llama-3.1 <|eom_id|> token
* llama : find the Llama-3.1 <|eom_id|> token id during vocab loading
* llama-vocab : add the Llama-3.1 <|eom_id|> token to the set of tokens stopping the generation
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Commit d3f0c71
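A sketch of how a generation loop can consume this, assuming the `llama.h` accessors of that era; the helper itself is illustrative:

```cpp
#include "llama.h"

static bool should_stop(const struct llama_model * model, llama_token tok) {
    // llama_token_is_eog() covers the end-of-generation set, which after
    // this change includes the Llama 3.1 <|eom_id|> token as well
    return llama_token_is_eog(model, tok);
}
```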
py: Add more authorship metadata from model card (ggerganov#8810)
* py: add more authorship metadata from model card
* fixup! py: add more authorship metadata from model card
Commit 1ef14b3
ggml : fix overflows in elu function (ggerganov#8866)
It's helpful to use expm1f(x), because expf(x)-1 will result in overflow for 25% of single-precision floating point numbers.
Commit b9dfc25
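The fix in miniature: `expm1f` computes e^x - 1 directly instead of forming the intermediate `expf(x)`, which is inaccurate near zero and overflows for large x.

```cpp
#include <cmath>

inline float elu(float x) {
    // expm1f(x) == expf(x) - 1.0f, but computed accurately
    return x > 0.0f ? x : expm1f(x);
}
```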
readme : add ramalama to the available UIs (ggerganov#8811)
ramalama is a repo-agnostic boring CLI tool that supports pulling from ollama, huggingface and oci registries.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Commit b42978e
Commit bc0f887
common : Changed tuple to struct (TODO fix) (ggerganov#8823)
* common : Changed tuple to struct (TODO fix). Use struct `llama_init_result` to replace the previous std::tuple<struct llama_model *, struct llama_context *>.
* delete llama_init_default_params()
* delete the extra whitespace
Commit 0a4ce78
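The shape of the change, sketched: a named struct instead of a positional tuple, so call sites read the model and context by name rather than by index.

```cpp
struct llama_model;
struct llama_context;

// replaces std::tuple<struct llama_model *, struct llama_context *>
struct llama_init_result {
    struct llama_model   * model   = nullptr;
    struct llama_context * context = nullptr;
};
```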
Commits on Aug 6, 2024
Commit d4ff847
[CANN]: Fix ggml_backend_cann_buffer_get_tensor (ggerganov#8871)
* cann: fix ggml_backend_cann_buffer_get_tensor
  1. fix data ptr offset
  2. enable the acquisition of incomplete tensors
* fix backend cann set_tensor
Commit c21a896
convert : add support for XLMRoberta embedding models (ggerganov#8658)
* add conversion for bge-m3; small fix in unigram tokenizer
* clean up and simplify XLMRoberta conversion
Commit cdd1889
ggml : add epsilon as a parameter for group_norm (ggerganov#8818)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Commit 2d5dd7b
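The call shape after the change, as a one-line sketch; `ctx` and `inp` are assumed to exist, and the group count and epsilon are illustrative values rather than anything from the commit:

```cpp
// epsilon is now an explicit argument instead of a hard-coded constant
struct ggml_tensor * out = ggml_group_norm(ctx, inp, /*n_groups=*/32, /*eps=*/1e-6f);
```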
Commit 0bf16de
[Vulkan] Fix compilation of `vulkan-shaders-gen` on w64devkit after `e31a4f6` (ggerganov#8880)
* Fix compilation issue in `vulkan-shaders-gen`: ggerganov@e31a4f6 broke compilation on w64devkit. Including `algorithm` seems to fix that.
* Guard it under `#ifdef _WIN32`
Commit efda90c
cmake : Link vulkan-shaders-gen with pthreads (ggerganov#8835)
When using CMake to build with Vulkan support, compiling vulkan-shaders-gen fails due to a missing CMakeLists.txt specification to link vulkan-shaders-gen with the threading library, resulting in the following error:
[5/172] Linking CXX executable bin/vulkan-shaders-gen
FAILED: bin/vulkan-shaders-gen
: && /usr/bin/c++ ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o -o bin/vulkan-shaders-gen && :
ld: error: undefined symbol: pthread_create
>>> referenced by vulkan-shaders-gen.cpp
>>> ggml/src/vulkan-shaders/CMakeFiles/vulkan-shaders-gen.dir/vulkan-shaders-gen.cpp.o:(std::__1::__libcpp_thread_create[abi:se180100](pthread**, void* (*)(void*), void*))
c++: error: linker command failed with exit code 1 (use -v to see invocation)
[6/172] Generating build details from Git -- Found Git: /usr/local/bin/git (found version "2.45.2")
ninja: build stopped: subcommand failed.
Add the CMakeLists.txt specification to link vulkan-shaders-gen with the threading library, fixing the above error. Fixes ggerganov#8834.
Commit db20f50
simple : update name of executable to llama-simple (ggerganov#8885)
This commit updates the name of the executable in README.md from `simple` to `llama-simple`.
Commit 5f4dcb1
Commit 641f5dd
server : add lora hotswap endpoint (WIP) (ggerganov#8857)
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
Commit 1e6f655
Commit 3195854
quantize : update usage comment in quantize.cpp (ggerganov#8889)
This commit updates the usage comment in quantize.cpp to reflect the new name of the executable, which is llama-quantize.
Commit 725e3d9
Commits on Aug 7, 2024
llama-bench : add support for getting cpu info on Windows (ggerganov#8824)
* Add support for getting cpu info on Windows for llama_bench
* refactor
Co-authored-by: slaren <slarengh@gmail.com>
Commit 506122d
Commit a8dbc6f
[SYCL] Updated SYCL device filtering (ggerganov#8901)
* Updated device filter to depend on default_selector (fixes non-Intel device issues)
* Small related update to example/sycl README
Commit 0478174
ggml-backend : fix async copy from CPU (ggerganov#8897)
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy; fix stream used when the devices are the same
Commit be55695
make : use C compiler to build metal embed object (ggerganov#8899)
* make : use C compiler to build metal embed object
* use rm + rmdir to avoid -r flag in rm
Commit 15fa07a
Commits on Aug 8, 2024
make : clean llamafile objects (ggerganov#8923)
`ggml/src/llamafile/sgemm.o` was not deleted on `make clean`
Commit ebd541a
Commit 85fca8d
Commit 5b33ea1
Commit f93d49a
Commit e44a561
Commit 366d486
Commit afd27f0
gguf-py : simplify support for quant types (ggerganov#8838)
* gguf-py : use classes for quants
* convert_hf : simplify internal quantization type selection
* gguf-py : fix flake8 lint
* gguf-py : fix BF16 numpy view type
* gguf-py : remove LlamaFileTypeMap. Too specific to 'llama.cpp', and would be a maintenance burden to keep up to date.
* gguf-py : add generic quantize and dequantize functions. The quant classes no longer need to be known, only the target or the source type, for 'quantize' and 'dequantize', respectively.
Commit 3a14e00
Commits on Aug 9, 2024
llama : reduce useless copies when saving session (ggerganov#8916)
* llama : avoid useless copies in dummy session writer
* llama : avoid double tensor copy when saving session to buffer
Commit 345a686
Commit daef3ab
Commit 6f6496b
embedding : add --pooling option to README.md [no ci] (ggerganov#8934)
This commit adds the `--pooling` option to the README.md file in the `examples/embedding` directory. The motivation for adding this option is that currently, if the model used does not specify a pooling type, the embedding example fails with the following error message:
```console
main: error: pooling type NONE not supported
```
This commit also updates the name of the executable in the examples section.
Commit 5b2c04f
whisper : use vulkan as gpu backend when available (whisper/2302)
* ggml: use vulkan as gpu backend when available
* whisper: enable using vk as default buffer type
Signed-off-by: Matt Stephenson <mstephenson6@users.noreply.github.com>
Commit 70c0ea3
Commit 4305b57
llava : support MiniCPM-V-2.5 (ggerganov#7599)
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when the image needs to be resized smaller
* address review comments and modify
* address review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove the line-33 directory in /cmakelists.txt (not in example, in the main dir)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* remove load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Commit 3071c0a
llama : better replace_all (cont) (ggerganov#8926)
* llama : better replace_all (cont)
* code : deduplicate replace_all
Commit 45a55b9
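One common formulation of a well-behaved `replace_all`, sketched (not necessarily the exact code the commit landed): advance past each replacement so the search never re-matches inside text it just wrote.

```cpp
#include <string>

static void replace_all(std::string & s, const std::string & search, const std::string & replace) {
    if (search.empty()) {
        return; // avoid an infinite loop on an empty needle
    }
    size_t pos = 0;
    while ((pos = s.find(search, pos)) != std::string::npos) {
        s.replace(pos, search.length(), replace);
        pos += replace.length(); // skip what was just inserted
    }
}
```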
Commit 272e3bd
llama : add support for lora adapters in T5 model (ggerganov#8938)
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Commit 6afd1a9
Commit b72942f
Commits on Aug 10, 2024
gguf-py : fix double call to add_architecture() (ggerganov#8952)
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Commit 911b437
Commit ea0c828
Commit fc1c860
Commit ce0d1a6
Commit 6cad864
Commit fe39ecc
Commit bffbe1c
Commit 28d6a0f
Commit 4a87d1d
Commit 32b47f6
Commits on Aug 12, 2024
Commit 662d4c1
Commit a945b3c
Commit 89d378c
Commit 1ec79f0
Commit 1123376
Commit f30c5e1
Commit 47eb0a5
Commits on Aug 13, 2024
Commit 1ca3f06