sync Prepare pr of minicpm v2.6 #26

tc-mb · 2024-08-15T09:32:06Z

sync Prepare pr of minicpm v2.6

* add chatglm3-6b model support
  huggingface model: https://hf-mirror.com/THUDM/chatglm3-6b
  Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
* remove .rotary_pos_emb.inv_freq and unused code for chatglm3 model
  Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
* fix lint error
  Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
* optimize convert-hf-to-gguf.py for chatglm model
  Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
* support glm-4-9b-chat
  Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
* fix eos tokens to glm4
* remove unused log
* add preprocess to chatglm3 and chatglm4
* add eos_id_list to llama.cpp
* fix code style
* fix code style
* fix conflicts
* fix conflicts
* Revert "add eos_id_list to llama.cpp"
  This reverts commit 3a4d579.
* set <|endoftext|> as eos and <|user|> as eot
* fix chat template bug
* add comment to glm prefix and suffix
* fix conflicts and add rope_ratio & ChatGLMForConditionalGeneration
* fix chat template bug
* fix codestyle
* fix conflicts
* modified the general name of glm model
* fix conflicts
* remove prefix and suffix
* use normal glm4 chat template & use LLM_FFN_SWIGLU in phi3
* fix: resolve Flake8 errors in `convert-hf-to-gguf.py`
  - Fix E302 by adding two blank lines before top-level function definitions
  - Replace print statements to fix NP100
  - Fix E303 by ensuring only one blank line between lines of code
* fix rope ratio to solve incorrect answers
* fix by comments

---------

Signed-off-by: XingXing Qiao <qiaoxx@dingdao.com>
Co-authored-by: XingXing Qiao <qiaoxx@dingdao.com>
Co-authored-by: Umpire2018 <138990495+Umpire2018@users.noreply.github.com>

…gerganov#8048)

CLI to hash GGUF files to detect differences at a per-model and per-tensor level.

The hash types we support are:

- `--xxh64`: use xxHash 64-bit hash mode (default)
- `--sha1`: use sha1
- `--uuid`: use uuid
- `--sha256`: use sha256

While most POSIX systems already have hash-checking programs like sha256sum, they are designed to check entire files. This is not ideal for our purpose if we want to check the consistency of the tensor data even if the metadata content of the gguf KV store has been updated.

This program is designed to hash a gguf tensor payload on a 'per tensor layer' basis in addition to an 'entire tensor model' hash. The intent is that the entire tensor model can be checked first, but if any inconsistencies are detected, the per-tensor hash can be used to narrow down the specific tensor layer that has inconsistencies.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
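The two-level check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the CLI's implementation: the `hash_tensors` and `changed_tensors` helpers and the tensor names are hypothetical, the tensor payloads are stand-in byte strings rather than data read from a real GGUF file, and `sha256` is used because xxHash is not in the Python standard library.

```python
import hashlib


def hash_tensors(tensors, algorithm="sha256"):
    """Hash each tensor payload and the concatenation of all payloads.

    `tensors` maps a tensor name to its raw bytes (hypothetical input;
    a real tool would read these from the GGUF tensor section).
    """
    model_hasher = hashlib.new(algorithm)
    per_tensor = {}
    for name, payload in tensors.items():
        per_tensor[name] = hashlib.new(algorithm, payload).hexdigest()
        # Only tensor bytes feed the model hash, so KV metadata edits
        # do not change it.
        model_hasher.update(payload)
    return per_tensor, model_hasher.hexdigest()


def changed_tensors(old_hashes, new_hashes):
    """Return the names whose per-tensor hash differs between two runs."""
    return sorted(n for n in old_hashes if old_hashes[n] != new_hashes.get(n))


# Stand-in payloads: one tensor differs by a single byte.
reference = {
    "token_embd.weight": b"\x00\x01\x02\x03",
    "output.weight": b"\x10\x11\x12\x13",
}
modified = dict(reference)
modified["output.weight"] = b"\x10\x11\x12\xff"

ref_layers, ref_model = hash_tensors(reference)
mod_layers, mod_model = hash_tensors(modified)

if ref_model != mod_model:
    print("model hash differs; changed tensors:",
          changed_tensors(ref_layers, mod_layers))
```

Comparing the whole-model hash first keeps the common case cheap; the per-tensor table is only consulted when that comparison fails.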

* py : type-check all Python scripts with Pyright
* server-tests : use trailing slash in openai base_url
* server-tests : add more type annotations
* server-tests : strip "chat" from base_url in oai_chat_completions
* server-tests : model metadata is a dict
* ci : disable pip cache in type-check workflow
  The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo.
* py : fix new type errors from master branch
* tests : fix test-tokenizer-random.py
  Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser.
* ci : only show warnings and errors in python type-check
  The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.

Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and inserting the data directly. In rudimentary profiling with `chrono`, this block of code improves from ~500us/op to ~40us/op. Overall, this slightly improves sampling performance, which has a more substantial impact on the `examples/lookahead` implementation: I am able to see a ~10% performance boost in lookahead inference.
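The structural change can be illustrated with a rough Python analogue, where `list.append` stands in for `emplace_back` and index assignment into a presized list stands in for writing into a preallocated vector. The function names and the `(token_id, logit, probability)` tuple layout are assumptions made for this sketch, and Python lists do not share `std::vector`'s performance profile, so no timing is claimed here.

```python
def build_candidates_append(logits):
    """Grow the candidate list one element at a time (the old pattern)."""
    candidates = []
    for token_id, logit in enumerate(logits):
        candidates.append((token_id, logit, 0.0))
    return candidates


def build_candidates_prealloc(logits):
    """Size the list to the vocab once, then write each slot by index."""
    n_vocab = len(logits)
    candidates = [None] * n_vocab
    for token_id in range(n_vocab):
        candidates[token_id] = (token_id, logits[token_id], 0.0)
    return candidates


# Both variants produce the same candidates; only the fill strategy differs.
logits = [0.1, -1.5, 2.0, 0.0]
same = build_candidates_append(logits) == build_candidates_prealloc(logits)
print(same)
```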

* conv transpose 1d passing test for 1d input and kernel
* working for different input and output channel counts, added test for variable stride
* initial draft appears to work with stride other than 1
* working with all old and new conv1d tests
* added a test for large tensors
* removed use cuda hardcoding
* restored test-conv-transpose.c
* removed unused arguments, and fixed bug where test failure would cause subsequent tests to fail
* fixed accumulator bug
* added test to test-backend-ops
* fixed mistake
* addressed review
* fixed includes
* removed blank lines
* style and warning fixes
* return failure when test fails
* fix supports_op

---------

Co-authored-by: slaren <slarengh@gmail.com>
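For readers unfamiliar with the operation these commits implement, here is a minimal pure-Python reference for a 1D transposed convolution with multiple channels and a variable stride. It is a sketch of the textbook definition, not the ggml kernel: it assumes zero padding and no dilation, and the nested-list layout (`x[c_in][l_in]`, `w[c_in][c_out][k]`) is chosen for readability rather than to match ggml's tensor layout.

```python
def conv_transpose_1d(x, w, stride=1):
    """Transposed 1D convolution (no padding, no dilation).

    x: input,  indexed x[c_in][l_in]
    w: kernel, indexed w[c_in][c_out][k]
    Returns out[c_out][l_out] with l_out = (l_in - 1) * stride + k.
    """
    c_in, l_in = len(x), len(x[0])
    c_out, k = len(w[0]), len(w[0][0])
    l_out = (l_in - 1) * stride + k
    out = [[0.0] * l_out for _ in range(c_out)]
    for ci in range(c_in):
        for co in range(c_out):
            for i in range(l_in):
                for j in range(k):
                    # Each input element scatters a scaled copy of the
                    # kernel into the output, offset by i * stride.
                    out[co][i * stride + j] += x[ci][i] * w[ci][co][j]
    return out


# One channel, kernel [1, 1]: overlapping copies add up at stride 1
# and sit side by side at stride 2.
print(conv_transpose_1d([[1, 2, 3]], [[[1, 1]]], stride=1))
print(conv_transpose_1d([[1, 2, 3]], [[[1, 1]]], stride=2))
```

Because contributions from neighbouring input elements overlap whenever the stride is smaller than the kernel width, the output has to be accumulated rather than overwritten.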

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01)
  → 'github:hercules-ci/flake-parts/9227223f6d922fee3c7b190b2cc238a99527bbb7?narHash=sha256-pQMhCCHyQGRzdfAkdJ4cIWiw%2BJNuWsTX7f0ZYSyz0VY%3D' (2024-07-03)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01)
  → 'https://github.com/NixOS/nixpkgs/archive/5daf0514482af3f97abaefc78a6606365c9108e2.tar.gz?narHash=sha256-Fm2rDDs86sHy0/1jxTOKB1118Q0O3Uc7EC0iXvXKpbI%3D' (2024-07-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/b2852eb9365c6de48ffb0dc2c9562591f652242a?narHash=sha256-C8e9S7RzshSdHB7L%2Bv9I51af1gDM5unhJ2xO1ywxNH8%3D' (2024-06-27)
  → 'github:NixOS/nixpkgs/9f4128e00b0ae8ec65918efeba59db998750ead6?narHash=sha256-rwz8NJZV%2B387rnWpTYcXaRNvzUSnnF9aHONoJIYmiUQ%3D' (2024-07-03)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…rganov#8283)

* Adding a simple program to provide a deprecation warning that can exist to help people notice the binary name change from ggerganov#7809 and migrate to the new filenames.
* Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.
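The idea of a stub that stands in for a renamed binary can be sketched as follows. This is an illustrative Python version, not the program added by the commit: the `RENAMES` table, the `llama-` prefix fallback, and the message wording are assumptions made for the example.

```python
import os
import sys

# Hypothetical rename table for illustration; only names that do not
# follow the simple prefix rule need an explicit entry.
RENAMES = {
    "main": "llama-cli",
    "server": "llama-server",
}


def deprecation_message(argv0):
    """Build a warning that points from the old binary name to the new one."""
    old_name = os.path.splitext(os.path.basename(argv0))[0]
    # Assumed fallback: most tools simply gained a "llama-" prefix.
    new_name = RENAMES.get(old_name, "llama-" + old_name)
    return (
        f"WARNING: The binary '{old_name}' is deprecated.\n"
        f"Please use '{new_name}' instead."
    )


if __name__ == "__main__":
    # The stub is installed under the old name, prints the hint, and
    # exits with a non-zero status so that scripts notice the change.
    print(deprecation_message("./finetune"), file=sys.stderr)
```

Deriving the old name from `argv[0]` lets one stub serve every legacy name it is installed under.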

standby24x7 and others added 30 commits July 7, 2024 13:38

finetune: Rename command name in README.md (ggerganov#8343)

b81ba1f

Rename an old command name "finetune" to "llama-finetune" in README.md Signed-off-by: Masanari Iida <standby24x7@gmail.com>

py : use cpu-only torch in requirements.txt (ggerganov#8335)

d39130a

llama : fix n_rot default (ggerganov#8348)

b504008

ggml-ci

readme : update bindings list (ggerganov#8222)

f1948f1

* adding guile_llama_cpp to binding list
* fix formatting
* fix formatting

ci : add checks for cmake,make and ctest in ci/run.sh (ggerganov#8200)

4090ea5

* Added checks for cmake, make and ctest
* Removed erroneous whitespace

Update llama-cli documentation (ggerganov#8315)

a8db2a9

* Update README.md
* Update README.md
* Update README.md
  fixed llama-cli/main, templates on some cmds added chat template sections and fixed typos in some areas
* Update README.md
* Update README.md
* Update README.md

readme : add supported glm models (ggerganov#8360)

04ce3a8

common : avoid unnecessary logits fetch (ggerganov#8358)

ffd0079

infill : assert prefix/suffix tokens + remove old space logic (ggerga…

6f0dbf6

…nov#8351)

tests : fix whitespace (#0)

6847d54

sync : ggml

2ee44c9

ggml-ci

scripts : fix sync for sycl

3f2d538

sycl : fix powf call in device code (ggerganov#8368)

2ec846d

readme : fix web link error [no ci] (ggerganov#8347)

c4dd11d

labeler : updated sycl to match docs and code refactor (ggerganov#8373)

a130ecc

gguf-py : do not use internal numpy types (ggerganov#7472)

7d0e23d

readme : fix typo [no ci] (ggerganov#8389)

9beb2dd

Bakus-Naur --> Backus-Naur

cmake : allow external ggml (ggerganov#8370)

9925ca4

sycl : Reenabled mmvq path for the SYCL Nvidia Backend (ggerganov#8372)

5b0b8d8

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend
* Reduced verbosity of comment

make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (ggerganov#8392)

a03e8dd

Update README.md to fix broken link to docs (ggerganov#8399)

fd560fe

Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'

Server: Enable setting default sampling parameters via command-line (g…

a59f8fd

…gerganov#8402)

* Load server sampling parameters from the server context by default.
* Wordsmithing comment

py : fix extra space in convert_hf_to_gguf.py (ggerganov#8407)

8f0fad4

tc-mb added 12 commits August 10, 2024 18:19

add resampler of v2.6

bffbe1c

modify clip

28d6a0f

modify readme

4a87d1d

fix type-check

32b47f6

fix type-check

662d4c1

fix type-check

a945b3c

fix type-check

89d378c

modify convert script and readme

1ec79f0

fix convert script and readme

1123376

fix convert

f30c5e1

fix num in convert

47eb0a5

fix type-check

1ca3f06

github-actions bot added the documentation, examples, SYCL, Nvidia GPU, Vulkan, testing, build, devops, python, android, server, ggml, Kompute, Apple Metal, script, and nix labels Aug 15, 2024

tc-mb merged commit f23b44b into minicpmv-main-dev Aug 15, 2024
3 of 6 checks passed

tc-mb deleted the prepare-PR-of-minicpm-v2.6 branch August 20, 2024 11:09
