server: Add "tokens per second" information in the backend (#10548) #38
build.yml
on: push
Job | Duration
---|---
Matrix: windows-2019-cmake-cuda |
Matrix: windows-latest-cmake-hip-release |
Matrix: windows-latest-cmake |
macOS-latest-cmake-arm64 | 12m 50s
macOS-latest-cmake-x64 | 3m 50s
ubuntu-focal-make | 3m 20s
ubuntu-latest-cmake | 2m 42s
macOS-latest-make | 2m 4s
macOS-latest-cmake | 11m 43s
ubuntu-focal-make-curl | 2m 39s
ubuntu-latest-cmake-rpc | 2m 16s
ubuntu-22-cmake-vulkan | 2m 35s
ubuntu-22-cmake-hip | 18m 40s
ubuntu-22-cmake-musa | 11m 57s
ubuntu-22-cmake-sycl | 4m 46s
ubuntu-22-cmake-sycl-fp16 | 4m 36s
macOS-latest-cmake-ios | 1m 5s
macOS-latest-cmake-tvos | 1m 23s
ubuntu-latest-cmake-cuda | 11m 35s
windows-latest-cmake-sycl | 9m 46s
windows-latest-cmake-hip | 32m 43s
ios-xcode-build | 50s
android-build | 6m 25s
Matrix: macOS-latest-swift |
Matrix: ubuntu-latest-cmake-sanitizer |
Matrix: windows-msys2 |
release | 1m 11s
Annotations
1 error and 11 warnings
Artifacts
Produced during runtime
Name | Size
---|---
cudart-llama-bin-win-cu11.7-x64.zip | 303 MB
cudart-llama-bin-win-cu12.4-x64.zip | 372 MB
llama-bin-macos-arm64.zip | 51.9 MB
llama-bin-macos-x64.zip | 53.5 MB
llama-bin-ubuntu-x64.zip | 58.7 MB
llama-bin-win-avx-x64.zip | 8.51 MB
llama-bin-win-avx2-x64.zip | 8.52 MB
llama-bin-win-avx512-x64.zip | 8.53 MB
llama-bin-win-cu11.7-x64.zip | 145 MB
llama-bin-win-cu12.4-x64.zip | 145 MB
llama-bin-win-hip-x64-gfx1030.zip | 228 MB
llama-bin-win-hip-x64-gfx1100.zip | 230 MB
llama-bin-win-hip-x64-gfx1101.zip | 230 MB
llama-bin-win-kompute-x64.zip | 8.81 MB
llama-bin-win-llvm-arm64.zip | 10.1 MB
llama-bin-win-msvc-arm64.zip | 12.8 MB
llama-bin-win-noavx-x64.zip | 8.49 MB
llama-bin-win-openblas-x64.zip | 19.5 MB
llama-bin-win-sycl-x64.zip | 89.2 MB
llama-bin-win-vulkan-x64.zip | 9.27 MB