Commit History
ggml: fix compile error for RISC-V (llama/8623) 4eec44b
Mark Zhuang committed on
CUDA: MMQ code deduplication + iquant support (llama/8495) 6d14124
gguf : handle null name during init (llama/8587) 2f95156
ggml : fix quant dot product with odd number of blocks (llama/8549) 0083f96
ggml : add friendlier error message to fopen errors (llama/8575) ab5b4e0
CUDA: fix partial offloading for ne0 % 256 != 0 (llama/8572) afc137c
Add Ascend NPU backend (llama/6035) 3175a17
make/cmake: add missing force MMQ/cuBLAS for HIP (llama/8515) 5096c91
Refactor lora adapter support (llama/8332) 76bcfc6
add concat through dim 1/2 (llama/8483) acf23d9
Vulkan MMQ Fix (llama/8479) e2989d0
vulkan : cmake integration (llama/8119) a094e22
bandoti committed on
metal : template-ify some of the kernels (llama/8447) 3c3094f
ggml : minor naming changes (llama/8433) e0c6dff
ggml : add NVPL BLAS support (ggml/8329) (llama/8425) 4816a87
cuda : suppress 'noreturn' warn in no_device_code (llama/8414) 13c1163
CUDA: optimize and refactor MMQ (llama/8416) a3fe534
Use multi_ptr to clean up deprecated warnings (llama/8256) 6dbe297
AidanBeltonS committed on
ggml : move sgemm sources to llamafile subfolder (llama/8394) 1554348
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (llama/5780) 9509586
Dibakar Gope committed on
sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372) b969571
Alberto Cabrera Pérez committed on
sycl : fix powf call in device code (llama/8368) 011fbfd
Alberto Cabrera Pérez committed on
ggml : loop tiling optimizations for scalar path (ggml/898) 1c4b0ca
Mahesh Madhav committed on
ggml: add support for float16 input tensors in pooling operations (ggml/895) 8248d8e
Ivan Filipov (vanaka11) committed on
vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893) 8c409e3
Tony Wasserka committed on
whisper : use vulkan as gpu backend when available (#2302) 0755fa0
Matt Stephenson committed on
ggml : sync sycl (skip) (#0) bf6ccee
ggml : remove unnecessary UNUSED macro call (ggml/880) ab9a7d0
cmake : add GGML_BUILD and GGML_SHARED macro definitions (llama/8281) a8f9bda
Enabled more data types for oneMKL gemm_batch (llama/8236) 08501f8
Ouadie EL FAROUKI committed on
CUDA: MMQ support for iq4_nl, iq4_xs (llama/8278) 8411e3c
CUDA: revert part of the RDNA1 optimizations (llama/8309) fcd0c52
Daniele committed on
CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (llama/8311) 04d4209
Fix WARP_SIZE=16 bug of Intel GPU (llama/8266) 1ce11e2
rm get_work_group_size() by local cache for performance (llama/8286) 08fd758
Neo Zhang Jianyu (arthw) committed on
Define and optimize RDNA1 (llama/8085) 6aa5a89
Daniele committed on
fix typo (llama/8267) 0c9c7c8
Judd committed on
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (llama/8258) cc49462
cuda : update supports_op for matrix multiplication (llama/8245) 2314334
slaren committed on
Fix win build conflict of math library (llama/8230) 5a33963
Fix the sub group size of Intel (llama/8106) 2dd429e
CUDA: refactor and optimize IQ MMVQ (llama/8215) afa1447
Update SYCL-Rope op and Refactor (llama/8157) 06acee2
CUDA: fix MMQ stream-k for --split-mode row (llama/8167) ef3d018
feat: cuda implementation for `ggml_conv_transpose_1d` (ggml/854) 025493b
John Balis, slaren committed on
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) (llama/8140) e83fdad
slaren committed on