Commit History
llamafile : ppc64le MMA implementation for Q4_0. (llama/12489) d154905
amritahs-ibm committed on
SYCL: implement memset ggml backend buffer interface (llama/12580) 3f95f2b
HIP: Add support for RDNA4 targets (llama/12372) a73f01f
Slobodan Josic committed on
metal : refactor mat-vec code (llama/12569) 71d72f9
ggml : fix MUL_MAT_ID repack with Q8_K (llama/12544) a13f78c
ggml-cpu : update KleidiAI to v1.5.0 (llama/12568) 9b4460a
Dan Johansson committed on
SYCL: disable Q4_0 reorder optimization (llama/12560) 33f8316
opencl: simplify kernel embedding logic in cmakefile (llama/12503) 5f131ac
lhez Max Krasnyansky committed on
CUDA: Fix clang warnings (llama/12540) efa6dac
R0CKSTAR committed on
vulkan: fix mul_mat_vec failure in backend tests (llama/12529) 09dd86a
ggml : fix quantized cpy op (llama/12310) 608b377
musa: refine compute capability (llama/12493) 5e508d2
R0CKSTAR committed on
vulkan: Optimize mul_mat_vec p021 and nc shaders (llama/12505) 6868981
Vulkan: RTE rounding for cpy to quant (llama/12480) 8707beb
vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (llama/12472) 417a5d6
Eve committed on
Fix build on Windows when ccache enabled (ggml/9954) (llama/9976) bbd0292
蕭澧邦 Romain Biessy committed on
sycl: cleanup oneDNN related code (llama/12097) 959346b
Svetlozar Georgiev committed on
ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (llama/12332) 0729506
Srihari-mcw committed on
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183) 3a7ca19
vulkan: optimize iq1 coopmat2 dequant functions (llama/12427) 53dd8ad
Fix visionOS build and add CI (llama/12415) ecb4322
vulkan: Submit once enough matmul work has been recorded (llama/12406) ec77b2c
opencl: improve profiling (llama/12442) 4abe3ae
lhez committed on
musa: override warp_size of musa device to 32 (llama/12445) 184c152
R0CKSTAR committed on
SYCL: using graphs is configurable by environment variable and compile option (llama/12371) c18969f
Łukasz Ślusarczyk Romain Biessy committed on
ggml : add SVE support for q6_K_q8_K (llama/12361) 607a196
fj-y-saito committed on
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (llama/12434) 55088d3
fixed compilation warnings in ggml-sycl (llama/12424) 77ff985
Łukasz Ślusarczyk committed on
llama: Add support for RWKV v7 architecture (llama/12412) 727de7e
cuda : enable CUDA Graph on CUDA Toolkit < 12.x (llama/12394) 1e69b8c
Gaurav Garg committed on
ggml-vulkan: remove unused find_program(glslc) (llama/12416) 40652de
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (llama/12312) c9f86c1
vulkan: subgroup size tuning (llama/12087) af63c3d
vulkan: use fp32 in coopmat2 q4_k dequant function (llama/12309) 9ca84c6
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (llama/12273) 5d51f1c
vulkan: Adjust coopmat2 tile sizes and selection heuristic (llama/12258) 3cc6539
cmake : enable building llama.cpp using system libggml (llama/12321) 6da01d6
Christian Kastner committed on
SYCL: set extras only on GGML_TYPE_Q4_0 (llama/12366) 6f03947
SYCL: Delete redundant plus sign and space (llama/12391) 5b32141
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (llama/12399) 2d7a940
MUL_MAT optimization (llama/12382) 9dd08d5
Chenguang Li committed on
sycl : variable sg_size support for mmvq kernels (llama/12336) 83e6f74
Alberto Cabrera Pérez committed on
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (llama/12315) 2adc060
uvos committed on
vulkan: fix bug in coopmat1 mul_mat_id (llama/12316) 1e50161
CUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows per block between host and device code. (llama/12177) 1f75790
ggml-backend : fix backend search path (llama/12330) 66d5b20
jklincn committed on
metal : Cache the Metal library at the device context level (llama/12265) e3908a2
BB-fat committed on
mat vec double buffer (llama/12188) 7274b04
Eve committed on
musa: support new arch mp_31 and update doc (llama/12296) 9aca77e
R0CKSTAR committed on