#15: Questions on FP8 inference, parallel requests, and context length with 4x H200s (opened about 4 hours ago by sultan93)
#14: Does its API support format? (opened about 9 hours ago by Connde)
#13: jinja2 chat template is malformed (opened about 20 hours ago by smcleod)
#12: Impressive Broad Knowledge (opened about 23 hours ago by phil111)
#9: Thinking tokens issue (opened 1 day ago by iyanello)
#8: Benchmarks for non-thinking mode (opened 2 days ago by PSM24)
#7: Thank you GLM Team for the wonderful MoE Model (opened 2 days ago by Narutoouz)
#6: When GGUF? (opened 3 days ago by ChuckMcSneed)
#4: AWQ 4-bit / GPTQ with full-precision gates and head? Please (opened 3 days ago by chriswritescode)
#1: We Have Gemini At Home (opened 3 days ago by MarinaraSpaghetti)