GLM-4 better than GLM-Z1 for coding

#2
by AekDevDev - opened

I'm skeptical of the performance graph on the model card page. It shows that the model can compete with DeepSeek R1, but when I tested it with coding tasks, it failed most of the time. Even GLM-4 passed the tests, but this model (GLM-Z1) didn’t.

Could you share some of the tasks you tested?
