We evaluate LLM performance in Indonesian.
There are many ways to measure it, for example: BLEU, perplexity, human evaluation, and GPT-4 as judge.
However, we use perplexity because it is the fastest option for a large evaluation dataset; a sketch of the calculation is shown below.
If you know a faster or easier way to compute BLEU or any other metric, feel free to contribute to this repo.
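Below is a minimal sketch of how perplexity could be computed with the Hugging Face `transformers` library. The model id and the example sentence are placeholders, not the actual model or dataset used here; swap in your own checkpoint and evaluation texts.

```python
# Minimal perplexity sketch (assumptions: the model id below is a placeholder,
# and evaluation texts come from your own Indonesian dataset).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-indonesian-llm"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def perplexity(text: str) -> float:
    # Passing the input ids as labels makes the model return the mean
    # cross-entropy loss over the sequence.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    # Perplexity is the exponential of the mean cross-entropy.
    return torch.exp(loss).item()

print(perplexity("Ibu kota Indonesia adalah Jakarta."))
```

For a whole evaluation set, average the per-text losses (weighted by token count) before exponentiating rather than averaging the per-text perplexities.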
