Effi-13B AWQ is a quantized version of our Effi-13B reasoning model.

About AWQ

AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization. Compared to GPTQ, it offers faster Transformers-based inference.
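To make "4-bit weight quantization" concrete, here is a minimal sketch of group-wise round-to-nearest quantization in plain Python. It is not the AWQ algorithm itself: AWQ additionally rescales weights using activation statistics before quantizing, and that step is omitted here. The function names `quantize_group` and `dequantize_group` are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float, int]:
    """Quantize one group of weights to unsigned integer codes.

    Plain asymmetric round-to-nearest quantization (illustrative only,
    without AWQ's activation-aware scaling).
    """
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    # Extend the range to include 0.0 so that zero is always representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = max(0, min(qmax, round(-lo / scale)))  # integer code that maps to 0.0
    codes = [max(0, min(qmax, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes: List[int], scale: float, zero: int) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(c - zero) * scale for c in codes]


weights = [-0.5, -0.1, 0.0, 0.25, 1.0]
codes, scale, zero = quantize_group(weights)
restored = dequantize_group(codes, scale, zero)
```

Each weight is stored as a 4-bit code, plus one scale and one zero point per group, which is where the memory saving over 16-bit weights comes from.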

AWQ is also supported by vLLM, a continuous-batching inference server, which allows AWQ models to be used for high-throughput concurrent inference in multi-user server scenarios.
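A minimal sketch of batched offline inference with vLLM might look like the following. It assumes vLLM is installed and a CUDA GPU is available. The model ID is taken from the citation URL in this card; the prompt wording in `build_prompts` and the sampling values are assumptions for illustration, not a prompt format documented by the authors.

```python
from typing import List

MODEL_ID = "aiplanet/effi-13B-AWQ"  # from the citation URL in this card


def build_prompts(questions: List[str]) -> List[str]:
    """Wrap raw questions in a plain instruction prompt (hypothetical format)."""
    return [f"Question: {q.strip()}\nAnswer with your reasoning:" for q in questions]


def generate_with_vllm(questions: List[str], max_tokens: int = 256) -> List[str]:
    """Generate one completion per question in a single batched call."""
    # Imported lazily so build_prompts stays usable without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, quantization="awq")
    # Illustrative sampling settings; tune them for your use case.
    params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=max_tokens)
    outputs = llm.generate(build_prompts(questions), params)
    return [out.outputs[0].text for out in outputs]
```

Calling `generate_with_vllm(["What is 17 * 3?"])` loads the model and returns a list with one generated string per question.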

effi-13B is a 13B-parameter causal decoder-only model built by AI Planet. It is based on Llama-2-13b-chat-hf and fine-tuned on 1.8 million conversations from the CoT dataset available on Hugging Face Datasets. The model is made available under the Apache 2.0 license.

Why use effi-13B-Instruct?

  • This is a ready-to-use chat/instruct model based on Llama-2-13b-chat-hf, which provides a rationale for the context provided.
  • Llama-2 is the best open-source model available. This is an instruct model, which may not be ideal for further fine-tuning. If you are interested in building your own instruct/chat model, we recommend starting from Llama-2-13b-chat-hf.
  • You will need at least 85-100GB of memory to run inference with effi-13b swiftly.

Our benchmarking

| Metric               | Value |
|----------------------|-------|
| Perplexity           | 5.529 |
| MMLU                 | 50.90 |
| HellaSwag (acc)      | 59.38 |
| HellaSwag (acc_norm) | 78.91 |
| TruthfulQA           | 38.24 |
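For readers unfamiliar with the perplexity figure, the sketch below shows the standard definition: the exponential of the mean negative log-likelihood per token. The card does not state which dataset or context length was used for the reported value, so this only illustrates the formula and does not reproduce the benchmark. The function name `perplexity` is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 8)
```

Lower is better: a perplexity of 4 means the model is, on average, as uncertain as a uniform choice among four tokens.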

Direct Use

effi-13b has been finetuned on a Chain of Thought dataset.

Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Bias, Risks, and Limitations

This model has been trained mostly on English data and will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

Recommendations

We recommend that users of effi-13b develop guardrails and take appropriate precautions for any production use.

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information is needed for further recommendations.

Citations

@misc {lucifertrj,
    author       = { {Tarun Jain} },
    title        = { Effi-13B-AWQ by AI Planet},
    year         = 2024,
    url          = { https://huggingface.co/aiplanet/effi-13B-AWQ/ },
    publisher    = { Hugging Face }
}