Instruction Tuning Llama 3
This repo uses torchtune to instruction-tune the pretrained Llama 3 8B model on mathematical tasks using LoRA.
Wandb report link
https://wandb.ai/som/torchtune_llama3?nw=nwusersom
Instruction-tuned model
https://huggingface.co/Someshfengde/llama-3-instruction-tuned-AIMO
Original Meta Llama 3 model
https://huggingface.co/meta-llama/Meta-Llama-3-8B
To run this project
> pip install poetry
> poetry install
Further commands to run in a shell terminal
To download the model
tune download meta-llama/Meta-Llama-3-8B \
--output-dir llama3-8b-hf \
--hf-token <HF_TOKEN>
To start instruction tuning with LoRA and torchtune
tune run lora_finetune_single_device --config ./lora_finetune_single_device.yaml
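To give a sense of why LoRA makes single-device fine-tuning feasible, the sketch below counts the trainable parameters for one adapted weight matrix. The 4096 x 4096 shape corresponds to a query projection in Llama 3 8B. The rank of 8 is an assumption for illustration only: the actual rank is set in lora_finetune_single_device.yaml, which is not reproduced here. The function name is hypothetical.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Count LoRA parameters for one d_out x d_in weight matrix.

    LoRA freezes the original weight and learns a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in.
    """
    return rank * (d_out + d_in)


d_out = d_in = 4096   # query projection shape in Llama 3 8B
rank = 8              # ASSUMPTION: illustrative value, not read from the config

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters ({lora / full:.2%} of full)")
```

The saving grows with the matrix size, since the LoRA count scales with d_out + d_in rather than d_out * d_in.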
To quantize the model
tune run quantize --config ./quantization_config.yaml
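The quantization scheme used by this project is defined in quantization_config.yaml, which is not shown here. As a generic illustration of what weight quantization does, and not a description of the recipe's actual method, the following sketch applies symmetric per-tensor int8 quantization to a short list of weights. Both function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.8, -1.0, 0.3, 0.0]   # illustrative values, not model weights
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)

for w, q, r in zip(weights, quantized, restored):
    print(f"{w:+.4f} -> {q:+4d} -> {r:+.4f}")
```

Each restored weight differs from the original by at most half the scale, which is the accuracy cost paid for the smaller storage footprint.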
To run inference with the model
tune run generate --config ./generation_config.yaml \
prompt="what is 2 + 2."
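The generate command above passes the prompt as a key=value override after the config path. When trying several prompts from a script, the same command line can be assembled in Python. This is a minimal sketch: build_tune_command is a hypothetical helper, not part of torchtune, and it only reproduces the argument layout shown above.

```python
import shlex


def build_tune_command(recipe, config, **overrides):
    """Assemble a `tune run` argument list with key=value config overrides."""
    cmd = ["tune", "run", recipe, "--config", config]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = build_tune_command(
    "generate",
    "./generation_config.yaml",
    prompt="what is 2 + 2.",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

To actually execute it, the list can be handed to subprocess.run(cmd, check=True), which requires torchtune to be installed and the config file to exist.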
Dataset used
https://huggingface.co/datasets/Someshfengde/AIMO_dataset
Evaluations
To run evaluations
tune run eleuther_eval --config ./eval_config.yaml
TruthfulQA: 0.42
MMLU Abstract Algebra: 0.35
MATHQA: 0.33
Agieval_sat_math: 0.31



