The released final ckpt and stage2-ingredient1-step23852-tokens51B ckpt have different eval results
As mentioned in #1, the released final checkpoint corresponds to ingredient 1, stage2-ingredient1-step23852-tokens51B. I used lm-evaluation-harness to evaluate both allenai/OLMo-2-0425-1B and stage2-ingredient1-step23852-tokens51B, and they give different results on MMLU and gsm8k.
Can you please clarify why the released ckpt has lower evaluation results? Thanks.
MMLU:
released final:
| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.4257|± |0.0041|
| - humanities | 2|none | |acc |↑ |0.3947|± |0.0069|
| - other | 2|none | |acc |↑ |0.4870|± |0.0088|
| - social sciences| 2|none | |acc |↑ |0.4807|± |0.0089|
| - stem | 2|none | |acc |↑ |0.3578|± |0.0084|
stage2-ingredient1-step23852-tokens51B:
| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.4417|± |0.0041|
| - humanities | 2|none | |acc |↑ |0.4136|± |0.0069|
| - other | 2|none | |acc |↑ |0.4957|± |0.0088|
| - social sciences| 2|none | |acc |↑ |0.5018|± |0.0088|
| - stem | 2|none | |acc |↑ |0.3717|± |0.0085|
gsm8k:
released final:
hf (pretrained=allenai/OLMo-2-0425-1B,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 4, batch_size: auto
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot| 3|flexible-extract| 4|exact_match|↑ |0.4079|± |0.0135|
| | |strict-match | 4|exact_match|↑ |0.4003|± |0.0135|
stage2-ingredient1-step23852-tokens51B:
hf (pretrained=allenai/OLMo-2-0425-1B,revision=stage2-ingredient1-step23852-tokens51B,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 4, batch_size: auto
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot| 3|flexible-extract| 4|exact_match|↑ |0.4594|± |0.0137|
| | |strict-match | 4|exact_match|↑ |0.4223|± |0.0136|
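I used the same evaluation setting for both runs; the revision argument shown in parentheses is added only for the ingredient checkpoint: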
lm_eval --model hf \
--model_args pretrained=allenai/OLMo-2-0425-1B(,revision=stage2-ingredient1-step23852-tokens51B) \
--tasks gsm8k_cot \
--batch_size auto \
--num_fewshot 4 \
--trust_remote_code \
--confirm_run_unsafe_code
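For anyone reproducing this, the two invocations can be written out explicitly. The following is a minimal sketch that only assembles the command lines from the flags quoted above; the helper name `build_lm_eval_command` is made up for illustration and is not part of lm-evaluation-harness.

```python
import shlex

MODEL = "allenai/OLMo-2-0425-1B"


def build_lm_eval_command(revision=None, task="gsm8k_cot", num_fewshot=4):
    """Return the lm_eval argument list for the main branch or a named revision.

    Hypothetical helper: it only mirrors the flags quoted in this post.
    """
    model_args = f"pretrained={MODEL}"
    if revision is not None:
        # The revision is appended to --model_args only for non-main checkpoints.
        model_args += f",revision={revision}"
    return [
        "lm_eval", "--model", "hf",
        "--model_args", model_args,
        "--tasks", task,
        "--batch_size", "auto",
        "--num_fewshot", str(num_fewshot),
        "--trust_remote_code",
        "--confirm_run_unsafe_code",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per checkpoint being compared.
    for rev in (None, "stage2-ingredient1-step23852-tokens51B"):
        print(shlex.join(build_lm_eval_command(rev)))
```

Each returned list can be passed to `subprocess.run` as-is, or printed and pasted into a shell.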
Also, the description in allenai/OLMo claims that the released main ckpt is merged from a soup, which differs from the description on the HF model page and in #1.
Hey @wydwww, thanks for raising this issue. I have cross-checked this with the team again.
- There is no model souping (there was a typo in the README on the GitHub OLMo repo; I fixed it).
- My comment in #1 was wrong. Ingredient 3 is seed 42, and it is the final main checkpoint. Ingredients 1 and 2 are not; they are just exploratory anneals. I addressed this in #1.
- To clear things up, I have updated the README.
Sorry for the inconvenience. You can retry the evals.
Thanks for your reply, @amanrangapur. I ran the gsm8k eval of stage2-ingredient3-step23852-tokens51B with the same command and still got a significantly higher result (0.4549) than the main ckpt (0.4079). FYI, the ingredient 2 ckpt scores 0.4556 in this setting. Did you use any post-processing to get the final ckpt?
hf (pretrained=allenai/OLMo-2-0425-1B,revision=stage2-ingredient3-step23852-tokens51B,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 4, batch_size: auto
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot| 3|flexible-extract| 4|exact_match|↑ |0.4549|± |0.0137|
| | |strict-match | 4|exact_match|↑ |0.4511|± |0.0137|
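To put a rough number on "significantly higher", here is a minimal sketch that turns the flexible-extract scores and standard errors reported above into a z statistic. It assumes the two estimates are independent, which is only an approximation here because both checkpoints were scored on the same test items; a paired test on per-sample outputs would be the more careful check.

```python
import math


def z_score(acc_a, se_a, acc_b, se_b):
    """z statistic for the difference of two accuracies, treated as independent."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided p-value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# gsm8k_cot flexible-extract: ingredient 3 (0.4549 ± 0.0137) vs main (0.4079 ± 0.0135)
z = z_score(0.4549, 0.0137, 0.4079, 0.0135)
p = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under that independence assumption the gap is roughly 2.4 combined standard errors, so it is unlikely to be evaluation noise alone.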
Hey @wydwww, we did not use any post-processing on the final checkpoint. We selected one of the ingredients (anneals) based on average eval scores.
@amanrangapur It seems that the final ckpt does not match any of the 3 ingredient ckpts. Do you have any thoughts on this? Can you please verify that the main and stage2-ingredient3-step23852-tokens51B ckpts are the same in your setting? Thanks.
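One way to check this independently of any eval harness is to compare the weight files of the two revisions directly. The sketch below assumes each revision stores its weights in a single file named `model.safetensors` (an assumption, not confirmed in this thread) and uses `hf_hub_download` from `huggingface_hub`; the helper names are made up for illustration.

```python
import hashlib

REPO_ID = "allenai/OLMo-2-0425-1B"
WEIGHTS_FILE = "model.safetensors"  # assumption: one weights file with this name


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_weights(revision_a, revision_b):
    """Download the weights file of both revisions and compare their hashes."""
    # Imported lazily so the hashing helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path_a = hf_hub_download(REPO_ID, WEIGHTS_FILE, revision=revision_a)
    path_b = hf_hub_download(REPO_ID, WEIGHTS_FILE, revision=revision_b)
    return sha256_of_file(path_a) == sha256_of_file(path_b)


if __name__ == "__main__":
    print(same_weights("main", "stage2-ingredient3-step23852-tokens51B"))
```

Matching hashes would show the files are byte-identical. Differing hashes are weaker evidence, since the same weights saved with a different dtype or metadata also hash differently; in that case a tensor-by-tensor comparison of the loaded state dicts would be the next step.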