# Model Card for dynamic-qwen-vpr-gamma0.5
This model is a dynamic-computation variant of `outputs/PRETRAIN-TEST-qwen2.5-0.5B-vpr-pretrain_mix-2025-08-28_13-29-10-gamma=0.5/final_model`, fine-tuned using the VPR architecture.
- **Dynamic Architecture:** VPR
- **Capacity Gamma (γ):** 0.5
The VPR architecture lets the model conditionally skip parts of its computation, aiming for improved efficiency. The `capacity_gamma` parameter controls the fraction of tokens processed by the dynamic components; a routing sketch is given below.
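The exact VPR routing logic lives in this repository's custom modeling code; the following is only a minimal, hypothetical sketch of capacity-based token selection, where a learned router score picks the top ⌈γ·T⌉ tokens for the dynamic path. Function and tensor names here are illustrative assumptions, not taken from the repository.

```python
import math
import torch

def route_tokens(hidden_states, router_scores, capacity_gamma=0.5):
    """Pick the top ceil(gamma * seq_len) tokens per sequence for the dynamic path.

    hidden_states: (batch, seq_len, hidden_dim)
    router_scores: (batch, seq_len), per-token scores from a learned router
    Returns a boolean mask marking the tokens routed to the dynamic component.
    """
    batch, seq_len, _ = hidden_states.shape
    k = max(1, math.ceil(capacity_gamma * seq_len))        # capacity budget per sequence
    topk_idx = router_scores.topk(k, dim=-1).indices       # (batch, k) highest-scoring tokens
    mask = torch.zeros(batch, seq_len, dtype=torch.bool, device=hidden_states.device)
    mask[torch.arange(batch).unsqueeze(-1), topk_idx] = True
    return mask

# Toy check: with gamma=0.5 and 8 tokens, 4 tokens per sequence take the dynamic path.
h = torch.randn(2, 8, 16)
scores = torch.randn(2, 8)
print(route_tokens(h, scores, capacity_gamma=0.5).sum(dim=-1))  # tensor([4, 4])
```

The remaining tokens would simply bypass the dynamic component, which is how a capacity parameter of 0.5 translates into roughly half the per-layer dynamic compute.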
## How to Use
This model requires `trust_remote_code=True` to load the custom architecture.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# torch_dtype="auto" keeps the checkpoint's trained precision (e.g. bfloat16) for efficiency
model = AutoModelForCausalLM.from_pretrained(
    "fredericowieser/dynamic-qwen-vpr-gamma0.5",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("fredericowieser/dynamic-qwen-vpr-gamma0.5")

# Example usage
prompt = "The capital of the United Kingdom is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
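To confirm the routing budget after loading, the value should be recorded in the custom config. The attribute name `capacity_gamma` below is an assumption based on this card, so the lookup falls back gracefully if the custom config exposes it differently.

```python
# "capacity_gamma" is assumed from this card; the custom config may name it differently
print(getattr(model.config, "capacity_gamma", "not exposed under this attribute name"))
```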
## Evaluation
Results on standard benchmarks:
| Task | Metric | Value |
|---|---|---|
| arc_challenge | acc,none | 0.2244 |
| arc_challenge | acc_norm,none | 0.2585 |
| arc_challenge | acc_norm_stderr,none | 0.0128 |
| arc_challenge | acc_stderr,none | 0.0122 |
| hellaswag | acc,none | 0.3020 |
| hellaswag | acc_norm,none | 0.3328 |
| hellaswag | acc_norm_stderr,none | 0.0047 |
| hellaswag | acc_stderr,none | 0.0046 |
| mmlu | acc,none | 0.2441 |
| mmlu | acc_stderr,none | 0.0036 |
| mmlu_abstract_algebra | acc,none | 0.2500 |
| mmlu_abstract_algebra | acc_stderr,none | 0.0435 |
| mmlu_anatomy | acc,none | 0.3111 |
| mmlu_anatomy | acc_stderr,none | 0.0400 |
| mmlu_astronomy | acc,none | 0.2303 |
| mmlu_astronomy | acc_stderr,none | 0.0343 |
| mmlu_business_ethics | acc,none | 0.2200 |
| mmlu_business_ethics | acc_stderr,none | 0.0416 |
| mmlu_clinical_knowledge | acc,none | 0.2113 |
| mmlu_clinical_knowledge | acc_stderr,none | 0.0251 |
| mmlu_college_biology | acc,none | 0.2569 |
| mmlu_college_biology | acc_stderr,none | 0.0365 |
| mmlu_college_chemistry | acc,none | 0.2500 |
| mmlu_college_chemistry | acc_stderr,none | 0.0435 |
| mmlu_college_computer_science | acc,none | 0.2300 |
| mmlu_college_computer_science | acc_stderr,none | 0.0423 |
| mmlu_college_mathematics | acc,none | 0.2800 |
| mmlu_college_mathematics | acc_stderr,none | 0.0451 |
| mmlu_college_medicine | acc,none | 0.2717 |
| mmlu_college_medicine | acc_stderr,none | 0.0339 |
| mmlu_college_physics | acc,none | 0.2745 |
| mmlu_college_physics | acc_stderr,none | 0.0444 |
| mmlu_computer_security | acc,none | 0.2300 |
| mmlu_computer_security | acc_stderr,none | 0.0423 |
| mmlu_conceptual_physics | acc,none | 0.2468 |
| mmlu_conceptual_physics | acc_stderr,none | 0.0282 |
| mmlu_econometrics | acc,none | 0.2193 |
| mmlu_econometrics | acc_stderr,none | 0.0389 |
| mmlu_electrical_engineering | acc,none | 0.2345 |
| mmlu_electrical_engineering | acc_stderr,none | 0.0353 |
| mmlu_elementary_mathematics | acc,none | 0.2540 |
| mmlu_elementary_mathematics | acc_stderr,none | 0.0224 |
| mmlu_formal_logic | acc,none | 0.1905 |
| mmlu_formal_logic | acc_stderr,none | 0.0351 |
| mmlu_global_facts | acc,none | 0.2100 |
| mmlu_global_facts | acc_stderr,none | 0.0409 |
| mmlu_high_school_biology | acc,none | 0.2387 |
| mmlu_high_school_biology | acc_stderr,none | 0.0243 |
| mmlu_high_school_chemistry | acc,none | 0.2315 |
| mmlu_high_school_chemistry | acc_stderr,none | 0.0297 |
| mmlu_high_school_computer_science | acc,none | 0.1700 |
| mmlu_high_school_computer_science | acc_stderr,none | 0.0378 |
| mmlu_high_school_european_history | acc,none | 0.2121 |
| mmlu_high_school_european_history | acc_stderr,none | 0.0319 |
| mmlu_high_school_geography | acc,none | 0.1869 |
| mmlu_high_school_geography | acc_stderr,none | 0.0278 |
| mmlu_high_school_government_and_politics | acc,none | 0.2798 |
| mmlu_high_school_government_and_politics | acc_stderr,none | 0.0324 |
| mmlu_high_school_macroeconomics | acc,none | 0.2282 |
| mmlu_high_school_macroeconomics | acc_stderr,none | 0.0213 |
| mmlu_high_school_mathematics | acc,none | 0.2333 |
| mmlu_high_school_mathematics | acc_stderr,none | 0.0258 |
| mmlu_high_school_microeconomics | acc,none | 0.2479 |
| mmlu_high_school_microeconomics | acc_stderr,none | 0.0280 |
| mmlu_high_school_physics | acc,none | 0.2450 |
| mmlu_high_school_physics | acc_stderr,none | 0.0351 |
| mmlu_high_school_psychology | acc,none | 0.1963 |
| mmlu_high_school_psychology | acc_stderr,none | 0.0170 |
| mmlu_high_school_statistics | acc,none | 0.2361 |
| mmlu_high_school_statistics | acc_stderr,none | 0.0290 |
| mmlu_high_school_us_history | acc,none | 0.2451 |
| mmlu_high_school_us_history | acc_stderr,none | 0.0302 |
| mmlu_high_school_world_history | acc,none | 0.2658 |
| mmlu_high_school_world_history | acc_stderr,none | 0.0288 |
| mmlu_human_aging | acc,none | 0.3318 |
| mmlu_human_aging | acc_stderr,none | 0.0316 |
| mmlu_human_sexuality | acc,none | 0.2366 |
| mmlu_human_sexuality | acc_stderr,none | 0.0373 |
| mmlu_humanities | acc,none | 0.2440 |
| mmlu_humanities | acc_stderr,none | 0.0063 |
| mmlu_international_law | acc,none | 0.2562 |
| mmlu_international_law | acc_stderr,none | 0.0398 |
| mmlu_jurisprudence | acc,none | 0.1852 |
| mmlu_jurisprudence | acc_stderr,none | 0.0376 |
| mmlu_logical_fallacies | acc,none | 0.3067 |
| mmlu_logical_fallacies | acc_stderr,none | 0.0362 |
| mmlu_machine_learning | acc,none | 0.2946 |
| mmlu_machine_learning | acc_stderr,none | 0.0433 |
| mmlu_management | acc,none | 0.2136 |
| mmlu_management | acc_stderr,none | 0.0406 |
| mmlu_marketing | acc,none | 0.2479 |
| mmlu_marketing | acc_stderr,none | 0.0283 |
| mmlu_medical_genetics | acc,none | 0.3100 |
| mmlu_medical_genetics | acc_stderr,none | 0.0465 |
| mmlu_miscellaneous | acc,none | 0.2708 |
| mmlu_miscellaneous | acc_stderr,none | 0.0159 |
| mmlu_moral_disputes | acc,none | 0.2457 |
| mmlu_moral_disputes | acc_stderr,none | 0.0232 |
| mmlu_moral_scenarios | acc,none | 0.2212 |
| mmlu_moral_scenarios | acc_stderr,none | 0.0139 |
| mmlu_nutrition | acc,none | 0.2451 |
| mmlu_nutrition | acc_stderr,none | 0.0246 |
| mmlu_other | acc,none | 0.2546 |
| mmlu_other | acc_stderr,none | 0.0078 |
| mmlu_philosophy | acc,none | 0.2797 |
| mmlu_philosophy | acc_stderr,none | 0.0255 |
| mmlu_prehistory | acc,none | 0.2284 |
| mmlu_prehistory | acc_stderr,none | 0.0234 |
| mmlu_professional_accounting | acc,none | 0.2305 |
| mmlu_professional_accounting | acc_stderr,none | 0.0251 |
| mmlu_professional_law | acc,none | 0.2458 |
| mmlu_professional_law | acc_stderr,none | 0.0110 |
| mmlu_professional_medicine | acc,none | 0.2279 |
| mmlu_professional_medicine | acc_stderr,none | 0.0255 |
| mmlu_professional_psychology | acc,none | 0.2631 |
| mmlu_professional_psychology | acc_stderr,none | 0.0178 |
| mmlu_public_relations | acc,none | 0.2000 |
| mmlu_public_relations | acc_stderr,none | 0.0383 |
| mmlu_security_studies | acc,none | 0.2163 |
| mmlu_security_studies | acc_stderr,none | 0.0264 |
| mmlu_social_sciences | acc,none | 0.2314 |
| mmlu_social_sciences | acc_stderr,none | 0.0076 |
| mmlu_sociology | acc,none | 0.2239 |
| mmlu_sociology | acc_stderr,none | 0.0295 |
| mmlu_stem | acc,none | 0.2461 |
| mmlu_stem | acc_stderr,none | 0.0077 |
| mmlu_us_foreign_policy | acc,none | 0.2900 |
| mmlu_us_foreign_policy | acc_stderr,none | 0.0456 |
| mmlu_virology | acc,none | 0.2771 |
| mmlu_virology | acc_stderr,none | 0.0348 |
| mmlu_world_religions | acc,none | 0.3158 |
| mmlu_world_religions | acc_stderr,none | 0.0357 |
| truthfulqa_mc2 | acc,none | 0.4447 |
| truthfulqa_mc2 | acc_stderr,none | 0.0155 |
| winogrande | acc,none | 0.5067 |
| winogrande | acc_stderr,none | 0.0141 |