Model Card for dynamic-qwen-vpr-gamma0.5

This model is a dynamic-compute variant of the checkpoint outputs/PRETRAIN-TEST-qwen2.5-0.5B-vpr-pretrain_mix-2025-08-28_13-29-10-gamma=0.5/final_model, fine-tuned with the VPR architecture.

  • Dynamic Architecture: VPR
  • Capacity Gamma (γ): 0.5

The VPR architecture lets the model conditionally skip parts of its computation, with the aim of improving efficiency. The capacity_gamma parameter sets the fraction of tokens processed by the dynamic components; for this model, γ = 0.5.
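
To make the role of capacity_gamma concrete, here is a minimal sketch of capacity-based token routing in plain Python. It is an illustration only, not the VPR implementation: the per-token router scores, the top-k selection rule, and the names select_tokens and dynamic_block are all assumptions introduced for this example. The only detail taken from this card is that γ sets the fraction of tokens that go through the dynamic components.

```python
import math

def select_tokens(scores, capacity_gamma):
    """Pick the indices of the tokens that get full computation.

    Hypothetical rule: keep the ceil(gamma * T) highest-scoring tokens.
    The real VPR routing criterion may differ.
    """
    if not 0.0 <= capacity_gamma <= 1.0:
        raise ValueError("capacity_gamma must be in [0, 1]")
    k = math.ceil(capacity_gamma * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def dynamic_block(hidden, scores, capacity_gamma, block_fn):
    """Apply block_fn to the selected tokens; pass the others through unchanged."""
    keep = set(select_tokens(scores, capacity_gamma))
    return [block_fn(h) if i in keep else h for i, h in enumerate(hidden)]

# Toy example: four tokens, scalar "hidden states", made-up router scores.
scores = [0.9, 0.1, 0.4, 0.7]
hidden = [1.0, 2.0, 3.0, 4.0]
out = dynamic_block(hidden, scores, 0.5, lambda h: h * 10)
print(out)  # [10.0, 2.0, 3.0, 40.0]
```

With γ = 0.5 and four tokens, two tokens are processed and two are skipped, so under this assumed rule the cost of the dynamic block scales with roughly half the sequence length.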

How to Use

This model requires trust_remote_code=True to load the custom architecture.

from transformers import AutoModelForCausalLM, AutoTokenizer

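# torch_dtype="auto" loads the dtype stored in the checkpoint; to load in
# bfloat16 for efficiency, import torch and pass torch_dtype=torch.bfloat16 instead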
model = AutoModelForCausalLM.from_pretrained(
    "fredericowieser/dynamic-qwen-vpr-gamma0.5",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("fredericowieser/dynamic-qwen-vpr-gamma0.5")

# Example usage
prompt = "The capital of the United Kingdom is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Evaluation

Results on standard benchmarks:

Task Metric Value
arc_challenge acc,none 0.2244
arc_challenge acc_norm,none 0.2585
arc_challenge acc_norm_stderr,none 0.0128
arc_challenge acc_stderr,none 0.0122
hellaswag acc,none 0.3020
hellaswag acc_norm,none 0.3328
hellaswag acc_norm_stderr,none 0.0047
hellaswag acc_stderr,none 0.0046
mmlu acc,none 0.2441
mmlu acc_stderr,none 0.0036
mmlu_abstract_algebra acc,none 0.2500
mmlu_abstract_algebra acc_stderr,none 0.0435
mmlu_anatomy acc,none 0.3111
mmlu_anatomy acc_stderr,none 0.0400
mmlu_astronomy acc,none 0.2303
mmlu_astronomy acc_stderr,none 0.0343
mmlu_business_ethics acc,none 0.2200
mmlu_business_ethics acc_stderr,none 0.0416
mmlu_clinical_knowledge acc,none 0.2113
mmlu_clinical_knowledge acc_stderr,none 0.0251
mmlu_college_biology acc,none 0.2569
mmlu_college_biology acc_stderr,none 0.0365
mmlu_college_chemistry acc,none 0.2500
mmlu_college_chemistry acc_stderr,none 0.0435
mmlu_college_computer_science acc,none 0.2300
mmlu_college_computer_science acc_stderr,none 0.0423
mmlu_college_mathematics acc,none 0.2800
mmlu_college_mathematics acc_stderr,none 0.0451
mmlu_college_medicine acc,none 0.2717
mmlu_college_medicine acc_stderr,none 0.0339
mmlu_college_physics acc,none 0.2745
mmlu_college_physics acc_stderr,none 0.0444
mmlu_computer_security acc,none 0.2300
mmlu_computer_security acc_stderr,none 0.0423
mmlu_conceptual_physics acc,none 0.2468
mmlu_conceptual_physics acc_stderr,none 0.0282
mmlu_econometrics acc,none 0.2193
mmlu_econometrics acc_stderr,none 0.0389
mmlu_electrical_engineering acc,none 0.2345
mmlu_electrical_engineering acc_stderr,none 0.0353
mmlu_elementary_mathematics acc,none 0.2540
mmlu_elementary_mathematics acc_stderr,none 0.0224
mmlu_formal_logic acc,none 0.1905
mmlu_formal_logic acc_stderr,none 0.0351
mmlu_global_facts acc,none 0.2100
mmlu_global_facts acc_stderr,none 0.0409
mmlu_high_school_biology acc,none 0.2387
mmlu_high_school_biology acc_stderr,none 0.0243
mmlu_high_school_chemistry acc,none 0.2315
mmlu_high_school_chemistry acc_stderr,none 0.0297
mmlu_high_school_computer_science acc,none 0.1700
mmlu_high_school_computer_science acc_stderr,none 0.0378
mmlu_high_school_european_history acc,none 0.2121
mmlu_high_school_european_history acc_stderr,none 0.0319
mmlu_high_school_geography acc,none 0.1869
mmlu_high_school_geography acc_stderr,none 0.0278
mmlu_high_school_government_and_politics acc,none 0.2798
mmlu_high_school_government_and_politics acc_stderr,none 0.0324
mmlu_high_school_macroeconomics acc,none 0.2282
mmlu_high_school_macroeconomics acc_stderr,none 0.0213
mmlu_high_school_mathematics acc,none 0.2333
mmlu_high_school_mathematics acc_stderr,none 0.0258
mmlu_high_school_microeconomics acc,none 0.2479
mmlu_high_school_microeconomics acc_stderr,none 0.0280
mmlu_high_school_physics acc,none 0.2450
mmlu_high_school_physics acc_stderr,none 0.0351
mmlu_high_school_psychology acc,none 0.1963
mmlu_high_school_psychology acc_stderr,none 0.0170
mmlu_high_school_statistics acc,none 0.2361
mmlu_high_school_statistics acc_stderr,none 0.0290
mmlu_high_school_us_history acc,none 0.2451
mmlu_high_school_us_history acc_stderr,none 0.0302
mmlu_high_school_world_history acc,none 0.2658
mmlu_high_school_world_history acc_stderr,none 0.0288
mmlu_human_aging acc,none 0.3318
mmlu_human_aging acc_stderr,none 0.0316
mmlu_human_sexuality acc,none 0.2366
mmlu_human_sexuality acc_stderr,none 0.0373
mmlu_humanities acc,none 0.2440
mmlu_humanities acc_stderr,none 0.0063
mmlu_international_law acc,none 0.2562
mmlu_international_law acc_stderr,none 0.0398
mmlu_jurisprudence acc,none 0.1852
mmlu_jurisprudence acc_stderr,none 0.0376
mmlu_logical_fallacies acc,none 0.3067
mmlu_logical_fallacies acc_stderr,none 0.0362
mmlu_machine_learning acc,none 0.2946
mmlu_machine_learning acc_stderr,none 0.0433
mmlu_management acc,none 0.2136
mmlu_management acc_stderr,none 0.0406
mmlu_marketing acc,none 0.2479
mmlu_marketing acc_stderr,none 0.0283
mmlu_medical_genetics acc,none 0.3100
mmlu_medical_genetics acc_stderr,none 0.0465
mmlu_miscellaneous acc,none 0.2708
mmlu_miscellaneous acc_stderr,none 0.0159
mmlu_moral_disputes acc,none 0.2457
mmlu_moral_disputes acc_stderr,none 0.0232
mmlu_moral_scenarios acc,none 0.2212
mmlu_moral_scenarios acc_stderr,none 0.0139
mmlu_nutrition acc,none 0.2451
mmlu_nutrition acc_stderr,none 0.0246
mmlu_other acc,none 0.2546
mmlu_other acc_stderr,none 0.0078
mmlu_philosophy acc,none 0.2797
mmlu_philosophy acc_stderr,none 0.0255
mmlu_prehistory acc,none 0.2284
mmlu_prehistory acc_stderr,none 0.0234
mmlu_professional_accounting acc,none 0.2305
mmlu_professional_accounting acc_stderr,none 0.0251
mmlu_professional_law acc,none 0.2458
mmlu_professional_law acc_stderr,none 0.0110
mmlu_professional_medicine acc,none 0.2279
mmlu_professional_medicine acc_stderr,none 0.0255
mmlu_professional_psychology acc,none 0.2631
mmlu_professional_psychology acc_stderr,none 0.0178
mmlu_public_relations acc,none 0.2000
mmlu_public_relations acc_stderr,none 0.0383
mmlu_security_studies acc,none 0.2163
mmlu_security_studies acc_stderr,none 0.0264
mmlu_social_sciences acc,none 0.2314
mmlu_social_sciences acc_stderr,none 0.0076
mmlu_sociology acc,none 0.2239
mmlu_sociology acc_stderr,none 0.0295
mmlu_stem acc,none 0.2461
mmlu_stem acc_stderr,none 0.0077
mmlu_us_foreign_policy acc,none 0.2900
mmlu_us_foreign_policy acc_stderr,none 0.0456
mmlu_virology acc,none 0.2771
mmlu_virology acc_stderr,none 0.0348
mmlu_world_religions acc,none 0.3158
mmlu_world_religions acc_stderr,none 0.0357
truthfulqa_mc2 acc,none 0.4447
truthfulqa_mc2 acc_stderr,none 0.0155
winogrande acc,none 0.5067
winogrande acc_stderr,none 0.0141
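
When reading the table, it can help to turn each accuracy and its standard error into an approximate 95% confidence interval. The sketch below does that with a normal approximation (acc ± 1.96 × stderr). The helper names and the chance levels (0.25 for four-option MMLU questions, 0.5 for binary-choice Winogrande) are assumptions supplied for this example and are not part of the reported results.

```python
def confidence_interval(acc, stderr, z=1.96):
    """Approximate confidence interval under a normal approximation."""
    half_width = z * stderr
    return acc - half_width, acc + half_width

def consistent_with_chance(acc, stderr, chance, z=1.96):
    """True if the assumed chance-level accuracy lies inside the interval."""
    low, high = confidence_interval(acc, stderr, z)
    return low <= chance <= high

# Values copied from the table above; chance levels are assumptions.
mmlu_low, mmlu_high = confidence_interval(0.2441, 0.0036)
print(f"mmlu 95% CI: [{mmlu_low:.4f}, {mmlu_high:.4f}]")

mmlu_near_chance = consistent_with_chance(0.2441, 0.0036, chance=0.25)
winogrande_near_chance = consistent_with_chance(0.5067, 0.0141, chance=0.5)
print(mmlu_near_chance, winogrande_near_chance)
```

The same helpers can be applied to any row pair in the table by passing its acc and acc_stderr values, along with whatever chance level is appropriate for that task.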

Model size: 0.5B params. Tensor type: F32 (Safetensors).