arco-3
metadata
license: apache-2.0
pipeline_tag: text-generation
extra_gated_prompt: >-
  You agree to not use this model (or future versions) to conduct experiments
  that cause harm to any person or group.
extra_gated_fields:
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Work
      - Research
      - Education
      - Hobby
      - label: Other
        value: other
  I agree to use this model in good faith ONLY: checkbox


In this repository, we present the next iteration of arco, a meta-learner small language model, now built on Qwen as the base architecture.

In previous research, we noticed that earlier arco models underperformed dramatically on few-shot prompting, despite their benchmark improvements on ARC. We therefore made robust few-shot learning the main goal, focusing directly on tasks that improve that skill and starting from a stronger base model from the Qwen family.

After several merging iterations with openly available models, we arrived at a strong baseline for a meta-learner model, which we call arco-3. This model will serve as the starting point for future few-shot fine-tuning and experiments.

prompt

The model intentionally has no prompt template.
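Since arco-3 has no prompt template, a few-shot prompt is just plain text. Below is a minimal sketch of assembling one. The `build_fewshot_prompt` helper and the `Input:`/`Output:` labels are hypothetical choices made for illustration, not a format the model is known to have been trained on.

```python
# Hypothetical helper: arco-3 has no prompt template, so a few-shot prompt is
# plain text. The "Input"/"Output" labels below are an assumption, not an
# official format.
def build_fewshot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Join (input, output) demonstration pairs and a final query into one string.

    The prompt ends with the output label so the model continues with its answer.
    """
    blocks = [f"{input_label}: {x}\n{output_label}: {y}" for x, y in examples]
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        ("I loved this movie!", "positive"),
        ("The plot was a mess.", "negative"),
    ]
    print(build_fewshot_prompt(demos, "Great acting, weak ending."))
```

The resulting string is given to the model as-is for plain text completion; keeping every demonstration in the same layout plays to the model's tendency to follow the format it is shown.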

benchmarks

meta arena

We tested around 65 models against each other on few-shot tasks and used gemini-2.5-pro to choose the best answers among them. arco-3 currently ranks 13th in meta arena.


variance

We also compared the model against some popular small models on the "power" distribution across the five language modeling benchmarks we typically use.

language modeling

To our surprise, the model also improves over its base model on several well-known language modeling benchmarks.

Parameters  Model    MMLU   ARC-C  HellaSwag  PIQA   Winogrande  Average
0.6B        Qwen3    40.31  34.47  47.38      67.46  56.04       49.13
0.6B        arco-3   43.34  36.01  49.56      68.17  58.09       51.03
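The Average column matches the unweighted mean of the five benchmark scores. As a quick check, the sketch below recomputes both averages and the per-benchmark gains of arco-3 over Qwen3. The helper names `average` and `deltas` are our own; the scores are copied from the table.

```python
# Scores copied from the table (0.6B models).
BENCHMARKS = ["MMLU", "ARC-C", "HellaSwag", "PIQA", "Winogrande"]
SCORES = {
    "Qwen3": [40.31, 34.47, 47.38, 67.46, 56.04],
    "arco-3": [43.34, 36.01, 49.56, 68.17, 58.09],
}


def average(values):
    """Unweighted arithmetic mean of the benchmark scores."""
    return sum(values) / len(values)


def deltas(new, base):
    """Per-benchmark difference (new - base), rounded to two decimals."""
    return [round(n - b, 2) for n, b in zip(new, base)]


if __name__ == "__main__":
    for model, vals in SCORES.items():
        print(f"{model}: average {average(vals):.2f}")
    for name, d in zip(BENCHMARKS, deltas(SCORES["arco-3"], SCORES["Qwen3"])):
        print(f"{name}: {d:+.2f}")
```

Every benchmark improves, with the largest gain on MMLU (+3.03) and the smallest on PIQA (+0.71), for an average gain of about 1.90 points.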

strengths

  • Strong adherence to format
  • Excellent classifier
  • State-of-the-art paraphrasing
  • Vocabulary and idiom understanding

limitations

  • Lack of creative outputs
  • Extremely poor summarization skills
  • Poor causality understanding
  • Hallucinations

We plan to address each of these issues in future versions.

supporters

Buy Me A Coffee

trivia

arco means "bow" in Spanish, a nod to a model that hits its target quickly and accurately.

Note: the model has not been tested as a chat assistant and may not work as intended in that role; use it with caution.