arco-3 / README.md

appvoid

metadata

license: apache-2.0
pipeline_tag: text-generation
extra_gated_prompt: >-
  You agree to not use this model (or future versions) to conduct experiments
  that cause harm to any person or group.
extra_gated_fields:
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Work
      - Research
      - Education
      - Hobby
      - label: Other
        value: other
  I agree to use this model in good faith ONLY: checkbox

In this repository, we propose the next iteration of arco, a new meta-learner small language model, now with qwen as the base architecture.
In earlier research, we noticed that the previous arco series underperformed dramatically on few-shot prompting, despite its benchmark improvements on ARC. We therefore decided to make few-shot learning more robust by focusing directly on tasks that improve that skill, starting from a stronger baseline model from the qwen family.
After several merging iterations with openly available models, we arrived at a strong baseline for a meta-learner model, which we call arco-3. It will serve as the starting point for future few-shot fine-tuning and experiments.

## prompt

There is no prompt intentionally set.
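
Since no prompt is set, few-shot examples can presumably be passed to the model as plain text. The sketch below shows one way to assemble such a prompt. The `Input`/`Output` labels, the `build_fewshot_prompt` helper, and the sentiment examples are illustrative assumptions, not a format prescribed by this model.

```python
def build_fewshot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Join solved examples and one unsolved query into a plain-text prompt."""
    blocks = [
        f"{input_label}: {source}\n{output_label}: {target}"
        for source, target in examples
    ]
    # The final block leaves the output empty so the model completes it.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Hypothetical classification examples, used only to show the layout.
examples = [
    ("The movie was fantastic.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_fewshot_prompt(examples, "Best purchase I have made this year.")
print(prompt)
```

The resulting string ends with an open `Output:` line, so a text-generation call would be expected to continue it with the label for the last input.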

## benchmarks

### meta arena

We tested around 65 models against each other on few-shot tasks and used gemini-2.5-pro to choose the best answers from each one. arco-3 currently ranks 13th in meta-arena.
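
How the judge's choices are turned into a ranking is not described here. As a rough illustration only, the sketch below assumes the simplest possibility: count how often the judge picks each model's answer, then sort by that count. The model names, the verdict list, and the `rank_by_wins` helper are all hypothetical.

```python
from collections import Counter


def rank_by_wins(verdicts):
    """Rank models by how often the judge picked their answer as the best."""
    wins = Counter(verdicts)
    # Sort by wins (descending), then by name so ties are deterministic.
    return sorted(wins.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical judge verdicts: one winning model name per few-shot task.
verdicts = ["model_a", "model_c", "model_a", "model_b", "model_a", "model_c"]
ranking = rank_by_wins(verdicts)
for position, (model, wins) in enumerate(ranking, start=1):
    print(position, model, wins)
```

Win counting is only one option. A rating scheme such as Elo would weigh who was beaten as well as how often, at the cost of more bookkeeping.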

### variance

We also compared the model against some popular small models on the "power" distribution across the five language modeling benchmarks we typically use.

### language modeling

To our surprise, this model also improved on the base model across several well-known language modeling benchmarks.

| Parameters | Model | MMLU | ARC-C | HellaSwag | PIQA | Winogrande | Average |
|---|---|---|---|---|---|---|---|
| 0.6b | qwen 3 | 40.31 | 34.47 | 47.38 | 67.46 | 56.04 | 49.13 |
| 0.6b | arco 3 | 43.34 | 36.01 | 49.56 | 68.17 | 58.09 | 51.03 |
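
The reported averages and the per-benchmark gains can be checked from the scores above. The short sketch below does only that arithmetic. The variable names are arbitrary, and the average is assumed to be an unweighted mean of the five benchmarks.

```python
BENCHMARKS = ["MMLU", "ARC-C", "HellaSwag", "PIQA", "Winogrande"]

# Scores copied from the table above, in the same column order as BENCHMARKS.
scores = {
    "qwen 3": [40.31, 34.47, 47.38, 67.46, 56.04],
    "arco 3": [43.34, 36.01, 49.56, 68.17, 58.09],
}

# Unweighted mean per model, rounded to two decimals like the table.
averages = {
    model: round(sum(values) / len(values), 2)
    for model, values in scores.items()
}

# Gain of arco 3 over qwen 3 on each benchmark, in points.
deltas = {
    name: round(arco - qwen, 2)
    for name, qwen, arco in zip(BENCHMARKS, scores["qwen 3"], scores["arco 3"])
}

print(averages)
print(deltas)
```

Under that assumption the means come out to 49.13 and 51.03, matching the Average column. The largest gain is on MMLU (3.03 points) and the smallest on PIQA (0.71 points).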

## strengths

- Strong bias to format
- Excellent classifier
- State-of-the-art paraphrasing
- Vocabulary/Idiomatic understanding

## limitations

- Lack of creative outputs
- Extremely poor summarization skills
- Poor causality understanding
- Hallucinations

We plan to address each of these issues in future iterations.

## supporters

## trivia

arco means "bow" in Spanish, which is just another way of saying that it hits its target quickly and accurately.
Note: the model has not been tested as a chat assistant and might not work as intended in that role. Use with caution.
