Toddler-LLM (fully pre-trained on CHILDES)

Overview

  • Model name: Toddler-LLM
  • Type: Decoder-only small LM for toddler-like dialogue
  • Status: Pre-trained entirely from scratch on child-directed speech, then post-trained with SFT and GRPO
  • Primary language: English
  • Target behavior: Coherent, short, child-like responses (approx. 2–3 years old)
  • Parameter count: ~155M (see config below)
  • Intended domain: Parent–child conversational exchanges

Model architecture

  • hidden_size: 672
  • intermediate_size: 1809
  • num_hidden_layers: 31
  • num_attention_heads: 12
  • num_key_value_heads: 4
  • max_position_embeddings: 256
  • vocab_size: 8192
  • tokenizer: lower-case only
  • tie_word_embeddings: true
  • rope_theta: 10000.0
  • max input length: 256 tokens
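
For reference, a minimal sketch of instantiating this configuration with Hugging Face transformers. The card does not name the architecture class, so LlamaConfig is an assumption based on the Llama/SmolLM2-style hyperparameter names above; the values are the card's own.

```python
# LlamaConfig is an assumption (the card does not state the architecture
# class); the hyperparameters below are taken verbatim from the card.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=672,
    intermediate_size=1809,
    num_hidden_layers=31,
    num_attention_heads=12,
    num_key_value_heads=4,        # grouped-query attention
    max_position_embeddings=256,
    vocab_size=8192,              # lower-case-only tokenizer
    tie_word_embeddings=True,
    rope_theta=10000.0,
)
model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")  # ~156M, consistent with the ~155M above
```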

Training data

  • Source: CHILDES (filtered, English-only)
  • Approx. 14M tokens after filtering
  • Pretraining exclusively on child-directed speech (no large-scale adult corpora)
  • Data filtering (for downstream SFT/GRPO): RM-4 scores caregiver-utterance clarity to select the top 10% of “helpful” caregiver prompts; RM-2 scores the coherence of child utterances (see the filter sketch below)
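
A minimal sketch of the top-10% clarity filter, assuming each caregiver–child exchange already carries an RM-4 score; the rm4_clarity field name is illustrative, not the actual pipeline's schema.

```python
# Top-10% caregiver-prompt filter, assuming exchanges are pre-scored by RM-4.
# The "rm4_clarity" key is illustrative.
import numpy as np

def top_fraction(exchanges: list[dict], frac: float = 0.10) -> list[dict]:
    """Keep the top `frac` of exchanges by RM-4 caregiver-clarity score."""
    scores = np.array([ex["rm4_clarity"] for ex in exchanges])
    cutoff = np.quantile(scores, 1.0 - frac)
    return [ex for ex in exchanges if ex["rm4_clarity"] >= cutoff]
```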

Training procedure

  • Stage 1: Pre-training
    • Library: Nanotron
    • Objective: next-token prediction
    • Steps: 25,000 (~64 epochs)
    • Peak learning rate: 0.0025
    • Loss: converged to just above 1.0
  • Stage 2: Chat SFT
    • Adapted to the SmolLM2 Instruct chat template and special tokens
    • Library: unsloth (response-only SFT)
    • Curriculum over progressively higher-quality subsets (ranked by RM-2 and RM-4; see the curriculum sketch after this list):
      • Top 10%: LR 9e-4, 2 epochs
      • Top 5%: LR 8e-4, 14 epochs
      • Top 2.5%: LR 7e-4, 7 epochs
      • Top 1.25%: LR 6e-4, 3 epochs
    • Responses reached stable coherence as training loss settled around ≈ 0.45
  • Stage 3: GRPO optimization
    • GRPO learning rate: 1e-5
    • LoRA rank doubled relative to the SFT stage (rank = 128); target modules: q, k, v, o, gate, up, down projections
    • Steps: 1000
    • Reward weights: RM-1 (1.0), RM-2 (0.2), RM-3 (0.5); see the combined-reward sketch after this list
    • Selected best checkpoint by manual inspection for coherence + child-likeness
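
A minimal sketch of the Stage-2 curriculum loop with the schedule above. The actual pipeline used unsloth's response-only SFT; TRL's SFTTrainer is substituted here, and the model path, dataset name, and split labels are placeholders.

```python
# Curriculum loop over pre-scored SFT subsets (schedule from above).
# NOTE: the card's pipeline used unsloth response-only SFT; TRL's SFTTrainer
# is a stand-in, and all paths/names below are hypothetical.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

CURRICULUM = [  # (split, peak learning rate, epochs)
    ("top_10pct", 9e-4, 2),
    ("top_5pct", 8e-4, 14),
    ("top_2_5pct", 7e-4, 7),
    ("top_1_25pct", 6e-4, 3),
]

model = AutoModelForCausalLM.from_pretrained("toddler-llm-pretrained")  # placeholder path

for split, lr, epochs in CURRICULUM:
    dataset = load_dataset("childes-sft-scored", split=split)  # placeholder dataset
    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        args=SFTConfig(
            learning_rate=lr,
            num_train_epochs=epochs,
            output_dir=f"sft-{split}",
        ),
    )
    trainer.train()
    model = trainer.model  # carry the fine-tuned weights into the next stage
```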
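
A minimal sketch of the Stage-3 reward combination, assuming each reward model returns a score in [0, 1] (the reward models themselves are described in the next section); argument names are illustrative.

```python
# Stage-3 scalar reward: weighted sum of the three reward-model scores,
# each assumed to lie in [0, 1]. Argument names are illustrative.
RM_WEIGHTS = {"rm1": 1.0, "rm2": 0.2, "rm3": 0.5}

def combined_reward(rm1_childlike: float, rm2_coherence: float,
                    rm3_length: float) -> float:
    """Weighted sum used as the GRPO reward for one completion."""
    return (RM_WEIGHTS["rm1"] * rm1_childlike
            + RM_WEIGHTS["rm2"] * rm2_coherence
            + RM_WEIGHTS["rm3"] * rm3_length)

# A perfectly child-like, coherent, single short sentence scores
# 1.0 + 0.2 + 0.5 = 1.7 (the maximum).
```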

Reward models and data filters used

  • RM-1 (Toddler-BERT): BERT classifier for “child-like” style, available at enochlev/childish_behavior_model
  • RM-2 (Coherence-BERT): BERT classifier trained with soft coherence labels, available at enochlev/child_coherence_model
    • Labels: 0.0–1.0 coherence scores assigned by Llama‑3.3‑70B (batched for consistency)
    • Training: 5 epochs, BCEWithLogitsLoss, LR 2e-5, weight decay 0.01, batch size 150, max length 96
  • RM-3 (Length PMF): Bayesian-estimated PMF over child-utterance lengths from CHILDES, temperature-smoothed and min–max normalized to [0, 1]; each sentence’s score is scaled by 1/max(1, number_of_punctuation_marks) to favor a single short sentence (see the sketch after this list)
  • RM-4 (Caregiver clarity): LLM-scored question clarity; used as a filter only (not a reward) to select top 10% caregiver prompts
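
A minimal sketch of RM-3, assuming length_counts is a histogram of child-utterance lengths (in tokens) tallied from CHILDES; the temperature value and the set of punctuation marks counted are assumptions, as the card does not state them.

```python
# RM-3 sketch: temperature-smoothed, min–max-normalized PMF over utterance
# lengths, with a penalty for multi-sentence outputs. `length_counts[i]` is
# assumed to hold the CHILDES frequency of child utterances of i tokens;
# the temperature default and punctuation set ".!?" are illustrative.
import numpy as np

def length_reward(sentence: str, length_counts: np.ndarray,
                  temperature: float = 2.0) -> float:
    # Smooth the raw histogram with a temperature, then renormalize.
    pmf = length_counts.astype(float) ** (1.0 / temperature)
    pmf /= pmf.sum()
    # Min–max normalize to [0, 1] so the most typical length scores 1.0.
    norm = (pmf - pmf.min()) / (pmf.max() - pmf.min())
    n_tokens = len(sentence.split())
    base = norm[min(n_tokens, len(norm) - 1)]
    # Scale by 1 / max(1, number of punctuation marks) to reward one
    # short sentence over several strung together.
    n_punct = sum(sentence.count(p) for p in ".!?")
    return float(base / max(1, n_punct))
```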

Inference and prompt format

  • Chat template: compatible with SmolLM2 Instruct-style templates
  • Guidance:
    • Input: single caregiver question or brief prompt
    • Output: one short, coherent sentence with age-appropriate vocabulary
  • Example prompt format (generic):
    • System: “You are a 2–3-year-old child speaking in short, simple sentences.”
    • User: “Caregiver: What did you have for dessert for lunch?”
    • Assistant (model): “i had some spaghettis.” (Note: spelling/grammar may be age-typical)
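
A minimal inference sketch with transformers, following the prompt format above; the repository id and sampling settings are placeholders, not the published values.

```python
# Minimal inference sketch; the repo id is a placeholder, not the published path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "enochlev/toddler-llm"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {"role": "system",
     "content": "You are a 2-3-year-old child speaking in short, simple sentences."},
    {"role": "user", "content": "Caregiver: What did you have for dessert for lunch?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=32, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```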

Example generations (from evaluation set)

  • Caregiver Utterance: What did you have for dessert for lunch?
    • Gold Child Response: a ice cream sandwich
    • llm-toddler-30: i had some spaghettis.
  • Caregiver Utterance: What did we do with the toy?
    • Gold Child Response: we put it in his cage
    • llm-toddler-30: it broke.
  • Caregiver Utterance: Where is your bag?
    • Gold Child Response: I didn't bring it. It's in the car
    • llm-toddler-30: i got it
  • Caregiver Utterance: What did you find?
    • Gold Child Response: look. Elmo
    • llm-toddler-30: i found luke's lightsaber.
  • Caregiver Utterance: What did you put on the pizza?
    • Gold Child Response: cheese and cheese
    • llm-toddler-30: i put sause on it

Evaluation

  • Human evaluation (19 participants; two robot platforms: Cozmo, Misty II):
    • Perceived age: ~3 years (our model came closest to the 2–3-year target)
    • Coherence: comparable across models; higher with Cozmo than Misty on average
    • AoA (age of acquisition) and vocabulary: human speakers used broader vocabulary with higher AoA; the models stayed at lower AoA as intended; some SmolLM variants occasionally produced adult-level content
    • Notable: Participant expectations matched Cozmo’s child-like morphology/voice better than Misty’s

Intended use

  • Research on child-like conversational agents and human-robot interaction
  • Simulated child responses to caregiver prompts

Out-of-scope and limitations

  • Not for clinical, diagnostic, educational placement, or childcare decision-making
  • English-only; small corpus (≈14M tokens); limited world knowledge
  • Can produce off-context, random child-like words; may fixate on certain “baby words”
  • May generate age-inappropriate content in rare cases; monitor outputs
  • Sensitive to prompt phrasing; best with concise caregiver questions

Safety and ethical considerations

  • Use responsibly around minors; ensure adult supervision in interactive settings
  • Avoid anthropomorphizing beyond research context
  • Respect CHILDES data licenses and privacy norms
  • Models may reflect biases or artifacts from child-directed corpora