OpenOranje/TweeTaal-nl-en-0.6B
Model Description
The TweeTaal-nl-en model has been fine-tuned on Dutch-English and English-Dutch translation pairs to produce accurate, fluent translations. Its compact 0.6B-parameter size makes it suitable for deployment in resource-constrained environments while maintaining strong translation quality.
Intended Use
Primary Use Case: Translating Dutch text to English / English text to Dutch across various domains
Recommended Applications:
- General-purpose Dutch-to-English and English-to-Dutch translation
- Content localization
- Cross-lingual communication tools
- Educational language learning applications
Performance
Benchmark Results
Training Details
Training Procedure
Method: Supervised Fine-Tuning (SFT)
- The model was trained on parallel Dutch-English text pairs
- Standard cross-entropy loss optimization
- The base Qwen3-0.6B model was adapted specifically for translation tasks
Training Data
The model was trained on Dutch-English parallel corpora. The following details are still to be documented:
- Dataset name and source
- Number of training examples
- Domain coverage (general, technical, literary, etc.)
- Data preprocessing steps
Usage
Basic Usage Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer
model_name = "OpenOranje/TweeTaal-nl-en-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare input using the prompt format the model was trained on
dutch_text = "Hallo, hoe gaat het met je?"
prompt = f"Translate the following text from Dutch to English:\n{dutch_text}"
messages = [{"role": "user", "content": prompt}]

# Generate translation. add_generation_prompt=True appends the assistant
# turn marker; do_sample=True is needed for temperature to take effect.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt
translation = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(translation)
```
Prompt Format
The model expects input in one of the following formats:
```
Translate the following text from Dutch to English:\n{dutch_text}
Translate the following text from English to Dutch:\n{english_text}
```
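The two formats above can be wrapped in a small helper when both directions are needed. This is an illustrative sketch; the `build_prompt` and `as_chat` names are not part of the model's API:

```python
# Sketch of a prompt builder for the two supported translation directions.
# Function names are illustrative, not part of this model card's API.
DIRECTIONS = {
    ("nl", "en"): "Translate the following text from Dutch to English:",
    ("en", "nl"): "Translate the following text from English to Dutch:",
}

def build_prompt(text: str, src: str, tgt: str) -> str:
    """Return the instruction prompt the model expects."""
    if (src, tgt) not in DIRECTIONS:
        raise ValueError(f"Unsupported direction: {src}->{tgt}")
    return f"{DIRECTIONS[(src, tgt)]}\n{text}"

def as_chat(text: str, src: str, tgt: str) -> list[dict]:
    """Wrap the prompt in the chat-message format used with apply_chat_template."""
    return [{"role": "user", "content": build_prompt(text, src, tgt)}]
```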
Inference Parameters
Recommended generation parameters:
- Temperature: 0.7 (adjust for creativity vs. consistency)
- Max tokens: Set based on expected translation length
- Top-p: 0.9 (nucleus sampling)
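These recommendations can be collected into one kwargs dict and passed to `model.generate`. A sketch; the `max_new_tokens` value of 256 is an assumed placeholder to adjust per use case:

```python
# Recommended generation settings from this card, as kwargs for
# transformers' model.generate(). do_sample=True is required for
# temperature and top_p to take effect.
gen_kwargs = {
    "do_sample": True,
    "temperature": 0.7,     # lower for more literal, consistent output
    "top_p": 0.9,           # nucleus sampling
    "max_new_tokens": 256,  # assumed value; set to expected translation length
}
# usage: outputs = model.generate(inputs, **gen_kwargs)
```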
Limitations
- Context Length: Trained with a 4096-token context window; longer inputs should be split or truncated
- Rare Words: May struggle with highly specialized terminology or rare vocabulary not well-represented in training data
- Informal Language: Performance on slang, dialects, or very informal Dutch may vary
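To stay within the 4096-token context window, long documents can be split on sentence boundaries and translated chunk by chunk. A naive character-based sketch; production code should count tokens with the model's tokenizer instead:

```python
import re

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text on sentence-ish boundaries so each chunk stays small.

    Character counts approximate token limits; for exact limits, measure
    with len(tokenizer(text)["input_ids"]) instead (assumed workflow,
    not prescribed by this model card).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be translated independently with the usage example above and the results concatenated.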
Ethical Considerations
- Training Data Bias: The model may reflect biases present in the training data
- Cultural Nuances: Some cultural expressions may not translate perfectly
Contact
For questions or issues, please contact: [email protected]
Additional Resources
- Base Model: Qwen3-0.6B
- Training Code: [TBD]
- Dataset: Data
Version History
- v1.0 (2025-10-24): Initial release
License: Apache 2.0