Tags: Text Generation · Transformers · Safetensors · qwen2 · text-generation-inference · conversational

🌟 BloomVN-0.5B-ppo

A multilingual model fine-tuned for the Vietnamese language

📋 Overview

This model is a small-scale experiment (0.5B parameters) that tests the reinforcement learning capabilities of the veRL framework. It was trained with PPO (Proximal Policy Optimization) on a limited dataset to evaluate veRL's performance and training behavior.
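For readers new to PPO, the core of each policy update is the clipped surrogate objective. The Python sketch below computes it per token from log-probabilities under the current policy and the policy that generated the rollout. It is a textbook illustration, not veRL's implementation, and `clip_eps=0.2` is an assumed common default rather than a setting reported for this model.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Token-level PPO clipped surrogate loss (to be minimized).

    logp_new / logp_old: log-probabilities of the sampled tokens under the
    current policy and under the policy that produced the rollout.
    advantages: per-token advantage estimates.
    clip_eps: clipping range (0.2 is an assumed, commonly used default).
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    losses = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(token) / pi_old(token)
        unclipped = ratio * adv
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps) * adv
        # PPO maximizes the pessimistic (smaller) surrogate; the loss is its negation.
        losses.append(-min(unclipped, clipped))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Unchanged policy: ratio is 1, so the loss is the negative mean advantage.
    print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0]))  # → -2.0
```

Clipping the probability ratio keeps a single update from moving the policy too far from the one that generated the data, which is why PPO tolerates several optimization passes over the same rollouts.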

🔧 Method

The experiments were run with veRL and focused on:

  • Applying the PPO algorithm to a 0.5B-parameter model
  • Running training experiments on a small dataset
  • Testing how well veRL handles RL tasks
  • Evaluating training efficiency and model behavior

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.
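The card does not say which advantage estimator was used. PPO with a learned critic is often paired with Generalized Advantage Estimation (GAE), so the sketch below shows that computation for a single trajectory, under that assumption. The defaults `gamma=1.0` and `lam=0.95` are illustrative values, not settings reported for this model.

```python
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards[t]: reward at step t (in LLM RL often zero except at the last token).
    values[t]: critic estimate V(s_t); the state after the final step is
    treated as terminal, with value 0.
    Returns (advantages, returns); returns = advantages + values are the
    regression targets for the critic.
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have equal length")
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        last_gae = delta + gamma * lam * last_gae  # exponentially weighted sum of TD errors
        advantages[t] = last_gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    adv, ret = gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
    print(adv, ret)  # → [0.5, 0.5, 0.5] [1.0, 1.0, 1.0]
```

With `lam=1.0` the estimate reduces to the Monte Carlo return minus the value baseline; smaller `lam` trades some bias for lower variance.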

📊 VMLU Benchmark

Evaluation date | STEM 🔬 | Social Science 🌍 | Humanities 📚 | Others 🎯 | Avg ⭐
07/02/2025 | 23.18 | 32.84 | 32.71 | 33.67 | 29.43

🤝 Contributors

Developed with ❤️ by BlossomAI


Star ⭐️ this repo if you find it valuable!
Model size: 494M params (Safetensors)
Tensor type: F32
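The listed size and tensor type give a rough estimate of the memory needed just to hold the weights: parameters × bytes per parameter, with 4 bytes per F32 value. The sketch below does that arithmetic. The BF16 line is an illustrative comparison that assumes the weights are cast to a 2-byte type at load time, and neither figure includes activations, KV cache, or framework overhead.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 494_000_000  # "494M params" as listed for this model (rounded)

# F32 is the stored tensor type; BF16 is a hypothetical cast for comparison.
for dtype, nbytes in [("F32", 4), ("BF16", 2)]:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, nbytes):.2f} GB")
# → F32: ~1.98 GB
# → BF16: ~0.99 GB
```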

Model tree for BlossomsAI/BloomVN-0.5B-ppo

Base model: Qwen/Qwen2.5-0.5B
Fine-tuned: this model
Adapters: 21 models
Quantizations: 2 models

Dataset used to train BlossomsAI/BloomVN-0.5B-ppo