tags:
- game
- reinforcement-learning
---

# Mini RL Game — DQN (Vector + Pixels)

A simple **Pygame** environment with a **DQN** agent that learns two scenarios. It is an **educational RL example** for quick experimentation with DQN on a minimal game.

- **Eat**: Catch falling objects.
- **Avoid**: Dodge falling objects for as long as possible.

See the project on [GitHub](https://github.com/turhancan97/mini-game-with-RL). You can download the trained models from this repository.
## Observation Types

- **Vector (MLP)**: Compact state per enemy with normalized deltas.
- **Pixels (CNN)**: Raw frames (84×84 grayscale) stacked over 4 frames.
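
A minimal sketch of that pixel pipeline (an assumed implementation; the function names are illustrative, not the repo's API), using the scikit-image dependency listed below:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess_frame(frame_rgb):
    """RGB game frame -> 84x84 grayscale, values in [0, 1]."""
    gray = rgb2gray(frame_rgb)                       # (H, W) floats in [0, 1]
    return resize(gray, (84, 84), anti_aliasing=True).astype(np.float32)

def stack_frames(last_four_frames):
    """Stack the 4 most recent preprocessed frames into the (84, 84, 4) CNN input."""
    return np.stack(last_four_frames, axis=-1)
```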
---

## ✅ Checkpoints

### Vector (MLP)

| Scenario | Episodes | Enemies | File                    |
|----------|----------|---------|-------------------------|
| Eat      | 1000     | 4       | `model_vector_eat.h5`   |
| Avoid    | 3000     | 8       | `model_vector_avoid.h5` |

### Pixels (CNN)

| Scenario | Episodes | Enemies | File                    |
|----------|----------|---------|-------------------------|
| Eat      | 1000     | 4       | `model_pixels_eat.h5`   |
| Avoid    | 3000     | 8       | `model_pixels_avoid.h5` |
---

## 🧠 Model Architecture

### Vector (MLP) DQN

- **Input**: `2 * N_enemies` features (per enemy: Δx/width, Δy/height).
- **Network**:
  `Dense(128, relu) → Dense(128, relu) → Dense(3, linear)`
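
A minimal Keras sketch of this head (an assumed implementation, shown with `N_ENEMIES = 4` as in the "Eat" checkpoint; not necessarily the repo's exact code):

```python
import tensorflow as tf

N_ENEMIES = 4  # assumption: the "Eat" setting from the checkpoint table

mlp_dqn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2 * N_ENEMIES,)),  # per enemy: Δx/width, Δy/height
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(3, activation="linear"),  # one Q-value per action
])
mlp_dqn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                loss=tf.keras.losses.Huber())
```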
### Pixels (CNN) DQN

- **Input**: `(84, 84, 4)` stacked grayscale frames.
- **Network**:
  `Conv(32, 8×8, s=4, relu) → Conv(64, 4×4, s=2, relu) → Conv(64, 3×3, s=1, relu) → Dense(512, relu) → Dense(3, linear)`
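
And the corresponding sketch for the CNN variant, under the same assumptions:

```python
import tensorflow as tf

cnn_dqn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(84, 84, 4)),       # 4 stacked grayscale frames
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
    tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(3, activation="linear"),  # one Q-value per action
])
cnn_dqn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2.5e-4),
                loss=tf.keras.losses.Huber())
```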
---

## ⚙️ Training Setup

- **Algorithm**: DQN with a target network
- **Loss**: Huber
- **Optimizer**: Adam (`lr=1e-3` for MLP, `lr=2.5e-4` for CNN)
- **Target updates**: Soft update with `τ = 0.005`
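
One way to implement the `τ = 0.005` soft update in Keras (a sketch, not the repo's code):

```python
TAU = 0.005

def soft_update(online_model, target_model, tau=TAU):
    """Blend online weights into the target network: θ' ← τ·θ + (1 − τ)·θ'."""
    mixed = [tau * w + (1.0 - tau) * tw
             for w, tw in zip(online_model.get_weights(),
                              target_model.get_weights())]
    target_model.set_weights(mixed)
```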
### Replay

- Buffer size: `50k` (MLP) / `100k` (CNN)
- Warm-up (`train_start`): `2000` (MLP) / `5000` (CNN)
- Updates per env step: `2`
- Batch size: `64–128` (typical)
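
A minimal replay buffer consistent with these numbers (illustrative names, not the repo's API); training would begin once the buffer holds at least `train_start` transitions:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=50_000):   # 50k for the MLP agent, 100k for the CNN
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```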
### Exploration

- Linear epsilon decay per episode: `1.0 → 0.05` over ~750 episodes.
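
A per-episode schedule matching that bullet (a sketch; the constant names are illustrative):

```python
EPS_START, EPS_END, DECAY_EPISODES = 1.0, 0.05, 750

def epsilon_for(episode):
    """Linearly anneal epsilon from 1.0 to 0.05 over ~750 episodes, then hold."""
    frac = min(episode / DECAY_EPISODES, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)
```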
### Rewards *(scaled small for stability)*

- **Eat**: `step −0.01`, `catch +1.0`, `miss −1.0`
- **Avoid**: `survival +0.001` per step, `near-miss up to −0.25`, `collision −5.0`

### Environment

- **Pygame**: the player moves along the bottom edge while multiple enemies fall from the top.
- Dependencies:
  * Python 3.8
  * TensorFlow 2.x (e.g., 2.9)
  * NumPy
  * scikit-image (for pixel preprocessing)
  * Pygame