first version

Files changed (4)

README.md +43 -0
cliffWalking_qtable.npy +3 -0
replay.mp4 +0 -0
train.py +93 -0

README.md ADDED

	@@ -0,0 +1,43 @@

+---
+tags:
+- reinforcement-learning
+- q-learning
+- gymnasium
+- cliffwalking
+library_name: gymnasium
+license: apache-2.0
+---
+# CliffWalking Q-Learning Agent
+This repository contains a tabular Q-learning agent trained on the **CliffWalking-v0** environment from **Gymnasium**. The agent starts in the bottom-left corner of a 4 × 12 grid and must reach the goal in the bottom-right corner without stepping into the cliff that lines the bottom edge between them. Every move costs -1, and stepping into the cliff costs -100 and sends the agent back to the start. Actions are chosen with epsilon-greedy exploration, and the Q-table is updated after every state-action-reward transition.
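+
+Written out, the per-transition update is the standard tabular Q-learning rule that `update_q_value` in `train.py` applies, with learning rate $\alpha$ and discount factor $\gamma$:
+
+$$
+Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
+$$
+
+Here $s'$ is the state reached after taking action $a$ in state $s$ and receiving reward $r$. The action itself comes from the epsilon-greedy rule: with probability $\varepsilon$ a uniformly random action, otherwise $\arg\max_a Q(s, a)$.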
+## Files:
+- `train.py`: The main script that trains the Q-learning agent.
+- `cliffWalking_qtable.npy`: The Q-table saved after training, a 48 × 4 NumPy array (one row per grid cell, one column per action).
+- `replay.mp4`: A video of the trained agent following its greedy policy (no exploration).
+## Training Details:
+- **Environment**: `CliffWalking-v0` (Gymnasium)
+- **Episodes**: 30,000
+- **Learning Rate (α)**: 0.2
+- **Discount Factor (γ)**: 0.97
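+- **Epsilon (ε)**: 0.2, held constant for every episode (probability of taking a random action instead of the greedy one)
+
+Epsilon never decays, so the agent keeps sampling random actions with probability 0.2 for the whole run. Because Q-learning is off-policy, the table still learns the values of the greedy policy, which typically settles on the shortest route along the cliff edge; `replay.mp4` shows that greedy policy with exploration switched off.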
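+
+As a worked example with the hyperparameters above ($\alpha = 0.2$, $\gamma = 0.97$, $\varepsilon = 0.2$), consider the first updates made while the Q-table is still all zeros, so that $\max_{a'} Q(s', a') = 0$:
+
+$$
+\begin{aligned}
+\text{ordinary move } (r = -1):&\quad Q(s, a) \leftarrow 0 + 0.2\,[-1 + 0.97 \cdot 0 - 0] = -0.2 \\
+\text{move into the cliff } (r = -100):&\quad Q(s, a) \leftarrow 0 + 0.2\,[-100 + 0.97 \cdot 0 - 0] = -20
+\end{aligned}
+$$
+
+On these first updates a cliff step lowers an action's value 100 times as much as an ordinary step. On the exploration side, `env.action_space.sample()` draws from all four actions, including the greedy one, so when the greedy action is unique it is taken with probability $1 - \varepsilon + \varepsilon/4 = 0.85$ and each other action with probability $\varepsilon/4 = 0.05$.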
+## How to Run:
+### 1. Install Dependencies:
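+Make sure you have the required packages installed. The `toy-text` extra pulls in `pygame`, which Gymnasium uses to render CliffWalking frames for the replay video; the quotes stop shells such as zsh from expanding the square brackets:
+```bash
+pip install "gymnasium[toy-text]" numpy "imageio[ffmpeg]"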
+```
+### 2. Training the Agent:
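+Run `train.py` to train the agent. The script periodically prints the Q-table, saves the final table to `cliffWalking_qtable.npy`, and records a greedy rollout of the trained agent to `replay.mp4`: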
+```bash
+python train.py
+```
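
A minimal sketch for inspecting the saved Q-table without re-running training. It assumes only NumPy, the 4 × 12 grid, and the action numbering Gymnasium documents for CliffWalking (0 = up, 1 = right, 2 = down, 3 = left); the helper `greedy_policy_grid` is hypothetical and not part of `train.py`. It prints the greedy action of every cell as an arrow:

```python
import os

import numpy as np

# Assumed CliffWalking action order: 0=up, 1=right, 2=down, 3=left
ARROWS = np.array(["^", ">", "v", "<"])


def greedy_policy_grid(qtable, n_rows=4, n_cols=12):
    """Map each state's highest-valued action to an arrow, laid out as the grid."""
    best_actions = np.argmax(qtable, axis=1)  # ties resolve to the lowest index
    return ARROWS[best_actions].reshape(n_rows, n_cols)


QTABLE_PATH = "cliffWalking_qtable.npy"  # file written by save_qtable() in train.py
if os.path.exists(QTABLE_PATH):
    grid = greedy_policy_grid(np.load(QTABLE_PATH))
    for row in grid:
        print(" ".join(row))
```

The agent never occupies a cliff cell and never acts from the goal, so those rows stay at zero and show the tie-break arrow `^`; only the remaining cells reflect learned behaviour.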

cliffWalking_qtable.npy ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1e5660a72690ef4ad817dc09eca6744adcf58d4b3f51efcdc85bbdb0d94af9b3
+size 1664

replay.mp4 ADDED

Binary file (47.5 kB).

train.py ADDED

	@@ -0,0 +1,93 @@

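+import gymnasium as gym
+import numpy as np
+import imageio
+
+# Hyperparameters (also listed in README.md)
+NUMBER_OF_EPISODES = 30000
+LEARNING_RATE = 0.2     # alpha: step size of each Q-value update
+DISCOUNT_FACTOR = 0.97  # gamma: weight given to future rewards
+EPSILON = 0.2           # probability of taking a random action (never decayed)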
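+
+
+def initialize_environment():
+    """Create CliffWalking-v0 and report the sizes of its discrete spaces."""
+    env = gym.make('CliffWalking-v0')
+    state_size = env.observation_space.n  # 48 cells of the 4x12 grid
+    action_size = env.action_space.n      # 4 moves: up, right, down, left
+    print(f"State size: {state_size}, Action size: {action_size}")
+    return env, state_size, action_size
+
+
+def initialize_q_table(state_size, action_size):
+    # One row per state, one column per action, all estimates start at zero
+    return np.zeros((state_size, action_size))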
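+
+
+def epsilon_greedy_action_selection(state, qtable, env, epsilon):
+    # Explore: with probability epsilon, sample a uniformly random action
+    if np.random.uniform(0, 1) < epsilon:
+        return env.action_space.sample()
+    # Exploit: otherwise take the action with the highest Q-value (ties -> lowest index)
+    return np.argmax(qtable[state, :])
+
+
+def update_q_value(current_state, action, reward, next_state, qtable, learning_rate, discount_factor):
+    # Q-learning target: reward + gamma * max over next-state actions.
+    # The goal row is never updated (episodes end on arrival), so it stays
+    # zero and adds no future value when next_state is the goal.
+    future_q_value = np.max(qtable[next_state, :])
+    current_q_value = qtable[current_state, action]
+    td_error = reward + discount_factor * future_q_value - current_q_value
+    qtable[current_state, action] = current_q_value + learning_rate * td_error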
+
+
+def train_agent(env, qtable, num_episodes, learning_rate, discount_factor, epsilon):
+    np.set_printoptions(precision=2, suppress=True)
+    for episode_nr in range(num_episodes):
+        current_state, _ = env.reset()
+        done = False
+        while not done:
+            action = epsilon_greedy_action_selection(current_state, qtable, env, epsilon)
+            # Gymnasium's step() returns (obs, reward, terminated, truncated, info)
+            next_state, reward, terminated, truncated, _ = env.step(action)
+            update_q_value(current_state, action, reward, next_state, qtable, learning_rate, discount_factor)
+            current_state = next_state
+            # Stop on reaching the goal or on a time-limit cut-off
+            done = terminated or truncated
+        # Print a snapshot of the Q-table after every 10,000 completed episodes
+        if (episode_nr + 1) % 10000 == 0:
+            print(f"\nQ-table after episode {episode_nr + 1}:")
+            print(qtable)
+    return qtable
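+
+
+def save_qtable(filename, qtable):
+    np.save(filename, qtable)
+    print(f"Q-table saved as {filename}")
+
+
+def create_replay_video(env, qtable, filename="replay.mp4", max_steps=100):
+    # Roll out the greedy policy (no exploration) and record every frame
+    frames = []
+    current_state, _ = env.reset()
+    done = False
+    steps = 0
+    # max_steps guards against an endless loop if the greedy policy never reaches the goal
+    while not done and steps < max_steps:
+        frames.append(env.render())
+        action = np.argmax(qtable[current_state, :])
+        current_state, _, terminated, truncated, _ = env.step(action)
+        done = terminated or truncated
+        steps += 1
+    frames.append(env.render())  # capture the agent's position after the last step
+    env.close()
+    with imageio.get_writer(filename, fps=10) as video:
+        for frame in frames:
+            video.append_data(frame)
+    print(f"Video saved as {filename}")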
+
+
+def main():
+    env, state_size, action_size = initialize_environment()
+    qtable = initialize_q_table(state_size, action_size)
+    qtable = train_agent(env, qtable, NUMBER_OF_EPISODES, LEARNING_RATE, DISCOUNT_FACTOR, EPSILON)
+    env.close()
+    save_qtable("cliffWalking_qtable.npy", qtable)
+    # Rendering needs a separate environment created with render_mode="rgb_array"
+    env = gym.make('CliffWalking-v0', render_mode="rgb_array")
+    create_replay_video(env, qtable)
+
+
+if __name__ == "__main__":
+    main()