---
title: Upside-Down Reinforcement Learning
emoji: 🤖
colorFrom: green
colorTo: gray
sdk: streamlit
python_version: '3.10'
sdk_version: 1.39.0
app_file: app.py
pinned: true
short_description: Upside-Down Reinforcement Learning (UDRL)
tags:
  - reinforcement learning
  - RL
  - upside-down reinforcement learning
  - interpretability
  - explainable AI
---

# Upside-Down RL


This project implements an Upside-Down Reinforcement Learning (UDRL) agent.

This is the codebase for the paper: arXiv

The website associated with it is: demo

## Installation

1. Make sure you have Python 3.10 installed. You can check your version with `python --version`. **Note:** use a virtual environment to avoid dependency clashes.
2. Install the project dependencies using Poetry:

   ```bash
   poetry install
   ```

   If you do not have Poetry, use pip to install the requirements instead (see the virtual-environment sketch after this list):

   ```bash
   pip install -r requirements.txt
   ```
    
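If you take the pip route, here is a minimal sketch of the full setup inside a fresh virtual environment (standard Python tooling, nothing project-specific assumed):

```bash
# Create and activate an isolated environment, then install the requirements
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```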

## Running the Experiment

You can run the experiment with various configuration options using the command line:

```bash
poetry run python -m udrl [options]
```

**Note:** if you are already inside a virtual environment, `python -m udrl [options]` is enough.

**Note:** all defaults are for CartPole-v0.

Available options include (an example run follows the list):

- `--env_name`: Name of the Gym environment (default: `CartPole-v0`)
- `--estimator_name`: `"neural"` for a neural network, or the fully qualified name of a scikit-learn estimator class (default: `ensemble.RandomForestClassifier`)
- `--seed`: Random seed (default: 42)
- `--max_episode`: Maximum number of training episodes (default: 500)
- `--collect_episode`: Episodes to collect between training rounds (default: 15)
- `--batch_size`: Batch size for training (default: 0, which uses the entire replay buffer)
- Other options related to warm-up, memory size, exploration, testing, saving, etc.
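For example, a run that combines the flags listed above might look like this (the values are illustrative, not recommended settings):

```bash
# Illustrative run: CartPole with a scikit-learn random forest as the estimator
poetry run python -m udrl \
    --env_name CartPole-v0 \
    --estimator_name ensemble.RandomForestClassifier \
    --seed 123 \
    --max_episode 300 \
    --collect_episode 15
```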

## Result Data

- Experiment configuration and final test results are saved in a JSON file (`conf.json`) within a directory structure based on the environment, seed, and non-default configuration values (e.g., `data/[env-name]/[experiment_name]/[seed]/conf.json`).
- If `save_policy` is `True`, the trained policy is saved in the same directory (`policy`).
- If `save_learning_infos` is `True`, the learning information and the rewards collected during training are saved in the same directory as a NumPy file (e.g., `test_rewards.npy`) and a JSON file (e.g., `learning_infos.json`). A loading sketch follows this list.
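As a quick way to inspect a finished run, here is a minimal Python sketch that loads these files; the `default` segment of the path is hypothetical, since the actual experiment name depends on your non-default options:

```python
import json
from pathlib import Path

import numpy as np

# Hypothetical run directory following data/[env-name]/[experiment_name]/[seed]
run_dir = Path("data/CartPole-v0/default/42")

# Experiment configuration and final test results
with open(run_dir / "conf.json") as f:
    conf = json.load(f)
print(conf)

# Per-episode test rewards (only saved when save_learning_infos is True)
rewards = np.load(run_dir / "test_rewards.npy")
print(f"mean test reward: {rewards.mean():.2f}")
```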

## Process Data

- A basic post-processing step is available to convert the result data into CSV files; run it with `python -m udrl.data_proc`. A pandas sketch for inspecting the output follows.
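Once the CSVs exist, they can be inspected with pandas; the path below is hypothetical, since the exact output layout depends on the script:

```python
import pandas as pd

# Hypothetical output path: adjust to wherever udrl.data_proc writes its CSVs
df = pd.read_csv("data/CartPole-v0/results.csv")
print(df.describe())
```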

## Project Structure

- `data`: Stores experiment results and other data.
- `old_code`: Contains previous code versions (not used in the current setup).
- `poetry.lock`, `pyproject.toml`: Manage project dependencies and configuration.
- `README.md`: This file.
- `udrl`: Contains the main Python modules for the UDRL agent.

Please refer to the code and comments for further details on the implementation.

## Troubleshooting

If you encounter any errors during installation or execution, or if you have any questions about the project, feel free to reach out to me at [email protected] or open an issue. I'll be happy to assist you!