A newer version of the Streamlit SDK is available:
1.48.1
metadata
title: Upside-Down Reinforcement Learning
emoji: 🤖
colorFrom: green
colorTo: gray
sdk: streamlit
python_version: '3.10'
sdk_version: 1.39.0
app_file: app.py
pinned: true
short_description: Upside-Down Reinforcement Learning (UDRL)
tags:
- reinforcement learning
- RL
- upside-down reinforcement learning
- interpretability
- explainable AI
Upside-Down RL

This project implements an Upside-Down Reinforcement Learning (UDRL) agent.
This is the codebase of the paper: arXiv
The website associated with it is: demo
Installation
- Make sure you have Python 3.10 installed. You can check your version with
python --version
. NOTE Use a virtual env to avoid dependency clash - Install the project dependencies using Poetry:
If you do not have poetry use pip to install the requirements like so:poetry install
pip install -r requirements.txt
Running the Experiment
You can run the experiment with various configuration options using the command line:
poetry run python -m udrl [options]
Note If you are already inside a virtual env python -m udrl [options]
is enough
Note All defaults are for the CartPole-v0
Available options include:
--env_name
: Name of the Gym environment (default:CartPole-v0
)--estimator_name
: "neural" for NN or a fully qualified name of the scikit-learn estimator class (default:ensemble.RandomForestClassifier
)--seed
: Random seed (default:42
)--max_episode
: Maximum training episodes (default:500
)--collect_episode
: Episodes to collect between training (default:15
)--batch_size
: Batch size for training (default:0
, uses entire replay buffer)- Other options related to warm-up, memory size, exploration, testing, saving, etc.
Result Data
- Experiment configuration and final test results are saved in a JSON file (
conf.json
) within a directory structure based on the environment, seed, and non-default configuration values (e.g.,data/[env-name]/[experiment_name]/[seed]/conf.json
). - If
save_policy
is True, the trained policy is saved in the same directory (policy
). - If
save_learning_infos
is True, learning infos and rewards during training are saved as a NumPy file (e.g.test_rewards.npy
) and a json file (e.h.learning_infos.json
) in the same directory.
Process Data
- A base post processing is available to convert the results data in csvs run it as
python -m udrl.data_proc
Project Structure
data
: Stores experiment results and other data.old_code
: Contains previous code versions (not used in the current setup).poetry.lock
,pyproject.toml
: Manage project dependencies and configuration.README.md
: This file.udrl
: Contains the main Python modules for the UDRL agent.
Please refer to the code and comments for further details on the implementation.
Troubleshooting
If you encounter any errors during installation or execution, or if you have any questions about the project, feel free to reach out to me at [email protected] or open an issue. I'll be happy to assist you!