|
--- |
|
title: Upside-Down Reinforcement Learning |
|
emoji: 🤖 |
|
colorFrom: green |
|
colorTo: gray |
|
sdk: streamlit |
|
python_version: "3.10" |
|
sdk_version: 1.39.0 |
|
app_file: app.py |
|
pinned: true |
|
short_description: Upside-Down Reinforcement Learning (UDRL) |
|
tags: |
|
- reinforcement learning |
|
- RL |
|
- upside-down reinforcement learning |
|
- interpretability |
|
- explainable AI |
|
--- |
|
|
|
# Upside-Down RL |
|
|
|
<img alt="Version" src="./website_photo.jpg" /> |
|
|
|
|
|
This project implements an Upside-Down Reinforcement Learning (UDRL) agent. |
|
|
|
This is the codebase for the paper: [arXiv:2411.11457](https://arxiv.org/abs/2411.11457)
|
|
|
An interactive demo is available here: [demo](https://vimmoos-udrl.hf.space/)
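
The demo is a Streamlit app served from `app.py` (see the metadata at the top of this file). Assuming the dependencies below are installed, it should also run locally with:

```bash
# Launch the demo locally (assumes dependencies are installed; see Installation)
streamlit run app.py
```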
|
|
|
|
|
### Installation |
|
|
|
1. Make sure you have Python 3.10 installed. You can check your version with `python --version`. |
|
   **Note:** Use a virtual environment to avoid dependency clashes (see the example after the install commands below).
|
2. Install the project dependencies using Poetry: |
|
```bash |
|
poetry install |
|
``` |
|
   If you do not have Poetry, install the requirements with pip instead:
|
```bash |
|
pip install -r requirements.txt |
|
``` |
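
For example, a virtual environment can be created with Python's built-in `venv` module before installing anything (the `.venv` name is just a convention):

```bash
# Create and activate a virtual environment (Linux/macOS; on Windows use .venv\Scripts\activate)
python -m venv .venv
source .venv/bin/activate
```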
|
|
|
|
|
### Running the Experiment |
|
|
|
You can run the experiment with various configuration options using the command line: |
|
|
|
```bash |
|
poetry run python -m udrl [options] |
|
``` |
|
**Note:** If you are already inside a virtual environment, `python -m udrl [options]` is enough.
|
**Note:** All defaults are for the `CartPole-v0` environment.
|
Available options include: |
|
|
|
* `--env_name`: Name of the Gym environment (default: `CartPole-v0`) |
|
* `--estimator_name`: `"neural"` for a neural network, or the fully qualified name of a scikit-learn estimator class (default: `ensemble.RandomForestClassifier`)
|
* `--seed`: Random seed (default: `42`) |
|
* `--max_episode`: Maximum training episodes (default: `500`) |
|
* `--collect_episode`: Episodes to collect between training (default: `15`) |
|
* `--batch_size`: Batch size for training (default: `0`, uses entire replay buffer) |
|
* Other options related to warm-up, memory size, exploration, testing, saving, etc. |
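
For example, a run combining a few of the options above might look like this (the values are only illustrative):

```bash
# Illustrative run: random-forest estimator on CartPole-v0 with a fixed seed
poetry run python -m udrl \
    --env_name CartPole-v0 \
    --estimator_name ensemble.RandomForestClassifier \
    --seed 42 \
    --max_episode 200 \
    --collect_episode 15
```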
|
|
|
|
|
### Result Data |
|
|
|
* Experiment configuration and final test results are saved in a JSON file (`conf.json`) within a directory structure based on the environment, seed, and non-default configuration values (e.g., `data/[env-name]/[experiment_name]/[seed]/conf.json`). |
|
* If `save_policy` is `True`, the trained policy is saved in the same directory (as `policy`).

* If `save_learning_infos` is `True`, learning information and rewards collected during training are saved as a NumPy file (e.g. `test_rewards.npy`) and a JSON file (e.g. `learning_infos.json`) in the same directory.
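
As a sketch, assuming a run on `CartPole-v0` with seed `42` and an experiment name of `default` (the actual directory names depend on your configuration), the saved files can be inspected like this:

```bash
# Hypothetical paths; substitute your own [env-name]/[experiment_name]/[seed]
cat data/CartPole-v0/default/42/conf.json
python -c "import numpy as np; print(np.load('data/CartPole-v0/default/42/test_rewards.npy'))"
```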
|
|
|
### Process Data |
|
* A basic post-processing step is available to convert the result data into CSV files; run it with `python -m udrl.data_proc`.
|
|
|
### Project Structure |
|
|
|
* `data`: Stores experiment results and other data. |
|
* `old_code`: Contains previous code versions (not used in the current setup). |
|
* `poetry.lock`, `pyproject.toml`: Manage project dependencies and configuration. |
|
* `README.md`: This file. |
|
* `udrl`: Contains the main Python modules for the UDRL agent. |
|
|
|
Please refer to the code and comments for further details on the implementation. |
|
|
|
|
|
|
|
## Troubleshooting |
|
|
|
If you encounter any errors during installation or execution, or if you have any questions about the project, feel free to reach out to me at [[email protected]](mailto:[email protected]) or open an issue. I'll be happy to assist you! |
|
|