Commit fada987 · 0 parent(s)
Initial QP-RNN interactive demo for Hugging Face Spaces
- .gitignore +10 -0
- README.md +62 -0
- app.py +220 -0
- requirements.txt +5 -0
.gitignore
ADDED
@@ -0,0 +1,10 @@
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.png
+*.jpg
+*.jpeg
+*.gif
+.DS_Store
README.md
ADDED
@@ -0,0 +1,62 @@
+---
+title: QP-RNN Interactive Demo
+emoji: 🎮
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 4.19.2
+app_file: app.py
+pinned: false
+license: mit
+---
+
+# QP-RNN: Quadratic Programming Recurrent Neural Network
+
+This is an interactive demo for the paper ["MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control"](https://arxiv.org/abs/2312.05332) (L4DC 2024).
+
+## What is QP-RNN?
+
+QP-RNN is a novel neural network architecture that combines:
+- 🎯 **Structure** of Model Predictive Control (MPC)
+- 🧠 **Learning** capabilities of Deep Reinforcement Learning
+- ✅ **Verifiable** properties (stability, constraint satisfaction)
+
+At each time step, QP-RNN solves a parameterized Quadratic Program:
+```
+min 0.5 * y'Py + q(x)'y
+s.t. -1 ≤ Hy + b(x) ≤ 1
+```
+
+Where the parameters (P, H, q, b) are learned through RL instead of derived from a model.
+
+## Demo Features
+
+This interactive demo lets you:
+- 🎮 Control a double integrator system with QP-RNN
+- 🔧 Adjust controller parameters in real-time
+- 📊 Visualize system response and phase portraits
+- 📈 See performance metrics and constraint satisfaction
+
+## Key Advantages
+
+1. **Interpretable**: QP structure provides clear understanding
+2. **Verifiable**: Enables formal stability and safety analysis
+3. **Efficient**: Fixed-iteration solver suitable for real-time control
+4. **Robust**: Handles constraints and disturbances naturally
+
+## Links
+
+- 📄 [Paper](https://arxiv.org/abs/2312.05332)
+- 💻 [GitHub Repository](https://github.com/yiwenlu66/learning-qp)
+- 🤖 [Full Training Code](https://github.com/yiwenlu66/learning-qp)
+
+## Citation
+
+```bibtex
+@InProceedings{lu2024mpc,
+  title={MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control},
+  author={Lu, Yiwen and Li, Zishuo and Zhou, Yihan and Li, Na and Mo, Yilin},
+  booktitle={Proceedings of the 6th Conference on Learning for Dynamics and Control},
+  year={2024}
+}
+```
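As a rough illustration of the QP stated in the README, the sketch below handles only the special case H = I and b(x) = 0, i.e. a plain box constraint on y, and runs a fixed number of projected-gradient iterations. Projected gradient descent is a stand-in chosen for brevity and is not necessarily the solver used in the paper or the linked repository; `solve_box_qp` and `n_iter` are hypothetical names introduced here.

```python
import numpy as np

def solve_box_qp(P, q, n_iter=50):
    """Approximately solve min 0.5*y'Py + q'y  s.t. -1 <= y <= 1.

    Assumes P is symmetric positive definite, H = I and b = 0
    (a simplification of the README's general constraint).
    """
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    # Step size 1/L, where L is the largest eigenvalue of P.
    step = 1.0 / np.linalg.eigvalsh(P).max()
    y = np.zeros_like(q)
    for _ in range(n_iter):  # fixed iteration budget, no convergence test
        # Gradient step on the quadratic cost, then projection onto the box.
        y = np.clip(y - step * (P @ y + q), -1.0, 1.0)
    return y

# Scalar case with the demo's default values: P = 10, q = 3.0 * 2.0.
print(solve_box_qp([[10.0]], [6.0]))
```

In the scalar case the result coincides with clamping the unconstrained minimizer -q/P to [-1, 1], which is what `app.py` computes in closed form.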
app.py
ADDED
@@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+"""
+Gradio app for QP-RNN interactive demo.
+Suitable for deployment on Hugging Face Spaces.
+"""
+
+import gradio as gr
+import torch
+import numpy as np
+import matplotlib
+matplotlib.use('Agg')  # Use non-interactive backend
+import matplotlib.pyplot as plt
+from io import BytesIO
+import base64
+
+class MinimalQPRNN(torch.nn.Module):
+    """Minimal QP-RNN for demonstration."""
+
+    def __init__(self, position_gain=3.0, velocity_gain=1.5, control_cost=10.0):
+        super().__init__()
+        self.P = torch.tensor([[control_cost]], dtype=torch.float32)
+        self.K = torch.tensor([position_gain, velocity_gain], dtype=torch.float32)
+
+    def forward(self, state, reference=None):
+        if reference is None:
+            reference = torch.zeros_like(state)
+        error = state - reference
+        q = torch.sum(self.K * error, dim=-1, keepdim=True)  # linear cost term q = K @ error
+        u_unconstrained = -q / self.P  # minimizer of 0.5 * P * u^2 + q * u
+        u = torch.clamp(u_unconstrained, -1.0, 1.0)  # exact solution for a scalar QP with a box constraint
+        return u
+
+def simulate_system(position_gain, velocity_gain, control_cost,
+                    initial_position, initial_velocity,
+                    target_position, simulation_time):
+    """Run simulation with given parameters."""
+
+    # Create controller
+    controller = MinimalQPRNN(position_gain, velocity_gain, control_cost)
+
+    # Setup
+    dt = 0.05
+    T = int(simulation_time / dt)
+    x0 = torch.tensor([initial_position, initial_velocity], dtype=torch.float32)  # force float even for integer slider values
+    x_ref = torch.tensor([target_position, 0.0], dtype=torch.float32)
+
+    # Simulate
+    states = [x0.numpy()]
+    controls = []
+    x = x0.clone()
+
+    for t in range(T):
+        u = controller(x, x_ref)
+        x_next = torch.zeros_like(x)
+        x_next[0] = x[0] + x[1] * dt  # forward-Euler step of the double integrator
+        x_next[1] = x[1] + u.item() * dt
+        states.append(x_next.numpy())
+        controls.append(u.item())
+        x = x_next
+
+    return np.array(states), np.array(controls), dt
+
+def create_plots(states, controls, dt, target_position=0.0):
+    """Create visualization plots."""
+    time = np.arange(len(states)) * dt
+    time_control = time[:-1]
+
+    # Create figure with subplots
+    fig = plt.figure(figsize=(12, 10))
+
+    # Position subplot
+    ax1 = plt.subplot(3, 2, 1)
+    ax1.plot(time, states[:, 0], 'b-', linewidth=2)
+    ax1.axhline(y=target_position, color='r', linestyle='--', alpha=0.5)  # reference line at the target
+    ax1.set_ylabel('Position')
+    ax1.set_title('Position vs Time')
+    ax1.grid(True, alpha=0.3)
+
+    # Velocity subplot
+    ax2 = plt.subplot(3, 2, 2)
+    ax2.plot(time, states[:, 1], 'g-', linewidth=2)
+    ax2.axhline(y=0, color='r', linestyle='--', alpha=0.5)
+    ax2.set_ylabel('Velocity')
+    ax2.set_title('Velocity vs Time')
+    ax2.grid(True, alpha=0.3)
+
+    # Control subplot
+    ax3 = plt.subplot(3, 2, 3)
+    ax3.plot(time_control, controls, 'r-', linewidth=2)
+    ax3.axhline(y=1, color='k', linestyle=':', alpha=0.5)
+    ax3.axhline(y=-1, color='k', linestyle=':', alpha=0.5)
+    ax3.set_ylabel('Control Input')
+    ax3.set_xlabel('Time (s)')
+    ax3.set_title('Control Input vs Time')
+    ax3.grid(True, alpha=0.3)
+    ax3.set_ylim(-1.2, 1.2)
+
+    # Phase portrait
+    ax4 = plt.subplot(3, 2, 4)
+    ax4.plot(states[:, 0], states[:, 1], 'b-', linewidth=2)
+    ax4.scatter([states[0, 0]], [states[0, 1]], color='green', s=100, marker='o', label='Start')
+    ax4.scatter([states[-1, 0]], [states[-1, 1]], color='red', s=100, marker='x', label='End')
+    ax4.set_xlabel('Position')
+    ax4.set_ylabel('Velocity')
+    ax4.set_title('Phase Portrait')
+    ax4.legend()
+    ax4.grid(True, alpha=0.3)
+
+    # QP visualization
+    ax5 = plt.subplot(3, 2, 5)
+    # Show how control saturates
+    time_saturated = np.sum(np.abs(controls) >= 0.99) / len(controls) * 100
+    labels = ['Saturated', 'Unsaturated']
+    sizes = [time_saturated, 100 - time_saturated]
+    colors = ['red', 'blue']
+    ax5.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%')
+    ax5.set_title('Control Saturation')
+
+    # Metrics text
+    ax6 = plt.subplot(3, 2, 6)
+    ax6.axis('off')
+    metrics_text = f"""Performance Metrics:
+
+Final Position Error: {abs(states[-1, 0] - target_position):.4f}
+Final Velocity: {states[-1, 1]:.4f}
+Control Effort (L1): {np.sum(np.abs(controls)):.2f}
+Control Effort (L2): {np.sqrt(np.sum(controls**2)):.2f}
+Simulated Time: {time[-1]:.1f}s
+Peak |Position|: {np.max(np.abs(states[:, 0])):.2f}
+"""
+    ax6.text(0.1, 0.5, metrics_text, fontsize=12, verticalalignment='center',
+             fontfamily='monospace', bbox=dict(boxstyle="round,pad=0.5", facecolor="lightgray"))
+
+    plt.suptitle('QP-RNN Control Simulation Results', fontsize=16)
+    plt.tight_layout()
+
+    return fig
+
+def run_qp_rnn_demo(position_gain, velocity_gain, control_cost,
+                    initial_position, initial_velocity,
+                    target_position, simulation_time):
+    """Main function for Gradio interface."""
+
+    # Run simulation
+    states, controls, dt = simulate_system(
+        position_gain, velocity_gain, control_cost,
+        initial_position, initial_velocity,
+        target_position, simulation_time
+    )
+
+    # Create plots
+    fig = create_plots(states, controls, dt, target_position)
+
+    # Create description
+    description = f"""
+### QP-RNN Control Results
+
+The QP-RNN controller solves the following optimization problem at each time step:
+
+```
+min 0.5 * u² * {control_cost} + u * (K @ error)
+s.t. -1 ≤ u ≤ 1
+```
+
+Where K = [{position_gain}, {velocity_gain}] are the feedback gains.
+
+**Final State:** Position = {states[-1, 0]:.3f}, Velocity = {states[-1, 1]:.3f}
+
+**Key Features:**
+- Guaranteed constraint satisfaction (control always in [-1, 1])
+- Interpretable structure (quadratic cost + linear feedback)
+- Can be trained via RL for complex tasks
+"""
+
+    return fig, description
+
+# Create Gradio interface
+iface = gr.Interface(
+    fn=run_qp_rnn_demo,
+    inputs=[
+        gr.Slider(0.1, 10.0, value=3.0, label="Position Gain (Kp)",
+                  info="Higher values = faster position correction"),
+        gr.Slider(0.1, 5.0, value=1.5, label="Velocity Gain (Kv)",
+                  info="Higher values = more damping"),
+        gr.Slider(0.1, 50.0, value=10.0, label="Control Cost",
+                  info="Higher values = less aggressive control"),
+        gr.Slider(-5.0, 5.0, value=2.0, label="Initial Position"),
+        gr.Slider(-2.0, 2.0, value=0.0, label="Initial Velocity"),
+        gr.Slider(-3.0, 3.0, value=0.0, label="Target Position"),
+        gr.Slider(1.0, 10.0, value=5.0, label="Simulation Time (s)")
+    ],
+    outputs=[
+        gr.Plot(label="Simulation Results"),
+        gr.Markdown(label="Analysis")
+    ],
+    title="QP-RNN: Quadratic Programming Recurrent Neural Network Demo",
+    description="""
+This interactive demo shows how QP-RNN controllers work for a simple double integrator system.
+
+**What is QP-RNN?**
+- Combines Model Predictive Control structure with Deep Reinforcement Learning
+- Learns to solve a parameterized Quadratic Program (QP) to generate control actions
+- Provides theoretical guarantees (constraint satisfaction, stability verification)
+
+**Try adjusting the parameters** to see how they affect control performance!
+
+Paper: [MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control](https://arxiv.org/abs/2312.05332)
+""",
+    examples=[
+        [3.0, 1.5, 10.0, 2.0, 0.0, 0.0, 5.0],  # Default
+        [5.0, 2.0, 5.0, 2.0, 0.0, 0.0, 5.0],  # Aggressive
+        [1.0, 0.5, 20.0, 2.0, 0.0, 0.0, 5.0],  # Conservative
+        [3.0, 0.1, 10.0, 2.0, 0.0, 0.0, 5.0],  # Underdamped
+        [3.0, 3.0, 10.0, 2.0, 0.0, 0.0, 5.0],  # Overdamped
+    ],
+    cache_examples=True
+)
+
+if __name__ == "__main__":
+    iface.launch()
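The metrics panel in `app.py` reports the length of the simulated window, not a measured settling time. The NumPy-only sketch below reproduces the same forward-Euler double-integrator loop without Gradio or PyTorch and shows one way a settling time could be measured. The names `simulate` and `settling_time`, and the 0.05 tolerance band, are illustrative assumptions and are not part of this commit.

```python
import numpy as np

def simulate(kp=3.0, kv=1.5, cost=10.0, x0=(2.0, 0.0), target=0.0,
             t_end=5.0, dt=0.05):
    """Closed-loop double integrator under the clamped linear feedback law."""
    pos, vel = x0
    traj = [(pos, vel)]
    for _ in range(int(round(t_end / dt))):
        # Closed-form scalar QP solution: clamp the unconstrained minimizer.
        u = np.clip(-(kp * (pos - target) + kv * vel) / cost, -1.0, 1.0)
        # Forward-Euler update; the position update uses the old velocity.
        pos, vel = pos + vel * dt, vel + u * dt
        traj.append((pos, vel))
    return np.array(traj)

def settling_time(positions, target, dt, band=0.05):
    """Time after which |position - target| stays within `band`.

    Returns None if the trajectory is still outside the band at the end.
    """
    outside = np.flatnonzero(np.abs(positions - target) > band)
    if outside.size == 0:
        return 0.0
    if outside[-1] == len(positions) - 1:
        return None
    return (outside[-1] + 1) * dt

traj = simulate()
print(settling_time(traj[:, 0], target=0.0, dt=0.05))
```

Returning `None` when the trajectory has not settled avoids reporting the window length as if it were a settling time.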
requirements.txt
ADDED
@@ -0,0 +1,5 @@
+# Requirements for Hugging Face Spaces demo
+torch>=2.0.0
+numpy<2.0.0
+matplotlib>=3.6.0
+gradio>=4.0.0