mcfloundinho committed on
Commit fada987 · 0 parent(s)

Initial QP-RNN interactive demo for Hugging Face Spaces

Files changed (4)
  1. .gitignore +10 -0
  2. README.md +62 -0
  3. app.py +220 -0
  4. requirements.txt +5 -0
.gitignore ADDED
@@ -0,0 +1,10 @@
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.png
+*.jpg
+*.jpeg
+*.gif
+.DS_Store
README.md ADDED
@@ -0,0 +1,62 @@
+---
+title: QP-RNN Interactive Demo
+emoji: 🎮
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 4.19.2
+app_file: app.py
+pinned: false
+license: mit
+---
+
+# QP-RNN: Quadratic Programming Recurrent Neural Network
+
+This is an interactive demo for the paper ["MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control"](https://arxiv.org/abs/2312.05332) (L4DC 2024).
+
+## What is QP-RNN?
+
+QP-RNN is a novel neural network architecture that combines:
+- 🎯 **Structure** of Model Predictive Control (MPC)
+- 🧠 **Learning** capabilities of Deep Reinforcement Learning
+- ✅ **Verifiable** properties (stability, constraint satisfaction)
+
+At each time step, QP-RNN solves a parameterized quadratic program (QP) in the decision variable y:
+```
+min 0.5 * y'Py + q(x)'y
+s.t. -1 ≤ Hy + b(x) ≤ 1
+```
+
+where P and H, together with the state-dependent terms q(x) and b(x), are learned through RL instead of being derived from a system model.
+
+## Demo Features
+
+This interactive demo lets you:
+- 🎮 Control a double integrator system with QP-RNN
+- 🔧 Adjust the controller gains, control cost, initial state, and target, then rerun the simulation
+- 📊 Visualize the system response and phase portrait
+- 📈 See performance metrics and check constraint satisfaction
+
+## Key Advantages
+
+1. **Interpretable**: The policy is an explicit QP whose cost and constraint terms can be inspected directly
+2. **Verifiable**: Enables formal stability and safety analysis
+3. **Efficient**: Fixed-iteration solver suitable for real-time control
+4. **Robust**: Handles constraints and disturbances naturally
+
+## Links
+
+- 📄 [Paper](https://arxiv.org/abs/2312.05332)
+- 💻 [GitHub Repository](https://github.com/yiwenlu66/learning-qp)
+- 🤖 [Full Training Code](https://github.com/yiwenlu66/learning-qp)
+
+## Citation
+
+```bibtex
+@InProceedings{lu2024mpc,
+  title={MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control},
+  author={Lu, Yiwen and Li, Zishuo and Zhou, Yihan and Li, Na and Mo, Yilin},
+  booktitle={Proceedings of the 6th Conference on Learning for Dynamics and Control},
+  year={2024}
+}
+```
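The README describes QP-RNN as solving a parameterized QP at every step with a fixed-iteration solver, but this commit does not ship such a solver: the demo's scalar controller simply clips -q/P to [-1, 1]. As an illustration only, the sketch below assumes H = I and b(x) = 0 (so the feasible set is the box -1 ≤ y ≤ 1) and a symmetric positive definite P, and runs a fixed number of projected-gradient steps. Projected gradient is a generic technique chosen here for simplicity, not necessarily the solver used in the paper, and `box_qp_projected_gradient` is a hypothetical name.

```python
import numpy as np


def box_qp_projected_gradient(P, q, num_iters=100):
    """Approximately solve min_y 0.5 * y'Py + q'y  s.t. -1 <= y <= 1.

    Assumption: H = I and b(x) = 0 (pure box constraint), P symmetric positive definite.
    A fixed iteration count keeps the cost per call constant, as real-time control needs.
    """
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    step = 1.0 / np.linalg.eigvalsh(P).max()  # Step 1/L, L = largest eigenvalue of P
    y = np.zeros_like(q)
    for _ in range(num_iters):
        grad = P @ y + q
        y = np.clip(y - step * grad, -1.0, 1.0)  # Gradient step, then project onto the box
    return y


# Scalar case matching the demo: P = control_cost, q = K @ error
print(box_qp_projected_gradient([[10.0]], [30.0]))

# Diagonal 2-D case: the exact answer is the clipped unconstrained minimizer [0.5, -1]
print(box_qp_projected_gradient(np.diag([2.0, 4.0]), [-1.0, 8.0]))
```

For a scalar (or diagonal) P the problem separates per coordinate, so clipping the unconstrained minimizer is already exact; the iterative form only matters once H or P couples the decision variables.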
app.py ADDED
@@ -0,0 +1,220 @@
+#!/usr/bin/env python3
+"""
+Gradio app for QP-RNN interactive demo.
+Suitable for deployment on Hugging Face Spaces.
+"""
+
+import gradio as gr
+import torch
+import numpy as np
+import matplotlib
+matplotlib.use('Agg')  # Use non-interactive backend
+from matplotlib.figure import Figure  # Build figures without pyplot's global figure registry
+
+class MinimalQPRNN(torch.nn.Module):
+    """Minimal QP-RNN: solves min_u 0.5*P*u**2 + q*u s.t. -1 <= u <= 1, q = K @ (x - x_ref).
+    For this scalar box-constrained QP, clipping the unconstrained minimizer -q/P is exact."""
+
+    def __init__(self, position_gain=3.0, velocity_gain=1.5, control_cost=10.0):
+        super().__init__()
+        self.P = torch.tensor([[control_cost]], dtype=torch.float32)
+        self.K = torch.tensor([position_gain, velocity_gain], dtype=torch.float32)
+
+    def forward(self, state, reference=None):
+        if reference is None:
+            reference = torch.zeros_like(state)
+        error = state - reference
+        q = torch.sum(self.K * error, dim=-1, keepdim=True)  # Linear cost term of the QP
+        u_unconstrained = -q / self.P
+        u = torch.clamp(u_unconstrained, -1.0, 1.0)  # Project onto the box constraint
+        return u
+
+def simulate_system(position_gain, velocity_gain, control_cost,
+                    initial_position, initial_velocity,
+                    target_position, simulation_time):
+    """Simulate the double integrator (position' = velocity, velocity' = u) with forward Euler."""
+    # Create controller
+    controller = MinimalQPRNN(position_gain, velocity_gain, control_cost)
+
+    # Setup
+    dt = 0.05
+    T = int(round(simulation_time / dt))  # round() avoids losing a step to float error
+    x0 = torch.tensor([initial_position, initial_velocity], dtype=torch.float32)
+    x_ref = torch.tensor([target_position, 0.0], dtype=torch.float32)
+
+    # Simulate
+    states = [x0.numpy()]
+    controls = []
+    x = x0.clone()
+
+    for _ in range(T):
+        u = controller(x, x_ref).item()
+        x_next = torch.zeros_like(x)
+        x_next[0] = x[0] + x[1] * dt
+        x_next[1] = x[1] + u * dt
+        states.append(x_next.numpy())
+        controls.append(u)
+        x = x_next
+
+    return np.array(states), np.array(controls), dt
+
+def create_plots(states, controls, dt, target_position=0.0):
+    """Create visualization plots."""
+    time = np.arange(len(states)) * dt
+    time_control = time[:-1]
+    error = states[:, 0] - target_position
+
+    fig = Figure(figsize=(12, 10), layout='constrained')  # Constrained layout also fits the suptitle
+
+    # Position subplot
+    ax1 = fig.add_subplot(3, 2, 1)
+    ax1.plot(time, states[:, 0], 'b-', linewidth=2)
+    ax1.axhline(y=target_position, color='r', linestyle='--', alpha=0.5)  # Target position
+    ax1.set_ylabel('Position')
+    ax1.set_title('Position vs Time')
+    ax1.grid(True, alpha=0.3)
+
+    # Velocity subplot
+    ax2 = fig.add_subplot(3, 2, 2)
+    ax2.plot(time, states[:, 1], 'g-', linewidth=2)
+    ax2.axhline(y=0, color='r', linestyle='--', alpha=0.5)
+    ax2.set_ylabel('Velocity')
+    ax2.set_title('Velocity vs Time')
+    ax2.grid(True, alpha=0.3)
+
+    # Control subplot
+    ax3 = fig.add_subplot(3, 2, 3)
+    ax3.plot(time_control, controls, 'r-', linewidth=2)
+    ax3.axhline(y=1, color='k', linestyle=':', alpha=0.5)
+    ax3.axhline(y=-1, color='k', linestyle=':', alpha=0.5)
+    ax3.set_ylabel('Control Input')
+    ax3.set_xlabel('Time (s)')
+    ax3.set_title('Control Input vs Time')
+    ax3.grid(True, alpha=0.3)
+    ax3.set_ylim(-1.2, 1.2)
+
+    # Phase portrait
+    ax4 = fig.add_subplot(3, 2, 4)
+    ax4.plot(states[:, 0], states[:, 1], 'b-', linewidth=2)
+    ax4.scatter([states[0, 0]], [states[0, 1]], color='green', s=100, marker='o', label='Start')
+    ax4.scatter([states[-1, 0]], [states[-1, 1]], color='red', s=100, marker='x', label='End')
+    ax4.set_xlabel('Position')
+    ax4.set_ylabel('Velocity')
+    ax4.set_title('Phase Portrait')
+    ax4.legend()
+    ax4.grid(True, alpha=0.3)
+
+    # Control saturation: percentage of steps with |u| at the bound
+    ax5 = fig.add_subplot(3, 2, 5)
+    time_saturated = np.sum(np.abs(controls) >= 0.99) / len(controls) * 100
+    sizes = [time_saturated, 100 - time_saturated]
+    ax5.pie(sizes, labels=['Saturated', 'Unsaturated'], colors=['red', 'blue'], autopct='%1.1f%%')
+    ax5.set_title('Control Saturation')
+
+    # Metrics text
+    ax6 = fig.add_subplot(3, 2, 6)
+    ax6.axis('off')
+    # Settling time: first time after which |error| stays within a 2% band (floor 1e-3)
+    outside = np.nonzero(np.abs(error) > max(0.02 * abs(error[0]), 1e-3))[0]
+    if len(outside) == 0 or outside[-1] == len(error) - 1:
+        settling = "0.0s" if len(outside) == 0 else "not settled"
+    else:
+        settling = f"{time[outside[-1] + 1]:.1f}s"
+    overshoot = max(0.0, float(np.max(-np.sign(error[0]) * error)))  # Excursion past the target
+    metrics_text = f"""Performance Metrics:
+
+Final Position Error: {abs(error[-1]):.4f}
+Final Velocity: {states[-1, 1]:.4f}
+Control Effort (L1): {np.sum(np.abs(controls)):.2f}
+Control Effort (L2): {np.sqrt(np.sum(controls**2)):.2f}
+Settling Time (2%): {settling}
+Max Overshoot: {overshoot:.2f}
+"""
+    ax6.text(0.1, 0.5, metrics_text, fontsize=12, verticalalignment='center',
+             fontfamily='monospace', bbox=dict(boxstyle="round,pad=0.5", facecolor="lightgray"))
+
+    fig.suptitle('QP-RNN Control Simulation Results', fontsize=16)
+    return fig
+
+def run_qp_rnn_demo(position_gain, velocity_gain, control_cost,
+                    initial_position, initial_velocity,
+                    target_position, simulation_time):
+    """Main function for Gradio interface."""
+
+    # Run simulation
+    states, controls, dt = simulate_system(
+        position_gain, velocity_gain, control_cost,
+        initial_position, initial_velocity,
+        target_position, simulation_time
+    )
+
+    # Create plots
+    fig = create_plots(states, controls, dt, target_position)
+
+    # Create description
+    description = f"""
+### QP-RNN Control Results
+
+The QP-RNN controller solves the following optimization problem at each time step:
+
+```
+min 0.5 * u² * {control_cost} + u * (K @ error)
+s.t. -1 ≤ u ≤ 1
+```
+
+Here K = [{position_gain}, {velocity_gain}] are the feedback gains and error = state - target.
+
+**Final State:** Position = {states[-1, 0]:.3f}, Velocity = {states[-1, 1]:.3f}
+
+**Key Features:**
+- Guaranteed constraint satisfaction (control always in [-1, 1])
+- Interpretable structure (quadratic cost + linear feedback)
+- Can be trained via RL for complex tasks
+"""
+
+    return fig, description
+
+# Create Gradio interface
+iface = gr.Interface(
+    fn=run_qp_rnn_demo,
+    inputs=[
+        gr.Slider(0.1, 10.0, value=3.0, label="Position Gain (Kp)",
+                  info="Higher values = faster position correction"),
+        gr.Slider(0.1, 5.0, value=1.5, label="Velocity Gain (Kv)",
+                  info="Higher values = more damping"),
+        gr.Slider(0.1, 50.0, value=10.0, label="Control Cost",
+                  info="Higher values = less aggressive control"),
+        gr.Slider(-5.0, 5.0, value=2.0, label="Initial Position"),
+        gr.Slider(-2.0, 2.0, value=0.0, label="Initial Velocity"),
+        gr.Slider(-3.0, 3.0, value=0.0, label="Target Position"),
+        gr.Slider(1.0, 10.0, value=5.0, label="Simulation Time (s)")
+    ],
+    outputs=[
+        gr.Plot(label="Simulation Results"),
+        gr.Markdown(label="Analysis")
+    ],
+    title="QP-RNN: Quadratic Programming Recurrent Neural Network Demo",
+    description="""
+This interactive demo shows how QP-RNN controllers work for a simple double integrator system.
+
+**What is QP-RNN?**
+- Combines Model Predictive Control structure with Deep Reinforcement Learning
+- Learns the parameters of a Quadratic Program (QP) whose solution is the control action
+- Provides theoretical guarantees (constraint satisfaction, stability verification)
+
+**Try adjusting the parameters** to see how they affect control performance!
+
+Paper: [MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control](https://arxiv.org/abs/2312.05332)
+""",
+    examples=[
+        [3.0, 1.5, 10.0, 2.0, 0.0, 0.0, 5.0],  # Default
+        [5.0, 2.0, 5.0, 2.0, 0.0, 0.0, 5.0],  # Aggressive
+        [1.0, 0.5, 20.0, 2.0, 0.0, 0.0, 5.0],  # Conservative
+        [3.0, 0.1, 10.0, 2.0, 0.0, 0.0, 5.0],  # Underdamped
+        [3.0, 3.0, 10.0, 2.0, 0.0, 0.0, 5.0],  # More damping (still underdamped while unsaturated, zeta ~ 0.27)
+    ],
+    cache_examples=True
+)
+
+if __name__ == "__main__":
+    iface.launch()
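The slider hint says a higher Kv means more damping. While |u| < 1 the clipped control law in app.py is linear, u = -(Kp·e + Kv·v)/P with e = position - target, so in continuous time (ignoring the 0.05 s Euler step) the closed loop is e'' = -(Kp/P)·e - (Kv/P)·e'. Its natural frequency is ωn = sqrt(Kp/P) and its damping ratio is ζ = (Kv/P)/(2ωn) = Kv/(2·sqrt(Kp·P)). The sketch below evaluates these for the (Kp, Kv, P) values in the demo's preset examples; `linear_region_damping` is a hypothetical helper, and because saturation and discretization are ignored, treat the numbers as a rough guide.

```python
import math


def linear_region_damping(kp, kv, control_cost):
    """Natural frequency and damping ratio of the unsaturated closed loop.

    Assumes |u| < 1, so u = -(kp * e + kv * v) / control_cost and
    e'' = -(kp / P) e - (kv / P) e' (continuous time; Euler step and saturation ignored).
    """
    wn = math.sqrt(kp / control_cost)        # wn^2 = Kp / P
    zeta = (kv / control_cost) / (2.0 * wn)  # 2 * zeta * wn = Kv / P
    return wn, zeta


# (Kp, Kv, P) triples taken from the demo's preset examples
presets = {
    "Default": (3.0, 1.5, 10.0),
    "Aggressive": (5.0, 2.0, 5.0),
    "Conservative": (1.0, 0.5, 20.0),
    "Low Kv": (3.0, 0.1, 10.0),
    "High Kv": (3.0, 3.0, 10.0),
}
for name, (kp, kv, p) in presets.items():
    wn, zeta = linear_region_damping(kp, kv, p)
    regime = "underdamped" if zeta < 1 else "not underdamped"
    print(f"{name:12s} wn={wn:.3f} rad/s  zeta={zeta:.3f}  ({regime})")
```

Under these assumptions all five presets have ζ < 1. For Kp = 3 and P = 10, reaching ζ = 1 would need Kv = 2·sqrt(30) ≈ 10.95, above the Kv slider's maximum of 5.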
requirements.txt ADDED
@@ -0,0 +1,5 @@
+# Requirements for Hugging Face Spaces demo
+torch>=2.0.0
+numpy<2.0.0
+matplotlib>=3.6.0
+gradio>=4.0.0
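requirements.txt pulls in torch and gradio for the Space. To sanity-check the control loop without either, the clipped-QP control law and forward-Euler double integrator used by `simulate_system` can be mirrored in NumPy alone. This is a minimal sketch; `rollout` is a hypothetical helper that is not part of the commit, and it uses the same 0.05 s time step as app.py.

```python
import numpy as np


def rollout(kp, kv, control_cost, x0, target, sim_time, dt=0.05):
    """NumPy-only mirror of the demo loop: clipped QP control + forward-Euler double integrator.

    Hypothetical helper for offline checks; it is not part of the committed app.py.
    """
    K = np.array([kp, kv])
    x = np.array(x0, dtype=float)
    x_ref = np.array([target, 0.0])
    states, controls = [x.copy()], []
    for _ in range(int(round(sim_time / dt))):
        u = float(np.clip(-K @ (x - x_ref) / control_cost, -1.0, 1.0))  # Scalar box-QP solution
        x = np.array([x[0] + x[1] * dt, x[1] + u * dt])                  # Euler step
        states.append(x)
        controls.append(u)
    return np.array(states), np.array(controls)


# Default preset from the demo
states, controls = rollout(3.0, 1.5, 10.0, x0=[2.0, 0.0], target=0.0, sim_time=5.0)
print(states.shape, controls.shape)   # (101, 2) (100,)
print(np.abs(controls).max() <= 1.0)  # True: the input constraint holds at every step
```

Because the bound is enforced by the clip inside the loop, |u| ≤ 1 holds for any slider setting; the check is mainly useful as a regression test if the controller is later replaced by an iterative QP solver.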