
feat(RL): PPO pipeline with GRU body-state embeddings for Reacher-v5 with 256 size #6

Open

PushpitaJoardar wants to merge 3 commits into geometric-intelligence:main from PushpitaJoardar:main

Conversation

@PushpitaJoardar (Collaborator)

Summary

Implemented the RL pipeline (RNN estimation) for Reacher-v5, supporting both
the raw-observation baseline and the GRU-embedded observation condition, as
described in the Body-State Manifold Learning proposal.

Changes

New Files

  • articulated/rl/environment.py — ReacherWithEmbedding wrapper (raw + embedded modes)
  • articulated/rl/agent.py — RLAgent with PPO, VecNormalize, all config fields
  • articulated/rl/train.py — Training script with eval and TensorBoard logging
  • articulated/rl/fit_pca.py — PCA fitting script for GRU embedding compression
  • articulated/configs/rl/baseline.yaml — Raw obs baseline (500K steps)
  • articulated/configs/rl/baseline_tuned.yaml — Tuned baseline (1M steps)
  • articulated/configs/rl/baseline_tuned2.yaml — Tuned baseline, lower LR
  • articulated/configs/rl/embedded.yaml — GRU-embedded obs config
  • articulated/configs/estimation/gru_so2.yaml — GRU estimation config (SO2)
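For orientation, the PCA compression step that fit_pca.py performs on the GRU embeddings could be sketched as below. This is a hedged sketch with numpy, not the actual script: the function names, the component count, and the data shapes are assumptions.

```python
import numpy as np

def fit_pca(embeddings, n_components=8):
    """Fit PCA via SVD to compress GRU embeddings.

    A sketch of what a script like articulated/rl/fit_pca.py might do;
    `n_components` and the (mean, components) return format are assumptions.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def transform(x, mean, components):
    """Project (possibly batched) embeddings onto the principal directions."""
    return (x - mean) @ components.T

rng = np.random.default_rng(0)
E = rng.standard_normal((500, 256))      # 500 GRU hidden states, 256-dim (assumed)
mean, comps = fit_pca(E, n_components=8)
compressed = transform(E, mean, comps)   # shape (500, 8)
```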

Modified Files

  • articulated/shared/robot_arm.py — Added RobotArm2DKinematics for SO(2)
  • articulated/estimation/datamodule.py — SO(2) manifold support
  • articulated/estimation/model.py — GRU support + get_embedding() interface
  • articulated/estimation/train.py — Training script updates
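To make the get_embedding() interface concrete, here is one GRU cell step written out in numpy (PyTorch gate convention), showing how a body-state embedding h_t can be rolled out from a joint-observation sequence. This is a hedged illustration only; the real model lives in articulated/estimation/model.py, and the observation dimension, hidden size, and parameter layout here are assumptions (256 matches the PR title, seq_length=50 matches the notes).

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU cell step (PyTorch gate convention), biases omitted.

    Sketch of how an embedding h_t could be produced from observations x;
    not the project's actual implementation.
    """
    Wz, Uz, Wr, Ur, Wn, Un = params
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h)             # update gate
    r = sig(Wr @ x + Ur @ h)             # reset gate
    n = np.tanh(Wn @ x + r * (Un @ h))   # candidate state
    return (1.0 - z) * n + z * h         # new hidden state h_t

rng = np.random.default_rng(0)
obs_dim, hidden = 4, 256                 # assumed sizes
# W matrices map observations, U matrices map the hidden state.
params = [rng.standard_normal((hidden, obs_dim)) * 0.1 if i % 2 == 0
          else rng.standard_normal((hidden, hidden)) * 0.1
          for i in range(6)]
h = np.zeros(hidden)
for x in rng.standard_normal((50, obs_dim)):  # seq_length=50 as in the notes
    h = gru_step(x, h, params)                # h is the embedding after the sequence
```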

Results

  Condition                     Mean Reward   Timesteps
  Baseline PPO (raw obs)        -3.80         500K
  Embedded RNN (val/acc=24%)    -9.67         1M
  Embedded GRU (val/acc=99%)    -6.19         1M

Notes

  • GRU with kappa=20, seq_length=50 achieves val/acc=0.993
  • Embedded obs = [h_t | cos/sin joints | target_pos | fingertip_vec]
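The embedded-observation layout in the note above can be assembled as follows. A minimal numpy sketch: the helper name and the component dimensions are assumptions (Reacher-v5 has 2 joints and a 2-D target/fingertip; the 256-dim h_t matches the PR title), not the wrapper's actual code.

```python
import numpy as np

def build_embedded_obs(h_t, joint_angles, target_pos, fingertip_vec):
    """Assemble [h_t | cos/sin joints | target_pos | fingertip_vec].

    Hypothetical helper mirroring the layout in the PR notes; the ordering
    and dtype are assumptions.
    """
    trig = np.concatenate([np.cos(joint_angles), np.sin(joint_angles)])
    return np.concatenate([h_t, trig, target_pos, fingertip_vec]).astype(np.float32)

# 256-dim GRU state + 2 joints (cos and sin) + 2-D target + 2-D fingertip vector.
obs = build_embedded_obs(np.zeros(256), np.array([0.1, -0.2]),
                         np.zeros(2), np.zeros(2))
# obs has 256 + 4 + 2 + 2 = 264 entries
```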

@ht0324 (Collaborator) commented Mar 10, 2026

Results:

  • Best eval callback: -11.36 ± 1.91 at 100K steps
  • Final eval callback at 1M steps: -13.55 ± 1.56
  • Fresh 20-episode re-evaluation of the final saved model: -12.50 ± 1.61
  • Fresh 20-episode re-evaluation of the saved best model: -12.63 ± 1.79
