
feat(RL): PPO pipeline with GRU body-state embeddings for Reacher-v5 with 256 size #6

Open

PushpitaJoardar wants to merge 3 commits into geometric-intelligence:main from PushpitaJoardar:main

Conversation

@PushpitaJoardar (Collaborator)

Summary

Implemented the RL pipeline (RNN estimation) for Reacher-v5, supporting both
the raw-observation baseline and the GRU-embedded observation condition, as
described in the Body-State Manifold Learning proposal.

Changes

New Files

  • articulated/rl/environment.py — ReacherWithEmbedding wrapper (raw + embedded modes)
  • articulated/rl/agent.py — RLAgent with PPO, VecNormalize, all config fields
  • articulated/rl/train.py — Training script with eval and TensorBoard logging
  • articulated/rl/fit_pca.py — PCA fitting script for GRU embedding compression
  • articulated/configs/rl/baseline.yaml — Raw obs baseline (500K steps)
  • articulated/configs/rl/baseline_tuned.yaml — Tuned baseline (1M steps)
  • articulated/configs/rl/baseline_tuned2.yaml — Tuned baseline, lower LR
  • articulated/configs/rl/embedded.yaml — GRU-embedded obs config
  • articulated/configs/estimation/gru_so2.yaml — GRU estimation config (SO2)
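For orientation, the PCA compression step that fit_pca.py performs on the GRU embeddings could be sketched as below. This is a hedged sketch with numpy, not the actual script: the function names, the component count, and the data shapes are assumptions.

```python
import numpy as np

def fit_pca(embeddings, n_components=8):
    """Fit PCA via SVD to compress GRU embeddings.

    A sketch of what a script like articulated/rl/fit_pca.py might do;
    `n_components` and the (mean, components) return format are assumptions.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def transform(x, mean, components):
    """Project (possibly batched) embeddings onto the principal directions."""
    return (x - mean) @ components.T

rng = np.random.default_rng(0)
E = rng.standard_normal((500, 256))      # 500 GRU hidden states, 256-dim (assumed)
mean, comps = fit_pca(E, n_components=8)
compressed = transform(E, mean, comps)   # shape (500, 8)
```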

Modified Files

  • articulated/shared/robot_arm.py — Added RobotArm2DKinematics for SO(2)
  • articulated/estimation/datamodule.py — SO(2) manifold support
  • articulated/estimation/model.py — GRU support + get_embedding() interface
  • articulated/estimation/train.py — Training script updates
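To make the get_embedding() interface concrete, here is one GRU cell step written out in numpy (PyTorch gate convention), showing how a body-state embedding h_t can be rolled out from a joint-observation sequence. This is a hedged illustration only; the real model lives in articulated/estimation/model.py, and the observation dimension, hidden size, and parameter layout here are assumptions (256 matches the PR title, seq_length=50 matches the notes).

```python
import numpy as np

def gru_step(x, h, params):
    """One GRU cell step (PyTorch gate convention), biases omitted.

    Sketch of how an embedding h_t could be produced from observations x;
    not the project's actual implementation.
    """
    Wz, Uz, Wr, Ur, Wn, Un = params
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h)             # update gate
    r = sig(Wr @ x + Ur @ h)             # reset gate
    n = np.tanh(Wn @ x + r * (Un @ h))   # candidate state
    return (1.0 - z) * n + z * h         # new hidden state h_t

rng = np.random.default_rng(0)
obs_dim, hidden = 4, 256                 # assumed sizes
# W matrices map observations, U matrices map the hidden state.
params = [rng.standard_normal((hidden, obs_dim)) * 0.1 if i % 2 == 0
          else rng.standard_normal((hidden, hidden)) * 0.1
          for i in range(6)]
h = np.zeros(hidden)
for x in rng.standard_normal((50, obs_dim)):  # seq_length=50 as in the notes
    h = gru_step(x, h, params)                # h is the embedding after the sequence
```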

Results

  Condition                     Mean Reward   Timesteps
  Baseline PPO (raw obs)        -3.80         500K
  Embedded RNN (val/acc=24%)    -9.67         1M
  Embedded GRU (val/acc=99%)    -6.19         1M

Notes

  • GRU with kappa=20, seq_length=50 achieves val/acc=0.993
  • Embedded obs = [h_t | cos/sin joints | target_pos | fingertip_vec]
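The embedded-observation layout in the note above can be assembled as follows. A minimal numpy sketch: the helper name and the component dimensions are assumptions (Reacher-v5 has 2 joints and a 2-D target/fingertip; the 256-dim h_t matches the PR title), not the wrapper's actual code.

```python
import numpy as np

def build_embedded_obs(h_t, joint_angles, target_pos, fingertip_vec):
    """Assemble [h_t | cos/sin joints | target_pos | fingertip_vec].

    Hypothetical helper mirroring the layout in the PR notes; the ordering
    and dtype are assumptions.
    """
    trig = np.concatenate([np.cos(joint_angles), np.sin(joint_angles)])
    return np.concatenate([h_t, trig, target_pos, fingertip_vec]).astype(np.float32)

# 256-dim GRU state + 2 joints (cos and sin) + 2-D target + 2-D fingertip vector.
obs = build_embedded_obs(np.zeros(256), np.array([0.1, -0.2]),
                         np.zeros(2), np.zeros(2))
# obs has 256 + 4 + 2 + 2 = 264 entries
```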

@ht0324 (Collaborator) commented Mar 10, 2026

Results:

  • Best eval callback: -11.36 ± 1.91 at 100K steps
  • Final eval callback at 1M steps: -13.55 ± 1.56
  • Fresh 20-episode re-evaluation of the final saved model: -12.50 ± 1.61
  • Fresh 20-episode re-evaluation of the saved best model: -12.63 ± 1.79
