Refactor RL system to integrate with existing grSim simulator and reuse shared constants#2498
Draft
Conversation
Create src/rj_rl/ package with:
- Gym-compatible 2D environment with SSL-like physics
- State encoding for the observation space
- Discrete action space (move, shoot, pass, defend, position, idle)
- Configurable reward functions (goals, possession, progress)
- NumPy-based MLP neural network (actor-critic)
- PPO agent with GAE advantage estimation
- Training loop with logging and checkpointing
- Training script entry point
- 50 unit tests (all passing)
- README documentation

Co-authored-by: sanatd33 <53443682+sanatd33@users.noreply.github.com>
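The "PPO agent with GAE advantage estimation" above can be illustrated with a short sketch. This is not the actual rj_rl implementation; function and variable names are illustrative, and the recurrence follows the standard GAE formulation (delta_t = r_t + γV(s_{t+1}) − V(s_t), accumulated with decay γλ):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Compute GAE advantages and discounted returns.

    rewards, dones: arrays of length T; values: length T + 1
    (the extra entry is the bootstrap value of the final state).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Accumulate with decay gamma * lam, resetting at episode boundaries
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    # Returns used as value-function targets in PPO
    returns = advantages + values[:-1]
    return advantages, returns
```

With gamma = lam = 1 and zero value estimates, the advantages reduce to reward-to-go, which is a quick sanity check for any GAE implementation.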
Copilot AI changed the title from "[WIP] Add reinforcement learning system for strategy improvement" to "Add reinforcement learning framework for RoboCup SSL strategy" on Feb 23, 2026
- Remove FieldConfig and PhysicsConfig (duplicated geometry/physics)
- Create constants.py mirroring rj_constants/constants.hpp
- Create sim_client.py for grSim UDP protobuf communication
- Refactor env.py to use grSim (with a fallback for testing)
- Update state.py, action.py, reward.py to use shared constants
- Generate Python protobuf bindings from existing proto files
- Update all tests (52 passing)
- Update README with grSim integration documentation

Co-authored-by: sanatd33 <53443682+sanatd33@users.noreply.github.com>
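A constants.py mirroring rj_constants/constants.hpp might look roughly like the sketch below. All names and values here are assumptions for illustration — standard SSL Division B dimensions and common simulation defaults — not values read from the actual header:

```python
"""Hypothetical sketch of a shared constants module mirroring constants.hpp."""

# Robot and ball geometry (assumed standard SSL values, in meters)
ROBOT_RADIUS_M = 0.09     # assumed: SSL rules cap robot radius at 0.09 m
BALL_RADIUS_M = 0.0215    # assumed: standard SSL golf-ball radius

# Field geometry (assumed SSL Division B dimensions, in meters)
FIELD_LENGTH_M = 9.0
FIELD_WIDTH_M = 6.0
GOAL_WIDTH_M = 1.0

# Simulator network ports (assumed defaults; the real values would
# come from network.hpp)
SIM_COMMAND_PORT = 10301  # assumed: ssl-simulation-protocol blue-team default
SIM_VISION_PORT = 10020   # assumed: grSim vision multicast default
```

Keeping these in one module means state, action, and reward code can all import the same values instead of each carrying its own copy, which is the duplication this commit removes.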
…aces Co-authored-by: sanatd33 <53443682+sanatd33@users.noreply.github.com>
Copilot AI changed the title from "Add reinforcement learning framework for RoboCup SSL strategy" to "Refactor RL system to integrate with existing grSim simulator and reuse shared constants" on Feb 23, 2026
Used an AI agent to integrate the trained model into our strategy code.
Automated style fixes

Co-authored-by: CameronLyon <CameronLyon@users.noreply.github.com>
Description
The RL system was fully standalone with its own physics engine and duplicated geometry/physics constants from the C++ codebase. This refactors it to use the existing grSim simulator as its physics backend and source constants from a single shared module that mirrors
rj_constants/constants.hpp.

Core changes:
- constants.py (new) — single Python source of truth for physical constants, mirroring constants.hpp and network.hpp (robot/ball dimensions, field geometry, sim ports, physics params)
- sim_client.py (new) — grSim UDP protobuf client using the same RobotControl/SimulatorCommand/SSL_WrapperPacket protocol as sim_radio.cpp
- config.py — removed FieldConfig and PhysicsConfig (duplicated geometry); only RL-specific config remains (RewardConfig, TrainingConfig, EnvConfig)
- env.py — replaced built-in physics with grSim communication (--use-sim); retains a lightweight fallback using shared constants for CI/testing
- state.py, action.py, reward.py — now reference the constants module instead of duplicated config values
- proto_gen/ — generated Python protobuf bindings from existing src/rj_protos/protos/ssl_simulation_*.proto

Associated Issue
RL system architecture review feedback — eliminate standalone physics and geometry duplication.
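The UDP transport that a client like sim_client.py needs can be sketched with the standard library alone. The class name, default port, and payload handling below are assumptions for illustration; the real client would serialize RobotControl messages with the generated ssl_simulation_* protobuf bindings rather than send raw bytes:

```python
import socket

class UdpSimClient:
    """Minimal sketch of a UDP client for a grSim-style simulator.

    10301 is assumed here as the ssl-simulation-protocol blue-team
    control port; the real port would come from the shared constants.
    """

    def __init__(self, host="127.0.0.1", port=10301):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_command(self, payload: bytes) -> int:
        # In the real client, payload would be
        # robot_control.SerializeToString() from the protobuf bindings.
        return self.sock.sendto(payload, self.addr)

    def close(self):
        self.sock.close()
```

Because UDP is connectionless and fire-and-forget, the client never blocks on the simulator; state feedback arrives separately on the vision port as SSL_WrapperPacket datagrams.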
Steps to test
1. cd src/rj_rl && pip install -e . && python -m pytest test/ -v — 52 tests pass
2. python -m scripts.train --timesteps 500 --seed 42 — fallback-mode training works
3. python -m scripts.train --timesteps 5000 --use-sim — simulator-backed training

Expected result: All tests pass; training runs in both fallback and grSim modes; no duplicated physics constants remain in config.py.
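The lightweight fallback that lets tests run without grSim could be as simple as the sketch below: a Gym-style reset/step that integrates a point-mass ball against shared field constants. Every name, constant, and dynamics choice here is an assumption for illustration, not the actual env.py:

```python
import numpy as np

# Assumed shared constants (meters); the real env would import these
# from the constants module
FIELD_LENGTH_M = 9.0
FIELD_WIDTH_M = 6.0
DT = 1.0 / 60.0  # assumed 60 Hz control period

class FallbackEnv:
    """Illustrative grSim-free fallback environment for CI/testing."""

    def reset(self, seed=None):
        rng = np.random.default_rng(seed)
        self.ball_pos = np.zeros(2)
        self.ball_vel = rng.uniform(-1.0, 1.0, size=2)
        return self._obs()

    def step(self, kick_vel):
        # Apply a kick impulse, integrate one tick, clamp to field bounds
        self.ball_vel = self.ball_vel + np.asarray(kick_vel, dtype=float)
        self.ball_pos = self.ball_pos + self.ball_vel * DT
        half = np.array([FIELD_LENGTH_M / 2, FIELD_WIDTH_M / 2])
        self.ball_pos = np.clip(self.ball_pos, -half, half)
        # Dense reward: negative distance from ball to opponent goal center
        goal = np.array([FIELD_LENGTH_M / 2, 0.0])
        reward = -float(np.linalg.norm(self.ball_pos - goal))
        done = bool(np.allclose(self.ball_pos, goal, atol=0.05))
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.concatenate([self.ball_pos, self.ball_vel])
```

Swapping this for the grSim backend behind a single --use-sim flag keeps the training loop and agent code identical in both modes, which is what makes the 500-timestep fallback smoke test in the steps above meaningful.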