Refactor RL system to integrate with existing grSim simulator and reuse shared constants#2498
Draft
Conversation
Create src/rj_rl/ package with:
- Gym-compatible 2D environment with SSL-like physics
- State encoding for the observation space
- Discrete action space (move, shoot, pass, defend, position, idle)
- Configurable reward functions (goals, possession, progress)
- NumPy-based MLP neural network (actor-critic)
- PPO agent with GAE advantage estimation
- Training loop with logging and checkpointing
- Training script entry point
- 50 unit tests (all passing)
- README documentation

Co-authored-by: sanatd33 <53443682+sanatd33@users.noreply.github.com>
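The "PPO agent with GAE advantage estimation" above can be illustrated with a short sketch. This is not the actual rj_rl implementation; function and variable names are illustrative, and the recurrence follows the standard GAE formulation (delta_t = r_t + γV(s_{t+1}) − V(s_t), accumulated with decay γλ):

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Compute GAE advantages and discounted returns.

    rewards, dones: arrays of length T; values: length T + 1
    (the extra entry is the bootstrap value of the final state).
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Accumulate with decay gamma * lam, resetting at episode boundaries
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    # Returns used as value-function targets in PPO
    returns = advantages + values[:-1]
    return advantages, returns
```

With gamma = lam = 1 and zero value estimates, the advantages reduce to reward-to-go, which is a quick sanity check for any GAE implementation.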
Copilot AI changed the title from "[WIP] Add reinforcement learning system for strategy improvement" to "Add reinforcement learning framework for RoboCup SSL strategy" on Feb 23, 2026
- Remove FieldConfig and PhysicsConfig (duplicated geometry/physics)
- Create constants.py mirroring rj_constants/constants.hpp
- Create sim_client.py for grSim UDP protobuf communication
- Refactor env.py to use grSim (with a fallback for testing)
- Update state.py, action.py, reward.py to use shared constants
- Generate Python protobuf bindings from existing proto files
- Update all tests (52 passing)
- Update README with grSim integration documentation

Co-authored-by: sanatd33 <53443682+sanatd33@users.noreply.github.com>
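A constants.py mirroring rj_constants/constants.hpp might look roughly like the sketch below. All names and values here are assumptions for illustration — standard SSL Division B dimensions and common simulation defaults — not values read from the actual header:

```python
"""Hypothetical sketch of a shared constants module mirroring constants.hpp."""

# Robot and ball geometry (assumed standard SSL values, in meters)
ROBOT_RADIUS_M = 0.09     # assumed: SSL rules cap robot radius at 0.09 m
BALL_RADIUS_M = 0.0215    # assumed: standard SSL golf-ball radius

# Field geometry (assumed SSL Division B dimensions, in meters)
FIELD_LENGTH_M = 9.0
FIELD_WIDTH_M = 6.0
GOAL_WIDTH_M = 1.0

# Simulator network ports (assumed defaults; the real values would
# come from network.hpp)
SIM_COMMAND_PORT = 10301  # assumed: ssl-simulation-protocol blue-team default
SIM_VISION_PORT = 10020   # assumed: grSim vision multicast default
```

Keeping these in one module means state, action, and reward code can all import the same values instead of each carrying its own copy, which is the duplication this commit removes.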
…aces Co-authored-by: sanatd33 <53443682+sanatd33@users.noreply.github.com>
Copilot AI changed the title from "Add reinforcement learning framework for RoboCup SSL strategy" to "Refactor RL system to integrate with existing grSim simulator and reuse shared constants" on Feb 23, 2026
Used an AI agent to integrate the trained model into our strategy code.
Automated style fixes

Co-authored-by: CameronLyon <CameronLyon@users.noreply.github.com>
Description
The RL system was fully standalone with its own physics engine and duplicated geometry/physics constants from the C++ codebase. This refactors it to use the existing grSim simulator as its physics backend and source constants from a single shared module that mirrors
rj_constants/constants.hpp.

Core changes:
- constants.py (new) — single Python source of truth for physical constants, mirroring constants.hpp and network.hpp (robot/ball dimensions, field geometry, sim ports, physics params)
- sim_client.py (new) — grSim UDP protobuf client using the same RobotControl/SimulatorCommand/SSL_WrapperPacket protocol as sim_radio.cpp
- config.py — removed FieldConfig and PhysicsConfig (duplicated geometry); only RL-specific config remains (RewardConfig, TrainingConfig, EnvConfig)
- env.py — replaced built-in physics with grSim communication (--use-sim); retains a lightweight fallback using shared constants for CI/testing
- state.py, action.py, reward.py — now reference the constants module instead of duplicated config values
- proto_gen/ — generated Python protobuf bindings from existing src/rj_protos/protos/ssl_simulation_*.proto

Associated Issue
RL system architecture review feedback — eliminate standalone physics and geometry duplication.
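The UDP transport that a client like sim_client.py needs can be sketched with the standard library alone. The class name, default port, and payload handling below are assumptions for illustration; the real client would serialize RobotControl messages with the generated ssl_simulation_* protobuf bindings rather than send raw bytes:

```python
import socket

class UdpSimClient:
    """Minimal sketch of a UDP client for a grSim-style simulator.

    10301 is assumed here as the ssl-simulation-protocol blue-team
    control port; the real port would come from the shared constants.
    """

    def __init__(self, host="127.0.0.1", port=10301):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_command(self, payload: bytes) -> int:
        # In the real client, payload would be
        # robot_control.SerializeToString() from the protobuf bindings.
        return self.sock.sendto(payload, self.addr)

    def close(self):
        self.sock.close()
```

Because UDP is connectionless and fire-and-forget, the client never blocks on the simulator; state feedback arrives separately on the vision port as SSL_WrapperPacket datagrams.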
Steps to test
1. cd src/rj_rl && pip install -e . && python -m pytest test/ -v — 52 tests pass
2. python -m scripts.train --timesteps 500 --seed 42 — fallback-mode training works
3. python -m scripts.train --timesteps 5000 --use-sim — simulator-backed training

Expected result: All tests pass; training runs in both fallback and grSim modes; no duplicated physics constants remain in config.py.
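The lightweight fallback that lets tests run without grSim could be as simple as the sketch below: a Gym-style reset/step that integrates a point-mass ball against shared field constants. Every name, constant, and dynamics choice here is an assumption for illustration, not the actual env.py:

```python
import numpy as np

# Assumed shared constants (meters); the real env would import these
# from the constants module
FIELD_LENGTH_M = 9.0
FIELD_WIDTH_M = 6.0
DT = 1.0 / 60.0  # assumed 60 Hz control period

class FallbackEnv:
    """Illustrative grSim-free fallback environment for CI/testing."""

    def reset(self, seed=None):
        rng = np.random.default_rng(seed)
        self.ball_pos = np.zeros(2)
        self.ball_vel = rng.uniform(-1.0, 1.0, size=2)
        return self._obs()

    def step(self, kick_vel):
        # Apply a kick impulse, integrate one tick, clamp to field bounds
        self.ball_vel = self.ball_vel + np.asarray(kick_vel, dtype=float)
        self.ball_pos = self.ball_pos + self.ball_vel * DT
        half = np.array([FIELD_LENGTH_M / 2, FIELD_WIDTH_M / 2])
        self.ball_pos = np.clip(self.ball_pos, -half, half)
        # Dense reward: negative distance from ball to opponent goal center
        goal = np.array([FIELD_LENGTH_M / 2, 0.0])
        reward = -float(np.linalg.norm(self.ball_pos - goal))
        done = bool(np.allclose(self.ball_pos, goal, atol=0.05))
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.concatenate([self.ball_pos, self.ball_vel])
```

Swapping this for the grSim backend behind a single --use-sim flag keeps the training loop and agent code identical in both modes, which is what makes the 500-timestep fallback smoke test in the steps above meaningful.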