
feat(backend): add /api/diagnostics endpoint for environment and dependency health#623

Open
SxBxcoder wants to merge 2 commits into AOSSIE-Org:main from SxBxcoder:feat/backend-diagnostics-endpoint

Conversation

@SxBxcoder
Contributor

@SxBxcoder SxBxcoder commented Mar 24, 2026

Addressed Issues:

Fixes N/A (Proactive backend infrastructure addition to assist maintainers in triaging local setup and dependency failures).

Screenshots/Recordings:

N/A (Backend JSON Endpoint).

Expected Output from GET /api/diagnostics:

{
  "status": "healthy",
  "system": {
    "os": "Windows",
    "release": "10",
    "architecture": "AMD64",
    "python_version": "3.10.11",
    "cpu_count": 8
  },
  "ml_environment": {
    "pytorch_available": true,
    "cuda_available": false,
    "torch_version": "2.1.2+cpu"
  }
}

Additional Notes:

This PR introduces a lightweight /api/diagnostics endpoint to backend/server.py.

Currently, when new contributors (especially those on Windows or machines with 8 GB of RAM) experience backend crashes during onboarding, maintainers have to guess whether the issue is a Python version mismatch, a missing PyTorch wheel, or a CPU/VRAM bottleneck.

This endpoint requires zero new dependencies and provides an instant snapshot of the host's environment. Moving forward, when a user reports a crash in the Discord, maintainers can simply ask them to ping /api/diagnostics and share the output, drastically reducing triage time and friction for GSoC applicants.
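As a rough sketch (not the exact backend/server.py code; the helper name build_diagnostics and the field defaults are illustrative), the payload shown above could be assembled like this:

```python
import os
import platform


def build_diagnostics():
    """Assemble the JSON payload served at GET /api/diagnostics.

    Written as a plain function for clarity; in the Flask app this would
    be the body of a handler registered on the /api/diagnostics route.
    """
    diagnostics = {
        "status": "healthy",
        "system": {
            "os": platform.system(),             # e.g. "Windows", "Linux"
            "release": platform.release(),       # e.g. "10"
            "architecture": platform.machine(),  # e.g. "AMD64"
            "python_version": platform.python_version(),
            "cpu_count": os.cpu_count(),
        },
        "ml_environment": {
            "pytorch_available": False,
            "cuda_available": False,
            "torch_version": None,
        },
    }
    try:
        import torch  # optional heavy dependency; absent on fresh setups
        diagnostics["ml_environment"]["pytorch_available"] = True
        diagnostics["ml_environment"]["torch_version"] = torch.__version__
        diagnostics["ml_environment"]["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass  # torch missing: fields keep their "unavailable" defaults
    return diagnostics
```

Returning jsonify(build_diagnostics()) from a Flask handler would produce a response shaped like the sample output above.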

AI Usage Disclosure:

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools: TODO

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features
    • Added a diagnostics endpoint that returns OS, release, architecture, Python version, CPU count, and ML framework availability.
    • Reports ML runtime status (available/unavailable), framework version, CUDA availability, and any detection errors.
    • When diagnostics are disabled, the endpoint responds with a 403 access denied.

@coderabbitai

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a527f140-7d82-4063-b26c-c82516f2797a

📥 Commits

Reviewing files that changed from the base of the PR and between 4183e03 and 7238fcb.

📒 Files selected for processing (1)
  • backend/server.py

📝 Walkthrough

Added a new Flask GET endpoint /api/diagnostics that returns system and ML environment information (OS, release, architecture, Python version, CPU count, PyTorch availability and CUDA status), and conditionally denies access when ENABLE_DIAGNOSTICS is not enabled.

Changes

  • Diagnostics Endpoint (backend/server.py): Added platform and sys imports; added system_diagnostics() registered at GET /api/diagnostics. Builds JSON with OS, release, architecture, Python version, CPU count, and ML fields. Attempts to import torch to set pytorch_available, torch_version, and cuda_available; handles ModuleNotFoundError and other exceptions with a degraded status and error_log. Returns 403 when diagnostics are disabled, 200 with the payload when enabled.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant Server as Flask Server
  participant OS as OS/System
  participant Torch as PyTorch (optional)

  Client->>Server: GET /api/diagnostics
  Server->>OS: gather platform, release, arch, python_version, cpu_count
  alt PyTorch import succeeds
    Server->>Torch: import torch
    Torch-->>Server: torch.__version__, cuda.is_available()
  else PyTorch import fails
    Torch-->>Server: ModuleNotFoundError / Exception
  end
  Server-->>Client: JSON diagnostics (200) or 403 if disabled

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped to peek at system light,
CPUs counted, Python bright,
Torch may join or hide away,
I log the truth in tidy play,
Diagnostics delivered — all in sight ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the PR title clearly and concisely describes the main change, adding a /api/diagnostics endpoint for environment and ML dependency health checking.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 495-528: The diagnostics endpoint system_diagnostics currently
returns detailed host/runtime metadata; gate it behind a configuration flag and
tighten production behavior: add a config check (e.g.,
app.config.get("DIAGNOSTICS_ENABLED", False) or ENV != "production") at the top
of system_diagnostics and return a minimal 403/404 or a small
{"status":"unavailable"} response when disabled, and in production ensure only
non-identifying info (or just status) is returned; update any references to
platform/sys/os/torch usage to only run when diagnostics are enabled to avoid
leaking environment details.
- Around line 518-525: The current try/except only catches ImportError for the
torch import block and can still crash on OSError/RuntimeError or attribute
access failures; update the diagnostics logic around the torch import and
attribute reads (the block that sets
diagnostics["ml_environment"]["pytorch_available"],
diagnostics["ml_environment"]["torch_version"],
diagnostics["ml_environment"]["cuda_available"]) to catch broad exceptions
(catch Exception) and on any failure set diagnostics["status"] = "degraded
(pytorch error)" and record the exception message into diagnostics (e.g.,
diagnostics["ml_environment"]["pytorch_error"]) so broken installs return a
degraded status instead of raising; also guard attribute access
(torch.__version__, torch.cuda.is_available) inside the same exception scope to
avoid unhandled errors.
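Sketched out, the hardening described in the second comment might look like the following (the helper name probe_pytorch is illustrative, and the key names mirror the comment's suggestions, not necessarily the final code):

```python
def probe_pytorch():
    """Probe the local torch install defensively.

    Broken wheels or missing CUDA libraries can surface as OSError or
    RuntimeError raised from native code during import or attribute
    access, so every read runs inside one broad exception scope.
    """
    ml = {
        "pytorch_available": False,
        "torch_version": None,
        "cuda_available": False,
    }
    status = "healthy"
    try:
        import torch
        ml["pytorch_available"] = True
        ml["torch_version"] = torch.__version__           # guarded reads
        ml["cuda_available"] = torch.cuda.is_available()  # may hit native code
    except ModuleNotFoundError:
        pass  # torch simply not installed: not an error state
    except Exception as exc:  # corrupted install, bad CUDA runtime, etc.
        status = "degraded (pytorch error)"
        ml["pytorch_error"] = str(exc)
    return status, ml
```

Because the broad except comes after the ModuleNotFoundError branch, a clean machine without torch still reports a healthy status, while a broken install degrades gracefully instead of crashing the server.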

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24b5bbaa-309b-4dbf-89b8-1afd7a5c0fd2

📥 Commits

Reviewing files that changed from the base of the PR and between 2038116 and 4183e03.

📒 Files selected for processing (1)
  • backend/server.py

@SxBxcoder
Contributor Author

Addressed CodeRabbit's security and resilience feedback in the latest commit.

  1. Security: The endpoint is now gated behind an ENABLE_DIAGNOSTICS=true environment variable check to prevent host-fingerprint leakage in production deployments.
  2. Resilience: The torch import block now catches broad Exception classes. This safely handles corrupted local PyTorch/CUDA wheels (which often throw underlying C++ OSError or RuntimeError exceptions rather than standard ImportErrors) without taking down the Flask server.
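A minimal sketch of that environment-variable gate, assuming the flag name ENABLE_DIAGNOSTICS from the commit (the exact flag parsing and the 403 body text in backend/server.py may differ):

```python
import os


def diagnostics_enabled():
    """Opt-in check: only the literal value "true" enables the endpoint."""
    return os.environ.get("ENABLE_DIAGNOSTICS", "false").strip().lower() == "true"


def system_diagnostics_response():
    """Early-exit shape of the handler, as a (body, HTTP status) tuple."""
    if not diagnostics_enabled():
        # Deny with a minimal body so no host details leak in production.
        return {"error": "Diagnostics are disabled on this server."}, 403
    return {"status": "healthy"}, 200  # stand-in for the full payload
```

Reading the flag at request time rather than at import time means an operator can toggle the variable and restart nothing but the process environment; any value other than "true" keeps the endpoint closed by default.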

Pipeline is green and this is ready for maintainer review whenever you have a chance @yatikakain @Aditya062003
