feat: add atomic backend storage and SDK for experiment tagging #1076

Open
tonypzy wants to merge 4 commits into fossasia:master from tonypzy:feature/experiment-tagging-backend

@tonypzy tonypzy commented Mar 21, 2026

Description

This PR implements the core backend infrastructure for the Experiment Tagging feature (Phase 1). It focuses on providing a reliable, concurrency-safe, and performant way to store and manage environment metadata without breaking existing workflows.

Key Highlights

  • Atomic Writes: Implemented a write-to-temp + rename pattern using os.replace to prevent file corruption during server crashes.
  • Concurrency Safety: Introduced per-environment locks (WRITE_LOCKS) to ensure data integrity during simultaneous updates.
  • OOM Prevention: Implemented a lightweight tags_index.json to store metadata, preserving Visdom's lazy-loading mechanism for large environment files.
  • SDK Support: Added vis.set_tags() and vis.get_tags() to the Python client for programmatic experiment organization.
  • Backward Compatibility: Updated LazyEnvData to handle legacy files without the tags field gracefully.
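The atomic-write and per-environment locking points above can be illustrated with a minimal sketch. This is not the PR's code: the names `atomic_save` and `WRITE_LOCKS` come from the description, but the bodies below, the `save_env` helper, and the `.tmp` suffix are assumptions chosen for illustration.

```python
import json
import os
import tempfile
import threading
from collections import defaultdict

# Assumed shape of WRITE_LOCKS: one lock per environment id, created on first use.
WRITE_LOCKS = defaultdict(threading.Lock)


def atomic_save(path, data):
    """Write the string `data` to `path` via a temp file and os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so os.replace never crosses filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def save_env(env_dir, eid, env_state):
    """Hypothetical helper: serialize one environment under its own lock."""
    with WRITE_LOCKS[eid]:
        atomic_save(os.path.join(env_dir, eid + ".json"), json.dumps(env_state))
```

Because the rename is the last step, a crash during the write leaves at most a stray temp file and the previous `eid.json` remains intact.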

Testing & Verification

I have verified the implementation with the following tests:

  1. Unit Tests: Verified atomic_save and LazyEnvData logic.
  2. Concurrency Stress Test: Ran 50 concurrent threads updating tags on the same environment. Result: Zero data corruption.
  3. Integration Test: Verified the Python SDK can correctly persist and retrieve tags from a live server.
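A concurrency stress test of the kind described in item 2 might look roughly like the sketch below. It does not exercise the Visdom server; it models the read-modify-write cycle on a single environment file with a plain lock and `os.replace`, and the function name `run_stress_test` and the `tag-N` naming are hypothetical.

```python
import json
import os
import tempfile
import threading


def run_stress_test(num_threads=50):
    """Have many threads add distinct tags to one env file; return the final tags."""
    lock = threading.Lock()  # stands in for the per-environment write lock
    with tempfile.TemporaryDirectory() as env_dir:
        path = os.path.join(env_dir, "main.json")
        with open(path, "w") as f:
            json.dump({"jsons": {}, "reload": {}, "tags": []}, f)

        def add_tag(tag):
            # The whole read-modify-write cycle happens under the lock.
            with lock:
                with open(path) as f:
                    env = json.load(f)
                if tag not in env["tags"]:
                    env["tags"].append(tag)
                tmp_path = path + ".tmp"
                with open(tmp_path, "w") as f:
                    json.dump(env, f)
                os.replace(tmp_path, path)

        threads = [
            threading.Thread(target=add_tag, args=("tag-%d" % i,))
            for i in range(num_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # If any write were lost or torn, this load would fail or miss tags.
        with open(path) as f:
            return json.load(f)["tags"]
```

A passing run means the file still parses as JSON and contains every tag exactly once.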

Related Issue

Resolves #1075

Next Steps

  • Phase 2: Frontend UI integration in EnvControls.js.

Summary by Sourcery

Add backend support for experiment tags with atomic environment persistence and websocket synchronization.

New Features:

  • Introduce a tags HTTP handler and websocket broadcast messages to set, retrieve, and sync experiment tags per environment.
  • Add Python client APIs set_tags and get_tags for programmatic management of environment tags.
  • Maintain a lightweight tags_index.json file to store and quickly load tags metadata across environments.

Enhancements:

  • Make environment file writes atomic and per-environment locked to improve crash and concurrency safety.
  • Extend lazy environment loading to include optional tags data while preserving compatibility with legacy environment files.
  • Initialize the default main environment with an empty tags field during server startup.


sourcery-ai bot commented Mar 21, 2026

Reviewer's Guide

Implements atomic, concurrent-safe environment storage and a lightweight tag index to support experiment tagging, adds server endpoints and websocket broadcasts for tag updates/sync, and exposes new Python client APIs for setting and retrieving tags while maintaining backward compatibility with existing env files.

Sequence diagram for experiment tag set and broadcast

sequenceDiagram
    actor User
    participant VisdomClient as VisdomClient_Python_SDK
    participant HTTP as HTTP_TagsEndpoint
    participant TagsHandler as TagsHandler
    participant App as VisdomServerApp
    participant State as EnvState
    participant FS as FileSystem
    participant WSSubs as WebSocketSubscribers

    User->>VisdomClient: set_tags(tags, env, append)
    VisdomClient->>HTTP: POST /tags { eid, tags, append }
    HTTP->>TagsHandler: route_request

    TagsHandler->>TagsHandler: extract_eid(args)
    TagsHandler->>State: ensure state[eid] exists with jsons, reload, tags
    alt append is True
        TagsHandler->>State: merge tags into state[eid].tags
    else append is False
        TagsHandler->>State: replace state[eid].tags
    end

    TagsHandler->>App: update App.tags[eid]
    App->>FS: save_tag_index() using atomic_save(tags_index.json)

    TagsHandler->>WSSubs: broadcast_tags(eid, state[eid].tags)
    WSSubs-->>User: websocket message { command: tags_update, data: { eid, tags } }

    TagsHandler->>FS: serialize_env(state, [eid])
    FS->>FS: acquire WRITE_LOCKS[eid]
    FS->>FS: atomic_save(eid.json, json_dump(state[eid]))
    FS-->>FS: release WRITE_LOCKS[eid]

    TagsHandler-->>VisdomClient: HTTP response [tags]
    VisdomClient-->>User: return tags

Sequence diagram for tag synchronization on websocket connection

sequenceDiagram
    participant Client as WebSocketClient
    participant SocketHandler as SocketHandler
    participant App as VisdomServerApp

    Client->>SocketHandler: open()
    SocketHandler->>SocketHandler: register subscription
    SocketHandler->>SocketHandler: broadcast_layouts([self])
    SocketHandler->>SocketHandler: broadcast_envs([self])
    SocketHandler->>SocketHandler: sync_tags([self])

    SocketHandler->>App: read App.tags (tags_index.json already loaded)
    SocketHandler-->>Client: websocket message { command: tags_sync, data: tags_map }
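
To make the two websocket messages in these diagrams concrete, here is a small sketch of their payloads and of how a subscriber could fold them into a local cache. The message shapes (`tags_update` carrying `eid` and `tags`, `tags_sync` carrying the full map) are taken from the diagrams; the function names and the cache are assumptions, not part of the PR.

```python
import json


def make_tags_update(eid, tags):
    """Payload broadcast after one environment's tags change."""
    return json.dumps(
        {"command": "tags_update", "data": {"eid": eid, "tags": list(tags)}}
    )


def make_tags_sync(tags_map):
    """Payload sent to a newly opened connection with every environment's tags."""
    return json.dumps({"command": "tags_sync", "data": tags_map})


def apply_message(cache, raw):
    """Hypothetical client-side handler: keep a dict of eid -> tags up to date."""
    msg = json.loads(raw)
    if msg["command"] == "tags_sync":
        # A sync replaces whatever the client knew before.
        cache.clear()
        cache.update(msg["data"])
    elif msg["command"] == "tags_update":
        cache[msg["data"]["eid"]] = msg["data"]["tags"]
    return cache
```

A client that applies one `tags_sync` on connect and every later `tags_update` should end up with the same map the server holds in `App.tags`.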

Updated class diagram for tagging-related server and client structures

classDiagram
    class Visdom {
        +string server
        +int port
        +string base_url
        +string env
        +set_tags(tags, env, append)
        +get_tags(env)
        +_send(msg, endpoint, create)
    }

    class VisdomServerApp {
        +string env_path
        +dict state
        +dict subs
        +bool login_enabled
        +dict tags
        +load_state()
        +load_tag_index()
        +save_tag_index()
    }

    class TagsHandler {
        +dict state
        +string env_path
        +dict subs
        +bool login_enabled
        +VisdomServerApp app
        +initialize(app)
        +post()
    }

    class LazyEnvData {
        -string _env_path_file
        -dict _raw_dict
        +lazy_load_data()
        +__getitem__(key)
        +__iter__()
        +__len__()
    }

    class ServerUtils {
        +defaultdict WRITE_LOCKS
        +atomic_save(path, data)
        +serialize_env(state, eids, env_path)
        +broadcast_envs(handler, target_subs)
        +broadcast_tags(handler, eid, tags, target_subs)
        +sync_tags(handler, target_subs)
    }

    class SocketHandler {
        +dict subs
        +VisdomServerApp application
        +open()
        +broadcast_layouts(target_subs)
    }

    VisdomServerApp "1" o-- "many" TagsHandler : uses
    VisdomServerApp "1" o-- "many" SocketHandler : websocket_handlers
    VisdomServerApp "1" o-- "many" LazyEnvData : lazy_env_files
    VisdomServerApp "1" o-- "1" ServerUtils : file_and_broadcast_helpers

    TagsHandler --> VisdomServerApp : app
    TagsHandler --> ServerUtils : serialize_env
    TagsHandler --> ServerUtils : broadcast_tags

    LazyEnvData --> ServerUtils : lazy_load_data

    SocketHandler --> ServerUtils : broadcast_envs
    SocketHandler --> ServerUtils : sync_tags

    Visdom --> VisdomServerApp : talks_via_HTTP
    Visdom --> TagsHandler : /tags endpoint

File-Level Changes

Change: Add server-side handler and websocket flows for setting and retrieving per-environment tags.
Details:
  • Introduce TagsHandler to handle POST-based tag set/get using eid resolution and auth checks
  • Maintain tags in server state and a global app.tags mapping, updating both on tag changes
  • Broadcast tag updates to connected websocket subscribers via a new tags_update message
  • Sync all tags to new websocket connections via a tags_sync command
Files:
  • py/visdom/server/handlers/web_handlers.py
  • py/visdom/server/handlers/socket_handlers.py
  • py/visdom/utils/server_utils.py

Change: Introduce atomic, lock-based environment file serialization and a separate tags index for OOM-safe tag storage.
Details:
  • Add per-environment WRITE_LOCKS and an atomic_save helper implementing write-to-temp plus os.replace with fsync
  • Wrap serialize_env writes with per-env locks and atomic_save, supporting both LazyEnvData and plain dict states
  • Add LazyEnvData support for the optional tags field, defaulting to an empty list for legacy files
  • Exclude tags_index.json from env file discovery and initialize tags from the tag index during app load
  • Implement load_tag_index and save_tag_index on the app, persisting tags via atomic_save into tags_index.json
Files:
  • py/visdom/utils/server_utils.py
  • py/visdom/server/app.py

Change: Expose experiment tag management in the Python client SDK.
Details:
  • Add set_tags API to set or append tags for a given environment via the /tags endpoint
  • Add get_tags API that retrieves tags for an environment using a POST call to the tags handler and gracefully returns an empty list on error
Files:
  • py/visdom/__init__.py
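
The tag index handling listed above (loading `tags_index.json`, excluding it from env file discovery, and defaulting to an empty list for legacy environments) could be sketched as follows. The function names `load_tag_index`, `list_env_files`, and `tags_for` and their signatures are illustrative assumptions; the PR's actual methods live on the app object and may differ.

```python
import json
import os

TAGS_INDEX = "tags_index.json"


def load_tag_index(env_dir):
    """Return the eid -> tags mapping, or an empty dict if the index is missing or unreadable."""
    path = os.path.join(env_dir, TAGS_INDEX)
    try:
        with open(path) as f:
            index = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return index if isinstance(index, dict) else {}


def list_env_files(env_dir):
    """List environment files, skipping the index so it is not treated as an environment."""
    return sorted(
        name
        for name in os.listdir(env_dir)
        if name.endswith(".json") and name != TAGS_INDEX
    )


def tags_for(index, eid):
    """Environments saved before tagging existed have no entry and get an empty list."""
    return list(index.get(eid, []))
```

Reading tags from the small index means the large per-environment files can stay lazily loaded until a window is actually requested.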

Assessment against linked issues

| Issue | Objective | Addressed | Explanation |
|---|---|---|---|
| #1075 | Implement backend experiment tagging support with hybrid storage (tags stored in each env_id.json plus a lightweight tags_index.json), including concurrency-safe, atomic persistence and loading of tags. | | |
| #1075 | Expose experiment tagging through real-time sync mechanisms (WebSocket commands such as tags_sync and tags_update) and Python SDK methods (vis.set_tags and vis.get_tags). | | |
| #1075 | Integrate experiment tags into the Visdom UI, including tag-based filtering in the environment selector and a tag management interface in the environment management modal. | | The PR explicitly scopes itself to Phase 1 (backend & SDK). No frontend/UI files (e.g., EnvControls.js or environment selector components) are modified, and no UI for tag filtering or management is implemented. |


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues, and left some high-level feedback:

  • Concurrent updates to self.tags and tags_index.json in TagsHandler.post and App.save_tag_index are not guarded by any lock, so you may want to add a small global/index-level lock to avoid races or partial overwrites when multiple environments update tags at the same time.
  • The TagsHandler.post endpoint multiplexes read and write semantics based on the presence of the tags key, which makes the API harder to reason about; consider adding an explicit GET handler (or a separate read endpoint) instead of overloading POST.
  • In Visdom.get_tags, the constructed url is not used and the SDK still goes through _send (POST), which is slightly misleading—either use the direct HTTP call with that URL or remove the unused variable/comment and make the POST-based behavior explicit.
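
One way to address the first point is a single index-level lock that covers both the in-memory mapping and the write to `tags_index.json`. The sketch below is an assumption about how that could look; the `TagIndex` class and its methods are hypothetical and do not exist in the PR.

```python
import json
import os
import threading


class TagIndex:
    """Hypothetical wrapper guarding the shared tag map and its file with one lock."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()
        self._tags = {}

    def set(self, eid, tags):
        # Mutating the map and persisting it happen under the same lock, so two
        # environments updating at once cannot overwrite each other's entry.
        with self._lock:
            self._tags[eid] = list(tags)
            tmp_path = self._path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(self._tags, f)
            os.replace(tmp_path, self._path)

    def get(self, eid):
        with self._lock:
            return list(self._tags.get(eid, []))
```

The per-environment locks still protect each `eid.json`; this extra lock only serializes access to the one file that every environment shares.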
## Individual Comments

### Comment 1
<location path="py/visdom/server/handlers/web_handlers.py" line_range="705-710" />
<code_context>
+            if eid not in self.state:
+                self.state[eid] = {"jsons": {}, "reload": {}, "tags": []}
+
+            if append:
+                current_tags = set(self.state[eid].get("tags", []))
+                current_tags.update(tags)
+                self.state[eid]["tags"] = list(current_tags)
+            else:
+                self.state[eid]["tags"] = list(set(tags))
+
+            # Update global index
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Deduplicating tags via set() changes tag ordering, which may be surprising for clients that rely on order.

Both the append and replace paths use `set(...)` for deduplication, which also reorders tags. If clients rely on user-specified tag order (e.g., for sorting or UI), this is a behavior change. To dedupe while preserving order, you could use something like:

```python
seen = set()
self.state[eid]["tags"] = [t for t in tags if not (t in seen or seen.add(t))]
```
</issue_to_address>
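
The snippet in Comment 1 only covers the replace path. A sketch that handles both append and replace while keeping first-seen order, using `dict.fromkeys` (dicts preserve insertion order in Python 3.7+), might look like this. The `merge_tags` helper is hypothetical and not part of the PR.

```python
def dedupe_keep_order(tags):
    """Drop duplicate tags while keeping the first occurrence of each."""
    return list(dict.fromkeys(tags))


def merge_tags(current, new, append):
    """Return the tag list after an append or a replace, in stable order."""
    if append:
        # Existing tags keep their positions; unseen tags are added at the end.
        return dedupe_keep_order(list(current) + list(new))
    return dedupe_keep_order(new)
```

For example, `merge_tags(["b", "a"], ["a", "c", "b"], append=True)` returns `["b", "a", "c"]`, whereas the `set`-based version could return those three tags in any order.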

### Comment 2
<location path="py/visdom/__init__.py" line_range="844-855" />
<code_context>
+            env = self.env
+
+        try:
+            url = "{0}:{1}{2}/tags".format(
+                self.server, self.port, self.base_url
+            )
+            # We use a custom GET or POST here. Our handler currently only has POST.
+            # But let's check if we should support GET for simpler retrieval.
</code_context>
<issue_to_address>
**suggestion:** The constructed `url` in get_tags is unused, which can be confusing and suggests dead code.

Since `url` is never used and `_send` is called with `endpoint="tags"` instead, this looks like dead code and the comment about GET vs POST is misleading. Please either remove `url` and the comment, or refactor to actually use this URL in a direct HTTP call so the method’s intent is explicit.

```suggestion
        try:
            return self._send(
                msg={"eid": env},
                endpoint="tags",
                create=False,
            )
```
</issue_to_address>


Development

Successfully merging this pull request may close these issues.

[Feature Request] Add experiment tagging to Visdom (GSoC 2026 Starter Task)
