Skip to content

[shimV2] add plan9 device controller#2641

Open
rawahars wants to merge 4 commits intomicrosoft:mainfrom
rawahars:plan9-dev-controller
Open

[shimV2] add plan9 device controller#2641
rawahars wants to merge 4 commits intomicrosoft:mainfrom
rawahars:plan9-dev-controller

Conversation

@rawahars
Copy link
Copy Markdown
Contributor

@rawahars rawahars commented Mar 21, 2026

Summary

This change adds the plan9 device controller which can add/remove plan9 shares from a VM. The guest side operations are part of mount controller responsibility.

Whenever a new request for AddToVM comes, we call into HCS to hot-add the path via HcsModifyComputeSystem, where you inject a Plan9Share entry into the VM’s Devices → Plan9 → Shares schema, and HCS plumbs the corresponding endpoints so the guest can mount it using the Plan 9 (9P) protocol.

Note

For the same host path, we add a new share to the UVM. This pattern is similar to the existing pattern.
Ref- https://github.com/microsoft/hcsshim/blob/87708ff3150b7bceca4dbb3f7cdb7147428e42c3/internal/uvm/plan9.go

Imagine that the caller wants to add the same host path at two different guest locations-

Location 1: /tools
Location2: /tools2

If we do not create a new share for the UVM, we are likely to send a second GCS request asking the same share to be mounted at tools2. When the guest receives the GuestRequest of type ResourceTypeMappedDirectory, the handler unconditionally calls plan9.Mount(). This will, regardless of whether the same share name / aname was already mounted, dials a brand new vsock connection to the Plan9 server, opens a file descriptor from it, and performs a fresh unix.Mount(..., "9p", ...) syscall. This leads to error.

If we need to perform refCounting for plan9, it would need changes on both shim and guest side.

@rawahars rawahars requested a review from a team as a code owner March 21, 2026 12:14
}

// Build the Plan9 share flags bitmask from the caller-provided options.
flags := shareFlagsLinuxMetadata
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the vdev already handles ref counting of share host path to options then we dont need it here. But I'm hesitating to think it would.

If you call AddPlan9 with identical specs what does it do? Or is that they always have a different nameCounter so they always have a different set of settings?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, plan9 vdev does not support refcounting. If we call with similar options, GCS errors out Cannot create a file when that file already exists.

return "", fmt.Errorf("add plan9 share %s to host: %w", name, err)
}

m.shares[name] = struct{}{}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would imply that you end up overwriting the hcsschema with just one entry. I think you need to refcount here

rawahars added 2 commits April 5, 2026 06:33
This change adds the plan9 device controller which can add/remove plan9 shares from a VM. The guest side operations are part of mount controller responsibility.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
@rawahars rawahars force-pushed the plan9-dev-controller branch from 2ecafc6 to 4a697a8 Compare April 5, 2026 02:05
Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
@rawahars rawahars force-pushed the plan9-dev-controller branch from 4a697a8 to b2cff99 Compare April 5, 2026 04:19
case StateUnmounted:
return "", fmt.Errorf("cannot mount a share in state %s", m.state)
}
return "", nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang makes this hard on the match fallthrough but should we panic here? Really you never expect to hit this return statement right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was skeptical about panic since it can cause leaked resources, say we enable HCS VMs to continue post shim getting terminated (needed for impactless update).

If we return error then the caller can simply handle it on their own by shutting down the shim gracefully.

// MountToGuest failed (it transitions StateReserved → StateUnmounted),
// and the caller subsequently calls UnmapFromGuest to clean up.
}
return nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. You really cant hit this return right?


// NewReserved creates a new [Mount] in the [StateReserved] state with the
// provided share name and guest-side mount configuration.
func NewReserved(shareName string, config Config) *Mount {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self, shareName must be unique to avoid guest collisions per mount.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Since shareName is generated by us in the share package, it's going to be unique.
Same applies in SCSI controller too.


// share-flag constants used in Plan9 HCS requests.
//
// These are marked private in the HCS schema. When public variants become
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate.

Plan9ShareFlags is public and these are public since 2.0?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes right. I have made code changes to include the flags in schema and use them directly.

// already exists, it increments the reference count after verifying the config
// matches.
func (s *Share) ReserveMount(ctx context.Context, config mount.Config) (*mount.Mount, error) {
if s.state != StateReserved && s.state != StateAdded {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a problem. If you reserve a share, and reserve a mount and then AddToVM fails, you will end up with a reserved mount but the share is in the StateRemoved state. I think you need to enforce that you can only do Mount operations within the lifetime of StateAdded

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a bug on the other one too? I need to look at that code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was intentional for plan9 controller. Reserving a mount for plan9 share does not reserve anything which blocks the future callers. If the AddToVM fails, we will simply stop tracking the Mount and garbage collector will clean that up. Nothing needs to be cleaned up for the Mount.

The other reservations will see that the share is not present and return with an error.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this entire package (or, at the least, the Controller) be behind an lcow build tag?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not keep the plan9 controller restricted to lcow build tag as it needs to be part of the VM struct.
From the VM itself, this would be available only in LCOW builds so the actual consumers only get it in lcow paths.

Ref- #2663
What do you think?

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants