Skip to content

HDDS-14870. Allow balancing of over replicated and quasi closed containers#9964

Open
sarvekshayr wants to merge 6 commits intoapache:masterfrom
sarvekshayr:HDDS-14870
Open

HDDS-14870. Allow balancing of over replicated and quasi closed containers#9964
sarvekshayr wants to merge 6 commits intoapache:masterfrom
sarvekshayr:HDDS-14870

Conversation

@sarvekshayr
Copy link
Copy Markdown
Contributor

@sarvekshayr sarvekshayr commented Mar 21, 2026

What changes were proposed in this pull request?

Allow container balancer to include non-standard containers such as -

  • Container CLOSED + OVER_REPLICATED and replica CLOSED must have minimum CLOSED replicas based on replication config.
  • Container CLOSED + OVER_REPLICATED and replica QUASI_CLOSED must have minimum CLOSED replicas based on replication config and replica is non-empty.
  • Container QUASI_CLOSED + OVER_REPLICATED must have all QUASI_CLOSED replicas and source replica is non-empty.
  • Container QUASI_CLOSED + HEALTHY must have all QUASI_CLOSED closed and source replica is non-empty.

This is allowed only if the new config hdds.container.balancer.include.non.standard.containers is set to true.

What is the link to the Apache JIRA

HDDS-14870

How was this patch tested?

Added tests in TestContainerBalancerSelectionCriteria and TestMoveManager.

@sarvekshayr sarvekshayr requested a review from sadanand48 March 21, 2026 14:55
Copy link
Copy Markdown
Contributor

@sreejasahithi sreejasahithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarvekshayr

I wanted to double check my understanding:
When includeNonStandardContainers is true, ContainerBalancerSelectionCriteria correctly allows:

  • Case A: CLOSED + OVER_REPLICATED containers (with min CLOSED replicas + QUASI_CLOSED replicas)
  • Case B: QUASI_CLOSED containers with all QUASI_CLOSED replicas

However, MoveManager.move() still enforces:

  • Health must be HEALTHY – so OVER_REPLICATED is rejected with REPLICATION_NOT_HEALTHY_BEFORE_MOVE
  • Container state must be CLOSED – so QUASI_CLOSED is rejected with REPLICATION_FAIL_CONTAINER_NOT_CLOSED

Since MoveManager doesn’t consider the config, it never sees includeNonStandardContainers. That would mean these containers get selected but fail when we actually try to move them.
Did you intend to update MoveManager as well to honor this config, or is there another path i am missing? I want to make sure i am not misreading the flow.

@sarvekshayr sarvekshayr marked this pull request as draft March 23, 2026 03:08
@sarvekshayr sarvekshayr marked this pull request as ready for review March 25, 2026 08:31
Copy link
Copy Markdown
Contributor

@sreejasahithi sreejasahithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sarvekshayr for updating MoveManager to respect includeNonStandardContainers as well.

Just wanted to double check something against the PR description, which mentions additional QUASI_CLOSED replicas in the over-replicated CLOSED case.
From the current implementation, it looks like:

  • If the replica on the source datanode is CLOSED and health is OVER_REPLICATED (with the flag on), balancing is allowed without requiring that the “extra” replicas are QUASI_CLOSED (could be extra CLOSED replicas too).
  • If the replica on the source is QUASI_CLOSED, we only require enough CLOSED replicas (>= required count).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants