[7.0] Test | Fix Transient Fault handling and other flaky unit tests #4114

Open
paulmedynski wants to merge 1 commit into release/7.0 from cherry-pick/7.0/4080

paulmedynski commented Mar 31, 2026

Cherry-pick of #4080 to release/7.0


Original PR Description

[Attempted fix with Copilot]

The Problem

When SqlConnection.Open() runs on .NET Framework (net462), the internal LoginNoFailover method enters parallel/interval-timer mode by default because TNIR (Transparent Network IP Resolution) is enabled. LoginWithFailover always uses interval timers on all platforms.

These interval timers give each login attempt only a small fraction of the total ConnectTimeout:

| Path | Fraction | With 30s timeout |
|---|---|---|
| LoginWithFailover | 8% (FailoverTimeoutStep) | 2.4s per attempt |
| LoginNoFailover + TNIR | 12.5% (FailoverTimeoutStepForTnir) | 3.75s per attempt |
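
The per-attempt budget is the fraction multiplied by the total timeout. A minimal sketch of that arithmetic follows, assuming the two fractions quoted above; `LoginTimerMath` and `PerAttemptBudget` are hypothetical names used only for illustration and are not part of SqlClient, whose internal computation may differ (for example, by working from the time remaining rather than the full timeout).

```csharp
using System;

public static class LoginTimerMath
{
    // Fractions quoted in the PR description.
    public const decimal FailoverTimeoutStep = 0.08m;
    public const decimal FailoverTimeoutStepForTnir = 0.125m;

    // Hypothetical helper: the time budget of a single login attempt,
    // computed as fraction * total ConnectTimeout.
    public static TimeSpan PerAttemptBudget(TimeSpan connectTimeout, decimal fraction)
    {
        decimal milliseconds = (decimal)connectTimeout.TotalMilliseconds * fraction;
        return TimeSpan.FromMilliseconds((double)milliseconds);
    }

    public static void Main()
    {
        // Evaluate both fractions for a 15 s and a 30 s ConnectTimeout.
        foreach (int seconds in new[] { 15, 30 })
        {
            TimeSpan total = TimeSpan.FromSeconds(seconds);
            TimeSpan failover = PerAttemptBudget(total, FailoverTimeoutStep);
            TimeSpan tnir = PerAttemptBudget(total, FailoverTimeoutStepForTnir);
            Console.WriteLine(
                $"{seconds} s timeout: failover {failover.TotalMilliseconds} ms, TNIR {tnir.TotalMilliseconds} ms");
        }
    }
}
```

For a 30 s timeout this works out to 2400 ms and 3750 ms per attempt; for a 15 s timeout, 1200 ms and 1875 ms.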

On a busy Windows CI machine, when one of these interval timers expires mid-login, the client disconnects and retries internally inside the login loop — before the outer transient-fault retry loop ever sees the error. This produces an extra PreLogin on the TDS server that never gets a corresponding Login7 (the client abandoned the connection).

Additional fixes

  • Fixes the MultipartIdentifier tests being skipped due to duplicate test IDs.
  • Fixes the failover port test failure by ensuring connection pools are cleared before connecting to the failover partner.
  • Fixes SqlTypeWorkaroundsTests discovery by setting DisableDiscoveryEnumeration to true.

* Test | Fix Transient Fault handling flaky tests

* Attempt to fix

* Fix the MultiPartIdentifier tests getting skipped

* Fix serialization issue of SqlTypeWorkaroundsTests

* Fix one more cases of possible error scenarios
Copilot AI review requested due to automatic review settings March 31, 2026 19:29
@github-project-automation github-project-automation bot moved this to To triage in SqlClient Board Mar 31, 2026
@paulmedynski paulmedynski added this to the 7.0.1 milestone Mar 31, 2026

Copilot AI left a comment


Pull request overview

Cherry-pick to release/7.0 that stabilizes several unit tests by making simulated TDS server counters resilient to internal login retries/timeouts (notably on .NET Framework with TNIR/interval timers), and by fixing a couple of unrelated test flakiness/discovery issues.

Changes:

  • Add Login7Count tracking to the generic simulated TDS server and use it to derive AbandonedPreLoginCount, updating assertions to ignore abandoned pre-login attempts.
  • Reduce flakiness in simulated server connection/failover/routing tests (e.g., disabling pooling in some scenarios; removing flaky traits where tests should now be deterministic).
  • Fix test discovery and data generation issues (xUnit duplicate theory cases; MemberData discovery enumeration).
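
The counter relationship described in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the code in GenericTdsServer: `HandshakeCounters` is a hypothetical class, and the sketch assumes an abandoned PreLogin is simply one that was never followed by a Login7, so the derived count is the difference of the two counters.

```csharp
using System.Threading;

// Hypothetical stand-in for the counters on the simulated TDS server.
public sealed class HandshakeCounters
{
    private int _preLoginCount;
    private int _login7Count;

    public int PreLoginCount => Volatile.Read(ref _preLoginCount);
    public int Login7Count => Volatile.Read(ref _login7Count);

    // Assumption: a PreLogin with no matching Login7 was abandoned by the client.
    public int AbandonedPreLoginCount => PreLoginCount - Login7Count;

    // Called once per received PreLogin / Login7 message.
    public void OnPreLogin() => Interlocked.Increment(ref _preLoginCount);
    public void OnLogin7() => Interlocked.Increment(ref _login7Count);
}

public static class HandshakeCountersDemo
{
    // Simulates one abandoned attempt followed by two completed handshakes
    // and returns the number of PreLogins that were not abandoned.
    public static int CompletedHandshakes()
    {
        var counters = new HandshakeCounters();
        counters.OnPreLogin();                        // interval timer expired, no Login7
        counters.OnPreLogin(); counters.OnLogin7();   // completed handshake
        counters.OnPreLogin(); counters.OnLogin7();   // completed handshake
        return counters.PreLoginCount - counters.AbandonedPreLoginCount;
    }
}
```

Under that assumption, an assertion on `PreLoginCount - AbandonedPreLoginCount` stays stable no matter how many extra PreLogins the client abandons, which is the property the updated tests rely on.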

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionTests.cs | Updates transient/network assertions to discount abandoned PreLogins; disables TNIR on net462 for one test; disables pooling for a retry scenario. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionRoutingTestsAzure.cs | Updates routed transient-fault login count assertions to discount abandoned PreLogins; removes flaky trait. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionRoutingTests.cs | Same as Azure routing tests: uses abandoned-prelogin-aware assertions; removes flaky trait. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionFailoverTests.cs | Uses abandoned-prelogin-aware assertions in multiple places; disables pooling for some tests; adds pool clearing to address reuse in failover scenario. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/Microsoft/Data/SqlTypes/SqlTypeWorkaroundsTests.cs | Sets DisableDiscoveryEnumeration=true on MemberData to stabilize discovery. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/Microsoft/Data/Common/MultipartIdentifierTests.cs | Deduplicates generated theory data to avoid xUnit duplicate test ID skips. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.Servers/TransientTdsErrorTdsServer.cs | Ensures Login7Count is incremented even when returning an error without calling the base login handler. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.Servers/GenericTdsServer.cs | Introduces Login7Count and AbandonedPreLoginCount counters; increments login7 count in base login handler; minor endpoint initialization refactor. |
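
As an illustration of the MultipartIdentifierTests entry, one way to deduplicate generated theory data is to filter repeated cases before they are handed to the test framework. The sketch below is an assumption about the general technique, not the change made in this PR; `TheoryCases`, `RawIdentifiers`, and `DistinctCases` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TheoryCases
{
    // Hypothetical generator output; "[a].[b]" appears twice on purpose.
    public static readonly string[] RawIdentifiers = { "[a].[b]", "a.b", "[a].[b]" };

    // Drops repeated cases so that every theory row maps to a distinct test ID.
    public static IEnumerable<object[]> DistinctCases(IEnumerable<string> raw) =>
        raw.Distinct(StringComparer.Ordinal)
           .Select(identifier => new object[] { identifier });
}
```

A member like `DistinctCases(RawIdentifiers)` could then back a `MemberData` source, so the runner sees each input only once.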


// Clear the pool to ensure the next connection attempt doesn't reuse
// the pooled connection to the now-disposed primary server.
SqlConnection.ClearAllPools();

Copilot AI Mar 31, 2026


SqlConnection.ClearAllPools() clears pools globally and can introduce cross-test side effects when other tests run in parallel. Since this test only needs to ensure the pooled connection for this specific connection string isn't reused, prefer clearing just this pool (e.g., SqlConnection.ClearPool(connection) / clearing by key) rather than clearing all pools.

Suggested change
- SqlConnection.ClearAllPools();
+ SqlConnection.ClearPool(connection);

Assert.Equal(1, server.PreLoginCount);
Assert.Equal(1, failoverServer.PreLoginCount);
Assert.Equal(0, server.Login7Count);
Assert.Equal(1, failoverServer.PreLoginCount - failoverServer.AbandonedPreLoginCount);

Copilot AI Mar 31, 2026


This test still asserts an exact server.PreLoginCount == 1, but the flakiness being addressed elsewhere in this PR comes from extra (abandoned) PreLogin attempts caused by interval-timer/TNIR behavior. To keep this deterministic, assert using AbandonedPreLoginCount/Login7Count (or relax the PreLogin assertion) instead of requiring an exact PreLoginCount.

Suggested change
- Assert.Equal(1, failoverServer.PreLoginCount - failoverServer.AbandonedPreLoginCount);
+ Assert.Equal(1, failoverServer.Login7Count);

@mdaigle mdaigle moved this from To triage to In review in SqlClient Board Apr 1, 2026
@paulmedynski paulmedynski marked this pull request as ready for review April 2, 2026 15:46
@paulmedynski paulmedynski requested a review from a team as a code owner April 2, 2026 15:46

codecov bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.06%. Comparing base (864f666) to head (d1b229c).

❗ There is a different number of reports uploaded between BASE (864f666) and HEAD (d1b229c). HEAD has 1 upload less than BASE.

| Flag | BASE (864f666) | HEAD (d1b229c) |
|---|---|---|
| CI-SqlClient | 1 | 0 |

Additional details and impacted files
@@               Coverage Diff               @@
##           release/7.0    #4114      +/-   ##
===============================================
- Coverage        73.07%   66.06%   -7.02%     
===============================================
  Files              280      275       -5     
  Lines            42997    65822   +22825     
===============================================
+ Hits             31422    43487   +12065     
- Misses           11575    22335   +10760     

| Flag | Coverage Δ |
|---|---|
| CI-SqlClient | ? |
| PR-SqlClient-Project | 66.06% <ø> (?) |


@paulmedynski paulmedynski modified the milestones: 7.0.1, 7.0.2 Apr 6, 2026
