[7.0] Test | Fix Transient Fault handling and other flaky unit tests#4114
[7.0] Test | Fix Transient Fault handling and other flaky unit tests#4114paulmedynski wants to merge 1 commit intorelease/7.0from
Conversation
* Test | Fix Transient Fault handling flaky tests * Attempt to fix * Fix the MultiPartIdentifier tests getting skipped * Fix serialization issue of SqlTypeWorkaroundsTests * Fix one more cases of possible error scenarios
There was a problem hiding this comment.
Pull request overview
Cherry-pick to release/7.0 that stabilizes several unit tests by making simulated TDS server counters resilient to internal login retries/timeouts (notably on .NET Framework with TNIR/interval timers), and by fixing a couple of unrelated test flakiness/discovery issues.
Changes:
- Add
Login7Counttracking to the generic simulated TDS server and use it to deriveAbandonedPreLoginCount, updating assertions to ignore abandoned pre-login attempts. - Reduce flakiness in simulated server connection/failover/routing tests (e.g., disabling pooling in some scenarios; removing
flakytraits where tests should now be deterministic). - Fix test discovery and data generation issues (xUnit duplicate theory cases;
MemberDatadiscovery enumeration).
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionTests.cs | Updates transient/network assertions to discount abandoned PreLogins; disables TNIR on net462 for one test; disables pooling for a retry scenario. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionRoutingTestsAzure.cs | Updates routed transient-fault login count assertions to discount abandoned PreLogins; removes flaky trait. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionRoutingTests.cs | Same as Azure routing tests: uses abandoned-prelogin-aware assertions; removes flaky trait. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/SimulatedServerTests/ConnectionFailoverTests.cs | Uses abandoned-prelogin-aware assertions in multiple places; disables pooling for some tests; adds pool clearing to address reuse in failover scenario. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/Microsoft/Data/SqlTypes/SqlTypeWorkaroundsTests.cs | Sets DisableDiscoveryEnumeration=true on MemberData to stabilize discovery. |
| src/Microsoft.Data.SqlClient/tests/UnitTests/Microsoft/Data/Common/MultipartIdentifierTests.cs | Deduplicates generated theory data to avoid xUnit duplicate test ID skips. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.Servers/TransientTdsErrorTdsServer.cs | Ensures Login7Count is incremented even when returning an error without calling the base login handler. |
| src/Microsoft.Data.SqlClient/tests/tools/TDS/TDS.Servers/GenericTdsServer.cs | Introduces Login7Count and AbandonedPreLoginCount counters; increments login7 count in base login handler; minor endpoint initialization refactor. |
|
|
||
| // Clear the pool to ensure the next connection attempt doesn't reuse | ||
| // the pooled connection to the now-disposed primary server. | ||
| SqlConnection.ClearAllPools(); |
There was a problem hiding this comment.
SqlConnection.ClearAllPools() clears pools globally and can introduce cross-test side effects when other tests run in parallel. Since this test only needs to ensure the pooled connection for this specific connection string isn't reused, prefer clearing just this pool (e.g., SqlConnection.ClearPool(connection) / clearing by key) rather than clearing all pools.
| SqlConnection.ClearAllPools(); | |
| SqlConnection.ClearPool(connection); |
| Assert.Equal(1, server.PreLoginCount); | ||
| Assert.Equal(1, failoverServer.PreLoginCount); | ||
| Assert.Equal(0, server.Login7Count); | ||
| Assert.Equal(1, failoverServer.PreLoginCount - failoverServer.AbandonedPreLoginCount); |
There was a problem hiding this comment.
This test still asserts an exact server.PreLoginCount == 1, but the flakiness being addressed elsewhere in this PR comes from extra (abandoned) PreLogin attempts caused by interval-timer/TNIR behavior. To keep this deterministic, assert using AbandonedPreLoginCount/Login7Count (or relax the PreLogin assertion) instead of requiring an exact PreLoginCount.
| Assert.Equal(1, failoverServer.PreLoginCount - failoverServer.AbandonedPreLoginCount); | |
| Assert.Equal(1, failoverServer.Login7Count); |
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## release/7.0 #4114 +/- ##
===============================================
- Coverage 73.07% 66.06% -7.02%
===============================================
Files 280 275 -5
Lines 42997 65822 +22825
===============================================
+ Hits 31422 43487 +12065
- Misses 11575 22335 +10760
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Cherry-pick of #4080 to release/7.0
Original PR Description
[Attempted fix with Copilot]
The Problem
When
SqlConnection.Open()runs on .NET Framework (net462), the internalLoginNoFailovermethod enters parallel/interval-timer mode by default because TNIR (Transparent Network IP Resolution) is enabled.LoginWithFailoveralways uses interval timers on all platforms.These interval timers give each login attempt only a small fraction of the total
ConnectTimeout:LoginWithFailoverFailoverTimeoutStep)LoginNoFailover+ TNIRFailoverTimeoutStepForTnir)On a busy Windows CI machine, when one of these interval timers expires mid-login, the client disconnects and retries internally inside the login loop — before the outer transient-fault retry loop ever sees the error. This produces an extra
PreLoginon the TDS server that never gets a correspondingLogin7(the client abandoned the connection).Additional fixes
DisableDiscoveryEnumerationto true.