mirror of
https://github.com/gui-cs/Terminal.Gui.git
synced 2025-12-26 07:47:54 +01:00
* Initial plan * Add comprehensive analysis of InvokeLeakTest debugger failure Co-authored-by: tig <585482+tig@users.noreply.github.com> * Add XML documentation to InvokeLeakTest about debugger issues Co-authored-by: tig <585482+tig@users.noreply.github.com> * Add visual timing diagrams for InvokeLeakTest analysis Co-authored-by: tig <585482+tig@users.noreply.github.com> * Add executive summary of InvokeLeakTest investigation Co-authored-by: tig <585482+tig@users.noreply.github.com> * Update analysis with x64 vs ARM confirmation from @tig Co-authored-by: tig <585482+tig@users.noreply.github.com> * Implement Stopwatch-based timing in TimedEvents to fix x64 race condition Co-authored-by: tig <585482+tig@users.noreply.github.com> * Update documentation to reflect fix implementation Co-authored-by: tig <585482+tig@users.noreply.github.com> * Fix test issues and increase TimeSpan.Zero buffer for debugger safety Co-authored-by: tig <585482+tig@users.noreply.github.com> * Add MainLoop.Wakeup() call in Invoke and remove problematic test Co-authored-by: tig <585482+tig@users.noreply.github.com> * Auto-detect debugger and increase test timeout to 500ms Co-authored-by: tig <585482+tig@users.noreply.github.com> * Remove unnecessary MainLoop.Wakeup() call for v2 drivers Co-authored-by: tig <585482+tig@users.noreply.github.com> * Move analysis documents to Tests/StressTests folder Co-authored-by: tig <585482+tig@users.noreply.github.com> * Add test coverage for multiple drivers per @BDisp's suggestion Co-authored-by: tig <585482+tig@users.noreply.github.com> * Revert multi-driver test coverage changes per @tig request Co-authored-by: tig <585482+tig@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: tig <585482+tig@users.noreply.github.com>
9.6 KiB
9.6 KiB
InvokeLeakTest Timing Diagram
Normal Operation (Without Debugger)
Timeline (milliseconds):
0ms 10ms 20ms 30ms 40ms 50ms 60ms 70ms 80ms 90ms 100ms
|------|------|------|------|------|------|------|------|------|------|
│
│ Background Thread 1: Task.Run → Sleep(2-4ms) → Invoke()
│ ↓
│ Background Thread 2: Task.Run → Sleep(2-4ms) → Invoke()
│ ↓
│ Background Thread 3: Task.Run → Sleep(2-4ms) → Invoke()
│ ↓
│ [All added to _timeouts]
│
│ Main Loop: ──────────[Iter]───────[Iter]───────[Iter]───────[Iter]────
│ ↓ ↓ ↓ ↓
│ RunTimers RunTimers RunTimers RunTimers
│ ↓ ↓ ↓ ↓
│ Execute 0 Execute 5 Execute 10 Execute 15
│
│ Counter: 0 ───────→ 0 ─────→ 5 ───────→ 15 ────→ 30 ────→ 45 ─────→ 50
│
│ Test Check: ✓ PASS
│ └──────────────100ms window────────────────┘
Result: All invocations execute within 100ms → Test passes
Problem Scenario (With Debugger - @BDisp's Machine)
Timeline (milliseconds):
0ms 10ms 20ms 30ms 40ms 50ms 60ms 70ms 80ms 90ms 100ms 110ms
|------|------|------|------|------|------|------|------|------|------|------|
│
│ Background Threads: 500 Tasks launch rapidly
│ ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ (burst of invocations)
│ All added to _timeouts within same DateTime.UtcNow tick
│ Timestamps: T-100, T-99, T-98, T-97, ... (NudgeToUniqueKey)
│
│ Main Loop (SLOW due to debugger overhead):
│ ────────────────[Iter 1]────────────────────[Iter 2]──────────────
│ 25ms 60ms
│ ↓ ↓
│ RunTimers RunTimers
│ ↓ ↓
│ DateTime.UtcNow DateTime.UtcNow
│ = T0 = T1
│ ↓ ↓
│ Check: k < now? Check: k < now?
│ ↓ ↓
│ ┌─────────────────┴──────────────┐ │
│ │ Some timeouts: k >= now ! │ │ Execute some
│ │ These are NOT executed │ │ timeouts
│ │ Added back to _timeouts │ │
│ └────────────────────────────────┘ ↓
│ Counter += 300
│ Counter: 0 ────────────────→ 0 ──────────────────────→ 300 ────────────→ 450
│
│ Test Check at 100ms: ✗ FAIL
│ └──────────100ms window──────┘
│ Counter = 300, Expected = 500
│ Missing 200 invocations!
│
│ (Those 200 invocations execute later, around 110ms)
Result: Not all invocations execute within 100ms → TimeoutException
The DateTime.UtcNow Resolution Problem
Real Time: 0.000ms 0.001ms 0.002ms 0.003ms ... 15.6ms 15.7ms
│ │ │ │ │ │
DateTime.UtcNow: T0───────────────────────────────────────→T1─────→T2
└─────────────15.6ms tick──────────────────┘
All invocations here get T0
When 100 invocations happen within 15.6ms:
Invoke #1: k = T0 - 100 ticks
Invoke #2: k = T0 - 99 ticks (NudgeToUniqueKey increments)
Invoke #3: k = T0 - 98 ticks
...
Invoke #100: k = T0 + 0 ticks (This is T0!)
When RunTimers() checks at time T0 + 50 ticks:
Invoke #1-50: k < now → Execute ✓
Invoke #51-100: k >= now → NOT executed! Added back to queue ✗
Why Debugger Makes It Worse
Without Debugger
Main Loop Iteration Time: ~10-20ms
│ Invoke batch: 10 tasks ────→ Execute 10 ────→ Next batch: 10 tasks
│ 10ms 10ms
│ Small batches processed quickly
With Debugger
Main Loop Iteration Time: ~25-50ms (2-5x slower!)
│ Invoke batch: 100 tasks (burst!) ────────→ Execute 50 ──→ 50 still queued
│ 50ms ↓
│ Need another 50ms!
│ Large batches accumulate, processing delayed
Effect: More invocations queue up between iterations, increasing likelihood of timestamp collisions and race conditions.
The Race Condition Explained
Thread Timeline:
Background Thread Main Thread
───────────────── ───────────
[Call Invoke()]
↓
[Lock _timeoutsLockToken]
↓
k = DateTime.UtcNow.Ticks
= 1000
↓
k -= 100 (= 900)
↓
[Add to _timeouts with k=900]
↓
[Release lock] [Lock _timeoutsLockToken]
↓
[Copy _timeouts]
↓
[Release lock]
↓
now = DateTime.UtcNow.Ticks
= 850 ⚠️ (Timer hasn't updated!)
↓
Check: k < now?
900 < 850? → FALSE!
↓
[Timeout NOT executed]
[Added back to _timeouts]
Problem: Between when k is calculated (900) and when it's checked (now=850), the system timer hasn't updated! This can happen because:
- DateTime.UtcNow has coarse resolution (~15ms)
- Thread scheduling can cause the check to happen "early"
- Debugger makes timing less predictable
Solution Comparison
Current: DateTime.UtcNow
Resolution: ~15.6ms (Windows), varies by platform
Precision: Low
Stability: Affected by system time changes
Debugger: Timing issues
Time: 0ms ────────────→ 15.6ms ────────────→ 31.2ms
Reading: T0 T1 T2
└─────All values here are T0─────┘
Proposed: Stopwatch.GetTimestamp()
Resolution: ~1 microsecond (typical)
Precision: High
Stability: Not affected by system time changes
Debugger: More reliable
Time: 0ms → 0.001ms → 0.002ms → 0.003ms → ...
Reading: T0 T1 T2 T3 ...
Each reading is unique and monotonic
Benefit: With microsecond resolution, even 1000 rapid invocations get unique timestamps, eliminating the NudgeToUniqueKey workaround and race conditions.
Test Scenarios
Scenario 1: Fast Machine, No Debugger
Iteration time: 5-10ms
Invoke rate: 20-30/ms
Result: ✓ PASS (plenty of time margin)
Scenario 2: Normal Machine, No Debugger
Iteration time: 10-20ms
Invoke rate: 10-20/ms
Result: ✓ PASS (adequate time margin)
Scenario 3: ARM Machine, Debugger (@tig's ARM Windows)
Iteration time: 20-30ms
Invoke rate: 15-20/ms
ARM timer resolution: Better than x64
Result: ✓ PASS (ARM timer architecture handles it)
Scenario 4: x64 Machine, Debugger (@BDisp's x64, @tig's x64 Windows) - CONFIRMED
Iteration time: 30-50ms
Invoke rate: 10-15/ms
DateTime.UtcNow resolution: 15-20ms (x64 TSC/HPET timer)
Result: ✗ FAIL (exceeds 100ms window)
CONFIRMED: @tig reproduced on x64 but NOT on ARM
Recommendations
Immediate Fix (Test-Level)
// Increase tolerance for debugger scenarios
#if DEBUG
private const int POLL_MS = Debugger.IsAttached ? 500 : 100;
#else
private const int POLL_MS = 100;
#endif
Long-Term Fix (Production Code)
// In TimedEvents.cs, replace DateTime.UtcNow with Stopwatch
private static long GetTimestampTicks()
{
return Stopwatch.GetTimestamp() * (TimeSpan.TicksPerSecond / Stopwatch.Frequency);
}
// Use in AddTimeout:
long k = GetTimestampTicks() + time.Ticks;
This provides microsecond resolution and eliminates the race condition entirely.