13 KiB
InvokeLeakTest Failure Analysis
Issue Summary
The InvokeLeakTest stress test fails only on @BDisp's machine and only when running under a debugger:
- Visual Studio 2022 on Windows (x64)
- Visual Studio 2022 on macOS (Intel-based VM)
- Visual Studio Code on Windows
The test passes in CI/CD environments and when run without a debugger.
Test Description
InvokeLeakTest is a stress test (not a unit test) located in Tests/StressTests/ApplicationStressTests.cs. It:
- Spawns multiple concurrent tasks that call
Application.Invoke()from background threads - Each invocation updates a TextField and increments a counter using
Interlocked.Increment - The test verifies that all invocations complete successfully (no "leaks")
- Runs for 50 passes with 500 increments each (25,000 total invocations)
Test Flow
// Main thread blocks in Application.Run()
Application.Run(top);
// Background thread spawns tasks
for (var j = 0; j < NUM_PASSES; j++) {
for (var i = 0; i < NUM_INCREMENTS; i++) {
Task.Run(() => {
Thread.Sleep(r.Next(2, 4)); // Random 2-4ms delay
Application.Invoke(() => {
tf.Text = $"index{r.Next()}";
Interlocked.Increment(ref _tbCounter);
});
});
}
// Wait for counter to reach expected value with 100ms polling
while (_tbCounter != expectedValue) {
_wakeUp.Wait(POLL_MS); // POLL_MS = 100ms
if (_tbCounter hasn't changed) {
throw new TimeoutException("Invoke lost");
}
}
}
How Application.Invoke Works
Call Chain
Application.Invoke(action)→ callsApplicationImpl.Instance.Invoke(action)ApplicationImpl.Invoke()checks if on main thread:- If on main thread: Execute action immediately
- If on background thread: Add to
_timedEventswithTimeSpan.Zero
TimedEvents.Add():- Calculates timestamp:
k = (DateTime.UtcNow + time).Ticks - For
TimeSpan.Zero, subtracts 100 ticks to ensure immediate execution:k -= 100 - Adds to sorted list:
_timeouts.Add(NudgeToUniqueKey(k), timeout)
- Calculates timestamp:
MainLoop.RunIteration()callsTimedEvents.RunTimers()every iterationTimedEvents.RunTimers():- Takes a copy of
_timeoutsand creates a new list (under lock) - Iterates through copy, executing callbacks where
k < now - Non-repeating callbacks (return false) are not re-added
- Takes a copy of
Critical Code Paths
ApplicationImpl.Invoke (Terminal.Gui/App/ApplicationImpl.cs:306-322)
public void Invoke (Action action)
{
// If we are already on the main UI thread
if (Application.MainThreadId == Thread.CurrentThread.ManagedThreadId)
{
action ();
return;
}
_timedEvents.Add (TimeSpan.Zero,
() =>
{
action ();
return false; // One-shot execution
}
);
}
TimedEvents.AddTimeout (Terminal.Gui/App/Timeout/TimedEvents.cs:124-139)
private void AddTimeout (TimeSpan time, Timeout timeout)
{
lock (_timeoutsLockToken)
{
long k = (DateTime.UtcNow + time).Ticks;
// if user wants to run as soon as possible set timer such that it expires right away
if (time == TimeSpan.Zero)
{
k -= 100; // Subtract 100 ticks to ensure it's "in the past"
}
_timeouts.Add (NudgeToUniqueKey (k), timeout);
Added?.Invoke (this, new (timeout, k));
}
}
TimedEvents.RunTimersImpl (Terminal.Gui/App/Timeout/TimedEvents.cs:160-192)
private void RunTimersImpl ()
{
long now = DateTime.UtcNow.Ticks;
SortedList<long, Timeout> copy;
lock (_timeoutsLockToken)
{
copy = _timeouts;
_timeouts = new ();
}
foreach ((long k, Timeout timeout) in copy)
{
if (k < now) // Execute if scheduled time is in the past
{
if (timeout.Callback ()) // Returns false for Invoke actions
{
AddTimeout (timeout.Span, timeout);
}
}
else // Future timeouts - add back to list
{
lock (_timeoutsLockToken)
{
_timeouts.Add (NudgeToUniqueKey (k), timeout);
}
}
}
}
Hypothesis: Why It Fails Under Debugger on @BDisp's Machine
Primary Hypothesis: DateTime.UtcNow Resolution and Debugger Timing
The test failure likely occurs due to a combination of factors:
1. DateTime.UtcNow Resolution Issues
The code uses DateTime.UtcNow.Ticks for timing, which has platform-dependent resolution:
- Windows: ~15.6ms resolution (system timer tick)
- Some systems: Can be lower/higher depending on timer configuration
- Debugger impact: Can affect system timer behavior
When TimeSpan.Zero invocations are added:
long k = (DateTime.UtcNow + TimeSpan.Zero).Ticks;
k -= 100; // Subtract 100 ticks (10 microseconds)
The problem: If two Invoke calls happen within the same timer tick (< ~15ms on Windows), they get the SAME DateTime.UtcNow value. The NudgeToUniqueKey function increments by 1 tick each collision, but this creates a sequence of timestamps like:
- First call:
now - 100 - Second call (same UtcNow):
now - 99 - Third call (same UtcNow):
now - 98 - ...and so on
2. Race Condition in RunTimersImpl
In RunTimersImpl, this check determines if a timeout should execute:
if (k < now) // k is scheduled time, now is current time
The race: Between when timeouts are added (with k = UtcNow - 100) and when they're checked (with fresh DateTime.UtcNow), time passes. However, if:
- Multiple invocations are added rapidly (within same timer tick)
- The system is under debugger (slower iteration loop)
- The main loop iteration happens to sample
DateTime.UtcNowat an unlucky moment
Some timeouts might have k >= now even though they were intended to be "immediate" (TimeSpan.Zero).
3. Debugger-Specific Timing Effects
When running under a debugger:
a) Slower Main Loop Iterations
- Debugger overhead slows each iteration
- More time between
RunTimerscalls - Allows more tasks to queue up between iterations
b) Timer Resolution Changes
- Debuggers can affect OS timer behavior
- May change quantum/scheduling of threads
- Different thread priorities under debugger
c) DateTime.UtcNow Sampling
- More invocations can accumulate in a single UtcNow "tick"
- Larger batches of timeouts with near-identical timestamps
- Higher chance of
k >= nowrace condition
4. The "Lost Invoke" Scenario
Failure scenario:
Time T0: Background thread calls Invoke()
- k = UtcNow - 100 (let's say 1000 ticks - 100 = 900)
- Added to _timeouts with k=900
Time T1: MainLoop iteration samples UtcNow = 850 ticks (!)
- This can happen if system timer hasn't updated yet
- Check: is k < now? Is 900 < 850? NO!
- Timeout is NOT executed, added back to _timeouts
Time T2: Next iteration, UtcNow = 1100 ticks
- Check: is k < now? Is 900 < 1100? YES!
- Timeout executes
But if the test's 100ms polling window expires before T2, it throws TimeoutException.
5. Why x64 Machines Specifically?
UPDATE: @tig confirmed he can reproduce on his x64 Windows machine but NOT on his ARM Windows machine, validating this hypothesis.
Architecture-specific factors:
- CPU/Chipset: Intel/AMD x64 vs ARM have fundamentally different timer implementations
- x64: Uses legacy TSC (Time Stamp Counter) or HPET (High Precision Event Timer)
- ARM: Uses different timer architecture with potentially better resolution
- VM/Virtualization: MacOS VM on Intel laptop may have timer virtualization quirks
- OS Configuration: Windows timer resolution settings (can be 1ms to 15.6ms)
- Debugger Version: Specific VS2022 build with different debugging hooks
- System Load: Background processes affecting timer accuracy
- Hardware: Specific timer hardware behavior on x64 architecture
Secondary Hypothesis: Thread Scheduling Under Debugger
The test spawns tasks with Task.Run() and small random delays (2-4ms). Under a debugger:
- Thread scheduling may be different
- Task scheduling might be more synchronous
- More tasks could complete within same timer resolution window
- Creates "burst" of invocations that all get same timestamp
Why It Doesn't Fail on ARM
CONFIRMED: @tig cannot reproduce on ARM Windows machine, only on x64 Windows.
ARM environments:
- Run without debugger (no debugging overhead) in CI/CD
- Different timer characteristics - ARM timer architecture has better resolution
- Faster iterations (less time for race conditions)
- ARM CPU architecture uses different timer implementation than x64
- ARM timer subsystem may have higher base resolution or better behavior under load
Evidence Supporting the Hypothesis
-
Test uses 100ms polling:
_wakeUp.Wait(POLL_MS)wherePOLL_MS = 100- This gives a narrow window for all invocations to complete
- Any delay beyond 100ms triggers failure
-
Test spawns 500 concurrent tasks per pass: Each with 2-4ms delay
- Under debugger, these could all queue up in < 100ms
- But execution might take > 100ms due to debugger overhead
-
Only fails under debugger: Strong indicator of timing-related issue
- Debugger affects iteration speed and timer behavior
-
Architecture-specific (CONFIRMED): @tig reproduced on x64 Windows but NOT on ARM Windows
- This strongly supports the timer resolution hypothesis
- x64 timer implementation is more susceptible to this race condition
- ARM timer architecture handles the scenario more gracefully
Recommended Solutions
Solution 1: Use Stopwatch Instead of DateTime.UtcNow (Recommended)
Replace DateTime.UtcNow.Ticks with Stopwatch.GetTimestamp() in TimedEvents:
- Higher resolution (typically microseconds)
- More consistent across platforms
- Less affected by system time adjustments
- Better for interval timing
Solution 2: Increase TimeSpan.Zero Buffer
Change the immediate execution buffer from -100 ticks to something more substantial:
if (time == TimeSpan.Zero)
{
k -= TimeSpan.TicksPerMillisecond * 10; // 10ms in the past instead of 0.01ms
}
Solution 3: Add Wakeup Call on Invoke
When adding a TimeSpan.Zero timeout, explicitly wake up the main loop:
_timedEvents.Add(TimeSpan.Zero, ...);
MainLoop?.Wakeup(); // Force immediate processing
Solution 4: Test-Specific Changes
For the test itself:
- Increase
POLL_MSfrom 100 to 200 or 500 for debugger scenarios - Add conditional:
if (Debugger.IsAttached) POLL_MS = 500; - This accommodates debugger overhead without changing production code
Solution 5: Use Interlocked Operations More Defensively
Add explicit memory barriers and volatile reads to ensure visibility:
volatile int _tbCounter;
// or
Interlocked.MemoryBarrier();
int currentCount = Interlocked.CompareExchange(ref _tbCounter, 0, 0);
Additional Investigation Needed
To confirm hypothesis, @BDisp could:
- Add diagnostics to test:
var sw = Stopwatch.StartNew();
while (_tbCounter != expectedValue) {
_wakeUp.Wait(pollMs);
if (_tbCounter != tbNow) continue;
// Log timing information
Console.WriteLine($"Timeout at {sw.ElapsedMilliseconds}ms");
Console.WriteLine($"Counter: {_tbCounter}, Expected: {expectedValue}");
Console.WriteLine($"Missing: {expectedValue - _tbCounter}");
// Check if invokes are still queued
Console.WriteLine($"TimedEvents count: {Application.TimedEvents?.Timeouts.Count}");
}
- Test timer resolution:
var samples = new List<long>();
for (int i = 0; i < 100; i++) {
samples.Add(DateTime.UtcNow.Ticks);
}
var deltas = samples.Zip(samples.Skip(1), (a, b) => b - a).Where(d => d > 0);
Console.WriteLine($"Min delta: {deltas.Min()} ticks ({deltas.Min() / 10000.0}ms)");
- Monitor TimedEvents queue:
- Add logging in
TimedEvents.RunTimersImplto see when timeouts are deferred - Check if
k >= nowcondition is being hit
Conclusion
The InvokeLeakTest failure under debugger is likely caused by:
- Low resolution of DateTime.UtcNow combined with rapid invocations
- Race condition in timeout execution check (
k < now) - Debugger overhead exacerbating timing issues
- Platform-specific timer behavior on @BDisp's hardware/VM
The most robust fix is to use Stopwatch for timing instead of DateTime.UtcNow, providing:
- Higher resolution timing
- Better consistency across platforms
- Reduced susceptibility to debugger effects
This is a timing/performance issue in the stress test environment, not a functional bug in the production code. The test is correctly identifying edge cases in high-concurrency scenarios that are more likely to manifest under debugger overhead.