11 KiB
Playback Timing Fix Design
Date: 2025-10-07 Author: Claude Code Status: Design Phase
Problem Statement
Symptom
Video playback is jerky ("통통 튄다") at 30fps despite NVDEC hardware acceleration working correctly.
Root Cause Analysis
NOT the problem:
- ❌ CUDA synchronization (cuStreamSynchronize is fine)
- ❌ Fence implementation (works correctly but unnecessary)
- ❌ GPU decode performance (10-15ms, well within budget)
ACTUAL problem:
- ✅ PlaybackController::TimingThreadLoop() does NOT wait for frame processing completion
- ✅ Timing thread sends "next frame" signal every 33.33ms regardless of previous frame status
- ✅ FrameProcessor drops frames when previous frame still processing (m_frameProcessing flag)
Evidence
PlaybackController.cpp:249-275 - Current broken implementation:
void PlaybackController::TimingThreadLoop()
{
while (!m_shouldStopTiming && m_isPlaying) {
// 1. Signal frame processing
if (m_frameReadyCallback) {
m_frameReadyCallback(); // → ProcessFrame()
}
// 2. ❌ IMMEDIATE advance without waiting!
m_currentFrame++;
m_currentTime = m_currentFrame / m_frameRate;
// 3. Sleep until next frame
std::this_thread::sleep_until(nextFrame);
}
}
FrameProcessor.cpp:47-53 - Frame drop mechanism:
bool FrameProcessor::ProcessFrame(...)
{
// Check if previous frame still processing
bool expected = false;
if (!m_frameProcessing.compare_exchange_strong(expected, true)) {
m_framesDropped++; // ← This causes jerky playback!
return false;
}
// ... decode and render ...
}
Solution Design
Approach 1: Fix TimingThreadLoop (RECOMMENDED)
Minimal change, keeps existing architecture
Modified PlaybackController.cpp
void PlaybackController::TimingThreadLoop()
{
auto start = std::chrono::high_resolution_clock::now();
while (!m_shouldStopTiming && m_isPlaying) {
auto frameStart = std::chrono::high_resolution_clock::now();
// 1. Signal frame processing
if (m_frameReadyCallback) {
m_frameReadyCallback();
}
// 2. ✅ WAIT for frame processing completion (max 100ms timeout)
int waitCount = 0;
while (IsFrameProcessing() && waitCount < 100) {
std::this_thread::sleep_for(std::chrono::milliseconds(1));
waitCount++;
}
if (waitCount >= 100) {
LOGF_WARNING("[PlaybackController] Frame processing timeout");
}
// 3. Advance frame counter
m_currentFrame++;
m_currentTime = m_currentFrame / m_frameRate;
// 4. Sleep for remaining time to maintain 30fps
auto frameEnd = std::chrono::high_resolution_clock::now();
double elapsed = std::chrono::duration<double, std::milli>(frameEnd - frameStart).count();
auto nextFrameTime = start + std::chrono::microseconds(
static_cast<long long>(33333.0 * m_currentFrame));
std::this_thread::sleep_until(nextFrameTime);
}
}
Required Changes
PlaybackController.h:
class PlaybackController
{
// Add frame processor reference
void SetFrameProcessor(FrameProcessor* processor) { m_frameProcessor = processor; }
bool IsFrameProcessing() const {
return m_frameProcessor && m_frameProcessor->IsProcessing();
}
private:
FrameProcessor* m_frameProcessor = nullptr; // Non-owning pointer
};
VideoPlayerControl2.xaml.cpp:
void VideoPlayerControl2::LoadVideo(...)
{
// ... existing code ...
// Link PlaybackController to FrameProcessor
m_playbackController->SetFrameProcessor(m_frameProcessor.get());
}
Benefits
- ✅ Minimal code change (10-15 lines)
- ✅ Keeps existing architecture intact
- ✅ Easy to test and rollback
- ✅ No risk to other components
- ✅ Frame drops eliminated
Approach 2: Remove Fence + Fix Timing (COMPREHENSIVE)
Reduces complexity while fixing the root cause
Changes
1. Remove CUDA-D3D12 Fence (complexity reduction):
Delete:
D3D12SurfaceHandler::SetD3D12Fence()D3D12SurfaceHandler::SignalD3D12Fence()vavcore_get_sync_fence()APIVavCoreVideoFrame::sync_fence_valuefield
Replace with:
// NVDECAV1Decoder.cpp
VavCoreResult NVDECAV1Decoder::DecodeToSurface(...)
{
// ... decode logic ...
// Copy RGBA to D3D12 texture
m_d3d12Handler->CopyRGBAFrame(...);
// ✅ Simple synchronous wait (15ms max)
cuStreamSynchronize(m_stream);
// ❌ Remove fence signaling
// m_d3d12Handler->SignalD3D12Fence(...);
return VAVCORE_SUCCESS;
}
2. Fix TimingThreadLoop (same as Approach 1)
3. Keep D3D12 Internal Fence (required for SwapChain):
- ✅ Keep
D3D12VideoRenderer::m_fence - ✅ Keep
WaitForFrameCompletion()for triple buffering
Benefits
- ✅ Reduced complexity (remove External Fence)
- ✅ Fixes root cause (timing)
- ✅ Same performance (15ms CPU block < 33.33ms budget)
- ✅ Easier debugging (synchronous flow)
Risks
- ⚠️ More code changes (3 components)
- ⚠️ Requires more thorough testing
Decision Matrix
| Aspect | Approach 1 (Fix Timing Only) | Approach 2 (Remove Fence + Fix) |
|---|---|---|
| Code Changes | 10-15 lines | ~50 lines |
| Complexity Reduction | None | Medium |
| Performance Impact | None | None (15ms < 33.33ms) |
| Risk Level | Very Low | Low |
| Testing Effort | Minimal | Moderate |
| Recommendation | ✅ Start here | Follow-up refactor |
Implementation Plan
Phase 1: Fix Timing (Immediate)
Goal: Stop jerky playback
Tasks:
- Modify
PlaybackController::TimingThreadLoop()to wait for frame completion - Add
SetFrameProcessor()method to PlaybackController - Link FrameProcessor in VideoPlayerControl2
- Test with sample video
Files Modified:
PlaybackController.h(add 3 lines)PlaybackController.cpp(modify TimingThreadLoop, ~15 lines)VideoPlayerControl2.xaml.cpp(add 1 line)
Expected Outcome: Smooth 30fps playback
Phase 2: Remove Fence (Follow-up)
Goal: Reduce technical debt
Tasks:
- Remove External Fence from D3D12SurfaceHandler
- Replace with
cuStreamSynchronize()in NVDECAV1Decoder - Remove
vavcore_get_sync_fence()from VavCore API - Remove fence wait from D3D12VideoRenderer
- Update VavCoreVideoFrame struct
Files Modified:
D3D12SurfaceHandler.h/.cppNVDECAV1Decoder.cppD3D12VideoRenderer.cppVavCore.h(C API)VideoTypes.h(internal)
Expected Outcome: Same performance, simpler code
Performance Analysis
Frame Processing Timeline (30fps = 33.33ms budget)
Current (Broken):
0ms : Timing thread signals frame N
0ms : ProcessFrame(N) starts
10ms : CUDA decode completes (async)
15ms : UI render starts
25ms : Frame N complete
33.33ms: Timing thread signals frame N+1 (regardless of N status!)
→ If frame N not done, FRAME DROP occurs
After Fix:
0ms : Timing thread signals frame N
0ms : ProcessFrame(N) starts
10ms : CUDA decode completes
15ms : UI render starts
25ms : Frame N complete
25ms : Timing thread detects completion, advances counter
33.33ms: Sleep completes, signal frame N+1
→ NO FRAME DROPS
Budget Analysis
- NVDEC decode: 10-15ms
- UI render: 5-10ms
- Total: 15-25ms
- Budget: 33.33ms (30fps)
- Margin: 8-18ms spare time
Conclusion: We have plenty of time budget, synchronous processing is perfectly fine.
Alternative Considered and Rejected
Fence-based Async Processing
Why rejected:
- Current architecture doesn't benefit: Timing thread doesn't overlap work
- 33.33ms budget sufficient: No need for async optimization
- Adds complexity: External fence requires CUDA/D3D12 interop code
- Doesn't fix root cause: Timing thread still needs to wait
When Fence would be beneficial:
- Variable framerate (60fps+)
- Tight timing budget (< 16ms)
- Truly parallel decode/render pipeline
Our case: Fixed 30fps, 33.33ms budget → synchronous is simpler and sufficient
Testing Plan
Phase 1 Testing (Timing Fix)
Unit Tests: None required (integration testing sufficient)
Integration Tests:
-
Smooth Playback Test:
- Load 30fps test video
- Play for 30 seconds
- Verify no frame drops
- Verify smooth visual playback
-
Frame Timing Test:
- Log actual frame intervals
- Verify 33.33ms ± 2ms intervals
- Check m_framesDropped counter (should be 0)
-
Long Duration Test:
- Play 5-minute video
- Monitor frame drops
- Check for memory leaks
Success Criteria:
- ✅ Zero frame drops over 5 minutes
- ✅ Consistent 33.33ms frame intervals
- ✅ Smooth visual playback (no jerking)
Phase 2 Testing (Fence Removal)
Additional Tests:
-
Performance Regression Test:
- Measure decode time before/after
- Verify < 20ms average decode time
- Check CPU usage (should be same)
-
Multi-format Test:
- Test with different resolutions (720p, 1080p, 4K)
- Verify all formats work
-
Decoder Compatibility Test:
- Test NVDEC (affected by change)
- Test VPL (should be unaffected)
- Test dav1d fallback (should be unaffected)
Success Criteria:
- ✅ Same performance as before
- ✅ No visual regression
- ✅ All decoders work correctly
Rollback Plan
Phase 1 Rollback
If timing fix causes issues:
- Revert PlaybackController.cpp changes
- Revert PlaybackController.h changes
- Revert VideoPlayerControl2.xaml.cpp link
Time: < 5 minutes
Phase 2 Rollback
If Fence removal causes issues:
- Git revert to Phase 1 completion
- Keep timing fix, restore Fence code
Time: < 10 minutes
Implementation Notes
Thread Safety Considerations
FrameProcessor::IsProcessing():
- Called from Timing Thread
- Reads atomic bool
m_frameProcessing - ✅ Thread-safe (atomic operation)
PlaybackController::SetFrameProcessor():
- Called from UI Thread during LoadVideo
- Non-owning pointer (FrameProcessor lifetime > PlaybackController)
- ✅ Safe (no concurrent access during init)
Performance Considerations
CPU Blocking in cuStreamSynchronize():
- Blocks for ~15ms
- Timing thread sleeps for 33.33ms anyway
- ✅ No performance impact (within budget)
Wait Loop in TimingThreadLoop():
- 1ms sleep per iteration
- Max 100 iterations (100ms timeout)
- Typical: 25ms wait = 25 iterations
- ✅ Low CPU overhead
Success Metrics
Immediate (Phase 1)
- Zero frame drops during playback
- Smooth visual playback (user confirmed)
- Consistent 30fps timing (logs verified)
Follow-up (Phase 2)
- Code complexity reduced (fence code removed)
- Same or better performance
- All decoders working (NVDEC, VPL, dav1d)
References
Related Files:
vav2/platforms/windows/applications/vav2player/Vav2Player/src/Playback/PlaybackController.cppvav2/platforms/windows/applications/vav2player/Vav2Player/src/Playback/FrameProcessor.cppvav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cppvav2/platforms/windows/applications/vav2player/Vav2Player/src/Rendering/D3D12VideoRenderer.cpp
Previous Attempts:
- Fence implementation (2025-10-07): Did not fix jerky playback
- Timing analysis: Identified root cause in PlaybackController
Approval
Approved by: [User] Date: [Pending] Implementation Start: [Pending]