445 lines
11 KiB
Markdown
445 lines
11 KiB
Markdown
|
|
# Playback Timing Fix Design
|
||
|
|
|
||
|
|
**Date**: 2025-10-07
|
||
|
|
**Author**: Claude Code
|
||
|
|
**Status**: Design Phase
|
||
|
|
|
||
|
|
## Problem Statement
|
||
|
|
|
||
|
|
### Symptom
|
||
|
|
Video playback is jerky ("통통 튄다") at 30fps despite NVDEC hardware acceleration working correctly.
|
||
|
|
|
||
|
|
### Root Cause Analysis
|
||
|
|
|
||
|
|
**NOT the problem**:
|
||
|
|
- ❌ CUDA synchronization (cuStreamSynchronize is fine)
|
||
|
|
- ❌ Fence implementation (works correctly but unnecessary)
|
||
|
|
- ❌ GPU decode performance (10-15ms, well within budget)
|
||
|
|
|
||
|
|
**ACTUAL problem**:
|
||
|
|
- ✅ **PlaybackController::TimingThreadLoop()** does NOT wait for frame processing completion
|
||
|
|
- ✅ Timing thread sends "next frame" signal every 33.33ms regardless of previous frame status
|
||
|
|
- ✅ FrameProcessor drops frames when previous frame still processing (m_frameProcessing flag)
|
||
|
|
|
||
|
|
### Evidence
|
||
|
|
|
||
|
|
**PlaybackController.cpp:249-275** - Current broken implementation:
|
||
|
|
```cpp
|
||
|
|
void PlaybackController::TimingThreadLoop()
|
||
|
|
{
|
||
|
|
while (!m_shouldStopTiming && m_isPlaying) {
|
||
|
|
// 1. Signal frame processing
|
||
|
|
if (m_frameReadyCallback) {
|
||
|
|
m_frameReadyCallback(); // → ProcessFrame()
|
||
|
|
}
|
||
|
|
|
||
|
|
// 2. ❌ IMMEDIATE advance without waiting!
|
||
|
|
m_currentFrame++;
|
||
|
|
m_currentTime = m_currentFrame / m_frameRate;
|
||
|
|
|
||
|
|
// 3. Sleep until next frame
|
||
|
|
std::this_thread::sleep_until(nextFrame);
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**FrameProcessor.cpp:47-53** - Frame drop mechanism:
|
||
|
|
```cpp
|
||
|
|
bool FrameProcessor::ProcessFrame(...)
|
||
|
|
{
|
||
|
|
// Check if previous frame still processing
|
||
|
|
bool expected = false;
|
||
|
|
if (!m_frameProcessing.compare_exchange_strong(expected, true)) {
|
||
|
|
m_framesDropped++; // ← This causes jerky playback!
|
||
|
|
return false;
|
||
|
|
}
|
||
|
|
// ... decode and render ...
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Solution Design
|
||
|
|
|
||
|
|
### Approach 1: Fix TimingThreadLoop (RECOMMENDED)
|
||
|
|
|
||
|
|
**Minimal change, keeps existing architecture**
|
||
|
|
|
||
|
|
#### Modified PlaybackController.cpp
|
||
|
|
```cpp
|
||
|
|
void PlaybackController::TimingThreadLoop()
|
||
|
|
{
|
||
|
|
auto start = std::chrono::high_resolution_clock::now();
|
||
|
|
|
||
|
|
while (!m_shouldStopTiming && m_isPlaying) {
|
||
|
|
auto frameStart = std::chrono::high_resolution_clock::now();
|
||
|
|
|
||
|
|
// 1. Signal frame processing
|
||
|
|
if (m_frameReadyCallback) {
|
||
|
|
m_frameReadyCallback();
|
||
|
|
}
|
||
|
|
|
||
|
|
// 2. ✅ WAIT for frame processing completion (max 100ms timeout)
|
||
|
|
int waitCount = 0;
|
||
|
|
while (IsFrameProcessing() && waitCount < 100) {
|
||
|
|
std::this_thread::sleep_for(std::chrono::milliseconds(1));
|
||
|
|
waitCount++;
|
||
|
|
}
|
||
|
|
|
||
|
|
if (waitCount >= 100) {
|
||
|
|
LOGF_WARNING("[PlaybackController] Frame processing timeout");
|
||
|
|
}
|
||
|
|
|
||
|
|
// 3. Advance frame counter
|
||
|
|
m_currentFrame++;
|
||
|
|
m_currentTime = m_currentFrame / m_frameRate;
|
||
|
|
|
||
|
|
// 4. Sleep for remaining time to maintain 30fps
|
||
|
|
auto frameEnd = std::chrono::high_resolution_clock::now();
|
||
|
|
double elapsed = std::chrono::duration<double, std::milli>(frameEnd - frameStart).count();
|
||
|
|
|
||
|
|
auto nextFrameTime = start + std::chrono::microseconds(
|
||
|
|
static_cast<long long>(33333.0 * m_currentFrame));
|
||
|
|
std::this_thread::sleep_until(nextFrameTime);
|
||
|
|
}
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
#### Required Changes
|
||
|
|
|
||
|
|
**PlaybackController.h**:
|
||
|
|
```cpp
|
||
|
|
class PlaybackController
|
||
|
|
{
|
||
|
|
// Add frame processor reference
|
||
|
|
void SetFrameProcessor(FrameProcessor* processor) { m_frameProcessor = processor; }
|
||
|
|
bool IsFrameProcessing() const {
|
||
|
|
return m_frameProcessor && m_frameProcessor->IsProcessing();
|
||
|
|
}
|
||
|
|
|
||
|
|
private:
|
||
|
|
FrameProcessor* m_frameProcessor = nullptr; // Non-owning pointer
|
||
|
|
};
|
||
|
|
```
|
||
|
|
|
||
|
|
**VideoPlayerControl2.xaml.cpp**:
|
||
|
|
```cpp
|
||
|
|
void VideoPlayerControl2::LoadVideo(...)
|
||
|
|
{
|
||
|
|
// ... existing code ...
|
||
|
|
|
||
|
|
// Link PlaybackController to FrameProcessor
|
||
|
|
m_playbackController->SetFrameProcessor(m_frameProcessor.get());
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
#### Benefits
|
||
|
|
- ✅ Minimal code change (10-15 lines)
|
||
|
|
- ✅ Keeps existing architecture intact
|
||
|
|
- ✅ Easy to test and rollback
|
||
|
|
- ✅ No risk to other components
|
||
|
|
- ✅ Frame drops eliminated
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### Approach 2: Remove Fence + Fix Timing (COMPREHENSIVE)
|
||
|
|
|
||
|
|
**Reduces complexity while fixing the root cause**
|
||
|
|
|
||
|
|
#### Changes
|
||
|
|
|
||
|
|
**1. Remove CUDA-D3D12 Fence** (complexity reduction):
|
||
|
|
|
||
|
|
Delete:
|
||
|
|
- `D3D12SurfaceHandler::SetD3D12Fence()`
|
||
|
|
- `D3D12SurfaceHandler::SignalD3D12Fence()`
|
||
|
|
- `vavcore_get_sync_fence()` API
|
||
|
|
- `VavCoreVideoFrame::sync_fence_value` field
|
||
|
|
|
||
|
|
Replace with:
|
||
|
|
```cpp
|
||
|
|
// NVDECAV1Decoder.cpp
|
||
|
|
VavCoreResult NVDECAV1Decoder::DecodeToSurface(...)
|
||
|
|
{
|
||
|
|
// ... decode logic ...
|
||
|
|
|
||
|
|
// Copy RGBA to D3D12 texture
|
||
|
|
m_d3d12Handler->CopyRGBAFrame(...);
|
||
|
|
|
||
|
|
// ✅ Simple synchronous wait (15ms max)
|
||
|
|
cuStreamSynchronize(m_stream);
|
||
|
|
|
||
|
|
// ❌ Remove fence signaling
|
||
|
|
// m_d3d12Handler->SignalD3D12Fence(...);
|
||
|
|
|
||
|
|
return VAVCORE_SUCCESS;
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
**2. Fix TimingThreadLoop** (same as Approach 1)
|
||
|
|
|
||
|
|
**3. Keep D3D12 Internal Fence** (required for SwapChain):
|
||
|
|
- ✅ Keep `D3D12VideoRenderer::m_fence`
|
||
|
|
- ✅ Keep `WaitForFrameCompletion()` for triple buffering
|
||
|
|
|
||
|
|
#### Benefits
|
||
|
|
- ✅ Reduced complexity (remove External Fence)
|
||
|
|
- ✅ Fixes root cause (timing)
|
||
|
|
- ✅ Same performance (15ms CPU block < 33.33ms budget)
|
||
|
|
- ✅ Easier debugging (synchronous flow)
|
||
|
|
|
||
|
|
#### Risks
|
||
|
|
- ⚠️ More code changes (3 components)
|
||
|
|
- ⚠️ Requires more thorough testing
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Decision Matrix
|
||
|
|
|
||
|
|
| Aspect | Approach 1 (Fix Timing Only) | Approach 2 (Remove Fence + Fix) |
|
||
|
|
|--------|------------------------------|----------------------------------|
|
||
|
|
| Code Changes | 10-15 lines | ~50 lines |
|
||
|
|
| Complexity Reduction | None | Medium |
|
||
|
|
| Performance Impact | None | None (15ms < 33.33ms) |
|
||
|
|
| Risk Level | Very Low | Low |
|
||
|
|
| Testing Effort | Minimal | Moderate |
|
||
|
|
| **Recommendation** | ✅ Start here | Follow-up refactor |
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Implementation Plan
|
||
|
|
|
||
|
|
### Phase 1: Fix Timing (Immediate)
|
||
|
|
|
||
|
|
**Goal**: Stop jerky playback
|
||
|
|
|
||
|
|
**Tasks**:
|
||
|
|
1. Modify `PlaybackController::TimingThreadLoop()` to wait for frame completion
|
||
|
|
2. Add `SetFrameProcessor()` method to PlaybackController
|
||
|
|
3. Link FrameProcessor in VideoPlayerControl2
|
||
|
|
4. Test with sample video
|
||
|
|
|
||
|
|
**Files Modified**:
|
||
|
|
- `PlaybackController.h` (add 3 lines)
|
||
|
|
- `PlaybackController.cpp` (modify TimingThreadLoop, ~15 lines)
|
||
|
|
- `VideoPlayerControl2.xaml.cpp` (add 1 line)
|
||
|
|
|
||
|
|
**Expected Outcome**: Smooth 30fps playback
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### Phase 2: Remove Fence (Follow-up)
|
||
|
|
|
||
|
|
**Goal**: Reduce technical debt
|
||
|
|
|
||
|
|
**Tasks**:
|
||
|
|
1. Remove External Fence from D3D12SurfaceHandler
|
||
|
|
2. Replace with `cuStreamSynchronize()` in NVDECAV1Decoder
|
||
|
|
3. Remove `vavcore_get_sync_fence()` from VavCore API
|
||
|
|
4. Remove fence wait from D3D12VideoRenderer
|
||
|
|
5. Update VavCoreVideoFrame struct
|
||
|
|
|
||
|
|
**Files Modified**:
|
||
|
|
- `D3D12SurfaceHandler.h/.cpp`
|
||
|
|
- `NVDECAV1Decoder.cpp`
|
||
|
|
- `D3D12VideoRenderer.cpp`
|
||
|
|
- `VavCore.h` (C API)
|
||
|
|
- `VideoTypes.h` (internal)
|
||
|
|
|
||
|
|
**Expected Outcome**: Same performance, simpler code
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Performance Analysis
|
||
|
|
|
||
|
|
### Frame Processing Timeline (30fps = 33.33ms budget)
|
||
|
|
|
||
|
|
**Current (Broken)**:
|
||
|
|
```
|
||
|
|
0ms : Timing thread signals frame N
|
||
|
|
0ms : ProcessFrame(N) starts
|
||
|
|
10ms : CUDA decode completes (async)
|
||
|
|
15ms : UI render starts
|
||
|
|
25ms : Frame N complete
|
||
|
|
33.33ms: Timing thread signals frame N+1 (regardless of N status!)
|
||
|
|
→ If frame N not done, FRAME DROP occurs
|
||
|
|
```
|
||
|
|
|
||
|
|
**After Fix**:
|
||
|
|
```
|
||
|
|
0ms : Timing thread signals frame N
|
||
|
|
0ms : ProcessFrame(N) starts
|
||
|
|
10ms : CUDA decode completes
|
||
|
|
15ms : UI render starts
|
||
|
|
25ms : Frame N complete
|
||
|
|
25ms : Timing thread detects completion, advances counter
|
||
|
|
33.33ms: Sleep completes, signal frame N+1
|
||
|
|
→ NO FRAME DROPS
|
||
|
|
```
|
||
|
|
|
||
|
|
### Budget Analysis
|
||
|
|
- NVDEC decode: 10-15ms
|
||
|
|
- UI render: 5-10ms
|
||
|
|
- **Total**: 15-25ms
|
||
|
|
- **Budget**: 33.33ms (30fps)
|
||
|
|
- **Margin**: 8-18ms spare time
|
||
|
|
|
||
|
|
**Conclusion**: We have plenty of time budget, synchronous processing is perfectly fine.
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Alternative Considered and Rejected
|
||
|
|
|
||
|
|
### Fence-based Async Processing
|
||
|
|
|
||
|
|
**Why rejected**:
|
||
|
|
1. **Current architecture doesn't benefit**: Timing thread doesn't overlap work
|
||
|
|
2. **33.33ms budget sufficient**: No need for async optimization
|
||
|
|
3. **Adds complexity**: External fence requires CUDA/D3D12 interop code
|
||
|
|
4. **Doesn't fix root cause**: Timing thread still needs to wait
|
||
|
|
|
||
|
|
**When Fence would be beneficial**:
|
||
|
|
- Variable framerate (60fps+)
|
||
|
|
- Tight timing budget (< 16ms)
|
||
|
|
- Truly parallel decode/render pipeline
|
||
|
|
|
||
|
|
**Our case**: Fixed 30fps, 33.33ms budget → synchronous is simpler and sufficient
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Testing Plan
|
||
|
|
|
||
|
|
### Phase 1 Testing (Timing Fix)
|
||
|
|
|
||
|
|
**Unit Tests**: None required (integration testing sufficient)
|
||
|
|
|
||
|
|
**Integration Tests**:
|
||
|
|
1. **Smooth Playback Test**:
|
||
|
|
- Load 30fps test video
|
||
|
|
- Play for 30 seconds
|
||
|
|
- Verify no frame drops
|
||
|
|
- Verify smooth visual playback
|
||
|
|
|
||
|
|
2. **Frame Timing Test**:
|
||
|
|
- Log actual frame intervals
|
||
|
|
- Verify 33.33ms ± 2ms intervals
|
||
|
|
- Check m_framesDropped counter (should be 0)
|
||
|
|
|
||
|
|
3. **Long Duration Test**:
|
||
|
|
- Play 5-minute video
|
||
|
|
- Monitor frame drops
|
||
|
|
- Check for memory leaks
|
||
|
|
|
||
|
|
**Success Criteria**:
|
||
|
|
- ✅ Zero frame drops over 5 minutes
|
||
|
|
- ✅ Consistent 33.33ms frame intervals
|
||
|
|
- ✅ Smooth visual playback (no jerking)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
### Phase 2 Testing (Fence Removal)
|
||
|
|
|
||
|
|
**Additional Tests**:
|
||
|
|
1. **Performance Regression Test**:
|
||
|
|
- Measure decode time before/after
|
||
|
|
- Verify < 20ms average decode time
|
||
|
|
- Check CPU usage (should be same)
|
||
|
|
|
||
|
|
2. **Multi-format Test**:
|
||
|
|
- Test with different resolutions (720p, 1080p, 4K)
|
||
|
|
- Verify all formats work
|
||
|
|
|
||
|
|
3. **Decoder Compatibility Test**:
|
||
|
|
- Test NVDEC (affected by change)
|
||
|
|
- Test VPL (should be unaffected)
|
||
|
|
- Test dav1d fallback (should be unaffected)
|
||
|
|
|
||
|
|
**Success Criteria**:
|
||
|
|
- ✅ Same performance as before
|
||
|
|
- ✅ No visual regression
|
||
|
|
- ✅ All decoders work correctly
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Rollback Plan
|
||
|
|
|
||
|
|
### Phase 1 Rollback
|
||
|
|
**If timing fix causes issues**:
|
||
|
|
1. Revert PlaybackController.cpp changes
|
||
|
|
2. Revert PlaybackController.h changes
|
||
|
|
3. Revert VideoPlayerControl2.xaml.cpp link
|
||
|
|
|
||
|
|
**Time**: < 5 minutes
|
||
|
|
|
||
|
|
### Phase 2 Rollback
|
||
|
|
**If Fence removal causes issues**:
|
||
|
|
1. Git revert to Phase 1 completion
|
||
|
|
2. Keep timing fix, restore Fence code
|
||
|
|
|
||
|
|
**Time**: < 10 minutes
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Implementation Notes
|
||
|
|
|
||
|
|
### Thread Safety Considerations
|
||
|
|
|
||
|
|
**FrameProcessor::IsProcessing()**:
|
||
|
|
- Called from Timing Thread
|
||
|
|
- Reads atomic bool `m_frameProcessing`
|
||
|
|
- ✅ Thread-safe (atomic operation)
|
||
|
|
|
||
|
|
**PlaybackController::SetFrameProcessor()**:
|
||
|
|
- Called from UI Thread during LoadVideo
|
||
|
|
- Non-owning pointer (FrameProcessor lifetime > PlaybackController)
|
||
|
|
- ✅ Safe (no concurrent access during init)
|
||
|
|
|
||
|
|
### Performance Considerations
|
||
|
|
|
||
|
|
**CPU Blocking in cuStreamSynchronize()**:
|
||
|
|
- Blocks for ~15ms
|
||
|
|
- Timing thread sleeps for 33.33ms anyway
|
||
|
|
- ✅ No performance impact (within budget)
|
||
|
|
|
||
|
|
**Wait Loop in TimingThreadLoop()**:
|
||
|
|
- 1ms sleep per iteration
|
||
|
|
- Max 100 iterations (100ms timeout)
|
||
|
|
- Typical: 25ms wait = 25 iterations
|
||
|
|
- ✅ Low CPU overhead
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Success Metrics
|
||
|
|
|
||
|
|
### Immediate (Phase 1)
|
||
|
|
- [ ] Zero frame drops during playback
|
||
|
|
- [ ] Smooth visual playback (user confirmed)
|
||
|
|
- [ ] Consistent 30fps timing (logs verified)
|
||
|
|
|
||
|
|
### Follow-up (Phase 2)
|
||
|
|
- [ ] Code complexity reduced (fence code removed)
|
||
|
|
- [ ] Same or better performance
|
||
|
|
- [ ] All decoders working (NVDEC, VPL, dav1d)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## References
|
||
|
|
|
||
|
|
**Related Files**:
|
||
|
|
- `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Playback/PlaybackController.cpp`
|
||
|
|
- `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Playback/FrameProcessor.cpp`
|
||
|
|
- `vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp`
|
||
|
|
- `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Rendering/D3D12VideoRenderer.cpp`
|
||
|
|
|
||
|
|
**Previous Attempts**:
|
||
|
|
- Fence implementation (2025-10-07): Did not fix jerky playback
|
||
|
|
- Timing analysis: Identified root cause in PlaybackController
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Approval
|
||
|
|
|
||
|
|
**Approved by**: [User]
|
||
|
|
**Date**: [Pending]
|
||
|
|
**Implementation Start**: [Pending]
|