Files
video-v1/vav2/docs/working/Playback_Timing_Fix_Design.md

445 lines
11 KiB
Markdown
Raw Normal View History

2025-10-07 14:53:33 +09:00
# Playback Timing Fix Design
**Date**: 2025-10-07
**Author**: Claude Code
**Status**: Design Phase
## Problem Statement
### Symptom
Video playback is jerky ("통통 튄다") at 30fps despite NVDEC hardware acceleration working correctly.
### Root Cause Analysis
**NOT the problem**:
- ❌ CUDA synchronization (cuStreamSynchronize is fine)
- ❌ Fence implementation (works correctly but unnecessary)
- ❌ GPU decode performance (10-15ms, well within budget)
**ACTUAL problem**:
-**PlaybackController::TimingThreadLoop()** does NOT wait for frame processing completion
- ✅ Timing thread sends "next frame" signal every 33.33ms regardless of previous frame status
- ✅ FrameProcessor drops frames when previous frame still processing (m_frameProcessing flag)
### Evidence
**PlaybackController.cpp:249-275** - Current broken implementation:
```cpp
void PlaybackController::TimingThreadLoop()
{
while (!m_shouldStopTiming && m_isPlaying) {
// 1. Signal frame processing
if (m_frameReadyCallback) {
m_frameReadyCallback(); // → ProcessFrame()
}
// 2. ❌ IMMEDIATE advance without waiting!
m_currentFrame++;
m_currentTime = m_currentFrame / m_frameRate;
// 3. Sleep until next frame
std::this_thread::sleep_until(nextFrame);
}
}
```
**FrameProcessor.cpp:47-53** - Frame drop mechanism:
```cpp
bool FrameProcessor::ProcessFrame(...)
{
// Check if previous frame still processing
bool expected = false;
if (!m_frameProcessing.compare_exchange_strong(expected, true)) {
m_framesDropped++; // ← This causes jerky playback!
return false;
}
// ... decode and render ...
}
```
---
## Solution Design
### Approach 1: Fix TimingThreadLoop (RECOMMENDED)
**Minimal change, keeps existing architecture**
#### Modified PlaybackController.cpp
```cpp
void PlaybackController::TimingThreadLoop()
{
auto start = std::chrono::high_resolution_clock::now();
while (!m_shouldStopTiming && m_isPlaying) {
auto frameStart = std::chrono::high_resolution_clock::now();
// 1. Signal frame processing
if (m_frameReadyCallback) {
m_frameReadyCallback();
}
// 2. ✅ WAIT for frame processing completion (max 100ms timeout)
int waitCount = 0;
while (IsFrameProcessing() && waitCount < 100) {
std::this_thread::sleep_for(std::chrono::milliseconds(1));
waitCount++;
}
if (waitCount >= 100) {
LOGF_WARNING("[PlaybackController] Frame processing timeout");
}
// 3. Advance frame counter
m_currentFrame++;
m_currentTime = m_currentFrame / m_frameRate;
// 4. Sleep for remaining time to maintain 30fps
auto frameEnd = std::chrono::high_resolution_clock::now();
double elapsed = std::chrono::duration<double, std::milli>(frameEnd - frameStart).count();
auto nextFrameTime = start + std::chrono::microseconds(
static_cast<long long>(33333.0 * m_currentFrame));
std::this_thread::sleep_until(nextFrameTime);
}
}
```
#### Required Changes
**PlaybackController.h**:
```cpp
class PlaybackController
{
// Add frame processor reference
void SetFrameProcessor(FrameProcessor* processor) { m_frameProcessor = processor; }
bool IsFrameProcessing() const {
return m_frameProcessor && m_frameProcessor->IsProcessing();
}
private:
FrameProcessor* m_frameProcessor = nullptr; // Non-owning pointer
};
```
**VideoPlayerControl2.xaml.cpp**:
```cpp
void VideoPlayerControl2::LoadVideo(...)
{
// ... existing code ...
// Link PlaybackController to FrameProcessor
m_playbackController->SetFrameProcessor(m_frameProcessor.get());
}
```
#### Benefits
- ✅ Minimal code change (10-15 lines)
- ✅ Keeps existing architecture intact
- ✅ Easy to test and rollback
- ✅ No risk to other components
- ✅ Frame drops eliminated
---
### Approach 2: Remove Fence + Fix Timing (COMPREHENSIVE)
**Reduces complexity while fixing the root cause**
#### Changes
**1. Remove CUDA-D3D12 Fence** (complexity reduction):
Delete:
- `D3D12SurfaceHandler::SetD3D12Fence()`
- `D3D12SurfaceHandler::SignalD3D12Fence()`
- `vavcore_get_sync_fence()` API
- `VavCoreVideoFrame::sync_fence_value` field
Replace with:
```cpp
// NVDECAV1Decoder.cpp
VavCoreResult NVDECAV1Decoder::DecodeToSurface(...)
{
// ... decode logic ...
// Copy RGBA to D3D12 texture
m_d3d12Handler->CopyRGBAFrame(...);
// ✅ Simple synchronous wait (15ms max)
cuStreamSynchronize(m_stream);
// ❌ Remove fence signaling
// m_d3d12Handler->SignalD3D12Fence(...);
return VAVCORE_SUCCESS;
}
```
**2. Fix TimingThreadLoop** (same as Approach 1)
**3. Keep D3D12 Internal Fence** (required for SwapChain):
- ✅ Keep `D3D12VideoRenderer::m_fence`
- ✅ Keep `WaitForFrameCompletion()` for triple buffering
#### Benefits
- ✅ Reduced complexity (remove External Fence)
- ✅ Fixes root cause (timing)
- ✅ Same performance (15ms CPU block < 33.33ms budget)
- ✅ Easier debugging (synchronous flow)
#### Risks
- ⚠️ More code changes (3 components)
- ⚠️ Requires more thorough testing
---
## Decision Matrix
| Aspect | Approach 1 (Fix Timing Only) | Approach 2 (Remove Fence + Fix) |
|--------|------------------------------|----------------------------------|
| Code Changes | 10-15 lines | ~50 lines |
| Complexity Reduction | None | Medium |
| Performance Impact | None | None (15ms < 33.33ms) |
| Risk Level | Very Low | Low |
| Testing Effort | Minimal | Moderate |
| **Recommendation** | ✅ Start here | Follow-up refactor |
---
## Implementation Plan
### Phase 1: Fix Timing (Immediate)
**Goal**: Stop jerky playback
**Tasks**:
1. Modify `PlaybackController::TimingThreadLoop()` to wait for frame completion
2. Add `SetFrameProcessor()` method to PlaybackController
3. Link FrameProcessor in VideoPlayerControl2
4. Test with sample video
**Files Modified**:
- `PlaybackController.h` (add 3 lines)
- `PlaybackController.cpp` (modify TimingThreadLoop, ~15 lines)
- `VideoPlayerControl2.xaml.cpp` (add 1 line)
**Expected Outcome**: Smooth 30fps playback
---
### Phase 2: Remove Fence (Follow-up)
**Goal**: Reduce technical debt
**Tasks**:
1. Remove External Fence from D3D12SurfaceHandler
2. Replace with `cuStreamSynchronize()` in NVDECAV1Decoder
3. Remove `vavcore_get_sync_fence()` from VavCore API
4. Remove fence wait from D3D12VideoRenderer
5. Update VavCoreVideoFrame struct
**Files Modified**:
- `D3D12SurfaceHandler.h/.cpp`
- `NVDECAV1Decoder.cpp`
- `D3D12VideoRenderer.cpp`
- `VavCore.h` (C API)
- `VideoTypes.h` (internal)
**Expected Outcome**: Same performance, simpler code
---
## Performance Analysis
### Frame Processing Timeline (30fps = 33.33ms budget)
**Current (Broken)**:
```
0ms : Timing thread signals frame N
0ms : ProcessFrame(N) starts
10ms : CUDA decode completes (async)
15ms : UI render starts
25ms : Frame N complete
33.33ms: Timing thread signals frame N+1 (regardless of N status!)
→ If frame N not done, FRAME DROP occurs
```
**After Fix**:
```
0ms : Timing thread signals frame N
0ms : ProcessFrame(N) starts
10ms : CUDA decode completes
15ms : UI render starts
25ms : Frame N complete
25ms : Timing thread detects completion, advances counter
33.33ms: Sleep completes, signal frame N+1
→ NO FRAME DROPS
```
### Budget Analysis
- NVDEC decode: 10-15ms
- UI render: 5-10ms
- **Total**: 15-25ms
- **Budget**: 33.33ms (30fps)
- **Margin**: 8-18ms spare time
**Conclusion**: We have plenty of time budget, synchronous processing is perfectly fine.
---
## Alternative Considered and Rejected
### Fence-based Async Processing
**Why rejected**:
1. **Current architecture doesn't benefit**: Timing thread doesn't overlap work
2. **33.33ms budget sufficient**: No need for async optimization
3. **Adds complexity**: External fence requires CUDA/D3D12 interop code
4. **Doesn't fix root cause**: Timing thread still needs to wait
**When Fence would be beneficial**:
- Variable framerate (60fps+)
- Tight timing budget (< 16ms)
- Truly parallel decode/render pipeline
**Our case**: Fixed 30fps, 33.33ms budget → synchronous is simpler and sufficient
---
## Testing Plan
### Phase 1 Testing (Timing Fix)
**Unit Tests**: None required (integration testing sufficient)
**Integration Tests**:
1. **Smooth Playback Test**:
- Load 30fps test video
- Play for 30 seconds
- Verify no frame drops
- Verify smooth visual playback
2. **Frame Timing Test**:
- Log actual frame intervals
- Verify 33.33ms ± 2ms intervals
- Check m_framesDropped counter (should be 0)
3. **Long Duration Test**:
- Play 5-minute video
- Monitor frame drops
- Check for memory leaks
**Success Criteria**:
- ✅ Zero frame drops over 5 minutes
- ✅ Consistent 33.33ms frame intervals
- ✅ Smooth visual playback (no jerking)
---
### Phase 2 Testing (Fence Removal)
**Additional Tests**:
1. **Performance Regression Test**:
- Measure decode time before/after
- Verify < 20ms average decode time
- Check CPU usage (should be same)
2. **Multi-format Test**:
- Test with different resolutions (720p, 1080p, 4K)
- Verify all formats work
3. **Decoder Compatibility Test**:
- Test NVDEC (affected by change)
- Test VPL (should be unaffected)
- Test dav1d fallback (should be unaffected)
**Success Criteria**:
- ✅ Same performance as before
- ✅ No visual regression
- ✅ All decoders work correctly
---
## Rollback Plan
### Phase 1 Rollback
**If timing fix causes issues**:
1. Revert PlaybackController.cpp changes
2. Revert PlaybackController.h changes
3. Revert VideoPlayerControl2.xaml.cpp link
**Time**: < 5 minutes
### Phase 2 Rollback
**If Fence removal causes issues**:
1. Git revert to Phase 1 completion
2. Keep timing fix, restore Fence code
**Time**: < 10 minutes
---
## Implementation Notes
### Thread Safety Considerations
**FrameProcessor::IsProcessing()**:
- Called from Timing Thread
- Reads atomic bool `m_frameProcessing`
- ✅ Thread-safe (atomic operation)
**PlaybackController::SetFrameProcessor()**:
- Called from UI Thread during LoadVideo
- Non-owning pointer (FrameProcessor lifetime > PlaybackController)
- ✅ Safe (no concurrent access during init)
### Performance Considerations
**CPU Blocking in cuStreamSynchronize()**:
- Blocks for ~15ms
- Timing thread sleeps for 33.33ms anyway
- ✅ No performance impact (within budget)
**Wait Loop in TimingThreadLoop()**:
- 1ms sleep per iteration
- Max 100 iterations (100ms timeout)
- Typical: 25ms wait = 25 iterations
- ✅ Low CPU overhead
---
## Success Metrics
### Immediate (Phase 1)
- [ ] Zero frame drops during playback
- [ ] Smooth visual playback (user confirmed)
- [ ] Consistent 30fps timing (logs verified)
### Follow-up (Phase 2)
- [ ] Code complexity reduced (fence code removed)
- [ ] Same or better performance
- [ ] All decoders working (NVDEC, VPL, dav1d)
---
## References
**Related Files**:
- `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Playback/PlaybackController.cpp`
- `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Playback/FrameProcessor.cpp`
- `vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp`
- `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Rendering/D3D12VideoRenderer.cpp`
**Previous Attempts**:
- Fence implementation (2025-10-07): Did not fix jerky playback
- Timing analysis: Identified root cause in PlaybackController
---
## Approval
**Approved by**: [User]
**Date**: [Pending]
**Implementation Start**: [Pending]