# GlobalFrameBudget Design Document

## 1. Overview

### Purpose

**Problem**: When 4 VideoPlayerControl2 instances play simultaneously, they all reach the triple-buffer filling bottleneck (frames 16-18, immediately after initial buffering) at the same time. This overflows the NVDEC queue and pushes QUEUE_DELAY to 35-42ms, exceeding the 33.33ms per-frame budget at 30fps.

**Solution**: Implement a global frame processing budget manager that limits concurrent frame processing during the bottleneck phase. Reducing the load from 4 players to at most 3 is expected to bring QUEUE_DELAY down to ~28-33ms (within budget).

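
The budget comparison above is simple arithmetic, and a minimal sketch may make the margins easier to see. `FrameBudgetMs` and `ExceedsBudget` are hypothetical helper names that do not exist in the codebase; the delay figures are the ones quoted in the Problem and Solution statements.

```cpp
// Per-frame time budget in milliseconds for a given frame rate.
constexpr double FrameBudgetMs(double fps) { return 1000.0 / fps; }

// True when a measured queue delay no longer fits inside one frame interval.
constexpr bool ExceedsBudget(double queueDelayMs, double fps) {
    return queueDelayMs > FrameBudgetMs(fps);
}

// Figures quoted in the Problem/Solution statements.
static_assert(ExceedsBudget(35.0, 30.0), "35ms is over the 30fps budget");
static_assert(ExceedsBudget(42.0, 30.0), "42ms is over the 30fps budget");
static_assert(!ExceedsBudget(28.0, 30.0), "28ms fits the 30fps budget");
static_assert(!ExceedsBudget(33.0, 30.0), "33ms fits the 30fps budget");
```

At 30fps the budget is 1000/30 ≈ 33.33ms, so the upper end of the expected 28-33ms range leaves only about 0.3ms of headroom.
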
### Key Constraints

- **NVDEC DPB_SIZE = 16**: Required by AV1 sequence header (min_num_decode_surfaces=9 for test video, up to 12+ for complex GOPs)
- **INITIAL_BUFFERING = 16**: NVDEC requires full DPB filling for B-frame reordering
- **Cannot reduce buffer sizes**: Tested DPB_SIZE=4/8 both crash with "Invalid CurrPicIdx"
- **Must maintain sync**: All 4 players should remain synchronized after initial buffering

---

## 2. Architecture Diagram

```
┌─────────────────────────────────────────────────────────────┐
│                      GlobalFrameBudget                      │
│                         (Singleton)                         │
├─────────────────────────────────────────────────────────────┤
│ State:                                                      │
│   - m_activeFrames: atomic<int> (current active frames)     │
│   - MAX_CONCURRENT_FRAMES_BOTTLENECK = 3 (bottleneck limit) │
├─────────────────────────────────────────────────────────────┤
│ Public API:                                                 │
│   + TryAcquireFrameSlot(playerId, frameNumber) → bool       │
│   + ReleaseFrameSlot(playerId) → void                       │
│   + GetActiveFrameCount() → int                             │
│   + GetStatistics() → BudgetStatistics                      │
│   + ResetStatistics() → void                                │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ uses
              ┌────────────────┼────────────────┐
              │                │                │
         ┌────▼─────┐     ┌────▼─────┐     ┌────▼─────┐
         │ Player#0 │     │ Player#1 │     │ Player#2 │  ...
         │FrameProc │     │FrameProc │     │FrameProc │
         └──────────┘     └──────────┘     └──────────┘

Call Flow:
  1. FrameProcessor::ProcessFrame()
     → Check Phase == TRIPLE_FILLING?
  2. YES → TryAcquireFrameSlot()
     → m_activeFrames < 3?
  3. YES → Proceed with decode
     → ReleaseFrameSlot() after render completes
  4. NO (no free slot) → Skip frame (m_framesDropped++)
```

---

## 3. Processing Phases

FrameProcessor operates in 3 distinct phases:

```cpp
enum class Phase {
    INITIAL_BUFFERING,  // frames 0-15: NULL surface submission to NVDEC DPB
    TRIPLE_FILLING,     // frames 16-18: triple buffer filling (BOTTLENECK)
    NORMAL_PLAYBACK     // frames 19+: stable rendering
};
```

### Phase Details

| Phase              | Frame Range | Behavior                          | QUEUE_DELAY      |
|--------------------|-------------|-----------------------------------|------------------|
| INITIAL_BUFFERING  | 0-15        | NULL surface, no render           | 6-15ms (stable)  |
| TRIPLE_FILLING     | 16-18       | Fill triple buffer, first renders | 35-42ms (SPIKE)  |
| NORMAL_PLAYBACK    | 19+         | Steady state rendering            | 6-22ms (stable)  |

**GlobalFrameBudget is only active during TRIPLE_FILLING phase.**

---

## 4. Class Interface

### Header: GlobalFrameBudget.h

```cpp
#pragma once

#include <atomic>
#include <cstdint>

namespace Vav2Player {

class GlobalFrameBudget
{
public:
    static GlobalFrameBudget& GetInstance();

    // Acquire permission to process frame
    // Returns: true if slot acquired, false if budget limit reached
    bool TryAcquireFrameSlot(int playerId, uint64_t frameNumber);

    // Release slot after processing complete
    void ReleaseFrameSlot(int playerId);

    // Query current state
    int GetActiveFrameCount() const { return m_activeFrames.load(); }

    // Statistics
    struct BudgetStatistics {
        uint64_t totalAcquireAttempts;
        uint64_t successfulAcquires;
        uint64_t rejectedAcquires;
        double rejectionRate;
    };

    BudgetStatistics GetStatistics() const;
    void ResetStatistics();

private:
    GlobalFrameBudget() = default;
    ~GlobalFrameBudget() = default;

    // Disable copy/move
    GlobalFrameBudget(const GlobalFrameBudget&) = delete;
    GlobalFrameBudget& operator=(const GlobalFrameBudget&) = delete;

    // Configuration
    static constexpr int MAX_CONCURRENT_FRAMES_BOTTLENECK = 3;

    // State
    std::atomic<int> m_activeFrames{0};

    // Statistics
    std::atomic<uint64_t> m_totalAcquireAttempts{0};
    std::atomic<uint64_t> m_successfulAcquires{0};
    std::atomic<uint64_t> m_rejectedAcquires{0};
};

} // namespace Vav2Player
```

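
The header declares `GetInstance()` without showing its body. One common way to implement such an accessor is a function-local static (a Meyers singleton), sketched below under the assumption that no explicit teardown order is needed. `BudgetSingletonSketch` is a reduced, hypothetical stand-in for `GlobalFrameBudget`, not the project's actual implementation.

```cpp
#include <atomic>

// Hypothetical stand-in for GlobalFrameBudget, reduced to what the accessor needs.
class BudgetSingletonSketch {
public:
    // Function-local static: constructed on the first call. Since C++11 the
    // language guarantees that concurrent first calls initialize it only once.
    static BudgetSingletonSketch& GetInstance() {
        static BudgetSingletonSketch instance;
        return instance;
    }

    int GetActiveFrameCount() const { return m_activeFrames.load(); }

    // Copying is disabled so every caller shares the same counter.
    BudgetSingletonSketch(const BudgetSingletonSketch&) = delete;
    BudgetSingletonSketch& operator=(const BudgetSingletonSketch&) = delete;

private:
    BudgetSingletonSketch() = default;
    ~BudgetSingletonSketch() = default;

    std::atomic<int> m_activeFrames{0};
};
```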
---

## 5. Integration with FrameProcessor

### FrameProcessor Changes

**FrameProcessor.h additions:**

```cpp
class FrameProcessor
{
public:
    // Processing phase query
    enum class Phase {
        INITIAL_BUFFERING,
        TRIPLE_FILLING,
        NORMAL_PLAYBACK
    };

    Phase GetCurrentPhase() const;

private:
    // Track if budget slot was acquired (for proper release)
    std::atomic<bool> m_budgetSlotAcquired{false};
};
```

**FrameProcessor.cpp integration:**

```cpp
bool FrameProcessor::ProcessFrame(VavCorePlayer* player,
                                  std::function<void(bool)> onComplete)
{
    // Existing: Skip if previous frame still processing
    if (m_frameProcessing.load()) {
        m_framesDropped++;
        return false;
    }

    Phase currentPhase = GetCurrentPhase();

    // NEW: Apply GlobalFrameBudget during bottleneck phase
    if (currentPhase == Phase::TRIPLE_FILLING) {
        if (!GlobalFrameBudget::GetInstance().TryAcquireFrameSlot(
                m_playerInstanceId, m_framesDecoded)) {

            LOGF_DEBUG("[Player#%d] Frame %llu SKIPPED (global budget limit)",
                       m_playerInstanceId, m_framesDecoded.load());
            m_framesDropped++;
            return false;
        }
        m_budgetSlotAcquired = true;
    }

    m_frameProcessing = true;

    // ... existing decode logic ...

    // UI thread callback with budget release
    m_dispatcherQueue.TryEnqueue([this, renderIndex, onComplete]() {
        bool renderSuccess = m_renderer->RenderFrame(renderIndex);

        // NEW: Release budget slot after render complete
        if (m_budgetSlotAcquired.load()) {
            GlobalFrameBudget::GetInstance().ReleaseFrameSlot(m_playerInstanceId);
            m_budgetSlotAcquired = false;
        }

        m_frameProcessing = false;
        onComplete(renderSuccess);
    });

    m_framesDecoded++;
    return true;
}

FrameProcessor::Phase FrameProcessor::GetCurrentPhase() const
{
    uint64_t decoded = m_framesDecoded.load();

    if (decoded < VAVCORE_NVDEC_INITIAL_BUFFERING) {
        return Phase::INITIAL_BUFFERING;
    }
    else if (decoded < VAVCORE_NVDEC_INITIAL_BUFFERING + VAV2PLAYER_TRIPLE_BUFFER_SIZE) {
        return Phase::TRIPLE_FILLING;
    }
    else {
        return Phase::NORMAL_PLAYBACK;
    }
}
```

---

## 6. Simulation Scenario

### Timeline with 4 Players

```
Initial State: m_activeFrames = 0, MAX = 3

t=0ms: All 4 players call Play(), start frame 0

Phase 1 (frames 0-15): INITIAL_BUFFERING
  - All 4 players process normally
  - No GlobalFrameBudget involvement
  - QUEUE_DELAY: 6-15ms (stable)

t=533ms: All 4 players reach frame 16

Phase 2 (frames 16-18): TRIPLE_FILLING (BOTTLENECK)

  Frame 16:
    t=533ms: Player#0 TryAcquire → m_activeFrames: 0→1 ✅
    t=533ms: Player#1 TryAcquire → m_activeFrames: 1→2 ✅
    t=533ms: Player#2 TryAcquire → m_activeFrames: 2→3 ✅
    t=533ms: Player#3 TryAcquire → REJECTED (3 >= 3) ❌ [FRAME SKIPPED]

    t=543ms: Player#0 render complete → Release → m_activeFrames: 3→2
    t=543ms: Player#3 ProcessFrame (retry frame 16) → TryAcquire → 2→3 ✅

  Frame 17:
    Similar pattern: One player skips, retries after slot release

  Frame 18:
    Similar pattern: One player skips, retries after slot release

t=633ms: All 4 players reach frame 19

Phase 3 (frames 19+): NORMAL_PLAYBACK
  - All 4 players process normally
  - No GlobalFrameBudget involvement
  - QUEUE_DELAY: 6-22ms (stable)

Result:
  - Bottleneck phase: 4 players → max 3 concurrent
  - NVDEC queue load: 25% reduction
  - QUEUE_DELAY: 35-42ms → ~28-33ms (within 33.33ms budget)
  - Player synchronization: Maintained (skipped frames retry immediately)
```

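
The timeline can be replayed as a small single-threaded simulation to check the counts it implies. This is a sketch under simplifying assumptions: all players attempt each frame at the same instant, exactly one render completes before each retry, and all renders finish before the next frame starts. `SimResult` and `SimulateTripleFilling` are hypothetical names introduced for illustration. With 4 players, a limit of 3, and 3 frames it yields 15 attempts, 12 successes, and 3 rejections.

```cpp
#include <cstdint>

struct SimResult {
    uint64_t attempts = 0;
    uint64_t successes = 0;
    uint64_t rejections = 0;
    int peakActive = 0;  // highest number of slots held at once
};

// Replays `frames` bottleneck frames for `players` players with a slot
// limit of `maxConcurrent` (assumed to be at least 1).
inline SimResult SimulateTripleFilling(int players, int maxConcurrent, int frames) {
    SimResult r;
    int active = 0;

    // Same decision as TryAcquireFrameSlot, minus the atomics.
    auto tryAcquire = [&]() -> bool {
        ++r.attempts;
        if (active >= maxConcurrent) {
            ++r.rejections;
            return false;
        }
        ++active;
        ++r.successes;
        if (active > r.peakActive) r.peakActive = active;
        return true;
    };

    for (int f = 0; f < frames; ++f) {
        int waiting = 0;
        // All players hit the same frame at the same time.
        for (int p = 0; p < players; ++p) {
            if (!tryAcquire()) ++waiting;
        }
        // Each rejected player retries after one render completes.
        while (waiting > 0) {
            --active;  // one slot released
            if (tryAcquire()) --waiting;
        }
        active = 0;  // remaining renders complete before the next frame
    }
    return r;
}
```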
---

## 7. Thread Safety

### Lock-Free Design

All operations use atomic primitives for thread safety without mutexes:

```cpp
bool GlobalFrameBudget::TryAcquireFrameSlot(int playerId, uint64_t frameNumber)
{
    // playerId and frameNumber are not needed for the counter itself
    m_totalAcquireAttempts.fetch_add(1, std::memory_order_relaxed);

    // Atomic read
    int current = m_activeFrames.load(std::memory_order_acquire);

    // Check limit
    if (current >= MAX_CONCURRENT_FRAMES_BOTTLENECK) {
        m_rejectedAcquires.fetch_add(1, std::memory_order_relaxed);
        return false;  // Fast path rejection
    }

    // Lock-free CAS loop
    while (current < MAX_CONCURRENT_FRAMES_BOTTLENECK) {
        if (m_activeFrames.compare_exchange_weak(current, current + 1,
                                                 std::memory_order_acq_rel,
                                                 std::memory_order_acquire)) {
            m_successfulAcquires.fetch_add(1, std::memory_order_relaxed);
            return true;  // Successfully acquired
        }
        // compare_exchange_weak failed (possibly spuriously):
        // current now holds the latest value, retry
    }

    m_rejectedAcquires.fetch_add(1, std::memory_order_relaxed);
    return false;  // Budget exhausted during retry
}

void GlobalFrameBudget::ReleaseFrameSlot(int playerId)
{
    // Atomic decrement
    m_activeFrames.fetch_sub(1, std::memory_order_acq_rel);
}
```

### Memory Ordering Rationale

- **acquire/release** (m_activeFrames): A successful acquire or release synchronizes with later reads of the counter, so a player that observes a freed slot also observes the writes made before that release
- **relaxed** (statistics): Counters are diagnostic only; increments stay atomic, but no ordering with other memory is needed for correctness

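
The limit invariant can be exercised with a small multi-threaded check. The sketch below is a stand-in rather than the project's test code: `SlotCounter` copies the CAS loop shown above with the limit as a constructor parameter, and `MaxObservedConcurrency` is a hypothetical helper that reports the highest slot count any worker saw while holding a slot. For a limit of 3 the returned value should never exceed 3.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Reduced copy of the acquire/release logic, with a configurable limit.
class SlotCounter {
public:
    explicit SlotCounter(int limit) : m_limit(limit) {}

    bool TryAcquire() {
        int current = m_active.load(std::memory_order_acquire);
        while (current < m_limit) {
            if (m_active.compare_exchange_weak(current, current + 1,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
                return true;
            }
            // On failure, current was refreshed with the latest value.
        }
        return false;
    }

    void Release() { m_active.fetch_sub(1, std::memory_order_acq_rel); }
    int Active() const { return m_active.load(std::memory_order_acquire); }

private:
    const int m_limit;
    std::atomic<int> m_active{0};
};

// Runs `threads` workers that each attempt `iterations` acquire/release
// cycles, and returns the highest active count seen by a slot holder.
inline int MaxObservedConcurrency(int threads, int iterations, int limit) {
    SlotCounter counter(limit);
    std::atomic<int> maxSeen{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, &maxSeen, iterations]() {
            for (int i = 0; i < iterations; ++i) {
                if (!counter.TryAcquire()) continue;  // rejected: skip this round
                int seen = counter.Active();
                int prev = maxSeen.load();
                // Raise maxSeen to `seen` unless another worker already did.
                while (seen > prev && !maxSeen.compare_exchange_weak(prev, seen)) {
                }
                counter.Release();
            }
        });
    }
    for (auto& w : workers) w.join();
    return maxSeen.load();
}
```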
---

## 8. Performance Impact

### Expected Improvements

**Before (No GlobalFrameBudget):**
- Frames 16-18: All 4 players decode simultaneously
- NVDEC queue: 4 concurrent submissions
- QUEUE_DELAY: 35-42ms (exceeds 33.33ms budget)
- Result: Stutter/frame drops

**After (With GlobalFrameBudget, expected):**
- Frames 16-18: Max 3 players decode concurrently
- NVDEC queue: 3 concurrent submissions (25% reduction)
- QUEUE_DELAY: ~28-33ms (within 33.33ms budget)
- Result: Smooth playback

### Measured vs. Expected Metrics (baseline from time.log)

| Metric                 | Without Budget | With Budget (Expected) |
|------------------------|----------------|------------------------|
| QUEUE_DELAY (frame 16) | 35-42ms        | 28-33ms                |
| Frames dropped         | 0-2            | 1-3 (brief skip)       |
| Total playback time    | ~600ms         | ~650ms (+8% initially) |
| Sync after frame 19    | Perfect        | Perfect                |

---

## 9. Statistics and Monitoring

### BudgetStatistics Structure

```cpp
struct BudgetStatistics {
    uint64_t totalAcquireAttempts;  // Total TryAcquireFrameSlot() calls
    uint64_t successfulAcquires;    // Slots acquired successfully
    uint64_t rejectedAcquires;      // Rejections due to budget limit
    double rejectionRate;           // rejectedAcquires / totalAcquireAttempts
};
```

### Usage Example

```cpp
// After playback test
auto stats = GlobalFrameBudget::GetInstance().GetStatistics();

LOGF_INFO("GlobalFrameBudget Statistics:");
LOGF_INFO("  Total attempts: %llu", stats.totalAcquireAttempts);
LOGF_INFO("  Successful: %llu", stats.successfulAcquires);
LOGF_INFO("  Rejected: %llu", stats.rejectedAcquires);
LOGF_INFO("  Rejection rate: %.2f%%", stats.rejectionRate * 100.0);

// Expected results with 4 players:
//   Total attempts: ~15-18 (successful + rejected)
//   Successful: ~12 (4 players × 3 frames, all eventually succeed)
//   Rejected: ~3-6 (transient rejections, retried on the next ProcessFrame call)
//   Rejection rate: ~20-33% (acceptable because rejected frames are retried)
```

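
`GetStatistics()` and `ResetStatistics()` are declared in the header but not shown. A minimal sketch of how they might look follows, assuming relaxed atomic counters and a rejection rate of 0.0 when no attempts have been recorded. `BudgetStatisticsSketch`, `StatsCountersSketch`, and `RecordAttempt` are hypothetical names. The snapshot is not taken atomically across the three counters, so values read during playback may be slightly inconsistent with each other.

```cpp
#include <atomic>
#include <cstdint>

struct BudgetStatisticsSketch {
    uint64_t totalAcquireAttempts;
    uint64_t successfulAcquires;
    uint64_t rejectedAcquires;
    double rejectionRate;  // rejectedAcquires / totalAcquireAttempts
};

class StatsCountersSketch {
public:
    // Called once per acquire attempt with its outcome.
    void RecordAttempt(bool acquired) {
        m_total.fetch_add(1, std::memory_order_relaxed);
        if (acquired) {
            m_successful.fetch_add(1, std::memory_order_relaxed);
        } else {
            m_rejected.fetch_add(1, std::memory_order_relaxed);
        }
    }

    BudgetStatisticsSketch GetStatistics() const {
        BudgetStatisticsSketch s{};
        s.totalAcquireAttempts = m_total.load(std::memory_order_relaxed);
        s.successfulAcquires = m_successful.load(std::memory_order_relaxed);
        s.rejectedAcquires = m_rejected.load(std::memory_order_relaxed);
        // Guard against division by zero before any attempt is recorded.
        s.rejectionRate = (s.totalAcquireAttempts == 0)
            ? 0.0
            : static_cast<double>(s.rejectedAcquires) /
              static_cast<double>(s.totalAcquireAttempts);
        return s;
    }

    void ResetStatistics() {
        m_total.store(0, std::memory_order_relaxed);
        m_successful.store(0, std::memory_order_relaxed);
        m_rejected.store(0, std::memory_order_relaxed);
    }

private:
    std::atomic<uint64_t> m_total{0};
    std::atomic<uint64_t> m_successful{0};
    std::atomic<uint64_t> m_rejected{0};
};
```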
---

## 10. Configuration Tuning

### MAX_CONCURRENT_FRAMES_BOTTLENECK

**Current value: 3**

Rationale:
- 4 concurrent → 35-42ms QUEUE_DELAY (exceeds budget)
- 3 concurrent → ~28-33ms QUEUE_DELAY (within budget)
- 2 concurrent → Would be too conservative, longer total time

**Tuning guide:**
- Decrease if QUEUE_DELAY still exceeds the 33.33ms budget (more aggressive load reduction)
- Increase if QUEUE_DELAY has headroom and the added startup time is unwanted
- Monitor via BudgetStatistics.rejectionRate

### Phase Detection Thresholds

**Current values:**
- INITIAL_BUFFERING: frames 0-15 (VAVCORE_NVDEC_INITIAL_BUFFERING)
- TRIPLE_FILLING: frames 16-18 (+VAV2PLAYER_TRIPLE_BUFFER_SIZE)
- NORMAL_PLAYBACK: frames 19+

**Tuning guide:**
- Extend TRIPLE_FILLING range if stuttering persists after frame 18
- Reduce if budget overhead is unnecessary

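
To illustrate how the thresholds combine, here is a sketch of phase classification with an adjustable window. The constants `kInitialBuffering` (16) and `kTripleBufferSize` (3) are assumed values inferred from the frame ranges listed above; `PhaseSketch`, `PhaseForFrame`, and the `extraBudgetFrames` parameter are hypothetical and only show what extending the TRIPLE_FILLING range would mean.

```cpp
#include <cstdint>

enum class PhaseSketch { INITIAL_BUFFERING, TRIPLE_FILLING, NORMAL_PLAYBACK };

// Assumed values, inferred from the documented frame ranges.
constexpr uint64_t kInitialBuffering = 16;  // stands in for VAVCORE_NVDEC_INITIAL_BUFFERING
constexpr uint64_t kTripleBufferSize = 3;   // stands in for VAV2PLAYER_TRIPLE_BUFFER_SIZE

// Classifies a frame by the number of frames decoded so far.
// extraBudgetFrames (hypothetical) widens the TRIPLE_FILLING window.
constexpr PhaseSketch PhaseForFrame(uint64_t decoded, uint64_t extraBudgetFrames = 0) {
    return decoded < kInitialBuffering
               ? PhaseSketch::INITIAL_BUFFERING
               : (decoded < kInitialBuffering + kTripleBufferSize + extraBudgetFrames
                      ? PhaseSketch::TRIPLE_FILLING
                      : PhaseSketch::NORMAL_PLAYBACK);
}

// Boundaries matching the current values: 0-15, 16-18, 19+.
static_assert(PhaseForFrame(15) == PhaseSketch::INITIAL_BUFFERING, "frame 15");
static_assert(PhaseForFrame(16) == PhaseSketch::TRIPLE_FILLING, "frame 16");
static_assert(PhaseForFrame(18) == PhaseSketch::TRIPLE_FILLING, "frame 18");
static_assert(PhaseForFrame(19) == PhaseSketch::NORMAL_PLAYBACK, "frame 19");

// Extending the window by 2 frames keeps the budget active through frame 20.
static_assert(PhaseForFrame(20, 2) == PhaseSketch::TRIPLE_FILLING, "frame 20, +2");
```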
---

## 11. Error Handling

### Slot Leak Prevention

**Problem**: If ReleaseFrameSlot() is not called, m_activeFrames never decrements, causing permanent budget exhaustion. The opposite mistake, releasing without a matching acquire, drives the counter negative and silently raises the effective limit.

**Solution**: A sanity check in ReleaseFrameSlot() detects and undoes unmatched releases; the cleanup pattern in the next subsection prevents missing ones:

```cpp
void GlobalFrameBudget::ReleaseFrameSlot(int playerId)
{
    int previous = m_activeFrames.fetch_sub(1, std::memory_order_acq_rel);

    // Sanity check: release without a matching acquire
    if (previous <= 0) {
        LOGF_ERROR("[GlobalFrameBudget] Player#%d attempted to release but m_activeFrames was %d!",
                   playerId, previous);
        // Undo the decrement so the counter does not stay negative
        m_activeFrames.fetch_add(1, std::memory_order_acq_rel);
    }
}
```

### Proper Cleanup Pattern

```cpp
// In FrameProcessor::ProcessFrame()
bool slotAcquired = false;

if (currentPhase == Phase::TRIPLE_FILLING) {
    if (GlobalFrameBudget::GetInstance().TryAcquireFrameSlot(...)) {
        slotAcquired = true;
    } else {
        return false;  // Skip frame
    }
}

// Ensure release happens in all code paths.
// Capture by value: the UI callback runs after ProcessFrame() has returned,
// so a reference to the local slotAcquired would dangle.
auto cleanup = [slotAcquired, playerId = m_playerInstanceId]() {
    if (slotAcquired) {
        GlobalFrameBudget::GetInstance().ReleaseFrameSlot(playerId);
    }
};

// Error path: immediate cleanup, before the callback is enqueued,
// so the slot is released exactly once
if (decodeError) {
    cleanup();
    return false;
}

// Normal path: UI thread callback
m_dispatcherQueue.TryEnqueue([cleanup, ...]() {
    // ... render ...
    cleanup();
});
```

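
An alternative to calling a cleanup lambda by hand is to tie the slot to an object's lifetime. The sketch below assumes a shared-ownership guard so the UI callback can hold a copy; `SlotPool`, `FrameSlotGuard`, and `AcquireGuard` are hypothetical names standing in for the real budget class. The slot is released exactly once, when the last copy of the returned pointer is destroyed, whether that happens on the error path or after the render callback has run.

```cpp
#include <atomic>
#include <functional>
#include <memory>

// Hypothetical stand-in for the global budget counter.
struct SlotPool {
    std::atomic<int> active{0};
    int limit = 3;

    bool TryAcquire() {
        int current = active.load(std::memory_order_acquire);
        while (current < limit) {
            if (active.compare_exchange_weak(current, current + 1,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
                return true;
            }
        }
        return false;
    }

    void Release() { active.fetch_sub(1, std::memory_order_acq_rel); }
};

// Owns one acquired slot and gives it back in the destructor.
class FrameSlotGuard {
public:
    explicit FrameSlotGuard(SlotPool& pool) : m_pool(pool) {}
    ~FrameSlotGuard() { m_pool.Release(); }

    // Non-copyable: sharing goes through std::shared_ptr instead.
    FrameSlotGuard(const FrameSlotGuard&) = delete;
    FrameSlotGuard& operator=(const FrameSlotGuard&) = delete;

private:
    SlotPool& m_pool;
};

// Returns an empty pointer when the budget is exhausted. Callers capture
// the pointer by value in the render callback to keep the slot alive.
inline std::shared_ptr<FrameSlotGuard> AcquireGuard(SlotPool& pool) {
    if (!pool.TryAcquire()) {
        return nullptr;
    }
    return std::make_shared<FrameSlotGuard>(pool);
}
```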
---

## 12. Future Enhancements

### Adaptive Budget

Dynamically adjust MAX_CONCURRENT_FRAMES based on measured QUEUE_DELAY:

```cpp
#include <algorithm>  // std::max, std::min

// Note: deriving from GlobalFrameBudget requires its constructor and
// destructor to be protected rather than private.
class AdaptiveFrameBudget : public GlobalFrameBudget
{
private:
    std::atomic<int> m_maxConcurrent{3};  // Dynamic limit

public:
    // Assumes a single caller: load followed by store is not one atomic step.
    void UpdateBudget(double measuredQueueDelay) {
        if (measuredQueueDelay > 35.0) {
            // Too high, reduce concurrency
            m_maxConcurrent.store(std::max(1, m_maxConcurrent.load() - 1));
        } else if (measuredQueueDelay < 25.0) {
            // Safe margin, can increase
            m_maxConcurrent.store(std::min(4, m_maxConcurrent.load() + 1));
        }
    }
};
```

### Per-Decoder Budget

Different decoders may have different queue capacities:

```cpp
struct DecoderBudget {
    int maxConcurrentNVDEC = 3;
    int maxConcurrentVPL = 4;
    int maxConcurrentAMF = 3;
};
```

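
One possible way to consume such a structure is a lookup keyed by decoder type. `DecoderKind`, `DecoderBudgetSketch`, and `LimitFor` are hypothetical names; the limits are the placeholder values from the struct above, not measured capacities.

```cpp
// Hypothetical decoder identifiers; the real project may name these differently.
enum class DecoderKind { NVDEC, VPL, AMF };

struct DecoderBudgetSketch {
    int maxConcurrentNVDEC = 3;
    int maxConcurrentVPL = 4;
    int maxConcurrentAMF = 3;

    // Returns the concurrency limit configured for the given decoder.
    int LimitFor(DecoderKind kind) const {
        switch (kind) {
            case DecoderKind::NVDEC: return maxConcurrentNVDEC;
            case DecoderKind::VPL:   return maxConcurrentVPL;
            case DecoderKind::AMF:   return maxConcurrentAMF;
        }
        // Not reached for valid enum values; fall back to the NVDEC limit.
        return maxConcurrentNVDEC;
    }
};
```

A caller would compare the active count against `LimitFor(kind)` instead of the fixed `MAX_CONCURRENT_FRAMES_BOTTLENECK`.
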
### Priority-Based Slot Allocation

Assign priority to players for fair scheduling:

```cpp
bool TryAcquireFrameSlot(int playerId, int priority, uint64_t frameNumber);
```

---

## 13. Testing Plan

### Unit Tests

1. **Basic slot acquisition:**
   - Acquire 3 slots → all succeed
   - Acquire 4th slot → fail

2. **Slot release:**
   - Acquire 3 → release 1 → acquire 1 more → succeed

3. **Statistics tracking:**
   - Verify counters increment correctly

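
The three unit tests above could be sketched with plain `assert` as follows. `TestableBudget` is a hypothetical non-singleton stand-in with the same acquire/release logic, used here so each test starts from a clean counter; testing the real singleton would need the active count to be back at zero between tests, which `ResetStatistics()` as declared does not promise.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical non-singleton stand-in for GlobalFrameBudget.
class TestableBudget {
public:
    bool TryAcquireFrameSlot() {
        m_attempts.fetch_add(1, std::memory_order_relaxed);
        int current = m_active.load(std::memory_order_acquire);
        while (current < kMaxConcurrent) {
            if (m_active.compare_exchange_weak(current, current + 1,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
                m_successes.fetch_add(1, std::memory_order_relaxed);
                return true;
            }
        }
        m_rejections.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    void ReleaseFrameSlot() { m_active.fetch_sub(1, std::memory_order_acq_rel); }

    uint64_t Attempts() const { return m_attempts.load(); }
    uint64_t Successes() const { return m_successes.load(); }
    uint64_t Rejections() const { return m_rejections.load(); }

private:
    static constexpr int kMaxConcurrent = 3;
    std::atomic<int> m_active{0};
    std::atomic<uint64_t> m_attempts{0};
    std::atomic<uint64_t> m_successes{0};
    std::atomic<uint64_t> m_rejections{0};
};

// 1. Basic slot acquisition: three succeed, the fourth fails.
inline void TestBasicAcquisition() {
    TestableBudget b;
    bool first = b.TryAcquireFrameSlot();
    bool second = b.TryAcquireFrameSlot();
    bool third = b.TryAcquireFrameSlot();
    bool fourth = b.TryAcquireFrameSlot();
    assert(first && second && third);
    assert(!fourth);
}

// 2. Slot release: a released slot can be acquired again.
inline void TestSlotRelease() {
    TestableBudget b;
    for (int i = 0; i < 3; ++i) b.TryAcquireFrameSlot();
    b.ReleaseFrameSlot();
    bool reacquired = b.TryAcquireFrameSlot();
    assert(reacquired);
}

// 3. Statistics tracking: counters reflect every attempt.
inline void TestStatistics() {
    TestableBudget b;
    for (int i = 0; i < 5; ++i) b.TryAcquireFrameSlot();
    assert(b.Attempts() == 5);
    assert(b.Successes() == 3);
    assert(b.Rejections() == 2);
}

inline void RunBudgetUnitTests() {
    TestBasicAcquisition();
    TestSlotRelease();
    TestStatistics();
}
```

Calling `RunBudgetUnitTests()` runs all three; any violated expectation aborts via `assert`.
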
### Integration Tests

1. **Single player:**
   - GlobalFrameBudget should not interfere
   - Verify normal playback

2. **4 simultaneous players:**
   - Monitor QUEUE_DELAY during frames 16-18
   - Verify stays within 33.33ms budget
   - Check synchronization after frame 19

3. **Stress test:**
   - 8 simultaneous players
   - Verify budget prevents complete stall

### Performance Benchmarks

Compare time.log with/without GlobalFrameBudget:
- QUEUE_DELAY distribution
- Total frame drop count
- Playback smoothness (subjective)

---

## 14. Implementation Checklist

- [x] Create GlobalFrameBudget.h
- [x] Implement GlobalFrameBudget.cpp
- [ ] Add Phase enum to FrameProcessor.h
- [ ] Implement GetCurrentPhase() in FrameProcessor.cpp
- [ ] Integrate TryAcquireFrameSlot() in ProcessFrame()
- [ ] Integrate ReleaseFrameSlot() in UI callback
- [ ] Add m_budgetSlotAcquired tracking
- [ ] Add GlobalFrameBudget.cpp to Vav2Player.vcxproj
- [ ] Build and verify compilation
- [ ] Test with 4 simultaneous players
- [ ] Analyze time.log for improvements
- [ ] Document final results

---

## 15. References

- **Original Issue**: time.log analysis showing 35-42ms QUEUE_DELAY spikes
- **Root Cause**: 4 players simultaneously hitting frames 16-18 (triple buffer filling)
- **DPB Size Investigation**: DPB_SIZE=4/8 crashes, must remain 16
- **NVDEC Spec**: min_num_decode_surfaces=9 from AV1 sequence header (codec spec, not H/W)

---

*Document Version: 1.0*
*Last Updated: 2025-10-11*
*Author: Claude Code*