# MediaCodec Asynchronous Decoding Design
## Document Information
- **Created**: 2025-10-12
- **Status**: Implementation Required
- **Target Platform**: Android (NDK 26)
- **Use Case**: Simultaneous 4K video playback (4 instances)
---
## Problem Statement
### Current Implementation (Synchronous Mode)
```cpp
// MediaCodecAV1Decoder::DecodeToSurface (current, simplified)
bool ProcessInputBuffer(const uint8_t* data, size_t size) {
    ssize_t index = AMediaCodec_dequeueInputBuffer(m_codec, 10000); // blocks up to 10ms
    // ... copy data ...
    AMediaCodec_queueInputBuffer(...);
}
bool ProcessOutputBuffer(VideoFrame& frame) {
    AMediaCodecBufferInfo info;
    ssize_t index = AMediaCodec_dequeueOutputBuffer(m_codec, &info, 10000); // blocks up to 10ms
    // ... process frame ...
}
```
**Bottleneck for 4x Simultaneous 4K Playback:**
- Each decoder thread blocks 10-20ms per frame on dequeue operations
- 4 threads × 10-20ms blocking = 40-80ms of cumulative wait per frame (the frame interval at 30fps is ~33ms)
- Thread contention increases frame drop probability
- Poor CPU utilization during blocking periods
### Performance Impact (Estimated)
| Scenario | Sync Mode | Async Mode |
|----------|-----------|------------|
| Single 4K video | 30fps ✅ | 30fps ✅ |
| 4x 4K videos | 20-25fps ⚠️ | 28-30fps ✅ |
| CPU utilization | 40-50% (blocking) | 70-80% (event-driven) |
| Thread blocking | 10-20ms/frame | 0ms (callback) |
---
## Asynchronous Mode Benefits
### 1. Reduced Thread Blocking
```cpp
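// Async mode: input buffer indices arrive via the input-available callback,
// so the decode thread never blocks in a dequeue call
AMediaCodec_queueInputBuffer(...); // Returns immediately
// Output handled by callback (runs on a codec-owned thread)
void OnAsyncOutputAvailable(AMediaCodec* codec, void* userdata,
                            int32_t index, AMediaCodecBufferInfo* bufferInfo) {
    // Process frame in callback thread
    // Push to queue for main thread consumption
}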
```
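
The NDK delivers async events through plain C function pointers plus a `void* userdata`, so a C++ handler usually routes them through a static trampoline that recovers the instance. The sketch below shows only that pattern. `FakeCodec`, `FakeBufferInfo`, `FakeCallbacks` and `AsyncHandlerSketch` are hypothetical stand-ins so the code builds off-device; on Android the corresponding types would be `AMediaCodec`, `AMediaCodecBufferInfo` and the callback struct passed to `AMediaCodec_setAsyncNotifyCallback`.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins for the NDK types so the sketch compiles on a host machine.
struct FakeCodec {};
struct FakeBufferInfo { int64_t presentationTimeUs; };

// Same shape as the NDK output callback: codec, userdata, index, buffer info.
using OutputAvailableFn = void (*)(FakeCodec* codec, void* userdata,
                                   int32_t index, FakeBufferInfo* info);

struct FakeCallbacks { OutputAvailableFn onOutputAvailable; };

class AsyncHandlerSketch {
public:
    // Static trampoline: recovers the instance from userdata and forwards.
    static void OnOutputTrampoline(FakeCodec* /*codec*/, void* userdata,
                                   int32_t index, FakeBufferInfo* info) {
        auto* self = static_cast<AsyncHandlerSketch*>(userdata);
        if (self && info) {
            self->HandleOutput(index, *info);
        }
    }

    FakeCallbacks MakeCallbacks() const { return FakeCallbacks{&OnOutputTrampoline}; }

    const std::vector<int64_t>& ReceivedTimestamps() const { return m_pts; }

private:
    void HandleOutput(int32_t /*index*/, const FakeBufferInfo& info) {
        // A real handler would pass the buffer index on to the frame pipeline.
        m_pts.push_back(info.presentationTimeUs);
    }

    std::vector<int64_t> m_pts;
};
```

The handler must outlive the codec's callbacks, since the trampoline dereferences `userdata` on a thread the handler does not own.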
### 2. Better CPU Utilization
- **Sync mode**: Thread sleeps during dequeue operations
- **Async mode**: Callbacks notify the decoder when frames are ready, so threads can do other work in the meantime
### 3. Improved Pipeline Efficiency
```
Sync Mode:
  Thread 1: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
  Thread 2: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
  Thread 3: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
  Thread 4: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
  Total blocking: 40ms per frame cycle (10ms × 4 threads)

Async Mode:
  Thread 1: [Queue Input] → [Continue]
  Thread 2: [Queue Input] → [Continue]
  Thread 3: [Queue Input] → [Continue]
  Thread 4: [Queue Input] → [Continue]
  Callback Threads: [Process outputs concurrently]
  Total blocking on decode threads: ~0ms
```
### 4. Memory Bandwidth Optimization
- 4K decoded frame: ~12MB (3840×2160, 8-bit YUV420 at 1.5 bytes/pixel)
- 4x instances: 48MB per frame interval × 30fps = **1.4GB/s bandwidth**
- Async mode may let the hardware schedule this bandwidth more evenly (not yet measured)
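
The figures above can be reproduced with a few lines of compile-time arithmetic. This sketch assumes 8-bit 4:2:0 output (1.5 bytes per pixel) and counts a single write of each decoded frame; the helper and constant names are illustrative only.

```cpp
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus two
// quarter-resolution chroma planes, i.e. 1.5 bytes per pixel.
constexpr std::uint64_t Yuv420FrameBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

constexpr std::uint64_t kFrameBytes = Yuv420FrameBytes(3840, 2160);  // one 4K frame
constexpr std::uint64_t kFourStreamBytes = kFrameBytes * 4;          // 4 instances
constexpr std::uint64_t kBytesPerSecond = kFourStreamBytes * 30;     // at 30fps

static_assert(kFrameBytes == 12441600ULL, "one 4K frame");
static_assert(kFourStreamBytes == 49766400ULL, "four 4K frames");
static_assert(kBytesPerSecond == 1492992000ULL, "four streams at 30fps");
```

In binary units this is about 11.9 MiB per frame, 47.5 MiB per frame interval and 1.39 GiB/s, which matches the rounded 12MB / 48MB / 1.4GB/s figures. In decimal units the total is about 1.49 GB/s.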
---
## Current Implementation Status
### ✅ Already Implemented
1. **MediaCodecAsyncHandler** - Complete implementation
   - Location: `vav2/platforms/android/vavcore/src/Decoder/MediaCodecAsyncHandler.h/.cpp`
   - Async callbacks: `onInputBufferAvailable`, `onAsyncOutputAvailable`, `onFormatChanged`, `onError`
   - Frame queue management with mutex/condition_variable
   - Thread-safe async frame data structure
2. **Static Callback Dispatchers**
```cpp
OnAsyncInputAvailable()
OnAsyncOutputAvailable()
OnAsyncFormatChanged()
OnAsyncError()
```
3. **Async Frame Queue**
```cpp
struct AsyncFrameData {
    std::unique_ptr<VideoFrame> frame;
    int64_t timestamp_us;
    bool is_keyframe; // Placeholder for NDK 26
    std::chrono::steady_clock::time_point decode_start_time;
};
std::queue<AsyncFrameData> m_async_output_queue;
```
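
For readers unfamiliar with the mutex/condition_variable pattern mentioned above, here is a minimal sketch of a producer/consumer frame queue. It is not the code in `MediaCodecAsyncHandler`: `FrameRecord` and `AsyncFrameQueueSketch` are hypothetical names, and the record is reduced to a timestamp so the example stays self-contained.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>

// Simplified stand-in for AsyncFrameData (the real struct also owns a VideoFrame).
struct FrameRecord {
    int64_t timestamp_us;
};

class AsyncFrameQueueSketch {
public:
    // Producer side: would be called from the codec callback thread.
    void Push(FrameRecord record) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(record));
        }
        m_cv.notify_one();  // wake one waiting consumer
    }

    // Consumer side: waits up to `timeout` for a frame.
    // Returns false if nothing arrived in time; `out` is left untouched then.
    bool PopWithTimeout(std::chrono::milliseconds timeout, FrameRecord& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_queue.empty(); })) {
            return false;
        }
        out = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<FrameRecord> m_queue;
};
```

The predicate passed to `wait_for` guards against spurious wakeups, and the timeout keeps a stalled codec from blocking the consumer indefinitely.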
### ❌ Missing Implementation
1. **DecodeToSurface** does not use async path
   - Current: Calls `ProcessInputBuffer()` → `ProcessOutputBuffer()` (sync)
   - Required: Call `DecodeFrameAsync()` when async mode enabled
2. **ProcessAsyncOutputFrame** incomplete
   - Current: Placeholder implementation (lines 236-256 in MediaCodecAsyncHandler.cpp)
   - Required: Proper frame processing for Vulkan/ImageReader pipeline
3. **Async Mode Activation**
   - Current: `InitializeAsyncMode()` called but not actually used
   - Required: Enable async mode for multi-instance scenarios
---
## Implementation Plan
### Phase 1: Complete ProcessAsyncOutputFrame (High Priority)
**File**: `MediaCodecAsyncHandler.cpp:236-256`
**Current (Incomplete)**:
```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
        int32_t output_index,
        AMediaCodecBufferInfo* buffer_info,
        VideoFrame& output_frame) {
    // TODO: Process output buffer and fill VideoFrame
    // For now, just release the buffer
    AMediaCodec_releaseOutputBuffer(m_codec, output_index, false);
    return true;
}
```
**Required Implementation**:
```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
        int32_t output_index,
        AMediaCodecBufferInfo* buffer_info,
        VideoFrame& output_frame) {
    if (!m_codec || output_index < 0 || !buffer_info) {
        return false;
    }

    // Step 1: Fill VideoFrame metadata
    output_frame.timestamp_us = buffer_info->presentationTimeUs;
    output_frame.is_keyframe = false; // NDK 26 limitation
    output_frame.surface_type = VAVCORE_SURFACE_ANDROID_HARDWARE_BUFFER;

    // Step 2: Release the MediaCodec buffer with render=true so the frame is
    // sent to the ImageReader surface. This has to happen before the acquire
    // below; otherwise the ImageReader holds no new image for this frame.
    // With an output surface configured, decoded pixels are generally not
    // CPU-accessible, so AMediaCodec_getOutputBuffer() is not used here.
    media_status_t status =
        AMediaCodec_releaseOutputBuffer(m_codec, output_index, true);
    if (status != AMEDIA_OK) {
        LogError("Failed to render output buffer to ImageReader surface");
        return false;
    }

    // Step 3: Acquire AHardwareBuffer from ImageReader
    // Delegate to MediaCodecSurfaceManager. Rendering is asynchronous, so
    // AcquireLatestImage() is assumed to wait for (or be driven by) the
    // ImageReader's image-available notification.
    AHardwareBuffer* ahb = m_decoder->GetSurfaceManager()->AcquireLatestImage();
    if (!ahb) {
        LogError("Failed to acquire AHardwareBuffer from ImageReader");
        return false;
    }

    // Step 4: Store AHardwareBuffer in VideoFrame
    output_frame.ahardware_buffer = ahb;
    return true;
}
```
### Phase 2: Integrate Async Path in DecodeToSurface
**File**: `MediaCodecAV1Decoder.cpp` (DecodeToSurface method)
**Add Mode Selection**:
```cpp
bool MediaCodecAV1Decoder::DecodeToSurface(
        const uint8_t* packet_data,
        size_t packet_size,
        VavCoreSurfaceType target_type,
        void* target_surface,
        VideoFrame& output_frame) {
    // Use the async path when the handler exists and async mode is enabled
    if (m_async_handler && m_async_handler->IsAsyncModeEnabled()) {
        return DecodeFrameAsync(packet_data, packet_size, output_frame);
    }
    // Fall back to sync mode (current implementation)
    if (!ProcessInputBuffer(packet_data, packet_size)) {
        return false;
    }
    return ProcessOutputBuffer(output_frame);
}
```
### Phase 3: Add Multi-Instance Detection
**File**: `MediaCodecAV1Decoder.cpp` (Initialize method)
**Auto-Enable Async for Multi-Instance**:
```cpp
bool MediaCodecAV1Decoder::Initialize(const VideoMetadata& metadata) {
    // ... existing initialization ...
    // Enable async mode for high-resolution or multi-instance scenarios
    if (metadata.width >= 3840 || ShouldEnableAsyncMode()) {
        if (m_async_handler->EnableAsyncMode(true)) {
            LogInfo("Async decoding enabled");
        } else {
            LogInfo("Async mode unavailable, falling back to sync decoding");
        }
    }
    return FinalizeInitialization();
}
```
### Phase 4: Testing
**Test Cases**:
1. Single 4K video playback (async vs sync benchmark)
2. 4x 4K videos simultaneously (target 28-30fps all instances)
3. Memory bandwidth monitoring (adb logcat performance)
4. Thread contention analysis (systrace)
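
To check the 28-30fps target per instance, a test harness needs some way to turn frame delivery times into a frame rate. One possible approach is a sliding-window meter like the sketch below. `FpsMeterSketch` is a hypothetical helper, not part of VavCore, and it takes timestamps from the caller so results are reproducible in tests.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Sliding-window FPS estimate from frame delivery times in microseconds.
class FpsMeterSketch {
public:
    // `window` is the number of most recent frames kept (at least 2).
    explicit FpsMeterSketch(std::size_t window)
        : m_window(window < 2 ? 2 : window) {}

    void OnFrameDelivered(int64_t now_us) {
        m_times.push_back(now_us);
        if (m_times.size() > m_window) {
            m_times.pop_front();
        }
    }

    // Returns 0.0 until two frames with increasing timestamps have been seen.
    double Fps() const {
        if (m_times.size() < 2) {
            return 0.0;
        }
        const int64_t span_us = m_times.back() - m_times.front();
        if (span_us <= 0) {
            return 0.0;
        }
        // N timestamps span N-1 frame intervals.
        return static_cast<double>(m_times.size() - 1) * 1e6 /
               static_cast<double>(span_us);
    }

    bool MeetsTarget(double min_fps) const { return Fps() >= min_fps; }

private:
    std::size_t m_window;
    std::deque<int64_t> m_times;
};
```

In a 4x test, each decoder instance would own one meter and the test would assert `MeetsTarget(28.0)` on all four after a warm-up period.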
---
## API Design
### User-Facing Configuration
```cpp
// VavCore C API addition (optional)
VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable);
VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder);
```
### Internal Auto-Detection
```cpp
// Auto-enable async for:
// 1. Resolution >= 4K (3840x2160)
// 2. Multiple decoder instances detected (not yet checked below)
// 3. High-end SoC (Snapdragon 8 Gen 3, Exynos 2400)
bool MediaCodecAV1Decoder::ShouldEnableAsyncMode() const {
    // Check resolution
    if (m_width >= 3840 && m_height >= 2160) {
        return true;
    }
    // Check device capability (Samsung Galaxy S24, etc.)
    std::string soc = GetSoCName();
    if (soc.find("SM8650") != std::string::npos || // Snapdragon 8 Gen 3
        soc.find("Exynos2400") != std::string::npos) {
        return true;
    }
    return false;
}
```
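
The multi-instance condition listed in the comment above has no corresponding check in the snippet. One way it might be tracked is a process-wide counter maintained by an RAII token that each decoder holds as a member. `DecoderInstanceToken` is a hypothetical name used only for this sketch.

```cpp
#include <atomic>

// Hypothetical RAII token: each decoder instance would hold one as a member,
// so the live count follows decoder lifetimes automatically.
class DecoderInstanceToken {
public:
    DecoderInstanceToken() { Counter().fetch_add(1, std::memory_order_relaxed); }
    ~DecoderInstanceToken() { Counter().fetch_sub(1, std::memory_order_relaxed); }

    // Non-copyable: a copy would be destroyed without having been counted.
    DecoderInstanceToken(const DecoderInstanceToken&) = delete;
    DecoderInstanceToken& operator=(const DecoderInstanceToken&) = delete;

    static int LiveCount() { return Counter().load(std::memory_order_relaxed); }
    static bool MultipleInstancesActive() { return LiveCount() > 1; }

private:
    // Function-local static avoids static initialization order problems.
    static std::atomic<int>& Counter() {
        static std::atomic<int> count{0};
        return count;
    }
};
```

A limitation of this approach: the first decoder created cannot know that others will follow, so a check made during `Initialize()` only fires for the second and later instances.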
---
## Performance Expectations
### Baseline (Current Sync Mode)
- Single 4K: 30fps ✅
- 4x 4K: 20-25fps ⚠️ (frame drops, stuttering)
### Target (Async Mode)
- Single 4K: 30fps ✅ (same performance)
- 4x 4K: 28-30fps ✅ (smooth playback)
- CPU utilization: +20-30 percentage points (40-50% → 70-80%, estimated)
- Thread blocking: ~80% reduction (estimated)
### Hardware Requirements
- **Minimum**: Android 9 (API 28), the level at which `AMediaCodec_setAsyncNotifyCallback` became available; built with NDK 26
- **Optimal**: Snapdragon 8 Gen 2+ or Exynos 2300+
- **Memory**: Sufficient bandwidth for 1.4GB/s (4x 4K)
---
## Risk Analysis
### Low Risk
- ✅ MediaCodecAsyncHandler already implemented
- ✅ No NDK version upgrade required (stays at NDK 26)
- ✅ Keyframe detection not needed (WebM provides it)
### Medium Risk
- ⚠️ Thread synchronization complexity (mitigated by existing queue implementation)
- ⚠️ Memory bandwidth saturation on mid-range devices
### Mitigation Strategies
1. **Fallback to Sync**: If async initialization fails, use sync mode
2. **Progressive Rollout**: Enable async only for high-end devices initially
3. **Performance Monitoring**: Add metrics to detect frame drops
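
As an illustration of the performance-monitoring item, frame drops can be estimated from gaps in the presentation timestamps of frames that actually reach the screen. The sketch below is one possible metric, assuming a constant frame rate; `FrameDropCounterSketch` is a hypothetical name.

```cpp
#include <cstdint>

// Counts frames missing between consecutively presented frames,
// assuming a constant nominal frame duration.
class FrameDropCounterSketch {
public:
    // `frame_duration_us` must be positive (e.g. 33333 for 30fps).
    explicit FrameDropCounterSketch(int64_t frame_duration_us)
        : m_frame_duration_us(frame_duration_us) {}

    // Feed the presentation timestamp of each frame that was shown.
    void OnFramePresented(int64_t pts_us) {
        if (m_has_last) {
            const int64_t gap_us = pts_us - m_last_pts_us;
            // Round the gap to the nearest whole number of frame intervals.
            const int64_t intervals =
                (gap_us + m_frame_duration_us / 2) / m_frame_duration_us;
            if (intervals > 1) {
                m_dropped += intervals - 1;
            }
        }
        m_last_pts_us = pts_us;
        m_has_last = true;
    }

    int64_t Dropped() const { return m_dropped; }

private:
    int64_t m_frame_duration_us;
    int64_t m_last_pts_us = 0;
    int64_t m_dropped = 0;
    bool m_has_last = false;
};
```

Backward jumps in timestamps (for example after a seek) produce a non-positive gap and are ignored rather than counted as drops.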
---
## References
### Implementation Files
- **MediaCodecAsyncHandler.h/.cpp**: Async callback management
- **MediaCodecAV1Decoder.h/.cpp**: Main decoder integration
- **MediaCodecSurfaceManager.h/.cpp**: ImageReader/AHardwareBuffer handling
### Android Documentation
- [MediaCodec Asynchronous Processing](https://developer.android.com/reference/android/media/MediaCodec#asynchronous-processing-using-buffers)
- [AMediaCodec_setAsyncNotifyCallback](https://developer.android.com/ndk/reference/group/media#amediacodec_setasyncnotifycallback)
### Performance Analysis
- NVDEC async decoding (Windows reference): PollingThread pattern
- Expected gain: 1-3ms per frame (not measured, theoretical from pipelining)
---
## Conclusion
**Recommendation**: Implement async decoding for 4x simultaneous 4K playback use case.
**Expected Outcome**:
- Significant performance improvement for multi-instance scenarios
- Minimal risk (infrastructure already exists)
- Better resource utilization on high-end devices
**Next Steps**:
1. Complete `ProcessAsyncOutputFrame()` implementation (Phase 1)
2. Integrate async path in `DecodeToSurface()` (Phase 2)
3. Add auto-detection logic (Phase 3)
4. Test with 4x 4K videos (Phase 4)
---
*Document created by Claude Code*
*Last updated: 2025-10-12*