vav2/docs/working/MediaCodec_Async_Decoding_Design.md
# MediaCodec Asynchronous Decoding Design

## Document Information
- **Created**: 2025-10-12
- **Status**: Implementation Required
- **Target Platform**: Android (NDK 26)
- **Use Case**: Simultaneous 4K video playback (4 instances)

---

## Problem Statement

### Current Implementation (Synchronous Mode)
```cpp
// MediaCodecAV1Decoder::DecodeToSurface (current)
bool ProcessInputBuffer(const uint8_t* data, size_t size) {
    // Timeout is in microseconds: blocks up to 10ms waiting for a free input buffer
    ssize_t index = AMediaCodec_dequeueInputBuffer(m_codec, 10000);
    // ... copy data ...
    AMediaCodec_queueInputBuffer(...);
}

bool ProcessOutputBuffer(VideoFrame& frame) {
    AMediaCodecBufferInfo info;
    // Blocks up to 10ms waiting for a decoded frame
    ssize_t index = AMediaCodec_dequeueOutputBuffer(m_codec, &info, 10000);
    // ... process frame ...
}
```

**Bottleneck for 4x Simultaneous 4K Playback:**
- Each decoder thread can block 10-20ms per frame in dequeue calls (up to 10ms each for input and output), out of a 33.3ms frame budget at 30fps
- 4 threads × 10-20ms of blocking leaves the CPU idle while every thread waits on a timeout
- Thread contention increases frame drop probability
- Poor CPU utilization during blocking periods

### Performance Impact (Estimated)
| Scenario | Sync Mode | Async Mode |
|----------|-----------|------------|
| Single 4K video | 30fps ✅ | 30fps ✅ |
| 4x 4K videos | 20-25fps ⚠️ | 28-30fps ✅ |
| CPU utilization | 40-50% (blocking) | 70-80% (event-driven) |
| Thread blocking | 10-20ms/frame | 0ms (callback) |

---

## Asynchronous Mode Benefits

### 1. Reduced Thread Blocking
```cpp
// Async mode: free input buffer indices arrive via callback,
// so no thread waits in AMediaCodec_dequeueInputBuffer()
onAsyncInputAvailable(index) {
    // Remember the free index for the decoder thread
}

// Decoder thread: fill a remembered index and submit it
AMediaCodec_queueInputBuffer(codec, index, ...);  // Returns immediately

// Output handled by callback (separate thread managed by AMediaCodec)
onAsyncOutputAvailable(index, bufferInfo) {
    // Process frame in callback thread
    // Push to queue for main thread consumption
}
```

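The handoff above can be modelled in portable C++ to show the synchronization it needs. This is a minimal sketch, not the MediaCodecAsyncHandler code: a `std::thread` stands in for the callback thread that AMediaCodec creates internally, and `DecodedFrame`, `OutputQueue`, and `RunHandoffDemo` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a decoded frame (timestamp only).
struct DecodedFrame {
    int64_t timestamp_us;
};

// Hypothetical thread-safe queue: the callback thread pushes,
// the decoder thread pops with a bounded wait.
class OutputQueue {
public:
    void Push(DecodedFrame frame) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_frames.push(frame);
        }
        m_cv.notify_one();  // wake a waiting consumer
    }

    // Waits up to `timeout` for a frame; returns std::nullopt on timeout.
    std::optional<DecodedFrame> Pop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_frames.empty(); })) {
            return std::nullopt;
        }
        DecodedFrame frame = m_frames.front();
        m_frames.pop();
        return frame;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<DecodedFrame> m_frames;
};

// Simulates onAsyncOutputAvailable firing on a separate thread
// while the decoder thread collects the results.
std::vector<int64_t> RunHandoffDemo(int frame_count) {
    OutputQueue queue;
    std::thread callback_thread([&queue, frame_count] {
        for (int i = 0; i < frame_count; ++i) {
            queue.Push(DecodedFrame{i * 33'333});  // ~30fps timestamps
        }
    });

    std::vector<int64_t> received;
    while (static_cast<int>(received.size()) < frame_count) {
        if (auto frame = queue.Pop(std::chrono::milliseconds(100))) {
            received.push_back(frame->timestamp_us);
        }
    }
    callback_thread.join();
    return received;
}
```

The consumer waits on the condition variable only while the queue is empty, so it wakes as soon as a frame is pushed rather than sleeping for a fixed dequeue timeout; the bounded `wait_for` keeps a stalled decoder from hanging the caller indefinitely.
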
### 2. Better CPU Utilization
- **Sync mode**: Threads sleep inside dequeue calls until a buffer is ready or the timeout expires
- **Async mode**: Callbacks signal when buffers are ready, so threads can do other work in the meantime

### 3. Improved Pipeline Efficiency
```
Sync Mode:
Thread 1: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 2: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 3: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 4: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Total Blocking: 4 × 10ms = 40ms of thread time per frame cycle
                (threads block in parallel, so this is not wall-clock time)

Async Mode:
Thread 1: [Queue Input] → [Continue]
Thread 2: [Queue Input] → [Continue]
Thread 3: [Queue Input] → [Continue]
Thread 4: [Queue Input] → [Continue]
Callback Threads: [Process outputs concurrently]
Total Blocking: 0ms in dequeue calls
```

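### 4. Memory Bandwidth Optimization
- 4K AV1 frame: ~12.4MB (3840×2160, 8-bit YUV420 at 1.5 bytes/pixel; 10-bit output at 2 bytes/sample doubles this)
- 4x instances: ~50MB/frame × 30fps ≈ 1.5GB/s, i.e. **~1.4GiB/s** of decoded output alone
- Async mode does not reduce this traffic, but it may let the hardware schedule it more evenly, since input is submitted as soon as a buffer frees up (not yet measured)
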
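The bandwidth figures follow from 8-bit 4:2:0 sizing: a full-resolution luma plane plus two quarter-resolution chroma planes. The compile-time check below is a small sketch of that arithmetic (function names are illustrative); it counts only one write of each decoded frame, so reference-frame reads and display composition would add further traffic.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: Y (w*h) + U and V (w*h/4 each).
// Assumes even width and height.
constexpr uint64_t Yuv420FrameBytes(uint64_t width, uint64_t height) {
    return width * height * 3 / 2;
}

// Decoded-output bandwidth for `instances` decoders at `fps`, in bytes per second.
constexpr uint64_t OutputBytesPerSecond(uint64_t width, uint64_t height,
                                        uint64_t instances, uint64_t fps) {
    return Yuv420FrameBytes(width, height) * instances * fps;
}

constexpr uint64_t kFrame4K = Yuv420FrameBytes(3840, 2160);
constexpr uint64_t kFourStreams4K = OutputBytesPerSecond(3840, 2160, 4, 30);

static_assert(kFrame4K == 12'441'600, "one 4K frame is ~12.4MB");
static_assert(kFourStreams4K == 1'492'992'000, "4x 4K @ 30fps is ~1.49e9 B/s (~1.39GiB/s)");
```
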
---

## Current Implementation Status

### ✅ Already Implemented
1. **MediaCodecAsyncHandler** - Callback and queue infrastructure complete (`ProcessAsyncOutputFrame()` is still a placeholder)
   - Location: `vav2/platforms/android/vavcore/src/Decoder/MediaCodecAsyncHandler.h/.cpp`
   - Async callbacks: `onInputBufferAvailable`, `onAsyncOutputAvailable`, `onFormatChanged`, `onError`
   - Frame queue management with mutex/condition_variable
   - Thread-safe async frame data structure

2. **Static Callback Dispatchers**
   ```cpp
   OnAsyncInputAvailable()
   OnAsyncOutputAvailable()
   OnAsyncFormatChanged()
   OnAsyncError()
   ```

3. **Async Frame Queue**
   ```cpp
   struct AsyncFrameData {
       std::unique_ptr<VideoFrame> frame;
       int64_t timestamp_us;
       bool is_keyframe;  // Placeholder for NDK 26
       std::chrono::steady_clock::time_point decode_start_time;
   };
   std::queue<AsyncFrameData> m_async_output_queue;
   ```

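Static dispatchers are needed because `AMediaCodec_setAsyncNotifyCallback` takes plain function pointers plus a `void* userdata`. The sketch below shows that trampoline pattern with a mock callback table so it compiles off-device; `MockCodecCallbacks` and `Handler` are hypothetical stand-ins (the dispatcher names mirror the list above), and the real NDK callbacks additionally receive the `AMediaCodec*` as their first argument.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mock of a C-style callback table (the real one is AMediaCodecOnAsyncNotifyCallback).
struct MockCodecCallbacks {
    void (*onInputAvailable)(void* userdata, int32_t index);
    void (*onOutputAvailable)(void* userdata, int32_t index, int64_t pts_us);
};

// Hypothetical handler: static functions recover `this` from userdata
// and forward to ordinary member functions.
class Handler {
public:
    MockCodecCallbacks MakeCallbacks() const {
        return MockCodecCallbacks{&Handler::OnAsyncInputAvailable,
                                  &Handler::OnAsyncOutputAvailable};
    }

    std::vector<int32_t> input_indices;
    std::vector<int64_t> output_pts;

private:
    static void OnAsyncInputAvailable(void* userdata, int32_t index) {
        static_cast<Handler*>(userdata)->HandleInput(index);
    }
    static void OnAsyncOutputAvailable(void* userdata, int32_t index, int64_t pts_us) {
        static_cast<Handler*>(userdata)->HandleOutput(index, pts_us);
    }

    void HandleInput(int32_t index) { input_indices.push_back(index); }
    void HandleOutput(int32_t /*index*/, int64_t pts_us) { output_pts.push_back(pts_us); }
};
```

Whatever object is passed as `userdata` has to outlive the codec's callbacks, i.e. stay valid until the codec has been stopped and deleted.
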
### ❌ Missing Implementation
1. **DecodeToSurface** does not use the async path
   - Current: Calls `ProcessInputBuffer()` → `ProcessOutputBuffer()` (sync)
   - Required: Call `DecodeFrameAsync()` when async mode is enabled

2. **ProcessAsyncOutputFrame** is incomplete
   - Current: Placeholder implementation (lines 236-256 in MediaCodecAsyncHandler.cpp)
   - Required: Proper frame processing for the Vulkan/ImageReader pipeline

3. **Async Mode Activation**
   - Current: `InitializeAsyncMode()` is called, but async mode is never actually used
   - Required: Enable async mode for multi-instance scenarios

---

## Implementation Plan

### Phase 1: Complete ProcessAsyncOutputFrame (High Priority)
**File**: `MediaCodecAsyncHandler.cpp:236-256`

**Current (Incomplete)**:
```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    // TODO: Process output buffer and fill VideoFrame
    // For now, just release the buffer

    AMediaCodec_releaseOutputBuffer(m_codec, output_index, false);
    return true;
}
```

**Required Implementation**:

Because the codec renders into the ImageReader surface, the decoded picture is not reachable through `AMediaCodec_getOutputBuffer()` (the Java `MediaCodec.getOutputBuffer()` is documented to return null when an output Surface is configured), and the ImageReader only receives the image after `AMediaCodec_releaseOutputBuffer(..., true)`. The buffer is therefore released with `render=true` first, and the image is acquired afterwards:
```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    if (!m_codec || output_index < 0 || !buffer_info) {
        return false;
    }

    // Step 1: Fill VideoFrame metadata
    output_frame.timestamp_us = buffer_info->presentationTimeUs;
    output_frame.is_keyframe = false;  // NDK 26 limitation
    output_frame.surface_type = VAVCORE_SURFACE_ANDROID_HARDWARE_BUFFER;

    // Step 2: Render the buffer to the ImageReader surface.
    // With an output Surface configured there is no CPU-accessible output buffer,
    // so AMediaCodec_getOutputBuffer() is not used here.
    media_status_t status =
        AMediaCodec_releaseOutputBuffer(m_codec, output_index, true);  // render=true
    if (status != AMEDIA_OK) {
        LogError("Failed to render output buffer to ImageReader surface");
        return false;
    }

    // Step 3: Acquire the rendered image as an AHardwareBuffer
    // (delegated to MediaCodecSurfaceManager). The image reaches the ImageReader
    // asynchronously after Step 2, so AcquireLatestImage() may need to wait for it.
    AHardwareBuffer* ahb = m_decoder->GetSurfaceManager()->AcquireLatestImage();
    if (!ahb) {
        LogError("Failed to acquire AHardwareBuffer from ImageReader");
        return false;  // Buffer was already released in Step 2; do not release it again
    }

    // Step 4: Store AHardwareBuffer in VideoFrame
    output_frame.ahardware_buffer = ahb;
    return true;
}
```

### Phase 2: Integrate Async Path in DecodeToSurface
**File**: `MediaCodecAV1Decoder.cpp` (DecodeToSurface method)

**Add Mode Selection**:
```cpp
bool MediaCodecAV1Decoder::DecodeToSurface(
    const uint8_t* packet_data,
    size_t packet_size,
    VavCoreSurfaceType target_type,
    void* target_surface,
    VideoFrame& output_frame) {

    // The mode is fixed per codec at initialization: once the async callback is
    // registered, AMediaCodec_dequeueInputBuffer()/AMediaCodec_dequeueOutputBuffer()
    // must not be called on that codec.
    if (m_async_handler && m_async_handler->IsAsyncModeEnabled()) {
        // Decoding is pipelined: this packet's picture may arrive several calls
        // later, so DecodeFrameAsync() must be able to report "no frame yet".
        return DecodeFrameAsync(packet_data, packet_size, output_frame);
    }

    // Sync mode (current implementation), used when async mode was never enabled
    if (!ProcessInputBuffer(packet_data, packet_size)) {
        return false;
    }
    return ProcessOutputBuffer(output_frame);
}
```

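In async mode an input buffer index only becomes usable after `onAsyncInputAvailable` delivers it, and a submitted packet usually yields its picture a few packets later. One possible shape for `DecodeFrameAsync()` is therefore: take a free input index if there is one, submit the packet, then return a frame only if one is already queued. The sketch below models that bookkeeping without the NDK; `AsyncDecodeState`, `DecodeResult`, and the `submit` hook are hypothetical. A real implementation would call `AMediaCodec_getInputBuffer()`/`AMediaCodec_queueInputBuffer()` where the hook runs, and would guard both queues with a mutex because the callbacks arrive on the codec's own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>

enum class DecodeResult { kFrameReady, kNeedMoreInput, kInputFull };

// Hypothetical single-threaded model of the async bookkeeping.
// (Real code must lock: callbacks fire on the codec's internal thread.)
struct AsyncDecodeState {
    std::deque<int32_t> free_input_indices;  // filled by onAsyncInputAvailable
    std::deque<int64_t> ready_output_pts;    // filled by onAsyncOutputAvailable

    // Stand-in for getInputBuffer + copy + queueInputBuffer.
    std::function<void(int32_t index, const uint8_t* data, size_t size)> submit;

    DecodeResult DecodeFrameAsync(const uint8_t* data, size_t size,
                                  std::optional<int64_t>& out_pts) {
        out_pts.reset();
        if (free_input_indices.empty()) {
            // No input slot yet: the caller should retry this packet later.
            return DecodeResult::kInputFull;
        }
        int32_t index = free_input_indices.front();
        free_input_indices.pop_front();
        if (submit) {
            submit(index, data, size);
        }

        if (ready_output_pts.empty()) {
            return DecodeResult::kNeedMoreInput;  // decoder is still pipelining
        }
        out_pts = ready_output_pts.front();
        ready_output_pts.pop_front();
        return DecodeResult::kFrameReady;
    }
};
```

Returning a distinct "need more input" result, rather than `false`, lets the caller keep feeding packets without treating pipeline latency as a decode error.
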
### Phase 3: Add Multi-Instance Detection
**File**: `MediaCodecAV1Decoder.cpp` (Initialize method)

**Auto-Enable Async for Multi-Instance**:
```cpp
bool MediaCodecAV1Decoder::Initialize(const VideoMetadata& metadata) {
    // ... existing initialization ...

    // Enable async mode for high-resolution or multi-instance scenarios.
    // - The async callback must be registered before the codec is started;
    //   AMediaCodec_setAsyncNotifyCallback() is available from API 28.
    // - ShouldEnableAsyncMode() reads m_width/m_height, so they must already
    //   be set from metadata at this point.
    if (metadata.width >= 3840 || ShouldEnableAsyncMode()) {
        if (m_async_handler->EnableAsyncMode(true)) {
            LogInfo("Async decoding enabled");
        } else {
            LogInfo("Async decoding unavailable, using sync mode");
        }
    }

    return FinalizeInitialization();
}
```

### Phase 4: Testing
**Test Cases**:
1. Single 4K video playback (async vs sync benchmark)
2. 4x 4K videos simultaneously (target 28-30fps on all instances)
3. Memory bandwidth monitoring (performance metrics logged via adb logcat)
4. Thread contention analysis (systrace / Perfetto)

---

## API Design

### User-Facing Configuration
```cpp
// VavCore C API addition (optional)
VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable);
VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder);
```

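Because the async callback has to be registered before a codec starts (the Java `MediaCodec.setCallback()` documentation asks for it before `configure()`), a runtime toggle cannot switch a decoder that is already running. One way to keep the proposed signatures is to treat the flag as a preference applied at the next initialization. In this minimal sketch the `VavCoreDecoder` fields, `InitializeDecoderSketch`, and its `async_supported` parameter are hypothetical, and `VAVCORE_API` is defined empty so it compiles standalone.

```cpp
#include <cassert>

#define VAVCORE_API  // empty for this standalone sketch

// Hypothetical decoder state; the real VavCoreDecoder is opaque to callers.
struct VavCoreDecoder {
    bool async_requested = false;  // set by the C API at any time
    bool async_active = false;     // fixed when the codec is (re)initialized
};

extern "C" {

VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable) {
    if (decoder) {
        decoder->async_requested = enable;  // takes effect at the next initialization
    }
}

VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder) {
    return decoder != nullptr && decoder->async_active;
}

}  // extern "C"

// Hypothetical initialization hook: decides the mode once, before the codec starts.
// `async_supported` models e.g. API level >= 28 and a successful callback registration.
void InitializeDecoderSketch(VavCoreDecoder* decoder, bool async_supported) {
    decoder->async_active = decoder->async_requested && async_supported;
}
```

With this split, `vavcore_is_async_enabled()` reports the mode actually in use, not merely what was requested.
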
### Internal Auto-Detection
```cpp
// Auto-enable async for:
// 1. Resolution >= 4K (3840x2160)
// 2. Multiple decoder instances detected (not yet checked below)
// 3. High-end SoC (Snapdragon 8 Gen 3, Exynos 2400)

bool MediaCodecAV1Decoder::ShouldEnableAsyncMode() const {
    // Check resolution
    if (m_width >= 3840 && m_height >= 2160) {
        return true;
    }

    // Check device capability (Samsung Galaxy S24, etc.)
    std::string soc = GetSoCName();
    if (soc.find("SM8650") != std::string::npos ||  // Snapdragon 8 Gen 3
        soc.find("Exynos2400") != std::string::npos) {
        return true;
    }

    return false;
}
```

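`ShouldEnableAsyncMode()` above checks resolution and SoC but not criterion 2 (multiple instances). A possible way to cover all three is a process-wide count of live decoders plus a pure decision function that is easy to unit test. `g_active_decoders`, `DecoderInstanceGuard`, and `DecideAsyncMode` are hypothetical names. Comparing pixel counts is a design choice so that portrait 4K (2160×3840) also qualifies, and the SoC strings are examples only: the identifier a device reports (for example through the `ro.soc.model` property on newer Android releases) should be confirmed on real hardware.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical process-wide count of live decoder instances.
std::atomic<int> g_active_decoders{0};

// RAII guard: one per decoder, incrementing on creation and decrementing on destruction.
class DecoderInstanceGuard {
public:
    DecoderInstanceGuard() { g_active_decoders.fetch_add(1); }
    ~DecoderInstanceGuard() { g_active_decoders.fetch_sub(1); }
    DecoderInstanceGuard(const DecoderInstanceGuard&) = delete;
    DecoderInstanceGuard& operator=(const DecoderInstanceGuard&) = delete;
};

// Pure decision function covering the three criteria above.
bool DecideAsyncMode(int width, int height, int active_decoders, const std::string& soc) {
    const long long pixels = static_cast<long long>(width) * height;
    if (pixels >= 3840LL * 2160LL) {
        return true;  // 1. 4K or larger, either orientation
    }
    if (active_decoders > 1) {
        return true;  // 2. several decoders share the hardware
    }
    // 3. Known high-end SoCs (example identifiers only)
    return soc.find("SM8650") != std::string::npos ||  // Snapdragon 8 Gen 3
           soc.find("Exynos2400") != std::string::npos;
}
```
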
---

## Performance Expectations

### Baseline (Current Sync Mode)
- Single 4K: 30fps ✅
- 4x 4K: 20-25fps ⚠️ (frame drops, stuttering)

### Target (Async Mode)
- Single 4K: 30fps ✅ (same performance)
- 4x 4K: 28-30fps ✅ (smooth playback)
- CPU utilization: higher, estimated 70-80% vs. 40-50% in sync mode
- Thread blocking: ~80% less time spent blocked

All figures in this section are estimates, not measurements.

### Hardware Requirements
- **Minimum**: Android 9 (API 28), the first API level that provides `AMediaCodec_setAsyncNotifyCallback`, built with NDK 26; devices on Android 8.x (API 26-27) stay on sync mode
- **Optimal**: Snapdragon 8 Gen 2+ or Exynos 2300+
- **Memory**: Sufficient bandwidth for ~1.4GiB/s of decoded output (4x 4K)

---

## Risk Analysis

### Low Risk
- ✅ MediaCodecAsyncHandler already implemented
- ✅ No NDK version upgrade required (stays at NDK 26)
- ✅ Keyframe detection not needed (WebM provides it)

### Medium Risk
- ⚠️ Thread synchronization complexity (mitigated by existing queue implementation)
- ⚠️ Memory bandwidth saturation on mid-range devices

### Mitigation Strategies
1. **Fallback to Sync**: If async initialization fails, or the device is below API 28, use sync mode
2. **Progressive Rollout**: Enable async only for high-end devices initially
3. **Performance Monitoring**: Add metrics to detect frame drops

---

## References

### Implementation Files
- **MediaCodecAsyncHandler.h/.cpp**: Async callback management
- **MediaCodecAV1Decoder.h/.cpp**: Main decoder integration
- **MediaCodecSurfaceManager.h/.cpp**: ImageReader/AHardwareBuffer handling

### Android Documentation
- [MediaCodec Asynchronous Processing](https://developer.android.com/reference/android/media/MediaCodec#asynchronous-processing-using-buffers)
- [AMediaCodec_setAsyncNotifyCallback](https://developer.android.com/ndk/reference/group/media#amediacodec_setasyncnotifycallback)

### Performance Analysis
- NVDEC async decoding (Windows reference): PollingThread pattern
- Expected gain: 1-3ms per frame (not measured, theoretical from pipelining)

---

## Conclusion

**Recommendation**: Implement async decoding for 4x simultaneous 4K playback use case.

**Expected Outcome**:
- Significant performance improvement for multi-instance scenarios
- Minimal risk (infrastructure already exists)
- Better resource utilization on high-end devices

**Next Steps**:
1. Complete `ProcessAsyncOutputFrame()` implementation (Phase 1)
2. Integrate async path in `DecodeToSurface()` (Phase 2)
3. Add auto-detection logic (Phase 3)
4. Test with 4x 4K videos (Phase 4)

---

*Document created by Claude Code*
*Last updated: 2025-10-12*