# MediaCodec Asynchronous Decoding Design

## Document Information
- **Created**: 2025-10-12
- **Status**: Implementation Required
- **Target Platform**: Android (NDK 26)
- **Use Case**: Simultaneous 4K video playback (4 instances)

---

## Problem Statement

### Current Implementation (Synchronous Mode)
```cpp
// MediaCodecAV1Decoder::DecodeToSurface (current)
bool ProcessInputBuffer(const uint8_t* data, size_t size) {
    // Blocks for up to 10ms waiting for a free input buffer
    ssize_t index = AMediaCodec_dequeueInputBuffer(m_codec, 10000);
    // ... copy data ...
    AMediaCodec_queueInputBuffer(...);
}

bool ProcessOutputBuffer(VideoFrame& frame) {
    AMediaCodecBufferInfo info;
    // Blocks for up to 10ms waiting for a decoded frame
    ssize_t index = AMediaCodec_dequeueOutputBuffer(m_codec, &info, 10000);
    // ... process frame ...
}
```

**Bottleneck for 4x Simultaneous 4K Playback:**
- Each decoder thread blocks 10-20ms per frame on dequeue operations
- 4 threads × 10-20ms blocking = 40-80ms of combined idle wait per frame cycle
- Thread contention increases frame drop probability
- Poor CPU utilization during blocking periods

### Performance Impact (Estimated)
| Scenario | Sync Mode | Async Mode |
|----------|-----------|------------|
| Single 4K video | 30fps ✅ | 30fps ✅ |
| 4x 4K videos | 20-25fps ⚠️ | 28-30fps ✅ |
| CPU utilization | 40-50% (blocking) | 70-80% (event-driven) |
| Thread blocking | 10-20ms/frame | 0ms (callback) |

---

## Asynchronous Mode Benefits

### 1. Reduced Thread Blocking
```cpp
// Async mode: non-blocking input. The buffer index is delivered by the
// input-available callback instead of a blocking dequeue call.
AMediaCodec_queueInputBuffer(...);  // Returns immediately

// Output handled by callback (separate thread)
onAsyncOutputAvailable(index, bufferInfo) {
    // Process frame in callback thread
    // Push to queue for main thread consumption
}
```
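
The callback in the snippet above is a plain C function pointer, so it needs some way to reach the handler object that owns the frame queue. The NDK async callbacks receive a `void* userdata` argument for this purpose. The sketch below shows the usual static-trampoline pattern under simplified assumptions: `OutputAvailableFn` and `AsyncHandlerSketch` are hypothetical stand-ins, not the actual NDK callback type or the project's `MediaCodecAsyncHandler`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a C-style callback that carries an opaque
// userdata pointer; the real NDK callback signatures differ in detail.
using OutputAvailableFn = void (*)(void* userdata, std::int32_t index,
                                   std::int64_t presentation_time_us);

class AsyncHandlerSketch {
public:
    // Static trampoline: it has no `this`, so it recovers the instance
    // from userdata and forwards to a member function.
    static void OnOutputAvailable(void* userdata, std::int32_t index,
                                  std::int64_t presentation_time_us) {
        auto* self = static_cast<AsyncHandlerSketch*>(userdata);
        if (self != nullptr) {
            self->HandleOutput(index, presentation_time_us);
        }
    }

    const std::vector<std::int32_t>& ReceivedIndices() const { return m_indices; }
    std::int64_t LastTimestampUs() const { return m_last_timestamp_us; }

private:
    // A real handler runs this on the codec's callback thread, so shared
    // state would need a mutex; that is omitted to keep the sketch short.
    void HandleOutput(std::int32_t index, std::int64_t presentation_time_us) {
        m_indices.push_back(index);
        m_last_timestamp_us = presentation_time_us;
    }

    std::vector<std::int32_t> m_indices;
    std::int64_t m_last_timestamp_us = -1;
};
```

When registering the callback, the handler instance would be passed as `userdata`. Its lifetime then has to exceed that of the codec, otherwise a late callback dereferences a dangling pointer.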

### 2. Better CPU Utilization
- **Sync mode**: Thread sleeps during dequeue operations
- **Async mode**: Callbacks notify when frames are ready, so threads can do other work

### 3. Improved Pipeline Efficiency
```
Sync Mode:
Thread 1: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 2: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 3: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 4: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Total Blocking: 40ms per frame cycle (4 threads × 10ms)

Async Mode:
Thread 1: [Queue Input] → [Continue]
Thread 2: [Queue Input] → [Continue]
Thread 3: [Queue Input] → [Continue]
Thread 4: [Queue Input] → [Continue]
Callback Threads: [Process outputs concurrently]
Total Blocking: 0ms
```

### 4. Memory Bandwidth Optimization
- 4K AV1 frame: ~12MB (3840×2160, 8-bit YUV420 at 1.5 bytes per pixel)
- 4x instances: 48MB per frame interval × 30fps = **~1.4GB/s bandwidth**
- Async mode avoids stalling the decode threads, which may let the hardware schedule this bandwidth more evenly (not measured)
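
The bandwidth figures above follow from simple arithmetic, which the sketch below reproduces. It assumes 8-bit YUV 4:2:0 with no padding or vendor-specific tiling or compression, so real traffic on a given device may differ. The function names are illustrative and not part of VavCore. With exact values the result is about 12.4MB per frame and about 1.49GB/s, slightly above the rounded numbers in the list.

```cpp
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus two
// chroma planes subsampled 2x2, which comes to 1.5 bytes per pixel.
constexpr std::uint64_t Yuv420FrameBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Sustained decoder output bandwidth in bytes per second.
constexpr std::uint64_t DecodeBandwidthBytesPerSec(std::uint64_t width,
                                                   std::uint64_t height,
                                                   std::uint64_t instances,
                                                   std::uint64_t fps) {
    return Yuv420FrameBytes(width, height) * instances * fps;
}

// One 4K frame: 3840 * 2160 * 1.5 = 12,441,600 bytes.
static_assert(Yuv420FrameBytes(3840, 2160) == 12441600ULL,
              "unexpected 4K frame size");
// Four streams at 30fps: 12,441,600 * 4 * 30 = 1,492,992,000 bytes/s.
static_assert(DecodeBandwidthBytesPerSec(3840, 2160, 4, 30) == 1492992000ULL,
              "unexpected aggregate bandwidth");
```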

---

## Current Implementation Status

### ✅ Already Implemented
1. **MediaCodecAsyncHandler** - Callback and queue infrastructure in place (`ProcessAsyncOutputFrame` is still a placeholder, see Missing Implementation)
   - Location: `vav2/platforms/android/vavcore/src/Decoder/MediaCodecAsyncHandler.h/.cpp`
   - Async callbacks: `onInputBufferAvailable`, `onAsyncOutputAvailable`, `onFormatChanged`, `onError`
   - Frame queue management with mutex/condition_variable
   - Thread-safe async frame data structure

2. **Static Callback Dispatchers**
   ```cpp
   OnAsyncInputAvailable()
   OnAsyncOutputAvailable()
   OnAsyncFormatChanged()
   OnAsyncError()
   ```

3. **Async Frame Queue**
   ```cpp
   struct AsyncFrameData {
       std::unique_ptr<VideoFrame> frame;
       int64_t timestamp_us;
       bool is_keyframe;  // Placeholder for NDK 26
       std::chrono::steady_clock::time_point decode_start_time;
   };
   std::queue<AsyncFrameData> m_async_output_queue;
   ```
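
The queue above is described as being guarded by a mutex and condition variable. A minimal sketch of that producer/consumer hand-off is shown below, assuming the callback thread pushes and a decoder thread pops with a timeout. `FrameSketch` and `AsyncFrameQueueSketch` are hypothetical simplifications. The real `AsyncFrameData` owns a `VideoFrame`, and the existing handler may be structured differently.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>

// Simplified stand-in for AsyncFrameData.
struct FrameSketch {
    std::int64_t timestamp_us;
};

class AsyncFrameQueueSketch {
public:
    // Producer side: called from the codec callback thread.
    void Push(FrameSketch frame) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(frame));
        }
        // Notify after unlocking so the woken consumer can take the lock at once.
        m_cv.notify_one();
    }

    // Consumer side: waits up to `timeout` for a frame.
    // Returns false if no frame arrived in time; `out` is left untouched.
    bool PopWithTimeout(std::chrono::milliseconds timeout, FrameSketch& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        // The predicate guards against spurious wake-ups.
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_queue.empty(); })) {
            return false;
        }
        out = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<FrameSketch> m_queue;
};
```

A bounded timeout keeps the consumer from hanging forever if the codec stalls or reports an error instead of producing a frame.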

### ❌ Missing Implementation
1. **DecodeToSurface** does not use async path
   - Current: Calls `ProcessInputBuffer()` → `ProcessOutputBuffer()` (sync)
   - Required: Call `DecodeFrameAsync()` when async mode is enabled

2. **ProcessAsyncOutputFrame** incomplete
   - Current: Placeholder implementation (lines 236-256 in MediaCodecAsyncHandler.cpp)
   - Required: Proper frame processing for Vulkan/ImageReader pipeline

3. **Async Mode Activation**
   - Current: `InitializeAsyncMode()` is called, but the async path is never used
   - Required: Enable async mode for multi-instance scenarios

---

## Implementation Plan

### Phase 1: Complete ProcessAsyncOutputFrame (High Priority)
**File**: `MediaCodecAsyncHandler.cpp:236-256`

**Current (Incomplete)**:
```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    // TODO: Process output buffer and fill VideoFrame
    // For now, just release the buffer

    AMediaCodec_releaseOutputBuffer(m_codec, output_index, false);
    return true;
}
```

**Required Implementation**:
```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    if (!m_codec || output_index < 0 || !buffer_info) {
        return false;
    }

    // Step 1: Render the decoded buffer to the ImageReader surface.
    // In surface mode the output buffer is not expected to be CPU-readable,
    // so AMediaCodec_getOutputBuffer() is not used here. Rendering must
    // happen before acquisition; otherwise the ImageReader does not yet
    // hold this frame.
    media_status_t status =
        AMediaCodec_releaseOutputBuffer(m_codec, output_index, true);  // render=true
    if (status != AMEDIA_OK) {
        LogError("Failed to render output buffer to ImageReader surface");
        return false;
    }

    // Step 2: Fill VideoFrame metadata
    output_frame.timestamp_us = buffer_info->presentationTimeUs;
    output_frame.is_keyframe = false;  // NDK 26 limitation
    output_frame.surface_type = VAVCORE_SURFACE_ANDROID_HARDWARE_BUFFER;

    // Step 3: Acquire AHardwareBuffer from ImageReader
    // Delegate to MediaCodecSurfaceManager. The image arrives asynchronously
    // after rendering, so this call may need to wait or retry.
    AHardwareBuffer* ahb = m_decoder->GetSurfaceManager()->AcquireLatestImage();
    if (!ahb) {
        LogError("Failed to acquire AHardwareBuffer from ImageReader");
        return false;
    }

    // Step 4: Store AHardwareBuffer in VideoFrame
    output_frame.ahardware_buffer = ahb;

    return true;
}
```

### Phase 2: Integrate Async Path in DecodeToSurface
**File**: `MediaCodecAV1Decoder.cpp` (DecodeToSurface method)

**Add Mode Selection**:
```cpp
bool MediaCodecAV1Decoder::DecodeToSurface(
    const uint8_t* packet_data,
    size_t packet_size,
    VavCoreSurfaceType target_type,
    void* target_surface,
    VideoFrame& output_frame) {

    // Use the async path when the handler exists and async mode is enabled
    if (m_async_handler && m_async_handler->IsAsyncModeEnabled()) {
        return DecodeFrameAsync(packet_data, packet_size, output_frame);
    }

    // Fall back to sync mode (current implementation)
    if (!ProcessInputBuffer(packet_data, packet_size)) {
        return false;
    }
    return ProcessOutputBuffer(output_frame);
}
```

### Phase 3: Add Multi-Instance Detection
**File**: `MediaCodecAV1Decoder.cpp` (Initialize method)

**Auto-Enable Async for Multi-Instance**:
```cpp
bool MediaCodecAV1Decoder::Initialize(const VideoMetadata& metadata) {
    // ... existing initialization ...

    // Enable async mode for high-resolution or multi-instance scenarios
    if (metadata.width >= 3840 || ShouldEnableAsyncMode()) {
        if (m_async_handler->EnableAsyncMode(true)) {
            LogInfo("Async decoding enabled");
        } else {
            // Sync mode remains active if async setup fails
            LogInfo("Async mode unavailable, using sync decoding");
        }
    }

    return FinalizeInitialization();
}
```

### Phase 4: Testing
**Test Cases**:
1. Single 4K video playback (async vs sync benchmark)
2. 4x 4K videos simultaneously (target 28-30fps on all instances)
3. Memory bandwidth monitoring (performance logs via adb logcat)
4. Thread contention analysis (systrace)

---

## API Design

### User-Facing Configuration
```cpp
// VavCore C API addition (optional)
VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable);
VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder);
```

### Internal Auto-Detection
```cpp
// Auto-enable async for:
// 1. Resolution >= 4K (3840x2160)
// 2. Multiple decoder instances detected
// 3. High-end SoC (Snapdragon 8 Gen 3, Exynos 2400)

bool MediaCodecAV1Decoder::ShouldEnableAsyncMode() const {
    // Check resolution
    if (m_width >= 3840 && m_height >= 2160) {
        return true;
    }

    // Check device capability (Samsung Galaxy S24, etc.)
    std::string soc = GetSoCName();
    if (soc.find("SM8650") != std::string::npos ||  // Snapdragon 8 Gen 3
        soc.find("Exynos2400") != std::string::npos) {
        return true;
    }

    // Criterion 2 (multiple decoder instances) is not checked here yet
    return false;
}
```
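
The auto-detection list names "multiple decoder instances" as a criterion, but the function above checks only resolution and SoC. One possible way to supply that signal is a process-wide counter maintained by an RAII token, sketched below. `DecoderInstanceToken` and `ShouldEnableAsyncSketch` are hypothetical names, and the threshold of two live instances is an assumption rather than a project decision.

```cpp
#include <atomic>

// Hypothetical RAII token: each decoder would hold one for its lifetime so
// the process-wide count of live decoders stays current.
class DecoderInstanceToken {
public:
    DecoderInstanceToken() { s_live.fetch_add(1, std::memory_order_relaxed); }
    ~DecoderInstanceToken() { s_live.fetch_sub(1, std::memory_order_relaxed); }

    // Copying would double-count, so it is disabled.
    DecoderInstanceToken(const DecoderInstanceToken&) = delete;
    DecoderInstanceToken& operator=(const DecoderInstanceToken&) = delete;

    static int LiveCount() { return s_live.load(std::memory_order_relaxed); }

private:
    static std::atomic<int> s_live;
};

std::atomic<int> DecoderInstanceToken::s_live{0};

// Combines the resolution and instance-count criteria.
inline bool ShouldEnableAsyncSketch(int width, int height, int live_instances) {
    const bool is_4k = width >= 3840 && height >= 2160;
    const bool multi_instance = live_instances >= 2;  // assumed threshold
    return is_4k || multi_instance;
}
```

A limitation of this approach is ordering: the first decoder is created while the count is still one, so it would start in sync mode unless it is 4K or is reconfigured when a second instance appears.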

---

## Performance Expectations

### Baseline (Current Sync Mode)
- Single 4K: 30fps ✅
- 4x 4K: 20-25fps ⚠️ (frame drops, stuttering)

### Target (Async Mode)
- Single 4K: 30fps ✅ (same performance)
- 4x 4K: 28-30fps ✅ (smooth playback)
- CPU utilization: +20-30 percentage points (estimated)
- Thread blocking: ~80% reduction

### Hardware Requirements
- **Minimum**: Android 9.0 (API 28) with NDK 26; `AMediaCodec_setAsyncNotifyCallback` is available from API 28
- **Optimal**: Snapdragon 8 Gen 2+ or Exynos 2300+
- **Memory**: Sufficient bandwidth for 1.4GB/s (4x 4K)

---

## Risk Analysis

### Low Risk
- ✅ MediaCodecAsyncHandler already implemented
- ✅ No NDK version upgrade required (stays at NDK 26)
- ✅ Keyframe detection not needed from MediaCodec (the WebM container provides it)

### Medium Risk
- ⚠️ Thread synchronization complexity (mitigated by existing queue implementation)
- ⚠️ Memory bandwidth saturation on mid-range devices

### Mitigation Strategies
1. **Fallback to Sync**: If async initialization fails, use sync mode
2. **Progressive Rollout**: Enable async only for high-end devices initially
3. **Performance Monitoring**: Add metrics to detect frame drops
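
For the performance-monitoring item, one simple metric is the gap between consecutive presented frames. The sketch below counts a drop whenever that gap exceeds 1.5 times the nominal frame period. Both the class name `FrameDropMonitorSketch` and the 1.5x threshold are assumptions chosen for illustration; the project may prefer a different definition of a dropped frame.

```cpp
#include <cstdint>

// Counts a drop whenever the gap between consecutive presented frames
// exceeds 1.5x the nominal frame period (the factor is an assumption).
class FrameDropMonitorSketch {
public:
    explicit FrameDropMonitorSketch(double target_fps)
        : m_max_gap_us(static_cast<std::int64_t>(1.5 * 1000000.0 / target_fps)) {}

    // Call once per presented frame with a monotonic timestamp in microseconds.
    void OnFramePresented(std::int64_t now_us) {
        if (m_has_last && (now_us - m_last_us) > m_max_gap_us) {
            ++m_drops;
        }
        m_last_us = now_us;
        m_has_last = true;
        ++m_frames;
    }

    std::uint64_t Frames() const { return m_frames; }
    std::uint64_t Drops() const { return m_drops; }

private:
    std::int64_t m_max_gap_us;
    std::int64_t m_last_us = 0;
    bool m_has_last = false;
    std::uint64_t m_frames = 0;
    std::uint64_t m_drops = 0;
};
```

One monitor per decoder instance would make it possible to compare sync and async runs of the 4x 4K scenario, and could also feed a decision to fall back to sync mode on devices where async does not help.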

---

## References

### Implementation Files
- **MediaCodecAsyncHandler.h/.cpp**: Async callback management
- **MediaCodecAV1Decoder.h/.cpp**: Main decoder integration
- **MediaCodecSurfaceManager.h/.cpp**: ImageReader/AHardwareBuffer handling

### Android Documentation
- [MediaCodec Asynchronous Processing](https://developer.android.com/reference/android/media/MediaCodec#asynchronous-processing-using-buffers)
- [AMediaCodec_setAsyncNotifyCallback](https://developer.android.com/ndk/reference/group/media#amediacodec_setasyncnotifycallback)

### Performance Analysis
- NVDEC async decoding (Windows reference): PollingThread pattern
- Expected gain: 1-3ms per frame (not measured, theoretical from pipelining)

---

## Conclusion

**Recommendation**: Implement async decoding for the 4x simultaneous 4K playback use case.

**Expected Outcome**:
- Significant performance improvement for multi-instance scenarios
- Minimal risk (infrastructure already exists)
- Better resource utilization on high-end devices

**Next Steps**:
1. Complete `ProcessAsyncOutputFrame()` implementation (Phase 1)
2. Integrate async path in `DecodeToSurface()` (Phase 2)
3. Add auto-detection logic (Phase 3)
4. Test with 4x 4K videos (Phase 4)

---

*Document created by Claude Code*
*Last updated: 2025-10-12*