MediaCodec Asynchronous Decoding Design
Document Information
- Created: 2025-10-12
- Status: Implementation Required
- Target Platform: Android (NDK 26)
- Use Case: Simultaneous 4K video playback (4 instances)
Problem Statement
Current Implementation (Synchronous Mode)
// MediaCodecAV1Decoder::DecodeToSurface (current)
bool ProcessInputBuffer(const uint8_t* data, size_t size) {
    // Timeout is in microseconds: blocks for up to 10ms
    ssize_t index = AMediaCodec_dequeueInputBuffer(m_codec, 10000);
    // ... copy data ...
    AMediaCodec_queueInputBuffer(...);
}
bool ProcessOutputBuffer(VideoFrame& frame) {
    AMediaCodecBufferInfo info;
    // Timeout is in microseconds: blocks for up to 10ms
    ssize_t index = AMediaCodec_dequeueOutputBuffer(m_codec, &info, 10000);
    // ... process frame ...
}
Bottleneck for 4x Simultaneous 4K Playback:
- Each decoder thread blocks 10-20ms per frame on dequeue operations
- 4 threads × 10-20ms blocking = significant CPU idle time
- Thread contention increases frame drop probability
- Poor CPU utilization during blocking periods
Performance Impact (Estimated)
| Scenario | Sync Mode | Async Mode |
|---|---|---|
| Single 4K video | 30fps ✅ | 30fps ✅ |
| 4x 4K videos | 20-25fps ⚠️ | 28-30fps ✅ |
| CPU utilization | 40-50% (blocking) | 70-80% (event-driven) |
| Thread blocking | 10-20ms/frame | 0ms (callback) |
Asynchronous Mode Benefits
1. Reduced Thread Blocking
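// Async mode: input buffer indices arrive via the input callback, so there is no dequeue wait
AMediaCodec_queueInputBuffer(...); // Returns immediately
// Output handled by callback (separate thread)
void OnAsyncOutputAvailable(AMediaCodec* codec, void* userdata,
                            int32_t index, AMediaCodecBufferInfo* bufferInfo) {
    // Process frame in callback thread
    // Push to queue for main thread consumption
}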
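The NDK delivers async events through plain C function pointers that carry a `void* userdata`, so a C++ handler typically registers a static trampoline that recovers the object and forwards the event. The following is a minimal sketch of that pattern only. `FakeBufferInfo`, `OutputCallback`, `AsyncHandlerSketch`, and `SimulateCodecOutput` are hypothetical stand-ins so the example compiles without NDK headers; in the real code the callbacks would be registered through `AMediaCodec_setAsyncNotifyCallback`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for AMediaCodecBufferInfo (hypothetical, for illustration only).
struct FakeBufferInfo {
    int32_t offset;
    int32_t size;
    int64_t presentationTimeUs;
    uint32_t flags;
};

// Stand-in for a C-style output callback: no captures, only a void* context.
using OutputCallback = void (*)(void* userdata, int32_t index, FakeBufferInfo* info);

class AsyncHandlerSketch {
public:
    // Static trampoline: recovers the handler from userdata and forwards the event.
    static void OnOutputTrampoline(void* userdata, int32_t index, FakeBufferInfo* info) {
        assert(userdata != nullptr);  // registration must pass the handler pointer
        auto* self = static_cast<AsyncHandlerSketch*>(userdata);
        if (info != nullptr) {
            self->HandleOutput(index, *info);
        }
    }

    const std::vector<int64_t>& ReceivedTimestamps() const { return m_timestamps; }

private:
    void HandleOutput(int32_t index, const FakeBufferInfo& info) {
        if (index < 0) return;  // defensive: ignore invalid buffer indices
        m_timestamps.push_back(info.presentationTimeUs);
    }

    std::vector<int64_t> m_timestamps;
};

// Simulates the codec invoking the registered callback from its own thread.
inline void SimulateCodecOutput(OutputCallback cb, void* userdata,
                                int32_t index, int64_t pts_us) {
    FakeBufferInfo info{0, 0, pts_us, 0};
    cb(userdata, index, &info);
}
```

Because the trampoline runs on a codec-owned thread, the member it forwards to should do as little work as possible and hand the result to the consumer through a synchronized queue.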
2. Better CPU Utilization
- Sync mode: Thread sleeps during dequeue operations
- Async mode: Callbacks notify when frames ready, threads can do other work
3. Improved Pipeline Efficiency
Sync Mode:
Thread 1: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 2: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 3: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 4: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Total Blocking: 40ms per frame cycle (summed across the 4 threads)
Async Mode:
Thread 1: [Queue Input] → [Continue]
Thread 2: [Queue Input] → [Continue]
Thread 3: [Queue Input] → [Continue]
Thread 4: [Queue Input] → [Continue]
Callback Threads: [Process outputs concurrently]
Total Blocking: 0ms
4. Memory Bandwidth Optimization
- Decoded 4K frame: ~12MB (3840×2160, 8-bit YUV420 at 1.5 bytes per pixel)
- 4x instances: ~48-50MB per frame interval × 30fps ≈ 1.4-1.5GB/s of decoder output bandwidth
- Async mode allows better bandwidth scheduling by hardware
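The bandwidth figures above can be reproduced with a few lines of arithmetic. This sketch assumes 8-bit YUV 4:2:0 output and counts only the bytes the decoders write (reads by the renderer and 10-bit content would add to it). The function names are illustrative and not part of VavCore.

```cpp
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus two
// quarter-resolution chroma planes, which comes to 1.5 bytes per pixel.
constexpr uint64_t Yuv420FrameBytes(uint64_t width, uint64_t height) {
    return width * height * 3 / 2;
}

// Bytes written per second when `instances` decoders each emit `fps` frames.
constexpr uint64_t DecodeOutputBytesPerSecond(uint64_t width, uint64_t height,
                                              uint64_t instances, uint64_t fps) {
    return Yuv420FrameBytes(width, height) * instances * fps;
}

// One 4K frame is about 12.4 MB.
static_assert(Yuv420FrameBytes(3840, 2160) == 12'441'600, "4K frame size");
// Four 4K streams at 30 fps write about 1.49 GB/s.
static_assert(DecodeOutputBytesPerSecond(3840, 2160, 4, 30) == 1'492'992'000,
              "4x 4K at 30 fps");
```

The unrounded result is slightly higher than the 1.4GB/s obtained by rounding the frame size down to 12MB first.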
Current Implementation Status
✅ Already Implemented
- MediaCodecAsyncHandler - Complete implementation
  - Location: vav2/platforms/android/vavcore/src/Decoder/MediaCodecAsyncHandler.h/.cpp
  - Async callbacks: onInputBufferAvailable, onAsyncOutputAvailable, onFormatChanged, onError
  - Frame queue management with mutex/condition_variable
  - Thread-safe async frame data structure
- Static Callback Dispatchers
  OnAsyncInputAvailable()
  OnAsyncOutputAvailable()
  OnAsyncFormatChanged()
  OnAsyncError()
- Async Frame Queue
  struct AsyncFrameData {
      std::unique_ptr<VideoFrame> frame;
      int64_t timestamp_us;
      bool is_keyframe; // Placeholder for NDK 26
      std::chrono::steady_clock::time_point decode_start_time;
  };
  std::queue<AsyncFrameData> m_async_output_queue;
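To illustrate the mutex/condition_variable frame queue mentioned in this list, here is a minimal producer/consumer sketch. It is not the existing `MediaCodecAsyncHandler` code: `FrameTicket` and `AsyncFrameQueueSketch` are hypothetical names, and the ticket is a simplified stand-in for `AsyncFrameData` that carries no `VideoFrame`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>

// Simplified stand-in for AsyncFrameData (hypothetical).
struct FrameTicket {
    int64_t timestamp_us;
    int32_t output_index;
};

class AsyncFrameQueueSketch {
public:
    // Called from the codec callback thread.
    void Push(FrameTicket ticket) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(ticket);
        }
        m_cv.notify_one();  // wake one waiting consumer
    }

    // Called from the decoder thread. Waits up to `timeout` for a frame and
    // returns std::nullopt if none arrived in time.
    std::optional<FrameTicket> PopWithTimeout(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_queue.empty(); })) {
            return std::nullopt;
        }
        assert(!m_queue.empty());
        FrameTicket ticket = m_queue.front();
        m_queue.pop();
        return ticket;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<FrameTicket> m_queue;
};
```

The bounded wait in `PopWithTimeout` keeps a stalled codec from hanging the consumer indefinitely, and frames come out in the order the callbacks pushed them.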
❌ Missing Implementation
- DecodeToSurface does not use async path
  - Current: Calls ProcessInputBuffer() → ProcessOutputBuffer() (sync)
  - Required: Call DecodeFrameAsync() when async mode enabled
- ProcessAsyncOutputFrame incomplete
  - Current: Placeholder implementation (line 236-256 in MediaCodecAsyncHandler.cpp)
  - Required: Proper frame processing for Vulkan/ImageReader pipeline
- Async Mode Activation
  - Current: InitializeAsyncMode() called but not actually used
  - Required: Enable async mode for multi-instance scenarios
Implementation Plan
Phase 1: Complete ProcessAsyncOutputFrame (High Priority)
File: MediaCodecAsyncHandler.cpp:236-256
Current (Incomplete):
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {
    // TODO: Process output buffer and fill VideoFrame
    // For now, just release the buffer
    AMediaCodec_releaseOutputBuffer(m_codec, output_index, false);
    return true;
}
Required Implementation:
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {
    if (!m_codec || output_index < 0 || !buffer_info) {
        return false;
    }
    // Step 1: Fill VideoFrame metadata
    output_frame.timestamp_us = buffer_info->presentationTimeUs;
    output_frame.is_keyframe = false; // NDK 26 limitation
    output_frame.surface_type = VAVCORE_SURFACE_ANDROID_HARDWARE_BUFFER;
    // Step 2: Release the MediaCodec buffer with render=true so the frame is
    // sent to the ImageReader surface. With Surface output the buffer contents
    // are not CPU-accessible, so AMediaCodec_getOutputBuffer is not called.
    media_status_t status = AMediaCodec_releaseOutputBuffer(
        m_codec, static_cast<size_t>(output_index), true);
    if (status != AMEDIA_OK) {
        LogError("Failed to render output buffer to ImageReader surface");
        return false;
    }
    // Step 3: Acquire AHardwareBuffer from ImageReader
    // (delegated to MediaCodecSurfaceManager). The image only becomes available
    // after the release above and arrives asynchronously, so
    // AcquireLatestImage() must wait for it or the caller must retry on null.
    AHardwareBuffer* ahb = m_decoder->GetSurfaceManager()->AcquireLatestImage();
    if (!ahb) {
        LogError("Failed to acquire AHardwareBuffer from ImageReader");
        return false;
    }
    // Step 4: Store AHardwareBuffer in VideoFrame
    output_frame.ahardware_buffer = ahb;
    return true;
}
Phase 2: Integrate Async Path in DecodeToSurface
File: MediaCodecAV1Decoder.cpp (DecodeToSurface method)
Add Mode Selection:
bool MediaCodecAV1Decoder::DecodeToSurface(
    const uint8_t* packet_data,
    size_t packet_size,
    VavCoreSurfaceType target_type,
    void* target_surface,
    VideoFrame& output_frame) {
    // Use the async path when a handler exists and async mode is enabled
    if (m_async_handler && m_async_handler->IsAsyncModeEnabled()) {
        return DecodeFrameAsync(packet_data, packet_size, output_frame);
    }
    // Fall back to sync mode (current implementation)
    if (!ProcessInputBuffer(packet_data, packet_size)) {
        return false;
    }
    return ProcessOutputBuffer(output_frame);
}
Phase 3: Add Multi-Instance Detection
File: MediaCodecAV1Decoder.cpp (Initialize method)
Auto-Enable Async for Multi-Instance:
bool MediaCodecAV1Decoder::Initialize(const VideoMetadata& metadata) {
    // ... existing initialization ...
    // Enable async mode for high-resolution or multi-instance scenarios
    if (metadata.width >= 3840 || ShouldEnableAsyncMode()) {
        if (m_async_handler->EnableAsyncMode(true)) {
            LogInfo("Async decoding enabled");
        } else {
            // Sync mode remains the fallback when async setup fails
            LogInfo("Async decoding unavailable, using sync mode");
        }
    }
    return FinalizeInitialization();
}
Phase 4: Testing
Test Cases:
- Single 4K video playback (async vs sync benchmark)
- 4x 4K videos simultaneously (target 28-30fps all instances)
- Memory bandwidth monitoring (adb logcat performance)
- Thread contention analysis (systrace)
API Design
User-Facing Configuration
// VavCore C API addition (optional)
VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable);
VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder);
Internal Auto-Detection
// Auto-enable async for:
// 1. Resolution >= 4K (3840x2160)
// 2. Multiple decoder instances detected
// 3. High-end SoC (Snapdragon 8 Gen 3, Exynos 2400)
bool MediaCodecAV1Decoder::ShouldEnableAsyncMode() const {
    // Check resolution
    if (m_width >= 3840 && m_height >= 2160) {
        return true;
    }
    // TODO: the instance-count check (item 2) is not implemented here yet
    // Check device capability (Samsung Galaxy S24, etc.)
    std::string soc = GetSoCName();
    if (soc.find("SM8650") != std::string::npos || // Snapdragon 8 Gen 3
        soc.find("Exynos2400") != std::string::npos) {
        return true;
    }
    return false;
}
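The auto-detection list names "multiple decoder instances detected" as a trigger, but the function only checks resolution and SoC. One possible way to supply that signal is a process-wide counter maintained by an RAII token that each decoder holds. This is a sketch under that assumption: `DecoderInstanceToken` and `ShouldEnableAsyncSketch` are hypothetical names, and `is_high_end_soc` stands in for the `GetSoCName()` lookup.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical RAII token: each decoder would hold one for its lifetime.
class DecoderInstanceToken {
public:
    DecoderInstanceToken() { s_live_count.fetch_add(1, std::memory_order_relaxed); }
    ~DecoderInstanceToken() {
        const int previous = s_live_count.fetch_sub(1, std::memory_order_relaxed);
        assert(previous > 0);  // the counter must never go negative
        (void)previous;
    }
    DecoderInstanceToken(const DecoderInstanceToken&) = delete;
    DecoderInstanceToken& operator=(const DecoderInstanceToken&) = delete;

    // Number of tokens (decoders) currently alive in this process.
    static int LiveCount() { return s_live_count.load(std::memory_order_relaxed); }

private:
    static inline std::atomic<int> s_live_count{0};
};

// Mirrors the three documented triggers: resolution, instance count, SoC class.
inline bool ShouldEnableAsyncSketch(int width, int height, bool is_high_end_soc) {
    if (width >= 3840 && height >= 2160) return true;
    if (DecoderInstanceToken::LiveCount() > 1) return true;
    return is_high_end_soc;
}
```

Since the decision is made during initialization, a decoder created while it is the only instance would stay in sync mode even if more decoders appear later. Whether that is acceptable depends on how the four players are created in practice.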
Performance Expectations
Baseline (Current Sync Mode)
- Single 4K: 30fps ✅
- 4x 4K: 20-25fps ⚠️ (frame drops, stuttering)
Target (Async Mode)
- Single 4K: 30fps ✅ (same performance)
- 4x 4K: 28-30fps ✅ (smooth playback)
- CPU utilization: +20-30% (estimated)
- Thread blocking: 80% reduction (estimated)
Hardware Requirements
- Minimum: Android 9 (API 28), where AMediaCodec_setAsyncNotifyCallback was introduced; built with NDK 26
- Optimal: Snapdragon 8 Gen 2+ or Exynos 2300+
- Memory: Sufficient bandwidth for 1.4GB/s (4x 4K)
Risk Analysis
Low Risk
- ✅ MediaCodecAsyncHandler already implemented
- ✅ No NDK version upgrade required (stays at NDK 26)
- ✅ Keyframe detection not needed (WebM provides it)
Medium Risk
- ⚠️ Thread synchronization complexity (mitigated by existing queue implementation)
- ⚠️ Memory bandwidth saturation on mid-range devices
Mitigation Strategies
- Fallback to Sync: If async initialization fails, use sync mode
- Progressive Rollout: Enable async only for high-end devices initially
- Performance Monitoring: Add metrics to detect frame drops
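As one possible shape for the frame-drop metric, the sketch below infers drops from gaps between consecutive presentation timestamps. It is an assumption about how monitoring could work, not existing VavCore code: `FrameDropMonitor` is a hypothetical class, and the 50% tolerance is an arbitrary illustrative threshold.

```cpp
#include <cassert>
#include <cstdint>

// Counts a drop whenever the gap between consecutive presentation timestamps
// exceeds the nominal frame interval by more than 50%.
class FrameDropMonitor {
public:
    explicit FrameDropMonitor(int64_t expected_interval_us)
        : m_expected_interval_us(expected_interval_us) {
        assert(expected_interval_us > 0);
    }

    void OnFramePresented(int64_t pts_us) {
        if (m_has_last) {
            const int64_t gap = pts_us - m_last_pts_us;
            if (gap > m_expected_interval_us * 3 / 2) {
                // Round the gap to whole frame slots; all but one were skipped.
                m_dropped += (gap + m_expected_interval_us / 2) / m_expected_interval_us - 1;
            }
        }
        m_last_pts_us = pts_us;
        m_has_last = true;
        ++m_presented;
    }

    int64_t presented() const { return m_presented; }
    int64_t dropped() const { return m_dropped; }

private:
    int64_t m_expected_interval_us;
    int64_t m_last_pts_us = 0;
    int64_t m_presented = 0;
    int64_t m_dropped = 0;
    bool m_has_last = false;
};
```

A per-instance monitor like this could feed the fallback decision, for example by switching a decoder back to sync mode or lowering the instance count when the drop ratio stays high.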
References
Implementation Files
- MediaCodecAsyncHandler.h/.cpp: Async callback management
- MediaCodecAV1Decoder.h/.cpp: Main decoder integration
- MediaCodecSurfaceManager.h/.cpp: ImageReader/AHardwareBuffer handling
Android Documentation
Performance Analysis
- NVDEC async decoding (Windows reference): PollingThread pattern
- Expected gain: 1-3ms per frame (not measured, theoretical from pipelining)
Conclusion
Recommendation: Implement async decoding for 4x simultaneous 4K playback use case.
Expected Outcome:
- Significant performance improvement for multi-instance scenarios
- Minimal risk (infrastructure already exists)
- Better resource utilization on high-end devices
Next Steps:
- Complete ProcessAsyncOutputFrame() implementation (Phase 1)
- Integrate async path in DecodeToSurface() (Phase 2)
- Add auto-detection logic (Phase 3)
- Test with 4x 4K videos (Phase 4)
Document created by Claude Code
Last updated: 2025-10-12