# MediaCodec Asynchronous Decoding Design

## Document Information

- **Created**: 2025-10-12
- **Status**: Implementation Required
- **Target Platform**: Android (NDK 26)
- **Use Case**: Simultaneous 4K video playback (4 instances)

---

## Problem Statement

### Current Implementation (Synchronous Mode)

```cpp
// MediaCodecAV1Decoder::DecodeToSurface (current, simplified)
bool ProcessInputBuffer(const uint8_t* data, size_t size) {
    // Blocks for up to 10 ms waiting for a free input buffer
    ssize_t index = AMediaCodec_dequeueInputBuffer(m_codec, 10000);
    // ... copy data ...
    AMediaCodec_queueInputBuffer(...);
}

bool ProcessOutputBuffer(VideoFrame& frame) {
    AMediaCodecBufferInfo info;
    // Blocks for up to 10 ms waiting for a decoded frame
    ssize_t index = AMediaCodec_dequeueOutputBuffer(m_codec, &info, 10000);
    // ... process frame ...
}
```

**Bottleneck for 4x Simultaneous 4K Playback:**

- Each decoder thread can block for up to 10 ms in each dequeue call, i.e. 10-20 ms per frame out of a 33.3 ms frame budget at 30 fps
- Across 4 decoder threads this adds up to 40-80 ms of blocked thread time per frame cycle
- Thread contention increases the probability of dropped frames
- CPU utilization is poor while threads sit in the blocking calls

### Performance Impact (Estimated)

| Scenario | Sync Mode | Async Mode |
|----------|-----------|------------|
| Single 4K video | 30fps ✅ | 30fps ✅ |
| 4x 4K videos | 20-25fps ⚠️ | 28-30fps ✅ |
| CPU utilization | 40-50% (blocking) | 70-80% (event-driven) |
| Thread blocking | 10-20ms/frame | 0ms (callback) |

---

## Asynchronous Mode Benefits

### 1. Reduced Thread Blocking

```cpp
// Async mode (pseudocode): non-blocking input.
// The input buffer index is delivered by the onInputBufferAvailable callback.
AMediaCodec_queueInputBuffer(...);  // Returns immediately

// Output handled by callback (separate thread)
onAsyncOutputAvailable(index, bufferInfo) {
    // Process frame in callback thread
    // Push to queue for main thread consumption
}
```

### 2. Better CPU Utilization

- **Sync mode**: The thread sleeps inside the dequeue calls
- **Async mode**: Callbacks signal when frames are ready, so threads can do other work in the meantime

### 3. Improved Pipeline Efficiency

```
Sync Mode:
Thread 1: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 2: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 3: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 4: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Total Blocking: 40ms per frame cycle (4 threads × 10ms)

Async Mode:
Thread 1: [Queue Input] → [Continue]
Thread 2: [Queue Input] → [Continue]
Thread 3: [Queue Input] → [Continue]
Thread 4: [Queue Input] → [Continue]
Callback Threads: [Process outputs concurrently]
Total Blocking: 0ms
```

### 4. Memory Bandwidth Optimization

- 4K AV1 frame: ~12MB (3840 × 2160 × 1.5 bytes for 8-bit YUV420)
- 4x instances: 48MB per frame interval × 30fps ≈ **1.4GB/s bandwidth**
- Async mode is expected to let the hardware schedule this traffic more evenly, since decoder threads no longer stall between input and output (not measured)

---

## Current Implementation Status

### ✅ Already Implemented

1. **MediaCodecAsyncHandler** - Complete implementation
   - Location: `vav2/platforms/android/vavcore/src/Decoder/MediaCodecAsyncHandler.h/.cpp`
   - Async callbacks: `onInputBufferAvailable`, `onAsyncOutputAvailable`, `onFormatChanged`, `onError`
   - Frame queue management with mutex/condition_variable
   - Thread-safe async frame data structure

2. **Static Callback Dispatchers**

   ```cpp
   OnAsyncInputAvailable()
   OnAsyncOutputAvailable()
   OnAsyncFormatChanged()
   OnAsyncError()
   ```

3. **Async Frame Queue**

   ```cpp
   struct AsyncFrameData {
       std::unique_ptr<VideoFrame> frame;
       int64_t timestamp_us;
       bool is_keyframe;  // Placeholder for NDK 26
       std::chrono::steady_clock::time_point decode_start_time;
   };

   std::queue<AsyncFrameData> m_async_output_queue;
   ```

### ❌ Missing Implementation

1. **DecodeToSurface** does not use the async path
   - Current: Calls `ProcessInputBuffer()` → `ProcessOutputBuffer()` (sync)
   - Required: Call `DecodeFrameAsync()` when async mode is enabled

2. **ProcessAsyncOutputFrame** is incomplete
   - Current: Placeholder implementation (lines 236-256 in MediaCodecAsyncHandler.cpp)
   - Required: Proper frame processing for the Vulkan/ImageReader pipeline

3. **Async Mode Activation**
   - Current: `InitializeAsyncMode()` is called, but the async path is never taken
   - Required: Enable async mode for multi-instance scenarios

---

## Implementation Plan

### Phase 1: Complete ProcessAsyncOutputFrame (High Priority)

**File**: `MediaCodecAsyncHandler.cpp:236-256`

**Current (Incomplete)**:

```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    // TODO: Process output buffer and fill VideoFrame
    // For now, just release the buffer
    AMediaCodec_releaseOutputBuffer(m_codec, output_index, false);
    return true;
}
```

**Required Implementation**:

```cpp
bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    if (!m_codec || output_index < 0 || !buffer_info) {
        return false;
    }

    // Step 1: Fill VideoFrame metadata
    output_frame.timestamp_us = buffer_info->presentationTimeUs;
    output_frame.is_keyframe = false;  // NDK 26 limitation
    output_frame.surface_type = VAVCORE_SURFACE_ANDROID_HARDWARE_BUFFER;

    // Step 2: Render the buffer to the ImageReader surface.
    // This assumes the codec was configured with the ImageReader surface as its
    // output. In that mode the pixel data is not CPU-accessible through
    // AMediaCodec_getOutputBuffer(), and the image only reaches ImageReader
    // after the buffer is released with render=true.
    media_status_t status =
        AMediaCodec_releaseOutputBuffer(m_codec, output_index, true);
    if (status != AMEDIA_OK) {
        LogError("Failed to render output buffer to ImageReader surface");
        return false;
    }

    // Step 3: Acquire AHardwareBuffer from ImageReader
    // Delegate to MediaCodecSurfaceManager. The image may arrive slightly after
    // the release call, so AcquireLatestImage() may need to wait or retry.
    AHardwareBuffer* ahb = m_decoder->GetSurfaceManager()->AcquireLatestImage();
    if (!ahb) {
        LogError("Failed to acquire AHardwareBuffer from ImageReader");
        return false;
    }

    // Step 4: Store AHardwareBuffer in VideoFrame
    output_frame.ahardware_buffer = ahb;

    return true;
}
```

### Phase 2: Integrate Async Path in DecodeToSurface

**File**: `MediaCodecAV1Decoder.cpp` (DecodeToSurface method)

**Add Mode Selection**:

```cpp
bool MediaCodecAV1Decoder::DecodeToSurface(
    const uint8_t* packet_data,
    size_t packet_size,
    VavCoreSurfaceType target_type,
    void* target_surface,
    VideoFrame& output_frame) {

    // Use the async path when the handler exists and async mode is enabled
    if (m_async_handler && m_async_handler->IsAsyncModeEnabled()) {
        return DecodeFrameAsync(packet_data, packet_size, output_frame);
    }

    // Fall back to sync mode (current implementation)
    if (!ProcessInputBuffer(packet_data, packet_size)) {
        return false;
    }

    return ProcessOutputBuffer(output_frame);
}
```

### Phase 3: Add Multi-Instance Detection

**File**: `MediaCodecAV1Decoder.cpp` (Initialize method)

**Auto-Enable Async for Multi-Instance**:

```cpp
bool MediaCodecAV1Decoder::Initialize(const VideoMetadata& metadata) {
    // ... existing initialization ...

    // Enable async mode for high-resolution or multi-instance scenarios
    if (metadata.width >= 3840 || ShouldEnableAsyncMode()) {
        if (m_async_handler->EnableAsyncMode(true)) {
            LogInfo("Async decoding enabled");
        } else {
            // Mitigation: keep decoding in sync mode if async setup fails
            LogInfo("Async decoding unavailable, using sync mode");
        }
    }

    return FinalizeInitialization();
}
```

### Phase 4: Testing

**Test Cases**:

1. Single 4K video playback (async vs sync benchmark)
2. 4x 4K videos simultaneously (target 28-30fps on all instances)
3. Memory bandwidth monitoring (adb logcat performance)
4. Thread contention analysis (systrace)

---

## API Design

### User-Facing Configuration

```cpp
// VavCore C API addition (optional)
VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable);
VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder);
```

### Internal Auto-Detection

```cpp
// Auto-enable async for:
// 1. Resolution >= 4K (3840x2160)
// 2. Multiple decoder instances detected
// 3. High-end SoC (Snapdragon 8 Gen 3, Exynos 2400)
//
// NOTE: criterion 2 (multiple decoder instances) is not checked below yet.

bool MediaCodecAV1Decoder::ShouldEnableAsyncMode() const {
    // Check resolution
    if (m_width >= 3840 && m_height >= 2160) {
        return true;
    }

    // Check device capability (Samsung Galaxy S24, etc.)
    // The match strings depend on what GetSoCName() actually returns.
    std::string soc = GetSoCName();
    if (soc.find("SM8650") != std::string::npos ||      // Snapdragon 8 Gen 3
        soc.find("Exynos2400") != std::string::npos) {
        return true;
    }

    return false;
}
```

---

## Performance Expectations

### Baseline (Current Sync Mode)

- Single 4K: 30fps ✅
- 4x 4K: 20-25fps ⚠️ (frame drops, stuttering)

### Target (Async Mode)

- Single 4K: 30fps ✅ (same performance)
- 4x 4K: 28-30fps ✅ (smooth playback)
- CPU utilization: +20-30 percentage points (estimated)
- Thread blocking: about 80% lower (estimated)

### Hardware Requirements

- **Minimum**: Android 9 (API 28), the first API level that provides `AMediaCodec_setAsyncNotifyCallback`; built with NDK 26
- **Optimal**: Snapdragon 8 Gen 2+ or Exynos 2300+
- **Memory**: Sufficient bandwidth for 1.4GB/s (4x 4K)

---

## Risk Analysis

### Low Risk

- ✅ MediaCodecAsyncHandler already implemented
- ✅ No NDK version upgrade required (stays at NDK 26)
- ✅ Keyframe detection not needed (the WebM container provides it)

### Medium Risk

- ⚠️ Thread synchronization complexity (mitigated by existing queue implementation)
- ⚠️ Memory bandwidth saturation on mid-range devices

### Mitigation Strategies

1. **Fallback to Sync**: If async initialization fails, use sync mode
2. **Progressive Rollout**: Enable async only for high-end devices initially
3. **Performance Monitoring**: Add metrics to detect frame drops

---

## References

### Implementation Files

- **MediaCodecAsyncHandler.h/.cpp**: Async callback management
- **MediaCodecAV1Decoder.h/.cpp**: Main decoder integration
- **MediaCodecSurfaceManager.h/.cpp**: ImageReader/AHardwareBuffer handling

### Android Documentation

- [MediaCodec Asynchronous Processing](https://developer.android.com/reference/android/media/MediaCodec#asynchronous-processing-using-buffers)
- [AMediaCodec_setAsyncNotifyCallback](https://developer.android.com/ndk/reference/group/media#amediacodec_setasyncnotifycallback)

### Performance Analysis

- NVDEC async decoding (Windows reference): PollingThread pattern
- Expected gain: 1-3ms per frame (not measured, theoretical from pipelining)

---

## Conclusion

**Recommendation**: Implement async decoding for the 4x simultaneous 4K playback use case.

**Expected Outcome**:

- Higher frame rate in multi-instance scenarios (estimated 28-30fps vs 20-25fps for 4x 4K)
- Minimal risk (infrastructure already exists)
- Better resource utilization on high-end devices

**Next Steps**:

1. Complete `ProcessAsyncOutputFrame()` implementation (Phase 1)
2. Integrate async path in `DecodeToSurface()` (Phase 2)
3. Add auto-detection logic (Phase 3)
4. Test with 4x 4K videos (Phase 4)

---

*Document created by Claude Code*
*Last updated: 2025-10-12*
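
---

## Appendix: Frame Queue Hand-Off Sketch

The async frame queue listed under Current Implementation Status hands decoded frames from the codec callback thread to the decoding thread using a mutex and a condition variable. The following is a minimal, platform-independent sketch of that pattern, not the actual `MediaCodecAsyncHandler` code. `FrameStub`, `AsyncFrameQueue`, `PopWithTimeout`, and `RunQueueDemo` are hypothetical names introduced only for illustration, and a plain `std::thread` stands in for the MediaCodec callback thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for AsyncFrameData; only the timestamp is kept.
struct FrameStub {
    int64_t timestamp_us;
};

// Minimal thread-safe queue: a callback thread pushes, a consumer thread pops.
class AsyncFrameQueue {
public:
    // Called from the producer (callback) thread.
    void Push(const FrameStub& frame) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(frame);
        }
        m_cv.notify_one();  // Wake one waiting consumer
    }

    // Waits up to `timeout` for a frame. Returns false if none arrived in time.
    bool PopWithTimeout(FrameStub& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_queue.empty(); })) {
            return false;
        }
        out = m_queue.front();
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<FrameStub> m_queue;
};

// Simulates a callback thread delivering `count` frames with 30 fps timestamps
// and returns the timestamps in the order the consumer received them.
std::vector<int64_t> RunQueueDemo(int count) {
    AsyncFrameQueue queue;
    std::thread producer([&queue, count] {
        for (int i = 0; i < count; ++i) {
            queue.Push(FrameStub{static_cast<int64_t>(i) * 33333});
        }
    });

    std::vector<int64_t> received;
    FrameStub frame{};
    for (int i = 0; i < count; ++i) {
        if (!queue.PopWithTimeout(frame, std::chrono::milliseconds(1000))) {
            break;  // Timed out: stop consuming
        }
        received.push_back(frame.timestamp_us);
    }
    producer.join();
    return received;
}
```

The timeout on the consumer side bounds how long the decoding thread waits when no frame arrives, which gives the caller a natural point to report a dropped frame or fall back to sync mode.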