MediaCodec Asynchronous Decoding Design

Document Information

  • Created: 2025-10-12
  • Status: Implementation Required
  • Target Platform: Android (NDK 26)
  • Use Case: Simultaneous 4K video playback (4 instances)

Problem Statement

Current Implementation (Synchronous Mode)

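// MediaCodecAV1Decoder: helpers called by DecodeToSurface (current, simplified)
bool ProcessInputBuffer(const uint8_t* data, size_t size) {
    ssize_t index = AMediaCodec_dequeueInputBuffer(m_codec, 10000);  // blocks up to 10ms
    if (index < 0) return false;  // no input buffer became available in time
    // ... copy data ...
    AMediaCodec_queueInputBuffer(...);
    return true;
}

bool ProcessOutputBuffer(VideoFrame& frame) {
    AMediaCodecBufferInfo info;
    ssize_t index = AMediaCodec_dequeueOutputBuffer(m_codec, &info, 10000);  // blocks up to 10ms
    if (index < 0) return false;  // timeout, format change, or no output yet
    // ... process frame ...
    return true;
}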

Bottleneck for 4x Simultaneous 4K Playback:

  • Each decoder thread can block for 10-20ms per frame (two dequeue calls, each with a 10ms timeout)
  • 4 threads × 10-20ms blocking = significant CPU idle time
  • Thread contention increases frame drop probability
  • Poor CPU utilization during blocking periods

Performance Impact (Estimated)

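| Scenario        | Sync Mode         | Async Mode            |
|-----------------|-------------------|-----------------------|
| Single 4K video | 30fps             | 30fps                 |
| 4x 4K videos    | 20-25fps ⚠️       | 28-30fps              |
| CPU utilization | 40-50% (blocking) | 70-80% (event-driven) |
| Thread blocking | 10-20ms/frame     | 0ms (callback)        |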

Asynchronous Mode Benefits

1. Reduced Thread Blocking

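// Async mode: input is queued from the input-available callback, using the
// buffer index that callback supplies (no dequeueInputBuffer call)
AMediaCodec_queueInputBuffer(...);  // Returns immediately

// Output handled by callback (invoked on a codec-owned thread)
void OnAsyncOutputAvailable(AMediaCodec* codec, void* userdata,
                            int32_t index, AMediaCodecBufferInfo* bufferInfo) {
    // Process frame in callback thread
    // Push to queue for main thread consumption
}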
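
The NDK delivers async events through plain C function pointers that carry an opaque `void* userdata` argument, so a static dispatcher usually recovers the handler instance from that pointer and forwards the call. The following is a minimal sketch of that trampoline pattern, written against stand-in types so it compiles off-device. `FakeBufferInfo`, `OutputCallback`, and `AsyncHandlerSketch` are hypothetical names for illustration; on a device the real types would be `AMediaCodecBufferInfo` and the callbacks registered through `AMediaCodec_setAsyncNotifyCallback`.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>

// Stand-in for AMediaCodecBufferInfo (hypothetical, trimmed to one field).
struct FakeBufferInfo {
    int64_t presentationTimeUs;
};

// Stand-in for a C-style output callback: a plain function pointer that
// receives the userdata pointer chosen at registration time.
using OutputCallback = void (*)(void* userdata, int32_t index, FakeBufferInfo* info);

class AsyncHandlerSketch {
public:
    // Static trampoline: recovers the instance from userdata and forwards.
    static void OnOutputAvailable(void* userdata, int32_t index, FakeBufferInfo* info) {
        auto* self = static_cast<AsyncHandlerSketch*>(userdata);
        if (self != nullptr && info != nullptr) {
            self->HandleOutput(index, *info);
        }
    }

    // Number of output events recorded and not yet consumed.
    std::size_t PendingCount() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_timestamps.size();
    }

    // Removes and returns the oldest recorded timestamp, or -1 if none.
    int64_t PopTimestamp() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_timestamps.empty()) return -1;
        int64_t ts = m_timestamps.front();
        m_timestamps.pop();
        return ts;
    }

private:
    // Runs on the callback thread; only records the event under the lock.
    void HandleOutput(int32_t /*index*/, const FakeBufferInfo& info) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_timestamps.push(info.presentationTimeUs);
    }

    std::mutex m_mutex;
    std::queue<int64_t> m_timestamps;
};
```

Keeping the callback body this small is deliberate: the callback runs on a thread owned by the codec, so heavy work there can delay later notifications.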

2. Better CPU Utilization

  • Sync mode: Thread sleeps during dequeue operations
  • Async mode: Callbacks notify when frames ready, threads can do other work

3. Improved Pipeline Efficiency

Sync Mode:
Thread 1: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 2: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 3: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Thread 4: [Block 10ms] → [Process 5ms] → [Block 10ms] → ...
Total Blocking: 40ms of combined thread time per frame cycle (4 threads × 10ms, overlapping in wall-clock time)

Async Mode:
Thread 1: [Queue Input] → [Continue]
Thread 2: [Queue Input] → [Continue]
Thread 3: [Queue Input] → [Continue]
Thread 4: [Queue Input] → [Continue]
Callback Threads: [Process outputs concurrently]
Total Blocking: 0ms

4. Memory Bandwidth Optimization

  • 4K AV1 frame: ~12MB decoded (3840×2160, 8-bit YUV420)
  • 4x instances: 48MB per set of four frames × 30fps ≈ 1.4GB/s of decoder output bandwidth
  • Async mode allows better bandwidth scheduling by hardware
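
The bandwidth figure can be checked with a few lines of compile-time arithmetic. This sketch assumes 8-bit YUV 4:2:0 output (1.5 bytes per pixel) and counts only the bytes the decoders write, not any later reads by the renderer. The function names are illustrative. With the unrounded frame size the result is about 1.49GB/s, slightly above the figure obtained from rounding each frame to 12MB.

```cpp
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus two
// chroma planes subsampled by 2 in each dimension (1.5 bytes per pixel).
constexpr uint64_t Yuv420FrameBytes(uint64_t width, uint64_t height) {
    return width * height * 3 / 2;
}

// Bytes per second written by `instances` decoders each running at `fps`.
constexpr uint64_t DecodeOutputBytesPerSecond(uint64_t width, uint64_t height,
                                              uint64_t instances, uint64_t fps) {
    return Yuv420FrameBytes(width, height) * instances * fps;
}

static_assert(Yuv420FrameBytes(3840, 2160) == 12441600ULL,
              "one 4K frame is about 12.4 MB");
static_assert(DecodeOutputBytesPerSecond(3840, 2160, 4, 30) == 1492992000ULL,
              "four 4K streams at 30fps are about 1.49 GB/s");
```

10-bit content stored at two bytes per sample would double both numbers.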

Current Implementation Status

Already Implemented

  1. MediaCodecAsyncHandler - Complete implementation

    • Location: vav2/platforms/android/vavcore/src/Decoder/MediaCodecAsyncHandler.h/.cpp
    • Async callbacks: onInputBufferAvailable, onAsyncOutputAvailable, onFormatChanged, onError
    • Frame queue management with mutex/condition_variable
    • Thread-safe async frame data structure
  2. Static Callback Dispatchers

    OnAsyncInputAvailable()
    OnAsyncOutputAvailable()
    OnAsyncFormatChanged()
    OnAsyncError()
    
  3. Async Frame Queue

    struct AsyncFrameData {
        std::unique_ptr<VideoFrame> frame;
        int64_t timestamp_us;
        bool is_keyframe;  // Placeholder for NDK 26
        std::chrono::steady_clock::time_point decode_start_time;
    };
    std::queue<AsyncFrameData> m_async_output_queue;
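
For readers unfamiliar with the mutex/condition_variable hand-off mentioned above, here is a minimal sketch of how such a queue is commonly built. It is not the project's implementation: `FrameRecord` is a trimmed, hypothetical stand-in for `AsyncFrameData`, and `AsyncFrameQueueSketch`, `Push`, and `PopFor` are illustrative names.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>

// Trimmed stand-in for AsyncFrameData (hypothetical; the real struct also
// owns a VideoFrame and a decode start time).
struct FrameRecord {
    int64_t timestamp_us;
};

class AsyncFrameQueueSketch {
public:
    // Producer side, intended for the codec callback thread.
    void Push(FrameRecord record) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(record));
        }
        // Notify after unlocking so the woken consumer can take the lock at once.
        m_cv.notify_one();
    }

    // Consumer side. Waits up to `timeout` for a frame; returns false and
    // leaves `out` untouched if none arrives in time.
    bool PopFor(std::chrono::milliseconds timeout, FrameRecord& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        if (!m_cv.wait_for(lock, timeout, [this] { return !m_queue.empty(); })) {
            return false;
        }
        out = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<FrameRecord> m_queue;
};
```

The predicate form of `wait_for` guards against spurious wake-ups, and the bounded wait lets the caller report a timeout instead of stalling if the codec never produces a frame.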
    

Missing Implementation

  1. DecodeToSurface does not use async path

    • Current: Calls ProcessInputBuffer(), then ProcessOutputBuffer() (sync)
    • Required: Call DecodeFrameAsync() when async mode enabled
  2. ProcessAsyncOutputFrame incomplete

    • Current: Placeholder implementation (lines 236-256 in MediaCodecAsyncHandler.cpp)
    • Required: Proper frame processing for Vulkan/ImageReader pipeline
  3. Async Mode Activation

    • Current: InitializeAsyncMode() is called, but the decode path never uses the async callbacks
    • Required: Enable async mode for multi-instance scenarios

Implementation Plan

Phase 1: Complete ProcessAsyncOutputFrame (High Priority)

File: MediaCodecAsyncHandler.cpp:236-256

Current (Incomplete):

bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    // TODO: Process output buffer and fill VideoFrame
    // For now, just release the buffer

    AMediaCodec_releaseOutputBuffer(m_codec, output_index, false);
    return true;
}

Required Implementation:

bool MediaCodecAsyncHandler::ProcessAsyncOutputFrame(
    int32_t output_index,
    AMediaCodecBufferInfo* buffer_info,
    VideoFrame& output_frame) {

    if (!m_codec || output_index < 0 || !buffer_info) {
        return false;
    }

    // Note: the codec renders to the ImageReader surface, so
    // AMediaCodec_getOutputBuffer() is not used here. In surface mode the
    // output buffers are not CPU-accessible and that call returns nullptr.

    // Step 1: Fill VideoFrame metadata
    output_frame.timestamp_us = buffer_info->presentationTimeUs;
    output_frame.is_keyframe = false;  // NDK 26 limitation
    output_frame.surface_type = VAVCORE_SURFACE_ANDROID_HARDWARE_BUFFER;

    // Step 2: Release MediaCodec buffer with render=true so the frame is sent
    // to the ImageReader surface. This must happen before acquiring the image.
    media_status_t status = AMediaCodec_releaseOutputBuffer(m_codec, output_index, true);
    if (status != AMEDIA_OK) {
        LogError("Failed to render output buffer to ImageReader surface");
        return false;
    }

    // Step 3: Acquire AHardwareBuffer from ImageReader
    // Delegate to MediaCodecSurfaceManager. Rendering is asynchronous, so
    // AcquireLatestImage() may need to wait for the image-available event.
    AHardwareBuffer* ahb = m_decoder->GetSurfaceManager()->AcquireLatestImage();
    if (!ahb) {
        LogError("Failed to acquire AHardwareBuffer from ImageReader");
        return false;
    }

    // Step 4: Store AHardwareBuffer in VideoFrame
    output_frame.ahardware_buffer = ahb;

    return true;
}

Phase 2: Integrate Async Path in DecodeToSurface

File: MediaCodecAV1Decoder.cpp (DecodeToSurface method)

Add Mode Selection:

bool MediaCodecAV1Decoder::DecodeToSurface(
    const uint8_t* packet_data,
    size_t packet_size,
    VavCoreSurfaceType target_type,
    void* target_surface,
    VideoFrame& output_frame) {

    // Check if async mode enabled and beneficial
    if (m_async_handler->IsAsyncModeEnabled()) {
        return DecodeFrameAsync(packet_data, packet_size, output_frame);
    }

    // Fall back to sync mode (current implementation)
    if (!ProcessInputBuffer(packet_data, packet_size)) {
        return false;
    }
    return ProcessOutputBuffer(output_frame);
}

Phase 3: Add Multi-Instance Detection

File: MediaCodecAV1Decoder.cpp (Initialize method)

Auto-Enable Async for Multi-Instance:

bool MediaCodecAV1Decoder::Initialize(const VideoMetadata& metadata) {
    // ... existing initialization ...

    // Enable async mode for high-resolution or multi-instance scenarios
    if (metadata.width >= 3840 || ShouldEnableAsyncMode()) {
        if (m_async_handler->EnableAsyncMode(true)) {
            LogInfo("Async decoding enabled for high-resolution video");
        }
    }

    return FinalizeInitialization();
}

Phase 4: Testing

Test Cases:

  1. Single 4K video playback (async vs sync benchmark)
  2. 4x 4K videos simultaneously (target 28-30fps all instances)
  3. Memory bandwidth monitoring (adb logcat performance)
  4. Thread contention analysis (systrace)

API Design

User-Facing Configuration

// VavCore C API addition (optional)
VAVCORE_API void vavcore_enable_async_decoding(VavCoreDecoder* decoder, bool enable);
VAVCORE_API bool vavcore_is_async_enabled(VavCoreDecoder* decoder);

Internal Auto-Detection

// Auto-enable async for:
// 1. Resolution >= 4K (3840x2160)
// 2. Multiple decoder instances detected (not yet checked below)
// 3. High-end SoC (Snapdragon 8 Gen 3, Exynos 2400)

bool MediaCodecAV1Decoder::ShouldEnableAsyncMode() const {
    // Check resolution
    if (m_width >= 3840 && m_height >= 2160) {
        return true;
    }

    // Check device capability (Samsung Galaxy S24, etc.)
    // Assumes GetSoCName() returns a string containing these identifiers.
    std::string soc = GetSoCName();
    if (soc.find("SM8650") != std::string::npos ||  // Snapdragon 8 Gen 3
        soc.find("Exynos2400") != std::string::npos) {
        return true;
    }

    return false;
}
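
The criteria list names multiple decoder instances, but the function as written checks only resolution and SoC. One possible way to detect the multi-instance case is a process-wide counter maintained by an RAII member in each decoder. This is a sketch under that assumption: `DecoderInstanceToken` and `IsMultiInstance` are hypothetical names, and the threshold of two live decoders is an assumed policy.

```cpp
#include <atomic>

// Hypothetical RAII token: each decoder would hold one for its lifetime, so
// the process-wide count tracks how many decoders are currently alive.
class DecoderInstanceToken {
public:
    DecoderInstanceToken() { s_live.fetch_add(1, std::memory_order_relaxed); }
    ~DecoderInstanceToken() { s_live.fetch_sub(1, std::memory_order_relaxed); }

    // Copying would unbalance the count, so it is disabled.
    DecoderInstanceToken(const DecoderInstanceToken&) = delete;
    DecoderInstanceToken& operator=(const DecoderInstanceToken&) = delete;

    static int LiveCount() { return s_live.load(std::memory_order_relaxed); }

private:
    static std::atomic<int> s_live;
};

std::atomic<int> DecoderInstanceToken::s_live{0};

// Assumed policy: two or more live decoders count as "multi-instance".
inline bool IsMultiInstance() {
    return DecoderInstanceToken::LiveCount() >= 2;
}
```

One caveat with this approach: the first decoder initializes before any sibling exists, so a check made only inside Initialize() would leave it in sync mode. Re-evaluating the mode when later instances appear, or relying on the resolution check for 4K streams, would cover that case.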

Performance Expectations

Baseline (Current Sync Mode)

  • Single 4K: 30fps
  • 4x 4K: 20-25fps ⚠️ (frame drops, stuttering)

Target (Async Mode)

  • Single 4K: 30fps (same performance)
  • 4x 4K: 28-30fps (smooth playback)
  • CPU utilization: up 20-30 percentage points (from 40-50% to 70-80%)
  • Thread blocking: ~80% reduction

Hardware Requirements

  • Minimum: Android 9 (API 28), the first API level that provides AMediaCodec_setAsyncNotifyCallback; built with NDK 26
  • Optimal: Snapdragon 8 Gen 2+ or Exynos 2300+
  • Memory: Sufficient bandwidth for 1.4GB/s (4x 4K)

Risk Analysis

Low Risk

  • MediaCodecAsyncHandler already implemented
  • No NDK version upgrade required (stays at NDK 26)
  • Keyframe detection not needed (WebM provides it)

Medium Risk

  • ⚠️ Thread synchronization complexity (mitigated by existing queue implementation)
  • ⚠️ Memory bandwidth saturation on mid-range devices

Mitigation Strategies

  1. Fallback to Sync: If async initialization fails, use sync mode
  2. Progressive Rollout: Enable async only for high-end devices initially
  3. Performance Monitoring: Add metrics to detect frame drops
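
As an illustration of the performance-monitoring item, the sketch below estimates dropped frames from the gaps between presentation times. It is a hypothetical helper, not part of VavCore: `FrameDropMonitor` and its methods are invented names, and the rule used (round each gap to the nearest whole number of frame intervals and count the extra intervals as drops) is an assumed heuristic.

```cpp
#include <cstdint>

// Hypothetical monitor: estimates dropped frames from the spacing between
// consecutive presented frames.
class FrameDropMonitor {
public:
    explicit FrameDropMonitor(double target_fps)
        : m_interval_us(static_cast<int64_t>(1000000.0 / target_fps)) {}

    // Call once per presented frame with a monotonic timestamp in microseconds.
    void OnFramePresented(int64_t now_us) {
        if (m_has_last) {
            const int64_t gap_us = now_us - m_last_us;
            // Round the gap to the nearest number of frame intervals; any
            // interval beyond the first is treated as a missed frame slot.
            const int64_t missed = (gap_us + m_interval_us / 2) / m_interval_us - 1;
            if (missed > 0) {
                m_dropped += missed;
            }
        }
        m_last_us = now_us;
        m_has_last = true;
        ++m_presented;
    }

    int64_t Presented() const { return m_presented; }
    int64_t Dropped() const { return m_dropped; }

private:
    int64_t m_interval_us;
    int64_t m_last_us = 0;
    bool m_has_last = false;
    int64_t m_presented = 0;
    int64_t m_dropped = 0;
};
```

A per-instance monitor like this would let the 4x 4K test case report drops for each stream separately rather than relying on an aggregate fps number.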

References

Implementation Files

  • MediaCodecAsyncHandler.h/.cpp: Async callback management
  • MediaCodecAV1Decoder.h/.cpp: Main decoder integration
  • MediaCodecSurfaceManager.h/.cpp: ImageReader/AHardwareBuffer handling

Android Documentation

Performance Analysis

  • NVDEC async decoding (Windows reference): PollingThread pattern
  • Expected gain: 1-3ms per frame (not measured, theoretical from pipelining)

Conclusion

Recommendation: Implement async decoding for 4x simultaneous 4K playback use case.

Expected Outcome:

  • Significant performance improvement for multi-instance scenarios
  • Minimal risk (infrastructure already exists)
  • Better resource utilization on high-end devices

Next Steps:

  1. Complete ProcessAsyncOutputFrame() implementation (Phase 1)
  2. Integrate async path in DecodeToSurface() (Phase 2)
  3. Add auto-detection logic (Phase 3)
  4. Test with 4x 4K videos (Phase 4)

Document created by Claude Code. Last updated: 2025-10-12