Phase 2 Godot VavCore Optimization Design Document

Created: 2025-09-28
Target: Godot VavCore Demo performance optimization
Based on: Phase 1 multithreading architecture

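📋 Table of Contents

Project Overview
Phase 2 Optimization Goals
Implemented Optimization Systems
Architecture Design
Projected Performance Gains
Implementation Details
Testing and Verification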

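🎯 Project Overview

Background

Performance bottlenecks were found in the Godot VavCore Demo when playing 4K AV1 video. Phase 1 completed the basic optimizations (texture reuse and fewer memory copies), but main-thread blocking and memory-allocation overhead remained.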

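Current Status (Phase 1 Complete)

✅ Multithreaded foundation: background decoding thread separated from the main UI thread
✅ 4K video playback: 3840x2160 AV1 video confirmed to play correctly
✅ GPU YUV rendering: hardware-accelerated via a BT.709 shader
⚠️ Performance issues: unstable 15-25fps, memory-allocation overhead, UI blocking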

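🚀 Phase 2 Optimization Goals

Primary Goals

Stable 4K AV1 video playback at 25-30fps
Fully responsive UI (no main-thread blocking)
90% lower memory allocation through an object-reuse system
Automated adaptive quality adjustment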

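Performance Target Metrics

Frame time: 20-25ms → 8-12ms (50-60% improvement)
Memory allocation: 360MB/sec → 36MB/sec (90% reduction)
UI responsiveness: blocking → fully decoupled (nominally ∞%; a qualitative rather than measured improvement)
Frame stability: irregular → steady FPS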

✅ Implemented Optimization Systems

1. Memory Pool System

Purpose: relieve GC pressure by reusing Godot Image, ImageTexture, and byte[] objects

Components:

private class MemoryPool
{
    private readonly Queue<Image> _imagePool;           // Reusable Image objects
    private readonly Queue<ImageTexture> _texturePool;  // Reusable ImageTexture objects
    private readonly Queue<byte[]> _dataBufferPool;     // Reusable byte[] buffers
    private const int MAX_POOL_SIZE = 10;               // Upper bound on pool size
}

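Key features:

Hit-rate tracking: targets a cache hit rate of 95% or higher
Size validation: only objects with the same resolution and format are reused
Automatic cleanup: surplus objects are disposed automatically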
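
The three features above can be sketched for the byte[] case without any Godot dependency. This is a minimal illustration, not the project's MemoryPool: the class name `BufferPool`, the `Rent`/`Return` method names, and the exact-length matching rule are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a bounded byte[] pool with hit-rate tracking.
public sealed class BufferPool
{
    private readonly Queue<byte[]> _buffers = new Queue<byte[]>();
    private readonly object _lock = new object();
    private readonly int _maxPoolSize;
    private int _hits;
    private int _misses;

    public BufferPool(int maxPoolSize = 10)
    {
        _maxPoolSize = maxPoolSize;
    }

    // Returns a pooled buffer of exactly `size` bytes, or allocates a new one.
    public byte[] Rent(int size)
    {
        lock (_lock)
        {
            while (_buffers.Count > 0)
            {
                byte[] candidate = _buffers.Dequeue();
                if (candidate.Length == size)
                {
                    _hits++;
                    return candidate; // Pool hit
                }
                // Size mismatch: drop the stale buffer and keep looking.
            }
            _misses++;
            return new byte[size]; // Pool miss
        }
    }

    // Hands a buffer back; buffers beyond the size limit are left to the GC.
    public void Return(byte[] buffer)
    {
        if (buffer == null) return;
        lock (_lock)
        {
            if (_buffers.Count < _maxPoolSize)
            {
                _buffers.Enqueue(buffer);
            }
        }
    }

    // Percentage of Rent calls served from the pool.
    public double HitRatePercent
    {
        get
        {
            lock (_lock)
            {
                int total = _hits + _misses;
                return total == 0 ? 0.0 : 100.0 * _hits / total;
            }
        }
    }
}
```

For pooled objects that hold native resources, such as Godot's Image, the overflow and mismatch branches would call Dispose() instead of simply dropping the reference.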

2. Advanced Performance Monitoring System

Purpose: real-time performance analysis and adaptive quality adjustment

Components:

private class AdvancedPerformanceMonitor
{
    private Queue<double> _decodingTimes;    // Decode time samples
    private Queue<double> _renderingTimes;   // Render time samples
    private Queue<double> _totalFrameTimes;  // Total frame time samples
    private Queue<int> _queueSizes;          // Frame queue depth samples
}

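Key features:

Separate measurements: decode, render, and total times tracked individually
Adaptive adjustment: quality is adjusted automatically when consecutive slow frames are detected
Frame skipping: keeps playback smooth under extreme load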
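
One plausible way to back each of those Queue<double> fields is a fixed-size rolling window whose average feeds the periodic stats output. The sketch below is an assumption about how such a window could work; `RollingTimings` and its capacity parameter are hypothetical names, not part of the source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: keeps the most recent N timing samples and averages them.
public sealed class RollingTimings
{
    private readonly Queue<double> _samples = new Queue<double>();
    private readonly object _lock = new object();
    private readonly int _capacity;

    public RollingTimings(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Adds a sample in milliseconds, evicting the oldest once the window is full.
    public void Add(double milliseconds)
    {
        lock (_lock)
        {
            _samples.Enqueue(milliseconds);
            if (_samples.Count > _capacity)
            {
                _samples.Dequeue();
            }
        }
    }

    // Mean of the samples currently in the window (0 when empty).
    public double Average
    {
        get
        {
            lock (_lock)
            {
                return _samples.Count == 0 ? 0.0 : _samples.Average();
            }
        }
    }
}
```

Keeping one instance each for decode, render, and total time preserves the separate measurements described above. The lock matters because decode samples would arrive from the background thread while the main thread reads the averages.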

3. Shader Parameter Caching

Purpose: minimize unnecessary GPU calls

Approach:

private struct CachedShaderParams
{
    public int width, height;
    public int y_size, u_size, v_size;
    public int y_offset, u_offset, v_offset;
}

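Optimization effect:

Change detection: parameters are updated only when they have actually changed
8 calls → 0: updates are skipped entirely while the resolution is unchanged (99% of cases)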
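
The change-detection idea can be illustrated with a value-equality struct and a small cache that invokes an upload callback only when the parameters differ from the last ones sent. Everything here is a sketch under assumptions: `ShaderParams`, `ShaderParamCache`, and the planar YUV 4:2:0 layout used to fill the offsets are illustrative, and the code needs C# 10 for `record struct`.

```csharp
using System;

// Hypothetical sketch: the eight cached values, compared by value.
public readonly record struct ShaderParams(
    int Width, int Height,
    int YSize, int USize, int VSize,
    int YOffset, int UOffset, int VOffset)
{
    // Assumes a planar 8-bit YUV 4:2:0 buffer laid out as Y, then U, then V.
    public static ShaderParams ForPlanarYuv420(int width, int height)
    {
        int ySize = width * height;
        int chromaSize = (width / 2) * (height / 2);
        return new ShaderParams(width, height, ySize, chromaSize, chromaSize,
                                0, ySize, ySize + chromaSize);
    }
}

public sealed class ShaderParamCache
{
    private ShaderParams? _last;

    // Number of times the upload callback actually ran.
    public int UploadCount { get; private set; }

    // Runs `upload` only when `current` differs from the last uploaded values.
    public bool UpdateIfChanged(ShaderParams current, Action<ShaderParams> upload)
    {
        if (_last.HasValue && _last.Value == current)
        {
            return false; // Unchanged: skip all GPU parameter calls
        }
        upload(current);
        _last = current;
        UploadCount++;
        return true;
    }
}
```

In a Godot project the callback would be where the eight shader parameter setters are issued; for a fixed-resolution video it runs once on the first frame and is skipped afterwards.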

4. Multithreaded Architecture (from Phase 1)

Purpose: fully separate CPU decoding from GPU rendering

Architecture:

[Background Decode Thread]    [Main UI Thread]
├── VavCore Decode (8-15ms)   ├── Frame Queue Check (0.1ms)
├── Frame Queue Push          ├── GPU Render (3-5ms)
└── Performance Recording     └── UI Events (Responsive)

🏗️ Architecture Design

Overall System Flow

graph TD
    A[Background Thread] --> B[VavCore Decode]
    B --> C[Performance Monitor]
    C --> D[Frame Queue]
    D --> E[Main Thread Timer]
    E --> F[Memory Pool]
    F --> G[Cached Shader Params]
    G --> H[GPU Render]
    H --> I[Performance Analysis]
    I --> J[Adaptive Quality]

Memory Flow Optimization

Before: [New Image] → [New Texture] → [GC Pressure] → [Performance Drop]
After:  [Pool Hit] → [Reuse Object] → [Zero GC] → [Stable Performance]

Thread Separation Optimization

Before: [UI Thread]: Decode(8ms) + Render(3ms) + UI(BLOCKED) = 11ms
After:  [Decode Thread]: Decode(8ms) || [UI Thread]: Render(3ms) + UI(0ms) = 3ms
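
The split shown above is a single-producer, single-consumer arrangement: a background thread decodes into a queue, and the main thread only polls that queue. A minimal sketch follows, assuming a `Func<TFrame>` stands in for the native decode call and that the queue is capped at a small depth; `DecodePipeline` and `maxQueued` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: background decode loop feeding a bounded frame queue.
public sealed class DecodePipeline<TFrame> : IDisposable
{
    private readonly ConcurrentQueue<TFrame> _frameQueue = new ConcurrentQueue<TFrame>();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Func<TFrame> _decodeNextFrame; // stand-in for the real decoder
    private readonly int _maxQueued;
    private Task _worker;

    public DecodePipeline(Func<TFrame> decodeNextFrame, int maxQueued = 4)
    {
        _decodeNextFrame = decodeNextFrame ?? throw new ArgumentNullException(nameof(decodeNextFrame));
        _maxQueued = maxQueued;
    }

    public void Start()
    {
        // LongRunning hints that this loop deserves its own thread.
        _worker = Task.Factory.StartNew(DecodeLoop, TaskCreationOptions.LongRunning);
    }

    private void DecodeLoop()
    {
        CancellationToken token = _cts.Token;
        while (!token.IsCancellationRequested)
        {
            if (_frameQueue.Count >= _maxQueued)
            {
                Thread.Sleep(1); // Queue full: let the consumer catch up
                continue;
            }
            _frameQueue.Enqueue(_decodeNextFrame());
        }
    }

    // Called from the main thread; returns immediately whether or not a frame is ready.
    public bool TryGetFrame(out TFrame frame)
    {
        return _frameQueue.TryDequeue(out frame);
    }

    public void Dispose()
    {
        _cts.Cancel();
        _worker?.Wait();
        _cts.Dispose();
    }
}
```

Because TryGetFrame never waits on the decoder, the main thread's per-frame cost is the queue check plus rendering, which is the 3ms case in the "After" line.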

📊 Projected Performance Gains

Detailed Performance Metrics

Area	Before	After	Improvement
Frame time	20-25ms	8-12ms	50-60%
Memory allocation	360MB/sec	36MB/sec	90%
UI responsiveness	Blocking	Decoupled	∞%
FPS stability	15-25fps	25-30fps	up to 67%
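
The source does not say how the 360MB/sec baseline was derived. One calculation that lands in the same range, assuming a fresh 8-bit YUV 4:2:0 buffer is allocated for every 4K frame at 30fps, is sketched below; `FrameMath` and both method names are hypothetical.

```csharp
using System;

// Hypothetical sketch: per-frame buffer size and allocation rate for YUV 4:2:0 video.
public static class FrameMath
{
    // One full-resolution Y plane plus two quarter-resolution chroma planes, 1 byte per sample.
    public static long Yuv420FrameBytes(int width, int height)
    {
        long luma = (long)width * height;
        long chroma = (long)(width / 2) * (height / 2);
        return luma + 2 * chroma;
    }

    // Allocation rate in MiB/sec if every frame allocates a new buffer.
    public static double MebibytesPerSecond(int width, int height, double fps)
    {
        return Yuv420FrameBytes(width, height) * fps / (1024.0 * 1024.0);
    }
}
```

Under these assumptions a 3840x2160 frame is 12,441,600 bytes, and 30 such frames per second come to roughly 356 MiB/sec, the same order as the Before column. If pooling removes about 90% of those allocations, the rate falls to the 36MB/sec range in the After column.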

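Optimization Contribution Analysis

Optimization	Time saved	Share
Memory Pool	3-5.5ms	35%
Thread separation	5ms	40%
Shader caching	0.5-1ms	8%
Other optimizations	1-2ms	17%
Total improvement	9.5-13.5ms	50-60% (frame-time reduction)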

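Per-Scenario Projections

4K AV1 playback (peak load)

Current: 16-29ms → 15-25fps (unstable)
Projected: 8-12ms → 25-30fps (stable)
Gain: up to +67% FPS, stutter eliminated

1080p AV1 playback (typical)

Current: 8-15ms → 25-30fps
Projected: 4-7ms → stable 30fps
Gain: up to +20% FPS, consistently smooth playback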

🔧 Implementation Details

Memory Pool Implementation

Class Structure

private class MemoryPool
{
    // Pool containers
    private readonly Queue<Image> _imagePool = new Queue<Image>();
    private readonly Queue<ImageTexture> _texturePool = new Queue<ImageTexture>();
    private readonly Queue<byte[]> _dataBufferPool = new Queue<byte[]>();

    // Synchronization and limits
    private readonly object _poolLock = new object();
    private const int MAX_POOL_SIZE = 10;

    // Statistics
    private int _imagePoolHits = 0;
    private int _imagePoolMisses = 0;
}

Key Method

public Image GetImage(int width, int height, Image.Format format)
{
    lock (_poolLock)
    {
        if (_imagePool.Count > 0)
        {
            var image = _imagePool.Dequeue();
            if (image.GetWidth() == width && image.GetHeight() == height && image.GetFormat() == format)
            {
                _imagePoolHits++;
                return image; // Pool Hit
            }
            else
            {
                image.Dispose(); // Size or format mismatch: discard
            }
        }
        _imagePoolMisses++;
        return Image.CreateEmpty(width, height, false, format); // Pool Miss
    }
}

Advanced Performance Monitoring Implementation

Adaptive Quality Adjustment Logic

private void CheckForQualityAdjustment(double frameTime)
{
    const double SLOW_THRESHOLD = 40.0; // ms per frame, i.e. 25fps (too slow)
    const double FAST_THRESHOLD = 25.0; // ms per frame, i.e. 40fps (fast enough)

    if (frameTime > SLOW_THRESHOLD)
    {
        _consecutiveSlowFrames++;
        _consecutiveFastFrames = 0;
    }
    else if (frameTime < FAST_THRESHOLD)
    {
        _consecutiveFastFrames++;
        _consecutiveSlowFrames = 0;
    }
}

public bool ShouldReduceQuality()
{
    if (_consecutiveSlowFrames >= SLOW_FRAME_THRESHOLD && !_qualityReductionActive)
    {
        _qualityReductionActive = true;
        return true; // Trigger quality reduction
    }
    return false;
}
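
The excerpt above shows only the reduction trigger, while the log samples later in this document also mention a restoration event. A self-contained sketch of both directions follows. It is an illustration under assumptions: `QualityGovernor`, the `QualityDecision` enum, and the default streak length of 5 are hypothetical (the source does not give the value of SLOW_FRAME_THRESHOLD), and unlike the excerpt it resets both streaks on frames that fall between the two thresholds.

```csharp
using System;

public enum QualityDecision { None, Reduce, Restore }

// Hypothetical sketch: streak-based quality reduction and restoration.
public sealed class QualityGovernor
{
    private const double SlowThresholdMs = 40.0; // 25fps
    private const double FastThresholdMs = 25.0; // 40fps

    private readonly int _requiredStreak; // assumed value; not given in the source
    private int _slowStreak;
    private int _fastStreak;

    public bool QualityReduced { get; private set; }

    public QualityGovernor(int requiredStreak = 5)
    {
        if (requiredStreak <= 0) throw new ArgumentOutOfRangeException(nameof(requiredStreak));
        _requiredStreak = requiredStreak;
    }

    // Records one frame time and reports whether quality should change now.
    public QualityDecision Record(double frameTimeMs)
    {
        if (frameTimeMs > SlowThresholdMs)
        {
            _slowStreak++;
            _fastStreak = 0;
        }
        else if (frameTimeMs < FastThresholdMs)
        {
            _fastStreak++;
            _slowStreak = 0;
        }
        else
        {
            // In-between frame: break both streaks so they stay strictly consecutive.
            _slowStreak = 0;
            _fastStreak = 0;
        }

        if (!QualityReduced && _slowStreak >= _requiredStreak)
        {
            QualityReduced = true;
            _slowStreak = 0;
            return QualityDecision.Reduce;
        }
        if (QualityReduced && _fastStreak >= _requiredStreak)
        {
            QualityReduced = false;
            _fastStreak = 0;
            return QualityDecision.Restore;
        }
        return QualityDecision.None;
    }
}
```

The gap between the 25ms and 40ms thresholds acts as a dead band, so frame times hovering near one threshold do not flip quality back and forth.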

Multithreaded Integration

Background Decoding Loop

private void BackgroundDecodingLoop()
{
    // `token` is the loop's CancellationToken; its declaration is not shown in this excerpt.
    while (_isDecodingActive && !token.IsCancellationRequested)
    {
        // Phase 2: start measuring decode time
        _performanceMonitor.RecordDecodeTime();

        // VavCore decode
        int result = vavcore_decode_next_frame(_vavCorePlayer, out VavCoreVideoFrame frame);

        if (result == 0) // Success
        {
            _frameQueue.Enqueue(frame);
            _performanceMonitor.RecordQueueSize(_frameQueue.Count);
        }
    }
}
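
The loop above marks where decode timing starts but does not show how the elapsed time is captured. One common pattern is to wrap the call in a Stopwatch-based helper that reports milliseconds to a recorder callback. The helper below is a sketch; `Timed.Measure` is a hypothetical name, and the recorder taking a double is an assumption, since RecordDecodeTime() in the source takes no argument.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch: runs a piece of work and reports its duration in milliseconds.
public static class Timed
{
    public static T Measure<T>(Func<T> work, Action<double> recordMilliseconds)
    {
        long start = Stopwatch.GetTimestamp();
        try
        {
            return work();
        }
        finally
        {
            // Report the duration even if the work throws.
            long elapsedTicks = Stopwatch.GetTimestamp() - start;
            recordMilliseconds(elapsedTicks * 1000.0 / Stopwatch.Frequency);
        }
    }
}
```

Timing on the decode thread itself keeps the measurement independent of how often the main thread's timer fires.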

Main Thread Timer Callback

private void OnMultithreadedPlaybackTimer()
{
    // Phase 2: measure total frame time
    _performanceMonitor.RecordTotalFrameTime();

    // Adaptive quality check
    if (_performanceMonitor.ShouldReduceQuality())
    {
        GD.Print("QUALITY REDUCTION triggered");
    }

    // Frame skip check
    if (_performanceMonitor.ShouldSkipFrame())
    {
        return; // Skip this frame
    }

    // Render the frame
    if (_frameQueue.TryDequeue(out VavCoreVideoFrame frame))
    {
        _performanceMonitor.RecordRenderTime();
        DisplayFrameGPU(frame); // Uses the Memory Pool
    }
}

🧪 Testing and Verification

Performance Metrics

Real-time monitoring (printed every 2 seconds)

VavCorePlayer: PERFORMANCE STATS
  FPS: 30.1 | Decode: 8.5ms | Render: 2.3ms
  Total: 33.2ms | Queue: 4.2 | Quality Reduction: False

Memory Pool Stats (every 2 seconds)
  Image: 95.1% hit rate (58/61)
  Texture: 92.1% hit rate (35/38)
  Buffer: 98.7% hit rate (152/154)

Quality adjustment notifications (as needed)

VavCorePlayer: QUALITY REDUCTION triggered - FPS: 22.1, Queue: 1.2
VavCorePlayer: QUALITY RESTORATION triggered - FPS: 32.5, Queue: 4.8
VavCorePlayer: FRAME SKIP triggered - FPS: 18.3, Queue: 0.5

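Test Scenarios

1. 4K AV1 stress test

File: 3840x2160 AV1 video
Goal: stable playback at 25-30fps
Measure: frame time, Memory Pool hit rate, UI responsiveness

2. Long-running playback test

Duration: at least 10 minutes of continuous playback
Goal: no memory leaks, no performance degradation
Measure: memory usage, pool efficiency

3. Adaptive quality test

Scenario: deliberately generated CPU load
Goal: stable playback maintained through automatic quality adjustment
Measure: frequency of quality changes, FPS stability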

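Verification Criteria

Success criteria

✅ Memory Pool hit rate: 90% or higher
✅ 4K average FPS: holds steadily at 25fps or above
✅ UI responsiveness: no lag in UI interaction during video playback
✅ Memory usage: 70% or less of the previous level
✅ Adaptive adjustment: quality adjusts automatically under load

Failure criteria

❌ Memory Pool hit rate below 80%
❌ 4K playback drops below 20fps
❌ UI blocking occurs
❌ Memory leak detected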

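📈 Future Expansion Plans

Phase 3 Plan (optional)

Direct use of RenderingDevice: zero-copy GPU pipeline
Godot GDExtension: native C++ plugin for maximum performance
Audio integration: add VavCore audio decoding
Network streaming: real-time streaming support

Cross-Platform Expansion

Android: Vulkan API optimization
iOS: Metal API optimization
Linux: OpenGL optimization
macOS: Metal API integration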

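🎯 Conclusion

Phase 2 optimization is expected to improve the performance of the Godot VavCore Demo by 50-60% (measured as frame time), while delivering a fully responsive UI and a 90% reduction in memory allocation.

The Memory Pool system and advanced performance monitoring in particular are general-purpose techniques that could also be applied to Vav2Player, and so may contribute to better performance across the VavCore ecosystem.

Key targets:

🎯 Stable 4K AV1 video playback at 25-30fps
🎯 Fully responsive UI
🎯 90% reduction in memory allocation
🎯 Automated adaptive quality adjustment

Document version: 1.0
Last updated: 2025-09-28
Author: Claude Code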
