vav2/D3D_Surface_Direct_Decoding_Design.md

# D3D Surface Direct Decoding Architecture Design

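## Overview

This document presents an architecture for decoding directly to D3D surfaces in VavCore, which removes CPU memory copies and enables high-performance GPU-to-GPU rendering. The design stays compatible with CPU-only decoding while supporting the major hardware acceleration SDKs (AMD AMF, Intel VPL, NVIDIA NVDEC).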

## Current State Analysis

### VavCoreVideoFrame (CPU only)
```c
typedef struct {
    uint8_t* y_plane;      // Y plane data (CPU memory)
    uint8_t* u_plane;      // U plane data (CPU memory)
    uint8_t* v_plane;      // V plane data (CPU memory)

    int y_stride;          // Y plane stride
    int u_stride;          // U plane stride
    int v_stride;          // V plane stride

    int width;             // Frame width
    int height;            // Frame height

    uint64_t timestamp_us; // Timestamp (microseconds)
    uint64_t frame_number; // Frame sequence number
} VavCoreVideoFrame;
```

**Limitations:**
- Exposes CPU memory pointers only
- Rendering requires a GPU → CPU → GPU memory round trip
- Becomes a performance bottleneck for high-resolution content

## SDK D3D Surface Support Analysis

### 1. AMD AMF (Advanced Media Framework)

**D3D surface support:** ✅ Full D3D11/D3D12 texture support

**Key components:**
- `AMFSurface` - general-purpose surface abstraction
- `AMFContext::CreateSurfaceFromDX11Native()` - wraps a D3D11 texture
- `AMFContext2::CreateSurfaceFromDX12Native()` - wraps a D3D12 resource

**Usage pattern:**
```cpp
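// Wrap an existing D3D11 texture as an AMF surface
ID3D11Texture2D* d3d11Texture = /* texture created by the application */ nullptr;
AMFSurfacePtr amfSurface;
amfContext->CreateSurfaceFromDX11Native(d3d11Texture, &amfSurface, nullptr);

// Decode: the decoder consumes compressed data (an AMFBuffer) and returns
// a GPU-resident surface, so no CPU copy of the decoded frame is needed
amfDecoder->SubmitInput(inputBuffer);   // inputBuffer: AMFBufferPtr holding the bitstream
AMFDataPtr outputData;
amfDecoder->QueryOutput(&outputData);   // outputData can be queried for AMFSurface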
```

### 2. Intel VPL (Video Processing Library)

**D3D surface support:** ✅ D3D11/D3D12 support through `mfxFrameSurface1`

**Key components:**
- `mfxFrameSurface1` - surface descriptor that carries the D3D handle
- `mfxHandleType` - specifies the D3D11/D3D12 handle type
- External allocator integration

**Usage pattern:**
```cpp
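// Describe a D3D11-backed surface
mfxFrameSurface1 surface = {};
surface.Info = videoParams.mfx.FrameInfo;
surface.Data.MemId = d3d11Texture; // D3D11 texture handle; its interpretation depends on the frame allocator in use

// Decode into the D3D surface (bitstream: mfxBitstream holding the compressed packet)
MFXVideoDECODE_DecodeFrameAsync(session, &bitstream, &surface, &outputSurface, &sync);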
```

### 3. NVIDIA NVDEC

**D3D surface support:** ✅ CUDA device pointers via D3D interop

**Key components:**
- `cuvidMapVideoFrame()` - maps a decoded frame to a CUDA device pointer
- `CUdeviceptr` - pointer to CUDA device memory
- D3D-CUDA interoperability API

**Usage pattern:**
```cpp
// Map the decoded frame to CUDA device memory
// (release the mapping later with cuvidUnmapVideoFrame)
CUdeviceptr devicePtr = 0;
unsigned int pitch = 0;
cuvidMapVideoFrame(decoder, picIdx, &devicePtr, &pitch, &params);

// Register the D3D texture with CUDA
CUgraphicsResource cudaResource = nullptr;
cuGraphicsD3D11RegisterResource(&cudaResource, d3d11Texture, CU_GRAPHICS_REGISTER_FLAGS_NONE);
```

### 4. dav1d (software decoder)

**D3D surface support:** ❌ CPU-only decoder

**Characteristics:**
- Pure software implementation
- Exposes CPU memory pointers only
- No GPU surface integration
- Rendering requires a CPU → GPU upload

## Proposed Architecture

### 1. Extended Surface Types

**New VavCoreSurfaceType enum:**
```c
typedef enum {
    VAVCORE_SURFACE_CPU = 0,           // Existing CPU memory
    VAVCORE_SURFACE_D3D11_TEXTURE = 1, // D3D11 texture
    VAVCORE_SURFACE_D3D12_RESOURCE = 2,// D3D12 resource
    VAVCORE_SURFACE_CUDA_DEVICE = 3,   // CUDA device pointer
    VAVCORE_SURFACE_AMF_SURFACE = 4    // AMF surface wrapper
} VavCoreSurfaceType;
```

**Extended VavCoreVideoFrame:**
```c
typedef struct {
    // Existing CPU fields (kept for compatibility)
    uint8_t* y_plane;
    uint8_t* u_plane;
    uint8_t* v_plane;
    int y_stride;
    int u_stride;
    int v_stride;

    // Frame metadata
    int width;
    int height;
    uint64_t timestamp_us;
    uint64_t frame_number;

    // New D3D surface fields; surface_type selects the active union member
    VavCoreSurfaceType surface_type;
    union {
        struct {
            // CPU memory (existing)
            uint8_t* planes[3];
            int strides[3];
        } cpu;

        struct {
            // D3D11 texture
            void* d3d11_texture;       // ID3D11Texture2D*
            void* d3d11_device;        // ID3D11Device*
            uint32_t subresource_index;
        } d3d11;

        struct {
            // D3D12 resource
            void* d3d12_resource;      // ID3D12Resource*
            void* d3d12_device;        // ID3D12Device*
            uint32_t subresource_index;
        } d3d12;

        struct {
            // CUDA device pointer
            uint64_t device_ptr;       // CUdeviceptr
            uint32_t pitch;
            void* cuda_context;        // CUcontext
        } cuda;

        struct {
            // AMF surface
            void* amf_surface;         // AMFSurface*
            void* amf_context;         // AMFContext*
        } amf;
    } surface_data;
} VavCoreVideoFrame;
```
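Because `surface_data` is a union, only the member that matches `surface_type` may be read. The sketch below illustrates that tag-checked access pattern on a reduced copy of the struct. The names `Frame`, `MakeD3D11Frame`, and `GetD3D11Texture` are hypothetical helpers introduced for illustration only and are not part of VavCore.

```cpp
#include <cassert>
#include <cstdint>

// Reduced stand-ins for the types above (assumption: only two variants shown).
enum SurfaceType { SURFACE_CPU = 0, SURFACE_D3D11_TEXTURE = 1 };

struct CpuPlanes    { uint8_t* planes[3]; int strides[3]; };
struct D3D11Surface { void* texture; void* device; uint32_t subresource_index; };

struct Frame {
    int width;
    int height;
    SurfaceType surface_type;   // tag: selects the active union member
    union {
        CpuPlanes cpu;
        D3D11Surface d3d11;
    } surface_data;
};

// Build a frame that refers to a D3D11 texture (pointers are opaque here).
inline Frame MakeD3D11Frame(void* texture, void* device,
                            uint32_t subresource, int width, int height) {
    Frame f{};                                  // zero-initialized, tag == SURFACE_CPU
    f.width = width;
    f.height = height;
    f.surface_type = SURFACE_D3D11_TEXTURE;
    f.surface_data.d3d11 = D3D11Surface{texture, device, subresource};
    return f;
}

// Return the texture only when the tag matches; otherwise nullptr.
inline void* GetD3D11Texture(const Frame& f) {
    return f.surface_type == SURFACE_D3D11_TEXTURE
               ? f.surface_data.d3d11.texture
               : nullptr;
}
```

A zero-initialized frame reports `SURFACE_CPU`, so code written against the existing CPU fields keeps seeing the value it expects.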

### 2. Decoder Interface Extension

**Enhanced decoder interface:**
```cpp
class IVideoDecoder {
public:
    virtual ~IVideoDecoder() = default;

    // Existing methods
    virtual bool DecodeFrame(const uint8_t* packet_data, size_t packet_size,
                           VavCoreVideoFrame& frame) = 0;

    // New D3D surface methods
    virtual bool SupportsSurfaceType(VavCoreSurfaceType type) = 0;
    virtual bool DecodeToSurface(const uint8_t* packet_data, size_t packet_size,
                               VavCoreSurfaceType target_type,
                               void* target_surface,
                               VavCoreVideoFrame& frame) = 0;
    virtual bool SetD3DDevice(void* d3d_device, VavCoreSurfaceType type) = 0;
};
```
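To show how a CPU-only decoder such as dav1d could satisfy the extended interface, here is a minimal sketch on a reduced interface. `IDecoder`, `SoftwareDecoder`, and the trimmed `Frame` are hypothetical stand-ins, and actual bitstream decoding is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum SurfaceType { SURFACE_CPU = 0, SURFACE_D3D11_TEXTURE = 1 };

struct Frame {
    SurfaceType surface_type = SURFACE_CPU;
    uint64_t frame_number = 0;
};

// Reduced version of the interface above (assumption: SetD3DDevice left out).
class IDecoder {
public:
    virtual ~IDecoder() = default;
    virtual bool DecodeFrame(const uint8_t* data, size_t size, Frame& frame) = 0;
    virtual bool SupportsSurfaceType(SurfaceType type) const = 0;
    virtual bool DecodeToSurface(const uint8_t* data, size_t size,
                                 SurfaceType target_type, void* target_surface,
                                 Frame& frame) = 0;
};

// Hypothetical software decoder: accepts only the CPU surface type.
class SoftwareDecoder : public IDecoder {
public:
    bool DecodeFrame(const uint8_t* data, size_t size, Frame& frame) override {
        if (data == nullptr || size == 0) return false;
        frame.surface_type = SURFACE_CPU;
        frame.frame_number = m_nextFrame++;   // real decoding is omitted
        return true;
    }

    bool SupportsSurfaceType(SurfaceType type) const override {
        return type == SURFACE_CPU;
    }

    bool DecodeToSurface(const uint8_t* data, size_t size,
                         SurfaceType target_type, void* /*target_surface*/,
                         Frame& frame) override {
        // Reject GPU targets so the caller can fall back to DecodeFrame.
        if (!SupportsSurfaceType(target_type)) return false;
        return DecodeFrame(data, size, frame);
    }

private:
    uint64_t m_nextFrame = 0;
};
```

Returning `false` for unsupported targets, rather than silently decoding to CPU memory, keeps the decision about falling back with the caller.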

### 3. Hardware-Specific Implementations

#### AMD AMF Decoder Implementation
```cpp
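#include <cstring>  // std::memcpy

class AMFDecoder : public IVideoDecoder {
private:
    AMFContextPtr m_amfContext;
    AMFComponentPtr m_amfDecoder;
    ID3D11Device* m_d3d11Device = nullptr;

public:
    bool SupportsSurfaceType(VavCoreSurfaceType type) override {
        return (type == VAVCORE_SURFACE_D3D11_TEXTURE ||
                type == VAVCORE_SURFACE_D3D12_RESOURCE ||
                type == VAVCORE_SURFACE_AMF_SURFACE);
    }

    bool DecodeToSurface(const uint8_t* packet_data, size_t packet_size,
                        VavCoreSurfaceType target_type,
                        void* target_surface,
                        VavCoreVideoFrame& frame) override {
        // The AMF decoder takes compressed data as input and allocates its own
        // output surfaces, so the packet is wrapped in an AMFBuffer here.
        // Copying the result into target_surface is left out of this sketch.
        AMFBufferPtr inputBuffer;
        if (m_amfContext->AllocBuffer(AMF_MEMORY_HOST, packet_size, &inputBuffer) != AMF_OK) {
            return false;
        }
        std::memcpy(inputBuffer->GetNative(), packet_data, packet_size);

        if (m_amfDecoder->SubmitInput(inputBuffer) != AMF_OK) {
            return false;
        }

        // The decoder may need more input before it produces a frame
        AMFDataPtr outputData;
        if (m_amfDecoder->QueryOutput(&outputData) != AMF_OK || outputData == nullptr) {
            return false;
        }
        AMFSurfacePtr outputSurface(outputData);  // QueryInterface to AMFSurface

        // Fill frame metadata. Detach() hands one reference to the frame so the
        // pointer stays valid after this function returns; the consumer must Release() it.
        frame.surface_type = VAVCORE_SURFACE_AMF_SURFACE;
        frame.surface_data.amf.amf_surface = outputSurface.Detach();
        frame.surface_data.amf.amf_context = m_amfContext.GetPtr();

        return true;
    }
};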
```

#### Intel VPL Decoder Implementation
```cpp
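class VPLDecoder : public IVideoDecoder {
private:
    mfxSession m_session = nullptr;
    mfxVideoParam m_videoParams = {};
    ID3D11Device* m_d3d11Device = nullptr;

public:
    bool DecodeToSurface(const uint8_t* packet_data, size_t packet_size,
                        VavCoreSurfaceType target_type,
                        void* target_surface,
                        VavCoreVideoFrame& frame) override {
        if (target_type != VAVCORE_SURFACE_D3D11_TEXTURE) {
            return false;  // Only the D3D11 path is sketched here
        }

        // Wrap the compressed packet
        mfxBitstream bitstream = {};
        bitstream.Data = const_cast<mfxU8*>(packet_data);
        bitstream.DataLength = static_cast<mfxU32>(packet_size);
        bitstream.MaxLength = bitstream.DataLength;

        // A real implementation would take this from a surface pool that
        // outlives the call, since the decoder can keep a reference to it.
        mfxFrameSurface1 surface = {};
        surface.Info = m_videoParams.mfx.FrameInfo;
        surface.Data.MemId = target_surface; // D3D11 texture; interpretation depends on the allocator

        mfxSyncPoint sync = nullptr;
        mfxFrameSurface1* outputSurface = nullptr;

        mfxStatus sts = MFXVideoDECODE_DecodeFrameAsync(m_session, &bitstream, &surface,
                                                        &outputSurface, &sync);
        if (sts != MFX_ERR_NONE || sync == nullptr) {
            return false;  // Includes "needs more data/surfaces" in this sketch
        }
        if (MFXVideoCORE_SyncOperation(m_session, sync, MFX_INFINITE) != MFX_ERR_NONE) {
            return false;
        }

        // Fill frame metadata
        frame.surface_type = VAVCORE_SURFACE_D3D11_TEXTURE;
        frame.surface_data.d3d11.d3d11_texture = outputSurface->Data.MemId;
        frame.surface_data.d3d11.d3d11_device = m_d3d11Device;

        return true;
    }
};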
```

#### NVIDIA NVDEC Implementation
```cpp
class NVDECDecoder : public IVideoDecoder {
private:
    CUvideodecoder m_decoder = nullptr;
    CUcontext m_cudaContext = nullptr;

public:
    bool DecodeToSurface(const uint8_t* packet_data, size_t packet_size,
                        VavCoreSurfaceType target_type,
                        void* target_surface,
                        VavCoreVideoFrame& frame) override {
        // Decode the frame
        CUVIDPICPARAMS picParams = {};
        // ... fill picParams from packet_data

        if (cuvidDecodePicture(m_decoder, &picParams) != CUDA_SUCCESS) {
            return false;
        }

        // Map to a CUDA device pointer
        CUdeviceptr devicePtr = 0;
        unsigned int pitch = 0;
        CUVIDPROCPARAMS procParams = {};

        if (cuvidMapVideoFrame(m_decoder, picParams.CurrPicIdx,
                               &devicePtr, &pitch, &procParams) != CUDA_SUCCESS) {
            return false;
        }

        // Fill frame metadata. The mapping stays valid until the consumer
        // calls cuvidUnmapVideoFrame() on device_ptr.
        frame.surface_type = VAVCORE_SURFACE_CUDA_DEVICE;
        frame.surface_data.cuda.device_ptr = devicePtr;
        frame.surface_data.cuda.pitch = pitch;
        frame.surface_data.cuda.cuda_context = m_cudaContext;

        return true;
    }
};
```

### 4. Renderer Integration

**D3D surface-aware renderer:**
```cpp
class D3DSurfaceRenderer {
public:
    bool RenderFrame(const VavCoreVideoFrame& frame) {
        switch (frame.surface_type) {
            case VAVCORE_SURFACE_D3D11_TEXTURE:
                return RenderD3D11Texture(frame.surface_data.d3d11);

            case VAVCORE_SURFACE_D3D12_RESOURCE:
                return RenderD3D12Resource(frame.surface_data.d3d12);

            case VAVCORE_SURFACE_CUDA_DEVICE:
                return RenderCudaDevicePtr(frame.surface_data.cuda);

            case VAVCORE_SURFACE_AMF_SURFACE:
                return RenderAMFSurface(frame.surface_data.amf);

            case VAVCORE_SURFACE_CPU:
            default:
                return RenderCPUFrame(frame);
        }
    }

private:
    bool RenderD3D11Texture(const auto& d3d11_data) {
        auto texture = static_cast<ID3D11Texture2D*>(d3d11_data.d3d11_texture);
        // Copy the texture straight to the back buffer, or render it with a shader
        // No CPU memory copy required
        return true;
    }
};
```

### 5. Fallback Strategy

**Automatic surface type selection:**
```cpp
class AdaptiveDecoder {
public:
    VavCoreSurfaceType SelectOptimalSurfaceType(VavCoreDecoderType decoder_type) const {
        switch (decoder_type) {
            case VAVCORE_DECODER_AMF:
                if (m_d3d11Device) return VAVCORE_SURFACE_D3D11_TEXTURE;
                if (m_d3d12Device) return VAVCORE_SURFACE_D3D12_RESOURCE;
                break;

            case VAVCORE_DECODER_VPL:
                if (m_d3d11Device) return VAVCORE_SURFACE_D3D11_TEXTURE;
                break;

            case VAVCORE_DECODER_NVDEC:
                return VAVCORE_SURFACE_CUDA_DEVICE;

            case VAVCORE_DECODER_DAV1D:
            case VAVCORE_DECODER_MEDIA_FOUNDATION:
            default:
                return VAVCORE_SURFACE_CPU;
        }

        return VAVCORE_SURFACE_CPU; // Fallback when no D3D device is available
    }

private:
    void* m_d3d11Device = nullptr; // ID3D11Device*, set by the application
    void* m_d3d12Device = nullptr; // ID3D12Device*, set by the application
};
```
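The same fallback idea can also be driven by what a decoder reports at runtime instead of a fixed switch. The following sketch assumes a caller-supplied preference order and a `supports` predicate (for example, a wrapper around `SupportsSurfaceType`). `PickSurfaceType` is a hypothetical helper, not an existing VavCore function.

```cpp
#include <cassert>
#include <functional>
#include <vector>

enum SurfaceType {
    SURFACE_CPU = 0,
    SURFACE_D3D11_TEXTURE = 1,
    SURFACE_D3D12_RESOURCE = 2,
    SURFACE_CUDA_DEVICE = 3
};

// Return the first preferred type the decoder supports; CPU memory if none match.
inline SurfaceType PickSurfaceType(
        const std::vector<SurfaceType>& preferred,
        const std::function<bool(SurfaceType)>& supports) {
    for (SurfaceType type : preferred) {
        if (supports(type)) {
            return type;
        }
    }
    return SURFACE_CPU;  // always-available fallback
}
```

Keeping CPU memory as the unconditional last resort matches the compatibility goal of the design: a decoder that supports nothing else still produces frames.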

## Performance Benefits

### Expected Performance Improvement

**4K AV1 decode + render pipeline:**

| Component | Current (CPU) | With D3D surface | Improvement |
|-----------|---------------|------------------|-------------|
| Decode | 15-25 ms | 10-20 ms | 1.25-1.5x |
| GPU upload | 5-10 ms | 0 ms | eliminated |
| Render | 1-3 ms | 0.5-1 ms | 2-3x |
| **Total** | **21-38 ms** | **10.5-21 ms** | **~1.8-2x** |

**Memory bandwidth savings:**
- 4K YUV420 (8-bit): ~12 MB per frame
- At 60 fps: roughly 720-750 MB/s of memory bandwidth saved
- Less memory pressure and cache pollution
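The per-frame and per-second figures follow from the plane sizes of 8-bit YUV 4:2:0: one full-resolution luma plane plus two chroma planes at half resolution in each dimension. The sketch below works through that arithmetic, assuming a 3840×2160 frame. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: Y plane plus two quarter-size chroma planes.
constexpr uint64_t Yuv420FrameBytes(uint64_t width, uint64_t height) {
    return width * height + 2 * ((width / 2) * (height / 2));
}

// Bytes per second moved by one full-frame copy at the given frame rate.
constexpr uint64_t CopyBandwidthBytesPerSec(uint64_t width, uint64_t height,
                                            uint64_t fps) {
    return Yuv420FrameBytes(width, height) * fps;
}
```

For 3840×2160 this gives 12,441,600 bytes per frame and 746,496,000 bytes per second at 60 fps, which is the order of magnitude quoted in the list above.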

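### Use Cases and Benefits

1. **High-resolution content (4K+)**
   - Removes the GPU → CPU → GPU bottleneck
   - Makes real-time 4K60 decode + render feasible

2. **Multi-stream scenarios**
   - Handles multiple video streams without CPU memory copies
   - Efficient GPU memory sharing

3. **Real-time applications**
   - Lower latency for live streaming
   - Lower CPU usage, which keeps the system responsive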

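## Implementation Phases

### Phase 1: Core Infrastructure
- [ ] Extend VavCoreVideoFrame with the surface union
- [ ] Update the IVideoDecoder interface with the surface methods
- [ ] Implement surface type capability detection

### Phase 2: Hardware Decoder Integration
- [ ] Implement AMD AMF surface decoding
- [ ] Implement Intel VPL surface decoding
- [ ] Integrate NVIDIA NVDEC with CUDA

### Phase 3: Renderer Updates
- [ ] D3D11/D3D12 surface rendering
- [ ] CUDA-D3D interoperability
- [ ] AMF surface rendering

### Phase 4: Optimization and Testing
- [ ] Performance benchmarking
- [ ] Fallback mechanism improvements
- [ ] Multi-GPU support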

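## Compatibility Considerations

### Backward Compatibility
- Existing CPU-based code keeps working without changes
- VavCoreVideoFrame retains the existing CPU fields
- Automatic fallback to CPU decoding when D3D is unavailable

### Platform Support
- **Windows 10/11**: Full D3D11/D3D12 support
- **Older Windows**: Falls back to CPU decoding
- **Non-Windows**: CPU only (future: Vulkan/OpenGL)

### Hardware Requirements
- **AMD**: RX 6000 series or newer for AV1 hardware decoding
- **Intel**: Arc series, or 11th-generation or newer integrated graphics
- **NVIDIA**: RTX 30 series or newer for AV1 hardware decoding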

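## Risk Assessment

### Technical Risks
1. **Driver compatibility**: Vendor-specific driver issues
   - **Mitigation**: Full fallback to CPU decoding

2. **Memory management**: Managing D3D surface lifetimes
   - **Mitigation**: RAII wrappers and reference counting

3. **Synchronization**: Complexity of GPU-to-GPU synchronization
   - **Mitigation**: Explicit synchronization primitives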
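The RAII and reference-counting mitigation for surface lifetimes can be sketched as a small owning handle. `MockSurface` below is a hypothetical stand-in for a COM-style object such as `ID3D11Texture2D` or `AMFSurface`, and `SurfaceRef` is an illustrative wrapper. For real D3D objects, `Microsoft::WRL::ComPtr` already provides this behavior.

```cpp
#include <cassert>
#include <utility>

// Hypothetical COM-style object; a real one would destroy itself at refs == 0.
struct MockSurface {
    int refs = 1;                 // the creator holds the initial reference
    void AddRef()  { ++refs; }
    void Release() { --refs; }
};

// RAII handle: takes a reference on construction/copy, drops it on destruction.
template <typename T>
class SurfaceRef {
public:
    SurfaceRef() = default;
    explicit SurfaceRef(T* p) : m_ptr(p) { if (m_ptr) m_ptr->AddRef(); }
    SurfaceRef(const SurfaceRef& other) : m_ptr(other.m_ptr) {
        if (m_ptr) m_ptr->AddRef();
    }
    SurfaceRef(SurfaceRef&& other) noexcept
        : m_ptr(std::exchange(other.m_ptr, nullptr)) {}
    // Copy-and-swap: the by-value parameter releases the old pointer on exit.
    SurfaceRef& operator=(SurfaceRef other) noexcept {
        std::swap(m_ptr, other.m_ptr);
        return *this;
    }
    ~SurfaceRef() { if (m_ptr) m_ptr->Release(); }

    T* get() const { return m_ptr; }

private:
    T* m_ptr = nullptr;
};
```

With a handle like this, a frame that leaves the decoder holds its own reference, so the surface cannot be recycled while the renderer is still reading it.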

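### Performance Risks
1. **Initial implementation**: May be slower than the optimized CPU path
   - **Mitigation**: Staged rollout with performance gates

2. **Memory overhead**: Additional surface metadata
   - **Mitigation**: Union-based storage keeps the overhead minimal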

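## Conclusion

Direct decoding to D3D surfaces is a significant performance opportunity for high-resolution AV1 content. The proposed architecture enables substantial gains in hardware-accelerated scenarios while preserving backward compatibility.

The implementation prioritizes:
1. **Compatibility**: Existing code keeps working
2. **Performance**: Unnecessary memory copies are removed
3. **Flexibility**: Multiple hardware vendors are supported
4. **Maintainability**: Clear abstraction layers

Implemented properly, this architecture is expected to deliver roughly a 2x speedup for 4K+ content, based on the estimates above, while maintaining system stability and compatibility.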

Update memory 2025-09-27 00:20:35 +09:00