CUDA Runtime API Implementation - 완료 보고서

작성일: 2025년 10월 7일 상태: ✅ 구현 완료 관련 문서: CUDA_API_Unification_Design.md

1. 구현 요약

문제: CUDA_ERROR_INVALID_HANDLE (400) - Runtime API surface와 Driver API kernel launch 불일치

해결: Runtime API cudaLaunchKernel() 사용으로 handle space 통일

채택 방식: Hybrid API (Runtime surface + Driver kernel loading + Runtime launch)

2. 구현 파일

파일: D3D12SurfaceHandler.cpp 메서드: CopyRGBAToSurfaceViaKernel() (lines 304-363)

3. 핵심 변경사항

Before (Driver API - BROKEN)

CUresult result = cuLaunchKernel(
    m_surfaceWriteKernel,
    grid_size.x, grid_size.y, 1,
    block_size.x, block_size.y, 1,
    0,
    (CUstream)stream,
    kernel_args,  // Runtime API surface → INVALID_HANDLE
    nullptr
);

After (Runtime API - FIXED)

cudaError_t err = cudaLaunchKernel(
    (const void*)m_surfaceWriteKernel,  // Cast Driver API kernel
    grid_size,     // Runtime API dim3
    block_size,    // Runtime API dim3
    kernel_args,   // Runtime API surface ← COMPATIBLE
    0,
    stream
);

4. 최종 아키텍처

[ExternalMemoryCache]
  cudaImportExternalMemory()                        // Runtime API
  cudaExternalMemoryGetMappedMipmappedArray()      // Runtime API
  cudaCreateSurfaceObject()                         // Runtime API
      ↓
[D3D12SurfaceHandler]
  cuModuleLoadData()                                // Driver API
  cuModuleGetFunction()                             // Driver API
  cudaLaunchKernel()                                // Runtime API ← 변경됨
      ↓
[CUDA Kernel]
  surf2Dwrite()                                     // Runtime API

5. 핵심 통찰

왜 Hybrid API인가?

Surface 생성: Runtime API 유지 (이미 작동 중)
Kernel 로딩: Driver API 유지 (PTX 임베딩 활용)
Kernel 실행: Runtime API로 변경 (handle 호환성)

왜 완전 Runtime API로 전환하지 않았나?

PTX 임베딩 시스템 재사용 가능
NVRTC 통합 복잡도 회피
cudaLaunchKernel()이 Driver API 커널을 받을 수 있음 확인

6. 검증 방법

빌드 명령어:

"/c/Program Files/Microsoft Visual Studio/2022/Community/MSBuild/Current/Bin/MSBuild.exe" \
  VavCore.vcxproj //p:Configuration=Debug //p:Platform=x64 //v:minimal

테스트 실행:

"./bin/Debug/RedSurfaceNVDECTest.exe" "D:/Project/video-av1/sample/test_4px_stripe_720p_av1.webm"

성공 기준:

✅ CUDA_ERROR_INVALID_HANDLE 에러 0건
✅ Kernel launch 성공 로그 출력
✅ 프레임 디코딩 정상 완료

7. 다음 단계

VavCore 빌드 및 테스트 실행
Phase 2: 에러 복구 강화 구현
- DecodeToSurface cleanup_on_error 추가
- HandlePictureDisplay 중복 호출 방지

2.9 KiB Raw Permalink Blame History