# CUDA Driver API Complete Unification Design

## Problem Analysis

### Root Cause: Runtime API + Driver API Mixing

**Current Issue:**
- `CUDA_ERROR_INVALID_HANDLE` occurs at kernel launch
- The surface object is created successfully but cannot be used in the kernel
- Setting the CUDA context explicitly does not resolve the underlying problem

**Why Mixing APIs Causes Issues:**

1. **Different Context Management:**
   - Runtime API: automatic context management; it uses the device's primary context unless another context is already current on the calling thread
   - Driver API: explicit context management; a context must be made current (e.g. with `cuCtxSetCurrent()`) before operations run in it
   - **Mixed usage can result in operations running in different contexts**
2. **Handle Incompatibility:**
   - Runtime API handles: bound to the context the Runtime was using when they were created
   - Driver API handles: bound to the context that was current when they were created
   - **A handle (surface, external memory, array) created in the Runtime's context cannot be used in the NVDEC Driver API context**
3. **NVDEC Constraint:**
   - NVDEC uses the **pure Driver API** (`cuvidCreateDecoder`, `cuvidMapVideoFrame`, etc.)
   - All subsequent CUDA operations must use **the same Driver API context**
   - Mixing in Runtime API calls breaks this constraint

### Historical Context: How We Got Here

**Original Implementation (commit 73d9d8d):**
- Fully Runtime API based
- `cudaSurfaceObject_t`, `cudaCreateSurfaceObject()`
- Worked in isolation but failed with NVDEC integration

**Partial Fix Attempt:**
- Changed surface creation to the Driver API (`cuSurfObjectCreate()`)
- Left external memory import on the Runtime API
- **Result: APIs still mixed → still failing**

**Current State:**

```cpp
// Runtime API (wrong context)
cudaImportExternalMemory(&external_memory, &mem_desc);
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, ...);

// Driver API (NVDEC context)
cuSurfObjectCreate(&surface, &res_desc);
cuLaunchKernel(kernel, ...);
```

## Solution: Complete Driver API Unification

### Phase 1: API Conversion (Current Task)

**Files to Modify:**
1. `ExternalMemoryCache.h` - Update type signatures
2. `ExternalMemoryCache.cpp` - Convert all Runtime API calls
3. `D3D12SurfaceHandler.cpp` - Convert remaining Runtime API calls

**Conversion Map:**

| Runtime API | Driver API | Notes |
|-------------|------------|-------|
| `cudaExternalMemory_t` | `CUexternalMemory` | Type change |
| `cudaExternalMemoryHandleDesc` | `CUDA_EXTERNAL_MEMORY_HANDLE_DESC` | Type change; enum and flag names also change (e.g. `cudaExternalMemoryDedicated` → `CUDA_EXTERNAL_MEMORY_DEDICATED`) |
| `cudaImportExternalMemory()` | `cuImportExternalMemory()` | Direct replacement |
| `cudaExternalMemoryMipmappedArrayDesc` | `CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC` | Channel format, extent, and flags are expressed through its `arrayDesc` field (`CUDA_ARRAY3D_DESCRIPTOR`) |
| `cudaExternalMemoryGetMappedMipmappedArray()` | `cuExternalMemoryGetMappedMipmappedArray()` | Same argument order; takes the Driver API descriptor |
| `cudaGetMipmappedArrayLevel()` | `cuMipmappedArrayGetLevel()` | Direct replacement |
| `cudaMipmappedArray_t` | `CUmipmappedArray` | Type change |
| `cudaArray_t` | `CUarray` | Type change |
| `cudaMemcpy2D()` | `cuMemcpy2D()` | Takes a `CUDA_MEMCPY2D` descriptor instead of separate arguments; async version: `cuMemcpy2DAsync()` |
| `cudaDestroyExternalMemory()` | `cuDestroyExternalMemory()` | Direct replacement |
| `cudaFreeMipmappedArray()` | `cuMipmappedArrayDestroy()` | Direct replacement |

### Phase 2: Context Verification (After Unification)

After complete Driver API unification:
1. Verify that the NVDEC context is passed to all components
2. Ensure the NVDEC context is made current (`cuCtxSetCurrent()`) before Driver API operations
3. Confirm that all operations use the same context

### Expected Benefits

- ✅ **Single Context:** All operations run in NVDEC's Driver API context
- ✅ **Handle Compatibility:** All handles are created and used in the same context
- ✅ **Clear Debugging:** A single API makes issues easier to diagnose
- ✅ **NVDEC Alignment:** Matches NVDEC's native API paradigm
- ✅ **Stability:** Eliminates context-switching issues

## Implementation Plan

### Step 1: Update Type Definitions

**ExternalMemoryCache.h:**

```cpp
// Before (Runtime API types)
struct CachedEntry {
    cudaExternalMemory_t external_memory;
    cudaMipmappedArray_t mipmapped_array;
    // ...
};

// After (Driver API types, declared in <cuda.h>)
struct CachedEntry {
    CUexternalMemory external_memory;
    CUmipmappedArray mipmapped_array;
    // ...
};
```

### Step 2: Convert External Memory Import

**ExternalMemoryCache.cpp - ImportD3D12TextureAsSurface():**

```cpp
// Before (Runtime API)
cudaExternalMemoryHandleDesc mem_desc = {};
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Resource;
// ...
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);

// After (Driver API)
CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {};
mem_desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE;
// ...
CUresult result = cuImportExternalMemory(&external_memory, &mem_desc);
```

### Step 3: Convert Mipmapped Array Operations

```cpp
// Before (Runtime API)
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cudaGetMipmappedArrayLevel(&array, mipmapped_array, 0);

// After (Driver API)
// mipmap_desc is now a CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC; format and
// extent are set through mipmap_desc.arrayDesc (a CUDA_ARRAY3D_DESCRIPTOR)
cuExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cuMipmappedArrayGetLevel(&array, mipmapped_array, 0);  // array is now a CUarray
```

### Step 4: Convert Memory Copy Operations

**D3D12SurfaceHandler.cpp - CopyYPlane() / CopyUVPlane():**

```cpp
// Before (Runtime API)
cudaError_t err = cudaMemcpy2D(
    (void*)dst, dst_pitch,
    (void*)src, src_pitch,
    width, height,
    cudaMemcpyDeviceToDevice
);

// After (Driver API): cuMemcpy2D() takes a CUDA_MEMCPY2D descriptor
CUDA_MEMCPY2D copy = {};
copy.srcMemoryType = CU_MEMORYTYPE_DEVICE;
copy.srcDevice = (CUdeviceptr)src;   // source device pointer
copy.srcPitch = src_pitch;
copy.dstMemoryType = CU_MEMORYTYPE_DEVICE;
copy.dstDevice = (CUdeviceptr)dst;   // destination device pointer
copy.dstPitch = dst_pitch;
copy.WidthInBytes = width;           // cudaMemcpy2D's width is also in bytes
copy.Height = height;
CUresult result = cuMemcpy2D(&copy);
// Or async version on a stream:
// CUresult result = cuMemcpy2DAsync(&copy, stream);
```

### Step 5: Convert Cleanup Operations

```cpp
// Before (Runtime API)
cudaFreeMipmappedArray(mipmapped_array);
cudaDestroyExternalMemory(external_memory);

// After (Driver API)
cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);
```

## Testing Strategy

### Verification Points

1. **Build Success:** All type changes compile without errors
2. **Context Consistency:** Log context handles to verify that a single context is used
3. **Surface Creation:** Verify that surface objects are created successfully
4. **Kernel Execution:** Confirm that kernels can access the surface objects
5. **Frame Output:** Validate that decoded frames are rendered correctly

### Debug Logging

Add diagnostic logs to verify context usage:

```cpp
CUcontext current_ctx = nullptr;
cuCtxGetCurrent(&current_ctx);
LOGF_DEBUG("[Component] Using CUDA context: 0x%llX", (unsigned long long)current_ctx);
```

Compare context handles across:
- NVDEC initialization
- External memory import
- Surface creation
- Kernel launch

All should show **the same context value**.

## Risk Mitigation

### Potential Issues

1. **API Signature Differences:**
   - Some Driver API functions take different parameters; for example, `cuMemcpy2D()` takes a single `CUDA_MEMCPY2D` descriptor, and the mipmapped array descriptor describes format and extent through a `CUDA_ARRAY3D_DESCRIPTOR`
   - Review each converted call against the CUDA Driver API documentation
2. **Error Handling:**
   - Runtime API returns `cudaError_t`
   - Driver API returns `CUresult` (success is `CUDA_SUCCESS`; `cuGetErrorName()` / `cuGetErrorString()` replace `cudaGetErrorName()` / `cudaGetErrorString()`)
   - Update error checking code accordingly
3. **Enum Value Changes:**
   - Runtime: `cudaExternalMemoryHandleTypeD3D12Resource`
   - Driver: `CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE`
   - Ensure correct enum mapping

### Rollback Plan

If Driver API unification fails:
- Git revert to the current commit
- Consider an alternative: create a separate Runtime API context and synchronize
- Document why unification was not viable

## Success Criteria

- ✅ All Runtime API calls converted to Driver API
- ✅ Build completes without errors
- ✅ Single CUDA context used throughout the pipeline
- ✅ Surface objects successfully used in kernels
- ✅ No `CUDA_ERROR_INVALID_HANDLE` errors
- ✅ Frames decode and render correctly

## References

- NVIDIA CUDA Driver API Documentation
- CUDA Runtime API vs Driver API Comparison
- NVDEC Programming Guide
- Previous commit: 73d9d8d (original Runtime API implementation)
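The Success Criteria item "All Runtime API calls converted to Driver API" can be checked mechanically by scanning sources for leftover Runtime API names. The sketch below shows one way to do it. `FindRuntimeApiResidue` is a hypothetical helper, not part of the codebase. It relies on a heuristic: Runtime API identifiers start with `cuda` followed by an uppercase letter, while Driver API names use `cu`/`CU` prefixes. It suggests replacements drawn from the Conversion Map.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns each Runtime API identifier found in `source`
// (once, in order of first appearance) paired with its Driver API replacement
// from the conversion map, or an empty string if there is no 1:1 mapping.
// Heuristic: Runtime names match "cuda" + uppercase letter (cudaMemcpy2D,
// cudaArray_t); Driver names (cuMemcpy2D, CUarray) never do.
std::vector<std::pair<std::string, std::string>>
FindRuntimeApiResidue(const std::string& source) {
    static const std::map<std::string, std::string> kDriverEquivalent = {
        {"cudaExternalMemory_t", "CUexternalMemory"},
        {"cudaExternalMemoryHandleDesc", "CUDA_EXTERNAL_MEMORY_HANDLE_DESC"},
        {"cudaExternalMemoryHandleTypeD3D12Resource",
         "CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE"},
        {"cudaImportExternalMemory", "cuImportExternalMemory"},
        {"cudaExternalMemoryMipmappedArrayDesc",
         "CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC"},
        {"cudaExternalMemoryGetMappedMipmappedArray",
         "cuExternalMemoryGetMappedMipmappedArray"},
        {"cudaGetMipmappedArrayLevel", "cuMipmappedArrayGetLevel"},
        {"cudaMipmappedArray_t", "CUmipmappedArray"},
        {"cudaArray_t", "CUarray"},
        {"cudaMemcpy2D", "cuMemcpy2D"},
        {"cudaDestroyExternalMemory", "cuDestroyExternalMemory"},
        {"cudaFreeMipmappedArray", "cuMipmappedArrayDestroy"},
        {"cudaError_t", "CUresult"},
    };
    static const std::regex kRuntimeName(R"(\bcuda[A-Z]\w*)");

    std::vector<std::pair<std::string, std::string>> hits;
    std::set<std::string> seen;  // report each identifier only once
    for (std::sregex_iterator it(source.begin(), source.end(), kRuntimeName), end;
         it != end; ++it) {
        const std::string name = it->str();
        if (!seen.insert(name).second) continue;
        auto found = kDriverEquivalent.find(name);
        hits.emplace_back(name, found != kDriverEquivalent.end() ? found->second
                                                                 : std::string());
    }
    return hits;
}
```

A caller could read each file listed under Files to Modify into a string (for example with `std::ifstream`) and treat a non-empty result as "conversion incomplete". Some identifiers come back with an empty suggestion, such as `cudaMemcpyDeviceToDevice`. These have no one-to-one replacement and need manual review.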
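The Debug Logging step compares context handles across NVDEC initialization, external memory import, surface creation, and kernel launch by reading log output. That comparison can be automated. `ContextAudit` below is a hypothetical class, not an existing utility. It records one handle per stage and reports the stages that disagree with the first recorded one. It is templated on the handle type so the sketch compiles without `<cuda.h>`. In the decoder, the handle would be a `CUcontext` obtained from `cuCtxGetCurrent()`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (stage, context handle) samples and checks
// that every stage observed the same context. Handle would be CUcontext in
// the real pipeline; any equality-comparable type works for the sketch.
template <typename Handle>
class ContextAudit {
public:
    void Record(std::string stage, Handle ctx) {
        samples_.emplace_back(std::move(stage), ctx);
    }

    // True when all recorded stages saw the same handle (also true when empty).
    bool Consistent() const {
        for (const auto& sample : samples_) {
            if (sample.second != samples_.front().second) return false;
        }
        return true;
    }

    // Names of the stages whose handle differs from the first recorded stage.
    std::vector<std::string> Mismatches() const {
        std::vector<std::string> out;
        for (const auto& sample : samples_) {
            if (sample.second != samples_.front().second) out.push_back(sample.first);
        }
        return out;
    }

private:
    std::vector<std::pair<std::string, Handle>> samples_;
};
```

At each stage, the decoder would call `CUcontext ctx = nullptr; cuCtxGetCurrent(&ctx); audit.Record("Surface creation", ctx);`. A single `Consistent()` check after the first kernel launch then gives a pass/fail signal. `Mismatches()` names the stage that switched contexts.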
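For the Error Handling risk (`cudaError_t` becoming `CUresult`), one option is to route every Driver API call through a single check. `CheckCall` below is a hypothetical template, assumed here rather than taken from the codebase. It is generic over the result type so it can be shown without `<cuda.h>`. With the Driver API, the pieces would map as follows:
- `Result` would be `CUresult`.
- `success` would be `CUDA_SUCCESS`.
- `name_of` could wrap `cuGetErrorName()`, which sets its output string to null for unrecognized codes.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: throws std::runtime_error naming the failed call and
// the error when `result` is not the success code.
// NameFn: callable taking a Result and returning const char* (may be null).
template <typename Result, typename NameFn>
void CheckCall(Result result, Result success, NameFn name_of, const char* call) {
    if (result == success) return;
    const char* name = name_of(result);
    throw std::runtime_error(std::string(call) + " failed: " +
                             (name != nullptr ? name : "unrecognized error"));
}
```

In the Driver API build, this might look like `auto name_of = [](CUresult r) { const char* s = nullptr; cuGetErrorName(r, &s); return s; };` followed by `CheckCall(cuImportExternalMemory(&external_memory, &mem_desc), CUDA_SUCCESS, name_of, "cuImportExternalMemory");`. Throwing is one design choice. A codebase that avoids exceptions could return `false` and log instead.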