video-v1/vav2/docs/working/Driver_API_Unification_Design.md
2025-10-07 03:49:32 +09:00

CUDA Driver API Complete Unification Design

Problem Analysis

Root Cause: Runtime API + Driver API Mixing

Current Issue:

  • CUDA_ERROR_INVALID_HANDLE occurs at kernel launch
  • The surface object is created successfully but cannot be used in the kernel
  • Setting the context explicitly does not resolve the underlying problem

Why Mixing APIs Causes Issues:

  1. Different Context Management:

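    • Runtime API: Implicit context management. It uses the context that is current on the calling thread and, if none is current, initializes and uses the device's primary context
    • Driver API: Explicit context management (the caller must make a context current, for example with cuCtxSetCurrent())
    • Mixed usage can result in operations running in different contexts
  2. Handle Incompatibility:

    • A handle is valid only in the context that was current when it was created
    • Runtime API calls made while the NVDEC context is not current create their handles in another context (typically the primary context)
    • A surface created in that context cannot be used by a kernel launched in the NVDEC context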
  3. NVDEC Constraint:

    • NVDEC uses pure Driver API (cuvidCreateDecoder, cuvidMapVideoFrame, etc.)
    • All subsequent CUDA operations must use the same Driver API context
    • Mixing Runtime API breaks this constraint
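
The handle rule above can be illustrated with a toy model. This is a sketch only: ToyContext, ToyHandle, and ToyResult are hypothetical stand-ins that imitate the ownership rule and are not CUDA types or calls.

```cpp
// Toy model only: these types imitate "a handle belongs to the context
// that created it". They are not part of CUDA.
enum class ToyResult { Success, InvalidHandle };

struct ToyContext;

// A handle remembers which context created it.
struct ToyHandle {
    const ToyContext* owner;
};

struct ToyContext {
    // Creating a resource ties the handle to this context.
    ToyHandle CreateSurface() const { return ToyHandle{this}; }

    // A launch succeeds only when the handle was created in this context.
    ToyResult LaunchKernel(ToyHandle surface) const {
        return surface.owner == this ? ToyResult::Success
                                     : ToyResult::InvalidHandle;
    }
};
```

In this model, a surface created in one ToyContext and passed to LaunchKernel() on another yields InvalidHandle, which mirrors the failure described in the Current Issue list.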

Historical Context: How We Got Here

Original Implementation (commit 73d9d8d):

  • Fully Runtime API based
  • cudaSurfaceObject_t, cudaCreateSurfaceObject()
  • Worked in isolation but failed with NVDEC integration

Partial Fix Attempt:

  • Changed surface creation to Driver API (cuSurfObjectCreate())
  • Left external memory import as Runtime API
  • Result: Still mixing APIs → Still failing

Current State:

// Runtime API (wrong context)
cudaImportExternalMemory(&external_memory, &mem_desc);
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, ...);

// Driver API (NVDEC context)
cuSurfObjectCreate(&surface, &res_desc);
cuLaunchKernel(kernel, ...);

Solution: Complete Driver API Unification

Phase 1: API Conversion (Current Task)

Files to Modify:

  1. ExternalMemoryCache.h - Update type signatures
  2. ExternalMemoryCache.cpp - Convert all Runtime API calls
  3. D3D12SurfaceHandler.cpp - Convert remaining Runtime API calls

Conversion Map:

| Runtime API | Driver API | Notes |
|---|---|---|
| cudaExternalMemory_t (described by cudaExternalMemoryHandleDesc) | CUexternalMemory (described by CUDA_EXTERNAL_MEMORY_HANDLE_DESC) | Type change required |
| cudaImportExternalMemory() | cuImportExternalMemory() | Direct replacement |
| cudaExternalMemoryGetMappedMipmappedArray() | cuExternalMemoryGetMappedMipmappedArray() | Direct replacement |
| cudaGetMipmappedArrayLevel() | cuMipmappedArrayGetLevel() | Direct replacement |
| cudaMipmappedArray_t | CUmipmappedArray | Type change |
| cudaArray_t | CUarray | Type change |
| cudaMemcpy2D() | cuMemcpy2D() | Takes a CUDA_MEMCPY2D descriptor instead of positional arguments; async version: cuMemcpy2DAsync() |
| cudaDestroyExternalMemory() | cuDestroyExternalMemory() | Direct replacement |
| cudaFreeMipmappedArray() | cuMipmappedArrayDestroy() | Direct replacement |
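
One way to track conversion progress is to scan the source for Runtime API names from the map above. The following is a minimal sketch, not part of the existing code base: ApiMapping, kConversionMap, and FindRuntimeLeftovers are hypothetical names, and the check is a plain substring search, so it also matches names inside comments and strings.

```cpp
#include <string>
#include <vector>

// One row of the conversion map: Runtime API name and its Driver API counterpart.
struct ApiMapping {
    const char* runtime_name;
    const char* driver_name;
};

static const ApiMapping kConversionMap[] = {
    {"cudaExternalMemory_t", "CUexternalMemory"},
    {"cudaImportExternalMemory", "cuImportExternalMemory"},
    {"cudaExternalMemoryGetMappedMipmappedArray", "cuExternalMemoryGetMappedMipmappedArray"},
    {"cudaGetMipmappedArrayLevel", "cuMipmappedArrayGetLevel"},
    {"cudaMipmappedArray_t", "CUmipmappedArray"},
    {"cudaArray_t", "CUarray"},
    {"cudaMemcpy2D", "cuMemcpy2D"},
    {"cudaDestroyExternalMemory", "cuDestroyExternalMemory"},
    {"cudaFreeMipmappedArray", "cuMipmappedArrayDestroy"},
};

// Returns every mapping whose Runtime API name still appears in `source`.
std::vector<ApiMapping> FindRuntimeLeftovers(const std::string& source) {
    std::vector<ApiMapping> found;
    for (const ApiMapping& mapping : kConversionMap) {
        if (source.find(mapping.runtime_name) != std::string::npos) {
            found.push_back(mapping);
        }
    }
    return found;
}
```

Running this over the contents of ExternalMemoryCache.cpp and D3D12SurfaceHandler.cpp would list the identifiers that still need conversion. An empty result is a cheap pre-check before building.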

Phase 2: Context Verification (After Unification)

After complete Driver API unification:

  1. Verify NVDEC context is properly passed to all components
  2. Ensure cuCtxSetCurrent() is called before Driver API operations
  3. Confirm all operations use the same context
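
A scoped guard is one way to make sure the context is set before Driver API work and restored afterwards. This is a sketch under assumptions: ScopedContext is a hypothetical helper, and it takes two callables so that it stays independent of the CUDA headers.

```cpp
#include <functional>
#include <utility>

// Runs `enter` immediately and `leave` when the guard goes out of scope,
// so the restore step cannot be skipped by an early return or an exception.
class ScopedContext {
public:
    ScopedContext(const std::function<void()>& enter, std::function<void()> leave)
        : leave_(std::move(leave)) {
        enter();
    }
    ~ScopedContext() { leave_(); }

    // Non-copyable: each guard owns exactly one enter/leave pair.
    ScopedContext(const ScopedContext&) = delete;
    ScopedContext& operator=(const ScopedContext&) = delete;

private:
    std::function<void()> leave_;
};
```

In the decoder, `enter` could wrap cuCtxPushCurrent() with the NVDEC context and `leave` could wrap cuCtxPopCurrent(). Pushing and popping, rather than only calling cuCtxSetCurrent(), restores whatever context the calling thread had before.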

Expected Benefits

  • Single Context: All operations run in NVDEC's Driver API context
  • Handle Compatibility: All handles are created and used in the same context
  • Clear Debugging: A unified API makes issues easier to diagnose
  • NVDEC Alignment: Matches NVDEC's native API paradigm
  • Stability: Eliminates context-switching issues

Implementation Plan

Step 1: Update Type Definitions

ExternalMemoryCache.h:

// Before (Runtime API types)
struct CachedEntry {
    cudaExternalMemory_t external_memory;
    cudaMipmappedArray_t mipmapped_array;
    // ...
};

// After (Driver API types)
struct CachedEntry {
    CUexternalMemory external_memory;
    CUmipmappedArray mipmapped_array;
    // ...
};

Step 2: Convert External Memory Import

ExternalMemoryCache.cpp - ImportD3D12TextureAsSurface():

// Before (Runtime API)
cudaExternalMemoryHandleDesc mem_desc = {};
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Resource;
// ...
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);

// After (Driver API)
CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {};
mem_desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE;
// ...
CUresult result = cuImportExternalMemory(&external_memory, &mem_desc);

Step 3: Convert Mipmapped Array Operations

// Before (Runtime API)
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cudaGetMipmappedArrayLevel(&array, mipmapped_array, 0);

// After (Driver API); mipmap_desc becomes a CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC
cuExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cuMipmappedArrayGetLevel(&array, mipmapped_array, 0);

Step 4: Convert Memory Copy Operations

D3D12SurfaceHandler.cpp - CopyYPlane() / CopyUVPlane():

// Before (Runtime API)
cudaError_t err = cudaMemcpy2D(
    (void*)dst, dst_pitch,
    (void*)src, src_pitch,
    width, height,
    cudaMemcpyDeviceToDevice
);

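// After (Driver API): cuMemcpy2D takes a CUDA_MEMCPY2D descriptor, not positional arguments.
// Assumes src and dst are CUdeviceptr values and width is in bytes.
CUDA_MEMCPY2D copy = {};
copy.srcMemoryType = CU_MEMORYTYPE_DEVICE;
copy.srcDevice     = src;
copy.srcPitch      = src_pitch;
copy.dstMemoryType = CU_MEMORYTYPE_DEVICE;
copy.dstDevice     = dst;
copy.dstPitch      = dst_pitch;
copy.WidthInBytes  = width;
copy.Height        = height;
CUresult result = cuMemcpy2D(&copy);
// Or async version:
// CUresult result = cuMemcpy2DAsync(&copy, stream);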

Step 5: Convert Cleanup Operations

// Before (Runtime API)
cudaFreeMipmappedArray(mipmapped_array);
cudaDestroyExternalMemory(external_memory);

// After (Driver API)
cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);

Testing Strategy

Verification Points

  1. Build Success: All type changes compile without errors
  2. Context Consistency: Log context handles to verify single context usage
  3. Surface Creation: Verify surface objects are created successfully
  4. Kernel Execution: Confirm kernels can access surface objects
  5. Frame Output: Validate decoded frames are rendered correctly

Debug Logging

Add diagnostic logs to verify context usage:

CUcontext current_ctx;
cuCtxGetCurrent(&current_ctx);
LOGF_DEBUG("[Component] Using CUDA context: 0x%llX", (unsigned long long)current_ctx);

Compare context handles across:

  • NVDEC initialization
  • External memory import
  • Surface creation
  • Kernel launch

All should show the same context value.
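
The comparison can also be automated instead of read from the logs. The following is a minimal sketch: ContextAudit is a hypothetical helper, and it stores contexts as const void*, which works because CUcontext is a pointer type.

```cpp
#include <string>
#include <utility>
#include <vector>

// Collects the context handle seen at each pipeline stage and reports
// whether every stage ran in the same context.
class ContextAudit {
public:
    // Record the context observed at a named stage (e.g. from cuCtxGetCurrent).
    void Record(std::string stage, const void* context) {
        entries_.emplace_back(std::move(stage), context);
    }

    // True when at least one stage was recorded and all stages saw the
    // same non-null context.
    bool AllSame() const {
        if (entries_.empty() || entries_.front().second == nullptr) return false;
        for (const auto& entry : entries_) {
            if (entry.second != entries_.front().second) return false;
        }
        return true;
    }

    // Names of the stages whose context differs from the first recorded stage.
    std::vector<std::string> Mismatches() const {
        std::vector<std::string> stages;
        for (const auto& entry : entries_) {
            if (entry.second != entries_.front().second) stages.push_back(entry.first);
        }
        return stages;
    }

private:
    std::vector<std::pair<std::string, const void*>> entries_;
};
```

Each component would call Record() with its stage name and the value obtained from cuCtxGetCurrent(). After the first frame, Mismatches() names the stages that ran in a different context than NVDEC initialization, provided that stage was recorded first.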

Risk Mitigation

Potential Issues

  1. API Signature Differences:

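    • Not every conversion is a rename: cuMemcpy2D() takes a single CUDA_MEMCPY2D descriptor, whereas cudaMemcpy2D() takes positional arguments
    • Check each call against the CUDA Driver API documentation before converting it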
  2. Error Handling:

    • Runtime API returns cudaError_t
    • Driver API returns CUresult
    • Update error checking code accordingly
  3. Enum Value Changes:

    • Runtime: cudaExternalMemoryHandleTypeD3D12Resource
    • Driver: CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE
    • Ensure correct enum mapping
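
For the error handling change, a single checking helper can cover both result types during the transition. This is a sketch, assuming only that the success code is 0, which holds for both cudaSuccess and CUDA_SUCCESS. ThrowIfFailed is a hypothetical name.

```cpp
#include <stdexcept>
#include <string>

// Throws when `result` is not the success code (0). The template accepts
// cudaError_t, CUresult, or any other integer-like result type.
template <typename Result>
void ThrowIfFailed(Result result, const char* call) {
    const int code = static_cast<int>(result);
    if (code != 0) {
        throw std::runtime_error(std::string(call) + " failed with code " +
                                 std::to_string(code));
    }
}
```

A call site would then read ThrowIfFailed(cuImportExternalMemory(&external_memory, &mem_desc), "cuImportExternalMemory"). Once all calls use the Driver API, the message could be extended with the name returned by cuGetErrorName().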

Rollback Plan

If Driver API unification fails:

  • Git revert to current commit
  • Consider alternative: Create separate Runtime API context and synchronize
  • Document why unification was not viable

Success Criteria

  • All Runtime API calls converted to Driver API
  • Build completes without errors
  • Single CUDA context used throughout the pipeline
  • Surface objects successfully used in kernels
  • No CUDA_ERROR_INVALID_HANDLE errors
  • Frames decode and render correctly

References

  • NVIDIA CUDA Driver API Documentation
  • CUDA Runtime API vs Driver API Comparison
  • NVDEC Programming Guide
  • Previous commit: 73d9d8d (original Runtime API implementation)