ened/video-v1

ened 23e7956375 CUDA Driver API called

2025-10-07 03:49:32 +09:00

CUDA Driver API Complete Unification Design

Problem Analysis

Root Cause: Runtime API + Driver API Mixing

Current Issue:

CUDA_ERROR_INVALID_HANDLE is returned at kernel launch
The surface object is created successfully but cannot be used in the kernel
Setting the current context alone does not resolve the underlying problem

Why Mixing APIs Causes Issues:

Different Context Management:
- Runtime API: Automatic context management (implicitly uses the device's primary context when no other context is current on the thread)
- Driver API: Explicit context management (the caller makes a context current, e.g. with cuCtxSetCurrent())
- Mixed usage can result in operations happening in different contexts
Handle Incompatibility:
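- Handles (surface objects, arrays, external memory) are valid only in the context that created them
- A handle created by a Runtime API call belongs to whichever context that call ran in, typically the primary context
- A surface created in that context cannot be used by a kernel launched in a different Driver API context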
NVDEC Constraint:
- NVDEC uses pure Driver API (cuvidCreateDecoder, cuvidMapVideoFrame, etc.)
- All subsequent CUDA operations must use the same Driver API context
- Mixing in Runtime API calls can break this constraint

Historical Context: How We Got Here

Original Implementation (commit 73d9d8d):

Fully Runtime API based
cudaSurfaceObject_t, cudaCreateSurfaceObject()
Worked in isolation but failed with NVDEC integration

Partial Fix Attempt:

Changed surface creation to Driver API (cuSurfObjectCreate())
Left external memory import as Runtime API
Result: Still mixing APIs → Still failing

Current State:

// Runtime API (wrong context)
cudaImportExternalMemory(&external_memory, &mem_desc);
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, ...);

// Driver API (NVDEC context)
cuSurfObjectCreate(&surface, &res_desc);
cuLaunchKernel(kernel, ...);

Solution: Complete Driver API Unification

Phase 1: API Conversion (Current Task)

Files to Modify:

ExternalMemoryCache.h - Update type signatures
ExternalMemoryCache.cpp - Convert all Runtime API calls
D3D12SurfaceHandler.cpp - Convert remaining Runtime API calls

Conversion Map:

Runtime API	Driver API	Notes
`cudaExternalMemory_t`	`CUexternalMemory`	Type change
`cudaExternalMemoryHandleDesc`	`CUDA_EXTERNAL_MEMORY_HANDLE_DESC`	Descriptor type change
`cudaImportExternalMemory()`	`cuImportExternalMemory()`	Direct replacement
`cudaExternalMemoryGetMappedMipmappedArray()`	`cuExternalMemoryGetMappedMipmappedArray()`	Direct replacement
`cudaGetMipmappedArrayLevel()`	`cuMipmappedArrayGetLevel()`	Direct replacement
`cudaMipmappedArray_t`	`CUmipmappedArray`	Type change
`cudaArray_t`	`CUarray`	Type change
`cudaMemcpy2D()`	`cuMemcpy2D()`	Arguments move into a `CUDA_MEMCPY2D` struct; async version: `cuMemcpy2DAsync()`
`cudaDestroyExternalMemory()`	`cuDestroyExternalMemory()`	Direct replacement
`cudaFreeMipmappedArray()`	`cuMipmappedArrayDestroy()`	Direct replacement

Phase 2: Context Verification (After Unification)

After complete Driver API unification:

Verify NVDEC context is properly passed to all components
Ensure cuCtxSetCurrent() is called before Driver API operations
Confirm all operations use the same context

Expected Benefits

✅ Single Context: All operations in NVDEC's Driver API context
✅ Handle Compatibility: All handles created/used in same context
✅ Clear Debugging: Unified API makes issues easier to diagnose
✅ NVDEC Alignment: Matches NVDEC's native API paradigm
✅ Stability: Eliminates context switching issues

Implementation Plan

Step 1: Update Type Definitions

ExternalMemoryCache.h:

// Before (Runtime API types)
struct CachedEntry {
    cudaExternalMemory_t external_memory;
    cudaMipmappedArray_t mipmapped_array;
    // ...
};

// After (Driver API types)
struct CachedEntry {
    CUexternalMemory external_memory;
    CUmipmappedArray mipmapped_array;
    // ...
};

Step 2: Convert External Memory Import

ExternalMemoryCache.cpp - ImportD3D12TextureAsSurface():

// Before (Runtime API)
cudaExternalMemoryHandleDesc mem_desc = {};
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Resource;
// ...
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);

// After (Driver API)
CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {};
mem_desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE;
// ...
CUresult result = cuImportExternalMemory(&external_memory, &mem_desc);

Step 3: Convert Mipmapped Array Operations

// Before (Runtime API)
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cudaGetMipmappedArrayLevel(&array, mipmapped_array, 0);

// After (Driver API)
cuExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cuMipmappedArrayGetLevel(&array, mipmapped_array, 0);

Step 4: Convert Memory Copy Operations

D3D12SurfaceHandler.cpp - CopyYPlane() / CopyUVPlane():

// Before (Runtime API)
cudaError_t err = cudaMemcpy2D(
    (void*)dst, dst_pitch,
    (void*)src, src_pitch,
    width, height,
    cudaMemcpyDeviceToDevice
);

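// After (Driver API): cuMemcpy2D() takes a CUDA_MEMCPY2D descriptor, not positional arguments
// dst and src are CUdeviceptr values here
CUDA_MEMCPY2D copy = {};
copy.srcMemoryType = CU_MEMORYTYPE_DEVICE;
copy.srcDevice = src;
copy.srcPitch = src_pitch;
copy.dstMemoryType = CU_MEMORYTYPE_DEVICE;
copy.dstDevice = dst;
copy.dstPitch = dst_pitch;
copy.WidthInBytes = width;   // bytes per row, not pixels
copy.Height = height;        // number of rows
CUresult result = cuMemcpy2D(&copy);
// Or async version:
// CUresult result = cuMemcpy2DAsync(&copy, stream);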
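
The width passed to a 2D copy is measured in bytes, not pixels, which matters most for the chroma plane. As a minimal sketch, assuming the decoded frames are 8-bit NV12 (an assumption, since the format is not stated here), the extents for CopyYPlane() and CopyUVPlane() could be derived as follows. PlaneExtent, LumaExtent, and ChromaExtent are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical helper type: the byte width and row count of one plane copy.
struct PlaneExtent {
  std::size_t width_in_bytes;
  std::size_t rows;
};

// Y plane of 8-bit NV12: one byte per pixel, one row per pixel row.
inline PlaneExtent LumaExtent(std::size_t width, std::size_t height) {
  return {width, height};
}

// UV plane of 8-bit NV12: U and V are interleaved at half resolution in both
// directions, so each row holds ceil(width / 2) two-byte pairs and there are
// ceil(height / 2) rows.
inline PlaneExtent ChromaExtent(std::size_t width, std::size_t height) {
  return {2 * ((width + 1) / 2), (height + 1) / 2};
}
```

For a 1920x1080 frame this gives a 1920-byte by 1080-row luma copy and a 1920-byte by 540-row chroma copy. Formats with more than 8 bits per sample would need different byte widths.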

Step 5: Convert Cleanup Operations

// Before (Runtime API)
cudaFreeMipmappedArray(mipmapped_array);
cudaDestroyExternalMemory(external_memory);

// After (Driver API)
cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);

Testing Strategy

Verification Points

Build Success: All type changes compile without errors
Context Consistency: Log context handles to verify single context usage
Surface Creation: Verify surface objects are created successfully
Kernel Execution: Confirm kernels can access surface objects
Frame Output: Validate decoded frames are rendered correctly

Debug Logging

Add diagnostic logs to verify context usage:

CUcontext current_ctx = nullptr;  // stays null if no context is current on this thread
cuCtxGetCurrent(&current_ctx);
LOGF_DEBUG("[Component] Using CUDA context: 0x%llX", (unsigned long long)current_ctx);

Compare context handles across:

NVDEC initialization
External memory import
Surface creation
Kernel launch

All should show the same context value.
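
Comparing logged values by eye becomes tedious once several components are involved. The following sketch, offered as one possible approach rather than project code, collects the handle seen at each stage and reports the first stage that disagrees. ContextAudit and its methods are hypothetical. The handle is passed as an integer so the helper does not depend on the CUDA headers.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the project): records the context handle
// observed at each pipeline stage so mismatches can be detected in one place.
class ContextAudit {
 public:
  // `handle` is the context pointer converted to an integer, e.g.
  // reinterpret_cast<std::uintptr_t>(ctx) after cuCtxGetCurrent(&ctx).
  void Record(std::string stage, std::uintptr_t handle) {
    entries_.emplace_back(std::move(stage), handle);
  }

  // Returns the name of the first stage whose context differs from the first
  // recorded stage, or an empty string when all recorded stages agree.
  std::string FirstMismatch() const {
    for (const auto& entry : entries_) {
      if (entry.second != entries_.front().second) return entry.first;
    }
    return std::string();
  }

 private:
  std::vector<std::pair<std::string, std::uintptr_t>> entries_;
};
```

Each of the four stages listed above would call Record() once. A non-empty result from FirstMismatch() then names the component to investigate.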

Risk Mitigation

Potential Issues

API Signature Differences:
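- Some Driver API functions take their arguments differently (for example, cuMemcpy2D() takes a pointer to a CUDA_MEMCPY2D struct instead of positional arguments)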
- Careful review of CUDA documentation required
Error Handling:
- Runtime API returns cudaError_t
- Driver API returns CUresult
- Update error checking code accordingly
Enum Value Changes:
- Runtime: cudaExternalMemoryHandleTypeD3D12Resource
- Driver: CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE
- Ensure correct enum mapping
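
Because CUDA_SUCCESS is 0 in the Driver API, the updated error checking can be funneled through one helper. The following is a sketch under assumptions: ThrowIfFailed and DRIVER_CHECK are hypothetical names, the helper is templated on the result type so it compiles without <cuda.h>, and it reports only the numeric code. Real code would likely also call cuGetErrorName() to obtain the symbolic name.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the project). Throws when `result` is
// non-zero; 0 is CUDA_SUCCESS in the Driver API. `expr` is the text of the
// call being checked and is included in the message.
template <typename Result>
void ThrowIfFailed(Result result, const char* expr) {
  const int code = static_cast<int>(result);
  if (code != 0) {
    throw std::runtime_error(std::string(expr) + " failed with code " +
                             std::to_string(code));
  }
}

// Wraps a call and passes its source text along for diagnostics, e.g.
//   DRIVER_CHECK(cuImportExternalMemory(&external_memory, &mem_desc));
#define DRIVER_CHECK(expr) ThrowIfFailed((expr), #expr)
```

With this in place, a failing launch would surface as an exception whose message names the call and carries code 400, the numeric value of CUDA_ERROR_INVALID_HANDLE. A project that avoids exceptions could log and return the code instead.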

Rollback Plan

If Driver API unification fails:

Roll back to the current commit with Git
Consider alternative: Create separate Runtime API context and synchronize
Document why unification was not viable

Success Criteria

✅ All Runtime API calls converted to Driver API
✅ Build completes without errors
✅ Single CUDA context used throughout pipeline
✅ Surface objects successfully used in kernels
✅ No CUDA_ERROR_INVALID_HANDLE errors
✅ Frames decode and render correctly

References

NVIDIA CUDA Driver API Documentation
CUDA Runtime API vs Driver API Comparison
NVDEC Programming Guide
Previous commit: 73d9d8d (original Runtime API implementation)
