ened/video-v1

Fork 0

Files

ened f3fc17c796 에러 복구 메커니즘 강화 (슬롯 정리 로직 추가)

2025-10-07 04:03:15 +09:00

10 KiB

Raw Blame History

CUDA Driver API Complete Unification Design

Problem Analysis

Root Cause: Runtime API + Driver API Mixing

Current Issue:

CUDA_ERROR_INVALID_HANDLE occurs when kernel launches
Surface object created successfully but cannot be used in kernel
Context setting doesn't resolve the fundamental problem

Why Mixing APIs Causes Issues:

Different Context Management:
- Runtime API: Automatic context creation/management (implicit primary context)
- Driver API: Explicit context creation/management (requires cuCtxSetCurrent())
- Mixed usage can result in operations happening in different contexts
Handle Incompatibility:
- Runtime API handles: Bound to Runtime API context
- Driver API handles: Bound to Driver API context
- A surface created in Runtime context cannot be used in Driver context
NVDEC Constraint:
- NVDEC uses pure Driver API (cuvidCreateDecoder, cuvidMapVideoFrame, etc.)
- All subsequent CUDA operations must use the same Driver API context
- Mixing Runtime API breaks this constraint

Historical Context: How We Got Here

Original Implementation (commit 73d9d8d):

Fully Runtime API based
cudaSurfaceObject_t, cudaCreateSurfaceObject()
Worked in isolation but failed with NVDEC integration

Partial Fix Attempt:

Changed surface creation to Driver API (cuSurfObjectCreate())
Left external memory import as Runtime API
Result: Still mixing APIs → Still failing

Current State:

// Runtime API (wrong context)
cudaImportExternalMemory(&external_memory, &mem_desc);
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, ...);

// Driver API (NVDEC context)
cuSurfObjectCreate(&surface, &res_desc);
cuLaunchKernel(kernel, ...);

Solution: Complete Driver API Unification

Phase 1: API Conversion (Current Task)

Files to Modify:

ExternalMemoryCache.h - Update type signatures
ExternalMemoryCache.cpp - Convert all Runtime API calls
D3D12SurfaceHandler.cpp - Convert remaining Runtime API calls

Conversion Map:

Runtime API	Driver API	Notes
`cudaExternalMemory_t`	`CUDA_EXTERNAL_MEMORY_HANDLE_DESC` + `CUexternalMemory`	Type change required
`cudaImportExternalMemory()`	`cuImportExternalMemory()`	Direct replacement
`cudaExternalMemoryGetMappedMipmappedArray()`	`cuExternalMemoryGetMappedMipmappedArray()`	Direct replacement
`cudaGetMipmappedArrayLevel()`	`cuMipmappedArrayGetLevel()`	Direct replacement
`cudaMipmappedArray_t`	`CUmipmappedArray`	Type change
`cudaArray_t`	`CUarray`	Type change
`cudaMemcpy2D()`	`cuMemcpy2D()`	Async version: `cuMemcpy2DAsync()`
`cudaDestroyExternalMemory()`	`cuDestroyExternalMemory()`	Direct replacement
`cudaFreeMipmappedArray()`	`cuMipmappedArrayDestroy()`	Direct replacement

Phase 2: Context Verification (After Unification)

After complete Driver API unification:

Verify NVDEC context is properly passed to all components
Ensure cuCtxSetCurrent() is called before Driver API operations
Confirm all operations use the same context

Expected Benefits

✅ Single Context: All operations in NVDEC's Driver API context ✅ Handle Compatibility: All handles created/used in same context ✅ Clear Debugging: Unified API makes issues easier to diagnose ✅ NVDEC Alignment: Matches NVDEC's native API paradigm ✅ Stability: Eliminates context switching issues

Implementation Plan

Step 1: Update Type Definitions

ExternalMemoryCache.h:

// Before (Runtime API types)
struct CachedEntry {
    cudaExternalMemory_t external_memory;
    cudaMipmappedArray_t mipmapped_array;
    // ...
};

// After (Driver API types)
struct CachedEntry {
    CUexternalMemory external_memory;
    CUmipmappedArray mipmapped_array;
    // ...
};

Step 2: Convert External Memory Import

ExternalMemoryCache.cpp - ImportD3D12TextureAsSurface():

// Before (Runtime API)
cudaExternalMemoryHandleDesc mem_desc = {};
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Resource;
// ...
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);

// After (Driver API)
CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {};
mem_desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE;
// ...
CUresult result = cuImportExternalMemory(&external_memory, &mem_desc);

Step 3: Convert Mipmapped Array Operations

// Before (Runtime API)
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cudaGetMipmappedArrayLevel(&array, mipmapped_array, 0);

// After (Driver API)
cuExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cuMipmappedArrayGetLevel(&array, mipmapped_array, 0);

Step 4: Convert Memory Copy Operations

D3D12SurfaceHandler.cpp - CopyYPlane() / CopyUVPlane():

// Before (Runtime API)
cudaError_t err = cudaMemcpy2D(
    (void*)dst, dst_pitch,
    (void*)src, src_pitch,
    width, height,
    cudaMemcpyDeviceToDevice
);

// After (Driver API)
CUresult result = cuMemcpy2D(dst, dst_pitch, src, src_pitch, width, height);
// Or async version:
// CUresult result = cuMemcpy2DAsync(dst, dst_pitch, src, src_pitch, width, height, stream);

Step 5: Convert Cleanup Operations

// Before (Runtime API)
cudaFreeMipmappedArray(mipmapped_array);
cudaDestroyExternalMemory(external_memory);

// After (Driver API)
cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);

Implementation Status

✅ Phase 1: Driver API Unification - COMPLETED

Completed Tasks:

✅ ExternalMemoryCache.cpp - All Runtime API converted to Driver API
✅ D3D12SurfaceHandler.cpp - CopyYPlane/CopyUVPlane converted to Driver API
✅ Type definitions updated (CUexternalMemory, CUmipmappedArray, etc.)
✅ Build successful with Driver API unification

Key Changes:

cudaImportExternalMemory() → cuImportExternalMemory()
cudaExternalMemoryGetMappedMipmappedArray() → cuExternalMemoryGetMappedMipmappedArray()
cudaMemcpy2D() → cuMemcpy2D() with CUDA_MEMCPY2D struct
cudaGetMipmappedArrayLevel() → cuMipmappedArrayGetLevel()

✅ Phase 2: Error Recovery Mechanism - COMPLETED

Implemented Features:

✅ D3D12 Resource Cleanup:
- D3D12SurfaceHandler::ReleaseD3D12Resource() method
- External memory cache release on error
✅ Slot-based Resource Tracking:
- DecodeSlot structure enhanced with d3d12_texture and surface_object fields
- Resources tracked per-slot for proper cleanup
✅ NV12ToRGBAConverter Reset:
- Automatic reset on error to clean state
- Prevents resource accumulation across failed frames

Error Recovery Flow:

if (!copySuccess) {
    // 1. Release D3D12 texture from cache
    m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);

    // 2. Reset converter to clean state
    m_rgbaConverter.reset();
    m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();

    // 3. Clean slot state
    my_slot.d3d12_texture = nullptr;
    my_slot.surface_object = 0;
    my_slot.in_use.store(false);
}

❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS

Current Issue:

Driver API unification completed successfully
Error recovery mechanism working correctly
BUT: CUDA_ERROR_INVALID_HANDLE persists during kernel launch

Verified:

✅ Surface object created successfully (surface=0x13)
✅ Kernel handle valid (kernel=000001DF7D337BC0)
✅ Stream handle valid (stream=000001DF073C08C0)
✅ Source RGBA pointer valid (src_rgba=0x130B200000)
❌ Kernel launch fails with CUDA_ERROR_INVALID_HANDLE

Hypothesis: The issue is NOT related to API mixing (fully unified to Driver API). Possible causes:

Surface object type mismatch in kernel argument passing
CUDA context issue despite Driver API unification
Kernel parameter pointer alignment or type issues

Context Management Analysis

Test Results:

Removed all cuCtxSetCurrent() calls - Error persists
This confirms context was already set correctly by cuCtxCreate()
NVDEC context is active throughout the pipeline

Conclusion: cuCtxSetCurrent() was unnecessary defensive programming. The real issue lies elsewhere.

Testing Strategy

Verification Points

✅ Build Success: All type changes compile without errors
✅ Error Recovery: Resources properly cleaned up on failure
✅ Context Consistency: Single NVDEC context used throughout
❌ Kernel Execution: Still fails with CUDA_ERROR_INVALID_HANDLE
❌ Frame Output: No frames decoded successfully

Debug Logging

Add diagnostic logs to verify context usage:

CUcontext current_ctx;
cuCtxGetCurrent(&current_ctx);
LOGF_DEBUG("[Component] Using CUDA context: 0x%llX", (unsigned long long)current_ctx);

Compare context handles across:

NVDEC initialization
External memory import
Surface creation
Kernel launch

All should show the same context value.

Risk Mitigation

Potential Issues

API Signature Differences:
- Some Driver API functions have different parameter orders
- Careful review of CUDA documentation required
Error Handling:
- Runtime API returns cudaError_t
- Driver API returns CUresult
- Update error checking code accordingly
Enum Value Changes:
- Runtime: cudaExternalMemoryHandleTypeD3D12Resource
- Driver: CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE
- Ensure correct enum mapping

Rollback Plan

If Driver API unification fails:

Git revert to current commit
Consider alternative: Create separate Runtime API context and synchronize
Document why unification was not viable

Success Criteria

Completed ✅

✅ All Runtime API calls converted to Driver API
✅ Build completes without errors
✅ Single CUDA context used throughout pipeline
✅ Error recovery mechanism implemented
✅ Resource cleanup on failure

In Progress ❌

❌ Surface objects successfully used in kernels
❌ No CUDA_ERROR_INVALID_HANDLE errors
❌ Frames decode and render correctly

Next Steps

Investigate kernel parameter passing for surface objects
Verify CUDA Driver API surface object compatibility with D3D12 external memory
Consider alternative approaches if Driver API surface objects incompatible

References

NVIDIA CUDA Driver API Documentation
CUDA Runtime API vs Driver API Comparison
NVDEC Programming Guide
Previous commit: 73d9d8d (original Runtime API implementation)

10 KiB Raw Blame History