Files
video-v1/vav2/docs/working/Driver_API_Unification_Design.md

326 lines
10 KiB
Markdown
Raw Permalink Normal View History

2025-10-07 03:49:32 +09:00
# CUDA Driver API Complete Unification Design
## Problem Analysis
### Root Cause: Runtime API + Driver API Mixing
**Current Issue:**
- `CUDA_ERROR_INVALID_HANDLE` occurs when kernel launches
- Surface object created successfully but cannot be used in kernel
- Context setting doesn't resolve the fundamental problem
**Why Mixing APIs Causes Issues:**
1. **Different Context Management:**
- Runtime API: Automatic context creation/management (implicit primary context)
- Driver API: Explicit context creation/management (requires `cuCtxSetCurrent()`)
- **Mixed usage can result in operations happening in different contexts**
2. **Handle Incompatibility:**
- Runtime API handles: Bound to Runtime API context
- Driver API handles: Bound to Driver API context
- **A surface created in Runtime context cannot be used in Driver context**
3. **NVDEC Constraint:**
- NVDEC uses **pure Driver API** (`cuvidCreateDecoder`, `cuvidMapVideoFrame`, etc.)
- All subsequent CUDA operations must use **the same Driver API context**
- Mixing Runtime API breaks this constraint
### Historical Context: How We Got Here
**Original Implementation (commit 73d9d8d):**
- Fully Runtime API based
- `cudaSurfaceObject_t`, `cudaCreateSurfaceObject()`
- Worked in isolation but failed with NVDEC integration
**Partial Fix Attempt:**
- Changed surface creation to Driver API (`cuSurfObjectCreate()`)
- Left external memory import as Runtime API
- **Result: Still mixing APIs → Still failing**
**Current State:**
```cpp
// Runtime API (wrong context)
cudaImportExternalMemory(&external_memory, &mem_desc);
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, ...);
// Driver API (NVDEC context)
cuSurfObjectCreate(&surface, &res_desc);
cuLaunchKernel(kernel, ...);
```
## Solution: Complete Driver API Unification
### Phase 1: API Conversion (Current Task)
**Files to Modify:**
1. `ExternalMemoryCache.h` - Update type signatures
2. `ExternalMemoryCache.cpp` - Convert all Runtime API calls
3. `D3D12SurfaceHandler.cpp` - Convert remaining Runtime API calls
**Conversion Map:**
| Runtime API | Driver API | Notes |
|-------------|-----------|-------|
| `cudaExternalMemory_t` | `CUDA_EXTERNAL_MEMORY_HANDLE_DESC` + `CUexternalMemory` | Type change required |
| `cudaImportExternalMemory()` | `cuImportExternalMemory()` | Direct replacement |
| `cudaExternalMemoryGetMappedMipmappedArray()` | `cuExternalMemoryGetMappedMipmappedArray()` | Direct replacement |
| `cudaGetMipmappedArrayLevel()` | `cuMipmappedArrayGetLevel()` | Direct replacement |
| `cudaMipmappedArray_t` | `CUmipmappedArray` | Type change |
| `cudaArray_t` | `CUarray` | Type change |
| `cudaMemcpy2D()` | `cuMemcpy2D()` | Async version: `cuMemcpy2DAsync()` |
| `cudaDestroyExternalMemory()` | `cuDestroyExternalMemory()` | Direct replacement |
| `cudaFreeMipmappedArray()` | `cuMipmappedArrayDestroy()` | Direct replacement |
### Phase 2: Context Verification (After Unification)
After complete Driver API unification:
1. Verify NVDEC context is properly passed to all components
2. Ensure `cuCtxSetCurrent()` is called before Driver API operations
3. Confirm all operations use the same context
### Expected Benefits
**Single Context:** All operations in NVDEC's Driver API context
**Handle Compatibility:** All handles created/used in same context
**Clear Debugging:** Unified API makes issues easier to diagnose
**NVDEC Alignment:** Matches NVDEC's native API paradigm
**Stability:** Eliminates context switching issues
## Implementation Plan
### Step 1: Update Type Definitions
**ExternalMemoryCache.h:**
```cpp
// Before (Runtime API types)
struct CachedEntry {
cudaExternalMemory_t external_memory;
cudaMipmappedArray_t mipmapped_array;
// ...
};
// After (Driver API types)
struct CachedEntry {
CUexternalMemory external_memory;
CUmipmappedArray mipmapped_array;
// ...
};
```
### Step 2: Convert External Memory Import
**ExternalMemoryCache.cpp - ImportD3D12TextureAsSurface():**
```cpp
// Before (Runtime API)
cudaExternalMemoryHandleDesc mem_desc = {};
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Resource;
// ...
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);
// After (Driver API)
CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {};
mem_desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE;
// ...
CUresult result = cuImportExternalMemory(&external_memory, &mem_desc);
```
### Step 3: Convert Mipmapped Array Operations
```cpp
// Before (Runtime API)
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cudaGetMipmappedArrayLevel(&array, mipmapped_array, 0);
// After (Driver API)
cuExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cuMipmappedArrayGetLevel(&array, mipmapped_array, 0);
```
### Step 4: Convert Memory Copy Operations
**D3D12SurfaceHandler.cpp - CopyYPlane() / CopyUVPlane():**
```cpp
// Before (Runtime API)
cudaError_t err = cudaMemcpy2D(
(void*)dst, dst_pitch,
(void*)src, src_pitch,
width, height,
cudaMemcpyDeviceToDevice
);
// After (Driver API)
CUresult result = cuMemcpy2D(dst, dst_pitch, src, src_pitch, width, height);
// Or async version:
// CUresult result = cuMemcpy2DAsync(dst, dst_pitch, src, src_pitch, width, height, stream);
```
### Step 5: Convert Cleanup Operations
```cpp
// Before (Runtime API)
cudaFreeMipmappedArray(mipmapped_array);
cudaDestroyExternalMemory(external_memory);
// After (Driver API)
cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);
```
## Implementation Status
### ✅ Phase 1: Driver API Unification - COMPLETED
**Completed Tasks:**
1. ✅ ExternalMemoryCache.cpp - All Runtime API converted to Driver API
2. ✅ D3D12SurfaceHandler.cpp - CopyYPlane/CopyUVPlane converted to Driver API
3. ✅ Type definitions updated (CUexternalMemory, CUmipmappedArray, etc.)
4. ✅ Build successful with Driver API unification
**Key Changes:**
- `cudaImportExternalMemory()``cuImportExternalMemory()`
- `cudaExternalMemoryGetMappedMipmappedArray()``cuExternalMemoryGetMappedMipmappedArray()`
- `cudaMemcpy2D()``cuMemcpy2D()` with CUDA_MEMCPY2D struct
- `cudaGetMipmappedArrayLevel()``cuMipmappedArrayGetLevel()`
### ✅ Phase 2: Error Recovery Mechanism - COMPLETED
**Implemented Features:**
1.**D3D12 Resource Cleanup:**
- `D3D12SurfaceHandler::ReleaseD3D12Resource()` method
- External memory cache release on error
2.**Slot-based Resource Tracking:**
- `DecodeSlot` structure enhanced with `d3d12_texture` and `surface_object` fields
- Resources tracked per-slot for proper cleanup
3.**NV12ToRGBAConverter Reset:**
- Automatic reset on error to clean state
- Prevents resource accumulation across failed frames
**Error Recovery Flow:**
```cpp
if (!copySuccess) {
// 1. Release D3D12 texture from cache
m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
// 2. Reset converter to clean state
m_rgbaConverter.reset();
m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
// 3. Clean slot state
my_slot.d3d12_texture = nullptr;
my_slot.surface_object = 0;
my_slot.in_use.store(false);
}
```
### ❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS
**Current Issue:**
- Driver API unification completed successfully
- Error recovery mechanism working correctly
- **BUT: `CUDA_ERROR_INVALID_HANDLE` persists during kernel launch**
**Verified:**
- ✅ Surface object created successfully (`surface=0x13`)
- ✅ Kernel handle valid (`kernel=000001DF7D337BC0`)
- ✅ Stream handle valid (`stream=000001DF073C08C0`)
- ✅ Source RGBA pointer valid (`src_rgba=0x130B200000`)
- ❌ Kernel launch fails with `CUDA_ERROR_INVALID_HANDLE`
**Hypothesis:**
The issue is NOT related to API mixing (fully unified to Driver API).
Possible causes:
1. Surface object type mismatch in kernel argument passing
2. CUDA context issue despite Driver API unification
3. Kernel parameter pointer alignment or type issues
### Context Management Analysis
**Test Results:**
- Removed all `cuCtxSetCurrent()` calls - Error persists
- This confirms context was already set correctly by `cuCtxCreate()`
- NVDEC context is active throughout the pipeline
**Conclusion:**
`cuCtxSetCurrent()` was unnecessary defensive programming. The real issue lies elsewhere.
2025-10-07 03:49:32 +09:00
## Testing Strategy
### Verification Points
1.**Build Success:** All type changes compile without errors
2.**Error Recovery:** Resources properly cleaned up on failure
3.**Context Consistency:** Single NVDEC context used throughout
4.**Kernel Execution:** Still fails with CUDA_ERROR_INVALID_HANDLE
5.**Frame Output:** No frames decoded successfully
2025-10-07 03:49:32 +09:00
### Debug Logging
Add diagnostic logs to verify context usage:
```cpp
CUcontext current_ctx;
cuCtxGetCurrent(&current_ctx);
LOGF_DEBUG("[Component] Using CUDA context: 0x%llX", (unsigned long long)current_ctx);
```
Compare context handles across:
- NVDEC initialization
- External memory import
- Surface creation
- Kernel launch
All should show **the same context value**.
## Risk Mitigation
### Potential Issues
1. **API Signature Differences:**
- Some Driver API functions have different parameter orders
- Careful review of CUDA documentation required
2. **Error Handling:**
- Runtime API returns `cudaError_t`
- Driver API returns `CUresult`
- Update error checking code accordingly
3. **Enum Value Changes:**
- Runtime: `cudaExternalMemoryHandleTypeD3D12Resource`
- Driver: `CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE`
- Ensure correct enum mapping
### Rollback Plan
If Driver API unification fails:
- Git revert to current commit
- Consider alternative: Create separate Runtime API context and synchronize
- Document why unification was not viable
## Success Criteria
### Completed ✅
- ✅ All Runtime API calls converted to Driver API
- ✅ Build completes without errors
- ✅ Single CUDA context used throughout pipeline
- ✅ Error recovery mechanism implemented
- ✅ Resource cleanup on failure
### In Progress ❌
- ❌ Surface objects successfully used in kernels
- ❌ No `CUDA_ERROR_INVALID_HANDLE` errors
- ❌ Frames decode and render correctly
### Next Steps
1. Investigate kernel parameter passing for surface objects
2. Verify CUDA Driver API surface object compatibility with D3D12 external memory
3. Consider alternative approaches if Driver API surface objects incompatible
2025-10-07 03:49:32 +09:00
## References
- NVIDIA CUDA Driver API Documentation
- CUDA Runtime API vs Driver API Comparison
- NVDEC Programming Guide
- Previous commit: 73d9d8d (original Runtime API implementation)