Files
video-v1/vav2/docs/working/Driver_API_Unification_Design.md

237 lines
7.4 KiB
Markdown
Raw Normal View History

2025-10-07 03:49:32 +09:00
# CUDA Driver API Complete Unification Design
## Problem Analysis
### Root Cause: Runtime API + Driver API Mixing
**Current Issue:**
- `CUDA_ERROR_INVALID_HANDLE` occurs when kernel launches
- Surface object created successfully but cannot be used in kernel
- Context setting doesn't resolve the fundamental problem
**Why Mixing APIs Causes Issues:**
1. **Different Context Management:**
- Runtime API: Automatic context creation/management (implicit primary context)
- Driver API: Explicit context creation/management (requires `cuCtxSetCurrent()`)
- **Mixed usage can result in operations happening in different contexts**
2. **Handle Incompatibility:**
- Runtime API handles: Bound to Runtime API context
- Driver API handles: Bound to Driver API context
- **A surface created in Runtime context cannot be used in Driver context**
3. **NVDEC Constraint:**
- NVDEC uses **pure Driver API** (`cuvidCreateDecoder`, `cuvidMapVideoFrame`, etc.)
- All subsequent CUDA operations must use **the same Driver API context**
- Mixing Runtime API breaks this constraint
### Historical Context: How We Got Here
**Original Implementation (commit 73d9d8d):**
- Fully Runtime API based
- `cudaSurfaceObject_t`, `cudaCreateSurfaceObject()`
- Worked in isolation but failed with NVDEC integration
**Partial Fix Attempt:**
- Changed surface creation to Driver API (`cuSurfObjectCreate()`)
- Left external memory import as Runtime API
- **Result: Still mixing APIs → Still failing**
**Current State:**
```cpp
// Runtime API (wrong context)
cudaImportExternalMemory(&external_memory, &mem_desc);
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, ...);
// Driver API (NVDEC context)
cuSurfObjectCreate(&surface, &res_desc);
cuLaunchKernel(kernel, ...);
```
## Solution: Complete Driver API Unification
### Phase 1: API Conversion (Current Task)
**Files to Modify:**
1. `ExternalMemoryCache.h` - Update type signatures
2. `ExternalMemoryCache.cpp` - Convert all Runtime API calls
3. `D3D12SurfaceHandler.cpp` - Convert remaining Runtime API calls
**Conversion Map:**
| Runtime API | Driver API | Notes |
|-------------|-----------|-------|
| `cudaExternalMemory_t` | `CUDA_EXTERNAL_MEMORY_HANDLE_DESC` + `CUexternalMemory` | Type change required |
| `cudaImportExternalMemory()` | `cuImportExternalMemory()` | Direct replacement |
| `cudaExternalMemoryGetMappedMipmappedArray()` | `cuExternalMemoryGetMappedMipmappedArray()` | Direct replacement |
| `cudaGetMipmappedArrayLevel()` | `cuMipmappedArrayGetLevel()` | Direct replacement |
| `cudaMipmappedArray_t` | `CUmipmappedArray` | Type change |
| `cudaArray_t` | `CUarray` | Type change |
| `cudaMemcpy2D()` | `cuMemcpy2D()` | Async version: `cuMemcpy2DAsync()` |
| `cudaDestroyExternalMemory()` | `cuDestroyExternalMemory()` | Direct replacement |
| `cudaFreeMipmappedArray()` | `cuMipmappedArrayDestroy()` | Direct replacement |
### Phase 2: Context Verification (After Unification)
After complete Driver API unification:
1. Verify NVDEC context is properly passed to all components
2. Ensure `cuCtxSetCurrent()` is called before Driver API operations
3. Confirm all operations use the same context
### Expected Benefits
**Single Context:** All operations in NVDEC's Driver API context
**Handle Compatibility:** All handles created/used in same context
**Clear Debugging:** Unified API makes issues easier to diagnose
**NVDEC Alignment:** Matches NVDEC's native API paradigm
**Stability:** Eliminates context switching issues
## Implementation Plan
### Step 1: Update Type Definitions
**ExternalMemoryCache.h:**
```cpp
// Before (Runtime API types)
struct CachedEntry {
cudaExternalMemory_t external_memory;
cudaMipmappedArray_t mipmapped_array;
// ...
};
// After (Driver API types)
struct CachedEntry {
CUexternalMemory external_memory;
CUmipmappedArray mipmapped_array;
// ...
};
```
### Step 2: Convert External Memory Import
**ExternalMemoryCache.cpp - ImportD3D12TextureAsSurface():**
```cpp
// Before (Runtime API)
cudaExternalMemoryHandleDesc mem_desc = {};
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Resource;
// ...
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);
// After (Driver API)
CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {};
mem_desc.type = CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE;
// ...
CUresult result = cuImportExternalMemory(&external_memory, &mem_desc);
```
### Step 3: Convert Mipmapped Array Operations
```cpp
// Before (Runtime API)
cudaExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cudaGetMipmappedArrayLevel(&array, mipmapped_array, 0);
// After (Driver API)
cuExternalMemoryGetMappedMipmappedArray(&mipmapped_array, external_memory, &mipmap_desc);
cuMipmappedArrayGetLevel(&array, mipmapped_array, 0);
```
### Step 4: Convert Memory Copy Operations
**D3D12SurfaceHandler.cpp - CopyYPlane() / CopyUVPlane():**
```cpp
// Before (Runtime API)
cudaError_t err = cudaMemcpy2D(
(void*)dst, dst_pitch,
(void*)src, src_pitch,
width, height,
cudaMemcpyDeviceToDevice
);
// After (Driver API)
CUresult result = cuMemcpy2D(dst, dst_pitch, src, src_pitch, width, height);
// Or async version:
// CUresult result = cuMemcpy2DAsync(dst, dst_pitch, src, src_pitch, width, height, stream);
```
### Step 5: Convert Cleanup Operations
```cpp
// Before (Runtime API)
cudaFreeMipmappedArray(mipmapped_array);
cudaDestroyExternalMemory(external_memory);
// After (Driver API)
cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);
```
## Testing Strategy
### Verification Points
1. **Build Success:** All type changes compile without errors
2. **Context Consistency:** Log context handles to verify single context usage
3. **Surface Creation:** Verify surface objects are created successfully
4. **Kernel Execution:** Confirm kernels can access surface objects
5. **Frame Output:** Validate decoded frames are rendered correctly
### Debug Logging
Add diagnostic logs to verify context usage:
```cpp
CUcontext current_ctx;
cuCtxGetCurrent(&current_ctx);
LOGF_DEBUG("[Component] Using CUDA context: 0x%llX", (unsigned long long)current_ctx);
```
Compare context handles across:
- NVDEC initialization
- External memory import
- Surface creation
- Kernel launch
All should show **the same context value**.
## Risk Mitigation
### Potential Issues
1. **API Signature Differences:**
- Some Driver API functions have different parameter orders
- Careful review of CUDA documentation required
2. **Error Handling:**
- Runtime API returns `cudaError_t`
- Driver API returns `CUresult`
- Update error checking code accordingly
3. **Enum Value Changes:**
- Runtime: `cudaExternalMemoryHandleTypeD3D12Resource`
- Driver: `CU_EXTERNAL_MEMORY_HANDLE_TYPE_D3D12_RESOURCE`
- Ensure correct enum mapping
### Rollback Plan
If Driver API unification fails:
- Git revert to current commit
- Consider alternative: Create separate Runtime API context and synchronize
- Document why unification was not viable
## Success Criteria
✅ All Runtime API calls converted to Driver API
✅ Build completes without errors
✅ Single CUDA context used throughout pipeline
✅ Surface objects successfully used in kernels
✅ No `CUDA_ERROR_INVALID_HANDLE` errors
✅ Frames decode and render correctly
## References
- NVIDIA CUDA Driver API Documentation
- CUDA Runtime API vs Driver API Comparison
- NVDEC Programming Guide
- Previous commit: 73d9d8d (original Runtime API implementation)