Strengthen error recovery mechanism (add slot cleanup logic)
@@ -167,15 +167,94 @@ cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);
```

## Implementation Status

### ✅ Phase 1: Driver API Unification - COMPLETED

**Completed Tasks:**
1. ✅ `ExternalMemoryCache.cpp`: all Runtime API calls converted to the Driver API
2. ✅ `D3D12SurfaceHandler.cpp`: `CopyYPlane`/`CopyUVPlane` converted to the Driver API
3. ✅ Type definitions updated (`CUexternalMemory`, `CUmipmappedArray`, etc.)
4. ✅ Build succeeds with the Driver API unification

**Key Changes:**
- `cudaImportExternalMemory()` → `cuImportExternalMemory()`
- `cudaExternalMemoryGetMappedMipmappedArray()` → `cuExternalMemoryGetMappedMipmappedArray()`
- `cudaMemcpy2D()` → `cuMemcpy2D()` with a `CUDA_MEMCPY2D` struct
- `cudaGetMipmappedArrayLevel()` → `cuMipmappedArrayGetLevel()`
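
As a rough illustration of what the `CUDA_MEMCPY2D` parameters describe, the sketch below performs the same row-by-row pitched copy on the host in plain C++. It does not call CUDA. `Copy2DParams`, `copy2d_host`, and `copy_nv12_host` are hypothetical names that only mirror the `srcPitch`, `dstPitch`, `WidthInBytes`, and `Height` fields, and the NV12 layout (UV plane starting directly after `pitch * height` bytes of Y) is an assumption, not taken from `D3D12SurfaceHandler.cpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side mirror of the CUDA_MEMCPY2D fields used for a plane copy.
struct Copy2DParams {
    const std::uint8_t* src;
    std::size_t srcPitch;      // bytes between the starts of consecutive source rows
    std::uint8_t* dst;
    std::size_t dstPitch;      // bytes between the starts of consecutive destination rows
    std::size_t widthInBytes;  // bytes actually copied per row
    std::size_t height;        // number of rows
};

// Copies `height` rows of `widthInBytes` bytes, honoring both pitches.
// Returns false when a pitch is smaller than the row width.
inline bool copy2d_host(const Copy2DParams& p) {
    assert(p.src != nullptr && p.dst != nullptr);
    if (p.srcPitch < p.widthInBytes || p.dstPitch < p.widthInBytes) return false;
    for (std::size_t row = 0; row < p.height; ++row) {
        std::memcpy(p.dst + row * p.dstPitch, p.src + row * p.srcPitch, p.widthInBytes);
    }
    return true;
}

// NV12 (assumed layout): a width x height Y plane followed immediately by a
// width x (height / 2) interleaved UV plane, both using the same pitch.
inline bool copy_nv12_host(const std::uint8_t* src, std::size_t srcPitch,
                           std::uint8_t* dst, std::size_t dstPitch,
                           std::size_t width, std::size_t height) {
    const Copy2DParams y{src, srcPitch, dst, dstPitch, width, height};
    const Copy2DParams uv{src + srcPitch * height, srcPitch,
                          dst + dstPitch * height, dstPitch, width, height / 2};
    return copy2d_host(y) && copy2d_host(uv);
}
```

The point of the sketch is that the source and destination pitches are independent of each other and of the copied row width, which is the part that is easy to get wrong when moving from `cudaMemcpy2D()` arguments to struct fields.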

### ✅ Phase 2: Error Recovery Mechanism - COMPLETED

**Implemented Features:**
1. ✅ **D3D12 Resource Cleanup:**
   - `D3D12SurfaceHandler::ReleaseD3D12Resource()` method
   - External memory cache release on error

2. ✅ **Slot-based Resource Tracking:**
   - `DecodeSlot` structure enhanced with `d3d12_texture` and `surface_object` fields
   - Resources tracked per-slot for proper cleanup

3. ✅ **NV12ToRGBAConverter Reset:**
   - Automatic reset on error to clean state
   - Prevents resource accumulation across failed frames

**Error Recovery Flow:**
```cpp
if (!copySuccess) {
    // 1. Release D3D12 texture from cache
    m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);

    // 2. Reset converter to clean state
    m_rgbaConverter.reset();
    m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();

    // 3. Clean slot state
    my_slot.d3d12_texture = nullptr;
    my_slot.surface_object = 0;
    my_slot.in_use.store(false);
}
```
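
The recovery steps above can be exercised in isolation with stand-in types. In this sketch `FakeHandler`, `FakeConverter`, and `recover_slot` are hypothetical. Only the `DecodeSlot` field names (`d3d12_texture`, `surface_object`, `in_use`) come from the text, and their types here are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

// Stand-in for the real slot; field names follow the document, types are assumed.
struct DecodeSlot {
    void* d3d12_texture = nullptr;     // a D3D12 resource pointer in the real code (assumed)
    std::uint64_t surface_object = 0;  // 64-bit surface handle (assumed)
    std::atomic<bool> in_use{false};
};

// Hypothetical handler that only counts releases of non-null textures.
struct FakeHandler {
    int released = 0;
    void ReleaseD3D12Resource(void* texture) {
        if (texture != nullptr) ++released;
    }
};

// Hypothetical converter; a freshly constructed instance is in a clean state.
struct FakeConverter {
    bool dirty = false;
};

// Mirrors the three recovery steps: release, reset converter, clear slot.
inline void recover_slot(FakeHandler& handler,
                         std::unique_ptr<FakeConverter>& converter,
                         DecodeSlot& slot) {
    assert(slot.in_use.load());                        // recovery targets an active slot
    handler.ReleaseD3D12Resource(slot.d3d12_texture);  // 1. drop the cached texture
    converter.reset();                                 // 2. destroy the old state first,
    converter = std::make_unique<FakeConverter>();     //    then build a fresh converter
    slot.d3d12_texture = nullptr;                      // 3. clear slot bookkeeping
    slot.surface_object = 0;
    slot.in_use.store(false);                          //    mark the slot reusable last
}
```

Clearing `in_use` last is a deliberate choice in this sketch: a thread that polls the flag should not observe a free slot that still carries stale handles.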

### ❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS

**Current Issue:**
- Driver API unification completed successfully
- Error recovery mechanism working correctly
- **BUT: `CUDA_ERROR_INVALID_HANDLE` persists during kernel launch**

**Verified:**
- ✅ Surface object created successfully (`surface=0x13`)
- ✅ Kernel handle valid (`kernel=000001DF7D337BC0`)
- ✅ Stream handle valid (`stream=000001DF073C08C0`)
- ✅ Source RGBA pointer valid (`src_rgba=0x130B200000`)
- ❌ Kernel launch fails with `CUDA_ERROR_INVALID_HANDLE`

**Hypothesis:**
The issue does not appear to be related to API mixing, since the code is now fully unified on the Driver API.

Possible causes:
1. Surface object type mismatch in kernel argument passing
2. CUDA context issue despite Driver API unification
3. Kernel parameter pointer alignment or type issues
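
For causes 1 and 3, it may help to check how arguments are packed for `cuLaunchKernel()`, which takes an array of pointers, one per kernel parameter, each pointing at storage of exactly that parameter's type. The sketch below uses local aliases so it builds without `cuda.h`; in `cuda.h`, `CUsurfObject` is a 64-bit unsigned integer, as is `CUdeviceptr` on 64-bit platforms. The kernel signature (`surface, src_rgba, width, height`), `KernelArgs`, and `read_u64` are assumptions for illustration, not the project's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local stand-ins so the sketch builds without cuda.h.
using SurfObj = unsigned long long;    // stands in for CUsurfObject
using DevicePtr = unsigned long long;  // stands in for CUdeviceptr (64-bit platforms)

static_assert(sizeof(SurfObj) == 8, "surface handle must be passed as 64 bits");

// Hypothetical kernel signature: (surface, src_rgba, width, height).
struct KernelArgs {
    SurfObj surface;
    DevicePtr src_rgba;
    std::uint32_t width;
    std::uint32_t height;

    // Builds the array shape expected as kernelParams: one pointer per
    // parameter, each pointing at the argument's own storage (not its value).
    std::array<void*, 4> params() {
        return {&surface, &src_rgba, &width, &height};
    }
};

// Reads a 64-bit argument back through the params array, the way the
// launch call would dereference it.
inline unsigned long long read_u64(void* const* params, std::size_t index) {
    assert(params[index] != nullptr);
    return *static_cast<const unsigned long long*>(params[index]);
}
```

With the real API, the `data()` pointer of the array returned by `params()` would be passed as the `kernelParams` argument, and both the array and the `KernelArgs` object would need to stay alive until `cuLaunchKernel()` returns. Declaring the kernel parameter with a narrower type than the host-side argument, or passing the value itself instead of its address, are the kinds of mismatch this layout check is meant to surface.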

### Context Management Analysis

**Test Results:**
- Removed all `cuCtxSetCurrent()` calls: the error persists
- This suggests the context was already made current by `cuCtxCreate()`
- NVDEC context is active throughout the pipeline

**Conclusion:**
`cuCtxSetCurrent()` was unnecessary defensive programming. The real issue lies elsewhere.

## Testing Strategy

### Verification Points

1. ✅ **Build Success:** All type changes compile without errors
2. ✅ **Error Recovery:** Resources properly cleaned up on failure
3. ✅ **Context Consistency:** Single NVDEC context used throughout
4. ❌ **Kernel Execution:** Still fails with `CUDA_ERROR_INVALID_HANDLE`
5. ❌ **Frame Output:** No frames decoded successfully

### Debug Logging

@@ -221,12 +300,22 @@ If Driver API unification fails:

## Success Criteria

### Completed ✅
- ✅ All Runtime API calls converted to Driver API
- ✅ Build completes without errors
- ✅ Single CUDA context used throughout pipeline
- ✅ Error recovery mechanism implemented
- ✅ Resource cleanup on failure

### In Progress ❌
- ❌ Surface objects successfully used in kernels
- ❌ No `CUDA_ERROR_INVALID_HANDLE` errors
- ❌ Frames decode and render correctly

### Next Steps
1. Investigate kernel parameter passing for surface objects
2. Verify that CUDA Driver API surface objects are compatible with D3D12 external memory
3. Consider alternative approaches if Driver API surface objects prove incompatible

## References