Strengthen error recovery mechanism (add slot cleanup logic)

2025-10-07 04:03:15 +09:00
parent 23e7956375
commit f3fc17c796
5 changed files with 154 additions and 14 deletions

@@ -167,15 +167,94 @@ cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);
```
## Implementation Status
### ✅ Phase 1: Driver API Unification - COMPLETED
**Completed Tasks:**
1. ✅ ExternalMemoryCache.cpp - All Runtime API converted to Driver API
2. ✅ D3D12SurfaceHandler.cpp - CopyYPlane/CopyUVPlane converted to Driver API
3. ✅ Type definitions updated (CUexternalMemory, CUmipmappedArray, etc.)
4. ✅ Build successful with Driver API unification
**Key Changes:**
- `cudaImportExternalMemory()` → `cuImportExternalMemory()`
- `cudaExternalMemoryGetMappedMipmappedArray()` → `cuExternalMemoryGetMappedMipmappedArray()`
- `cudaMemcpy2D()` → `cuMemcpy2D()` with a `CUDA_MEMCPY2D` struct
- `cudaGetMipmappedArrayLevel()` → `cuMipmappedArrayGetLevel()`
### ✅ Phase 2: Error Recovery Mechanism - COMPLETED
**Implemented Features:**
1. **D3D12 Resource Cleanup:**
   - `D3D12SurfaceHandler::ReleaseD3D12Resource()` method
   - External memory cache release on error
2. **Slot-based Resource Tracking:**
   - `DecodeSlot` structure enhanced with `d3d12_texture` and `surface_object` fields
   - Resources tracked per-slot for proper cleanup
3. **NV12ToRGBAConverter Reset:**
   - Automatic reset on error to clean state
   - Prevents resource accumulation across failed frames
**Error Recovery Flow:**
```cpp
if (!copySuccess) {
    // 1. Release D3D12 texture from cache
    m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);

    // 2. Reset converter to clean state
    m_rgbaConverter.reset();
    m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();

    // 3. Clean slot state
    my_slot.d3d12_texture = nullptr;
    my_slot.surface_object = 0;
    my_slot.in_use.store(false);
}
```
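
The same three-step cleanup can also be tied to scope exit, so that an early `return` on a new failure path cannot skip it. The following is a minimal sketch under stated assumptions, not the code in this commit: `SlotCleanupGuard` and `ProcessFrame` are hypothetical names, `DecodeSlot` is reduced to the three fields mentioned above, and the release callback stands in for `ReleaseD3D12Resource()` (the converter reset would go in the same callback).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for DecodeSlot; only the fields named in this
// document are modeled.
struct DecodeSlot {
    void* d3d12_texture = nullptr;     // ID3D12Resource* in the real code
    std::uint64_t surface_object = 0;  // CUsurfObject in the real code
    std::atomic<bool> in_use{false};
};

// Cleans the slot on scope exit unless Dismiss() was called first.
class SlotCleanupGuard {
public:
    SlotCleanupGuard(DecodeSlot& slot, std::function<void(void*)> release_texture)
        : slot_(slot), release_texture_(std::move(release_texture)) {}

    SlotCleanupGuard(const SlotCleanupGuard&) = delete;
    SlotCleanupGuard& operator=(const SlotCleanupGuard&) = delete;

    ~SlotCleanupGuard() {
        if (dismissed_) return;
        // 1. Release the texture through the caller-supplied callback.
        if (slot_.d3d12_texture != nullptr && release_texture_) {
            release_texture_(slot_.d3d12_texture);
        }
        // 2. Return the slot to its empty state.
        slot_.d3d12_texture = nullptr;
        slot_.surface_object = 0;
        slot_.in_use.store(false);
    }

    // Call on the success path to keep the slot's resources alive.
    void Dismiss() { dismissed_ = true; }

private:
    DecodeSlot& slot_;
    std::function<void(void*)> release_texture_;
    bool dismissed_ = false;
};

// Hypothetical driver that simulates one frame copy and counts releases.
bool ProcessFrame(DecodeSlot& slot, bool copy_success, int& release_calls) {
    slot.in_use.store(true);
    SlotCleanupGuard guard(slot, [&release_calls](void*) { ++release_calls; });
    if (!copy_success) {
        return false;  // guard destructor cleans the slot
    }
    guard.Dismiss();   // success: slot stays in use
    return true;
}
```

With a guard like this, every failure path added later inherits the cleanup without repeating the three steps by hand.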
### ❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS
**Current Issue:**
- Driver API unification completed successfully
- Error recovery mechanism working correctly
- **BUT: `CUDA_ERROR_INVALID_HANDLE` persists during kernel launch**
**Verified:**
- ✅ Surface object created successfully (`surface=0x13`)
- ✅ Kernel handle valid (`kernel=000001DF7D337BC0`)
- ✅ Stream handle valid (`stream=000001DF073C08C0`)
- ✅ Source RGBA pointer valid (`src_rgba=0x130B200000`)
- ❌ Kernel launch fails with `CUDA_ERROR_INVALID_HANDLE`
**Hypothesis:**
The issue is NOT related to API mixing (fully unified to Driver API).
Possible causes:
1. Surface object type mismatch in kernel argument passing
2. CUDA context issue despite Driver API unification
3. Kernel parameter pointer alignment or type issues
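
To make causes 1 and 3 concrete: `cuLaunchKernel()` receives its arguments as a `void**` array in which each entry points to storage holding one argument, in declaration order and with the exact size the kernel expects. A surface object is a 64-bit value, so passing it through a 32-bit variable, or passing the value itself where a pointer to it is expected, would corrupt the argument. The sketch below only illustrates the packing and is not the code under investigation: the kernel signature, `CopyKernelArgs`, and `MakeKernelParams` are assumptions, and the CUDA types are replaced with stand-in aliases so the example builds without the toolkit.

```cpp
#include <array>
#include <cassert>

// Stand-ins so the sketch builds without cuda.h. On 64-bit platforms cuda.h
// declares both CUsurfObject and CUdeviceptr as unsigned long long.
using SurfObject = unsigned long long;
using DevicePtr = unsigned long long;

static_assert(sizeof(SurfObject) == 8, "surface objects are 64-bit handles");
static_assert(sizeof(DevicePtr) == 8, "device pointers are 64-bit here");

// Hypothetical argument set for a kernel assumed to be declared as:
//   __global__ void CopyToSurface(cudaSurfaceObject_t dst,
//                                 const unsigned char* src_rgba,
//                                 unsigned int width, unsigned int height);
struct CopyKernelArgs {
    SurfObject surface;
    DevicePtr src_rgba;
    unsigned int width;
    unsigned int height;
};

// Builds the pointer array: one entry per kernel parameter, in declaration
// order. `args` must stay alive until the launch call has returned.
std::array<void*, 4> MakeKernelParams(CopyKernelArgs& args) {
    return {&args.surface, &args.src_rgba, &args.width, &args.height};
}
```

In a real launch, `params.data()` would be passed as the `kernelParams` argument of `cuLaunchKernel()`. Comparing each entry's type against the kernel's declared parameters is a cheap way to confirm or rule out these two hypotheses.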
### Context Management Analysis
**Test Results:**
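- Removing all `cuCtxSetCurrent()` calls changed nothing: the error persists
- This indicates the calls were redundant, since `cuCtxCreate()` already makes the new context current on the calling thread
- NVDEC context is active throughout the pipeline
**Conclusion:**
The `cuCtxSetCurrent()` calls were redundant defensive programming. Removing them neither fixed nor worsened the failure, so the cause of `CUDA_ERROR_INVALID_HANDLE` lies elsewhere.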
## Testing Strategy
### Verification Points
1. **Build Success:** All type changes compile without errors
2. **Context Consistency:** Log context handles to verify single context usage
3. **Surface Creation:** Verify surface objects are created successfully
4. **Kernel Execution:** Confirm kernels can access surface objects
5. **Frame Output:** Validate decoded frames are rendered correctly
1. **Build Success:** All type changes compile without errors
2. **Error Recovery:** Resources properly cleaned up on failure
3. **Context Consistency:** Single NVDEC context used throughout
4. **Kernel Execution:** Still fails with `CUDA_ERROR_INVALID_HANDLE`
5. **Frame Output:** No frames decoded successfully
### Debug Logging
@@ -221,12 +300,22 @@ If Driver API unification fails:
## Success Criteria
✅ All Runtime API calls converted to Driver API
✅ Build completes without errors
✅ Single CUDA context used throughout pipeline
✅ Surface objects successfully used in kernels
✅ No `CUDA_ERROR_INVALID_HANDLE` errors
✅ Frames decode and render correctly
### Completed ✅
- ✅ All Runtime API calls converted to Driver API
- ✅ Build completes without errors
- ✅ Single CUDA context used throughout pipeline
- ✅ Error recovery mechanism implemented
- ✅ Resource cleanup on failure
### In Progress ❌
- ❌ Surface objects successfully used in kernels
- ❌ No `CUDA_ERROR_INVALID_HANDLE` errors
- ❌ Frames decode and render correctly
### Next Steps
1. Investigate kernel parameter passing for surface objects
2. Verify CUDA Driver API surface object compatibility with D3D12 external memory
3. Consider alternative approaches if Driver API surface objects prove incompatible
## References