에러 복구 메커니즘 강화 (슬롯 정리 로직 추가)
This commit is contained in:
@@ -167,15 +167,94 @@ cuMipmappedArrayDestroy(mipmapped_array);
|
||||
cuDestroyExternalMemory(external_memory);
|
||||
```
|
||||
|
||||
## Implementation Status
|
||||
|
||||
### ✅ Phase 1: Driver API Unification - COMPLETED
|
||||
|
||||
**Completed Tasks:**
|
||||
1. ✅ ExternalMemoryCache.cpp - All Runtime API converted to Driver API
|
||||
2. ✅ D3D12SurfaceHandler.cpp - CopyYPlane/CopyUVPlane converted to Driver API
|
||||
3. ✅ Type definitions updated (CUexternalMemory, CUmipmappedArray, etc.)
|
||||
4. ✅ Build successful with Driver API unification
|
||||
|
||||
**Key Changes:**
|
||||
- `cudaImportExternalMemory()` → `cuImportExternalMemory()`
|
||||
- `cudaExternalMemoryGetMappedMipmappedArray()` → `cuExternalMemoryGetMappedMipmappedArray()`
|
||||
- `cudaMemcpy2D()` → `cuMemcpy2D()` with CUDA_MEMCPY2D struct
|
||||
- `cudaGetMipmappedArrayLevel()` → `cuMipmappedArrayGetLevel()`
|
||||
|
||||
### ✅ Phase 2: Error Recovery Mechanism - COMPLETED
|
||||
|
||||
**Implemented Features:**
|
||||
1. ✅ **D3D12 Resource Cleanup:**
|
||||
- `D3D12SurfaceHandler::ReleaseD3D12Resource()` method
|
||||
- External memory cache release on error
|
||||
|
||||
2. ✅ **Slot-based Resource Tracking:**
|
||||
- `DecodeSlot` structure enhanced with `d3d12_texture` and `surface_object` fields
|
||||
- Resources tracked per-slot for proper cleanup
|
||||
|
||||
3. ✅ **NV12ToRGBAConverter Reset:**
|
||||
- Automatic reset on error to clean state
|
||||
- Prevents resource accumulation across failed frames
|
||||
|
||||
**Error Recovery Flow:**
|
||||
```cpp
|
||||
if (!copySuccess) {
|
||||
// 1. Release D3D12 texture from cache
|
||||
m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
|
||||
|
||||
// 2. Reset converter to clean state
|
||||
m_rgbaConverter.reset();
|
||||
m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
|
||||
|
||||
// 3. Clean slot state
|
||||
my_slot.d3d12_texture = nullptr;
|
||||
my_slot.surface_object = 0;
|
||||
my_slot.in_use.store(false);
|
||||
}
|
||||
```
|
||||
|
||||
### ❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS
|
||||
|
||||
**Current Issue:**
|
||||
- Driver API unification completed successfully
|
||||
- Error recovery mechanism working correctly
|
||||
- **BUT: `CUDA_ERROR_INVALID_HANDLE` persists during kernel launch**
|
||||
|
||||
**Verified:**
|
||||
- ✅ Surface object created successfully (`surface=0x13`)
|
||||
- ✅ Kernel handle valid (`kernel=000001DF7D337BC0`)
|
||||
- ✅ Stream handle valid (`stream=000001DF073C08C0`)
|
||||
- ✅ Source RGBA pointer valid (`src_rgba=0x130B200000`)
|
||||
- ❌ Kernel launch fails with `CUDA_ERROR_INVALID_HANDLE`
|
||||
|
||||
**Hypothesis:**
|
||||
The issue is NOT related to API mixing (fully unified to Driver API).
|
||||
Possible causes:
|
||||
1. Surface object type mismatch in kernel argument passing
|
||||
2. CUDA context issue despite Driver API unification
|
||||
3. Kernel parameter pointer alignment or type issues
|
||||
|
||||
### Context Management Analysis
|
||||
|
||||
**Test Results:**
|
||||
- Removed all `cuCtxSetCurrent()` calls - Error persists
|
||||
- This confirms context was already set correctly by `cuCtxCreate()`
|
||||
- NVDEC context is active throughout the pipeline
|
||||
|
||||
**Conclusion:**
|
||||
`cuCtxSetCurrent()` was unnecessary defensive programming. The real issue lies elsewhere.
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Verification Points
|
||||
|
||||
1. **Build Success:** All type changes compile without errors
|
||||
2. **Context Consistency:** Log context handles to verify single context usage
|
||||
3. **Surface Creation:** Verify surface objects are created successfully
|
||||
4. **Kernel Execution:** Confirm kernels can access surface objects
|
||||
5. **Frame Output:** Validate decoded frames are rendered correctly
|
||||
1. ✅ **Build Success:** All type changes compile without errors
|
||||
2. ✅ **Error Recovery:** Resources properly cleaned up on failure
|
||||
3. ✅ **Context Consistency:** Single NVDEC context used throughout
|
||||
4. ❌ **Kernel Execution:** Still fails with CUDA_ERROR_INVALID_HANDLE
|
||||
5. ❌ **Frame Output:** No frames decoded successfully
|
||||
|
||||
### Debug Logging
|
||||
|
||||
@@ -221,12 +300,22 @@ If Driver API unification fails:
|
||||
|
||||
## Success Criteria
|
||||
|
||||
✅ All Runtime API calls converted to Driver API
|
||||
✅ Build completes without errors
|
||||
✅ Single CUDA context used throughout pipeline
|
||||
✅ Surface objects successfully used in kernels
|
||||
✅ No `CUDA_ERROR_INVALID_HANDLE` errors
|
||||
✅ Frames decode and render correctly
|
||||
### Completed ✅
|
||||
- ✅ All Runtime API calls converted to Driver API
|
||||
- ✅ Build completes without errors
|
||||
- ✅ Single CUDA context used throughout pipeline
|
||||
- ✅ Error recovery mechanism implemented
|
||||
- ✅ Resource cleanup on failure
|
||||
|
||||
### In Progress ❌
|
||||
- ❌ Surface objects successfully used in kernels
|
||||
- ❌ No `CUDA_ERROR_INVALID_HANDLE` errors
|
||||
- ❌ Frames decode and render correctly
|
||||
|
||||
### Next Steps
|
||||
1. Investigate kernel parameter passing for surface objects
|
||||
2. Verify CUDA Driver API surface object compatibility with D3D12 external memory
|
||||
3. Consider alternative approaches if Driver API surface objects incompatible
|
||||
|
||||
## References
|
||||
|
||||
|
||||
@@ -188,6 +188,16 @@ bool D3D12SurfaceHandler::SignalD3D12Fence(uint64_t fence_value)
|
||||
return true;
|
||||
}
|
||||
|
||||
void D3D12SurfaceHandler::ReleaseD3D12Resource(ID3D12Resource* resource)
|
||||
{
|
||||
if (!resource) {
|
||||
return;
|
||||
}
|
||||
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Releasing D3D12 resource from cache");
|
||||
m_cache->Release(resource);
|
||||
}
|
||||
|
||||
bool D3D12SurfaceHandler::GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr)
|
||||
{
|
||||
return m_cache->GetOrCreateExternalMemory(resource, out_ptr);
|
||||
@@ -328,8 +338,10 @@ bool D3D12SurfaceHandler::CopyRGBAToSurfaceViaKernel(CUsurfObject dst_surface,
|
||||
unsigned int grid_x = (width + 15) / 16;
|
||||
unsigned int grid_y = (height + 15) / 16;
|
||||
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Launching kernel via Driver API: grid=(%u,%u), block=(%u,%u)",
|
||||
grid_x, grid_y, block_x, block_y);
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Launching kernel: surface=0x%llX, src_rgba=0x%llX, width=%u, height=%u, pitch=%u",
|
||||
(unsigned long long)dst_surface, (unsigned long long)src_rgba, width, height, src_pitch);
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Kernel params: grid=(%u,%u), block=(%u,%u), stream=%p, kernel=%p",
|
||||
grid_x, grid_y, block_x, block_y, stream, m_surfaceWriteKernel);
|
||||
|
||||
// Driver API Unified Implementation:
|
||||
// - Surface created with cuSurfObjectCreate (Driver API)
|
||||
|
||||
@@ -50,6 +50,9 @@ public:
|
||||
// Signal D3D12 fence from CUDA stream (not implemented yet)
|
||||
bool SignalD3D12Fence(uint64_t fence_value);
|
||||
|
||||
// Release D3D12 resource from external memory cache
|
||||
void ReleaseD3D12Resource(ID3D12Resource* resource);
|
||||
|
||||
private:
|
||||
// Get CUDA device pointer for D3D12 resource (uses cache)
|
||||
bool GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr);
|
||||
|
||||
@@ -1391,6 +1391,9 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
|
||||
// Copy to D3D12 surface (use target_surface from slot)
|
||||
ID3D12Resource* d3d12Resource = static_cast<ID3D12Resource*>(my_slot.target_surface);
|
||||
|
||||
// Track D3D12 texture in slot for cleanup on error
|
||||
my_slot.d3d12_texture = d3d12Resource;
|
||||
|
||||
// Check D3D12 texture format to determine if RGBA conversion is needed
|
||||
D3D12_RESOURCE_DESC desc = d3d12Resource->GetDesc();
|
||||
bool isRGBAFormat = (desc.Format == DXGI_FORMAT_R8G8B8A8_UNORM ||
|
||||
@@ -1451,6 +1454,26 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
|
||||
|
||||
if (!copySuccess) {
|
||||
LOGF_ERROR("[DecodeToSurface] Frame copy failed");
|
||||
|
||||
// Cleanup D3D12 resources on error
|
||||
if (target_type == VAVCORE_SURFACE_D3D12_RESOURCE) {
|
||||
LOGF_DEBUG("[DecodeToSurface] Cleaning up D3D12 resources after error");
|
||||
|
||||
// Release D3D12 texture from external memory cache
|
||||
if (my_slot.d3d12_texture) {
|
||||
LOGF_DEBUG("[DecodeToSurface] Releasing D3D12 texture from cache");
|
||||
m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
|
||||
my_slot.d3d12_texture = nullptr;
|
||||
}
|
||||
|
||||
// Reset NV12ToRGBAConverter to clean state (forces reinitialization on next frame)
|
||||
if (m_rgbaConverter) {
|
||||
m_rgbaConverter.reset();
|
||||
m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
|
||||
LOGF_DEBUG("[DecodeToSurface] NV12ToRGBAConverter reset after error");
|
||||
}
|
||||
}
|
||||
|
||||
my_slot.in_use.store(false);
|
||||
m_returnCounter.fetch_add(1);
|
||||
return false;
|
||||
@@ -1470,9 +1493,14 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
|
||||
output_frame.timestamp_seconds = static_cast<double>(m_framesDecoded) / 30.0;
|
||||
}
|
||||
|
||||
// 9. Release slot
|
||||
// 9. Release slot and cleanup resources
|
||||
{
|
||||
std::lock_guard<std::mutex> lock(my_slot.slot_mutex);
|
||||
|
||||
// Clear D3D12 resource tracking (resources are managed by external memory cache)
|
||||
my_slot.d3d12_texture = nullptr;
|
||||
my_slot.surface_object = 0;
|
||||
|
||||
my_slot.in_use.store(false);
|
||||
}
|
||||
|
||||
|
||||
@@ -15,6 +15,10 @@ namespace VavCore {
|
||||
class NV12ToRGBAConverter;
|
||||
}
|
||||
|
||||
// Forward declarations for D3D12 and CUDA types
|
||||
struct ID3D12Resource;
|
||||
typedef unsigned long long CUsurfObject;
|
||||
|
||||
// Forward declaration for CUDA external memory type
|
||||
typedef struct CUexternalMemory_st* cudaExternalMemory_t;
|
||||
typedef struct CUexternalSemaphore_st* cudaExternalSemaphore_t;
|
||||
@@ -167,6 +171,10 @@ private:
|
||||
// NVDEC information (set by HandlePictureDecode callback)
|
||||
int picture_index = -1; // CurrPicIdx from NVDEC (same as slot index)
|
||||
|
||||
// D3D12 resource tracking for cleanup on error
|
||||
ID3D12Resource* d3d12_texture = nullptr;
|
||||
CUsurfObject surface_object = 0;
|
||||
|
||||
// Synchronization primitives
|
||||
std::condition_variable frame_ready; // Signaled when decode complete
|
||||
std::mutex slot_mutex; // Protects this slot's state
|
||||
|
||||
Reference in New Issue
Block a user