에러 복구 메커니즘 강화 (슬롯 정리 로직 추가)

This commit is contained in:
2025-10-07 04:03:15 +09:00
parent 23e7956375
commit f3fc17c796
5 changed files with 154 additions and 14 deletions

View File

@@ -167,15 +167,94 @@ cuMipmappedArrayDestroy(mipmapped_array);
cuDestroyExternalMemory(external_memory);
```
## Implementation Status
### ✅ Phase 1: Driver API Unification - COMPLETED
**Completed Tasks:**
1. ✅ ExternalMemoryCache.cpp - All Runtime API converted to Driver API
2. ✅ D3D12SurfaceHandler.cpp - CopyYPlane/CopyUVPlane converted to Driver API
3. ✅ Type definitions updated (CUexternalMemory, CUmipmappedArray, etc.)
4. ✅ Build successful with Driver API unification
**Key Changes:**
- `cudaImportExternalMemory()``cuImportExternalMemory()`
- `cudaExternalMemoryGetMappedMipmappedArray()``cuExternalMemoryGetMappedMipmappedArray()`
- `cudaMemcpy2D()``cuMemcpy2D()` with CUDA_MEMCPY2D struct
- `cudaGetMipmappedArrayLevel()``cuMipmappedArrayGetLevel()`
### ✅ Phase 2: Error Recovery Mechanism - COMPLETED
**Implemented Features:**
1.**D3D12 Resource Cleanup:**
- `D3D12SurfaceHandler::ReleaseD3D12Resource()` method
- External memory cache release on error
2.**Slot-based Resource Tracking:**
- `DecodeSlot` structure enhanced with `d3d12_texture` and `surface_object` fields
- Resources tracked per-slot for proper cleanup
3.**NV12ToRGBAConverter Reset:**
- Automatic reset on error to clean state
- Prevents resource accumulation across failed frames
**Error Recovery Flow:**
```cpp
if (!copySuccess) {
// 1. Release D3D12 texture from cache
m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
// 2. Reset converter to clean state
m_rgbaConverter.reset();
m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
// 3. Clean slot state
my_slot.d3d12_texture = nullptr;
my_slot.surface_object = 0;
my_slot.in_use.store(false);
}
```
### ❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS
**Current Issue:**
- Driver API unification completed successfully
- Error recovery mechanism working correctly
- **BUT: `CUDA_ERROR_INVALID_HANDLE` persists during kernel launch**
**Verified:**
- ✅ Surface object created successfully (`surface=0x13`)
- ✅ Kernel handle valid (`kernel=000001DF7D337BC0`)
- ✅ Stream handle valid (`stream=000001DF073C08C0`)
- ✅ Source RGBA pointer valid (`src_rgba=0x130B200000`)
- ❌ Kernel launch fails with `CUDA_ERROR_INVALID_HANDLE`
**Hypothesis:**
The issue is NOT related to API mixing (fully unified to Driver API).
Possible causes:
1. Surface object type mismatch in kernel argument passing
2. CUDA context issue despite Driver API unification
3. Kernel parameter pointer alignment or type issues
### Context Management Analysis
**Test Results:**
- Removed all `cuCtxSetCurrent()` calls - Error persists
- This confirms context was already set correctly by `cuCtxCreate()`
- NVDEC context is active throughout the pipeline
**Conclusion:**
`cuCtxSetCurrent()` was unnecessary defensive programming. The real issue lies elsewhere.
## Testing Strategy
### Verification Points
1. **Build Success:** All type changes compile without errors
2. **Context Consistency:** Log context handles to verify single context usage
3. **Surface Creation:** Verify surface objects are created successfully
4. **Kernel Execution:** Confirm kernels can access surface objects
5. **Frame Output:** Validate decoded frames are rendered correctly
1. **Build Success:** All type changes compile without errors
2. **Error Recovery:** Resources properly cleaned up on failure
3. **Context Consistency:** Single NVDEC context used throughout
4. **Kernel Execution:** Still fails with CUDA_ERROR_INVALID_HANDLE
5. **Frame Output:** No frames decoded successfully
### Debug Logging
@@ -221,12 +300,22 @@ If Driver API unification fails:
## Success Criteria
✅ All Runtime API calls converted to Driver API
✅ Build completes without errors
✅ Single CUDA context used throughout pipeline
✅ Surface objects successfully used in kernels
✅ No `CUDA_ERROR_INVALID_HANDLE` errors
✅ Frames decode and render correctly
### Completed ✅
- ✅ All Runtime API calls converted to Driver API
- ✅ Build completes without errors
- ✅ Single CUDA context used throughout pipeline
- ✅ Error recovery mechanism implemented
- ✅ Resource cleanup on failure
### In Progress ❌
- ❌ Surface objects successfully used in kernels
- ❌ No `CUDA_ERROR_INVALID_HANDLE` errors
- ❌ Frames decode and render correctly
### Next Steps
1. Investigate kernel parameter passing for surface objects
2. Verify CUDA Driver API surface object compatibility with D3D12 external memory
3. Consider alternative approaches if Driver API surface objects incompatible
## References

View File

@@ -188,6 +188,16 @@ bool D3D12SurfaceHandler::SignalD3D12Fence(uint64_t fence_value)
return true;
}
void D3D12SurfaceHandler::ReleaseD3D12Resource(ID3D12Resource* resource)
{
if (!resource) {
return;
}
LOGF_DEBUG("[D3D12SurfaceHandler] Releasing D3D12 resource from cache");
m_cache->Release(resource);
}
bool D3D12SurfaceHandler::GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr)
{
return m_cache->GetOrCreateExternalMemory(resource, out_ptr);
@@ -328,8 +338,10 @@ bool D3D12SurfaceHandler::CopyRGBAToSurfaceViaKernel(CUsurfObject dst_surface,
unsigned int grid_x = (width + 15) / 16;
unsigned int grid_y = (height + 15) / 16;
LOGF_DEBUG("[D3D12SurfaceHandler] Launching kernel via Driver API: grid=(%u,%u), block=(%u,%u)",
grid_x, grid_y, block_x, block_y);
LOGF_DEBUG("[D3D12SurfaceHandler] Launching kernel: surface=0x%llX, src_rgba=0x%llX, width=%u, height=%u, pitch=%u",
(unsigned long long)dst_surface, (unsigned long long)src_rgba, width, height, src_pitch);
LOGF_DEBUG("[D3D12SurfaceHandler] Kernel params: grid=(%u,%u), block=(%u,%u), stream=%p, kernel=%p",
grid_x, grid_y, block_x, block_y, stream, m_surfaceWriteKernel);
// Driver API Unified Implementation:
// - Surface created with cuSurfObjectCreate (Driver API)

View File

@@ -50,6 +50,9 @@ public:
// Signal D3D12 fence from CUDA stream (not implemented yet)
bool SignalD3D12Fence(uint64_t fence_value);
// Release D3D12 resource from external memory cache
void ReleaseD3D12Resource(ID3D12Resource* resource);
private:
// Get CUDA device pointer for D3D12 resource (uses cache)
bool GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr);

View File

@@ -1391,6 +1391,9 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
// Copy to D3D12 surface (use target_surface from slot)
ID3D12Resource* d3d12Resource = static_cast<ID3D12Resource*>(my_slot.target_surface);
// Track D3D12 texture in slot for cleanup on error
my_slot.d3d12_texture = d3d12Resource;
// Check D3D12 texture format to determine if RGBA conversion is needed
D3D12_RESOURCE_DESC desc = d3d12Resource->GetDesc();
bool isRGBAFormat = (desc.Format == DXGI_FORMAT_R8G8B8A8_UNORM ||
@@ -1451,6 +1454,26 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
if (!copySuccess) {
LOGF_ERROR("[DecodeToSurface] Frame copy failed");
// Cleanup D3D12 resources on error
if (target_type == VAVCORE_SURFACE_D3D12_RESOURCE) {
LOGF_DEBUG("[DecodeToSurface] Cleaning up D3D12 resources after error");
// Release D3D12 texture from external memory cache
if (my_slot.d3d12_texture) {
LOGF_DEBUG("[DecodeToSurface] Releasing D3D12 texture from cache");
m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
my_slot.d3d12_texture = nullptr;
}
// Reset NV12ToRGBAConverter to clean state (forces reinitialization on next frame)
if (m_rgbaConverter) {
m_rgbaConverter.reset();
m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
LOGF_DEBUG("[DecodeToSurface] NV12ToRGBAConverter reset after error");
}
}
my_slot.in_use.store(false);
m_returnCounter.fetch_add(1);
return false;
@@ -1470,9 +1493,14 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
output_frame.timestamp_seconds = static_cast<double>(m_framesDecoded) / 30.0;
}
// 9. Release slot
// 9. Release slot and cleanup resources
{
std::lock_guard<std::mutex> lock(my_slot.slot_mutex);
// Clear D3D12 resource tracking (resources are managed by external memory cache)
my_slot.d3d12_texture = nullptr;
my_slot.surface_object = 0;
my_slot.in_use.store(false);
}

View File

@@ -15,6 +15,10 @@ namespace VavCore {
class NV12ToRGBAConverter;
}
// Forward declarations for D3D12 and CUDA types
struct ID3D12Resource;
typedef unsigned long long CUsurfObject;
// Forward declaration for CUDA external memory type
typedef struct CUexternalMemory_st* cudaExternalMemory_t;
typedef struct CUexternalSemaphore_st* cudaExternalSemaphore_t;
@@ -167,6 +171,10 @@ private:
// NVDEC information (set by HandlePictureDecode callback)
int picture_index = -1; // CurrPicIdx from NVDEC (same as slot index)
// D3D12 resource tracking for cleanup on error
ID3D12Resource* d3d12_texture = nullptr;
CUsurfObject surface_object = 0;
// Synchronization primitives
std::condition_variable frame_ready; // Signaled when decode complete
std::mutex slot_mutex; // Protects this slot's state