에러 복구 메커니즘 강화 (슬롯 정리 로직 추가)

2025-10-07 04:03:15 +09:00
parent 23e7956375
commit f3fc17c796
5 changed files with 154 additions and 14 deletions
--- a/vav2/docs/working/Driver_API_Unification_Design.md
+++ b/vav2/docs/working/Driver_API_Unification_Design.md
@@ -167,15 +167,94 @@ cuMipmappedArrayDestroy(mipmapped_array);
 cuDestroyExternalMemory(external_memory);
 ```

+## Implementation Status
+
+### ✅ Phase 1: Driver API Unification - COMPLETED
+
+**Completed Tasks:**
+1. ✅ ExternalMemoryCache.cpp - All Runtime API converted to Driver API
+2. ✅ D3D12SurfaceHandler.cpp - CopyYPlane/CopyUVPlane converted to Driver API
+3. ✅ Type definitions updated (CUexternalMemory, CUmipmappedArray, etc.)
+4. ✅ Build successful with Driver API unification
+
+**Key Changes:**
+- `cudaImportExternalMemory()` → `cuImportExternalMemory()`
+- `cudaExternalMemoryGetMappedMipmappedArray()` → `cuExternalMemoryGetMappedMipmappedArray()`
+- `cudaMemcpy2D()` → `cuMemcpy2D()` with CUDA_MEMCPY2D struct
+- `cudaGetMipmappedArrayLevel()` → `cuMipmappedArrayGetLevel()`
+
+### ✅ Phase 2: Error Recovery Mechanism - COMPLETED
+
+**Implemented Features:**
+1. ✅ **D3D12 Resource Cleanup:**
+   - `D3D12SurfaceHandler::ReleaseD3D12Resource()` method
+   - External memory cache release on error
+
+2. ✅ **Slot-based Resource Tracking:**
+   - `DecodeSlot` structure enhanced with `d3d12_texture` and `surface_object` fields
+   - Resources tracked per-slot for proper cleanup
+
+3. ✅ **NV12ToRGBAConverter Reset:**
+   - Automatic reset on error to clean state
+   - Prevents resource accumulation across failed frames
+
+**Error Recovery Flow:**
+```cpp
+if (!copySuccess) {
+    // 1. Release D3D12 texture from cache
+    m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
+
+    // 2. Reset converter to clean state
+    m_rgbaConverter.reset();
+    m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
+
+    // 3. Clean slot state
+    my_slot.d3d12_texture = nullptr;
+    my_slot.surface_object = 0;
+    my_slot.in_use.store(false);
+}
+```
+
+### ❌ Phase 3: CUDA_ERROR_INVALID_HANDLE Investigation - IN PROGRESS
+
+**Current Issue:**
+- Driver API unification completed successfully
+- Error recovery mechanism working correctly
+- **BUT: `CUDA_ERROR_INVALID_HANDLE` persists during kernel launch**
+
+**Verified:**
+- ✅ Surface object created successfully (`surface=0x13`)
+- ✅ Kernel handle valid (`kernel=000001DF7D337BC0`)
+- ✅ Stream handle valid (`stream=000001DF073C08C0`)
+- ✅ Source RGBA pointer valid (`src_rgba=0x130B200000`)
+- ❌ Kernel launch fails with `CUDA_ERROR_INVALID_HANDLE`
+
+**Hypothesis:**
+The issue is NOT related to API mixing (fully unified to Driver API).
+Possible causes:
+1. Surface object type mismatch in kernel argument passing
+2. CUDA context issue despite Driver API unification
+3. Kernel parameter pointer alignment or type issues
+
+### Context Management Analysis
+
+**Test Results:**
+- Removed all `cuCtxSetCurrent()` calls - Error persists
+- This confirms context was already set correctly by `cuCtxCreate()`
+- NVDEC context is active throughout the pipeline
+
+**Conclusion:**
+`cuCtxSetCurrent()` was unnecessary defensive programming. The real issue lies elsewhere.
+
 ## Testing Strategy

 ### Verification Points

-1. **Build Success:** All type changes compile without errors
-2. **Context Consistency:** Log context handles to verify single context usage
-3. **Surface Creation:** Verify surface objects are created successfully
-4. **Kernel Execution:** Confirm kernels can access surface objects
-5. **Frame Output:** Validate decoded frames are rendered correctly
+1. ✅ **Build Success:** All type changes compile without errors
+2. ✅ **Error Recovery:** Resources properly cleaned up on failure
+3. ✅ **Context Consistency:** Single NVDEC context used throughout
+4. ❌ **Kernel Execution:** Still fails with CUDA_ERROR_INVALID_HANDLE
+5. ❌ **Frame Output:** No frames decoded successfully

 ### Debug Logging

@@ -221,12 +300,22 @@ If Driver API unification fails:

 ## Success Criteria

-✅ All Runtime API calls converted to Driver API
-✅ Build completes without errors
-✅ Single CUDA context used throughout pipeline
-✅ Surface objects successfully used in kernels
-✅ No `CUDA_ERROR_INVALID_HANDLE` errors
-✅ Frames decode and render correctly
+### Completed ✅
+- ✅ All Runtime API calls converted to Driver API
+- ✅ Build completes without errors
+- ✅ Single CUDA context used throughout pipeline
+- ✅ Error recovery mechanism implemented
+- ✅ Resource cleanup on failure
+
+### In Progress ❌
+- ❌ Surface objects successfully used in kernels
+- ❌ No `CUDA_ERROR_INVALID_HANDLE` errors
+- ❌ Frames decode and render correctly
+
+### Next Steps
+1. Investigate kernel parameter passing for surface objects
+2. Verify CUDA Driver API surface object compatibility with D3D12 external memory
+3. Consider alternative approaches if Driver API surface objects incompatible

 ## References

--- a/vav2/platforms/windows/vavcore/src/Decoder/D3D12SurfaceHandler.cpp
+++ b/vav2/platforms/windows/vavcore/src/Decoder/D3D12SurfaceHandler.cpp
@@ -188,6 +188,16 @@ bool D3D12SurfaceHandler::SignalD3D12Fence(uint64_t fence_value)
    return true;
 }

+void D3D12SurfaceHandler::ReleaseD3D12Resource(ID3D12Resource* resource)
+{
+    if (!resource) {
+        return;
+    }
+
+    LOGF_DEBUG("[D3D12SurfaceHandler] Releasing D3D12 resource from cache");
+    m_cache->Release(resource);
+}
+
 bool D3D12SurfaceHandler::GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr)
 {
    return m_cache->GetOrCreateExternalMemory(resource, out_ptr);
@@ -328,8 +338,10 @@ bool D3D12SurfaceHandler::CopyRGBAToSurfaceViaKernel(CUsurfObject dst_surface,
    unsigned int grid_x = (width + 15) / 16;
    unsigned int grid_y = (height + 15) / 16;

-    LOGF_DEBUG("[D3D12SurfaceHandler] Launching kernel via Driver API: grid=(%u,%u), block=(%u,%u)",
-               grid_x, grid_y, block_x, block_y);
+    LOGF_DEBUG("[D3D12SurfaceHandler] Launching kernel: surface=0x%llX, src_rgba=0x%llX, width=%u, height=%u, pitch=%u",
+               (unsigned long long)dst_surface, (unsigned long long)src_rgba, width, height, src_pitch);
+    LOGF_DEBUG("[D3D12SurfaceHandler] Kernel params: grid=(%u,%u), block=(%u,%u), stream=%p, kernel=%p",
+               grid_x, grid_y, block_x, block_y, stream, m_surfaceWriteKernel);

    // Driver API Unified Implementation:
    // - Surface created with cuSurfObjectCreate (Driver API)
--- a/vav2/platforms/windows/vavcore/src/Decoder/D3D12SurfaceHandler.h
+++ b/vav2/platforms/windows/vavcore/src/Decoder/D3D12SurfaceHandler.h
@@ -50,6 +50,9 @@ public:
    // Signal D3D12 fence from CUDA stream (not implemented yet)
    bool SignalD3D12Fence(uint64_t fence_value);

+    // Release D3D12 resource from external memory cache
+    void ReleaseD3D12Resource(ID3D12Resource* resource);
+
 private:
    // Get CUDA device pointer for D3D12 resource (uses cache)
    bool GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr);
--- a/vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp
+++ b/vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp
@@ -1391,6 +1391,9 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
            // Copy to D3D12 surface (use target_surface from slot)
            ID3D12Resource* d3d12Resource = static_cast<ID3D12Resource*>(my_slot.target_surface);

+            // Track D3D12 texture in slot for cleanup on error
+            my_slot.d3d12_texture = d3d12Resource;
+
            // Check D3D12 texture format to determine if RGBA conversion is needed
            D3D12_RESOURCE_DESC desc = d3d12Resource->GetDesc();
            bool isRGBAFormat = (desc.Format == DXGI_FORMAT_R8G8B8A8_UNORM ||
@@ -1451,6 +1454,26 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_

            if (!copySuccess) {
                LOGF_ERROR("[DecodeToSurface] Frame copy failed");
+
+                // Cleanup D3D12 resources on error
+                if (target_type == VAVCORE_SURFACE_D3D12_RESOURCE) {
+                    LOGF_DEBUG("[DecodeToSurface] Cleaning up D3D12 resources after error");
+
+                    // Release D3D12 texture from external memory cache
+                    if (my_slot.d3d12_texture) {
+                        LOGF_DEBUG("[DecodeToSurface] Releasing D3D12 texture from cache");
+                        m_d3d12Handler->ReleaseD3D12Resource(my_slot.d3d12_texture);
+                        my_slot.d3d12_texture = nullptr;
+                    }
+
+                    // Reset NV12ToRGBAConverter to clean state (forces reinitialization on next frame)
+                    if (m_rgbaConverter) {
+                        m_rgbaConverter.reset();
+                        m_rgbaConverter = std::make_unique<NV12ToRGBAConverter>();
+                        LOGF_DEBUG("[DecodeToSurface] NV12ToRGBAConverter reset after error");
+                    }
+                }
+
                my_slot.in_use.store(false);
                m_returnCounter.fetch_add(1);
                return false;
@@ -1470,9 +1493,14 @@ bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_
            output_frame.timestamp_seconds = static_cast<double>(m_framesDecoded) / 30.0;
        }

-        // 9. Release slot
+        // 9. Release slot and cleanup resources
        {
            std::lock_guard<std::mutex> lock(my_slot.slot_mutex);
+
+            // Clear D3D12 resource tracking (resources are managed by external memory cache)
+            my_slot.d3d12_texture = nullptr;
+            my_slot.surface_object = 0;
+
            my_slot.in_use.store(false);
        }

--- a/vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.h
+++ b/vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.h
@@ -15,6 +15,10 @@ namespace VavCore {
    class NV12ToRGBAConverter;
 }

+// Forward declarations for D3D12 and CUDA types
+struct ID3D12Resource;
+typedef unsigned long long CUsurfObject;
+
 // Forward declaration for CUDA external memory type
 typedef struct CUexternalMemory_st* cudaExternalMemory_t;
 typedef struct CUexternalSemaphore_st* cudaExternalSemaphore_t;
@@ -167,6 +171,10 @@ private:
        // NVDEC information (set by HandlePictureDecode callback)
        int picture_index = -1;                   // CurrPicIdx from NVDEC (same as slot index)

+        // D3D12 resource tracking for cleanup on error
+        ID3D12Resource* d3d12_texture = nullptr;
+        CUsurfObject surface_object = 0;
+
        // Synchronization primitives
        std::condition_variable frame_ready;      // Signaled when decode complete
        std::mutex slot_mutex;                    // Protects this slot's state