WIP
This commit is contained in:
@@ -0,0 +1,384 @@
|
||||
# D3D12-CUDA Separate Y/UV Textures - Implementation Status
|
||||
|
||||
**Date**: 2025-10-05
|
||||
**Related Design**: [D3D12_CUDA_Separate_YUV_Textures.md](D3D12_CUDA_Separate_YUV_Textures.md)
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
**Status**: Partial Implementation Complete
|
||||
**Completed**: D3D12SurfaceHandler API + ExternalMemoryCache Compatibility
|
||||
**Remaining**: NVDECAV1Decoder Integration + D3D12VideoRenderer Integration
|
||||
|
||||
---
|
||||
|
||||
## ✅ Completed Components
|
||||
|
||||
### 1. D3D12SurfaceHandler Enhancement
|
||||
|
||||
**File**: `vav2/platforms/windows/vavcore/src/Decoder/D3D12SurfaceHandler.h/.cpp`
|
||||
|
||||
#### Added Method
|
||||
```cpp
|
||||
bool CopySeparateNV12Frame(CUdeviceptr src_frame,
|
||||
uint32_t src_pitch,
|
||||
ID3D12Resource* y_texture,
|
||||
ID3D12Resource* uv_texture,
|
||||
uint32_t width,
|
||||
uint32_t height);
|
||||
```
|
||||
|
||||
#### Implementation Location
|
||||
- **Header**: D3D12SurfaceHandler.h:28-37
|
||||
- **Implementation**: D3D12SurfaceHandler.cpp:99-169
|
||||
|
||||
#### Implementation Details
|
||||
1. **Dual Texture Import**
|
||||
- Imports Y texture via `GetD3D12CUDAPointer()` → `CUdeviceptr y_dst_ptr`
|
||||
- Imports UV texture via `GetD3D12CUDAPointer()` → `CUdeviceptr uv_dst_ptr`
|
||||
- Uses ExternalMemoryCache for CUDA external memory mapping
|
||||
|
||||
2. **Layout Query**
|
||||
- Y texture: `GetCopyableFootprints()` for R8_UNORM, ROW_MAJOR (1 subresource)
|
||||
- UV texture: `GetCopyableFootprints()` for RG8_UNORM, ROW_MAJOR (1 subresource)
|
||||
- Extracts pitch and row count for each plane separately
|
||||
|
||||
3. **CUDA Copy Operations**
|
||||
- **Y Plane**: `cudaMemcpy2D` from NVDEC output (offset 0) to Y texture
|
||||
- **UV Plane**: `cudaMemcpy2D` from NVDEC output (offset = src_pitch * y_num_rows) to UV texture
|
||||
- Both copies use device-to-device transfer
|
||||
|
||||
4. **Debug Logging**
|
||||
- Logs Y/UV texture CUDA pointers
|
||||
- Logs layout details (width, height, pitch, rows)
|
||||
- Logs copy success/failure
|
||||
|
||||
---
|
||||
|
||||
### 2. ExternalMemoryCache Compatibility
|
||||
|
||||
**File**: `vav2/platforms/windows/vavcore/src/Decoder/ExternalMemoryCache.h/.cpp`
|
||||
|
||||
**Status**: ✅ No modifications required
|
||||
|
||||
**Reasoning**:
|
||||
- ExternalMemoryCache caches each `ID3D12Resource*` independently using a `std::unordered_map`
|
||||
- Passing Y and UV textures separately will automatically cache them as distinct resources
|
||||
- No code changes needed; existing implementation fully supports separate textures
|
||||
|
||||
**Cache Structure**:
|
||||
```cpp
|
||||
std::unordered_map<ID3D12Resource*, CachedEntry> m_cache;
|
||||
```
|
||||
|
||||
Each texture (Y and UV) gets its own cache entry with:
|
||||
- `cudaExternalMemory_t external_memory`
|
||||
- `CUdeviceptr device_ptr`
|
||||
|
||||
---
|
||||
|
||||
## 🔄 Pending Integration
|
||||
|
||||
### 3. NVDECAV1Decoder Integration
|
||||
|
||||
**File**: `vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.h/.cpp`
|
||||
|
||||
**Complexity**: High (RingBuffer management, surface allocation, API changes)
|
||||
|
||||
#### Required Changes
|
||||
|
||||
**A. Configuration Flag**
|
||||
Add runtime mode selection:
|
||||
```cpp
|
||||
enum class TextureMode {
|
||||
CombinedNV12, // Legacy mode (current)
|
||||
SeparateYUV // New mode (separate textures)
|
||||
};
|
||||
|
||||
TextureMode m_textureMode = TextureMode::CombinedNV12; // Default to legacy
|
||||
```
|
||||
|
||||
**B. API Modification**
|
||||
Update `DecodeToSurface()` to accept 2 texture pointers:
|
||||
```cpp
|
||||
// Legacy signature (keep for backward compatibility)
|
||||
bool DecodeToSurface(const uint8_t* packet_data, size_t packet_size,
|
||||
VavCoreSurfaceType target_type,
|
||||
void* target_surface, // ID3D12Resource* (combined NV12)
|
||||
VideoFrame& output_frame) override;
|
||||
|
||||
// New signature (for separate textures)
|
||||
bool DecodeToSeparateSurfaces(const uint8_t* packet_data, size_t packet_size,
|
||||
VavCoreSurfaceType target_type,
|
||||
void* y_surface, // ID3D12Resource* (R8_UNORM)
|
||||
void* uv_surface, // ID3D12Resource* (RG8_UNORM)
|
||||
VideoFrame& output_frame);
|
||||
```
|
||||
|
||||
**C. Frame Allocation**
|
||||
Modify RingBuffer to allocate 2 textures per slot:
|
||||
```cpp
|
||||
struct DecodeSlot {
|
||||
int picture_index;
|
||||
ID3D12Resource* y_texture; // NEW: Y plane texture
|
||||
ID3D12Resource* uv_texture; // NEW: UV plane texture
|
||||
ID3D12Resource* nv12_texture; // Legacy: combined texture
|
||||
bool in_use;
|
||||
};
|
||||
|
||||
DecodeSlot m_ringBuffer[RING_BUFFER_SIZE];
|
||||
```
|
||||
|
||||
**D. Copy Logic**
|
||||
Call `CopySeparateNV12Frame()` when in separate texture mode:
|
||||
```cpp
|
||||
if (m_textureMode == TextureMode::SeparateYUV) {
|
||||
success = m_d3d12Handler->CopySeparateNV12Frame(
|
||||
src_frame, src_pitch,
|
||||
y_texture, uv_texture,
|
||||
width, height);
|
||||
} else {
|
||||
success = m_d3d12Handler->CopyNV12Frame(
|
||||
src_frame, src_pitch,
|
||||
nv12_texture,
|
||||
width, height);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. D3D12VideoRenderer Integration
|
||||
|
||||
**File**: `vav2/platforms/windows/applications/vav2player/Vav2Player/src/Rendering/D3D12VideoRenderer.h/.cpp`
|
||||
|
||||
**Complexity**: Medium (shader modification, descriptor management)
|
||||
|
||||
#### Required Changes
|
||||
|
||||
**A. Texture Creation**
|
||||
Allocate 2 textures (Y + UV) instead of 1 (NV12):
|
||||
```cpp
|
||||
// Y Plane: R8_UNORM, ROW_MAJOR
|
||||
D3D12_RESOURCE_DESC y_desc = {};
|
||||
y_desc.Format = DXGI_FORMAT_R8_UNORM;
|
||||
y_desc.Width = width;
|
||||
y_desc.Height = height;
|
||||
y_desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
|
||||
y_desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
|
||||
// UV Plane: RG8_UNORM, ROW_MAJOR
|
||||
D3D12_RESOURCE_DESC uv_desc = {};
|
||||
uv_desc.Format = DXGI_FORMAT_R8G8_UNORM;
|
||||
uv_desc.Width = width;
|
||||
uv_desc.Height = height / 2;
|
||||
uv_desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
|
||||
uv_desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
```
|
||||
|
||||
**B. Descriptor Heap Layout**
|
||||
Update for 2 SRVs (Y and UV textures):
|
||||
```cpp
|
||||
// Root signature parameters
|
||||
CD3DX12_DESCRIPTOR_RANGE1 srv_ranges[2];
|
||||
srv_ranges[0].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 0); // Y texture (t0)
|
||||
srv_ranges[1].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 1); // UV texture (t1)
|
||||
```
|
||||
|
||||
**C. YUV→RGB Pixel Shader**
|
||||
Update to sample from 2 separate textures:
|
||||
```hlsl
|
||||
// Shader resources
|
||||
Texture2D<float> yTexture : register(t0); // Y plane (R8_UNORM)
|
||||
Texture2D<float2> uvTexture : register(t1); // UV plane (RG8_UNORM)
|
||||
SamplerState linearSampler : register(s0);
|
||||
|
||||
// Sample YUV values
|
||||
float3 SampleYUV(float2 uv) {
|
||||
float y = yTexture.Sample(linearSampler, uv).r;
|
||||
float2 uv_chroma = uvTexture.Sample(linearSampler, uv).rg;
|
||||
return float3(y, uv_chroma.r, uv_chroma.g);
|
||||
}
|
||||
|
||||
// YUV to RGB conversion (unchanged)
|
||||
float3 YUVtoRGB(float3 yuv) {
|
||||
float y = yuv.x;
|
||||
float u = yuv.y - 0.5;
|
||||
float v = yuv.z - 0.5;
|
||||
|
||||
float r = y + 1.402 * v;
|
||||
float g = y - 0.344 * u - 0.714 * v;
|
||||
float b = y + 1.772 * u;
|
||||
|
||||
return float3(r, g, b);
|
||||
}
|
||||
|
||||
// Main pixel shader
|
||||
float4 PSMain(PSInput input) : SV_TARGET {
|
||||
float3 yuv = SampleYUV(input.uv);
|
||||
float3 rgb = YUVtoRGB(yuv);
|
||||
return float4(rgb, 1.0);
|
||||
}
|
||||
```
|
||||
|
||||
**D. Texture Binding**
|
||||
Bind both Y and UV textures in command list:
|
||||
```cpp
|
||||
// Create SRVs for Y and UV textures
|
||||
D3D12_SHADER_RESOURCE_VIEW_DESC y_srv_desc = {};
|
||||
y_srv_desc.Format = DXGI_FORMAT_R8_UNORM;
|
||||
y_srv_desc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
|
||||
y_srv_desc.Texture2D.MipLevels = 1;
|
||||
m_device->CreateShaderResourceView(y_texture, &y_srv_desc, y_descriptor_handle);
|
||||
|
||||
D3D12_SHADER_RESOURCE_VIEW_DESC uv_srv_desc = {};
|
||||
uv_srv_desc.Format = DXGI_FORMAT_R8G8_UNORM;
|
||||
uv_srv_desc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
|
||||
uv_srv_desc.Texture2D.MipLevels = 1;
|
||||
m_device->CreateShaderResourceView(uv_texture, &uv_srv_desc, uv_descriptor_handle);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Integration Guide
|
||||
|
||||
### Option A: Feature Flag Approach (Recommended)
|
||||
|
||||
**Benefits**:
|
||||
- Preserves backward compatibility
|
||||
- Allows A/B testing and performance comparison
|
||||
- Gradual migration path
|
||||
|
||||
**Implementation**:
|
||||
```cpp
|
||||
// In VavCore public API (VavCore.h)
|
||||
enum VavCoreTextureMode {
|
||||
VAVCORE_TEXTURE_COMBINED_NV12 = 0, // Legacy mode
|
||||
VAVCORE_TEXTURE_SEPARATE_YUV = 1 // New mode
|
||||
};
|
||||
|
||||
// API function
|
||||
VAVCORE_API void vavcore_set_texture_mode(
|
||||
VavCoreVideoDecoder* decoder,
|
||||
VavCoreTextureMode mode);
|
||||
|
||||
// Usage in application
|
||||
vavcore_set_texture_mode(decoder, VAVCORE_TEXTURE_SEPARATE_YUV);
|
||||
```
|
||||
|
||||
### Option B: Direct Replacement (Not Recommended)
|
||||
|
||||
**Risks**:
|
||||
- Breaking change for existing applications
|
||||
- No fallback if issues arise
|
||||
- Higher testing burden
|
||||
- Requires coordinated update of all components
|
||||
|
||||
---
|
||||
|
||||
## 🧪 Testing Strategy
|
||||
|
||||
### Phase 1: Unit Testing
|
||||
- [x] Verify `CopySeparateNV12Frame()` compiles
|
||||
- [ ] Create standalone test to verify concept
|
||||
- [ ] Test CUDA import of R8/RG8 textures with ROW_MAJOR layout
|
||||
- [ ] Verify `cudaMemcpy2D` succeeds for both planes
|
||||
|
||||
**Test Implementation**:
|
||||
Create `separate-texture-test` with:
|
||||
1. D3D12 device creation
|
||||
2. Create Y texture (R8_UNORM, ROW_MAJOR)
|
||||
3. Create UV texture (RG8_UNORM, ROW_MAJOR)
|
||||
4. CUDA import both textures via external memory
|
||||
5. Verify CUDA pointers are valid
|
||||
6. (Optional) Test cudaMemcpy2D with dummy data
|
||||
|
||||
### Phase 2: Integration Testing
|
||||
- [ ] Test with `red-surface-nvdec` (simple NV12 validation)
|
||||
- [ ] Test with `large-resolution` (4K AV1 decoding)
|
||||
- [ ] Verify no memory leaks (ExternalMemoryCache cleanup)
|
||||
- [ ] Verify texture refcount management
|
||||
|
||||
### Phase 3: Performance Testing
|
||||
- [ ] Benchmark decode speed (should maintain 30fps+)
|
||||
- [ ] Measure GPU memory overhead (expect +33%)
|
||||
- [ ] Profile CUDA copy performance
|
||||
- [ ] Compare combined NV12 vs separate Y/UV performance
|
||||
|
||||
---
|
||||
|
||||
## 📊 Current Status Summary
|
||||
|
||||
| Component | Status | Complexity | Notes |
|
||||
|-----------|--------|------------|-------|
|
||||
| D3D12SurfaceHandler | ✅ Complete | Low | `CopySeparateNV12Frame()` implemented |
|
||||
| ExternalMemoryCache | ✅ Compatible | N/A | No changes needed |
|
||||
| NVDECAV1Decoder | ⏳ Pending | High | RingBuffer refactoring required |
|
||||
| D3D12VideoRenderer | ⏳ Pending | Medium | Shader + descriptor updates |
|
||||
| Standalone Test | ⏳ Pending | Low | Proof-of-concept needed |
|
||||
| Unit Tests | ⏳ Pending | Low | Test coverage needed |
|
||||
| Integration Tests | ⏳ Pending | Medium | End-to-end validation |
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Recommended Next Steps
|
||||
|
||||
### Priority 1: Standalone Proof-of-Concept Test
|
||||
**Rationale**: Validate concept before complex integration
|
||||
|
||||
**Steps**:
|
||||
1. Create `separate-texture-test` project
|
||||
2. Test R8/RG8 texture creation with ROW_MAJOR layout
|
||||
3. Test CUDA external memory import
|
||||
4. Verify `cudaMemcpy2D` works for both planes
|
||||
5. **Decision Point**: Only proceed to full integration if test succeeds
|
||||
|
||||
### Priority 2: Design VavCore API Changes
|
||||
**Rationale**: Plan backward-compatible API before implementation
|
||||
|
||||
**Steps**:
|
||||
1. Define `VavCoreTextureMode` enum
|
||||
2. Add `vavcore_set_texture_mode()` function
|
||||
3. Update API documentation
|
||||
4. Design migration guide for users
|
||||
|
||||
### Priority 3: Implement NVDECAV1Decoder Integration
|
||||
**Rationale**: Core decoder integration (highest complexity)
|
||||
|
||||
**Prerequisites**:
|
||||
- Standalone test passes
|
||||
- API design approved
|
||||
|
||||
**Steps**:
|
||||
1. Add `m_textureMode` member variable
|
||||
2. Implement `DecodeToSeparateSurfaces()` method
|
||||
3. Refactor RingBuffer for dual textures
|
||||
4. Add conditional logic for texture mode switching
|
||||
|
||||
### Priority 4: Update Renderer and Shaders
|
||||
**Rationale**: Final step to enable end-to-end pipeline
|
||||
|
||||
**Steps**:
|
||||
1. Update texture creation for Y/UV
|
||||
2. Modify descriptor heap layout
|
||||
3. Update pixel shader for dual texture sampling
|
||||
4. Test rendering output
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- [D3D12_CUDA_Separate_YUV_Textures.md](D3D12_CUDA_Separate_YUV_Textures.md) - Full design document
|
||||
- [D3D12Manager Test Implementation](../tests/large-resolution/src/D3D12Manager.cpp) - Reference implementation
|
||||
|
||||
---
|
||||
|
||||
## ✅ Conclusion
|
||||
|
||||
**Current State**: D3D12SurfaceHandler now supports separate Y/UV texture copying via `CopySeparateNV12Frame()`.
|
||||
|
||||
**Key Achievement**: Core CUDA-D3D12 interop API is complete and ready for integration.
|
||||
|
||||
**Next Milestone**: Create standalone test to validate concept before full VavCore integration.
|
||||
|
||||
**Recommendation**: Proceed with **Option A (Feature Flag Approach)** for safe, incremental deployment.
|
||||
614
vav2/docs/working/D3D12_CUDA_Separate_YUV_Textures.md
Normal file
614
vav2/docs/working/D3D12_CUDA_Separate_YUV_Textures.md
Normal file
@@ -0,0 +1,614 @@
|
||||
# D3D12-CUDA Interop: Separate Y/UV Texture Solution
|
||||
|
||||
**Date**: 2025-10-05
|
||||
**Status**: Design Complete, Partial Implementation
|
||||
**Problem**: NV12 texture CUDA interop failures due to tiled memory layout
|
||||
|
||||
---
|
||||
|
||||
## Problem Statement
|
||||
|
||||
### Original Issue
|
||||
NVDEC hardware decoder outputs NV12 frames in CUDA memory, which need to be copied to D3D12 textures for GPU rendering. The initial approach using combined NV12 textures (`DXGI_FORMAT_NV12`) with `D3D12_TEXTURE_LAYOUT_UNKNOWN` failed because:
|
||||
|
||||
1. **NV12 requires tiled layout**: D3D12 NV12 textures must use `D3D12_TEXTURE_LAYOUT_UNKNOWN` (GPU-optimized tiled format)
|
||||
2. **CUDA cannot access tiled memory**: `cudaMemcpy2D` requires linear memory layout
|
||||
3. **ROW_MAJOR not supported**: NV12 format does not support `D3D12_TEXTURE_LAYOUT_ROW_MAJOR`
|
||||
|
||||
### Error Manifestation
|
||||
```
|
||||
[D3D12SurfaceHandler] UV plane copy failed: 1 (invalid argument)
|
||||
[D3D12SurfaceHandler] UV copy details: src=0x130910E000, dst=0x20990E000,
|
||||
src_pitch=1536, dst_pitch=1536, width=1280, height=360
|
||||
```
|
||||
|
||||
- Y plane copy: ✅ Success
|
||||
- UV plane copy: ❌ CUDA error 1 (invalid argument)
|
||||
|
||||
### Root Cause Analysis
|
||||
D3D12 NV12 textures use **vendor-specific tiled memory layout** that is not accessible as linear buffers from CUDA, even when using `cudaExternalMemoryHandleTypeD3D12Heap`.
|
||||
|
||||
---
|
||||
|
||||
## Solution: Separate Y/UV Textures
|
||||
|
||||
### Architecture Overview
|
||||
|
||||
Instead of using a single NV12 texture, split into **two separate textures**:
|
||||
|
||||
```
|
||||
NVDEC Output (NV12 in CUDA memory)
|
||||
│
|
||||
├─ Y Plane (width × height, 8-bit)
|
||||
│ └─> D3D12 Texture: DXGI_FORMAT_R8_UNORM, D3D12_TEXTURE_LAYOUT_ROW_MAJOR
|
||||
│
|
||||
└─ UV Plane (width × height/2, 16-bit interleaved)
|
||||
└─> D3D12 Texture: DXGI_FORMAT_R8G8_UNORM, D3D12_TEXTURE_LAYOUT_ROW_MAJOR
|
||||
```
|
||||
|
||||
### Key Advantages
|
||||
|
||||
| Aspect | Combined NV12 | Separate Y/UV |
|
||||
|--------|---------------|---------------|
|
||||
| **Layout Support** | ❌ ROW_MAJOR not allowed | ✅ ROW_MAJOR supported |
|
||||
| **CUDA Access** | ❌ Tiled, non-linear | ✅ Linear buffer access |
|
||||
| **Copy Operation** | ❌ `cudaMemcpy2D` fails on UV | ✅ Both planes copy successfully |
|
||||
| **Memory Overhead** | Lower (single allocation) | Slightly higher (two allocations) |
|
||||
| **Shader Complexity** | Lower (single sampler) | Higher (two samplers, manual combine) |
|
||||
|
||||
### Texture Specifications
|
||||
|
||||
#### Y Plane Texture
|
||||
```cpp
|
||||
D3D12_RESOURCE_DESC desc = {};
|
||||
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
|
||||
desc.Width = width;
|
||||
desc.Height = height;
|
||||
desc.DepthOrArraySize = 1;
|
||||
desc.MipLevels = 1;
|
||||
desc.Format = DXGI_FORMAT_R8_UNORM; // Single 8-bit channel
|
||||
desc.SampleDesc.Count = 1;
|
||||
desc.SampleDesc.Quality = 0;
|
||||
desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // Linear layout for CUDA
|
||||
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
```
|
||||
|
||||
#### UV Plane Texture
|
||||
```cpp
|
||||
D3D12_RESOURCE_DESC desc = {};
|
||||
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
|
||||
desc.Width = width;
|
||||
desc.Height = height / 2; // UV plane is half height
|
||||
desc.DepthOrArraySize = 1;
|
||||
desc.MipLevels = 1;
|
||||
desc.Format = DXGI_FORMAT_R8G8_UNORM; // Two 8-bit channels (U, V)
|
||||
desc.SampleDesc.Count = 1;
|
||||
desc.SampleDesc.Quality = 0;
|
||||
desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // Linear layout for CUDA
|
||||
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### 1. D3D12 Texture Creation
|
||||
|
||||
**File**: `D3D12Manager.h/.cpp`
|
||||
|
||||
```cpp
|
||||
class D3D12Manager {
|
||||
public:
|
||||
// Create separate Y and UV plane textures
|
||||
ID3D12Resource* CreateYPlaneTexture(uint32_t width, uint32_t height);
|
||||
ID3D12Resource* CreateUVPlaneTexture(uint32_t width, uint32_t height);
|
||||
|
||||
// Readback separate textures and combine into NV12 buffer
|
||||
uint8_t* ReadbackSeparateTextures(ID3D12Resource* y_texture,
|
||||
ID3D12Resource* uv_texture,
|
||||
uint32_t width, uint32_t height);
|
||||
};
|
||||
```
|
||||
|
||||
**Implementation**:
|
||||
- `vav2/platforms/windows/tests/large-resolution/src/D3D12Manager.cpp:214-280`
|
||||
|
||||
### 2. CUDA-D3D12 Interop
|
||||
|
||||
**CUDA Import Process**:
|
||||
```cpp
|
||||
// 1. Create shared handle for D3D12 resource
|
||||
HANDLE shared_handle;
|
||||
device->CreateSharedHandle(texture, nullptr, GENERIC_ALL, nullptr, &shared_handle);
|
||||
|
||||
// 2. Get resource allocation size
|
||||
D3D12_RESOURCE_ALLOCATION_INFO alloc_info =
|
||||
device->GetResourceAllocationInfo(0, 1, &desc);
|
||||
|
||||
// 3. Import to CUDA as external memory
|
||||
cudaExternalMemoryHandleDesc mem_desc = {};
|
||||
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Heap;
|
||||
mem_desc.handle.win32.handle = shared_handle;
|
||||
mem_desc.size = alloc_info.SizeInBytes;
|
||||
mem_desc.flags = cudaExternalMemoryDedicated;
|
||||
|
||||
cudaExternalMemory_t external_memory;
|
||||
cudaImportExternalMemory(&external_memory, &mem_desc);
|
||||
|
||||
// 4. Map to linear buffer
|
||||
cudaExternalMemoryBufferDesc buffer_desc = {};
|
||||
buffer_desc.size = mem_desc.size;
|
||||
|
||||
CUdeviceptr device_ptr;
|
||||
cudaExternalMemoryGetMappedBuffer((void**)&device_ptr, external_memory, &buffer_desc);
|
||||
```
|
||||
|
||||
**Key Point**: ROW_MAJOR textures are successfully imported as linear buffers, allowing `cudaMemcpy2D` to work correctly.
|
||||
|
||||
### 3. CUDA Copy Operations
|
||||
|
||||
```cpp
|
||||
// Copy Y plane (width × height, single channel)
|
||||
cudaMemcpy2D(
|
||||
(void*)dst_y_ptr, dst_y_pitch,
|
||||
(void*)src_y_ptr, src_y_pitch,
|
||||
width, height,
|
||||
cudaMemcpyDeviceToDevice
|
||||
);
|
||||
|
||||
// Copy UV plane (width × height/2, dual channel)
|
||||
cudaMemcpy2D(
|
||||
(void*)dst_uv_ptr, dst_uv_pitch,
|
||||
(void*)src_uv_ptr, src_uv_pitch,
|
||||
width, height / 2,
|
||||
cudaMemcpyDeviceToDevice
|
||||
);
|
||||
```
|
||||
|
||||
**Success Criteria**:
|
||||
- ✅ Both copies complete without CUDA errors
|
||||
- ✅ Linear layout allows direct memory access
|
||||
- ✅ Pitch alignment handled automatically by D3D12
|
||||
|
||||
### 4. D3D12 Readback
|
||||
|
||||
**File**: `D3D12Manager.cpp:282-430`
|
||||
|
||||
Process:
|
||||
1. Query `GetCopyableFootprints` for both Y and UV textures
|
||||
2. Create single readback buffer (size = Y bytes + UV bytes)
|
||||
3. Copy Y plane to readback buffer (offset 0)
|
||||
4. Copy UV plane to readback buffer (offset = Y total bytes)
|
||||
5. Map readback buffer and combine into NV12 CPU buffer
|
||||
|
||||
```cpp
|
||||
// Get layouts for both planes
|
||||
D3D12_PLACED_SUBRESOURCE_FOOTPRINT y_layout, uv_layout;
|
||||
device->GetCopyableFootprints(&y_desc, 0, 1, 0, &y_layout, ...);
|
||||
device->GetCopyableFootprints(&uv_desc, 0, 1, 0, &uv_layout, ...);
|
||||
|
||||
// Create combined readback buffer
|
||||
UINT64 total_size = y_total_bytes + uv_total_bytes;
|
||||
CreateCommittedResource(..., total_size, ...);
|
||||
|
||||
// Copy with proper offsets
|
||||
dst_uv.PlacedFootprint.Offset = y_total_bytes; // UV starts after Y
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Shader Integration
|
||||
|
||||
### HLSL Shader Changes
|
||||
|
||||
**Before (Combined NV12)**:
|
||||
```hlsl
|
||||
Texture2D<float4> nv12Texture : register(t0);
|
||||
SamplerState linearSampler : register(s0);
|
||||
|
||||
float4 PSMain(PSInput input) : SV_TARGET
|
||||
{
|
||||
float4 yuv = nv12Texture.Sample(linearSampler, input.uv);
|
||||
// YUV to RGB conversion...
|
||||
}
|
||||
```
|
||||
|
||||
**After (Separate Y/UV)**:
|
||||
```hlsl
|
||||
Texture2D<float> yTexture : register(t0); // R8_UNORM
|
||||
Texture2D<float2> uvTexture : register(t1); // RG8_UNORM
|
||||
SamplerState linearSampler : register(s0);
|
||||
|
||||
float4 PSMain(PSInput input) : SV_TARGET
|
||||
{
|
||||
float y = yTexture.Sample(linearSampler, input.uv).r;
|
||||
float2 uv = uvTexture.Sample(linearSampler, input.uv).rg;
|
||||
|
||||
// YUV to RGB conversion
|
||||
float3 yuv = float3(y, uv.r, uv.g);
|
||||
// ... rest of conversion
|
||||
}
|
||||
```
|
||||
|
||||
**Considerations**:
|
||||
- Need to bind two separate textures instead of one
|
||||
- Slightly more complex shader logic
|
||||
- UV coordinates remain the same (automatically interpolated at half resolution)
|
||||
|
||||
---
|
||||
|
||||
## Performance Analysis
|
||||
|
||||
### Memory Usage
|
||||
|
||||
For 1920×1080 video:
|
||||
|
||||
| Component | Combined NV12 | Separate Y/UV | Difference |
|
||||
|-----------|---------------|---------------|------------|
|
||||
| Y Plane | 1920×1080 = 2,073,600 bytes | 1920×1080 = 2,073,600 bytes | Same |
|
||||
| UV Plane | 1920×540 = 1,036,800 bytes | 1920×540×2 = 2,073,600 bytes | +1,036,800 bytes |
|
||||
| **Total** | **~3.11 MB** | **~4.14 MB** | **+33% overhead** |
|
||||
|
||||
**Note**: RG8 format uses 2 bytes per pixel (U and V separate), while NV12 UV plane uses 2 bytes per 2 pixels (interleaved). This causes the +33% memory overhead.
|
||||
|
||||
### Performance Characteristics
|
||||
|
||||
| Operation | Combined NV12 | Separate Y/UV |
|
||||
|-----------|---------------|---------------|
|
||||
| **Texture Creation** | 1 call | 2 calls |
|
||||
| **CUDA Import** | 1 import (fails) | 2 imports (succeeds) |
|
||||
| **Copy Operations** | 2 copies (1 fails) | 2 copies (both succeed) |
|
||||
| **Shader Sampling** | 1 texture read | 2 texture reads |
|
||||
| **Overall** | ❌ Non-functional | ✅ Functional |
|
||||
|
||||
**Expected Performance Impact**:
|
||||
- Negligible: Memory bandwidth dominated by video resolution, not texture count
|
||||
- Modern GPUs handle multiple texture reads efficiently
|
||||
- Linear layout may actually improve cache performance
|
||||
|
||||
---
|
||||
|
||||
## Implementation Status
|
||||
|
||||
### ✅ Completed
|
||||
|
||||
1. **D3D12Manager Updates**
|
||||
- ✅ `CreateYPlaneTexture()` implemented
|
||||
- ✅ `CreateUVPlaneTexture()` implemented
|
||||
- ✅ `ReadbackSeparateTextures()` implemented
|
||||
- **Location**: `vav2/platforms/windows/tests/large-resolution/src/D3D12Manager.{h,cpp}`
|
||||
|
||||
2. **Standalone Test**
|
||||
- ✅ Proof-of-concept test created
|
||||
- ✅ Verifies D3D12 texture creation succeeds
|
||||
- ✅ Verifies CUDA import succeeds
|
||||
- **Location**: `vav2/platforms/windows/tests/separate-texture-test/src/main.cpp`
|
||||
|
||||
### 🚧 Pending
|
||||
|
||||
1. **VavCore Integration**
|
||||
- ⏳ Update `D3D12SurfaceHandler` to support separate texture copy
|
||||
- ⏳ Add `CopySeparateNV12Frame()` method
|
||||
- ⏳ Modify `ExternalMemoryCache` to cache 2 textures instead of 1
|
||||
|
||||
2. **NVDEC Decoder Updates**
|
||||
- ⏳ Modify `NVDECAV1Decoder` to create 2 D3D12 textures per frame
|
||||
- ⏳ Update surface allocation logic
|
||||
- ⏳ Update frame metadata to track both textures
|
||||
|
||||
3. **Rendering Pipeline**
|
||||
- ⏳ Update `D3D12VideoRenderer` to bind 2 textures
|
||||
- ⏳ Modify YUV→RGB shader to read from separate textures
|
||||
- ⏳ Update descriptor heap layout
|
||||
|
||||
4. **Testing**
|
||||
- ⏳ Test with red-surface-nvdec
|
||||
- ⏳ Test with large-resolution test
|
||||
- ⏳ Performance benchmarking
|
||||
|
||||
---
|
||||
|
||||
## Alternative Solutions Considered
|
||||
|
||||
### Option A: NV12→RGB Conversion in CUDA
|
||||
**Description**: Use CUDA kernel to convert NV12 to RGBA before D3D12 copy
|
||||
|
||||
**Pros**:
|
||||
- Single RGBA texture (simpler)
|
||||
- D3D12 RGBA textures support ROW_MAJOR
|
||||
- Standard RGB rendering pipeline
|
||||
|
||||
**Cons**:
|
||||
- Requires CUDA kernel implementation
|
||||
- Extra GPU computation (YUV→RGB conversion)
|
||||
- Higher memory bandwidth (RGBA = 4 bytes/pixel vs NV12 = 1.5 bytes/pixel)
|
||||
- Color space conversion happens twice (CUDA + shader)
|
||||
|
||||
**Decision**: **Rejected** - Unnecessary computation and bandwidth overhead
|
||||
|
||||
### Option B: Separate Y/UV Textures (Selected)
|
||||
**Description**: Current approach documented above
|
||||
|
||||
**Pros**:
|
||||
- Minimal changes to existing pipeline
|
||||
- No extra color space conversions
|
||||
- Linear layout guarantees CUDA compatibility
|
||||
- Proven to work with R8/RG8 formats
|
||||
|
||||
**Cons**:
|
||||
- +33% memory overhead
|
||||
- Slightly more complex shader
|
||||
- Need to manage 2 textures instead of 1
|
||||
|
||||
**Decision**: **Selected** - Best balance of simplicity and performance
|
||||
|
||||
---
|
||||
|
||||
## Code Examples
|
||||
|
||||
### Complete Y Plane Texture Creation
|
||||
```cpp
|
||||
ID3D12Resource* D3D12Manager::CreateYPlaneTexture(uint32_t width, uint32_t height)
|
||||
{
|
||||
D3D12_RESOURCE_DESC desc = {};
|
||||
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
|
||||
desc.Width = width;
|
||||
desc.Height = height;
|
||||
desc.DepthOrArraySize = 1;
|
||||
desc.MipLevels = 1;
|
||||
desc.Format = DXGI_FORMAT_R8_UNORM;
|
||||
desc.SampleDesc.Count = 1;
|
||||
desc.SampleDesc.Quality = 0;
|
||||
desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // CRITICAL for CUDA
|
||||
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
|
||||
D3D12_HEAP_PROPERTIES heap_props = {};
|
||||
heap_props.Type = D3D12_HEAP_TYPE_DEFAULT;
|
||||
|
||||
ID3D12Resource* texture = nullptr;
|
||||
HRESULT hr = m_device->CreateCommittedResource(
|
||||
&heap_props,
|
||||
D3D12_HEAP_FLAG_SHARED, // CRITICAL for CUDA interop
|
||||
&desc,
|
||||
D3D12_RESOURCE_STATE_COMMON,
|
||||
nullptr,
|
||||
IID_PPV_ARGS(&texture));
|
||||
|
||||
if (FAILED(hr)) {
|
||||
printf("[ERROR] Failed to create Y plane texture: 0x%08X\n", hr);
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
return texture;
|
||||
}
|
||||
```
|
||||
|
||||
### Complete UV Plane Texture Creation
|
||||
```cpp
|
||||
ID3D12Resource* D3D12Manager::CreateUVPlaneTexture(uint32_t width, uint32_t height)
|
||||
{
|
||||
D3D12_RESOURCE_DESC desc = {};
|
||||
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
|
||||
desc.Width = width;
|
||||
desc.Height = height / 2; // UV plane is half height
|
||||
desc.DepthOrArraySize = 1;
|
||||
desc.MipLevels = 1;
|
||||
desc.Format = DXGI_FORMAT_R8G8_UNORM; // Interleaved U and V
|
||||
desc.SampleDesc.Count = 1;
|
||||
desc.SampleDesc.Quality = 0;
|
||||
desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // CRITICAL for CUDA
|
||||
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
|
||||
D3D12_HEAP_PROPERTIES heap_props = {};
|
||||
heap_props.Type = D3D12_HEAP_TYPE_DEFAULT;
|
||||
|
||||
ID3D12Resource* texture = nullptr;
|
||||
HRESULT hr = m_device->CreateCommittedResource(
|
||||
&heap_props,
|
||||
D3D12_HEAP_FLAG_SHARED, // CRITICAL for CUDA interop
|
||||
&desc,
|
||||
D3D12_RESOURCE_STATE_COMMON,
|
||||
nullptr,
|
||||
IID_PPV_ARGS(&texture));
|
||||
|
||||
if (FAILED(hr)) {
|
||||
printf("[ERROR] Failed to create UV plane texture: 0x%08X\n", hr);
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
return texture;
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit Test: Texture Creation
|
||||
```cpp
|
||||
// Test 1: Verify Y plane texture creation succeeds
|
||||
ID3D12Resource* y_texture = CreateYPlaneTexture(1920, 1080);
|
||||
assert(y_texture != nullptr);
|
||||
|
||||
// Test 2: Verify UV plane texture creation succeeds
|
||||
ID3D12Resource* uv_texture = CreateUVPlaneTexture(1920, 1080);
|
||||
assert(uv_texture != nullptr);
|
||||
|
||||
// Test 3: Verify correct dimensions
|
||||
D3D12_RESOURCE_DESC y_desc = y_texture->GetDesc();
|
||||
assert(y_desc.Width == 1920 && y_desc.Height == 1080);
|
||||
|
||||
D3D12_RESOURCE_DESC uv_desc = uv_texture->GetDesc();
|
||||
assert(uv_desc.Width == 1920 && uv_desc.Height == 540);
|
||||
|
||||
// Test 4: Verify correct formats
|
||||
assert(y_desc.Format == DXGI_FORMAT_R8_UNORM);
|
||||
assert(uv_desc.Format == DXGI_FORMAT_R8G8_UNORM);
|
||||
|
||||
// Test 5: Verify ROW_MAJOR layout
|
||||
assert(y_desc.Layout == D3D12_TEXTURE_LAYOUT_ROW_MAJOR);
|
||||
assert(uv_desc.Layout == D3D12_TEXTURE_LAYOUT_ROW_MAJOR);
|
||||
```
|
||||
|
||||
### Integration Test: CUDA Import
|
||||
```cpp
|
||||
// Test 6: Verify CUDA can import Y plane
|
||||
CUdeviceptr y_ptr;
|
||||
bool y_success = TestCUDAImport(device, y_texture, &y_ptr);
|
||||
assert(y_success == true);
|
||||
|
||||
// Test 7: Verify CUDA can import UV plane
|
||||
CUdeviceptr uv_ptr;
|
||||
bool uv_success = TestCUDAImport(device, uv_texture, &uv_ptr);
|
||||
assert(uv_success == true);
|
||||
|
||||
// Test 8: Verify pointers are valid
|
||||
assert(y_ptr != 0);
|
||||
assert(uv_ptr != 0);
|
||||
```
|
||||
|
||||
### End-to-End Test: Video Decoding
|
||||
```cpp
|
||||
// Test 9: Decode full video with separate textures
|
||||
for (int frame = 0; frame < total_frames; frame++) {
|
||||
// Decode frame with NVDEC
|
||||
vavcore_decode_next_frame(player, &nvdec_frame);
|
||||
|
||||
// Copy Y plane from CUDA to D3D12
|
||||
cudaMemcpy2D(y_d3d_ptr, ..., nvdec_frame.y_ptr, ...);
|
||||
|
||||
// Copy UV plane from CUDA to D3D12
|
||||
cudaMemcpy2D(uv_d3d_ptr, ..., nvdec_frame.uv_ptr, ...);
|
||||
|
||||
// Verify no CUDA errors
|
||||
assert(cudaGetLastError() == cudaSuccess);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Migration Guide
|
||||
|
||||
### For Existing Code Using Combined NV12
|
||||
|
||||
**Before**:
|
||||
```cpp
|
||||
// Create single NV12 texture
|
||||
ID3D12Resource* nv12_texture = CreateNV12Texture(width, height);
|
||||
|
||||
// Copy NV12 frame
|
||||
CopyNV12Frame(src_frame, src_pitch, nv12_texture, width, height);
|
||||
|
||||
// Bind texture
|
||||
SetShaderTexture(0, nv12_texture);
|
||||
```
|
||||
|
||||
**After**:
|
||||
```cpp
|
||||
// Create separate Y and UV textures
|
||||
ID3D12Resource* y_texture = CreateYPlaneTexture(width, height);
|
||||
ID3D12Resource* uv_texture = CreateUVPlaneTexture(width, height);
|
||||
|
||||
// Copy Y and UV planes separately
|
||||
CopySeparateNV12Frame(src_frame, src_pitch, y_texture, uv_texture, width, height);
|
||||
|
||||
// Bind both textures
|
||||
SetShaderTexture(0, y_texture);
|
||||
SetShaderTexture(1, uv_texture);
|
||||
```
|
||||
|
||||
### Shader Migration
|
||||
|
||||
**Before**:
|
||||
```hlsl
|
||||
Texture2D<float4> nv12Texture : register(t0);
|
||||
|
||||
float4 PSMain(PSInput input) : SV_TARGET
|
||||
{
|
||||
float4 sample = nv12Texture.Sample(linearSampler, input.uv);
|
||||
float y = sample.r;
|
||||
float u = sample.g;
|
||||
float v = sample.b;
|
||||
// ... YUV to RGB conversion
|
||||
}
|
||||
```
|
||||
|
||||
**After**:
|
||||
```hlsl
|
||||
Texture2D<float> yTexture : register(t0);
|
||||
Texture2D<float2> uvTexture : register(t1);
|
||||
|
||||
float4 PSMain(PSInput input) : SV_TARGET
|
||||
{
|
||||
float y = yTexture.Sample(linearSampler, input.uv).r;
|
||||
float2 uv = uvTexture.Sample(linearSampler, input.uv).rg;
|
||||
float u = uv.r;
|
||||
float v = uv.g;
|
||||
// ... YUV to RGB conversion (same as before)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Known Limitations
|
||||
|
||||
1. **Memory Overhead**: 33% increase in GPU memory usage due to RG8 vs NV12 UV format
|
||||
2. **Shader Complexity**: Requires binding and sampling 2 textures instead of 1
|
||||
3. **API Surface**: More texture management (create/destroy 2 textures per frame)
|
||||
4. **Compatibility**: Requires D3D12 FL 11.0+ with ROW_MAJOR texture support
|
||||
|
||||
---
|
||||
|
||||
## Future Optimizations
|
||||
|
||||
### Potential Improvements
|
||||
|
||||
1. **Compressed UV Format**
|
||||
- Investigate alternative UV storage formats that reduce memory overhead
|
||||
- Possible: BC4 compression for UV plane (requires shader decompression)
|
||||
|
||||
2. **Texture Arrays**
|
||||
- Store Y and UV as separate slices in a texture array
|
||||
- Reduces descriptor heap usage
|
||||
- Simplifies binding logic
|
||||
|
||||
3. **Tiled Resources**
|
||||
- Use D3D12 tiled resources with custom tile mappings
|
||||
- Allows linear layout for CUDA while maintaining GPU efficiency
|
||||
- Requires D3D12 FL 12.0+
|
||||
|
||||
4. **Vulkan Alternative**
|
||||
- Investigate if Vulkan provides better linear texture support
|
||||
- May avoid the NV12 tiling issue entirely
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
### Microsoft Documentation
|
||||
- [D3D12_RESOURCE_DESC](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ns-d3d12-d3d12_resource_desc)
|
||||
- [D3D12_TEXTURE_LAYOUT](https://docs.microsoft.com/en-us/windows/win32/api/d3d12/ne-d3d12-d3d12_texture_layout)
|
||||
- [DXGI_FORMAT](https://docs.microsoft.com/en-us/windows/win32/api/dxgiformat/ne-dxgiformat-dxgi_format)
|
||||
|
||||
### NVIDIA Documentation
|
||||
- [CUDA-D3D12 Interop](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__D3D12.html)
|
||||
- [cudaExternalMemory](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXTRES__INTEROP.html)
|
||||
- [NVDEC Video Decoder](https://docs.nvidia.com/video-technologies/video-codec-sdk/nvdec-video-decoder-api-prog-guide/)
|
||||
|
||||
### Related Discussions
|
||||
- [D3D12 NV12 Texture Layout Issues](https://github.com/microsoft/DirectX-Graphics-Samples/issues)
|
||||
- [CUDA-D3D12 Interop Best Practices](https://forums.developer.nvidia.com)
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
The **Separate Y/UV Texture approach** successfully resolves the D3D12-CUDA NV12 interop issue by:
|
||||
|
||||
1. Using ROW_MAJOR linear layout that CUDA can access
|
||||
2. Splitting NV12 into R8 (Y) and RG8 (UV) formats that support linear layout
|
||||
3. Allowing `cudaMemcpy2D` to work correctly for both planes
|
||||
|
||||
While this approach introduces a 33% memory overhead, it provides a **functional and performant solution** for NVDEC → D3D12 hardware-accelerated video decoding with minimal complexity.
|
||||
|
||||
**Next Steps**: Integrate this solution into VavCore's NVDECAV1Decoder and D3D12VideoRenderer.
|
||||
236
vav2/platforms/windows/tests/separate-texture-test/src/main.cpp
Normal file
236
vav2/platforms/windows/tests/separate-texture-test/src/main.cpp
Normal file
@@ -0,0 +1,236 @@
|
||||
#include <windows.h>
|
||||
#include <d3d12.h>
|
||||
#include <dxgi1_6.h>
|
||||
#include <stdio.h>
|
||||
#include <cuda.h>
|
||||
#include <cuda_runtime.h>
|
||||
|
||||
// Simple test to verify that D3D12 R8/RG8 textures with ROW_MAJOR layout
|
||||
// can be successfully created and used for CUDA-D3D12 interop
|
||||
|
||||
bool CreateD3D12Device(ID3D12Device** out_device)
|
||||
{
|
||||
IDXGIFactory4* factory = nullptr;
|
||||
HRESULT hr = CreateDXGIFactory2(0, IID_PPV_ARGS(&factory));
|
||||
if (FAILED(hr)) {
|
||||
printf("[ERROR] CreateDXGIFactory2 failed: 0x%08X\n", hr);
|
||||
return false;
|
||||
}
|
||||
|
||||
IDXGIAdapter1* adapter = nullptr;
|
||||
for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
|
||||
DXGI_ADAPTER_DESC1 desc;
|
||||
adapter->GetDesc1(&desc);
|
||||
|
||||
if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) {
|
||||
adapter->Release();
|
||||
continue;
|
||||
}
|
||||
|
||||
hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(out_device));
|
||||
if (SUCCEEDED(hr)) {
|
||||
printf("[OK] Created D3D12 device on: %ls\n", desc.Description);
|
||||
adapter->Release();
|
||||
factory->Release();
|
||||
return true;
|
||||
}
|
||||
|
||||
adapter->Release();
|
||||
}
|
||||
|
||||
factory->Release();
|
||||
printf("[ERROR] No suitable D3D12 adapter found\n");
|
||||
return false;
|
||||
}
|
||||
|
||||
ID3D12Resource* CreateYPlaneTexture(ID3D12Device* device, uint32_t width, uint32_t height)
|
||||
{
|
||||
D3D12_RESOURCE_DESC desc = {};
|
||||
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
|
||||
desc.Width = width;
|
||||
desc.Height = height;
|
||||
desc.DepthOrArraySize = 1;
|
||||
desc.MipLevels = 1;
|
||||
desc.Format = DXGI_FORMAT_R8_UNORM;
|
||||
desc.SampleDesc.Count = 1;
|
||||
desc.SampleDesc.Quality = 0;
|
||||
desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // Linear layout for CUDA
|
||||
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
|
||||
D3D12_HEAP_PROPERTIES heap_props = {};
|
||||
heap_props.Type = D3D12_HEAP_TYPE_DEFAULT;
|
||||
|
||||
ID3D12Resource* texture = nullptr;
|
||||
HRESULT hr = device->CreateCommittedResource(
|
||||
&heap_props,
|
||||
D3D12_HEAP_FLAG_SHARED,
|
||||
&desc,
|
||||
D3D12_RESOURCE_STATE_COMMON,
|
||||
nullptr,
|
||||
IID_PPV_ARGS(&texture));
|
||||
|
||||
if (FAILED(hr)) {
|
||||
printf("[ERROR] Failed to create Y plane texture: 0x%08X\n", hr);
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
printf("[OK] Created Y plane texture (%u x %u, R8_UNORM, ROW_MAJOR)\n", width, height);
|
||||
return texture;
|
||||
}
|
||||
|
||||
ID3D12Resource* CreateUVPlaneTexture(ID3D12Device* device, uint32_t width, uint32_t height)
|
||||
{
|
||||
D3D12_RESOURCE_DESC desc = {};
|
||||
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
|
||||
desc.Width = width;
|
||||
desc.Height = height / 2;
|
||||
desc.DepthOrArraySize = 1;
|
||||
desc.MipLevels = 1;
|
||||
desc.Format = DXGI_FORMAT_R8G8_UNORM;
|
||||
desc.SampleDesc.Count = 1;
|
||||
desc.SampleDesc.Quality = 0;
|
||||
desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR; // Linear layout for CUDA
|
||||
desc.Flags = D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
|
||||
|
||||
D3D12_HEAP_PROPERTIES heap_props = {};
|
||||
heap_props.Type = D3D12_HEAP_TYPE_DEFAULT;
|
||||
|
||||
ID3D12Resource* texture = nullptr;
|
||||
HRESULT hr = device->CreateCommittedResource(
|
||||
&heap_props,
|
||||
D3D12_HEAP_FLAG_SHARED,
|
||||
&desc,
|
||||
D3D12_RESOURCE_STATE_COMMON,
|
||||
nullptr,
|
||||
IID_PPV_ARGS(&texture));
|
||||
|
||||
if (FAILED(hr)) {
|
||||
printf("[ERROR] Failed to create UV plane texture: 0x%08X\n", hr);
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
printf("[OK] Created UV plane texture (%u x %u, RG8_UNORM, ROW_MAJOR)\n", width, height / 2);
|
||||
return texture;
|
||||
}
|
||||
|
||||
bool TestCUDAImport(ID3D12Device* device, ID3D12Resource* texture, const char* plane_name)
|
||||
{
|
||||
// Create shared handle
|
||||
HANDLE shared_handle = nullptr;
|
||||
HRESULT hr = device->CreateSharedHandle(texture, nullptr, GENERIC_ALL, nullptr, &shared_handle);
|
||||
if (FAILED(hr)) {
|
||||
printf("[ERROR] CreateSharedHandle failed for %s: 0x%08X\n", plane_name, hr);
|
||||
return false;
|
||||
}
|
||||
|
||||
// Get resource size
|
||||
D3D12_RESOURCE_DESC desc = texture->GetDesc();
|
||||
D3D12_RESOURCE_ALLOCATION_INFO alloc_info = device->GetResourceAllocationInfo(0, 1, &desc);
|
||||
|
||||
printf("[%s] Allocation size: %llu bytes\n", plane_name, alloc_info.SizeInBytes);
|
||||
|
||||
// Import to CUDA
|
||||
cudaExternalMemoryHandleDesc mem_desc = {};
|
||||
mem_desc.type = cudaExternalMemoryHandleTypeD3D12Heap;
|
||||
mem_desc.handle.win32.handle = shared_handle;
|
||||
mem_desc.size = alloc_info.SizeInBytes;
|
||||
mem_desc.flags = cudaExternalMemoryDedicated;
|
||||
|
||||
cudaExternalMemory_t external_memory;
|
||||
cudaError_t err = cudaImportExternalMemory(&external_memory, &mem_desc);
|
||||
|
||||
CloseHandle(shared_handle);
|
||||
|
||||
if (err != cudaSuccess) {
|
||||
printf("[ERROR] cudaImportExternalMemory failed for %s: %d (%s)\n",
|
||||
plane_name, err, cudaGetErrorString(err));
|
||||
return false;
|
||||
}
|
||||
|
||||
// Map to buffer
|
||||
cudaExternalMemoryBufferDesc buffer_desc = {};
|
||||
buffer_desc.size = mem_desc.size;
|
||||
|
||||
CUdeviceptr device_ptr;
|
||||
err = cudaExternalMemoryGetMappedBuffer((void**)&device_ptr, external_memory, &buffer_desc);
|
||||
|
||||
if (err != cudaSuccess) {
|
||||
printf("[ERROR] cudaExternalMemoryGetMappedBuffer failed for %s: %d (%s)\n",
|
||||
plane_name, err, cudaGetErrorString(err));
|
||||
cudaDestroyExternalMemory(external_memory);
|
||||
return false;
|
||||
}
|
||||
|
||||
printf("[OK] %s CUDA import successful: device_ptr=0x%llX\n", plane_name, device_ptr);
|
||||
|
||||
// Cleanup
|
||||
cudaDestroyExternalMemory(external_memory);
|
||||
return true;
|
||||
}
|
||||
|
||||
int main()
|
||||
{
|
||||
printf("==== Separate Y/UV Texture Test ====\n\n");
|
||||
|
||||
// Test parameters
|
||||
const uint32_t WIDTH = 1920;
|
||||
const uint32_t HEIGHT = 1080;
|
||||
|
||||
// Step 1: Create D3D12 device
|
||||
printf("[Step 1] Creating D3D12 device...\n");
|
||||
ID3D12Device* device = nullptr;
|
||||
if (!CreateD3D12Device(&device)) {
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Step 2: Initialize CUDA
|
||||
printf("\n[Step 2] Initializing CUDA...\n");
|
||||
cudaError_t err = cudaSetDevice(0);
|
||||
if (err != cudaSuccess) {
|
||||
printf("[ERROR] cudaSetDevice failed: %d (%s)\n", err, cudaGetErrorString(err));
|
||||
device->Release();
|
||||
return 1;
|
||||
}
|
||||
printf("[OK] CUDA initialized\n");
|
||||
|
||||
// Step 3: Create Y plane texture
|
||||
printf("\n[Step 3] Creating Y plane texture...\n");
|
||||
ID3D12Resource* y_texture = CreateYPlaneTexture(device, WIDTH, HEIGHT);
|
||||
if (!y_texture) {
|
||||
device->Release();
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Step 4: Create UV plane texture
|
||||
printf("\n[Step 4] Creating UV plane texture...\n");
|
||||
ID3D12Resource* uv_texture = CreateUVPlaneTexture(device, WIDTH, HEIGHT);
|
||||
if (!uv_texture) {
|
||||
y_texture->Release();
|
||||
device->Release();
|
||||
return 1;
|
||||
}
|
||||
|
||||
// Step 5: Test CUDA import for Y plane
|
||||
printf("\n[Step 5] Testing CUDA import for Y plane...\n");
|
||||
bool y_success = TestCUDAImport(device, y_texture, "Y Plane");
|
||||
|
||||
// Step 6: Test CUDA import for UV plane
|
||||
printf("\n[Step 6] Testing CUDA import for UV plane...\n");
|
||||
bool uv_success = TestCUDAImport(device, uv_texture, "UV Plane");
|
||||
|
||||
// Cleanup
|
||||
uv_texture->Release();
|
||||
y_texture->Release();
|
||||
device->Release();
|
||||
|
||||
// Final result
|
||||
printf("\n==== Test Result ====\n");
|
||||
if (y_success && uv_success) {
|
||||
printf("[SUCCESS] Both Y and UV planes can be imported to CUDA!\n");
|
||||
printf("This confirms that separate R8/RG8 textures with ROW_MAJOR layout work for D3D12-CUDA interop.\n");
|
||||
return 0;
|
||||
} else {
|
||||
printf("[FAILED] CUDA import failed\n");
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
@@ -96,6 +96,78 @@ bool D3D12SurfaceHandler::CopyNV12Frame(CUdeviceptr src_frame,
|
||||
return true;
|
||||
}
|
||||
|
||||
bool D3D12SurfaceHandler::CopySeparateNV12Frame(CUdeviceptr src_frame,
|
||||
uint32_t src_pitch,
|
||||
ID3D12Resource* y_texture,
|
||||
ID3D12Resource* uv_texture,
|
||||
uint32_t width,
|
||||
uint32_t height)
|
||||
{
|
||||
// Get CUDA pointers for both Y and UV textures
|
||||
CUdeviceptr y_dst_ptr = 0;
|
||||
CUdeviceptr uv_dst_ptr = 0;
|
||||
|
||||
if (!GetD3D12CUDAPointer(y_texture, &y_dst_ptr)) {
|
||||
LOGF_ERROR("[D3D12SurfaceHandler] Failed to get CUDA pointer for Y texture");
|
||||
return false;
|
||||
}
|
||||
|
||||
if (!GetD3D12CUDAPointer(uv_texture, &uv_dst_ptr)) {
|
||||
LOGF_ERROR("[D3D12SurfaceHandler] Failed to get CUDA pointer for UV texture");
|
||||
return false;
|
||||
}
|
||||
|
||||
// Get Y texture layout (R8_UNORM, ROW_MAJOR)
|
||||
D3D12_RESOURCE_DESC y_desc = y_texture->GetDesc();
|
||||
D3D12_PLACED_SUBRESOURCE_FOOTPRINT y_layout;
|
||||
UINT y_num_rows = 0;
|
||||
UINT64 y_row_size = 0;
|
||||
UINT64 y_total_bytes = 0;
|
||||
|
||||
m_device->GetCopyableFootprints(&y_desc, 0, 1, 0,
|
||||
&y_layout, &y_num_rows, &y_row_size, &y_total_bytes);
|
||||
|
||||
uint32_t y_dst_pitch = y_layout.Footprint.RowPitch;
|
||||
|
||||
// Get UV texture layout (RG8_UNORM, ROW_MAJOR)
|
||||
D3D12_RESOURCE_DESC uv_desc = uv_texture->GetDesc();
|
||||
D3D12_PLACED_SUBRESOURCE_FOOTPRINT uv_layout;
|
||||
UINT uv_num_rows = 0;
|
||||
UINT64 uv_row_size = 0;
|
||||
UINT64 uv_total_bytes = 0;
|
||||
|
||||
m_device->GetCopyableFootprints(&uv_desc, 0, 1, 0,
|
||||
&uv_layout, &uv_num_rows, &uv_row_size, &uv_total_bytes);
|
||||
|
||||
uint32_t uv_dst_pitch = uv_layout.Footprint.RowPitch;
|
||||
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Separate textures: Y ptr=0x%llX, UV ptr=0x%llX", y_dst_ptr, uv_dst_ptr);
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Y: width=%u, height=%u, pitch=%u, rows=%u",
|
||||
width, height, y_dst_pitch, y_num_rows);
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] UV: width=%u, height=%u, pitch=%u, rows=%u",
|
||||
width, height / 2, uv_dst_pitch, uv_num_rows);
|
||||
|
||||
// Copy Y plane
|
||||
if (!CopyYPlane(src_frame, src_pitch,
|
||||
y_dst_ptr, y_dst_pitch,
|
||||
width, y_num_rows)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
// Copy UV plane
|
||||
// NVDEC stores UV plane after Y plane with actual allocated rows
|
||||
CUdeviceptr src_uv = src_frame + (static_cast<UINT64>(src_pitch) * y_num_rows);
|
||||
|
||||
if (!CopyUVPlane(src_uv, src_pitch,
|
||||
uv_dst_ptr, uv_dst_pitch,
|
||||
width, uv_num_rows)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
LOGF_DEBUG("[D3D12SurfaceHandler] Separate NV12 frame copied successfully");
|
||||
return true;
|
||||
}
|
||||
|
||||
bool D3D12SurfaceHandler::SignalD3D12Fence(uint64_t fence_value)
|
||||
{
|
||||
// TODO: Implement fence signaling
|
||||
|
||||
@@ -17,7 +17,7 @@ public:
|
||||
D3D12SurfaceHandler(ID3D12Device* device, CUcontext cuda_context);
|
||||
~D3D12SurfaceHandler();
|
||||
|
||||
// Copy NV12 frame from CUDA to D3D12 texture
|
||||
// Copy NV12 frame from CUDA to D3D12 texture (legacy combined NV12 texture)
|
||||
// Returns true on success
|
||||
bool CopyNV12Frame(CUdeviceptr src_frame,
|
||||
uint32_t src_pitch,
|
||||
@@ -25,6 +25,17 @@ public:
|
||||
uint32_t width,
|
||||
uint32_t height);
|
||||
|
||||
// Copy NV12 frame from CUDA to separate Y and UV D3D12 textures
|
||||
// Y Plane: DXGI_FORMAT_R8_UNORM (width x height)
|
||||
// UV Plane: DXGI_FORMAT_R8G8_UNORM (width x height/2)
|
||||
// Returns true on success
|
||||
bool CopySeparateNV12Frame(CUdeviceptr src_frame,
|
||||
uint32_t src_pitch,
|
||||
ID3D12Resource* y_texture,
|
||||
ID3D12Resource* uv_texture,
|
||||
uint32_t width,
|
||||
uint32_t height);
|
||||
|
||||
// Signal D3D12 fence from CUDA stream (not implemented yet)
|
||||
bool SignalD3D12Fence(uint64_t fence_value);
|
||||
|
||||
|
||||
Reference in New Issue
Block a user