NVDECAV1Decoder C++ Refactoring Design
- Date: 2025-10-03
- Status: Design Phase
- Goal: Refactor NVDECAV1Decoder internal C++ code for readability and maintainability
Problem Analysis
Current State
- File: `vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp`
- Lines: 1,722 (too large)
- Main method: `DecodeToSurface()` is 500+ lines with deeply nested logic
Key Issues
- Monolithic method: `DecodeToSurface()` handles CPU, D3D11, D3D12, and CUDA targets in one giant function
- Mixed responsibilities: decoding, surface copying, memory management, and fence signaling are all interleaved
- Hard to debug: pitch/stride bugs are difficult to trace through the deep nesting
- Difficult to test: individual components cannot be unit tested in isolation
- Poor readability: excessive debug logging obscures the logic
Design Goals
Primary Goals
- Readability: Each method should do ONE thing clearly
- Maintainability: Easy to locate and fix bugs (like the current NV12 stride issue)
- Testability: Each component can be tested independently
- Performance: Zero overhead - use inline functions where appropriate
Non-Goals
- NOT creating a C API (VavCore already provides that)
- NOT changing external interface of NVDECAV1Decoder
- NOT over-engineering with complex patterns
Proposed Architecture
File Structure
```
NVDECAV1Decoder.h          (Public interface - unchanged)
NVDECAV1Decoder.cpp        (Main decoder - ~600 lines)
  └── Uses helper classes below

D3D12SurfaceHandler.h      (D3D12-specific logic - ~300 lines)
D3D12SurfaceHandler.cpp
  ├── CopyNV12Frame()
  ├── GetD3D12CUDAPointer()
  └── SignalD3D12Fence()

ExternalMemoryCache.h      (CUDA-D3D12 interop cache - ~200 lines)
ExternalMemoryCache.cpp
  ├── GetOrCreateExternalMemory()
  ├── ImportD3D12Resource()
  ├── Release()
  └── ReleaseAll()
```
Class Diagram
```
NVDECAV1Decoder (Main decoder)
├── CUvideodecoder m_decoder
├── CUvideoparser m_parser
├── CUcontext m_cudaContext
├── ID3D12Device* m_d3d12Device
└── std::unique_ptr<D3D12SurfaceHandler> m_d3d12Handler (created on demand)

D3D12SurfaceHandler
├── ID3D12Device* m_device
├── CUcontext m_cudaContext
├── std::unique_ptr<ExternalMemoryCache> m_cache (owned by the handler)
└── Methods:
    ├── CopyNV12Frame(src_frame, src_pitch, dst_texture, width, height)
    ├── GetD3D12CUDAPointer(resource, out_ptr)   [private]
    └── SignalD3D12Fence(fence_value)

ExternalMemoryCache
├── std::map<ID3D12Resource*, CachedEntry> m_cache
└── Methods:
    ├── GetOrCreateExternalMemory(resource, out_ptr)
    ├── Release(resource)
    └── ReleaseAll()
```
Refactored Code Structure
1. NVDECAV1Decoder.cpp (Main decoder - simplified)
Before: 500+ lines in DecodeToSurface()
After: Clean routing logic
```cpp
bool NVDECAV1Decoder::DecodeToSurface(const uint8_t* packet_data, size_t packet_size,
                                      void* target_surface, SurfaceType target_type)
{
    // Step 1: Decode packet to NVDEC internal buffer
    if (!DecodePacket(packet_data, packet_size)) {
        return false;
    }

    // Step 2: Get decoded frame info (maps the frame; it is unmapped in Step 4)
    DecodedFrameInfo frame_info;
    if (!GetDecodedFrame(&frame_info)) {
        return false;
    }

    // Step 3: Copy to target surface based on type.
    // Copies issued here should finish before Step 4 unmaps the frame
    // (the NVDEC samples synchronize the copy stream before unmapping).
    bool result = false;
    switch (target_type) {
    case SURFACE_TYPE_CPU:
        result = CopyToCPUSurface(frame_info, target_surface);
        break;
    case SURFACE_TYPE_D3D12:
        result = CopyToD3D12Surface(frame_info, target_surface);
        break;
    case SURFACE_TYPE_D3D11:
        result = CopyToD3D11Surface(frame_info, target_surface);
        break;
    case SURFACE_TYPE_CUDA:
        result = CopyToCUDASurface(frame_info, target_surface);
        break;
    default:
        LogError("Unsupported surface type: %d", static_cast<int>(target_type));
        break;
    }

    // Step 4: Cleanup - always unmap the frame mapped in Step 2
    cuvidUnmapVideoFrame(m_decoder, frame_info.device_ptr);
    return result;
}
```
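The routing above unmaps the frame on its single exit path. If later changes add early returns between Step 2 and Step 4, a small scope guard keeps the unmap unconditional. The sketch below is a minimal, GPU-free illustration that assumes C++17. `ScopeExit` and `CopyWithGuaranteedUnmap` are hypothetical names, and the counter stands in for `cuvidUnmapVideoFrame(m_decoder, frame_info.device_ptr)`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical RAII guard: runs a cleanup callable once when the scope ends,
// unless Release() was called first.
template <typename Fn>
class ScopeExit {
public:
    explicit ScopeExit(Fn fn) : m_fn(std::move(fn)) {}
    ~ScopeExit() { if (m_active) m_fn(); }

    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

    // Cancel the cleanup (e.g., if ownership of the mapping is handed elsewhere).
    void Release() { m_active = false; }

private:
    Fn m_fn;
    bool m_active = true;
};

// Stand-in for Steps 3-4 of DecodeToSurface(): the "unmap" runs on every return.
inline bool CopyWithGuaranteedUnmap(bool copy_succeeds, int& unmap_calls) {
    ScopeExit unmap([&unmap_calls] { ++unmap_calls; });  // cuvidUnmapVideoFrame stand-in
    if (!copy_succeeds) {
        return false;  // early return: the guard still unmaps
    }
    // ... further steps (e.g., fence signaling) could go here ...
    return true;
}
```

The guard also covers exceptions thrown by a copy helper. That matters if any handler ever uses throwing code such as `std::make_unique`.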
2. Private Helper Methods (in NVDECAV1Decoder.cpp)
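```cpp
// Decode packet using cuvidParseVideoData
// Returns: true on success
// Complexity: ~30 lines
// (Declared in the private section of NVDECAV1Decoder.h)
bool NVDECAV1Decoder::DecodePacket(const uint8_t* data, size_t size)
{
    CUVIDSOURCEDATAPACKET packet = {};
    packet.payload = data;
    packet.payload_size = static_cast<unsigned long>(size);
    // No timestamp is passed in, so CUVID_PKT_TIMESTAMP is left unset;
    // set it together with packet.timestamp once a PTS is available.
    packet.flags = 0;

    // The parser's sequence/decode/display callbacks are invoked from within this call
    CUresult result = cuvidParseVideoData(m_parser, &packet);
    if (result != CUDA_SUCCESS) {
        LogError("cuvidParseVideoData failed: %d", result);
        return false;
    }
    return true;
}
```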
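```cpp
// Get decoded frame from internal queue
// Returns: true if a frame was available and mapped successfully
// Complexity: ~40 lines

// Declared in the private section of NVDECAV1Decoder.h:
struct DecodedFrameInfo {
    CUdeviceptr device_ptr;  // mapped NV12 frame: Y plane followed by UV plane
    uint32_t pitch;          // row pitch in bytes, as returned by cuvidMapVideoFrame
    uint32_t width;          // frame width in pixels (m_width)
    uint32_t height;         // frame height in pixels (m_height)
};

bool NVDECAV1Decoder::GetDecodedFrame(DecodedFrameInfo* out_info)
{
    // m_frameQueue holds picture indices queued by the parser's display callback
    if (m_frameQueue.empty()) {
        return false;
    }
    int frame_index = m_frameQueue.front();
    m_frameQueue.pop();

    CUVIDPROCPARAMS proc_params = {};
    proc_params.progressive_frame = 1;  // AV1 has no interlaced coding

    CUdeviceptr device_ptr = 0;
    unsigned int pitch = 0;
    CUresult result = cuvidMapVideoFrame(m_decoder, frame_index,
                                         &device_ptr, &pitch, &proc_params);
    if (result != CUDA_SUCCESS) {
        LogError("cuvidMapVideoFrame failed: %d", result);
        return false;
    }

    out_info->device_ptr = device_ptr;
    out_info->pitch = pitch;
    out_info->width = m_width;
    out_info->height = m_height;
    return true;
}
```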
```cpp
// Copy to D3D12 surface (delegates to handler)
// Returns: true on success
// Complexity: ~20 lines
bool NVDECAV1Decoder::CopyToD3D12Surface(const DecodedFrameInfo& frame, void* surface)
{
    auto* d3d12_resource = static_cast<ID3D12Resource*>(surface);
    if (!d3d12_resource || !m_d3d12Device) {
        LogError("CopyToD3D12Surface: missing D3D12 resource or device");
        return false;
    }

    // Create handler on demand (m_d3d12Handler is a std::unique_ptr<D3D12SurfaceHandler>)
    if (!m_d3d12Handler) {
        m_d3d12Handler = std::make_unique<D3D12SurfaceHandler>(
            m_d3d12Device, m_cudaContext
        );
    }

    return m_d3d12Handler->CopyNV12Frame(
        frame.device_ptr,
        frame.pitch,
        d3d12_resource,
        frame.width,
        frame.height
    );
}
```
3. D3D12SurfaceHandler.h (D3D12-specific operations)
```cpp
#pragma once
#include <d3d12.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <memory>

namespace VavCore {

// Forward declaration (the complete type is only needed in the .cpp)
class ExternalMemoryCache;

class D3D12SurfaceHandler {
public:
    D3D12SurfaceHandler(ID3D12Device* device, CUcontext cuda_context);
    ~D3D12SurfaceHandler();  // defined in the .cpp, where ExternalMemoryCache is complete

    // Copy NV12 frame from CUDA to a DXGI_FORMAT_NV12 D3D12 texture
    // Returns: true on success
    bool CopyNV12Frame(CUdeviceptr src_frame,
                       uint32_t src_pitch,
                       ID3D12Resource* dst_texture,
                       uint32_t width,
                       uint32_t height);

    // Signal D3D12 fence from CUDA stream
    // Returns: true on success
    bool SignalD3D12Fence(uint64_t fence_value);

private:
    // Get CUDA device pointer for D3D12 resource (uses cache)
    bool GetD3D12CUDAPointer(ID3D12Resource* resource, CUdeviceptr* out_ptr);

    // Copy Y plane (8-bit single channel); width in bytes == width in pixels
    bool CopyYPlane(CUdeviceptr src, uint32_t src_pitch,
                    CUdeviceptr dst, uint32_t dst_pitch,
                    uint32_t width, uint32_t height);

    // Copy UV plane (8-bit dual channel, interleaved); height = UV rows
    bool CopyUVPlane(CUdeviceptr src, uint32_t src_pitch,
                     CUdeviceptr dst, uint32_t dst_pitch,
                     uint32_t width, uint32_t height);

private:
    ID3D12Device* m_device;
    CUcontext m_cudaContext;
    std::unique_ptr<ExternalMemoryCache> m_cache;
};

} // namespace VavCore
```
4. D3D12SurfaceHandler.cpp (Implementation)
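```cpp
#include "D3D12SurfaceHandler.h"
#include "ExternalMemoryCache.h"
#include <cstdio>

namespace VavCore {

namespace {
// Makes a CUDA context current for the lifetime of this object. Runtime API
// calls such as cudaMemcpy2D use the driver context that is current.
struct ScopedCudaContext {
    explicit ScopedCudaContext(CUcontext ctx)
        : ok(cuCtxPushCurrent(ctx) == CUDA_SUCCESS) {}
    ~ScopedCudaContext() {
        if (ok) {
            CUcontext popped = nullptr;
            cuCtxPopCurrent(&popped);
        }
    }
    bool ok;
};
} // namespace

D3D12SurfaceHandler::D3D12SurfaceHandler(ID3D12Device* device, CUcontext cuda_context)
    : m_device(device)
    , m_cudaContext(cuda_context)
    , m_cache(std::make_unique<ExternalMemoryCache>(device, cuda_context))
{
}

// Defined here so std::unique_ptr sees the complete ExternalMemoryCache type
D3D12SurfaceHandler::~D3D12SurfaceHandler() = default;

bool D3D12SurfaceHandler::CopyNV12Frame(CUdeviceptr src_frame,
                                        uint32_t src_pitch,
                                        ID3D12Resource* dst_texture,
                                        uint32_t width,
                                        uint32_t height)
{
    ScopedCudaContext context_guard(m_cudaContext);
    if (!context_guard.ok) {
        printf("[D3D12] cuCtxPushCurrent failed\n");
        return false;
    }

    // Get CUDA pointer for D3D12 resource (imported once, then cached)
    CUdeviceptr dst_ptr = 0;
    if (!GetD3D12CUDAPointer(dst_texture, &dst_ptr)) {
        return false;
    }

    // Get D3D12 texture layout: for DXGI_FORMAT_NV12, subresource 0 is the
    // Y plane and subresource 1 is the interleaved UV plane
    D3D12_RESOURCE_DESC desc = dst_texture->GetDesc();
    D3D12_PLACED_SUBRESOURCE_FOOTPRINT layouts[2] = {};
    UINT num_rows[2] = {0};
    UINT64 row_sizes[2] = {0};
    UINT64 total_bytes = 0;
    m_device->GetCopyableFootprints(&desc, 0, 2, 0,
                                    layouts, num_rows, row_sizes, &total_bytes);

    // Copy Y plane
    if (!CopyYPlane(src_frame, src_pitch,
                    dst_ptr, layouts[0].Footprint.RowPitch,
                    width, height)) {
        return false;
    }

    // Copy UV plane.
    // NOTE: NVDEC places the UV plane after the decode surface's full height
    // (rounded up to even), which can exceed the displayed height. If they
    // differ, compute src_uv from the surface height instead of `height`.
    CUdeviceptr src_uv = src_frame + static_cast<CUdeviceptr>(src_pitch) * height;
    CUdeviceptr dst_uv = dst_ptr + layouts[1].Offset;
    if (!CopyUVPlane(src_uv, src_pitch,
                     dst_uv, layouts[1].Footprint.RowPitch,
                     width, (height + 1) / 2)) {  // half height, rounded up
        return false;
    }
    return true;
}

bool D3D12SurfaceHandler::GetD3D12CUDAPointer(ID3D12Resource* resource,
                                              CUdeviceptr* out_ptr)
{
    return m_cache->GetOrCreateExternalMemory(resource, out_ptr);
}

bool D3D12SurfaceHandler::CopyYPlane(CUdeviceptr src, uint32_t src_pitch,
                                     CUdeviceptr dst, uint32_t dst_pitch,
                                     uint32_t width, uint32_t height)
{
    cudaError_t err = cudaMemcpy2D(
        reinterpret_cast<void*>(dst), dst_pitch,
        reinterpret_cast<const void*>(src), src_pitch,
        width, height,  // copy only valid pixels (width bytes per row), not padding
        cudaMemcpyDeviceToDevice
    );
    if (err != cudaSuccess) {
        printf("[D3D12] Y plane copy failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

bool D3D12SurfaceHandler::CopyUVPlane(CUdeviceptr src, uint32_t src_pitch,
                                      CUdeviceptr dst, uint32_t dst_pitch,
                                      uint32_t width, uint32_t height)
{
    // NV12 UV plane: U and V interleaved at half horizontal resolution, so each
    // row is `width` bytes (assumes even width); `height` is already the UV row count
    cudaError_t err = cudaMemcpy2D(
        reinterpret_cast<void*>(dst), dst_pitch,
        reinterpret_cast<const void*>(src), src_pitch,
        width, height,
        cudaMemcpyDeviceToDevice
    );
    if (err != cudaSuccess) {
        printf("[D3D12] UV plane copy failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

} // namespace VavCore
```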
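The pitch bugs this design targets come down to a few numbers per plane. The host-side sketch below computes them for an 8-bit NV12 frame. `PlanNv12Copy`, `Nv12CopyPlan` and the `surface_height` parameter are hypothetical. It follows the convention in NVIDIA's Video Codec SDK sample decoder, where the chroma plane starts at `pitch * surface height` (rounded up to even) rather than at `pitch * display height`. Whether VavCore's `m_height` is the surface height or the display height would need to be checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical copy plan for one NV12 frame in an NVDEC-style pitched surface.
struct Nv12CopyPlan {
    uint64_t y_row_bytes;    // bytes per Y row to copy (8-bit luma: width)
    uint32_t y_rows;         // Y rows to copy (display height)
    uint64_t uv_src_offset;  // byte offset of the UV plane in the source frame
    uint64_t uv_row_bytes;   // bytes per UV row (whole U,V pairs)
    uint32_t uv_rows;        // UV rows to copy (half height, rounded up)
};

// width/height: display size in pixels.
// src_pitch: row pitch reported by cuvidMapVideoFrame.
// surface_height: rows in the decode surface (assumed >= display height).
inline Nv12CopyPlan PlanNv12Copy(uint32_t width, uint32_t height,
                                 uint32_t src_pitch, uint32_t surface_height) {
    assert(src_pitch >= width && surface_height >= height);
    Nv12CopyPlan plan{};
    plan.y_row_bytes = width;  // copy pixels, never the pitch padding
    plan.y_rows = height;
    // UV starts after the full surface height (rounded up to even).
    plan.uv_src_offset = uint64_t{src_pitch} * ((surface_height + 1u) & ~1u);
    plan.uv_row_bytes = (uint64_t{width} + 1u) & ~uint64_t{1};
    plan.uv_rows = (height + 1u) / 2u;
    return plan;
}
```

Suppose a 1920x1080 frame sits in a 1088-row surface with a 2048-byte pitch. In that case the UV plane starts at byte 2048 * 1088 = 2,228,224. Using the display height (2048 * 1080) would read chroma from eight rows too early.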
5. ExternalMemoryCache.h (CUDA-D3D12 interop cache)
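```cpp
#pragma once
#include <d3d12.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstddef>
#include <map>

namespace VavCore {

class ExternalMemoryCache {
public:
    ExternalMemoryCache(ID3D12Device* device, CUcontext cuda_context);
    ~ExternalMemoryCache();  // should call ReleaseAll()

    ExternalMemoryCache(const ExternalMemoryCache&) = delete;
    ExternalMemoryCache& operator=(const ExternalMemoryCache&) = delete;

    // Get or create CUDA device pointer for D3D12 resource
    // Returns: true on success
    bool GetOrCreateExternalMemory(ID3D12Resource* resource, CUdeviceptr* out_ptr);

    // Release specific resource. Call before the resource is destroyed,
    // because entries are keyed by raw pointer.
    void Release(ID3D12Resource* resource);

    // Release all cached resources
    void ReleaseAll();

private:
    struct CachedEntry {
        cudaExternalMemory_t external_memory;  // from cudaImportExternalMemory
        CUdeviceptr device_ptr;                // from cudaExternalMemoryGetMappedBuffer
        size_t size;                           // mapped size in bytes
    };

    // Expected to import via cudaImportExternalMemory
    // (cudaExternalMemoryHandleTypeD3D12Resource) and map the whole resource
    bool ImportD3D12Resource(ID3D12Resource* resource,
                             cudaExternalMemory_t* out_ext_mem,
                             CUdeviceptr* out_ptr);

private:
    ID3D12Device* m_device;
    CUcontext m_cudaContext;
    std::map<ID3D12Resource*, CachedEntry> m_cache;
};

} // namespace VavCore
```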
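The caching contract can be modeled without a GPU: import once per resource, reuse on later calls, release explicitly. The sketch below is a hypothetical `ImportCache` template, not VavCore code. Its import and release callbacks stand in for the CUDA external-memory calls. Because entries are keyed by raw pointer, `Release(resource)` should run before a resource is destroyed. Otherwise a new resource allocated at the same address would hit a stale entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical, GPU-free model of ExternalMemoryCache's bookkeeping.
template <typename Key, typename Value>
class ImportCache {
public:
    using ImportFn = std::function<bool(Key, Value*)>;         // e.g. import + map
    using ReleaseFn = std::function<void(Key, const Value&)>;  // e.g. destroy import

    ImportCache(ImportFn import_fn, ReleaseFn release_fn)
        : m_import(std::move(import_fn)), m_release(std::move(release_fn)) {
        assert(m_import && m_release);
    }
    ~ImportCache() { ReleaseAll(); }

    ImportCache(const ImportCache&) = delete;  // copies would double-release
    ImportCache& operator=(const ImportCache&) = delete;

    bool GetOrCreate(Key key, Value* out) {
        auto it = m_entries.find(key);
        if (it != m_entries.end()) {  // hit: reuse the existing import
            *out = it->second;
            return true;
        }
        Value value{};
        if (!m_import(key, &value)) {  // miss: import exactly once
            return false;
        }
        m_entries.emplace(key, value);
        *out = value;
        return true;
    }

    void Release(Key key) {
        auto it = m_entries.find(key);
        if (it == m_entries.end()) {
            return;
        }
        m_release(it->first, it->second);
        m_entries.erase(it);
    }

    void ReleaseAll() {
        for (const auto& [key, value] : m_entries) {
            m_release(key, value);
        }
        m_entries.clear();
    }

    std::size_t Size() const { return m_entries.size(); }

private:
    ImportFn m_import;
    ReleaseFn m_release;
    std::map<Key, Value> m_entries;
};
```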
Key Improvements
Readability
Before:
- `DecodeToSurface()`: 500+ lines with 5 levels of nesting
- Mixed concerns: decoding, copying, caching, signaling
After:
- `DecodeToSurface()`: ~40 lines, clear 4-step process
- Each helper method: 20-60 lines, single responsibility
Debugging
Before:
- NV12 stride bug hidden in 500 lines of mixed logic
- Hard to locate which `cudaMemcpy2D` call is wrong
After:
- `CopyYPlane()` and `CopyUVPlane()` are separate methods
- Easy to add a breakpoint and inspect parameters
- Clear separation of Y and UV plane logic
Testing
Before:
- Cannot test D3D12 copying without full decoder setup
- Cannot mock CUDA operations
After:
- Can unit test `D3D12SurfaceHandler` independently
- Can test `ExternalMemoryCache` in isolation
- Easy to add mock implementations
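Mocking works best when the handler issues its pitched copies through an injected interface. The sketch below shows one hypothetical way to do that. `IPlaneCopier`, `RecordingCopier`, `PlaneRef` and `CopyNv12Planes` are illustrative names, not part of the proposed headers. A test double records each requested copy, so a unit test can assert on widths, pitches and row counts without calling `cudaMemcpy2D`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One requested pitched copy (mirrors the cudaMemcpy2D argument list).
struct Copy2DCall {
    uint64_t dst, dst_pitch, src, src_pitch, width_bytes, rows;
};

struct PlaneRef {
    uint64_t ptr;    // device address (CUdeviceptr in real code)
    uint64_t pitch;  // row pitch in bytes
};

// Hypothetical seam: production code would wrap cudaMemcpy2D here.
class IPlaneCopier {
public:
    virtual ~IPlaneCopier() = default;
    virtual bool Copy2D(const Copy2DCall& call) = 0;
};

// Test double: records every requested copy and reports success.
class RecordingCopier : public IPlaneCopier {
public:
    bool Copy2D(const Copy2DCall& call) override {
        calls.push_back(call);
        return true;
    }
    std::vector<Copy2DCall> calls;
};

// NV12 copy written against the seam: Y rows, then half as many UV rows.
inline bool CopyNv12Planes(IPlaneCopier& copier,
                           PlaneRef src_y, PlaneRef src_uv,
                           PlaneRef dst_y, PlaneRef dst_uv,
                           uint32_t width, uint32_t height) {
    // A copied row can never be wider than either pitch.
    assert(width <= src_y.pitch && width <= dst_y.pitch);
    if (!copier.Copy2D({dst_y.ptr, dst_y.pitch, src_y.ptr, src_y.pitch, width, height})) {
        return false;
    }
    // UV plane: same bytes per row (interleaved U,V), half the rows (rounded up).
    return copier.Copy2D({dst_uv.ptr, dst_uv.pitch, src_uv.ptr, src_uv.pitch,
                          width, (height + 1u) / 2u});
}
```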
Maintenance
Before:
- Adding D3D11 support requires modifying 500+ line method
- Risk of breaking existing D3D12 code
After:
- Add a new `D3D11SurfaceHandler` class
- Existing D3D12 code untouched
- Clean separation of concerns
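If the per-API handlers share an interface, adding D3D11 support comes down to registering one more implementation. The sketch below assumes a hypothetical `ISurfaceHandler` interface and a `SurfaceRouter` table. `SurfaceKind` and `FrameView` are stand-ins for `SurfaceType` and `DecodedFrameInfo`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

enum class SurfaceKind { CPU, D3D11, D3D12, CUDA };  // stand-in for SurfaceType

struct FrameView {  // stand-in for DecodedFrameInfo
    uint64_t device_ptr;
    uint32_t pitch;
    uint32_t width;
    uint32_t height;
};

// Hypothetical common interface for D3D12SurfaceHandler, a future
// D3D11SurfaceHandler, and so on.
class ISurfaceHandler {
public:
    virtual ~ISurfaceHandler() = default;
    virtual bool CopyNV12Frame(const FrameView& frame, void* target) = 0;
};

// Routing table: a new target is one Register() call, with no edits to
// existing handlers.
class SurfaceRouter {
public:
    void Register(SurfaceKind kind, std::unique_ptr<ISurfaceHandler> handler) {
        assert(handler != nullptr);
        m_handlers[kind] = std::move(handler);
    }

    // Returns false for unregistered targets instead of silently succeeding.
    bool Copy(SurfaceKind kind, const FrameView& frame, void* target) {
        auto it = m_handlers.find(kind);
        return it != m_handlers.end() && it->second->CopyNV12Frame(frame, target);
    }

private:
    std::map<SurfaceKind, std::unique_ptr<ISurfaceHandler>> m_handlers;
};
```

For four fixed targets, the `switch` in `DecodeToSurface()` works just as well. A table mainly pays off if handlers are created on demand or swapped for fakes in tests.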
File Size Comparison
| File | Before | After |
|---|---|---|
| NVDECAV1Decoder.cpp | 1,722 lines | ~600 lines |
| D3D12SurfaceHandler.cpp | - | ~300 lines |
| ExternalMemoryCache.cpp | - | ~200 lines |
| Total | 1,722 lines | ~1,100 lines |

Reduction: ~36% fewer lines (estimated), while improving readability
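The reduction follows from the table totals. Because the "after" sizes are estimates, the percentage is approximate too:

$$
\frac{1722 - 1100}{1722} = \frac{622}{1722} \approx 0.361 \approx 36\%
$$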
Implementation Plan
Phase 1: Extract D3D12 Handler (2-3 hours)
- Create `D3D12SurfaceHandler.h/.cpp`
- Move D3D12 resource import logic
- Move NV12 plane copying logic
- Test with existing Vav2Player
Acceptance Criteria:
- Vav2Player displays video correctly
- No memory leaks
- Performance same or better
Phase 2: Extract External Memory Cache (1-2 hours)
- Create `ExternalMemoryCache.h/.cpp`
- Move external memory caching logic
- Add proper cleanup on resource release
- Test memory management
Acceptance Criteria:
- Cache hit/miss working correctly
- No memory leaks on repeated loads
- Cache cleared on decoder cleanup
Phase 3: Refactor Main Decoder (1-2 hours)
- Simplify `DecodeToSurface()` to routing logic
- Extract `DecodePacket()` method
- Extract `GetDecodedFrame()` method
- Extract `CopyToCPUSurface()` method
- Test all surface types
Acceptance Criteria:
- All surface types working
- Code passes all existing tests
- Debug logging reduced
Phase 4: Fix NV12 Stride Bug (30 minutes)
- Fix `CopyYPlane()` width parameter (copy the valid row width in bytes, not the pitch)
- Fix `CopyUVPlane()` width parameter
- Verify with test video
Acceptance Criteria:
- No stripe pattern in displayed video
- Correct colors displayed
- Performance maintained
Testing Strategy
Unit Tests
```cpp
// CreateTest* and VerifyNV12Data are test helpers (not shown)
TEST(D3D12SurfaceHandler, CopiesNV12FrameCorrectly)
{
    auto handler = CreateTestHandler();
    auto src_frame = CreateTestNV12Frame(1920, 1080);
    auto dst_texture = CreateTestD3D12Texture(1920, 1080);

    bool result = handler->CopyNV12Frame(
        src_frame.device_ptr, src_frame.pitch,
        dst_texture, 1920, 1080
    );

    EXPECT_TRUE(result);
    VerifyNV12Data(dst_texture);
}

TEST(ExternalMemoryCache, ReusesExistingEntry)
{
    auto cache = CreateTestCache();
    auto resource = CreateTestD3D12Resource();

    CUdeviceptr ptr1 = 0, ptr2 = 0;
    ASSERT_TRUE(cache->GetOrCreateExternalMemory(resource, &ptr1));
    ASSERT_TRUE(cache->GetOrCreateExternalMemory(resource, &ptr2));

    EXPECT_EQ(ptr1, ptr2);  // Second call should hit the cache and return the same pointer
}
```
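`VerifyNV12Data()` is not specified here. One option, sketched below under that assumption, is to read the texture back and compare it against a host-side reference. The reference is built with the same pitched-copy semantics as `cudaMemcpy2D`. Only the valid bytes of each row are compared, so differing pitches and padding do not cause false failures. `Copy2DHost` and `RowsMatch` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-side model of cudaMemcpy2D: copy `rows` rows of `width_bytes` each
// between buffers whose rows are `src_pitch` / `dst_pitch` bytes apart.
// Padding bytes in dst (beyond width_bytes) are left untouched.
inline void Copy2DHost(uint8_t* dst, size_t dst_pitch,
                       const uint8_t* src, size_t src_pitch,
                       size_t width_bytes, size_t rows) {
    assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
    for (size_t r = 0; r < rows; ++r) {
        std::memcpy(dst + r * dst_pitch, src + r * src_pitch, width_bytes);
    }
}

// Compares only the first width_bytes of each row, ignoring pitch padding.
inline bool RowsMatch(const uint8_t* a, size_t a_pitch,
                      const uint8_t* b, size_t b_pitch,
                      size_t width_bytes, size_t rows) {
    for (size_t r = 0; r < rows; ++r) {
        if (std::memcmp(a + r * a_pitch, b + r * b_pitch, width_bytes) != 0) {
            return false;
        }
    }
    return true;
}
```

A readback with a wrong width or offset shows up as a row mismatch at a specific row index. That is easier to diagnose than a visual stripe pattern.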
Integration Tests
- Load video file
- Decode multiple frames
- Verify no memory leaks
- Verify correct video display
Success Criteria
- Design document complete
- Phase 1 complete: D3D12SurfaceHandler working
- Phase 2 complete: ExternalMemoryCache working
- Phase 3 complete: Main decoder simplified
- Phase 4 complete: NV12 stripe bug fixed
- All existing tests passing
- No performance regression
- Code review passed
- Documentation updated
Next Step: Start Phase 1 - Extract D3D12SurfaceHandler
Last Updated: 2025-10-03