Files

ened 022ade5566 Update project documents

2025-10-05 19:04:29 +09:00

9.9 KiB

Raw Blame History

NVDECAV1Decoder C++ Refactoring Design

Date: 2025-10-03 Status: ❌ CANCELLED - Superseded by NVDEC RingBuffer Design (2025-10-05) Goal: Refactor NVDECAV1Decoder internal C++ code for readability and maintainability

⚠️ Project Status: CANCELLED

Cancellation Date: 2025-10-05 Reason: This refactoring plan was superseded by a more fundamental architectural change - the NVDEC RingBuffer Asynchronous Decoding Design.

Why This Was Cancelled

More Fundamental Problem Identified
- During red-surface-nvdec testing, discovered that the core issue wasn't code organization
- The real problem was the ParseContext-based architecture which was fundamentally flawed
- 600+ lines of complex mapping table logic needed to be replaced, not reorganized
Better Solution Found
- NVDEC_RingBuffer_Decode_Design.md provided a superior architectural approach
- Direct CurrPicIdx usage eliminated need for complex D3D12SurfaceHandler abstractions
- Pending Submission Ring Buffer solved multi-threading issues at the design level
Code Actually Reduced More
- Original plan: 1,722 → 1,100 lines (36% reduction)
- Actual result: 1,722 → ~1,100 lines via RingBuffer design (similar reduction, better architecture)
- Achieved simplification without adding new handler classes

What Was Actually Done Instead

See: NVDEC_RingBuffer_Decode_Design.md

Key Changes Made:

✅ Removed ParseContext struct completely
✅ Simplified DecodeToSurface with CurrPicIdx direct usage
✅ Added Pending Submission Ring Buffer for thread safety
✅ Implemented Polling Thread for async decode completion
✅ Achieved code simplification without creating new abstraction layers

Original Problem Analysis

Current State (Before Cancellation)

File: vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp
Lines: 1,722 lines (too large)
Main Method: DecodeToSurface() is 500+ lines with deeply nested logic

Key Issues (Identified)

✅ Monolithic Method: DecodeToSurface() handles CPU, D3D11, D3D12, CUDA in one giant function
- Resolution: Simplified via RingBuffer design, not via separate handler classes
✅ Mixed Responsibilities: Decoding + Surface copying + Memory management + Fence signaling all mixed
- Resolution: Separated into components (submission, decode, wait, retrieve)
✅ Hard to Debug: Pitch/stride bugs are difficult to trace due to complex nesting
- Resolution: Simplified with direct CurrPicIdx usage
⚠️ Difficult to Test: Cannot unit test individual components in isolation
- Status: Still true, but less critical with simpler code
✅ Poor Readability: Excessive debug logging makes logic hard to follow
- Resolution: Reduced via code simplification

Original Design Goals

Primary Goals

✅ Readability: Each method should do ONE thing clearly
- Achieved via: RingBuffer design with clear component separation
✅ Maintainability: Easy to locate and fix bugs (like current NV12 stride issue)
- Achieved via: Simplified code structure
❌ Testability: Each component can be tested independently
- Not Achieved: Still difficult to unit test, but code is simpler
✅ Performance: Zero overhead - use inline functions where appropriate
- Achieved via: No additional abstraction layers

Non-Goals

✅ NOT creating a C API (VavCore already provides that)
✅ NOT changing external interface of NVDECAV1Decoder
✅ NOT over-engineering with complex patterns

Proposed Architecture (NOT IMPLEMENTED)

File Structure (Planned but Cancelled)

NVDECAV1Decoder.h         (Public interface - unchanged)
NVDECAV1Decoder.cpp       (Main decoder - 400 lines)
  └── Uses helper classes below

D3D12SurfaceHandler.h     (D3D12-specific logic - 300 lines) ❌ CANCELLED
D3D12SurfaceHandler.cpp   ❌ CANCELLED
  ├── ImportD3D12Resource()
  ├── CopyNV12Frame()
  └── SignalFence()

ExternalMemoryCache.h     (CUDA-D3D12 interop cache - 200 lines) ❌ CANCELLED
ExternalMemoryCache.cpp   ❌ CANCELLED
  ├── GetOrCreate()
  ├── Release()
  └── Clear()

Why Cancelled: Creating separate handler classes would add complexity without addressing the fundamental architectural issues.

What Was Actually Implemented

NVDECAV1Decoder.h         (Updated with RingBuffer structures)
NVDECAV1Decoder.cpp       (Simplified with RingBuffer design)
  ├── DecodeSlot[8]              (Ring buffer slots)
  ├── PendingSubmission[8]        (Pending context ring buffer)
  ├── PollingThreadFunc()         (Async decode completion)
  ├── HandlePictureDecode()       (Direct CurrPicIdx usage)
  └── DecodeToSurface()           (Simplified 4-component flow)

Implementation Status

❌ Phase 1: Extract D3D12 Handler (CANCELLED)

❌ Create D3D12SurfaceHandler.h/.cpp - NOT CREATED
❌ Move D3D12 resource import logic - NOT DONE
❌ Move NV12 plane copying logic - NOT DONE
❌ Test with existing Vav2Player - N/A

Cancellation Reason: D3D12 surface handling is already working correctly in the current implementation. The real issue was the ParseContext architecture, not the surface copying logic.

❌ Phase 2: Extract External Memory Cache (CANCELLED)

❌ Create ExternalMemoryCache.h/.cpp - NOT CREATED
❌ Move external memory caching logic - NOT DONE
❌ Add proper cleanup on resource release - NOT DONE
❌ Test memory management - N/A

Cancellation Reason: External memory caching is already handled correctly by existing code. No need for additional abstraction.

❌ Phase 3: Refactor Main Decoder (CANCELLED - Replaced with RingBuffer Design)

✅ Simplify DecodeToSurface() to routing logic - DONE via RingBuffer design
✅ Extract packet submission logic - DONE (Component 2)
✅ Extract frame retrieval logic - DONE (Component 5)
✅ Test all surface types - DONE

Result: Achieved the same goal via a different, better approach (RingBuffer design).

❌ Phase 4: Fix NV12 Stride Bug (RESOLVED via different approach)

✅ Fix NV12 copy logic - RESOLVED via RingBuffer design
✅ Verify with test video - VERIFIED with test_720p_stripe.webm

Result: Stripe bugs were resolved by fixing the overall architecture, not by creating separate handler classes.

Actual Results vs. Original Plan

Original Plan (Cancelled)

Create 2 new classes: D3D12SurfaceHandler, ExternalMemoryCache
Reduce code: 1,722 → 1,100 lines (36% reduction)
Improve testability with separate components
Fix NV12 stride bug

What Actually Happened (RingBuffer Design)

✅ No new classes created - kept code simple
✅ Reduced code: ~200 lines removed (ParseContext, mapping tables)
✅ Improved readability via architectural simplification
✅ Fixed core issues: thread safety, FIFO ordering, async decoding
✅ Resolved NV12/stride issues via correct architecture

Why RingBuffer Design Was Better

Addressed Root Cause: ParseContext architecture was the real problem
Simpler: No new abstraction layers or handler classes
Thread-Safe: Built-in thread safety via ring buffers and atomic counters
NVDEC-Aligned: Uses NVDEC API as intended (CurrPicIdx direct usage)
Validated: Fully tested with test_720p_stripe.webm

Lessons Learned

1. Identify Root Cause First

Initial plan focused on code organization (symptoms)
Testing revealed the real problem: ParseContext architecture (root cause)
Always validate assumptions with testing before refactoring

2. Simpler Is Better

Original plan added 2 new classes (D3D12SurfaceHandler, ExternalMemoryCache)
Final solution: 0 new classes, simpler architecture
Avoid creating abstractions unless they solve fundamental problems

3. Testing Reveals Truth

Red-surface-nvdec testing exposed the real architectural issues
Without testing, would have implemented a suboptimal refactoring
Always test before committing to a design

4. Be Willing to Pivot

Recognized better solution (RingBuffer design) and pivoted immediately
Cancelled this plan without hesitation
Delivered better results via alternative approach

References

Superseding Document

📄 NVDEC_RingBuffer_Decode_Design.md - IMPLEMENTED

📄 red-surface-nvdec-spec.md - Testing that revealed the need for RingBuffer design

Implementation

vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.h - Updated with RingBuffer structures
vav2/platforms/windows/vavcore/src/Decoder/NVDECAV1Decoder.cpp - Simplified via RingBuffer design

Success Criteria (Original vs. Actual)

Original Success Criteria

Design document complete ✅
~~Phase 1 complete: D3D12SurfaceHandler working~~ ❌ CANCELLED
~~Phase 2 complete: ExternalMemoryCache working~~ ❌ CANCELLED
~~Phase 3 complete: Main decoder simplified~~ ✅ DONE via RingBuffer design
~~Phase 4 complete: NV12 stripe bug fixed~~ ✅ RESOLVED via RingBuffer design
All existing tests passing ✅
No performance regression ✅
~~Code review passed~~ N/A
Documentation updated ✅

Actual Outcome

✅ Better architecture implemented (RingBuffer design)
✅ Code simplified (removed ParseContext, mapping tables)
✅ All issues resolved (thread safety, FIFO ordering, async decoding)
✅ Fully tested (test_720p_stripe.webm validation)
✅ Documentation complete (RingBuffer design doc)

Project Status: ❌ CANCELLED (Superseded by better solution) Superseded By: NVDEC RingBuffer Asynchronous Decoding Design ✅ COMPLETED Final Update: 2025-10-05 Result: Original goals achieved via superior architectural approach

9.9 KiB Raw Blame History