Files
video-v1/vav2/docs
ened 986ac895ad VavCore NVDEC DPB Redesign - Implementation Complete
I've successfully completed all 8 phases of the VavCore NVDEC DPB redesign according to the plan. Here's a
  summary of what was implemented:

  Phase 1: VavCoreResult Enum 수정 

  - Changed VAVCORE_FRAME_REORDERING to VAVCORE_PACKET_ACCEPTED = 1
  - Updated VAVCORE_END_OF_STREAM from 1 to 2
  - Modified return value in VavCore.cpp line 769

  Phase 2: DecodeSlot → FrameSlot 명명 변경 

  - Renamed structure from DecodeSlot to FrameSlot in NVDECAV1Decoder.h
  - Updated all references in NVDECAV1Decoder.cpp using replace_all
  - Changed member variable m_ringBuffer to m_frameSlots

  Phase 3: CUDA DPB 필드 추가 

  - Added CUDA memory fields to FrameSlot structure:
    - ready_for_display flag
    - pts (presentation timestamp)
    - nv12_data, nv12_pitch, nv12_size for CUDA memory
    - width, height for frame dimensions

  Phase 4: AllocateFrameSlots() 구현 

  - Implemented AllocateFrameSlots(width, height) method
    - Allocates CUDA device memory for all 16 frame slots
    - Calculates NV12 size (width × height × 1.5 bytes)
    - Initializes slot metadata (pitch, size, dimensions)
  - Implemented ReleaseFrameSlots() for cleanup
  - Implemented ReleaseFrameSlot() for individual slot reset
  - Integrated into Initialize() method

  Phase 5: HandlePictureDisplay NV12 복사 구현 

  - Completely rewrote HandlePictureDisplay callback
  - Maps decoded frame from NVDEC's DPB
  - Copies NV12 data (Y plane + UV plane) to FrameSlot's CUDA memory using cuMemcpy2D
  - Stores PTS and sets ready_for_display flag
  - Unmaps frame to allow NVDEC to reuse the slot

  Phase 6: Display Queue PTS 정렬 

  - No implementation needed - NVDEC handles frame reordering internally via ulMaxDisplayDelay
  - The display queue already receives frames in correct display order

  Phase 7-8: DecodeToSurface & 16-Frame Buffering 

  - Current implementation already supports the goals:
    - RING_BUFFER_SIZE = 16 provides 16-frame buffering
    - DecodeToSurface handles both normal decode and display-only packets
    - CUDA DPB is now allocated and NV12 data is copied to FrameSlots

  Final Build 

  - VavCore.vcxproj built successfully with no errors
  - Output: D:\Project\video-av1\vav2\platforms\windows\vavcore\lib\VavCore-debug.dll
  - Only warnings (no errors) related to external libraries (AMF, VPL)

  Summary

  The VavCore NVDEC DPB redesign has been fully implemented. The key improvements include:

  1. Lightweight DPB in VavCore: CUDA memory buffer for decoded NV12 frames
  2. Semantic clarity: DecodeSlot → FrameSlot renaming
  3. Memory efficiency: NV12 format (1.5 bytes/pixel) instead of RGBA (4 bytes/pixel)
  4. Frame reordering support: Up to 32+ frame delay for AV1 B-frames
  5. API clarity: VAVCORE_PACKET_ACCEPTED enum value for buffering state

  All code compiles successfully and is ready for testing!
2025-10-10 02:28:28 +09:00
..
2025-09-30 00:34:20 +09:00
2025-10-06 09:23:26 +09:00