D3D12-CUDA RGB Pipeline

2025-10-06 02:36:33 +09:00
parent 6b04396772
commit b4efc1be82
17 changed files with 1751 additions and 52 deletions

@@ -407,7 +407,8 @@ CUVIDSOURCEDATAPACKET packet = {};
packet.payload = packet_data;
packet.payload_size = packet_size;
packet.flags = CUVID_PKT_ENDOFPICTURE;
packet.timestamp = 0; // ⚠️ CANNOT use timestamp to pass pending_idx:
                      // the NVDEC parser automatically overwrites this field
CUresult result = cuvidParseVideoData(m_parser, &packet);
// cuvidParseVideoData is SYNCHRONOUS - HandlePictureDecode called before return
@@ -420,31 +421,76 @@ if (result != CUDA_SUCCESS) {
LOGF_DEBUG("[DecodeToSurface] Packet submitted, callback completed");
```
**⚠️ Critical Discovery: the timestamp field cannot carry custom data**
During implementation, we discovered that the **NVDEC parser sets the timestamp field itself**, based on its internal logic. Any value we write to `packet.timestamp` is **overwritten by the parser** before it reaches the callbacks.
**Evidence from Testing**:
```cpp
// DecodeToSurface attempt:
packet.timestamp = pending_idx; // Try to pass pending_idx
// HandlePictureDecode receives:
pic_params->nTimeStamp // Contains parser-generated value, NOT our pending_idx!
```
**Why This Happens**:
1. NVDEC parser internally manages PTS (Presentation Timestamp)
2. Parser extracts timestamp from codec bitstream or generates sequential values
3. Our manually-set timestamp is ignored/overwritten
4. This is by design - timestamps are for A/V sync, not custom data passing
**Consequence**: We CANNOT pass `pending_idx` to the callback through `packet.timestamp`.
**Solution: Use Most Recent Pending Submission**
Because `pending_idx` cannot travel through the timestamp, and `cuvidParseVideoData` is **synchronous** (the callback completes before it returns), the callback can instead look up the **most recently allocated pending submission**:
```cpp
// In HandlePictureDecode callback:
// cuvidParseVideoData is synchronous, so the last allocated pending submission
// is guaranteed to be for THIS packet
uint64_t current_submission_id = decoder->m_submissionCounter.load() - 1;
size_t pending_idx = current_submission_id % RING_BUFFER_SIZE;
auto& pending = decoder->m_pendingSubmissions[pending_idx];
// Copy to slot...
```
**Why This Works**:
1. `cuvidParseVideoData()` is **SYNCHRONOUS** - callback runs before function returns
2. `m_submissionCounter` was incremented in DecodeToSurface BEFORE calling cuvidParseVideoData
3. Therefore, `m_submissionCounter - 1` is the submission_id for the current packet
4. Only ONE packet is being parsed at a time (synchronous API)
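5. Thread safety: the atomic increment gives every `DecodeToSurface` call a unique submission_id, but `m_submissionCounter - 1` identifies the current packet only while calls to `cuvidParseVideoData` are serialized (for example, under a mutex). If another thread increments the counter before the callback runs, the lookup lands on that thread's slot instead.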
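The reasoning in this list can be checked without the NVDEC SDK. The sketch below is a minimal single-threaded model, not the decoder's actual code: `MockDecoder`, `ParseSynchronously`, `OnPictureDecode`, `target_surface`, and the slot count of 16 are all hypothetical stand-ins for `DecodeToSurface`, `cuvidParseVideoData`, `HandlePictureDecode`, and `m_pendingSubmissions`.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed size of the pending-submission ring (hypothetical value).
constexpr std::size_t kPendingSlots = 16;

struct PendingSubmission {
    std::uint64_t submission_id = 0;
    int target_surface = -1;  // illustrative payload only
    bool in_use = false;
};

struct MockDecoder {
    std::atomic<std::uint64_t> submissionCounter{0};
    std::array<PendingSubmission, kPendingSlots> pending;
    std::uint64_t last_id_seen = 0;   // what the callback recovered
    int last_surface_seen = -1;

    // Stand-in for HandlePictureDecode: recover the slot from the counter.
    void OnPictureDecode() {
        const std::uint64_t current_id = submissionCounter.load() - 1;
        PendingSubmission& p = pending[current_id % kPendingSlots];
        assert(p.in_use);             // the slot must have been filled first
        last_id_seen = p.submission_id;
        last_surface_seen = p.target_surface;
        p.in_use = false;             // release the slot after copying
    }

    // Stand-in for cuvidParseVideoData: the callback runs before returning.
    void ParseSynchronously() { OnPictureDecode(); }

    // Stand-in for DecodeToSurface: increment first, then parse.
    std::uint64_t DecodeToSurface(int target_surface) {
        const std::uint64_t id = submissionCounter.fetch_add(1);
        PendingSubmission& p = pending[id % kPendingSlots];
        p.submission_id = id;
        p.target_surface = target_surface;
        p.in_use = true;
        ParseSynchronously();
        return id;
    }
};
```

Calling `DecodeToSurface` repeatedly on one thread, the callback always recovers the id that was just allocated, including after the ring index wraps past 16. The model says nothing about concurrent callers.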
**Simplified Flow**:
```
cuvidParseVideoData(packet)
  (synchronous callback)
  HandlePictureDecode(pic_params)
    CurrPicIdx = pic_params->CurrPicIdx   // NVDEC provides slot index (0-7)
    pending_idx = (m_submissionCounter - 1) % 16
    Copy m_pendingSubmissions[pending_idx] → m_ringBuffer[CurrPicIdx]
    Release m_pendingSubmissions[pending_idx].in_use = false
    Return from HandlePictureDecode

Thread A: DecodeToSurface
  submission_id = m_submissionCounter++   (returns 5; counter is now 6)
  pending_idx = 5 % 16 = 5
  Store in m_pendingSubmissions[5]
  cuvidParseVideoData(packet)   ← SYNCHRONOUS
    HandlePictureDecode callback (same thread!)
      current_id = m_submissionCounter - 1 = 6 - 1 = 5 ✓
      pending_idx = 5 % 16 = 5
      Copy m_pendingSubmissions[5] → m_ringBuffer[CurrPicIdx]
  Return from cuvidParseVideoData
```
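The walk-through above depends on the counter not moving between the increment and the callback. When several threads submit packets, one way to guarantee this is to hold a lock across both steps. The following is a minimal sketch under that assumption: `SerializedSubmitter`, `parse_mutex_`, and `SimulatedCallback` are hypothetical names, and the callback is simulated rather than invoked by NVDEC.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

class SerializedSubmitter {
public:
    explicit SerializedSubmitter(std::uint64_t start = 0) : counter_(start) {}

    // Allocates a submission id and runs the simulated synchronous callback
    // while holding the lock, so no other caller can advance the counter
    // in between. Returns the id the callback derived.
    std::uint64_t Submit() {
        std::lock_guard<std::mutex> lock(parse_mutex_);
        const std::uint64_t id = counter_.fetch_add(1);   // e.g. returns 5, counter becomes 6
        const std::uint64_t seen = SimulatedCallback();   // counter - 1
        assert(seen == id);  // holds because the lock serializes both steps
        return seen;
    }

private:
    // Stand-in for the lookup done inside HandlePictureDecode.
    std::uint64_t SimulatedCallback() const { return counter_.load() - 1; }

    std::mutex parse_mutex_;
    std::atomic<std::uint64_t> counter_;
};
```

Starting the counter at 5 reproduces the numbers in the flow: the first `Submit()` yields 5, and `5 % 16` selects pending slot 5. Without the lock, a second thread could increment the counter between `fetch_add` and the callback, and `counter - 1` would then name the wrong submission.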
**Key Points**:
- **cuvidParseVideoData is synchronous** - callbacks complete before it returns
- **CurrPicIdx is the slot index** - no calculation needed
- **pending_idx = (m_submissionCounter - 1) % 16** - finds the correct pending context
- **Ring buffer prevents overwrites** - multi-thread safe
- **Release pending slot after copy** - allows reuse for the next submission
- **No timestamp tricks needed** - the synchronous flow alone guarantees the match
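The "release after copy" and "prevents overwrites" points can be illustrated with a small slot ring. This is a sketch only: the source shows the `in_use = false` release but not how a slot is claimed, so `PendingRing`, `TryAcquire`, and the compare-and-swap claim are assumptions about one way to pair with it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kSlots = 16;  // assumed ring size

struct Slot {
    std::atomic<bool> in_use{false};
    std::uint64_t submission_id = 0;
};

class PendingRing {
public:
    // Claims slot (id % kSlots). Returns false if the previous occupant has
    // not been released yet, i.e. the ring has wrapped onto a live entry.
    bool TryAcquire(std::uint64_t id) {
        Slot& s = slots_[id % kSlots];
        bool expected = false;
        if (!s.in_use.compare_exchange_strong(expected, true)) {
            return false;
        }
        s.submission_id = id;
        return true;
    }

    // Called after the callback has copied the slot's contents.
    void Release(std::uint64_t id) {
        Slot& s = slots_[id % kSlots];
        assert(s.in_use.load());  // releasing an unclaimed slot is a logic error
        s.in_use.store(false);
    }

    bool InUse(std::uint64_t id) const {
        return slots_[id % kSlots].in_use.load();
    }

private:
    std::array<Slot, kSlots> slots_;
};
```

With 16 slots, ids 5 and 21 map to the same entry, so `TryAcquire(21)` fails until `Release(5)` has run. What the caller does on failure (wait, drop, or report an error) is a design choice this sketch leaves open.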
---