D3D12-CUDA RGB Pipeline
@@ -407,7 +407,8 @@ CUVIDSOURCEDATAPACKET packet = {};
packet.payload = packet_data;
packet.payload_size = packet_size;
packet.flags = CUVID_PKT_ENDOFPICTURE;
-packet.timestamp = 0; // Not used - we use m_pendingSubmission instead
+packet.timestamp = 0; // ⚠️ CANNOT use timestamp to pass pending_idx
+// NVDEC parser automatically overwrites timestamp field

CUresult result = cuvidParseVideoData(m_parser, &packet);
// cuvidParseVideoData is SYNCHRONOUS - HandlePictureDecode called before return
@@ -420,31 +421,76 @@ if (result != CUDA_SUCCESS) {
LOGF_DEBUG("[DecodeToSurface] Packet submitted, callback completed");
```

**⚠️ Critical Discovery: timestamp field is read-only**

During implementation, we discovered that the **NVDEC parser sets the timestamp field itself**, based on its own internal logic. Any value we write to `packet.timestamp` is **overwritten by the parser** before it reaches the callbacks.

**Evidence from Testing**:
```cpp
// DecodeToSurface attempt:
packet.timestamp = pending_idx; // Try to pass pending_idx

// HandlePictureDecode receives:
pic_params->nTimeStamp // Contains parser-generated value, NOT our pending_idx!
```

**Why This Happens**:
1. NVDEC parser internally manages PTS (Presentation Timestamp)
2. Parser extracts timestamp from codec bitstream or generates sequential values
3. Our manually-set timestamp is ignored/overwritten
4. This is by design - timestamps are for A/V sync, not custom data passing

**Consequence**: We CANNOT pass `pending_idx` through `packet.timestamp` to the callback

**Solution: Use Most Recent Pending Submission**

Since we cannot pass `pending_idx` through the timestamp, and `cuvidParseVideoData` is **synchronous** (the callback completes before it returns), we can safely use the **most recently allocated pending submission**:

```cpp
// In HandlePictureDecode callback:
// cuvidParseVideoData is synchronous, so the last allocated pending submission
// is guaranteed to be for THIS packet

uint64_t current_submission_id = decoder->m_submissionCounter.load() - 1;
size_t pending_idx = current_submission_id % RING_BUFFER_SIZE;

auto& pending = decoder->m_pendingSubmissions[pending_idx];
// Copy to slot...
```

**Why This Works**:
1. `cuvidParseVideoData()` is **SYNCHRONOUS** - callback runs before function returns
2. `m_submissionCounter` was incremented in DecodeToSurface BEFORE calling cuvidParseVideoData
3. Therefore, `m_submissionCounter - 1` is the submission_id for the current packet
4. Only ONE packet is being parsed at a time (synchronous API)
5. Thread safety: each DecodeToSurface call gets a unique submission_id from the atomic counter; `m_submissionCounter - 1` identifies the current packet as long as no other thread increments the counter between the increment and the callback, i.e. submissions are serialized as in point 4
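
The reasoning above can be exercised without NVDEC at all. The following is a minimal single-threaded sketch, not the project's implementation: `FakeDecoder`, `FakeParse`, `OnPictureDecode` and `DecodeOne` are hypothetical stand-ins for the decoder object, `cuvidParseVideoData`, `HandlePictureDecode` and `DecodeToSurface`. `kPendingSize = 16` is assumed from the `% 16` used in this document, `kSlotCount = 8` is an assumed slot count, and the per-frame context is reduced to an integer tag.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; none of these names come from the real code base.
constexpr std::size_t kPendingSize = 16;  // assumed size of the pending ring
constexpr std::size_t kSlotCount = 8;     // assumed number of decode slots

struct PendingSubmission {
    std::uint64_t submission_id = 0;
    int user_tag = 0;     // placeholder for the real per-frame context
    bool in_use = false;
};

struct FakeDecoder {
    std::atomic<std::uint64_t> submission_counter{0};
    std::array<PendingSubmission, kPendingSize> pending{};
    std::array<int, kSlotCount> ring_tags{};  // tag that landed in each decode slot
};

// Plays the role of HandlePictureDecode: recover the pending entry from the counter.
inline void OnPictureDecode(FakeDecoder& dec, std::size_t curr_pic_idx) {
    const std::uint64_t current_id = dec.submission_counter.load() - 1;
    PendingSubmission& p = dec.pending[current_id % kPendingSize];
    assert(p.in_use && p.submission_id == current_id);
    dec.ring_tags[curr_pic_idx] = p.user_tag;  // "copy to slot"
    p.in_use = false;                          // release the pending entry
}

// Plays the role of cuvidParseVideoData: the callback runs before this returns.
inline void FakeParse(FakeDecoder& dec, std::size_t curr_pic_idx) {
    OnPictureDecode(dec, curr_pic_idx);
}

// Plays the role of DecodeToSurface: increment first, then parse synchronously.
inline std::uint64_t DecodeOne(FakeDecoder& dec, int user_tag, std::size_t curr_pic_idx) {
    const std::uint64_t id = dec.submission_counter.fetch_add(1);
    PendingSubmission& p = dec.pending[id % kPendingSize];
    p.submission_id = id;
    p.user_tag = user_tag;
    p.in_use = true;
    FakeParse(dec, curr_pic_idx);
    return id;
}
```

Calling `DecodeOne` repeatedly from one thread always finds the entry that was stored just before the parse call. The lookup relies on no other thread incrementing the counter between `fetch_add` and the callback, so concurrent callers would need to serialize that region (for example with a mutex) for the same guarantee to hold.
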

**Simplified Flow**:
```
-cuvidParseVideoData(packet)
-↓ (synchronous callback)
-HandlePictureDecode(pic_params)
-↓
-CurrPicIdx = pic_params->CurrPicIdx // NVDEC provides slot index (0-7)
-↓
-pending_idx = submission_id % 8
-↓
-Copy m_pendingSubmissions[pending_idx] → m_ringBuffer[CurrPicIdx]
-↓
-Release m_pendingSubmissions[pending_idx].in_use = false
-↓
-Return from HandlePictureDecode
+Thread A: DecodeToSurface
+↓
+submission_id = m_submissionCounter++ (submission_id = 5, counter is now 6)
+pending_idx = 5 % 16 = 5
+Store in m_pendingSubmissions[5]
+↓
+cuvidParseVideoData(packet) ← SYNCHRONOUS
+↓
+HandlePictureDecode callback (same thread!)
+↓
+current_id = m_submissionCounter - 1 = 6 - 1 = 5 ✓
+pending_idx = 5 % 16 = 5
+Copy m_pendingSubmissions[5] → m_ringBuffer[CurrPicIdx]
+↓
+Return from cuvidParseVideoData
```

**Key Points**:
- ✅ **cuvidParseVideoData is synchronous** - callbacks complete before return
- ✅ **CurrPicIdx is the slot index** - no calculation needed
-- ✅ **pending_idx = submission_id % 8** - find correct pending context
+- ✅ **pending_idx = (m_submissionCounter - 1) % 16** - find correct pending context
- ✅ **Ring buffer prevents overwrites** - multi-thread safe
- ✅ **Release pending slot after copy** - allow reuse for next submission
- ✅ **No timestamp tricks needed** - pure synchronous flow guarantee
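
The "ring buffer prevents overwrites" and "release pending slot after copy" points can be illustrated with a small sketch. This is an assumption-laden illustration rather than the code in this commit: `PendingRing`, `TryAcquire`, `Release` and `IndexOf` are hypothetical names, `kRingSize = 16` is taken from the `% 16` above, and a mutex stands in for whatever synchronization the real decoder uses. The idea is that submission ids 5 and 21 both map to index 5, so the second one must be refused until the first has been released.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

constexpr std::size_t kRingSize = 16;  // assumed; matches the "% 16" above

// Hypothetical pending-submission ring: one in_use flag per slot.
class PendingRing {
public:
    // Claims the slot for the next submission id and writes the id to out_id.
    // Returns false (without consuming an id) if that slot is still occupied
    // by the submission issued kRingSize ids earlier.
    bool TryAcquire(std::uint64_t& out_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t idx = IndexOf(next_id_);
        if (in_use_[idx]) {
            return false;  // acquiring now would overwrite a live entry
        }
        in_use_[idx] = true;
        out_id = next_id_++;
        return true;
    }

    // Frees the slot once its contents have been copied out.
    void Release(std::uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t idx = IndexOf(id);
        assert(in_use_[idx]);
        in_use_[idx] = false;
    }

    // Maps a submission id to its slot index.
    static std::size_t IndexOf(std::uint64_t id) {
        return static_cast<std::size_t>(id % kRingSize);
    }

private:
    std::mutex mutex_;
    std::uint64_t next_id_ = 0;
    std::array<bool, kRingSize> in_use_{};
};
```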
---