[FEATURE] Allow output \0 terminated frames (for WebSocket streaming support)#2105
pszemus wants to merge 7 commits into CCExtractor:master
cfsmp3 left a comment
Good feature with a clear real-world use case. The implementation is clean and properly wired through both C and Rust. However, the --null-terminated flag currently only works for DVB bitmap subtitles, not for text-based captions (CEA-608/708). This needs to be fixed before merging.
The problem
In src/lib_ccx/ccx_encoders_transcript.c, you replaced encoded_crlf with encoded_end_frame in only one place: the bitmap subtitle path at line 92.
// write_cc_bitmap_as_transcript(), line 92: ✅ changed
write_wrapped(context->out->fh, context->encoded_end_frame, context->encoded_end_frame_length);
But the text subtitle path (write_cc_buffer_as_transcript) still uses encoded_crlf in two places that also need updating:
// Line 206: ❌ not changed (end of each subtitle line)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
// Line 328: ❌ not changed (end of each subtitle block)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);
Lines 77 and 90 also use encoded_crlf, but for parsing/splitting tokens. Those should probably stay as-is since they're detecting line breaks within the input, not writing output.
How to verify
I tested with a CEA-608 stream:
./ccextractor input.ts --txt --stdout --null-terminated 2>/dev/null | xxd | head -30
The output contains only 0d 0a (CRLF) — zero null bytes. The flag has no effect for text-based captions.
What to fix
In src/lib_ccx/ccx_encoders_transcript.c, replace encoded_crlf with encoded_end_frame on lines 206 and 328 (the two write() calls in write_cc_buffer_as_transcript). Leave lines 77 and 90 alone — those are input parsing, not output.
Note: you'll also need to update the ret < context->encoded_crlf_length comparisons on lines 207 and 329 to use encoded_end_frame_length accordingly.
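To make the point about the length comparison concrete, here is a minimal sketch of the pattern, not the actual CCExtractor code. Only the field names encoded_crlf, encoded_end_frame and their _length counterparts come from the file discussed above; the structs encoder_ctx_sketch and sink_sketch and the functions sink_write and end_frame are hypothetical stand-ins (the sink replaces the real write() on a file descriptor). It also assumes the \0 terminator encodes to a single byte, which may not hold for every output encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the encoder context.
 * Only the four field names are taken from the review above. */
struct encoder_ctx_sketch {
    const unsigned char *encoded_crlf;      /* "\r\n" in the output encoding */
    size_t encoded_crlf_length;
    const unsigned char *encoded_end_frame; /* CRLF or "\0", chosen by the flag */
    size_t encoded_end_frame_length;
};

/* Hypothetical in-memory sink that plays the role of the output file. */
struct sink_sketch {
    unsigned char data[64];
    size_t used;
};

/* Appends len bytes to the sink; returns the byte count, or -1 if full. */
static long sink_write(struct sink_sketch *s, const unsigned char *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (len > sizeof(s->data) - s->used)
        return -1;
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    return (long)len;
}

/* Ends a subtitle frame with the configured terminator.
 * The short-write check uses encoded_end_frame_length: comparing against
 * encoded_crlf_length (2 bytes here) would wrongly flag a successful
 * 1-byte "\0" write as an error. Returns 0 on success, -1 on failure. */
static int end_frame(struct sink_sketch *out, const struct encoder_ctx_sketch *ctx)
{
    long ret = sink_write(out, ctx->encoded_end_frame, ctx->encoded_end_frame_length);
    if (ret < 0 || (size_t)ret < ctx->encoded_end_frame_length)
        return -1;
    return 0;
}
```

With a context whose encoded_end_frame is the single byte \0, end_frame() succeeds and leaves exactly one null byte in the sink, whereas a stale comparison against encoded_crlf_length would treat that same write as short.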
Thanks @cfsmp3, I've fixed the missing code paths.
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 733ed89...:
Congratulations: Merging this PR would fix the following tests:
All tests passed completely. Check the result page for more info.
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 733ed89...:
Congratulations: Merging this PR would fix the following tests:
All tests passed completely. Check the result page for more info.
cfsmp3 left a comment
Thanks for addressing the previous feedback — the C paths all work now. However there's still one path that doesn't respect --null-terminated:
CEA-708 via the Rust decoder — src/rust/src/decoder/tv_screen.rs:353 hardcodes \r\n:
writer.write_to_file(b"\r\n")?;
This means --null-terminated has no effect on CEA-708 transcript output. You can verify:
ccextractor input.ts --txt -o /tmp/test.txt --null-terminated -svc 1
xxd /tmp/test.p1.svc01.txt | head -20
# No null bytes, only 0d 0a
The frame_terminator_0 option needs to be plumbed into the Rust Writer struct so that write_transcript can use it instead of the hardcoded \r\n.
Hi @pszemus, I noticed the latest review feedback about plumbing frame_terminator_0 into the Rust Writer struct for CEA-708 support. I'd be happy to help with this if you'd like, just let me know!
In raising this pull request, I confirm the following (please check boxes):
My familiarity with the project is as follows (check one):
When streaming subtitles (particularly DVBSUB) from ccextractor to WebSocket endpoints via tools like websocat, multi-line subtitles cause issues. Each line is sent as a separate message, resulting in only the last line being visible at the receiving end.
For example, using the following pipeline:
multi-line subtitle frames are sent line-by-line, losing all but the final line.
This PR introduces the --null-terminated option, which appends a null character (\0) as a frame delimiter after each complete subtitle frame (whether single or multi-line). This enables proper frame boundaries for streaming scenarios.
Then, it'll be possible to create the following pipeline:
With this change, websocat's -0 flag can properly parse complete subtitle frames using the null delimiter (see websocat documentation).
Benefits:
Please compare the following two output files, where with --null-terminated enabled new lines in multi-line subtitles were preserved and all frames end with \0.
--out=webvtt: ccextractor_webvtt.txt
--out=txt --null-terminated: ccextractor_txt_null-terminated.txt
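To illustrate why the delimiter choice matters on the receiving side, here is a small sketch of how a consumer might count messages in a byte stream. The function count_messages and the sample streams are hypothetical and chosen for illustration; they are not taken from ccextractor or websocat.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the non-empty messages in buf when it is split on delim.
 * A trailing chunk without a final delimiter is counted as a message. */
static size_t count_messages(const unsigned char *buf, size_t len, unsigned char delim)
{
    size_t count = 0;
    size_t start = 0; /* index where the current message begins */
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == delim) {
            if (i > start)
                count++; /* bytes start..i-1 form one message */
            start = i + 1;
        }
    }
    if (start < len)
        count++; /* unterminated tail */
    return count;
}
```

For a two-frame stream such as "Line one\nline two\nSingle\n", splitting on \n yields 3 messages, so the two-line frame is broken apart. For "Line one\nline two\0Single\0", splitting on \0 yields 2 messages, one per frame, with the inner newline preserved.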