Breaks on vtt file downloaded from YouTube #439
Comments
If I read the VTT specification correctly, the extra line between |
To some extent, though, the tacit spec is somewhere between the official spec and whatever YouTube does… 😔 |
FYI. I was pointed to the following validator: |
I think allowing an extra line between the cue timing line and the first line of cue contents significantly complicates the parser, which would need to do some look ahead to differentiate between cue contents and cue identifier. Ideally someone from YT could weigh in. |
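To make the look-ahead concern concrete, here is a minimal sketch and not the library's actual parser. The function name classify_line and the three labels are hypothetical. It assumes only that a timing line is the one containing "-->", and that a cue identifier is a non-timing line immediately followed by a timing line. Once an empty line is allowed after the timing line, the first non-empty line that follows could be either cue text or the next cue's identifier, and the only way to tell is to peek at the line after it.

```python
from typing import List


def classify_line(lines: List[str], i: int) -> str:
    """Classify lines[i] as 'timing', 'identifier', or 'text' (hypothetical labels)."""
    line = lines[i]
    if "-->" in line:
        return "timing"
    # Look-ahead: a non-timing line directly followed by a timing line
    # is treated as a cue identifier rather than cue text.
    if i + 1 < len(lines) and "-->" in lines[i + 1]:
        return "identifier"
    return "text"


lines = [
    "00:00:00.000 --> 00:00:02.000",
    "",
    "intro",
    "00:00:02.000 --> 00:00:04.000",
    "Hello",
]
print([classify_line(lines, i) for i in (2, 4)])  # ['identifier', 'text']
```

Without the peek at lines[i + 1], "intro" would be indistinguishable from cue text belonging to the first cue.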
P.S.: I have asked for input on both the W3C TT WG reflector and the CCSUBS reflector. I plan to merge the PR sometime late next week unless I hear otherwise. |
@zellyn It looks like en.vtt.txt includes a single space character after the timing line, but the inline snippet does not. Can you confirm that the YT download includes that single space? |
Oh, good catch! Yep, it appears to be there:
You can see |
I have revised the PR to fix the detection of empty lines. Empty lines are those that contain no characters (other than |
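As a rough illustration of that rule, and not the code from the PR, the sketch below treats a line as empty only when nothing remains once line terminators are removed. The comment above is cut off after "other than", so reading it as "other than line terminators" is an assumption. The names is_empty_line and split_blocks are hypothetical. Under this rule, the line holding a single space after the timing line counts as content and does not end the cue block.

```python
import re
from typing import List


def is_empty_line(line: str) -> bool:
    # Assumption: only CR/LF are ignored; a lone space is a real character.
    return line.rstrip("\r\n") == ""


def split_blocks(text: str) -> List[List[str]]:
    """Group lines into blocks separated by truly empty lines."""
    blocks: List[List[str]] = []
    current: List[str] = []
    # Split on CRLF, LF, or CR.
    for line in re.split(r"\r\n|\n|\r", text):
        if is_empty_line(line):
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


sample = "WEBVTT\n\n00:00:00.000 --> 00:00:02.000\n \nHello\n\n"
for block in split_blocks(sample):
    print(block)
```

With the sample input, the timing line, the single-space line, and "Hello" end up in the same block, which is the behavior the single space in en.vtt.txt calls for.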
Version: whatever uvx fetches by default right now, presumably 1.1.1.
I'm having trouble parsing a vtt file downloaded from YouTube, using a URL that yt-dlp gave me.
The full file can be fetched with:
This snippet of just the first few lines is enough to provoke the problem:
Here's my snippet of code:
And here's the error I get:
Note: if I edit .../vtt/reader.py and add a line with subtitle_text = "" after the line current_p = None, it succeeds in parsing this snippet. However, if I run it on the full vtt file, it starts disliking the timestamps:
[Edit]
Hmmm. That URL appears to have a time limit. I'll attach the full vtt file downloaded with:
and renamed to make GitHub happy: en.vtt.txt