diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 158ba69..1bb0af1 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,10 +1,15 @@ name: CI on: - push: - branches: [master] # only when master gets new commits - pull_request: - branches: [master] # only when a PR is opened/updated that targets master + push: + branches: + - master + - 'development-*' # add pattern for development branches + pull_request: + branches: + - master + - 'development-*' # allow PRs targeting development-* branches + jobs: test: diff --git a/README.md b/README.md index 7121209..857703a 100644 --- a/README.md +++ b/README.md @@ -4,21 +4,19 @@ Japan Association of Radio Industries and Businesses (ARIB) MPEG2 Transport Stre [![CI](https://github.com/johnoneil/arib/actions/workflows/ci.yml/badge.svg?branch=master&event=push)](https://github.com/johnoneil/arib/actions/workflows/ci.yml) +

+ +

-## Description -Closed Captions (CCs) are encoded in Japanese MPEG Transport Streams as a separate PES (Packetized Elementary Stream) within the TS. The format of the data within this PES is described by the (Japanese native) ARIB B-24 standard. An English document describing this standard is included in the Arib/docs directory in this repository. - -This python package provides tools to find and parse this ARIB closed caption information in MPGEG TS files. - -This code can be used in your own applications or used via the arib-ts2ass tool which this package provides. -The image below shows example ARIB closed caption data displayed at runtime on a media player, generated via arib-ts2ass. The text, position and color are all driven by data derived from the MPEG TS Closed Caption elemenatry stream. +# Description +Closed Captions (CCs) are encoded in Japanese MPEG Transport Streams as a separate PES (Packetized Elementary Stream) within the TS. The format of the data within this PES is described by the (Japanese native) ARIB B-24 standard. An English document describing this standard is included in the Arib/docs directory in this repository. -![example of ass file](img/gaki2.png "Example ass file.") +This python package provides tools to find and parse this ARIB closed caption information in MPEG TS files and can be used in your own applications or via the tools this package provides. # Installation -Installation should be typical. We recommend a virtual environment. +Installation should be typical. We recommend using a virtual environment. ``` pip install git+https://github.com/johnoneil/arib @@ -32,179 +30,40 @@ cd arib pip install -e . ``` -The above commands may require ```sudo``` though I recommend again installing them in a python virtualenv. 
- -## `arib-ts2ass` tool - -This package provides a tool (`arib-ts2ass`) that extracts ARIB based closed caption information from an MPEG Transport Stream recording, and formats the info into a standard .ass (Advanced Substation Alpha) subtitle file. The image below shows a resultant .ass subtitle file loaded to the video file it was generated off: -![example of ass file](img/haikyu.png "Example ass file.") -Note the ts2ass tool supports (in a basic way) closed caption locations, furigana (pronunciation guide), text size and color. - -If no PID is specified to the tool, arib-ts2ass will attempt to find the PID of the elementary stream carriing Closed Caption information within the specified MPEG TS file. Or one can be specified if it is known (see below concerning how to find PID values in TS files). - -Basic command line help is available as below. -``` ->arib-ts2ass --help -usage: arib-ts2ass [-h] [-o OUTFILE] [-p PID] [-v] [-q] [-t TMAX] [-m TIMEOFFSET] [--disable-drcs] infile - -Remove ARIB formatted Closed Caption information from an MPEG TS file and format the results as a standard .ass -subtitle file. - -positional arguments: - infile Input filename (MPEG2 Transport Stream File) - -options: - -h, --help show this help message and exit - -o OUTFILE, --outfile OUTFILE - Output filename (.ass subtitle file) - -p PID, --pid PID Specify a PID of a PES known to contain closed caption info (tool will attempt to find the - proper PID if not specified.). - -v, --verbose Verbose output. - -q, --quiet Does not write to stdout. - -t TMAX, --tmax TMAX Subtitle display time limit (seconds). - -m TIMEOFFSET, --timeoffset TIMEOFFSET - Shift all time values in generated .ass file by indicated floating point offset in seconds. - --disable-drcs Disable emitting .ass drawing code for runtime (dynamic) DRCS characters. 
-``` - -### DRCS Support - -I've introduced basic DRCS (dynamic runtime character) support, so when DRCS characters are encountered in the .ts stream they are cached and emitted as .ass drawing code when encountered in text. See the following image: - -![DRCS in a closed caption](img/drcs.png) - -This behavior can be turned off if the .ass drawing code is too heavyweight by specifying the `--disable-drcs` command line option. This results in previous behavior whereby the "unknown character" glyph is emitted for DRCS (see below). - -![DRCS disabled unknown character](img/no-drcs.png) - -# Experiments and Other Info - -## `arib-ts-extract` and `arib-es-extract` - -This package also installs two additional tools which can be used to draw basic CC information from MPEG ts and es files. These are ```arib-ts-extract``` and ```arib-es-extract```. They skip the usual .ass formatting and show a text representation of the basic ARIB codes present in the .ts or .es file. See the example below: -``` -joneil@joneilDesktop ~/code/arib $ arib-es-extract tests/toriko_subs.es - - - - -<世はグルメ時代> - -<食の探求者 美食屋たちは訢 - -あまたの食材を追い求める> - -<そして この世の食材の頂点 -ゴッド -ほかく -GODの捕獲を目指す訢 - -一人の美食屋がいた!> - - -頰〜 -``` - -In the above output, each line is not timestamped, but you can see the cursor movement info (screen positions in character row/col) text size info, and the on screen CC text data. - -Interestingly, you can see how the furigana for certain words (perl or kanji pronunciation guide) is present for many romaji (latin alphabet) and kanji characters. For example the furigana "ゴッド" is positioned as small text above the normal sized text word "GOD". +# Tools Provided -Timestamp info for the for the various text and clear screen commands would have to be drawn out of the .TS packet info. This functionality is not present in this package. +## `arib-ts2srt` -Also note that in the example above, screen positions and other textual information was described using the ARIB control character set. 
-There is another way in which such info is carried around: via the ARIB control *seqence* character set. Please refer to the ARIB.control_characters.CS class for more info. +This package provides the `arib-ts2srt` tool which extracts closed caption data from a `.ts` file and produces a simple `.srt` file output. This application also serves as a simple example of how to use the underlying library. -An example of inline control sequences carrying text position and other info follows: ``` -えいえゅゃ栄純がきのぃとはゃなに言っくら訢 +arib-ts2srt [-o ] ``` -Refer to the ARIB documentation for descriptions of what these control sequences mean, but some can be summarized here: -* 'S' character indicates the text layout style according to the ARIB std (here 7 indicates horizontal text with geometry based on a screen of 960x540) -* '_' underscore indicates UL corner in pixels of CC area (here at x=170,y=30). -* 'V' indicates the width, height in pixels of the CC area (here 620x480). Note that this is inset inside a stanard screen dimension of 960x540. -* 'W' indicates the height and width of a normal sized character in pixels. Japanese characters tend to be square. -* 'X' is the pixel spacing between characters in CCs. -* 'Y' is the pixel spacing between lines in CCs. -* 'a' Positions the cursor to a screen position in pixels. This is in contrast to the dedicated control character APS (Active Position Set) above which positions the cursor to a particular character *line* and *column*. APS style line and column positions can be translated to pixel positions by using the character width and height, space between characters and lines and the UL position of the CC area (see above). - -# Manually drawing a PID and/or PES from a TS file -I've update the arib-ts2ass tool above to automatically find the id (PID) of the elementary stream carrying closed captions (if there is one) in any MPEG TS file. 
However, if you'd like to find these PID values for yourself I recommend using the ```tsinfo``` tool as below: -``` -joneil@joneilDesktop ~/code/arib/analysis $ tsinfo .ts -Reading from .ts -Scanning 1000 TS packets - -Packet 452 is PAT -Program list: - Program 2064 -> PID 0101 (257) - -Packet 796 is PMT with PID 0101 (257) - Program 2064, version 15, PCR PID 0100 (256) - Program info (15 bytes): 09 04 00 05 e0 31 f6 04 00 0e e0 32 c1 01 84 - Conditional access: id 0005 (5) PID 0031 (49) data (9 bytes): f6 04 00 0e e0 32 c1 01 84 - Descriptor tag f6 (246) (4 bytes): 00 0e e0 32 - Descriptor tag c1 (193) (1 byte): 84 - Program streams: - PID 0111 ( 273) -> Stream type 02 ( 2) H.262/13818-2 video (MPEG-2) or 11172-2 constrained video - ES info (6 bytes): 52 01 00 c8 01 47 - Descriptor tag 52 ( 82) (1 byte): 00 - Descriptor tag c8 (200) (1 byte): 47 - PID 0112 ( 274) -> Stream type 0f ( 15) 13818-7 Audio with ADTS transport syntax - ES info (3 bytes): 52 01 10 - Descriptor tag 52 ( 82) (1 byte): 10 - PID 0114 ( 276) -> Stream type 06 ( 6) H.222.0/13818-1 PES private data (maybe Dolby/AC-3 in DVB) - ES info (8 bytes): 52 01 30 fd 03 00 08 3d - Descriptor tag 52 ( 82) (1 byte): 30 - Descriptor tag fd (253) (3 bytes): 00 08 3d - PID 0115 ( 277) -> Stream type 06 ( 6) H.222.0/13818-1 PES private data (maybe Dolby/AC-3 in DVB) - ES info (20 bytes): 52 01 38 09 04 00 05 ff ff f6 04 00 0e ff ff fd 03 00 08 3c -... -``` -I recognize the PID 276, (stream type 6) as the PES private CC data from experience. Typically, tsinfo identifies Closed Caption elementary streams as `PES private data (maybe Dolby/AC-3 in DVB)`. The relevant CCs are usually the *first* elementary stream reported as well. -Note that sometimes an adequate PAT (Program allocation table) may not be within the first 1000 packets of the .TS, so you might have to run tsinfo with an additional argument (look through more packets for a PAT). 
-``` -tsinfo -max 20000 .ts -``` +An option exists to alternatively output `.srt` data directly to stdout: -Then, if you wish, you can use ts2es to draw out the ES. ``` -ts2es -pid 276 .ts .es +arib-ts2srt --stdout > output.srt ``` -## arib-autosub -This repo also contains some code for an experimental application "arib-autosub" which draws Closed Caption information out of an MPEG TS file and then translates it via Bing Translate. +## `arib-ts2ass` -As I'm no longer installing this tool when this package is installed the description below is only for reference: +This tool outputs ARIB subtitle information in a formatted `.ass` ("advanced substation alpha") file. The advantage is that text position, color and size can be captured and presented as encoded in the `.ts` stream. This is especially advantageous in presenting furigana or ruby pronunciation guides correctly. -Command line help is available as below: ``` -(arib)joneil@joneilDesktop ~/code/arib $ arib-autosub -h -usage: arib-autosub [-h] infile pid + -Auto translate jp CCs in MPEG TS file. +If no subtitle stream identifier (PID) is provided to the tool, arib-ts2ass will attempt to find the PID of the elementary stream carrying Closed Caption information, or one can be specified if it is known (see `experiments.md` concerning how to find PID values in TS files). -positional arguments: - infile Input filename (MPEG2 Transport Stream File) - pid Pid of closed caption ES to extract from stream. - -optional arguments: - -h, --help show this help message and exit - -``` - -The application requires 2 command line arguments, the name of the input .ts file and the PID of the CC elementary stream. Please see below regarding how to identify a Closed Caption PID in a .ts file using the tsinfo tool. 
- -An example screenshot of a resultant subtitle follows (from a news broadcast): -![example of translated ccs](img/news.png "Example of auto translated Closed Captions.") +### DRCS Support -Currently, text position and color are not carried through the translation process. +This tool now has basic DRCS (dynamic runtime character) support, so when DRCS characters are encountered in the .ts stream they are cached and emitted as .ass drawing code when encountered in text. See the following image: -Because this tool uses the Bing Translate API, the user must get their own "Client ID" and "Client scret" credentials from the windows Azue Marketplace. These need be defined in the arib.secret_key module. +![DRCS in a closed caption](img/drcs.png) -To find the PES ID of the closed captions stream within any TS (if it exists!) see the section below. +This behavior can be turned off if the .ass drawing code is too heavyweight by specifying the `--disable-drcs` command line option. This results in previous behavior whereby the "unknown character" glyph is emitted for DRCS (see below). -The translation results are not good. In fact, they are often lewd and comical. Still, this is an interesting experiment. To illustrate the defficiencies of the approach, I present the following screenshot, translating the shot from the previous section. You'll notice that despite the simplicity of the original source, the translation is off. It does give a "general sense" of meaning, however. 
-![example of auto translation](img/haikyu_eng.png "Example poor auto translation.") +![DRCS disabled unknown character](img/no-drcs.png) +# Experiments and Other Info +See [here](./experiments.md) \ No newline at end of file diff --git a/arib/ass.py b/arib/ass.py index e223c6b..fee4be6 100644 --- a/arib/ass.py +++ b/arib/ass.py @@ -72,7 +72,7 @@ def ass_draw_dialogue(path, p_scale=1, fscx=100, fscy=100, anchor=1): return f"{{\\an{anchor}\\p{p_scale}}}" f"{path}{{\\p0}}" -def ass_draw_drcs_inline(glyph: DrcsGlyph, pad_spaces: int = 2) -> str: +def ass_draw_drcs_inline(glyph: DrcsGlyph, pad_spaces: int = 0) -> str: """ Emit a DRCS vector drawing that inherits the CURRENT ASS state: - inherits \1c (primary color), \1a (alpha), \bord, \\shad, etc. @@ -81,7 +81,7 @@ def ass_draw_drcs_inline(glyph: DrcsGlyph, pad_spaces: int = 2) -> str: - optionally pads with N spaces after the drawing Example use (inline): - "{\\c&H00FF00&}" + ass_draw_drcs_inline(glyph, pad_spaces=2) + "お前たちは" + "{\\c&H00FF00&}" + ass_draw_drcs_inline(glyph, pad_spaces=0) + "お前たちは" """ bmp = drcs_unpack_to_bitmap(glyph.width, glyph.height, glyph.bitmap, depth=glyph.depth_bits) path = bitmap_to_ass_path(bmp, alpha_threshold=1) @@ -155,19 +155,11 @@ def __str__(self): def default_text_glyph_width(glyph) -> float: - if glyph.size == TextSize.NORMAL: - return len(glyph.ch) * (36 + 4) - else: - # medium and small text are half width - return len(glyph.ch) * (36 + 4) / 2.0 + return len(glyph.ch) * CLOSED_CAPTION_AREA.text_width(glyph.size) def drcs_text_glyph_width(glyph) -> float: - if glyph.size == TextSize.NORMAL: - return 36 + 4 - else: - # medium and small text are half width - return (36 + 4) / 2.0 + return CLOSED_CAPTION_AREA.text_width(glyph.size) @dataclass @@ -189,6 +181,7 @@ def __init__(self, pos: Pos): self.items = [] self.pos = copy.copy(pos) self.end_pos = copy.copy(pos) + self.cc_area = CLOSED_CAPTION_AREA def add_glyph(self, glyph: TextGlyph): self.items.append(glyph) @@ -215,16 +208,12 @@ 
def __str__(self): print("WARNING: generating dialog line for empty teletext run.") return "" - run_is_small = self.is_small() x = self.pos.x y = self.pos.y - # HACK: .ass files don't allow us to easily get lines of text to "fill up" + # .ass files don't allow us to easily get lines of text to "fill up" # the correct vertical space, anchor the text using /an4 (midpoint) and positon # it as if it "fills up" the row. - if run_is_small: - y -= (36 + 24) / 4 - else: - y -= (36 + 24) / 2 + y -= self.cc_area.text_nudge(self.is_small()) current_text_size = None current_text_color = None output = "" @@ -251,6 +240,98 @@ def __str__(self): return output +def rectangles_dialog_union( + runs: list, + start_s: float, + end_s: float, + *, + pad_x: int = 0, + pad_y: int = 0, + alpha: int = 0x80, # 0x00 opaque .. 0xFF invisible + color_bgr: str = "&H000000&", + style: str = "Default", + layer: int = 0, + y_tol: int = 2, # tolerance to group runs into same row band +) -> str: + """ + Build a single Dialogue line that draws a set of axis-aligned rectangles + representing merged background boxes for the given TextRuns. Merges + horizontally within each row band to avoid overlap (and thus stacking). + Coordinates are absolute screen pixels. 
+ """ + + def _ass_time(seconds: float) -> str: + secs = max(0.0, seconds) + total_cs = int(round(secs * 100)) + cs = total_cs % 100 + total_s = total_cs // 100 + s = total_s % 60 + total_m = total_s // 60 + m = total_m % 60 + h = total_m // 60 + return f"{h}:{m:02d}:{s:02d}.{cs:02d}" + + # 1) Collect rectangles (before merging) + rects_by_band = {} # key: (band_y0, band_h) -> list of [x0, x1] + for run in runs: + row_h = 30 if run.is_small() else 60 + left = run.pos.x + right = run.end_pos.x + if right < left: + left, right = right, left + + top = run.pos.y - (row_h / 2) + top -= CLOSED_CAPTION_AREA.text_nudge(run.is_small()) + x0 = int(round(left - pad_x)) + x1 = int(round(right + pad_x)) + y0 = int(round(top - pad_y)) + h = int(round(row_h + 2 * pad_y)) + + # Snap y0 to an existing band within tolerance, or create a new band + chosen_key = None + for by0, bh in rects_by_band.keys(): + if abs(by0 - y0) <= y_tol and bh == h: + chosen_key = (by0, bh) + break + if chosen_key is None: + chosen_key = (y0, h) + rects_by_band[chosen_key] = [] + rects_by_band[chosen_key].append([x0, x1]) + + # 2) Merge intervals within each band + merged_rects = [] # list of (x0, y0, w, h) + for (y0, h), intervals in rects_by_band.items(): + intervals.sort(key=lambda ab: (ab[0], ab[1])) + merged = [] + for a, b in intervals: + if not merged or a > merged[-1][1]: + merged.append([a, b]) + else: + merged[-1][1] = max(merged[-1][1], b) + for a, b in merged: + merged_rects.append((a, y0, b - a, h)) + + if not merged_rects: + return "" + + # 3) Build one drawing with absolute coords; \pos(0,0) + \an7 + t0 = _ass_time(start_s) + t1 = _ass_time(end_s) + + tags = ( + f"{{\\an7}}{{\\pos(0,0)}}{{\\p1}}{{\\bord0}}{{\\shad0}}" + f"{{\\1c{color_bgr}}}{{\\1a&H{alpha:02X}&}}" + ) + + # Multi-rect path; each rect is its own subpath + path_parts = [] + for x, y, w, h in merged_rects: + path_parts.append(f"m {x} {y} l {x+w} {y} l {x+w} {y+h} l {x} {y+h} l {x} {y}") + + path = " ".join(path_parts) + return 
f"Dialogue: {layer},{t0},{t1},{style},,0,0,0,,{tags}{path}{{\\p0}}\n" + + class ClosedCaptionArea(object): def __init__(self): # these values represent horizontal mode ('7') @@ -269,6 +350,18 @@ def UL(self): def Dimensions(self): return self._Dimensions + def text_nudge(self, is_small: bool): + if is_small: + return (self._CharacterDim.height + self._line_spacing) // 4 + else: + return (self._CharacterDim.height + self._line_spacing) // 2 + + def text_width(self, size: TextSize): + if size == TextSize.NORMAL: + return self._CharacterDim.width + self._char_spacing + else: + return (self._CharacterDim.width + self._char_spacing) // 2 + # A tricky function. # Text ROWs are actually "number of line feeds", or zero based. # The vertical position is determined by current text size when the @@ -295,6 +388,9 @@ def RowCol2ScreenPos(self, row, col, size=TextSize.NORMAL): return Pos(int(round(x)), int(round(y))) +CLOSED_CAPTION_AREA = ClosedCaptionArea() + + class ASSFile(object): """Wrapper for a single open utf-8 encoded .ass subtitle file""" @@ -535,9 +631,8 @@ def clear_screen(formatter, cs, timestamp): formatter._last_end_time_s = end_time_s formatter._current_textsize = TextSize.NORMAL formatter._current_text_color = TextColor.WHITE - start_time = asstime(start_time_s) - end_time = asstime(end_time_s) - runs = formatter.get_dialog_text_runs(start_time, end_time) + + runs = formatter.get_dialog_text_runs(start_time_s, end_time_s) for run in runs: if formatter._ass_file: formatter._ass_file.write(run) @@ -594,6 +689,7 @@ def __init__( video_filename="output.ass", verbose=False, disable_drcs=False, + disable_backgrounds=False, show_debug_grid=False, ): """ @@ -604,7 +700,7 @@ def __init__( """ self._color = default_color self._tmax = tmax - self._CCArea = ClosedCaptionArea() + self._CCArea = CLOSED_CAPTION_AREA self._pos = Pos(0, 0) self._elapsed_time_s = 0.0 self._last_end_time_s = 0.0 @@ -617,6 +713,7 @@ def __init__( self._height = height self._verbose = verbose 
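The horizontal merge step inside the `rectangles_dialog_union` hunk above is plain interval merging: sort the per-run `[x0, x1]` spans, then coalesce any span that overlaps or touches the previous merged one so overlapping background boxes never stack alpha. A standalone sketch of that same loop (the helper name `merge_intervals` is hypothetical, not part of this patch):

```python
def merge_intervals(intervals):
    # Sort by (start, end), then coalesce each interval that overlaps or
    # touches the previous merged one -- the same loop the patch runs per
    # row band before emitting background rectangles.
    out = []
    for a, b in sorted(intervals, key=lambda ab: (ab[0], ab[1])):
        if not out or a > out[-1][1]:
            out.append([a, b])
        else:
            out[-1][1] = max(out[-1][1], b)
    return out


# Two overlapping runs collapse into one box; the third stays separate.
merged = merge_intervals([[170, 400], [380, 620], [700, 760]])
```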
self._disable_drcs = disable_drcs + self._disable_backgrounds = disable_backgrounds self._show_debug_grid = show_debug_grid self._accumulated_text_runs: List[TextRun] = [] @@ -649,11 +746,18 @@ def add_char(self, ch: str, size_strategy=default_text_glyph_width): self._accumulated_text_runs[-1].add_glyph(glyph) - def get_dialog_text_runs(self, start_time, end_time): + def get_dialog_text_runs(self, start_time_s, end_time_s): + start_time = asstime(start_time_s) + end_time = asstime(end_time_s) runs = [] - prefix = f"Dialogue: 0,{start_time},{end_time}," + prefix = f"Dialogue: 1,{start_time},{end_time}," for run in self._accumulated_text_runs: runs.append(prefix + str(run)) + if not self._disable_backgrounds: + rectangles = rectangles_dialog_union( + self._accumulated_text_runs, start_s=start_time_s, end_s=end_time_s + ) + runs.append(rectangles) self._accumulated_text_runs = [] return runs diff --git a/arib/autosub.py b/arib/autosub.py index 01faf40..e8af0a7 100755 --- a/arib/autosub.py +++ b/arib/autosub.py @@ -1,6 +1,4 @@ -#!/usr/bin/env python -# -*- coding: utf-8 -*- -# vim: set ts=2 expandtab: +#!/usr/bin/env python3 """ Module: autosub.py Desc: Extract CCs from .ts file-->translate via bing-->output .ass subtitle file diff --git a/arib/bing.py b/arib/bing.py index 00a86d0..dcea9b6 100755 --- a/arib/bing.py +++ b/arib/bing.py @@ -1,5 +1,4 @@ -#!/usr/bin/env python -# vim: set ts=2 expandtab: +#!/usr/bin/env python3 """ Module: bing.py Desc: Translate strings via Bing traslate diff --git a/arib/es_extract.py b/arib/es_extract.py index 177c092..814cb6a 100755 --- a/arib/es_extract.py +++ b/arib/es_extract.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 """ Module: es-extract Desc: Extract ARIB closed caption info from a previously demuxed Elementary Stream diff --git a/arib/mpeg/ts.py b/arib/mpeg/ts.py index 72876e5..0afa40f 100755 --- a/arib/mpeg/ts.py +++ b/arib/mpeg/ts.py @@ -1,4 +1,4 @@ -#!/usr/bin/env python +#!/usr/bin/env python3 """ 
Module: ts Desc: Minimalist MPEG ts packet parsing diff --git a/arib/srt.py b/arib/srt.py new file mode 100644 index 0000000..48d3634 --- /dev/null +++ b/arib/srt.py @@ -0,0 +1,287 @@ +from pathlib import Path +from enum import Enum +import sys +import math +from typing import Optional, Union, Callable, Dict, Any +import re + +import arib.code_set as code_set +import arib.control_characters as control_characters +from arib.arib_exceptions import FileOpenError + +Number = Union[int, float] + + +class TextSize(Enum): + SMALL = "small" + MEDIUM = "medium" + NORMAL = "normal" + + def __str__(self): + return self.value + + +class SRTWriter: + """ + A simple sink that can write SRT content either to a file or to stdout. + + - Lazy open (file only opened on first write) + - Context-manager friendly + - If to_stdout=True or path == "-", writes to sys.stdout and does not close it + """ + + def __init__(self, path: Optional[str], *, to_stdout: bool = False): + self._path = None if to_stdout else (None if path is None else str(path)) + if path == "-": # conventional stdout marker + to_stdout = True + self._path = None + self._to_stdout = to_stdout + self._fh = None # type: Optional[object] + + def __enter__(self): + # Opening is lazy; we still return self so "with" works. 
+ return self + + def __exit__(self, exc_type, exc, tb): + self.close() + + @property + def is_open(self) -> bool: + return self._fh is not None + + def _ensure_open(self): + if self.is_open: + return + if self._to_stdout: + self._fh = sys.stdout + return + if not self._path: + raise FileOpenError("No output path provided and stdout not selected.") + # Ensure parent dir exists + Path(self._path).parent.mkdir(parents=True, exist_ok=True) + try: + self._fh = open(self._path, "w", encoding="utf-8", newline="") + except Exception as e: + raise FileOpenError(f"Could not open file {self._path!r} for writing: {e}") from e + + def write(self, text: str) -> None: + self._ensure_open() + self._fh.write(text) + + def flush(self) -> None: + if self.is_open and self._fh is not sys.stdout: + self._fh.flush() + + def close(self) -> None: + if self.is_open and self._fh is not sys.stdout: + try: + self._fh.close() + finally: + self._fh = None + + +def srt_timecode(seconds: Number, *, clamp_negative: bool = True) -> str: + if not isinstance(seconds, (int, float)) or not math.isfinite(seconds): + raise ValueError("seconds must be a finite int or float") + if clamp_negative and seconds < 0: + seconds = 0.0 + total_ms = int(round(seconds * 1000)) + sign = "" + if total_ms < 0: + sign = "-" + total_ms = -total_ms + hours, rem = divmod(total_ms, 3_600_000) + minutes, rem = divmod(rem, 60_000) + secs, ms = divmod(rem, 1000) + return f"{sign}{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}" + + +# Precompiled patterns used by control character handler +_a_regex = re.compile(rb'(?P<x>\d{1,4});(?P<y>\d{1,4}) a">') +_pos_regex = r"({\\pos\(\d{1,4},\d{1,4}\)})" # kept for compatibility (not used directly here) + + +class SRTFormatter: + def __init__( + self, + default_color: str = "white", + tmax: int = 5, + width: int = 960, + height: int = 540, + video_filename: str = "output.srt", + verbose: bool = False, + enable_small_text: bool = False, + *, # forced explicit naming for output_to_stdout + 
output_to_stdout: bool = False, + ): + self._color = default_color + self._tmax = tmax + self._elapsed_time_s = 0.0 + self._last_end_time_s = 0.0 # preserved + self._filename = video_filename + self._width = width + self._height = height + self._verbose = verbose + self.line_count = 1 + self.current_lines = [""] # start ready to receive chars + self.enable_small_text = enable_small_text + self._current_textsize = TextSize.NORMAL + + # writer is created lazily on first emit (so we never open unless we have content) + self._writer: Optional[SRTWriter] = None + self._to_stdout = output_to_stdout or (video_filename == "-") + + # Bound-method dispatch + self._dispatch: Dict[Any, Callable[[Any, float], None]] = { + code_set.Kanji: self._kanji, + code_set.Alphanumeric: self._alphanumeric, + code_set.Hiragana: self._hiragana, + code_set.Katakana: self._katakana, + control_characters.APS: self._position_set, + control_characters.MSZ: self._medium, + control_characters.NSZ: self._normal, + control_characters.SP: self._space, + control_characters.SSZ: self._small, + control_characters.CS: self._clear_screen, + control_characters.CSI: self._control_character, + control_characters.PAPF: self._active_position_forward, + # DRCS -> unknown char replacement + code_set.DRCS0: self._drcs, + code_set.DRCS1: self._drcs, + code_set.DRCS2: self._drcs, + code_set.DRCS3: self._drcs, + code_set.DRCS4: self._drcs, + code_set.DRCS5: self._drcs, + code_set.DRCS6: self._drcs, + code_set.DRCS7: self._drcs, + code_set.DRCS8: self._drcs, + code_set.DRCS9: self._drcs, + code_set.DRCS10: self._drcs, + code_set.DRCS11: self._drcs, + code_set.DRCS12: self._drcs, + code_set.DRCS13: self._drcs, + code_set.DRCS14: self._drcs, + code_set.DRCS15: self._drcs, + } + + # ---------- public API ---------- + + def add_char(self, ch: str) -> None: + if self._current_textsize != TextSize.SMALL or self.enable_small_text: + if not self.current_lines: + self.current_lines.append("") + self.current_lines[-1] += ch + + 
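The `_dispatch` table above is type-keyed dispatch: each ARIB token class maps to a bound handler, and unknown types are simply skipped. A minimal self-contained sketch of the same pattern (the `Kanji`/`ClearScreen` stand-in classes here are hypothetical; the real keys come from `arib.code_set` and `arib.control_characters`):

```python
from typing import Any, Callable, Dict


class Kanji:
    def __init__(self, ch: str) -> None:
        self.ch = ch


class ClearScreen:
    pass


class MiniFormatter:
    def __init__(self) -> None:
        self.buffer = ""
        self.cleared = 0
        # Map token types to bound handler methods, as SRTFormatter does.
        self._dispatch: Dict[type, Callable[[Any], None]] = {
            Kanji: self._on_char,
            ClearScreen: self._on_clear,
        }

    def format(self, captions) -> None:
        # Unhandled token types fall through silently, mirroring the
        # .get(type(c)) lookup in the patch.
        for c in captions:
            handler = self._dispatch.get(type(c))
            if handler is not None:
                handler(c)

    def _on_char(self, c: Kanji) -> None:
        self.buffer += c.ch

    def _on_clear(self, _c: ClearScreen) -> None:
        self.cleared += 1


f = MiniFormatter()
f.format([Kanji("字"), Kanji("幕"), ClearScreen(), object()])
```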
def new_line(self, timestamp: float) -> None: + if self.current_lines and self.current_lines[-1]: + self.current_lines.append("") + + def emit_lines(self, timestamp: float) -> None: + + start_time = self._elapsed_time_s + end_time = timestamp + if self._elapsed_time_s == timestamp: + start_time = self._last_end_time_s + end_time = start_time + self._tmax + elif timestamp - self._elapsed_time_s > self._tmax: + end_time = self._elapsed_time_s + self._tmax + + self._elapsed_time_s = timestamp + self._last_end_time_s = end_time + self._current_textsize = TextSize.NORMAL + + start = srt_timecode(start_time) + end = srt_timecode(end_time) + + lines = "\n".join(s for s in self.current_lines if s) + if lines: + # lazily create writer only when we truly write the first non-empty subtitle + if not self._writer: + if self._verbose and not self._to_stdout: + print("Found nonempty ARIB closed caption data in file.") + target = "(stdout)" if self._to_stdout else self._filename + print("Writing .srt output to: " + target) + self._writer = SRTWriter(self._filename, to_stdout=self._to_stdout) + + self._writer.write(f"{self.line_count}\n") + self._writer.write(f"{start} --> {end}\n") + self._writer.write(f"{lines}\n\n") + + self.line_count += 1 + self._elapsed_time_s = timestamp + + self.current_lines = [""] + + def position_forward(self, n: int) -> None: + self.add_char(" " * n) + + def open_file(self) -> None: + """Compatibility no-op: writer is opened lazily in emit_lines().""" + return + + def file_written(self) -> bool: + return self._writer is not None and self._writer.is_open + + def finalize(self) -> None: + """Optional: flush/close output (useful for CLIs/tests).""" + # No implicit emit here; we just close the writer if it exists. 
+ if self._writer: + self._writer.flush() + self._writer.close() + + def format(self, captions, timestamp: float) -> None: + for c in captions: + handler = self._dispatch.get(type(c)) + if handler is not None: + handler(c, timestamp) + else: + # TODO: warning/log for unhandled characters + pass + + # ---------- bound handlers ---------- + + def _kanji(self, k, _ts: float) -> None: + self.add_char(str(k)) + + def _alphanumeric(self, a, _ts: float) -> None: + self.add_char(str(a)) + + def _hiragana(self, h, _ts: float) -> None: + self.add_char(str(h)) + + def _katakana(self, k, _ts: float) -> None: + self.add_char(str(k)) + + def _medium(self, _k, _ts: float) -> None: + self._current_textsize = TextSize.MEDIUM + + def _normal(self, _k, _ts: float) -> None: + self._current_textsize = TextSize.NORMAL + + def _small(self, _k, _ts: float) -> None: + self._current_textsize = TextSize.SMALL + + def _space(self, _k, _ts: float) -> None: + self.add_char(" ") + + def _drcs(self, _c, _ts: float) -> None: + self.add_char("�") + + def _position_set(self, _p, ts: float) -> None: + self.new_line(ts) + + def _active_position_forward(self, papf, _ts: float) -> None: + try: + n = int(getattr(papf, "count", 0)) + except Exception: + n = 0 + if n > 0: + self.position_forward(n) + + def _control_character(self, csi, ts: float) -> None: + cmd = csi if isinstance(csi, (bytes, bytearray)) else str(csi).encode("utf-8") + if _a_regex.search(cmd): + self.new_line(ts) + + def _clear_screen(self, _cs, ts: float) -> None: + self.emit_lines(ts) diff --git a/arib/ts2ass.py b/arib/ts2ass.py index 7fe2c0d..7a664ca 100755 --- a/arib/ts2ass.py +++ b/arib/ts2ass.py @@ -1,18 +1,22 @@ -#!/usr/bin/env python -# vim: set ts=2 expandtab: +#!/usr/bin/env python3 """ Module: ts2ass -Desc: Extract ARIB CCs from an MPEG transport stream and produce an .ass subtitle file off them. +Desc: Extract ARIB CCs from an MPEG transport stream and produce an .ass subtitle file. 
 Author: John O'Neil
 Email: oneil.john@gmail.com
 DATE: Saturday, May 24th 2014
 UPDATED: Saturday, Jan 12th 2017
+UPDATED: Saturday, Oct 4th, 2025
 """
-import os
+from __future__ import annotations
+
 import sys
 import argparse
 import traceback
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Optional
 
 from arib.read import EOFError
 
@@ -26,157 +30,142 @@ from arib.ass import ASSFormatter
 
-# GLOBALS TO KEEP TRACK OF STATE
-initial_timestamp = None
-elapsed_time_s = 0
-pid = -1
-VERBOSE = False
-SILENT = False
-DEBUG = False
-ass = None
-infilename = ""
-outfilename = ""
-tmax = 0
-disable_drcs = False
-show_debug_grid = False
-
-
-def OnProgress(bytes_read, total_bytes, percent):
-    """
-    Callback method invoked on a change in file progress percent (not every packet)
-    Meant as a lower frequency callback to update onscreen progress percent or something.
-    :param bytes_read:
-    :param total_bytes:
-    :param percent:
-    :return:
-    """
-    global VERBOSE
-    global SILENT
-    if not VERBOSE and not SILENT:
-        sys.stdout.write("progress: %.2f%% \r" % (percent))
-        sys.stdout.flush()
-
-
-def OnTSPacket(packet):
-    """
-    Callback invoked on the successful extraction of a single TS packet from a ts file
-    :param packet: The entire packet (header and payload) as a string
-    :return: None
-    """
-    global initial_timestamp
-    global elapsed_time_s
-    global time_offset
-
-    # pcr (program count record) can be used to calculate elapsed time in seconds
-    # we've read through the .ts file
-    pcr = TS.get_pcr(packet)
-    if pcr > 0:
-        current_timestamp = pcr
-        initial_timestamp = initial_timestamp or current_timestamp
-        delta = current_timestamp - initial_timestamp
-        elapsed_time_s = float(delta) / 90000.0 + time_offset
-
-
-def OnESPacket(current_pid, packet, header_size):
-    """
-    Callback invoked on the successful extraction of an Elementary Stream packet from the
-    Transport Stream file packets.
-    :param current_pid: The TS Program ID for the TS packets this info originated from
-    :param packet: The ENTIRE ES packet, header and payload-- which may have been assembled
-    from multiple TS packet payloads.
-    :param header_size: Size of the header in bytes (characters in the string). Provided to more
-    easily separate the packet into header and payload.
-    :return: None
-    """
-    global pid
-    global VERBOSE
-    global SILENT
-    global elapsed_time_s
-    global ass
-    global infilename
-    global outfilename
-    global tmax
-    global time_offset
-
-    if pid >= 0 and current_pid != pid:
-        return
-
-    try:
-        payload = ES.get_pes_payload(packet)
-        f = list(payload)
-        data_group = DataGroup(f)
-        if not data_group.is_management_data():
-            # We now have a Data Group that contains caption data.
-            # We take out its payload, but this is further divided into 'Data Unit' structures
-            caption = data_group.payload()
-            # iterate through the Data Units in this payload via another generator.
-            for data_unit in next_data_unit(caption):
-                # we're only interested in those Data Units which are
-                # "statement body" to get CC data.
-                if not isinstance(data_unit.payload(), StatementBody):
-                    continue
-
-                if not ass:
-                    v = not SILENT
-                    ass = ASSFormatter(
-                        tmax=tmax,
-                        video_filename=outfilename,
-                        verbose=v,
-                        disable_drcs=disable_drcs,
-                        show_debug_grid=show_debug_grid,
-                    )
-
-                ass.format(data_unit.payload().payload(), elapsed_time_s)
-
-                # this code used to sed the PID we're scanning via first successful ARIB decode
-                # but i've changed it below to draw present CC language info form ARIB
-                # management data. Leaving this here for reference.
-                # if pid < 0 and not SILENT:
-                #    pid = current_pid
-                #    print("Found Closed Caption data in PID: " + str(pid))
-                #    print("Will now only process this PID to improve performance.")
-
-        else:
-            # management data
-            management_data = data_group.payload()
-            numlang = management_data.num_languages()
-            if pid < 0 and numlang > 0:
-                for language in range(numlang):
-                    if not SILENT:
-                        print(
-                            "Closed caption management data for language: "
-                            + management_data.language_code(language)
-                            + " available in PID: "
-                            + str(current_pid)
+
+@dataclass(frozen=True)
+class Config:
+    infile: Path
+    outfile: Path
+    pid: int
+    verbose: bool
+    quiet: bool
+    tmax: int
+    time_offset: float
+    disable_drcs: bool
+    disable_backgrounds: bool
+    show_debug_grid: bool
+
+
+class TS2ASS:
+    def __init__(self, cfg: Config):
+        self.cfg = cfg
+
+        # state formerly in globals
+        self.initial_timestamp: Optional[int] = None
+        self.elapsed_time_s: float = 0.0
+        self.pid: int = cfg.pid  # may be discovered later from mgmt data if -1
+        self.ass: Optional[ASSFormatter] = None
+
+    # ---- callbacks (former On* functions) ----
+
+    def on_progress(self, bytes_read, total_bytes, percent):
+        # preserve original behavior: show progress only when not verbose and not quiet
+        if not self.cfg.verbose and not self.cfg.quiet:
+            sys.stdout.write(f"progress: {percent:.2f}% \r")
+            sys.stdout.flush()
+
+    def on_ts_packet(self, packet):
+        # pcr can be used to calculate elapsed time in seconds through the .ts file
+        pcr = TS.get_pcr(packet)
+        if pcr > 0:
+            current_timestamp = pcr
+            self.initial_timestamp = self.initial_timestamp or current_timestamp
+            delta = current_timestamp - self.initial_timestamp
+            self.elapsed_time_s = float(delta) / 90000.0 + self.cfg.time_offset
+
+    def on_es_packet(self, current_pid, packet, header_size):
+        # honor fixed PID if provided
+        if self.pid >= 0 and current_pid != self.pid:
+            return
+
+        try:
+            payload = ES.get_pes_payload(packet)
+            f = list(payload)
+            data_group = DataGroup(f)
+            if not data_group.is_management_data():
+                # Data group contains caption data -> iterate data units
+                caption = data_group.payload()
+                for data_unit in next_data_unit(caption):
+                    # Only interested in "statement body" units
+                    if not isinstance(data_unit.payload(), StatementBody):
+                        continue
+
+                    if not self.ass:
+                        v = not self.cfg.quiet
+                        self.ass = ASSFormatter(
+                            tmax=self.cfg.tmax,
+                            video_filename=str(self.cfg.outfile),
+                            verbose=v,
+                            disable_drcs=self.cfg.disable_drcs,
+                            disable_backgrounds=self.cfg.disable_backgrounds,
+                            show_debug_grid=self.cfg.show_debug_grid,
                         )
-                        print("Will now only process this PID to improve performance.")
-                        pid = current_pid
-
-    except EOFError:
-        pass
-    except FileOpenError as ex:
-        # allow IOErrors to kill application
-        raise ex
-    except Exception:
-        if not SILENT and pid >= 0:
+
+                    self.ass.format(data_unit.payload().payload(), self.elapsed_time_s)
+
+                    # (Old commented PID detection code retained in spirit by mgmt data branch below)
+
+            else:
+                # management data
+                management_data = data_group.payload()
+                numlang = management_data.num_languages()
+                if self.pid < 0 and numlang > 0:
+                    for language in range(numlang):
+                        if not self.cfg.quiet:
+                            print(
+                                "Closed caption management data for language: "
+                                + management_data.language_code(language)
+                                + " available in PID: "
+                                + str(current_pid)
+                            )
+                            print("Will now only process this PID to improve performance.")
+                    self.pid = current_pid
+
+        except EOFError:
+            pass
+        except FileOpenError as ex:
+            # allow IOErrors to kill application
+            raise ex
+        except Exception:
+            # Preserve original behavior: print when we have (or found) a PID
+            if not self.cfg.quiet and self.pid >= 0:
+                print(
+                    "Exception thrown while handling DataGroup in ES. "
+                    "This may be due to many factors "
+                    + "such as file corruption or the .ts file using"
+                    " as yet unsupported features."
+                )
+                traceback.print_exc(file=sys.stdout)
+
+    # ---- driver ----
+
+    def run(self) -> int:
+        if not self.cfg.infile.exists() and not self.cfg.quiet:
+            print(f"Input filename: {self.cfg.infile} does not exist.")
+            return -1
+
+        ts = TS(str(self.cfg.infile))
+        ts.Progress = self.on_progress
+        ts.OnTSPacket = self.on_ts_packet
+        ts.OnESPacket = self.on_es_packet
+
+        ts.Parse()
+
+        if self.pid < 0 and not self.cfg.quiet:
+            print(f"*** Sorry. No ARIB subtitle content was found in file: {self.cfg.infile} ***")
+            return -1
+
+        if self.ass and not self.ass.file_written() and not self.cfg.quiet:
             print(
-                "Exception thrown while handling DataGroup in ES. This may be due to many factors"
-                + "such as file corruption or the .ts file using as yet unsupported features."
+                "*** Sorry. No nonempty ARIB closed caption content found in file "
+                + str(self.cfg.infile)
+                + " ***"
             )
-            traceback.print_exc(file=sys.stdout)
+            return -1
 
+        return 0
 
-def main():
-    global pid
-    global VERBOSE
-    global SILENT
-    global infilename
-    global outfilename
-    global tmax
-    global time_offset
-    global disable_drcs
-    global show_debug_grid
 
+def parse_args(argv=None) -> Config:
     parser = argparse.ArgumentParser(
         description=(
             "Remove ARIB formatted Closed Caption information from an MPEG TS file "
@@ -217,50 +206,40 @@ def main():
         help="Disable emitting .ass drawing code for runtime (dynamic) DRCS characters.",
         action="store_true",
     )
+    parser.add_argument(
+        "--disable-backgrounds",
+        help="Disable shaded backgrounds behind on screen text.",
+        action="store_true",
+    )
     parser.add_argument(
         "--show-debug-grid",
-        help="Generate a character position debug grid" "visible onscreen.",
+        help="Generate a character position debug grid visible onscreen.",
         action="store_true",
     )
-    args = parser.parse_args()
-
-    pid = args.pid
-    infilename = args.infile
-    outfilename = infilename + ".ass"
-    if args.outfile is not None:
-        outfilename = args.outfile
-    SILENT = args.quiet
-    VERBOSE = args.verbose
-    tmax = args.tmax
-    time_offset = args.timeoffset
-    disable_drcs = args.disable_drcs
-    show_debug_grid = args.show_debug_grid
-
-    if not os.path.exists(infilename) and not SILENT:
-        print("Input filename :" + infilename + " does not exist.")
-        sys.exit(-1)
-
-    ts = TS(infilename)
-
-    ts.Progress = OnProgress
-    ts.OnTSPacket = OnTSPacket
-    ts.OnESPacket = OnESPacket
-
-    ts.Parse()
-
-    if pid < 0 and not SILENT:
-        print("*** Sorry. No ARIB subtitle content was found in file: " + infilename + " ***")
-        sys.exit(-1)
-
-    if ass and not ass.file_written() and not SILENT:
-        print(
-            "*** Sorry. No nonempty ARIB closed caption content found in file "
-            + infilename
-            + " ***"
-        )
-        sys.exit(-1)
+    args = parser.parse_args(argv)
+
+    infile = Path(args.infile)
+    outfile = Path(args.outfile) if args.outfile is not None else infile.with_suffix(".ass")
+
+    return Config(
+        infile=infile,
+        outfile=outfile,
+        pid=args.pid,
+        verbose=bool(args.verbose),
+        quiet=bool(args.quiet),
+        tmax=int(args.tmax),
+        time_offset=float(args.timeoffset),
+        disable_drcs=bool(args.disable_drcs),
+        disable_backgrounds=bool(args.disable_backgrounds),
+        show_debug_grid=bool(args.show_debug_grid),
+    )
+
-    sys.exit(0)
+def main(argv=None):
+    cfg = parse_args(argv)
+    app = TS2ASS(cfg)
+    rc = app.run()
+    sys.exit(rc)
 
 
 if __name__ == "__main__":
diff --git a/arib/ts2srt.py b/arib/ts2srt.py
new file mode 100755
index 0000000..858085c
--- /dev/null
+++ b/arib/ts2srt.py
@@ -0,0 +1,234 @@
+#!/usr/bin/env python3
+"""
+Module: ts2srt
+Desc: Extract ARIB CCs from an MPEG transport stream and produce an .srt subtitle file from them.
+Author: John O'Neil
+Email: oneil.john@gmail.com
+DATE: Saturday, Oct 4th 2025
+"""
+
+from __future__ import annotations
+
+import sys
+import argparse
+import traceback
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Optional
+
+from arib.read import EOFError
+
+from arib.closed_caption import next_data_unit
+from arib.closed_caption import StatementBody
+from arib.data_group import DataGroup
+from arib.arib_exceptions import FileOpenError
+
+from arib.mpeg.ts import TS
+from arib.mpeg.ts import ES
+
+from arib.srt import SRTFormatter
+
+
+@dataclass(frozen=True)
+class Config:
+    infile: Path
+    outfile: Path
+    pid: int
+    verbose: bool
+    quiet: bool
+    tmax: int
+    time_offset: float
+    enable_small_text: bool
+    output_to_stdout: bool
+
+
+class TS2srt:
+    def __init__(self, cfg: Config):
+        self.cfg = cfg
+
+        # state formerly in globals
+        self.initial_timestamp: Optional[int] = None
+        self.elapsed_time_s: float = 0.0
+        self.pid: int = cfg.pid  # may be discovered later from mgmt data if -1
+        self.srt: Optional[SRTFormatter] = None
+
+    # ---- callbacks (former On* functions) ----
+
+    def on_progress(self, bytes_read, total_bytes, percent):
+        # preserve original behavior: show progress only when not verbose and not quiet
+        if not self.cfg.verbose and not self.cfg.quiet and not self.cfg.output_to_stdout:
+            sys.stdout.write(f"progress: {percent:.2f}% \r")
+            sys.stdout.flush()
+
+    def on_ts_packet(self, packet):
+        # pcr can be used to calculate elapsed time in seconds through the .ts file
+        pcr = TS.get_pcr(packet)
+        if pcr > 0:
+            current_timestamp = pcr
+            self.initial_timestamp = self.initial_timestamp or current_timestamp
+            delta = current_timestamp - self.initial_timestamp
+            self.elapsed_time_s = float(delta) / 90000.0 + self.cfg.time_offset
+
+    def on_es_packet(self, current_pid, packet, header_size):
+        # honor fixed PID if provided
+        if self.pid >= 0 and current_pid != self.pid:
+            return
+
+        try:
+            payload = ES.get_pes_payload(packet)
+            f = list(payload)
+            data_group = DataGroup(f)
+            if not data_group.is_management_data():
+                # Data group contains caption data -> iterate data units
+                caption = data_group.payload()
+                for data_unit in next_data_unit(caption):
+                    # Only interested in "statement body" units
+                    if not isinstance(data_unit.payload(), StatementBody):
+                        continue
+
+                    if not self.srt:
+                        v = not self.cfg.quiet
+                        self.srt = SRTFormatter(
+                            tmax=self.cfg.tmax,
+                            video_filename=str(self.cfg.outfile),
+                            verbose=v,
+                            enable_small_text=self.cfg.enable_small_text,
+                            output_to_stdout=self.cfg.output_to_stdout,
+                        )
+
+                    self.srt.format(data_unit.payload().payload(), self.elapsed_time_s)
+
+                    # (Old commented PID detection code retained in spirit by mgmt data branch below)
+
+            else:
+                # management data
+                management_data = data_group.payload()
+                numlang = management_data.num_languages()
+                if self.pid < 0 and numlang > 0:
+                    for language in range(numlang):
+                        if not self.cfg.quiet and not self.cfg.output_to_stdout:
+                            print(
+                                "Closed caption management data for language: "
+                                + management_data.language_code(language)
+                                + " available in PID: "
+                                + str(current_pid)
+                            )
+                            print("Will now only process this PID to improve performance.")
+                    self.pid = current_pid
+
+        except EOFError:
+            pass
+        except FileOpenError as ex:
+            # allow IOErrors to kill application
+            raise ex
+        except Exception:
+            # Preserve original behavior: print when we have (or found) a PID
+            if not self.cfg.quiet and self.pid >= 0:
+                print(
+                    "Exception thrown while handling DataGroup in ES. "
+                    "This may be due to many factors "
+                    "such as file corruption or the .ts file using "
+                    "as yet unsupported features."
+                )
+                traceback.print_exc(file=sys.stdout)
+
+    # ---- driver ----
+
+    def run(self) -> int:
+        if not self.cfg.infile.exists() and not self.cfg.quiet:
+            print(f"Input filename: {self.cfg.infile} does not exist.")
+            return -1
+
+        ts = TS(str(self.cfg.infile))
+        ts.Progress = self.on_progress
+        ts.OnTSPacket = self.on_ts_packet
+        ts.OnESPacket = self.on_es_packet
+
+        ts.Parse()
+
+        if self.pid < 0 and not self.cfg.quiet:
+            print(f"*** Sorry. No ARIB subtitle content was found in file: {self.cfg.infile} ***")
+            return -1
+
+        if self.srt and not self.srt.file_written() and not self.cfg.quiet:
+            print(
+                "*** Sorry. No nonempty ARIB closed caption content found in file "
+                + str(self.cfg.infile)
+                + " ***"
+            )
+            return -1
+
+        return 0
+
+
+def parse_args(argv=None) -> Config:
+    parser = argparse.ArgumentParser(
+        description=(
+            "Extract ARIB formatted Closed Caption information from an MPEG TS file "
+            "and format the results as a standard .srt subtitle file."
+        )
+    )
+    parser.add_argument("infile", help="Input filename (MPEG2 Transport Stream File)", type=str)
+    parser.add_argument(
+        "-o", "--outfile", help="Output filename (.srt subtitle file)", type=str, default=None
+    )
+    parser.add_argument("--stdout", help="Output .srt content to stdout.", action="store_true")
+    parser.add_argument(
+        "-p",
+        "--pid",
+        help=(
+            "Specify a PID of a PES known to contain closed caption info "
+            "(tool will attempt to find the proper PID if not specified)."
+        ),
+        type=int,
+        default=-1,
+    )
+    parser.add_argument("-v", "--verbose", help="Verbose output.", action="store_true")
+    parser.add_argument("-q", "--quiet", help="Does not write to stdout.", action="store_true")
+    parser.add_argument(
+        "-t", "--tmax", help="Subtitle display time limit (seconds).", type=int, default=4
+    )
+    parser.add_argument(
+        "-m",
+        "--timeoffset",
+        help=(
+            "Shift all time values in generated .srt file "
+            "by indicated floating point offset in seconds."
+        ),
+        type=float,
+        default=0.0,
+    )
+    parser.add_argument(
+        "--enable-small-text",
+        help=(
+            "Enable the extraction of small (furigana or ruby) text "
+            "and emit it to the .srt file."
+        ),
+        action="store_true",
+    )
+    args = parser.parse_args(argv)
+
+    infile = Path(args.infile)
+    outfile = Path(args.outfile) if args.outfile is not None else infile.with_suffix(".srt")
+
+    return Config(
+        infile=infile,
+        outfile=outfile,
+        pid=args.pid,
+        verbose=bool(args.verbose),
+        quiet=bool(args.quiet),
+        tmax=int(args.tmax),
+        time_offset=float(args.timeoffset),
+        enable_small_text=bool(args.enable_small_text),
+        output_to_stdout=bool(args.stdout),
+    )
+
+
+def main(argv=None):
+    cfg = parse_args(argv)
+    app = TS2srt(cfg)
+    rc = app.run()
+    sys.exit(rc)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/arib/ts_extract.py b/arib/ts_extract.py
index c808f0a..84b42b1 100755
--- a/arib/ts_extract.py
+++ b/arib/ts_extract.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 """
 Module: test
 Desc: Test to see how quickly I can parse TS es packets
diff --git a/experiments.md b/experiments.md
new file mode 100644
index 0000000..85a45dc
--- /dev/null
+++ b/experiments.md
@@ -0,0 +1,129 @@
+# Experiments and Other Info
+
+## `arib-ts-extract` and `arib-es-extract`
+
+This package also installs two additional tools which can be used to draw basic CC information from MPEG ts and es files. These are ```arib-ts-extract``` and ```arib-es-extract```. They skip the usual .ass formatting and show a text representation of the basic ARIB codes present in the .ts or .es file. See the example below:
+```
+joneil@joneilDesktop ~/code/arib $ arib-es-extract tests/toriko_subs.es
+
+
+
+
+<世はグルメ時代>
+
+<食の探求者 美食屋たちは訢
+
+あまたの食材を追い求める>
+
+<そして この世の食材の頂点
+ゴッド
+ほかく
+GODの捕獲を目指す訢
+
+一人の美食屋がいた!>
+
+
+頰〜
+```
+
+In the above output, each line is not timestamped, but you can see the cursor movement info (screen positions in character row/col), text size info, and the on-screen CC text data.
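Although the extracted lines above carry no timestamps, the ts2ass/ts2srt tools in this change derive elapsed seconds from the TS PCR (Program Clock Reference), which ticks at 90 kHz. A minimal sketch of that conversion (the function name is illustrative, not part of this package):

```python
def pcr_to_elapsed_seconds(pcr: int, initial_pcr: int, time_offset: float = 0.0) -> float:
    """Convert a PCR delta (90 kHz clock ticks) into elapsed seconds,
    optionally shifted by a user-supplied offset (cf. the -m flag)."""
    return float(pcr - initial_pcr) / 90000.0 + time_offset
```

For example, a PCR 180000 ticks past the first observed PCR corresponds to 2.0 seconds of stream time.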
+
+Interestingly, you can see how the furigana (a small-text pronunciation guide) for certain words is present for many romaji (Latin alphabet) and kanji characters. For example, the furigana "ゴッド" is positioned as small text above the normal-sized text word "GOD".
+
+Timestamp info for the various text and clear-screen commands would have to be drawn out of the .ts packet info. This functionality is not present in this package.
+
+Also note that in the example above, screen positions and other textual information were described using the ARIB control character set.
+There is another way in which such info is carried around: via the ARIB control *sequence* character set. Please refer to the ARIB.control_characters.CS class for more info.
+
+An example of inline control sequences carrying text position and other info follows:
+```
+えいえゅゃ栄純がきのぃとはゃなに言っくら訢
+```
+Refer to the ARIB documentation for descriptions of what these control sequences mean, but some can be summarized here:
+* 'S' indicates the text layout style according to the ARIB standard (here 7 indicates horizontal text with geometry based on a screen of 960x540).
+* '_' (underscore) indicates the UL corner in pixels of the CC area (here at x=170, y=30).
+* 'V' indicates the width and height in pixels of the CC area (here 620x480). Note that this is inset inside a standard screen dimension of 960x540.
+* 'W' indicates the height and width of a normal-sized character in pixels. Japanese characters tend to be square.
+* 'X' is the pixel spacing between characters in CCs.
+* 'Y' is the pixel spacing between lines in CCs.
+* 'a' positions the cursor to a screen position in pixels. This is in contrast to the dedicated control character APS (Active Position Set) above, which positions the cursor to a particular character *line* and *column*. APS-style line and column positions can be translated to pixel positions using the character width and height, the spacing between characters and lines, and the UL position of the CC area (see above).
+
+## Manually drawing a PID and/or PES from a TS file
+I've updated the arib-ts2ass tool above to automatically find the ID (PID) of the elementary stream carrying closed captions (if there is one) in any MPEG TS file. However, if you'd like to find these PID values for yourself, I recommend using the ```tsinfo``` tool as below:
+```
+joneil@joneilDesktop ~/code/arib/analysis $ tsinfo .ts
+Reading from .ts
+Scanning 1000 TS packets
+
+Packet 452 is PAT
+Program list:
+    Program 2064 -> PID 0101 (257)
+
+Packet 796 is PMT with PID 0101 (257)
+  Program 2064, version 15, PCR PID 0100 (256)
+    Program info (15 bytes): 09 04 00 05 e0 31 f6 04 00 0e e0 32 c1 01 84
+      Conditional access: id 0005 (5) PID 0031 (49) data (9 bytes): f6 04 00 0e e0 32 c1 01 84
+      Descriptor tag f6 (246) (4 bytes): 00 0e e0 32
+      Descriptor tag c1 (193) (1 byte): 84
+  Program streams:
+    PID 0111 ( 273) -> Stream type 02 (  2) H.262/13818-2 video (MPEG-2) or 11172-2 constrained video
+        ES info (6 bytes): 52 01 00 c8 01 47
+          Descriptor tag 52 ( 82) (1 byte): 00
+          Descriptor tag c8 (200) (1 byte): 47
+    PID 0112 ( 274) -> Stream type 0f ( 15) 13818-7 Audio with ADTS transport syntax
+        ES info (3 bytes): 52 01 10
+          Descriptor tag 52 ( 82) (1 byte): 10
+    PID 0114 ( 276) -> Stream type 06 (  6) H.222.0/13818-1 PES private data (maybe Dolby/AC-3 in DVB)
+        ES info (8 bytes): 52 01 30 fd 03 00 08 3d
+          Descriptor tag 52 ( 82) (1 byte): 30
+          Descriptor tag fd (253) (3 bytes): 00 08 3d
+    PID 0115 ( 277) -> Stream type 06 (  6) H.222.0/13818-1 PES private data (maybe Dolby/AC-3 in DVB)
+        ES info (20 bytes): 52 01 38 09 04 00 05 ff ff f6 04 00 0e ff ff fd 03 00 08 3c
+...
+```
+I recognize the PID 276 (stream type 6) as the PES private CC data from experience. Typically, tsinfo identifies Closed Caption elementary streams as `PES private data (maybe Dolby/AC-3 in DVB)`. The relevant CCs are usually the *first* such elementary stream reported as well.
+
+Note that sometimes an adequate PAT (Program Association Table) may not be within the first 1000 packets of the .ts file, so you might have to run tsinfo with an additional argument (look through more packets for a PAT):
+```
+tsinfo -max 20000 .ts
+```
+
+Then, if you wish, you can use ts2es to draw out the ES:
+```
+ts2es -pid 276 .ts .es
+```
+
+## arib-autosub
+This repo also contains some code for an experimental application, "arib-autosub", which draws Closed Caption information out of an MPEG TS file and then translates it via Bing Translate.
+
+Since this tool is no longer installed with the package, the description below is only for reference:
+
+Command line help is available as below:
+```
+(arib)joneil@joneilDesktop ~/code/arib $ arib-autosub -h
+usage: arib-autosub [-h] infile pid
+
+Auto translate jp CCs in MPEG TS file.
+
+positional arguments:
+  infile      Input filename (MPEG2 Transport Stream File)
+  pid         Pid of closed caption ES to extract from stream.
+
+optional arguments:
+  -h, --help  show this help message and exit
+
+```
+
+The application requires two command-line arguments: the name of the input .ts file and the PID of the CC elementary stream. Please see the section above regarding how to identify a Closed Caption PID in a .ts file using the tsinfo tool.
+
+An example screenshot of a resultant subtitle follows (from a news broadcast):
+![example of translated ccs](img/news.png "Example of auto translated Closed Captions.")
+
+Currently, text position and color are not carried through the translation process.
+
+Because this tool uses the Bing Translate API, the user must get their own "Client ID" and "Client secret" credentials from the Windows Azure Marketplace. These need to be defined in the arib.secret_key module.
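As an aside to the PID discussion in the previous section: if you want to pull packets for a known PID out of a capture yourself rather than using ts2es, recall that TS packets are 188 bytes long, begin with the sync byte 0x47, and carry a 13-bit PID in header bytes 1–2. A minimal sketch (not part of this package; names are illustrative):

```python
TS_SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188

def packets_with_pid(ts_bytes: bytes, pid: int):
    """Yield each 188-byte TS packet whose 13-bit PID matches `pid`."""
    for i in range(0, len(ts_bytes) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts_bytes[i:i + TS_PACKET_SIZE]
        if pkt[0] != TS_SYNC_BYTE:
            continue  # lost sync; a real tool would hunt for the next 0x47
        # PID: low 5 bits of byte 1, all 8 bits of byte 2
        if ((pkt[1] & 0x1F) << 8) | pkt[2] == pid:
            yield pkt
```

A real extractor would also reassemble PES payloads across packets, which is what this package's TS/ES classes handle.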
+
+To find the PES ID of the closed captions stream within any TS (if it exists!), see the section above.
+
+The translation results are not good. In fact, they are often lewd and comical. Still, this is an interesting experiment. To illustrate the deficiencies of the approach, I present the following screenshot, translating the shot from the previous section. You'll notice that despite the simplicity of the original source, the translation is off. It does give a "general sense" of the meaning, however.
+![example of auto translation](img/haikyu_eng.png "Example poor auto translation.")
diff --git a/img/ace-of-diamond.jpg b/img/ace-of-diamond.jpg
new file mode 100644
index 0000000..ee332ae
Binary files /dev/null and b/img/ace-of-diamond.jpg differ
diff --git a/img/gaki-2.jpg b/img/gaki-2.jpg
new file mode 100644
index 0000000..f4498d7
Binary files /dev/null and b/img/gaki-2.jpg differ
diff --git a/img/gaki.png b/img/gaki.png
deleted file mode 100644
index cd7a961..0000000
Binary files a/img/gaki.png and /dev/null differ
diff --git a/img/gaki2.png b/img/gaki2.png
deleted file mode 100644
index 84dc4cc..0000000
Binary files a/img/gaki2.png and /dev/null differ
diff --git a/img/knights-of-sidonia-2.jpg b/img/knights-of-sidonia-2.jpg
new file mode 100644
index 0000000..9506334
Binary files /dev/null and b/img/knights-of-sidonia-2.jpg differ
diff --git a/img/knights-of-sidonia.jpg b/img/knights-of-sidonia.jpg
new file mode 100644
index 0000000..cec070a
Binary files /dev/null and b/img/knights-of-sidonia.jpg differ
diff --git a/pyproject.toml b/pyproject.toml
index b8354b8..9bc6fa7 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -5,7 +5,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "arib"
-version = "0.7.3"
+version = "0.7.4"
 description = "ARIB MPEG-2 TS Closed Caption Decoding Tools"
 readme = { file = "README.md", content-type = "text/markdown" }
 requires-python = ">=3.10"
@@ -34,6 +34,7 @@ Issues = "https://github.com/johnoneil/arib/issues"
 [project.entry-points."console_scripts"]
 arib-ts2ass = "arib.ts2ass:main"
+arib-ts2srt = "arib.ts2srt:main"
 arib-ts-extract = "arib.ts_extract:main"
 arib-es-extract = "arib.es_extract:main"