py-caption
==========

- `py-caption` is a caption reading/writing module. Use one of the given Readers to read content into an intermediary format known as PCC (PBS Common Captions), and then use one of the Writers to output the PCC into captions of your desired format.
+ |Build Status|
+
+ ``pycaption`` is a caption reading/writing module. Use one of the given
+ Readers to read content into a CaptionSet object,
+ and then use one of the Writers to output the CaptionSet into
+ captions of your desired format.

Turn a caption into multiple caption outputs:

+ ::
+
    srt_caps = '''1
    00:00:09,209 --> 00:00:12,312
    This is an example SRT file,
    which, while extremely short,
    is still a valid SRT file.
    '''
-
+
    converter = CaptionConverter()
    converter.read(srt_caps, SRTReader())
    print converter.write(SAMIWriter())
    print converter.write(DFXPWriter())
    print converter.write(TranscriptWriter())
-
+
Not sure what format the caption is in? Detect it:

+ ::
+
    caps = '''1
    00:00:01,500 --> 00:00:12,345
    Small caption'''
@@ -34,30 +43,26 @@ Not sure what format the caption is in? Detect it:
Supported Formats
-----------------

- Read:
- - SCC
- - SAMI
- - SRT
- - DFXP
+ Read: - DFXP/TTML - SAMI - SCC - SRT - WebVTT

- Write:
- - DFXP
- - SAMI
- - SRT
- - Transcript
+ Write: - DFXP/TTML - SAMI - SRT - Transcript - WebVTT

- See the [examples folder][1] for example captions that currently can be read correctly.
+ See the `examples
+ folder <https://github.com/pbs/pycaption/tree/master/examples/>`__ for
+ example captions that currently can be read correctly.

Python Usage
- ------------
+ ------------

Example: Convert from SAMI to DFXP

+ ::
+
    from pycaption import SAMIReader, DFXPWriter

    sami = '''<SAMI><HEAD><TITLE>NOVA3213</TITLE><STYLE TYPE="text/css">
    <!--
-     P { margin-left: 1pt;
+     P { margin-left: 1pt;
    margin-right: 1pt;
    margin-bottom: 2pt;
    margin-top: 2pt;
@@ -67,10 +72,10 @@ Example: Convert from SAMI to DFXP
    font-weight: normal;
    font-style: normal;
    color: #ffffff; }
-
+
    .ENCC {Name: English; lang: en-US; SAMI_Type: CC;}
    .FRCC {Name: French; lang: fr-cc; SAMI_Type: CC;}
-
+
    --></STYLE></HEAD><BODY>
    <SYNC start="9209"><P class="ENCC">
    ( clock ticking )
@@ -85,12 +90,13 @@ Example: Convert from SAMI to DFXP
    </P><P class="FRCC">
    FRENCH LINE 2?
    </P></SYNC>'''
-
-     print DFXPWriter().write(SAMIReader().read(sami))

+     print DFXPWriter().write(SAMIReader().read(sami))

Which will output the following:

+ ::
+
    <?xml version="1.0" encoding="utf-8"?>
    <tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
    <head>
@@ -120,164 +126,104 @@ Which will output the following:
    </body>
    </tt>

+ Extensibility
+ -------------

- Scalability
- -----------
-
- Different readers and writers are easy to add if you would like to:
- - Read/Write a previously unsupported format
- - Read/Write a supported format in a different way (more styling?)
-
- Simply follow the format of a current Reader or Writer, and edit to your heart's desire.
-
-
- PyCaps Format:
- ------------------
-
- The different Readers will return the captions in PBS Common Captions (PCC) format.
- The Writers will be expecting captions in PCC format as well.
-
- PCC format:
-
-     {
-         "captions": {
-             lang: list of captions
-         }
-         "styles":{
-             style: styling
-         }
-     }
-
- Example PCC json:
-
-     {
-         "captions": {
-             "en": [
-                 [
-                     9209000,
-                     12312000,
-                     [
-                         {"type": "text", "content": "Line 1"},
-                         {"type": "break"},
-                         {"type": "style", "start": True, "content": {"italics": True}},
-                         {"type": "text", "content": "Line 2"},
-                         {"type": "style", "start": False, "content": {"italics": True}}
-                     ],
-                     {
-                         "class": "encc",
-                         "text-align": "right"
-                     }
-                 ],
-                 [
-                     14556000,
-                     18993000,
-                     [
-                         {"type": "text", "content": "Line 3, all by itself"}
-                     ],
-                     {
-                         "class": "encc",
-                         "italics": True
-                     }
-                 ]
-             ]
-         },
-         "styles": {
-             "encc": {
-                 "lang": "en-US"
-             },
-             "p": {
-                 "color": "#fff",
-                 "font-size": "10pt",
-                 "font-family": "Arial",
-                 "text-align": "center"
-             }
-         }
-     }
-
-
- SAMI Reader / Writer :: [spec][2]
- --------------------
-
- Microsoft Synchronized Accessible Media Interchange. Supports multiple languages.
-
- Supported Styling:
- - text-align
- - italics
- - font-size
- - font-family
- - color
-
- If the SAMI file is not valid XML (e.g. unclosed tags), will still attempt to read it.
-
-
- DFXP Reader / Writer :: [spec][3]
- --------------------
+ Different readers and writers are easy to add if you would like to: -
+ Read/Write a previously unsupported format - Read/Write a supported
+ format in a different way (more styling?)

- The W3 standard. Supports multiple languages.
+ Simply follow the format of a current Reader or Writer, and edit to your
+ heart's desire.
+
+ SAMI Reader / Writer :: `spec <http://msdn.microsoft.com/en-us/library/ms971327.aspx>`__
+ ----------------------------------------------------------------------------------------
+
+ Microsoft Synchronized Accessible Media Interchange. Supports multiple
+ languages.

- Supported Styling:
- - text-align
- - italics
- - font-size
- - font-family
- - color
+ Supported Styling: - text-align - italics - font-size - font-family -
+ color

+ If the SAMI file is not valid XML (e.g. unclosed tags), will still
+ attempt to read it.

- SRT Reader / Writer :: [spec][4]
- -------------------
+ DFXP/TTML Reader / Writer :: `spec <http://www.w3.org/TR/ttaf1-dfxp/>`__
+ -------------------------------------------------------------------

- SubRip captions. If given multiple languages to write, will output all joined together by a 'MULTI-LANGUAGE SRT' line.
+ The W3 standard. Supports multiple languages.
+
+ Supported Styling: - text-align - italics - font-size - font-family -
+ color
+
+ SRT Reader / Writer :: `spec <http://matroska.org/technical/specs/subtitles/srt.html>`__
+ ----------------------------------------------------------------------------------------
+
+ SubRip captions. If given multiple languages to write, will output all
+ joined together by a 'MULTI-LANGUAGE SRT' line.

- Supported Styling:
- - None
+ Supported Styling: - None

Assumes input language is english. To change:

-     pycaps = SRTReader().read(srt_content, lang='fr')
+ ::

+     pycaps = SRTReader().read(srt_content, lang='fr')

- SCC Reader :: [spec][5]
- ----------
+ SCC Reader :: `spec <http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML>`__
+ -----------------------------------------------------------------------------------------------

Scenarist Closed Caption format. Assumes Channel 1 input.

- Supported Styling:
- - italics
+ Supported Styling: - italics
+
+ By default, the SCC Reader does not simulate roll-up captions. To enable
+ roll-ups:

- By default, the SCC Reader does not simulate roll-up captions. To enable roll-ups:
+ ::

    pycaps = SCCReader().read(scc_content, simulate_roll_up=True)

Also, assumes input language is english. To change:

+ ::
+
    pycaps = SCCReader().read(scc_content, lang='fr')

- Now has the option of specifying an offset (measured in seconds) for the timestamp. For example, if the SCC file is 45 seconds ahead of the video:
+ Now has the option of specifying an offset (measured in seconds) for the
+ timestamp. For example, if the SCC file is 45 seconds ahead of the
+ video:

-     pycaps = SCCReader().read(scc_content, offset=45)
+ ::

- The SCC Reader handles both dropframe and non-dropframe captions, and will auto-detect which format the captions are in.
+     pycaps = SCCReader().read(scc_content, offset=45)

+ The SCC Reader handles both dropframe and non-dropframe captions, and
+ will auto-detect which format the captions are in.

Transcript Writer
-----------------

Text stripped of styling, arranged in sentences.

- Supported Styling:
- - None
+ Supported Styling: - None
+
+ The transcript writer uses natural sentence boundary detection
+ algorithms to create the transcript.
+
+ WebVTT Reader / Writer `spec <http://dev.w3.org/html5/webvtt/>`__
+ -----------------------------------------------------------------
+
+ Web Video Text Tracks format.
+
+ Supported Styling - None (yet)

- The transcript writer uses natural sentence boundary detection algorithms to create the transcript.
-

License
-------

- This module is Copyright 2012 PBS.org and is available under the [Apache License, Version 2.0][6].
+ This module is Copyright 2012 PBS.org and is available under the `Apache
+ License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__.

- [1]: https://github.com/pbs/pycaption/tree/master/examples/
- [2]: http://msdn.microsoft.com/en-us/library/ms971327.aspx
- [3]: http://www.w3.org/TR/ttaf1-dfxp/
- [4]: http://matroska.org/technical/specs/subtitles/srt.html
- [5]: http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML
- [6]: http://www.apache.org/licenses/LICENSE-2.0
+ .. |Build Status| image:: https://travis-ci.org/pbs/pycaption.png?branch=master
+    :target: https://travis-ci.org/pbs/pycaption
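
A note on the timing values in the removed PCC example: the pair ``9209000, 12312000`` lines up with the SRT sample's ``00:00:09,209 --> 00:00:12,312`` if the integers are read as microseconds. The sketch below only illustrates that arithmetic. It uses the standard library alone, and the helper names ``srt_timestamp_to_microseconds`` and ``parse_srt_timing_line`` are hypothetical; they are not part of pycaption, and the microsecond reading is an inference from the two examples rather than something the README states.

```python
import re

# Hypothetical helpers for illustration only; not part of pycaption's API.
# Assumption: timing integers in the old PCC example are microseconds.
_SRT_TIME = re.compile(r"^(\d{2,}):(\d{2}):(\d{2}),(\d{3})$")


def srt_timestamp_to_microseconds(stamp):
    """Convert an SRT timestamp such as '00:00:09,209' to microseconds."""
    match = _SRT_TIME.match(stamp.strip())
    if match is None:
        raise ValueError("not an SRT timestamp: %r" % stamp)
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    return total_ms * 1000


def parse_srt_timing_line(line):
    """Split 'start --> end' into a (start, end) pair of microsecond values."""
    start, end = line.split("-->")
    return (srt_timestamp_to_microseconds(start),
            srt_timestamp_to_microseconds(end))


# Timing line taken from the SRT sample in the README.
print(parse_srt_timing_line("00:00:09,209 --> 00:00:12,312"))
```

Run against the README's sample timing line, this yields the same two integers that open the first caption in the removed PCC JSON, which is consistent with the microsecond assumption.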