Commit 2f8bb0b

Merged in changes from PBS master, fixed tests, added WebVTTReader and .vtt example file, updated readme, squashed bugs
1 parent e7f3920 commit 2f8bb0b

23 files changed: +8401 -1092 lines changed

.travis.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+language: python
+
+python:
+  - 2.6
+  - 2.7
+
+script: python setup.py test
+
+notifications:
+  email: false

MANIFEST.in

Lines changed: 3 additions & 1 deletion
@@ -1 +1,3 @@
-include pycaption/english.pickle
+include pycaption/english.pickle
+recursive-exclude tests *
+include README.rst

README.md renamed to README.rst

Lines changed: 89 additions & 143 deletions
@@ -1,25 +1,34 @@
 py-caption
 ==========

-`py-caption` is a caption reading/writing module. Use one of the given Readers to read content into an intermediary format known as PCC (PBS Common Captions), and then use one of the Writers to output the PCC into captions of your desired format.
+|Build Status|
+
+``pycaption`` is a caption reading/writing module. Use one of the given
+Readers to read content into a CaptionSet object,
+and then use one of the Writers to output the CaptionSet into
+captions of your desired format.

 Turn a caption into multiple caption outputs:

+::
+
     srt_caps = '''1
     00:00:09,209 --> 00:00:12,312
     This is an example SRT file,
     which, while extremely short,
     is still a valid SRT file.
     '''
-
+
     converter = CaptionConverter()
     converter.read(srt_caps, SRTReader())
     print converter.write(SAMIWriter())
     print converter.write(DFXPWriter())
     print converter.write(TranscriptWriter())
-
+
 Not sure what format the caption is in? Detect it:

+::
+
     caps = '''1
     00:00:01,500 --> 00:00:12,345
     Small caption'''
@@ -34,30 +43,26 @@ Not sure what format the caption is in? Detect it:
 Supported Formats
 -----------------

-Read:
-- SCC
-- SAMI
-- SRT
-- DFXP
+Read: - DFXP/TTML - SAMI - SCC - SRT - WebVTT

-Write:
-- DFXP
-- SAMI
-- SRT
-- Transcript
+Write: - DFXP/TTML - SAMI - SRT - Transcript - WebVTT

-See the [examples folder][1] for example captions that currently can be read correctly.
+See the `examples
+folder <https://github.com/pbs/pycaption/tree/master/examples/>`__ for
+example captions that currently can be read correctly.

 Python Usage
-------------
+------------

 Example: Convert from SAMI to DFXP

+::
+
     from pycaption import SAMIReader, DFXPWriter

     sami = '''<SAMI><HEAD><TITLE>NOVA3213</TITLE><STYLE TYPE="text/css">
     <!--
-    P { margin-left: 1pt;
+    P { margin-left: 1pt;
     margin-right: 1pt;
     margin-bottom: 2pt;
     margin-top: 2pt;
@@ -67,10 +72,10 @@ Example: Convert from SAMI to DFXP
     font-weight: normal;
     font-style: normal;
     color: #ffffff; }
-
+
     .ENCC {Name: English; lang: en-US; SAMI_Type: CC;}
     .FRCC {Name: French; lang: fr-cc; SAMI_Type: CC;}
-
+
     --></STYLE></HEAD><BODY>
     <SYNC start="9209"><P class="ENCC">
     ( clock ticking )
@@ -85,12 +90,13 @@ Example: Convert from SAMI to DFXP
     </P><P class="FRCC">
     FRENCH LINE 2?
     </P></SYNC>'''
-
-    print DFXPWriter().write(SAMIReader().read(sami))

+    print DFXPWriter().write(SAMIReader().read(sami))

 Which will output the following:

+::
+
     <?xml version="1.0" encoding="utf-8"?>
     <tt xml:lang="en" xmlns="http://www.w3.org/ns/ttml" xmlns:tts="http://www.w3.org/ns/ttml#styling">
     <head>
@@ -120,164 +126,104 @@ Which will output the following:
     </body>
     </tt>

+Extensibility
+-------------

-Scalability
------------
-
-Different readers and writers are easy to add if you would like to:
-- Read/Write a previously unsupported format
-- Read/Write a supported format in a different way (more styling?)
-
-Simply follow the format of a current Reader or Writer, and edit to your heart's desire.
-
-
-PyCaps Format:
-------------------
-
-The different Readers will return the captions in PBS Common Captions (PCC) format.
-The Writers will be expecting captions in PCC format as well.
-
-PCC format:
-
-    {
-    "captions": {
-    lang: list of captions
-    }
-    "styles":{
-    style: styling
-    }
-    }
-
-Example PCC json:
-
-    {
-    "captions": {
-    "en": [
-    [
-    9209000,
-    12312000,
-    [
-    {"type": "text", "content": "Line 1"},
-    {"type": "break"},
-    {"type": "style", "start": True, "content": {"italics": True}},
-    {"type": "text", "content": "Line 2"},
-    {"type": "style", "start": False, "content": {"italics": True}}
-    ],
-    {
-    "class": "encc",
-    "text-align": "right"
-    }
-    ],
-    [
-    14556000,
-    18993000,
-    [
-    {"type": "text", "content": "Line 3, all by itself"}
-    ],
-    {
-    "class": "encc",
-    "italics": True
-    }
-    ]
-    ]
-    },
-    "styles": {
-    "encc": {
-    "lang": "en-US"
-    },
-    "p": {
-    "color": "#fff",
-    "font-size": "10pt",
-    "font-family": "Arial",
-    "text-align": "center"
-    }
-    }
-    }
-
-
-SAMI Reader / Writer :: [spec][2]
---------------------
-
-Microsoft Synchronized Accessible Media Interchange. Supports multiple languages.
-
-Supported Styling:
-- text-align
-- italics
-- font-size
-- font-family
-- color
-
-If the SAMI file is not valid XML (e.g. unclosed tags), will still attempt to read it.
-
-
-DFXP Reader / Writer :: [spec][3]
---------------------
+Different readers and writers are easy to add if you would like to: -
+Read/Write a previously unsupported format - Read/Write a supported
+format in a different way (more styling?)

-The W3 standard. Supports multiple languages.
+Simply follow the format of a current Reader or Writer, and edit to your
+heart's desire.
+
+SAMI Reader / Writer :: `spec <http://msdn.microsoft.com/en-us/library/ms971327.aspx>`__
+----------------------------------------------------------------------------------------
+
+Microsoft Synchronized Accessible Media Interchange. Supports multiple
+languages.

-Supported Styling:
-- text-align
-- italics
-- font-size
-- font-family
-- color
+Supported Styling: - text-align - italics - font-size - font-family -
+color

+If the SAMI file is not valid XML (e.g. unclosed tags), will still
+attempt to read it.

-SRT Reader / Writer :: [spec][4]
--------------------
+DFXP/TTML Reader / Writer :: `spec <http://www.w3.org/TR/ttaf1-dfxp/>`__
+-------------------------------------------------------------------

-SubRip captions. If given multiple languages to write, will output all joined together by a 'MULTI-LANGUAGE SRT' line.
+The W3 standard. Supports multiple languages.
+
+Supported Styling: - text-align - italics - font-size - font-family -
+color
+
+SRT Reader / Writer :: `spec <http://matroska.org/technical/specs/subtitles/srt.html>`__
+----------------------------------------------------------------------------------------
+
+SubRip captions. If given multiple languages to write, will output all
+joined together by a 'MULTI-LANGUAGE SRT' line.

-Supported Styling:
-- None
+Supported Styling: - None

 Assumes input language is english. To change:

-    pycaps = SRTReader().read(srt_content, lang='fr')
+::

+    pycaps = SRTReader().read(srt_content, lang='fr')

-SCC Reader :: [spec][5]
-----------
+SCC Reader :: `spec <http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML>`__
+-----------------------------------------------------------------------------------------------

 Scenarist Closed Caption format. Assumes Channel 1 input.

-Supported Styling:
-- italics
+Supported Styling: - italics
+
+By default, the SCC Reader does not simulate roll-up captions. To enable
+roll-ups:

-By default, the SCC Reader does not simulate roll-up captions. To enable roll-ups:
+::

     pycaps = SCCReader().read(scc_content, simulate_roll_up=True)

 Also, assumes input language is english. To change:

+::
+
     pycaps = SCCReader().read(scc_content, lang='fr')

-Now has the option of specifying an offset (measured in seconds) for the timestamp. For example, if the SCC file is 45 seconds ahead of the video:
+Now has the option of specifying an offset (measured in seconds) for the
+timestamp. For example, if the SCC file is 45 seconds ahead of the
+video:

-    pycaps = SCCReader().read(scc_content, offset=45)
+::

-The SCC Reader handles both dropframe and non-dropframe captions, and will auto-detect which format the captions are in.
+    pycaps = SCCReader().read(scc_content, offset=45)

+The SCC Reader handles both dropframe and non-dropframe captions, and
+will auto-detect which format the captions are in.

 Transcript Writer
 -----------------

 Text stripped of styling, arranged in sentences.

-Supported Styling:
-- None
+Supported Styling: - None
+
+The transcript writer uses natural sentence boundary detection
+algorithms to create the transcript.
+
+WebVTT Reader / Writer `spec <http://dev.w3.org/html5/webvtt/>`__
+-----------------------------------------------------------------
+
+Web Video Text Tracks format.
+
+Supported Styling - None (yet)

-The transcript writer uses natural sentence boundary detection algorithms to create the transcript.
-

 License
 -------

-This module is Copyright 2012 PBS.org and is available under the [Apache License, Version 2.0][6].
+This module is Copyright 2012 PBS.org and is available under the `Apache
+License, Version 2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__.

-[1]: https://github.com/pbs/pycaption/tree/master/examples/
-[2]: http://msdn.microsoft.com/en-us/library/ms971327.aspx
-[3]: http://www.w3.org/TR/ttaf1-dfxp/
-[4]: http://matroska.org/technical/specs/subtitles/srt.html
-[5]: http://www.theneitherworld.com/mcpoodle/SCC_TOOLS/DOCS/SCC_FORMAT.HTML
-[6]: http://www.apache.org/licenses/LICENSE-2.0
+.. |Build Status| image:: https://travis-ci.org/pbs/pycaption.png?branch=master
+   :target: https://travis-ci.org/pbs/pycaption
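
Note on the README samples: the SRT cue `00:00:09,209 --> 00:00:12,312` and the values `9209000` and `12312000` in the removed PCC example appear to be the same times, with the latter expressed in microseconds. The following is a minimal sketch of that conversion, assuming the microsecond reading is right. It is not pycaption's code, and the names `srt_time_to_microseconds` and `parse_srt_cue_times` are hypothetical, introduced only for illustration.

```python
import re

# HH:MM:SS,mmm as used on SRT timing lines.
_SRT_TIME = re.compile(r"^(\d+):(\d{2}):(\d{2}),(\d{3})$")


def srt_time_to_microseconds(stamp):
    """Convert one SRT timestamp such as '00:00:09,209' to microseconds."""
    match = _SRT_TIME.match(stamp.strip())
    if match is None:
        raise ValueError("not an SRT timestamp: %r" % (stamp,))
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total_seconds = (hours * 60 + minutes) * 60 + seconds
    return total_seconds * 1000000 + millis * 1000


def parse_srt_cue_times(line):
    """Split an SRT timing line on '-->' and return (start, end) in microseconds."""
    parts = line.split("-->")
    if len(parts) != 2:
        raise ValueError("not an SRT timing line: %r" % (line,))
    return (srt_time_to_microseconds(parts[0]),
            srt_time_to_microseconds(parts[1]))
```

Calling `parse_srt_cue_times("00:00:09,209 --> 00:00:12,312")` yields `(9209000, 12312000)`. The sketch runs on both Python 2.7 and Python 3.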

REQUIREMENTS

Lines changed: 0 additions & 5 deletions
This file was deleted.
