efstone/aws_transcribe
aws_transcribe

Simple tools for dealing with transcripts generated by AWS Transcribe

AwsTranscribe class

This class is specifically for handling multi-speaker JSON files generated by the Amazon Web Services (AWS) "Transcribe" service. It will not work with single-speaker files.

Instantiate the class with a JSON file object and the names of the speakers, in order. (AWS labels them 'spk_0', 'spk_1', etc.) AWS breaks the transcript into segments, each of which roughly corresponds to a short burst of spoken words; after a pause, a new segment starts. With the resulting instance, use the print_segment method to retrieve all the words from a specific segment along with the speaker who spoke them. To print a whole transcript, get the total number of segments with the count_segment method and call print_segment once for each segment.
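To make the idea of segments more concrete, here is a minimal sketch of how the words in one segment could be matched to a named speaker. It is not the code in this repository. It assumes the usual layout of a multi-speaker AWS Transcribe file (a `results.speaker_labels.segments` list plus a flat `results.items` list of words and punctuation). The `segment_text` function and the `SAMPLE` data are hypothetical and exist only for illustration.

```python
# Tiny hand-written sample in the assumed AWS Transcribe layout (not real output).
SAMPLE = {
    "results": {
        "speaker_labels": {
            "segments": [
                {"speaker_label": "spk_1", "start_time": "0.0", "end_time": "0.9"},
                {"speaker_label": "spk_0", "start_time": "1.5", "end_time": "3.0"},
            ]
        },
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
             "alternatives": [{"content": "Okay"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
            {"type": "pronunciation", "start_time": "1.5", "end_time": "1.8",
             "alternatives": [{"content": "No"}]},
            {"type": "punctuation", "alternatives": [{"content": ","}]},
            {"type": "pronunciation", "start_time": "1.9", "end_time": "2.0",
             "alternatives": [{"content": "the"}]},
            {"type": "pronunciation", "start_time": "2.1", "end_time": "2.5",
             "alternatives": [{"content": "city"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
        ],
    }
}


def segment_text(data, seg_index, speaker_names):
    """Return 'Speaker: words...' for one segment (hypothetical helper)."""
    results = data["results"]
    segment = results["speaker_labels"]["segments"][seg_index]
    start = float(segment["start_time"])
    end = float(segment["end_time"])
    # 'spk_0' maps to the first name given, 'spk_1' to the second, and so on.
    speaker = speaker_names[int(segment["speaker_label"].split("_")[1])]

    words = []
    in_segment = False
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "pronunciation":
            # A word belongs to the segment if it starts inside the time range.
            in_segment = start <= float(item["start_time"]) < end
            if in_segment:
                words.append(content)
        elif in_segment and words:
            # Punctuation has no timestamp, so attach it to the previous word.
            words[-1] += content
    return f"{speaker}: {' '.join(words)}"


names = ["Deponent", "Lawyer"]
total = len(SAMPLE["results"]["speaker_labels"]["segments"])
for i in range(total):
    print(segment_text(SAMPLE, i, names))
```

Looping over the segment count, as in the last three lines, is the same pattern the class supports through count_segment and print_segment.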

"Line" numbers can be included by setting "show_seg_num" to True before calling the print_segment method.

There's also a method, get_segment_start, that retrieves the start time of a segment (in seconds).
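Because the start time is given in seconds, it can be handy to convert it to a clock-style timestamp when looking for a passage in the audio. The helper below is a small sketch of that conversion. `format_timestamp` is a hypothetical name and is not part of this class. It accepts either a number or a numeric string, on the assumption that AWS stores times as strings such as "83.42".

```python
def format_timestamp(seconds):
    """Convert a start time in seconds to HH:MM:SS (hypothetical helper)."""
    total = int(float(seconds))  # fractional seconds are dropped
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


print(format_timestamp("3725.4"))  # 01:02:05
```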

Here's an example:

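with open('/Users/efstone/Downloads/depo_audio_6052.json', 'r') as f:
    # Speaker names are given in order: spk_0, spk_1, spk_2, spk_3
    transcript = AwsTranscript(f, 'Deponent', 'Lawyer', 'unknown1', 'unknown2')
    # Print the first eight segments
    for i in range(8):
        print(transcript.print_segment(i))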

This prints the following:

Lawyer: And you also understand that you cannot consult with your attorneys before answering the question unless it regards a matter of privilege.
Lawyer: Just All
Lawyer: All right.
Lawyer: You talk a little bit about your personal history when
Deponent: when you say personal. Start went from bird dust on El Richard. Grow up.
Deponent: No, the city of New Orleans.
Lawyer: Okay.
Lawyer: And where'd you go to school

Amazon Transcribe isn't perfect, but it's SUPER cheap and this is DAMN handy when you've got to wait a few weeks for, say, a deposition transcript to be prepared, but need to start picking through the specifics of the deposition ASAP.

I hope someone finds this as useful as I did!
