Some suggestions for improving performance #61

For various reasons, I needed to use the pyTranscriber project to generate subtitles for some of my videos. However, the experience was not great. The main issues are:
1. Excessive memory usage, often exceeding 2GB.
2. Instability and frequent crashes. This is the most frustrating part: after running for an hour, it crashes and all the conversion progress is lost.
3. Slowness, often taking 2-3 hours per video.

So I created [goTranscriber](https://github.com/MeteorsLiu/goTranscriber). It is not meant to replace pyTranscriber; I originally wrote it just for my own use. Having built it, I have a few suggestions regarding performance:

**For speed:**
1. I used Go because I like programming in Go. For pyTranscriber, replacing the multiprocessing module with asynchronous processing should significantly improve both stability and speed.
2. Avoid FLAC and use PCM S16LE at 16kHz instead. Google's Speech-to-Text API also supports PCM at 16kHz. Extracting PCM is much faster than encoding FLAC, and in my tests FLAC did not improve the results much anyway.
3. Avoid loading the entire audio file into memory at once. Instead, load lazily: only read the data when submitting it to the API or during recognition. This can significantly reduce memory usage.
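
As a rough illustration of point 1, here is a minimal asyncio sketch. It is not taken from pyTranscriber or goTranscriber. The `recognize` callable, `max_in_flight`, `retries` and `backoff` are hypothetical names; the real request to the speech API would go inside whatever coroutine is passed as `recognize`. The idea is to cap the number of requests in flight and to let one failed chunk return an empty string instead of taking the whole run down.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

# Hypothetical signature: takes (chunk index, raw PCM bytes), returns the text.
Recognizer = Callable[[int, bytes], Awaitable[str]]


async def transcribe_all(
    chunks: Sequence[bytes],
    recognize: Recognizer,
    max_in_flight: int = 4,
    retries: int = 2,
    backoff: float = 0.5,
) -> List[str]:
    """Send chunks concurrently; a failed chunk yields "" instead of aborting."""
    limit = asyncio.Semaphore(max_in_flight)  # cap simultaneous requests

    async def worker(index: int, pcm: bytes) -> str:
        async with limit:
            for attempt in range(retries + 1):
                try:
                    return await recognize(index, pcm)
                except Exception:
                    if attempt < retries:
                        # wait a little longer before each new attempt
                        await asyncio.sleep(backoff * (attempt + 1))
            return ""  # all attempts failed; keep the other chunks' results

    # gather() returns results in the same order as the input chunks
    return list(await asyncio.gather(*(worker(i, c) for i, c in enumerate(chunks))))
```

It would be driven with something like `asyncio.run(transcribe_all(chunks, my_recognizer))`, where `my_recognizer` is an assumed coroutine wrapping the actual API call.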

After replacing FLAC with PCM S16LE, processing a 2-hour video typically takes only 15-30 minutes.
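
The PCM and lazy-loading suggestions could look roughly like the sketch below. It assumes the audio is extracted once to a 16kHz mono WAV file (PCM S16LE) with ffmpeg, and that each speech region is then read from disk only when it is needed. The function names and file paths are hypothetical; the ffmpeg options (`-vn`, `-ac`, `-ar`, `-acodec pcm_s16le`) are standard ones, and the reading uses Python's built-in `wave` module.

```python
import wave
from typing import List


def ffmpeg_pcm_command(video_path: str, wav_path: str) -> List[str]:
    """Build an ffmpeg command that extracts 16 kHz mono PCM S16LE audio."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # uncompressed signed 16-bit little-endian
        wav_path,
    ]


def read_region(wav_path: str, start_s: float, end_s: float) -> bytes:
    """Read only the frames between start_s and end_s from the WAV file."""
    with wave.open(wav_path, "rb") as wf:
        rate = wf.getframerate()
        total = wf.getnframes()
        # clamp to the file length so setpos() never goes out of range
        first = min(max(int(start_s * rate), 0), total)
        last = min(max(int(end_s * rate), first), total)
        wf.setpos(first)
        return wf.readframes(last - first)
```

The command list can be passed to `subprocess.run(..., check=True)`. Only the region currently being submitted is held in memory, rather than the whole two-hour track.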

**For the accuracy of speech region detection:**
Currently, pyTranscriber uses a simple RMS calculation to measure sound intensity and identify speech regions.
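
For reference, an RMS-based intensity measure over 16-bit PCM can be sketched as follows. This illustrates the general idea only and is not pyTranscriber's actual code; the function names and the 10 ms default window are assumptions.

```python
import math
import struct
from typing import List


def rms_int16(pcm: bytes) -> float:
    """Root-mean-square amplitude of little-endian signed 16-bit samples."""
    usable = len(pcm) - len(pcm) % 2  # ignore a trailing odd byte
    total = 0
    count = 0
    for (sample,) in struct.iter_unpack("<h", pcm[:usable]):
        total += sample * sample
        count += 1
    return math.sqrt(total / count) if count else 0.0


def rms_per_window(pcm: bytes, sample_rate: int, window_ms: int = 10) -> List[float]:
    """RMS of each consecutive window; a region counts as 'loud' above some threshold."""
    size = sample_rate * window_ms // 1000 * 2  # bytes per window, mono 16-bit
    return [rms_int16(pcm[off:off + size]) for off in range(0, len(pcm), size)]
```

A detector built on this only sees loudness, so music or background noise above the threshold looks the same as speech.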

However, during development, I found that detecting speech is actually quite complex.

A better approach is to use WebRTC VAD. Although I haven't directly compared the two methods, WebRTC VAD takes more factors into account and could in theory improve accuracy to some extent.
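
A minimal sketch of how WebRTC VAD could be used to find speech regions, assuming the third-party `webrtcvad` Python package (py-webrtcvad) and 16kHz mono 16-bit PCM. That package classifies frames of 10, 20 or 30 ms via `Vad.is_speech(frame, sample_rate)`. The region-merging logic and the names `speech_regions` and `max_gap_frames` are hypothetical, not something either project is confirmed to do. The classifier is passed in as a callable so the merging logic can be tried without the package installed.

```python
from typing import Callable, Iterator, List, Tuple

FRAME_MS = 30  # WebRTC VAD accepts 10, 20 or 30 ms frames


def frames(pcm: bytes, sample_rate: int, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield consecutive full frames of mono 16-bit PCM (a short tail is dropped)."""
    size = sample_rate * frame_ms // 1000 * 2  # bytes per frame
    for off in range(0, len(pcm) - size + 1, size):
        yield pcm[off:off + size]


def speech_regions(
    pcm: bytes,
    sample_rate: int,
    is_speech: Callable[[bytes, int], bool],
    frame_ms: int = FRAME_MS,
    max_gap_frames: int = 10,
) -> List[Tuple[float, float]]:
    """Merge voiced frames into (start_s, end_s) regions, bridging short pauses."""
    regions: List[Tuple[float, float]] = []
    start = None  # start time of the region being built, if any
    end = 0.0     # end time of the last voiced frame in that region
    silent = 0    # consecutive unvoiced frames since the last voiced one
    for i, frame in enumerate(frames(pcm, sample_rate, frame_ms)):
        if is_speech(frame, sample_rate):
            if start is None:
                start = i * frame_ms / 1000
            end = (i + 1) * frame_ms / 1000
            silent = 0
        elif start is not None:
            silent += 1
            if silent > max_gap_frames:  # pause too long: close the region
                regions.append((start, end))
                start = None
    if start is not None:
        regions.append((start, end))
    return regions


def webrtc_classifier(aggressiveness: int = 2) -> Callable[[bytes, int], bool]:
    """Return webrtcvad's is_speech method; needs `pip install webrtcvad`."""
    import webrtcvad  # imported lazily so the rest works without the package

    return webrtcvad.Vad(aggressiveness).is_speech  # aggressiveness: 0 to 3
```

With the package installed, `speech_regions(pcm, 16000, webrtc_classifier())` would return the time ranges to submit for recognition.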

