Some performance improvement suggestions #61

@MeteorsLiu

Due to certain reasons, I needed to use the pyTranscriber project to generate subtitles for some of my videos. However, after using pyTranscriber, the experience was not great. The main issues are:

  1. Excessive memory usage, often exceeding 2GB.
  2. Unstable and prone to crashing (the most frustrating part: running for an hour only to find that it crashed and all the conversion progress was lost).
  3. Too slow (often taking 2-3 hours per video).

Therefore, I created goTranscriber. It is not meant to replace pyTranscriber; initially it was just for my own use. After developing it, I have a few suggestions regarding performance:

For speed:

  1. I used Go because I like programming in Go, but pyTranscriber could stay in Python: switching from its multi-process approach to asynchronous processing should significantly improve both stability and speed.
  2. Avoid FLAC and use the PCM S16LE format at 16kHz instead. Google's Speech-to-Text API also supports PCM at 16kHz. Extracting PCM is significantly faster than extracting FLAC, and in my tests FLAC didn't improve the results much anyway.
  3. Avoid loading the entire audio file into memory at once. Instead, load lazily: only read the data when submitting it to the API or during the recognition process. This can significantly reduce memory usage.
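To make suggestions 1 and 3 concrete, here is a minimal sketch, not pyTranscriber's or goTranscriber's actual code. It assumes the audio was already extracted to a raw PCM S16LE mono file at 16kHz, and that `recognize` is a hypothetical coroutine wrapping whatever API call the project makes. The names `read_region`, `transcribe_regions`, and `max_in_flight` are made up for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple

SAMPLE_RATE = 16000    # 16 kHz (assumed, matching the suggestion above)
BYTES_PER_SAMPLE = 2   # S16LE mono: 2 bytes per sample (mono is an assumption)


def read_region(path: str, start_s: float, end_s: float) -> bytes:
    """Read only the bytes of one region from a raw PCM file."""
    start = int(start_s * SAMPLE_RATE) * BYTES_PER_SAMPLE
    end = int(end_s * SAMPLE_RATE) * BYTES_PER_SAMPLE
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


async def transcribe_regions(
    path: str,
    regions: Sequence[Tuple[float, float]],
    recognize: Callable[[bytes], Awaitable[str]],  # hypothetical API wrapper
    max_in_flight: int = 8,
) -> List[str]:
    """Submit regions concurrently while keeping few regions in memory."""
    sem = asyncio.Semaphore(max_in_flight)

    async def worker(region: Tuple[float, float]) -> str:
        async with sem:
            # Lazy loading: the audio is read right before it is submitted,
            # so at most max_in_flight regions are held in memory at once.
            data = await asyncio.to_thread(read_region, path, *region)
            return await recognize(data)

    # gather() returns results in the same order as the input regions.
    return await asyncio.gather(*(worker(r) for r in regions))
```

Because everything runs in one process, a failed request surfaces as an ordinary exception that can be caught and retried per region, rather than taking down a worker process. `asyncio.to_thread` needs Python 3.9 or newer.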

After replacing FLAC with PCM S16LE, processing a 2-hour video typically takes only 15-30 minutes.
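For reference, extracting 16kHz PCM S16LE with ffmpeg could look roughly like the sketch below. The helper names are hypothetical, the mono downmix (`-ac 1`) is my assumption rather than something stated above, and ffmpeg is assumed to be on the `PATH`.

```python
import subprocess
from typing import List


def pcm_extract_cmd(src: str, dst: str, rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes headerless PCM S16LE audio."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", src,               # input video or audio file
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to mono (assumption)
        "-ar", str(rate),        # resample to 16 kHz by default
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian samples
        "-f", "s16le",           # raw output, no container header
        dst,
    ]


def extract_pcm(src: str, dst: str, rate: int = 16000) -> None:
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(pcm_extract_cmd(src, dst, rate), check=True)
```

Since the output is uncompressed, ffmpeg only has to decode and resample; there is no encoding step as there is with FLAC.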

For the accuracy of speech region detection:
Currently, pyTranscriber uses a simple RMS calculation to measure sound intensity and identify speech regions.
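For readers unfamiliar with the idea, an RMS-based check boils down to something like the sketch below. This is an illustration of the general technique on 16-bit little-endian PCM frames, not pyTranscriber's actual code; the function names and the threshold value are hypothetical.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a frame of S16LE samples."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def is_loud(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as 'speech' purely because it is loud enough.

    The threshold is an arbitrary illustrative value.
    """
    return frame_rms(frame) >= threshold
```

The weakness is visible in the code: loud music or background noise passes the threshold just as easily as speech, and quiet speech does not.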

However, during development, I found that identifying speech is actually quite complex.

A better approach is to use WebRTC VAD. Although I haven't directly compared the two methods, WebRTC VAD takes more factors into account than loudness alone and could theoretically improve accuracy to some extent.
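As a rough sketch of how a frame-level detector could be turned into speech regions: the function below splits 16kHz S16LE mono audio into 30 ms frames and merges consecutive speech frames. The classifier is passed in as a callable so the sketch does not depend on any particular library; `speech_regions` and the constants are hypothetical names. With the `webrtcvad` Python package, the callable could be `lambda f: vad.is_speech(f, SAMPLE_RATE)` for a `vad = webrtcvad.Vad(2)` instance, which accepts 10, 20, or 30 ms frames of 16-bit mono PCM.

```python
from typing import Callable, List, Tuple

SAMPLE_RATE = 16000                                # assumed 16 kHz mono S16LE
FRAME_MS = 30                                      # frame length in milliseconds
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # 960 bytes per frame


def speech_regions(
    pcm: bytes, is_speech: Callable[[bytes], bool]
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of consecutive speech frames."""
    regions: List[Tuple[float, float]] = []
    start = None
    n_frames = len(pcm) // FRAME_BYTES  # a trailing partial frame is ignored
    for i in range(n_frames):
        frame = pcm[i * FRAME_BYTES : (i + 1) * FRAME_BYTES]
        t = i * FRAME_MS / 1000
        if is_speech(frame):
            if start is None:
                start = t               # a speech region begins here
        elif start is not None:
            regions.append((start, t))  # region ended at this frame
            start = None
    if start is not None:               # audio ended while still in speech
        regions.append((start, n_frames * FRAME_MS / 1000))
    return regions
```

A real pipeline would probably also smooth the result, for example by bridging very short gaps between regions, but that is beyond this sketch.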
