How to use AI to blog from Audio

AJ!! edited this page Apr 8, 2024 · 7 revisions

Alwrity, an AI writer, transcribes or translates audio files into unique blog posts. It accepts the following inputs and converts them into blogs:

  • YouTube video URLs - Simply copy and paste the YouTube video link and Alwrity will convert it into a blog.

  • File paths to local audio files - If you have audio files saved locally on your desktop or laptop, simply pass the file location to convert the audio into text blog content.

  • Alwrity handles videos longer than the context limit of the Whisper API by dividing the video into 10-minute segments, transcribing each segment individually, and then combining the results.

  • TBD: Include other sources such as Google Drive, Vimeo, etc.
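The 10-minute segmentation described above can be sketched roughly as follows. This is only an illustration, not Alwrity's actual code: the function names `segment_bounds` and `join_transcripts` are hypothetical, and the sketch only computes the time offsets and joins the per-segment text, leaving the audio cutting and the transcription call to whatever tooling is in use.

```python
from typing import List, Tuple


def segment_bounds(duration_s: float, segment_s: float = 600.0) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds that cover the whole recording.

    segment_s defaults to 600 seconds, i.e. the 10-minute segments mentioned above.
    """
    duration_s = float(duration_s)
    if duration_s < 0 or segment_s <= 0:
        raise ValueError("duration must be >= 0 and segment length must be > 0")
    bounds = []
    start = 0.0
    while start < duration_s:
        # The last segment is shorter when the duration is not a multiple of segment_s.
        end = min(start + segment_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def join_transcripts(parts: List[str]) -> str:
    """Combine per-segment transcripts, in order, into one text."""
    return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    # A 25-minute video is split into two full segments and one 5-minute remainder.
    for start, end in segment_bounds(25 * 60):
        print(f"segment from {start:.0f}s to {end:.0f}s")
```

Keeping the segments in order matters, since the combined transcript is what later gets turned into the blog.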

For model details, supported languages, and features, see Getting Started with Alwrity.

Alwrity steps to generate a blog from audio:

1). Download the audio file if a URL is given; read the audio file if a file path is given.
2). Check the size of the audio and chunk it if it exceeds 26 MB (an OpenAI restriction).
3). Pass the transcript to the LLM to convert it into a blog.
4). Do a Google search for the YouTube title.
5). Enhance the blog from step 3 with the latest Google results.
6). Output the final blog with blog metadata.
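As a rough illustration of steps 1 and 2, the sketch below decides whether an input is a URL or a local path and whether a file is large enough to need chunking. The helper names `is_url` and `needs_chunking` are hypothetical and not taken from Alwrity's code, and the default limit of 25 MiB (about 26 MB) is an assumption about how the size restriction is expressed in bytes.

```python
import os
from urllib.parse import urlparse

# Assumption: 25 MiB (26,214,400 bytes), roughly the 26 MB limit cited above.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def is_url(source: str) -> bool:
    """True if the input looks like an http(s) link rather than a local file path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def needs_chunking(path: str, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """True if the audio file on disk is larger than the upload limit."""
    return os.path.getsize(path) > limit_bytes
```

A caller would download the audio first when `is_url` returns true, then run `needs_chunking` on the resulting local file before transcription.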

Alwrity supports the following speech-to-text (STT) models for transcribing audio files into text:

1). OpenAI Whisper model. Required: OpenAI API key

OpenAI's Audio API provides two speech-to-text endpoints, transcriptions and translations, based on the open source large-v2 Whisper model. They can be used to:

  • Transcribe audio into whatever language the audio is in.
  • Translate and transcribe the audio into English.
  • Whisper supported languages
  • Alwrity supports longer audio files
  • The following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
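A minimal sketch of calling the two endpoints with the official `openai` Python package (version 1.x) might look like the following. It assumes the API key is available in the `OPENAI_API_KEY` environment variable; the helper names `is_supported_audio` and `speech_to_text` are hypothetical and this is not necessarily how Alwrity itself makes the call.

```python
from pathlib import Path

# File types listed above as supported inputs.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def is_supported_audio(path: str) -> bool:
    """Check the file extension against the supported input types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def speech_to_text(path: str, translate_to_english: bool = False) -> str:
    """Transcribe the audio in its own language, or translate it into English."""
    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio file type: {path}")

    # Imported here so the extension check above works without the package installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    endpoint = client.audio.translations if translate_to_english else client.audio.transcriptions
    with open(path, "rb") as audio_file:
        result = endpoint.create(model="whisper-1", file=audio_file)
    return result.text
```

For example, `speech_to_text("talk.mp3")` would return the transcript in the spoken language, while `speech_to_text("talk.mp3", translate_to_english=True)` would return English text.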

Note: A local Whisper install is not a priority, as it takes up a lot of hardware resources and is slow. If there is demand for Alwrity to support it, it will be added then: https://github.com/openai/whisper

2). AssemblyAI (WIP)

Built by AI experts, AssemblyAI’s Speech AI models include accurate speech-to-text for voice data (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, chapter detection, PII redaction, and more.

  • Their pricing plan is also generous. AssemblyAI is purpose-built for STT.
  • Universal-1 is AssemblyAI's most powerful and accurate Speech AI model yet, trained on 12.5M hours of multilingual audio data.
  • Free - Get started at no cost. Using their API, transcribe up to 100 hours of audio.
  • It is worth signing up and using their API key with Alwrity for audio-to-blog generation.