hs-CN/msedge-tts


Description

This library wraps the MSEdge Read aloud API. You can use it to synthesize text to speech with the many voices Microsoft provides.

How to use

  1. Get a SpeechConfig to configure the voice used for synthesis.
    A Voice converts directly into a SpeechConfig. Call the get_voices_list function to get all available voices.
    Both Voice and SpeechConfig implement serde::Serialize and serde::Deserialize.
    For example:
    use msedge_tts::voice::get_voices_list;
    use msedge_tts::tts::SpeechConfig;
    
    fn main() {
        let voices = get_voices_list().unwrap();
        let speech_config = SpeechConfig::from(&voices[0]);
    }
    You can also create a SpeechConfig yourself. Make sure you use a valid voice name and audio format.
  2. Create a TTS Client or Stream. Both have sync and async versions. Examples are given under step 3.
  3. Synthesize text to speech.

    Sync Client

    Call the client's synthesize function to synthesize text to speech. It returns a SynthesizedAudio, from which you can get audio_bytes and audio_metadata.
    use msedge_tts::{tts::client::connect, tts::SpeechConfig, voice::get_voices_list};
    
    fn main() {
        let voices = get_voices_list().unwrap();
        for voice in &voices {
            if voice.name.contains("YunyangNeural") {
                let config = SpeechConfig::from(voice);
                let mut tts = connect().unwrap();
                let audio = tts
                    .synthesize("Hello, World! 你好,世界!", &config)
                    .unwrap();
                break;
            }
        }
    }
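The example above stops once it has the synthesized audio. As a minimal sketch of a next step, the helper below writes a byte buffer to disk using only the standard library. The function name `save_audio` is hypothetical, and it assumes `audio_bytes` can be borrowed as a `&[u8]`; the crate itself is not used here.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes raw audio bytes to `path`,
/// creating the file or truncating an existing one.
pub fn save_audio(audio_bytes: &[u8], path: &Path) -> io::Result<()> {
    let mut file = File::create(path)?;
    // write_all retries until the whole buffer has been written.
    file.write_all(audio_bytes)?;
    file.flush()
}
```

Under that assumption, a call might look like `save_audio(&audio.audio_bytes, Path::new("hello.mp3"))`. The file extension should match the audio format selected in the SpeechConfig; `.mp3` here is only a guess.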

    Async Client

    Call the async client's synthesize function and await it to synthesize text to speech. It returns a SynthesizedAudio, from which you can get audio_bytes and audio_metadata.
    use msedge_tts::{tts::client::connect_async, tts::SpeechConfig, voice::get_voices_list_async};
    
    fn main() {
        smol::block_on(async {
            let voices = get_voices_list_async().await.unwrap();
            for voice in &voices {
                if voice.name.contains("YunyangNeural") {
                    let config = SpeechConfig::from(voice);
                    let mut tts = connect_async().await.unwrap();
                    let audio = tts
                        .synthesize("Hello, World! 你好,世界!", &config)
                        .await
                        .unwrap();
                    break;
                }
            }
        });
    }
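Both client examples scan the voice list with a `for` loop and `break` on the first match. The same lookup can be sketched with `Iterator::find`. The `Voice` struct below is a local stand-in that models only the `name` field used in the examples, and `find_voice` is a hypothetical helper, not part of the crate.

```rust
/// Local stand-in for the crate's `Voice`; only `name` is modeled here.
#[derive(Debug, Clone, PartialEq)]
pub struct Voice {
    pub name: String,
}

/// Hypothetical helper: returns the first voice whose name contains `needle`.
pub fn find_voice<'a>(voices: &'a [Voice], needle: &str) -> Option<&'a Voice> {
    voices.iter().find(|voice| voice.name.contains(needle))
}
```

Returning an `Option` makes the "no matching voice" case explicit, which the loop-based examples silently skip.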

    Sync Stream

    Call the Sender stream's send function to synthesize text to speech. Call the Reader stream's read function to get the data.
    read returns Option<SynthesizedResponse>; the response may be AudioBytes, AudioMetadata, or None. This is because the MSEdge Read aloud API returns multiple data segments, metadata, and other information sequentially.
    Caution: One send corresponds to multiple reads. The next send call blocks until there is no more data to read. read blocks until you call send.
    use msedge_tts::{
        tts::stream::{msedge_tts_split, SynthesizedResponse},
        tts::SpeechConfig,
        voice::get_voices_list,
    };
    use std::{
        sync::{
            atomic::{AtomicBool, Ordering},
            Arc,
        },
        thread::spawn,
    };
    
    fn main() {
        let voices = get_voices_list().unwrap();
        for voice in &voices {
            if voice.name.contains("YunyangNeural") {
                let config = SpeechConfig::from(voice);
                let (mut sender, mut reader) = msedge_tts_split().unwrap();
    
                let signal = Arc::new(AtomicBool::new(false));
                let end = signal.clone();
                spawn(move || {
                    sender.send("Hello, World! 你好,世界!", &config).unwrap();
                    println!("synthesizing...1");
                    sender.send("Hello, World! 你好,世界!", &config).unwrap();
                    println!("synthesizing...2");
                    sender.send("Hello, World! 你好,世界!", &config).unwrap();
                    println!("synthesizing...3");
                    sender.send("Hello, World! 你好,世界!", &config).unwrap();
                    println!("synthesizing...4");
                    end.store(true, Ordering::Relaxed);
                });
    
                loop {
                    if signal.load(Ordering::Relaxed) && !reader.can_read() {
                        break;
                    }
                    let audio = reader.read().unwrap();
                    if let Some(audio) = audio {
                        match audio {
                            SynthesizedResponse::AudioBytes(_) => {
                                println!("read bytes")
                            }
                            SynthesizedResponse::AudioMetadata(_) => {
                                println!("read metadata")
                            }
                        }
                    } else {
                        println!("read None");
                    }
                }
            }
        }
    }
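The read loop above only prints what it receives. A rough sketch of how the results of many reads could be reassembled is shown below. `Response` is a local stand-in for SynthesizedResponse, its payload types (`Vec<u8>` and `String`) are assumptions, and `collect_responses` is a hypothetical helper.

```rust
/// Local stand-in for the crate's `SynthesizedResponse`.
/// The payload types are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Response {
    AudioBytes(Vec<u8>),
    AudioMetadata(String),
}

/// Hypothetical helper: concatenates audio chunks in arrival order
/// and gathers metadata entries from a sequence of read results.
pub fn collect_responses<I>(reads: I) -> (Vec<u8>, Vec<String>)
where
    I: IntoIterator<Item = Option<Response>>,
{
    let mut audio = Vec::new();
    let mut metadata = Vec::new();
    for read in reads {
        match read {
            Some(Response::AudioBytes(chunk)) => audio.extend_from_slice(&chunk),
            Some(Response::AudioMetadata(entry)) => metadata.push(entry),
            // A read that carried neither audio nor metadata is skipped.
            None => {}
        }
    }
    (audio, metadata)
}
```

Because one send corresponds to multiple reads, appending chunks in the order they arrive is what keeps the reconstructed audio contiguous.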

    Async Stream

    Call the async Sender's send function to synthesize text to speech. Call the async Reader's read function to get the data. read returns Option<SynthesizedResponse>, as above. send and read block as above.
    use msedge_tts::{
        tts::{
            stream::{msedge_tts_split_async, SynthesizedResponse},
            SpeechConfig,
        },
        voice::get_voices_list_async,
    };
    use std::{
        sync::{
            atomic::{AtomicBool, Ordering},
            Arc,
        },
    };
    
    fn main() {
        smol::block_on(async {
            let voices = get_voices_list_async().await.unwrap();
            for voice in &voices {
                if voice.name.contains("YunyangNeural") {
                    let config = SpeechConfig::from(voice);
                    let (mut sender, mut reader) = msedge_tts_split_async().await.unwrap();
    
                    let signal = Arc::new(AtomicBool::new(false));
                    let end = signal.clone();
                    smol::spawn(async move {
                        sender
                            .send("Hello, World! 你好,世界!", &config)
                            .await
                            .unwrap();
                        println!("synthesizing...1");
                        sender
                            .send("Hello, World! 你好,世界!", &config)
                            .await
                            .unwrap();
                        println!("synthesizing...2");
                        sender
                            .send("Hello, World! 你好,世界!", &config)
                            .await
                            .unwrap();
                        println!("synthesizing...3");
                        sender
                            .send("Hello, World! 你好,世界!", &config)
                            .await
                            .unwrap();
                        println!("synthesizing...4");
                        end.store(true, Ordering::Relaxed);
                    })
                    .detach();
    
                    loop {
                        if signal.load(Ordering::Relaxed) && !reader.can_read().await {
                            break;
                        }
                        let audio = reader.read().await.unwrap();
                        if let Some(audio) = audio {
                            match audio {
                                SynthesizedResponse::AudioBytes(_) => {
                                    println!("read bytes")
                                }
                                SynthesizedResponse::AudioMetadata(_) => {
                                    println!("read metadata")
                                }
                            }
                        } else {
                            println!("read None");
                        }
                    }
                }
            }
        });
    }

See all examples.
