Swift is a fast AI voice assistant.
- Groq is used for fast inference of OpenAI Whisper (for transcription) and Meta Llama 3 (for generating the text response).
- Cartesia's Sonic voice model is used for fast speech synthesis, which is streamed to the frontend (a rough server-side sketch of this pipeline follows the list).
- Voice activity detection (VAD) is used to detect when the user is talking and to run callbacks on speech segments (a client-side sketch appears further below).
- The app is a Next.js project written in TypeScript and deployed to Vercel.
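
For orientation, here is a minimal sketch of how a Next.js route handler could wire these pieces together. It assumes the `groq-sdk` package and Cartesia's REST `/tts/bytes` endpoint; the model ids, environment variable names, voice id, and output format are illustrative assumptions, not a description of the app's actual code.

```ts
// app/api/route.ts — illustrative sketch only
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

export async function POST(request: Request) {
  // 1. Receive the recorded speech segment from the browser.
  const formData = await request.formData();
  const audio = formData.get("audio") as File;

  // 2. Transcribe it with Whisper running on Groq.
  const { text: transcript } = await groq.audio.transcriptions.create({
    file: audio,
    model: "whisper-large-v3",
  });

  // 3. Generate the assistant's reply with Llama 3 on Groq (model id is an assumption).
  const completion = await groq.chat.completions.create({
    model: "llama3-8b-8192",
    messages: [
      { role: "system", content: "You are Swift, a fast voice assistant. Keep replies short." },
      { role: "user", content: transcript },
    ],
  });
  const reply = completion.choices[0].message.content ?? "";

  // 4. Synthesize speech with Cartesia Sonic and stream the raw audio back to the client.
  const voice = await fetch("https://api.cartesia.ai/tts/bytes", {
    method: "POST",
    headers: {
      "Cartesia-Version": "2024-06-10",
      "X-API-Key": process.env.CARTESIA_API_KEY!,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model_id: "sonic-english",
      transcript: reply,
      voice: { mode: "id", id: "YOUR_VOICE_ID" }, // placeholder voice id
      output_format: { container: "raw", encoding: "pcm_f32le", sample_rate: 24000 },
    }),
  });

  return new Response(voice.body, { headers: { "Content-Type": "audio/raw" } });
}
```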
Thank you to the teams at Groq and Cartesia for providing access to their APIs for this demo!
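
On the client, a browser VAD hook can capture speech segments and post them to a route like the one sketched above. The snippet below assumes the `@ricky0123/vad-react` package and a hypothetical `/api` route; treat it as an illustration of the pattern rather than the exact library and route this app uses.

```tsx
"use client";

import { useMicVAD, utils } from "@ricky0123/vad-react";

export default function Recorder() {
  // Start listening on load; the hook runs a small VAD model in the browser and
  // fires onSpeechEnd with the captured audio once the user stops talking.
  const vad = useMicVAD({
    startOnLoad: true,
    onSpeechEnd: async (audio) => {
      // Package the Float32Array speech segment as WAV and send it to the API route.
      const wav = utils.encodeWAV(audio);
      const body = new FormData();
      body.append("audio", new Blob([wav], { type: "audio/wav" }), "speech.wav");
      const response = await fetch("/api", { method: "POST", body });

      // The route streams back synthesized speech; playback is omitted from this sketch.
      const replyAudio = await response.arrayBuffer();
      console.log("received", replyAudio.byteLength, "bytes of synthesized speech");
    },
  });

  return <p>{vad.userSpeaking ? "Listening…" : "Waiting for speech"}</p>;
}
```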
- Clone the repository.
- Copy `.env.example` to `.env.local` and fill in the environment variables.
- Run `pnpm install` to install dependencies.
- Run `pnpm dev` to start the development server.
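
A small startup check can make a missing key obvious early; the variable names below (`GROQ_API_KEY`, `CARTESIA_API_KEY`) are assumptions, so match them to whatever `.env.example` actually lists.

```ts
// lib/env.ts — illustrative only; adjust the names to the entries in .env.example
const required = ["GROQ_API_KEY", "CARTESIA_API_KEY"]; // assumed variable names

for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing environment variable ${name}; add it to .env.local`);
  }
}
```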