Have you ever wanted David Attenborough's voice to describe the contents of a webcam photo in pseudo-real time over a secure websocket? Why not indeed!
Live demo up at https://sirdavid.rickt.dev
- takes a snapshot of the browser's webcam
- uploads it over secure websocket to a mini backend
- sets up a secure websocket listener
- looks for a .png sent over the websocket
- if an image arrives, saves it locally and in a GCP bucket
- decodes the image, has it described by OpenAI's
gpt-4omodel in a snarky David Attenborough manner - generates audio file using a custom ElevenLabs David Attenborough voice i created
- sends the URL of the audio file to the browser
- plays the audio in the browser
You'll need to set:
ELEVENLABS_API_KEYapi key for ElevenLabsOPENAI_API_KEYapi key for OpenAISIRDAVID_APIGWURI pointing to (in this case) my Cloudflare AI GatewaySIRDAVID_BUCKETname of GCP bucket to store images, text analyses & audio filesSIRDAVID_PORTport for the websocket listenerSIRDAVID_SERVICEACCOUNT_JSONpath to service account JSON file for auth to GCP bucketSIRDAVID_SSL_CERTpath to the SSL certificate for the secure websocketSIRDAVID_SSL_KEYpath to the SSL key for the secure websocket certificateSIRDAVID_VOICEstring containing the voice ID of the ElevenLabs voice
- You will have to make your own ElevenLabs David Attenborough voice, as I can't share mine