Have you ever wanted David Attenborough's voice to describe the contents of a webcam photo in pseudo-real time over a secure websocket? Why not indeed!
Live demo up at https://sirdavid.rickt.dev
- takes a snapshot of the browser's webcam
- uploads it over secure websocket to a mini backend
- sets up a secure websocket listener
- looks for a .png sent over the websocket
- if an image arrives, saves it locally and in a GCP bucket
- decodes the image and has it described by OpenAI's `gpt-4o` model in a snarky David Attenborough manner
- generates an audio file using a custom ElevenLabs David Attenborough voice I created
- sends the URL of the audio file to the browser
- plays the audio in the browser
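
The "looks for a .png, saves it, decodes it" steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the snapshot arrives as raw PNG bytes in a binary websocket message (if the browser sends base64 text, decode it first), and the helper names `is_png`, `save_snapshot` and `to_data_url` are hypothetical. The base64 data URL is one common way to hand an image to a vision model; the real backend may do it differently.

```python
import base64
from pathlib import Path

# First eight bytes of every PNG file, per the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(payload: bytes) -> bool:
    """Return True if a websocket message starts with the PNG signature."""
    return payload.startswith(PNG_SIGNATURE)


def save_snapshot(payload: bytes, directory: Path, name: str) -> Path:
    """Write the PNG bytes to <directory>/<name>.png (hypothetical layout)."""
    if not is_png(payload):
        raise ValueError("payload is not a PNG image")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.png"
    path.write_bytes(payload)
    return path


def to_data_url(payload: bytes) -> str:
    """Encode the PNG as a base64 data URL for use in an image prompt."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:image/png;base64,{encoded}"
```

Checking the signature rather than trusting a filename means anything that is not actually a PNG is rejected before it is written to disk or uploaded to the bucket.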
You'll need to set:
- `ELEVENLABS_API_KEY`: API key for ElevenLabs
- `OPENAI_API_KEY`: API key for OpenAI
- `SIRDAVID_APIGW`: URI pointing to (in this case) my Cloudflare AI Gateway
- `SIRDAVID_BUCKET`: name of GCP bucket to store images, text analyses & audio files
- `SIRDAVID_PORT`: port for the websocket listener
- `SIRDAVID_SERVICEACCOUNT_JSON`: path to service account JSON file for auth to GCP bucket
- `SIRDAVID_SSL_CERT`: path to the SSL certificate for the secure websocket
- `SIRDAVID_SSL_KEY`: path to the SSL key for the secure websocket certificate
- `SIRDAVID_VOICE`: string containing the voice ID of the ElevenLabs voice
- You will have to make your own ElevenLabs David Attenborough voice, as I can't share mine
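
As a rough sketch of how the settings above might be loaded and validated at startup, assuming they are read as environment variables: the `Config` class and `load_config` function are hypothetical names for illustration, not part of the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Names taken from the list above; all are treated as required here.
REQUIRED_VARS = (
    "ELEVENLABS_API_KEY",
    "OPENAI_API_KEY",
    "SIRDAVID_APIGW",
    "SIRDAVID_BUCKET",
    "SIRDAVID_PORT",
    "SIRDAVID_SERVICEACCOUNT_JSON",
    "SIRDAVID_SSL_CERT",
    "SIRDAVID_SSL_KEY",
    "SIRDAVID_VOICE",
)


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the settings, keyed by variable name."""
    values: Mapping[str, str]
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Fail early with one message listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # int() raises ValueError if SIRDAVID_PORT is not a number.
    return Config(values={n: env[n] for n in REQUIRED_VARS},
                  port=int(env["SIRDAVID_PORT"]))
```

Calling `load_config()` once at startup surfaces a forgotten variable immediately, rather than partway through handling the first snapshot.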