This project uses an ESP32 microcontroller to turn AI-generated text into speech. It sends a question to the ZhipuAI API to generate a text response, converts that response to speech with the Baidu TTS API, and plays the resulting audio through an I2S-based audio player.
- Connects to a WiFi network using the ESP32.
- Sends a question to the ZhipuAI API and retrieves a response.
- Converts the AI-generated text into speech using the Baidu TTS API.
- Plays the audio using an I2S audio interface with the MAX98357A DAC.
- Handles audio in small chunks for memory efficiency.
Hardware:
- ESP32 development board.
- MAX98357A I2S audio DAC or any compatible I2S-based audio output.
- Active WiFi connection.
Software:
- MicroPython installed on the ESP32.
- MicroPython modules: `urequests`, `ujson`, `gc`, `network`, and `machine`.
APIs:
- ZhipuAI (for AI text generation).
- Baidu TTS (for text-to-speech conversion).
Wi-Fi Configuration:
- Replace `'xxx'` in the `SSID` and `PASSWORD` constants with your actual WiFi credentials.
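As a rough illustration of how the `SSID` and `PASSWORD` constants might be used, the sketch below waits for the connection with a timeout instead of blocking forever. The `connect_wifi` helper and its `timeout_s` parameter are hypothetical names, not taken from the project script, and the `wlan` argument is assumed to be a `network.WLAN(network.STA_IF)` object.

```python
import time

SSID = 'xxx'       # replace with your network name
PASSWORD = 'xxx'   # replace with your network password


def connect_wifi(wlan, ssid, password, timeout_s=15):
    """Activate the station interface and wait until it is connected.

    Hypothetical helper: `wlan` is assumed to behave like
    network.WLAN(network.STA_IF). Returns True on success and
    False if `timeout_s` seconds pass without a connection.
    """
    wlan.active(True)
    if not wlan.isconnected():
        wlan.connect(ssid, password)
        waited = 0
        # Poll once per second until connected or the timeout expires.
        while not wlan.isconnected() and waited < timeout_s:
            time.sleep(1)
            waited += 1
    return wlan.isconnected()


# On the ESP32 (MicroPython only):
#   import network
#   wlan = network.WLAN(network.STA_IF)
#   if connect_wifi(wlan, SSID, PASSWORD):
#       print('Connected:', wlan.ifconfig())
```

Passing the interface in as an argument keeps the function testable on a desktop with a stand-in object.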
API Keys:
- Replace `'xxx'` in the `API_KEY` variable with your ZhipuAI API key.
- Replace `'xxx'` in the `api_key` and `secret_key` fields inside the `main()` function with your Baidu TTS API credentials.
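To show where these credentials typically end up, here is a minimal sketch of building the two authenticated requests. It is not the project's actual code: the endpoint URLs, the `glm-4` model name, the response layout, and the helper names (`build_chat_request`, `extract_reply`, `build_token_url`) are all assumptions, so check them against each provider's current documentation.

```python
try:
    import ujson as json   # MicroPython
except ImportError:
    import json            # CPython, for desktop testing

API_KEY = 'xxx'  # your ZhipuAI API key

# Assumed endpoints; verify against the providers' documentation.
ZHIPU_URL = 'https://open.bigmodel.cn/api/paas/v4/chat/completions'
BAIDU_TOKEN_URL = 'https://aip.baidubce.com/oauth/2.0/token'


def build_chat_request(api_key, question, model='glm-4'):
    """Return (headers, body) for a single-turn chat request (assumed format)."""
    headers = {
        'Authorization': 'Bearer ' + api_key,
        'Content-Type': 'application/json',
    }
    payload = {
        'model': model,
        'messages': [{'role': 'user', 'content': question}],
    }
    # Send UTF-8 bytes so non-ASCII questions have a well-defined encoding.
    return headers, json.dumps(payload).encode('utf-8')


def extract_reply(reply):
    """Pull the answer text out of a parsed response (assumed layout)."""
    return reply['choices'][0]['message']['content']


def build_token_url(api_key, secret_key):
    """Build the URL used to exchange Baidu credentials for an access token."""
    return (BAIDU_TOKEN_URL + '?grant_type=client_credentials'
            + '&client_id=' + api_key
            + '&client_secret=' + secret_key)


# On the ESP32 (MicroPython only):
#   import urequests
#   headers, body = build_chat_request(API_KEY, 'Hello')
#   resp = urequests.post(ZHIPU_URL, headers=headers, data=body)
#   text = extract_reply(resp.json())
#   resp.close()
```

Keeping the keys in one place and passing them into small helpers makes it harder to leave a stray `'xxx'` placeholder in the script.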
I2S Audio Configuration:
- The code is configured to use an I2S connection to a MAX98357A DAC. You can modify the `SCK_PIN`, `WS_PIN`, and `SD_PIN` variables to suit your hardware.
- The audio sample rate is set to `8000Hz` (you can change it to 16000Hz or 24000Hz depending on your needs).
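The I2S settings above could be wired together roughly as follows. This is a sketch under assumptions: the pin numbers are placeholders, not the project's values, and the `i2s_config` and `open_i2s` helpers and the `buffer_ms` parameter are hypothetical. The `machine.I2S` constructor arguments follow the MicroPython API for mono 16-bit transmit mode.

```python
SCK_PIN = 14        # bit clock (BCLK on the MAX98357A); placeholder pin
WS_PIN = 15         # word select (LRC); placeholder pin
SD_PIN = 22         # serial data (DIN); placeholder pin
SAMPLE_RATE = 8000  # Hz; should match the audio you play


def i2s_config(sample_rate=SAMPLE_RATE, bits=16, buffer_ms=250):
    """Return keyword settings for machine.I2S.

    The internal buffer is sized to hold `buffer_ms` milliseconds of
    mono audio at the given sample rate and bit depth.
    """
    bytes_per_second = sample_rate * (bits // 8)
    ibuf = bytes_per_second * buffer_ms // 1000
    return {'rate': sample_rate, 'bits': bits, 'ibuf': ibuf}


def open_i2s(sample_rate=SAMPLE_RATE):
    """Create a transmit-only I2S peripheral (runs on MicroPython only)."""
    from machine import I2S, Pin
    cfg = i2s_config(sample_rate)
    return I2S(0,
               sck=Pin(SCK_PIN), ws=Pin(WS_PIN), sd=Pin(SD_PIN),
               mode=I2S.TX, format=I2S.MONO, **cfg)
```

If you raise the sample rate, the buffer grows with it, which matters on a board with limited RAM.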
- Flash the MicroPython firmware onto the ESP32 board if you haven't already.
- Upload this Python script to the ESP32 using any appropriate method, such as Thonny or WebREPL.
- Connect the ESP32 to a serial terminal or a REPL interface.
- Run the script. It will first connect to the WiFi and prompt you to enter a question for the AI.
- The ESP32 will:
  - Send the question to the ZhipuAI API and receive an AI-generated response.
  - Convert the response text to speech using the Baidu TTS API.
  - Play the generated audio in chunks through the connected I2S DAC.
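The chunked playback in the last step can be sketched as a simple copy loop that reuses one small buffer, so the whole audio clip never has to fit in RAM. The `play_stream` helper and `CHUNK_SIZE` value are hypothetical; `source` is assumed to be any stream with `readinto()` (for example the `raw` attribute of a `urequests` response), and `sink` any object with `write()` (for example a `machine.I2S` instance).

```python
CHUNK_SIZE = 1024  # bytes per chunk; an assumed value, tune for your board


def play_stream(source, sink, chunk_size=CHUNK_SIZE):
    """Copy audio from `source` to `sink` in fixed-size chunks.

    Returns the total number of bytes written.
    """
    buf = bytearray(chunk_size)
    view = memoryview(buf)   # lets us slice without copying
    total = 0
    while True:
        n = source.readinto(buf)
        if not n:            # 0 or None means no more data
            break
        sink.write(view[:n])
        total += n
    return total


# On the ESP32 (MicroPython only), assuming `tts_url` and `audio_out` exist:
#   import urequests, gc
#   resp = urequests.get(tts_url)
#   play_stream(resp.raw, audio_out)
#   resp.close()
#   gc.collect()
```

Reusing a single `bytearray` avoids allocating a new bytes object for every chunk, which helps limit heap fragmentation on the ESP32.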