This project is an AI model that understands and responds to voice and text inputs related to specific business logic operations. It communicates over HTTP, receiving voice data as a byte array and returning voice responses as a byte array. It also accepts text input and returns text output in JSON format.
- Overview
- Functional Requirements
- Endpoints
- Setup and Installation
- Running the Application
- Docker Setup
- Testing the Application
- Tasks In Progress
- Next Steps
- Project Structure
- Contributing
- License
## Overview

Developed to handle business logic operations, this AI model supports voice and text interactions. It converts received voice data to text, understands the user's intent, processes the business logic, and generates an appropriate response.
## Functional Requirements

### Voice Interaction

- Input: Voice data received as an array of bytes.
- Output: Voice response as an array of bytes.
- Process: Convert the received voice data to text, understand the user's intent, process the business logic, and generate an appropriate voice response.

### Text Interaction

- Input: Text data received as JSON.
- Output: Text response in JSON format.
- Process: Understand the text input, process the business logic, and generate a relevant text response.
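The voice process is the text process with a transcription step in front and a synthesis step at the end. The following standard-library sketch illustrates that structure only. The keyword-based intent rules, the canned replies, the `{"response": ...}` output key, and the `speech_to_text`/`text_to_speech` callables are all hypothetical placeholders. They are not the project's actual code in `nlp.py`, `voice.py`, or `business_logic.py`.

```python
import json
from typing import Callable

# Hypothetical keyword rules standing in for the project's real NLP model (nlp.py).
INTENT_KEYWORDS = {
    "item_quantity": ("how many", "quantity"),
    "greeting": ("hello",),
}

# Hypothetical canned replies standing in for business_logic.py.
REPLIES = {
    "item_quantity": "Let me look up that quantity for you.",
    "greeting": "Hello! How can I help you today?",
    "unknown": "Sorry, I did not understand the request.",
}


def detect_intent(text: str) -> str:
    """Return the first intent whose keywords appear in the text, else 'unknown'."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return "unknown"


def respond(text: str) -> str:
    """Shared core: understand the intent, then apply the (stubbed) business logic."""
    return REPLIES[detect_intent(text)]


def handle_text(payload: str) -> str:
    """Text flow: JSON in ({"input": ...}), JSON out ({"response": ...})."""
    return json.dumps({"response": respond(json.loads(payload)["input"])})


def handle_voice(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Voice flow: transcribe, reuse the text core, then synthesize the reply."""
    return text_to_speech(respond(speech_to_text(audio)))
```

In a real service, `speech_to_text` and `text_to_speech` would wrap whichever speech models `voice.py` uses. Passing them in as functions lets the shared core be tested without any audio.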
## Endpoints

- `GET /`: Welcome message.
- `GET /health`: Health check endpoint.
- `POST /api/voice-input`: Accepts voice data and returns a voice response.
- `POST /api/text-input`: Accepts text data and returns a text response.
## Setup and Installation

### Prerequisites

- Python 3.8 or above
- Docker (optional, for containerization)
### Installation

- Clone the repository:

  ```bash
  git clone https://github.com/elcaiseri/ai-voice-text-interaction-api.git
  cd ai-voice-text-interaction-api
  ```

- Create and activate a virtual environment:

  ```bash
  python3 -m venv env
  source env/bin/activate
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
## Running the Application

- Run the FastAPI server:

  ```bash
  uvicorn app.main:app --reload
  ```

- Access the application at http://localhost:8000.
## Docker Setup

- Build the Docker image:

  ```bash
  docker build -t my-fastapi-app .
  ```

- Run the Docker container:

  ```bash
  docker run -d --name fastapi-container -p 8000:8000 my-fastapi-app
  ```
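`docker run -d` returns as soon as the container starts, which may be before the server is ready. A script can therefore poll the `/health` endpoint before sending requests. The helper below is a hypothetical standard-library sketch. It assumes `/health` answers with HTTP 200 once the service is ready, since this README does not document the endpoint's exact response.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8000",
                    timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll GET {base_url}/health until it returns 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_health()` right after `docker run -d ...` and only send test requests once it returns `True`.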
## Testing the Application

Use the following `curl` command to test the `/api/voice-input` endpoint (ensure you have a `sample.wav` file in the current directory):

```bash
curl -X POST "http://localhost:8000/api/voice-input" \
  -H "Content-Type: multipart/form-data" \
  -F "file_upload=@sample.wav;type=audio/wav" --output response.wav
```

The returned audio is saved to `response.wav`.
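If `curl` is not available, you can send the same voice request from Python. The standard-library sketch below builds the `multipart/form-data` body by hand, reusing the `file_upload` field name and `audio/wav` type from the curl command. The helper names `build_voice_request` and `send_voice_file` are hypothetical. The sketch assumes, as `--output response.wav` suggests, that the endpoint returns the audio bytes directly in the response body.

```python
import urllib.request
import uuid


def build_voice_request(wav_bytes: bytes,
                        url: str = "http://localhost:8000/api/voice-input",
                        filename: str = "sample.wav") -> urllib.request.Request:
    """Wrap WAV bytes in a multipart/form-data POST under the `file_upload` field."""
    boundary = uuid.uuid4().hex  # random boundary that will not occur in the payload
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file_upload"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send_voice_file(path: str = "sample.wav", out_path: str = "response.wav") -> None:
    """Send a WAV file and save the returned audio bytes (requires a running server)."""
    with open(path, "rb") as f:
        request = build_voice_request(f.read())
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

With the server running, `send_voice_file()` writes the reply to `response.wav`, mirroring the curl command. Encoding the body by hand avoids third-party dependencies, and a client library such as `requests` can do the same encoding through its `files=` argument.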
Use the following `curl` command to test the `/api/text-input` endpoint:

```bash
curl -X POST "http://localhost:8000/api/text-input" \
  -H "Content-Type: application/json" \
  -d '{"input": "How many items do I have in location X?"}'
```
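You can make the equivalent text request with Python's standard library, as sketched below. The `input` key comes from the curl example. This README does not document the shape of the JSON that comes back, so the hypothetical `ask` helper simply returns the parsed object.

```python
import json
import urllib.request
from typing import Any


def build_text_request(text: str,
                       url: str = "http://localhost:8000/api/text-input") -> urllib.request.Request:
    """Build a POST request whose JSON body matches the curl example: {"input": text}."""
    return urllib.request.Request(
        url,
        data=json.dumps({"input": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(text: str) -> Any:
    """Send the question and return the decoded JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_text_request(text)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `print(ask("How many items do I have in location X?"))` prints whatever JSON the endpoint returns.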
## Tasks In Progress

- Dynamic Data Handling
  - Working on dynamic data responses for queries such as item quantities.
- Advanced NLP Models
  - Exploring more advanced NLP models for better query handling.
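Dynamic data handling for item-quantity questions, such as the example query used to test the text endpoint, can be sketched as two steps: extract the location from the question, then look it up in a data source. Everything in this sketch is hypothetical. The in-memory `INVENTORY` table stands in for a real data source that this README does not describe, and the regular expression only covers phrasings of the form "... in location <name>".

```python
import re
from typing import Dict, Optional

# Hypothetical inventory (location -> item counts); a real system would query live data.
INVENTORY: Dict[str, Dict[str, int]] = {
    "x": {"widgets": 12, "gears": 5},
    "warehouse a": {"widgets": 40},
}

# Captures the words after "in location", ignoring trailing spaces and an optional "?".
LOCATION_PATTERN = re.compile(r"\bin location\s+([\w ]+?)\s*\??$", re.IGNORECASE)


def extract_location(question: str) -> Optional[str]:
    """Return the location named after 'in location', keeping the user's casing."""
    match = LOCATION_PATTERN.search(question.strip())
    return match.group(1) if match else None


def answer_quantity(question: str) -> str:
    """Answer an item-count question from the inventory table."""
    location = extract_location(question)
    if location is None:
        return "Which location do you mean?"
    items = INVENTORY.get(location.lower())  # case-insensitive lookup
    if items is None:
        return f"I have no data for location {location}."
    return f"You have {sum(items.values())} items in location {location}."
```

Separating slot extraction from the lookup means the regular expression can later be replaced by a proper NLP model without touching the data-access code.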
## Next Steps

- Complete dynamic data handling.
- Refine NLP models.
- Further testing and debugging.
- Prepare for final deployment.
## Project Structure

```
ai_model
│
├── app
│   ├── main.py
│   ├── nlp.py
│   ├── voice.py
│   ├── business_logic.py
│   ├── responses.py
│   └── models
│       └── user_input.py
│
├── tests
│   ├── test_main.py
│   └── test_business_logic.py
│
├── requirements.txt
├── Dockerfile
└── .dockerignore
```
## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
## License

This project is licensed under the MIT License. See the LICENSE file for details.