Project Title: Voice Genie

App Logo

Click on a logo to download the latest version of the app apk file:

Description: Voice Genie is a Flutter-based mobile application that acts as a voice-powered AI assistant. It is designed for quick, intuitive, and hands-free interactions. Leveraging the power of Gemini API and Imagine API, this single-screen app can respond to both text-based and art/image-based prompts, making it a unique blend of conversational AI and visual creativity. Through its streamlined interface, users can quickly send queries and receive spoken or visual responses, enhanced by intuitive speech-to-text and text-to-speech capabilities.

Features

Voice Interaction: Users can speak prompts or queries by tapping a button, and the app converts the speech to text. This method yields roughly 80–90% accurate results; users should avoid speaking confidential information.
AI-Powered Responses: Text-based prompts are processed using the Gemini API (Gemini 1.5 Flash model). Art generation from prompts is handled through the Imagine API.
Image-Based Prompts: Users can send images as prompts. The AI analyzes the image and provides text-based responses with relevant information or insights about the image.
PDF Querying: Users can upload a single PDF file. The app extracts and processes the content, allowing users to query specific information from the PDF and receive text-based responses.
Speech-to-Text: The app converts spoken queries into text, allowing users to interact hands-free.
Text-to-Speech: Users can listen to AI responses, making information accessible and providing a conversational experience.
Visual Response Display: Responses are presented in rounded, animated containers for a modern and engaging UI. Includes error messages for issues like missing permissions or network errors.
Response Options: After receiving an AI-generated response, users can ask another question, listen to the AI response from start to finish, or clear previous interactions and reset the screen for new prompts.
New & History Prompts: In the new version, users can send prompts not only from the main screen but also from within a prompt section in the history.
Auto Prompt Title: When the user sends the first prompt of a new section, the app automatically chooses a title for that prompt section. Users who do not like the automatic title can replace it with a custom one.
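
The README does not say how the automatic title is chosen. As a minimal sketch of one simple possibility, the title could be built from the first few words of the first prompt. The function name autoTitle, the maxWords parameter, and the 'New Prompt' fallback are all hypothetical and are not taken from the repository.

```dart
/// Hypothetical helper: builds a short section title from the first prompt.
String autoTitle(String prompt, {int maxWords = 5}) {
  // Split on runs of whitespace and drop empty tokens.
  final words = prompt
      .trim()
      .split(RegExp(r'\s+'))
      .where((w) => w.isNotEmpty)
      .toList();

  // Fall back to a generic title when the prompt is blank.
  if (words.isEmpty) return 'New Prompt';

  final title = words.take(maxWords).join(' ');
  // Mark truncation so the user can tell the title was shortened.
  return words.length > maxWords ? '$title...' : title;
}

void main() {
  print(autoTitle('Explain how photosynthesis works in simple terms'));
  print(autoTitle('Hello world'));
}
```

A custom title entered by the user would simply replace the value returned here.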

Installation

To run this project locally:

Clone the repository: git clone https://github.com/ArpitAswal/Voice-Genie.git
Navigate to the project directory: cd Voice-Genie
Install dependencies: flutter pub get
Set up API keys (optional, depending on the external services used): Obtain API keys from Google AI Studio (for the Gemini API) and from the Imagine API.

Create a new file named lib/config.dart in the project directory.

Add the following code, replacing 'YOUR_API_KEY' with your own API key:

class Config { static const String apiKey = 'YOUR_API_KEY'; }

Alternatively, use the keys directly in the API networking calls.
Run the app: flutter run
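
Because the app uses two services, lib/config.dart may need one key per service. The sketch below is one possible layout, not the repository's actual file: it reads both keys at build time with String.fromEnvironment so they never need to be committed. The field names and the GEMINI_API_KEY and IMAGINE_API_KEY variable names are assumptions.

```dart
/// Sketch of lib/config.dart with one key per service.
/// Field names and environment variable names are assumptions.
class Config {
  // Both values default to an empty string when no value is supplied.
  static const String geminiApiKey = String.fromEnvironment('GEMINI_API_KEY');
  static const String imagineApiKey = String.fromEnvironment('IMAGINE_API_KEY');

  /// Returns the names of the keys that were not supplied at build time.
  static List<String> missingKeys() => [
        if (geminiApiKey.isEmpty) 'GEMINI_API_KEY',
        if (imagineApiKey.isEmpty) 'IMAGINE_API_KEY',
      ];
}

void main() {
  final missing = Config.missingKeys();
  final names = missing.join(', ');
  print(missing.isEmpty ? 'All API keys set' : 'Missing: $names');
}
```

With this layout the keys would be passed on the command line, for example: flutter run --dart-define=GEMINI_API_KEY=your_key --dart-define=IMAGINE_API_KEY=your_key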

Note

Users must grant the audio recording permission for the app to function, because it uses speech recognition to record the user's prompt.
The ImagineAPI service is sometimes unavailable. For differently styled image responses, try different style_id parameters; for more details, visit the ImagineAPI site.
Make sure you add your own API keys to run this program on your system. If you download the APK file directly from here, you cannot use the Imagine API service, because the author's account tokens have been used up.

Tech Stack

Flutter: The primary framework for building the mobile application.

Dart: The programming language used with Flutter.

Imagine API: Generates AI-driven image responses based on user prompts.

Gemini API: Processes text-based queries and provides informative answers.

Challenges

Speech-To-Text: Ensuring that it captures all of the user's words and sentences and performs well.
Text-To-Speech: Managing playback so that the app can speak either a single response or all the messages in a conversation.
Prompt History: Managing the new and old responses for a particular prompt section.
Handling Responses: Managing the UI state when a prompt request succeeds or fails and while data is being fetched from the API, and handling the listener while it speaks all the messages of the prompt.
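
One common way to keep such UI state manageable is to model each request as a small set of explicit states. The sketch below is illustrative only and assumes Dart 3 (sealed classes and switch expressions). The class names and messages are hypothetical and do not describe the app's actual state management.

```dart
/// Hypothetical UI states for one prompt request (requires Dart 3).
sealed class PromptState {
  const PromptState();
}

class Idle extends PromptState {
  const Idle();
}

class Loading extends PromptState {
  const Loading();
}

class Success extends PromptState {
  final String response;
  const Success(this.response);
}

class Failure extends PromptState {
  final String message;
  const Failure(this.message);
}

/// Maps each state to the text the screen would show.
/// The compiler checks that every subclass of PromptState is handled.
String describe(PromptState state) => switch (state) {
      Idle() => 'Tap the microphone to ask a question',
      Loading() => 'Fetching response...',
      Success(:final response) => response,
      Failure(:final message) => 'Error: $message',
    };

void main() {
  print(describe(const Loading()));
  print(describe(const Failure('No internet connection')));
}
```

Because the class is sealed, adding a new state later (for example, one for "speaking") causes a compile-time error in every switch that does not yet handle it.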

Future Enhancements

Image-Based Query Analysis: A new feature will allow users to upload images to the app. Gemini AI will then analyze the uploaded image, describing its contents to provide deeper insights or contextual explanations about the image.
Enhanced AI Art Capabilities: Future updates will improve the app’s art-based prompts with more creative or style-based responses to user queries.
Multi-File PDF Analysis: Enable users to upload multiple PDFs and query across all documents.

What's New

Image-Based Query Analysis: Users can now upload images to the app. Gemini AI analyzes the uploaded image and describes its contents to provide deeper insights or contextual explanations about the image.
Improved user experience through an enhanced UI, added animations, and reusable boilerplate widgets.
All previously reported issues have been resolved.

Contributing

Contributions are always welcome!

Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Make your changes and commit them (git commit -m 'Add new feature').
Push the changes to your fork (git push origin feature-branch).
Create a pull request.

Usage Flow

Starting the App: Voice Genie opens on a single main screen where users can immediately interact with the AI by pressing the microphone button.

Providing a Query: Users speak their questions or prompts. The app detects the type of request:

Text-Based: The app processes the query with Gemini AI to provide a textual answer.

Art/Image-Based: The Imagine API is used to generate a visual answer.

Displaying Results: The response, in either text or image form, appears in an animated container. Users can then ask a new question, listen to the responses through text-to-speech, or refresh the app to reset it for new queries.

Handling Errors: If permissions are missing or connectivity fails, the app displays clear, specific messages to guide users in troubleshooting.
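
The flow above says the app detects whether a request is text-based or art-based, but not how. As a rough sketch, a keyword heuristic could route each prompt to one of the two services. The keyword list, the function names, and the returned labels are hypothetical; the app's real detection rules are not documented here.

```dart
enum PromptType { text, art }

/// Hypothetical keyword list used only for this illustration.
const _artKeywords = [
  'draw',
  'paint',
  'sketch',
  'image of',
  'picture of',
  'generate art',
];

/// Classifies a prompt as art when it contains any art keyword.
/// This is a plain substring match, so 'withdraw' would also match 'draw'.
PromptType detectPromptType(String prompt) {
  final lower = prompt.toLowerCase();
  final isArt = _artKeywords.any((keyword) => lower.contains(keyword));
  return isArt ? PromptType.art : PromptType.text;
}

/// Names the service that would handle the prompt.
String route(String prompt) => switch (detectPromptType(prompt)) {
      PromptType.text => 'Gemini API',
      PromptType.art => 'Imagine API',
    };

void main() {
  print(route('What is the capital of France?'));
  print(route('Draw a cat wearing a hat'));
}
```

A production version would likely need word-boundary matching or a model-based classifier, since substring matching misclassifies some prompts.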

Feedback

If you have any feedback, please reach out to me at arpitaswal995@gmail.com

If you run into a problem, please open an issue in the GitHub repository.

Design Philosophy

Voice Genie is a sophisticated AI assistant designed to be an intuitive, accessible way for users to explore information and art with minimal effort. With future updates, it aims to become an even more interactive and personalized companion. Voice Genie continues to evolve into a versatile and intelligent assistant, offering seamless interaction across multiple data types while providing users with personalized and accurate responses.
