Flutter MediaPipe Chat lets the Flutter community run and test AI models locally on Android and iOS, building on Google MediaPipe. All processing happens on the device, so optimized models run natively on the phone without server round trips. It also supports deploying custom and LoRA-tuned models, giving developers greater control and opening the door to innovative applications.
- Android: minSdkVersion 24 (required by MediaPipe).
- iOS: iOS 13.0 or later.
To add the package from the console, run:

```bash
flutter pub add flutter_mediapipe_chat
```

Or add it manually to your `pubspec.yaml`:

```yaml
dependencies:
  flutter:
    sdk: flutter
  flutter_mediapipe_chat: ^1.0.0
```

Then run:

```bash
flutter pub get
```
In your `android/app/build.gradle`, make sure to include:

```groovy
android {
    defaultConfig {
        minSdkVersion 24
    }
}
```
In `AndroidManifest.xml` (usually `android/app/src/main/AndroidManifest.xml`), after the `</activity>` tag, add:

```xml
<uses-native-library android:name="libOpenCL.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>
```
After installation and setup, you can start using `flutter_mediapipe_chat`:
- Loading the Model
```dart
import 'package:flutter_mediapipe_chat/flutter_mediapipe_chat.dart';

final chatPlugin = FlutterMediapipeChat();

final config = ModelConfig(
  path: "assets/models/gemma-2b-it-gpu-int8.bin",
  temperature: 0.7,
  maxTokens: 1024,
  topK: 50,
  randomSeed: 42,
  loraPath: null,
);

await chatPlugin.loadModel(config);
```
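If the model is bundled as a Flutter asset (as the example path suggests), it must also be declared under `flutter: assets:` in `pubspec.yaml`; a minimal sketch:

```yaml
flutter:
  assets:
    # Bundled model file referenced by ModelConfig.path above.
    - assets/models/gemma-2b-it-gpu-int8.bin
```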
- Generate Responses (Synchronous)
```dart
String? response = await chatPlugin.generateResponse("Hello, how are you?");
if (response != null) {
  print("Model Response: $response");
} else {
  print("No response from model.");
}
```
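Since the call crosses into native code, it can fail (for example, if no model has been loaded). A defensive sketch, assuming the plugin reports native failures through the standard `PlatformException` used by Flutter platform channels:

```dart
import 'package:flutter/services.dart' show PlatformException;
import 'package:flutter_mediapipe_chat/flutter_mediapipe_chat.dart';

Future<String> safeGenerate(FlutterMediapipeChat plugin, String prompt) async {
  try {
    final response = await plugin.generateResponse(prompt);
    return response ?? "No response from model.";
  } on PlatformException catch (e) {
    // Assumed error path: the exact error codes depend on the plugin's native side.
    return "Inference failed: ${e.message}";
  }
}
```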
- Generate Responses (Streaming)
```dart
chatPlugin
    .generateResponseAsync("Tell me a story about a brave knight.")
    .listen((token) {
  if (token == null) {
    print("Stream ended.");
  } else {
    print("Token: $token");
  }
});
```
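In a UI, the streamed tokens are typically accumulated and rendered as they arrive. A minimal sketch of that pattern, assuming the same `generateResponseAsync` stream shown above (the widget structure and names are illustrative):

```dart
import 'package:flutter/material.dart';
import 'package:flutter_mediapipe_chat/flutter_mediapipe_chat.dart';

class ChatReply extends StatefulWidget {
  const ChatReply({super.key, required this.chat, required this.prompt});

  final FlutterMediapipeChat chat; // plugin instance with a model already loaded
  final String prompt;

  @override
  State<ChatReply> createState() => _ChatReplyState();
}

class _ChatReplyState extends State<ChatReply> {
  final _reply = StringBuffer();

  @override
  void initState() {
    super.initState();
    widget.chat.generateResponseAsync(widget.prompt).listen((token) {
      if (token == null || !mounted) return; // a null token marks the end of the stream
      setState(() => _reply.write(token)); // append each token as it arrives
    });
  }

  @override
  Widget build(BuildContext context) => Text(_reply.toString());
}
```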
Inside the `example/` folder there is a demo project showing model loading, a chat interface, and both synchronous and asynchronous text generation. To run it:

```bash
cd example
flutter run
```
This plugin brings local, on-device LLM inference (via the MediaPipe framework) to Flutter on Android and iOS, removing the need for cloud services.
Note: The MediaPipe LLM Inference API is experimental and under active development. Use of this API is subject to the Generative AI Prohibited Use Policy.
- **Local Inference**: Avoids network dependencies by running models entirely on-device.
- **Cross-Platform**: Compatible with Android (API 24+) and iOS (13.0+).
- **Flexible Generation**: Choose between synchronous and asynchronous response modes.
- **Advanced Model Customization**: Adjust parameters like `temperature`, `maxTokens`, `topK`, `randomSeed`, and optional LoRA configurations.
- **GPU/CPU Variants**: Select between CPU- and GPU-optimized models (if the device supports them); see the sketch below.
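A hedged sketch of switching between CPU- and GPU-optimized model files via `ModelConfig.path` (the file names are illustrative, not shipped with the plugin):

```dart
import 'package:flutter_mediapipe_chat/flutter_mediapipe_chat.dart';

// Illustrative file names; use whichever variants you actually downloaded.
const cpuModel = "assets/models/gemma-2b-it-cpu-int8.bin";
const gpuModel = "assets/models/gemma-2b-it-gpu-int8.bin";

Future<void> loadFor(FlutterMediapipeChat plugin, {required bool useGpu}) {
  // Only `path` is required; the other ModelConfig fields fall back to their defaults.
  return plugin.loadModel(ModelConfig(path: useGpu ? gpuModel : cpuModel));
}
```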
- **Gemma-2 2B** (2 billion parameters): CPU/GPU int8 variants.
- **Gemma 2B** (2 billion parameters): CPU/GPU int4/int8 variants.
- **Gemma 7B** (7 billion parameters, Web only on high-end devices): GPU int8 variant.
Download these `.bin` models from Kaggle (Gemma) and load them with `FlutterMediapipeChat`.
- Falcon-1B
- StableLM-3B
- Phi-2
They require a script to convert them to `.bin` or `.tflite`. Check out the AI Edge Torch Generative Library for PyTorch conversions.
If you have a custom PyTorch model, convert it using AI Edge Torch Generative:
- Export your PyTorch model to `.tflite`.
- Combine the `.tflite` file with the tokenizer parameters into a single `.task` bundle.
- Provide the path in `ModelConfig.path`, as in the sketch below.
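Once bundled, the `.task` file is loaded the same way as a `.bin` model. A minimal sketch, assuming the bundle has been added to the app's assets (the file name is hypothetical):

```dart
final config = ModelConfig(
  path: "assets/models/my_custom_model.task", // .task bundle produced above
);
await chatPlugin.loadModel(config);
```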
LoRA (Low-Rank Adaptation) allows inexpensive fine-tuning of large models by training only small low-rank update matrices instead of the full weights. It’s available on GPU backends for:
- Gemma (2B, Gemma-2 2B)
- Phi-2
- Train LoRA weights for your base model.
- Convert them with the MediaPipe library specifying the LoRA checkpoint, rank, and GPU backend.
Then reference both files in `ModelConfig`:

```dart
final config = ModelConfig(
  path: "assets/models/base_model_gpu.bin",
  loraPath: "assets/models/lora_model_gpu.bin",
  temperature: 0.8,
  maxTokens: 1024,
  topK: 40,
);
```
Note: LoRA is only supported with GPU models (`.bin` or `.tflite`), not CPU.
| Field | Type | Default | Description |
|---|---|---|---|
| `path` | String | Required | Path to the base model file (`.bin` or `.task`). |
| `temperature` | double | 0.8 | Controls randomness/creativity. |
| `maxTokens` | int | 1024 | Maximum number of tokens (input + output). |
| `topK` | int | 40 | Limits predictions to the K most probable tokens. |
| `randomSeed` | int | 0 | Seed for random text generation. |
| `loraPath` | String? | null | Path to LoRA weights (GPU models only). |
| `supportedLoraRanks` | List? | null | Specific LoRA ranks (advanced usage). |
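Putting the table together, a hedged sketch of a fully specified configuration (the values are illustrative, and the constructor is assumed to accept exactly the named parameters listed above):

```dart
final config = ModelConfig(
  path: "assets/models/gemma-2b-it-gpu-int8.bin", // required
  temperature: 0.8,          // default 0.8
  maxTokens: 1024,           // shared input + output token budget
  topK: 40,                  // sample from the 40 most probable tokens
  randomSeed: 0,             // seed for random text generation
  loraPath: null,            // path to LoRA weights (GPU models only)
  supportedLoraRanks: null,  // advanced: specific LoRA ranks
);
```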
- Gemma-2 2B (8-bit, CPU/GPU)
- Gemma 2B (int4/int8, CPU/GPU)
- Gemma 7B (int8, GPU, Web Only)
Also consider Phi-2, StableLM-3B, and Falcon-1B after conversion.
Contributions are welcome! Send pull requests on GitHub. For questions or feature requests, open an issue in the repository’s tracker.
Licensed under MIT (see LICENSE). Third-party models (e.g., Falcon, StableLM, Phi-2) are not Google services. Make sure to comply with their licenses.
Use of MediaPipe LLM Inference is governed by the Generative AI Prohibited Use Policy.
Gemma is an open family of models derived from the same research as Gemini, subject to licensing terms on Kaggle.