flutter_gemma 0.10.1

The plugin allows running the Gemma AI model locally on a device from a Flutter application. Includes support for Gemma 3 Nano models with optimized MediaPipe GenAI v0.10.24.

# Flutter Gemma

The plugin supports not only Gemma but also other models. The full list of supported models: Gemma 2B and Gemma 7B, Gemma-2 2B, Gemma-3 1B, Gemma 3 Nano 2B, Gemma 3 Nano 4B, Phi-2, Phi-3, Phi-4, DeepSeek, Qwen2.5-1.5B-Instruct, Falcon-RW-1B, StableLM-3B.

Note: Currently, the flutter_gemma plugin supports Gemma-3, Gemma 3 Nano (with multimodal vision support), Phi-4, DeepSeek, and Qwen2.5.

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.


Bring the power of Google's lightweight Gemma language models directly to your Flutter applications. With Flutter Gemma, you can seamlessly incorporate advanced AI capabilities into your iOS and Android apps, all without relying on external servers.

An example of the plugin in use:

[Demo animation: gemma_github_gif]

Features

  • Local Execution: Run Gemma models directly on user devices for enhanced privacy and offline functionality.
  • Platform Support: Compatible with iOS, Android, and Web platforms.
  • 🖼️ Multimodal Support: Text + Image input with Gemma 3 Nano vision models (NEW!)
  • 🛠️ Function Calling: Enable your models to call external functions and integrate with other services (supported by select models)
  • 🧠 Thinking Mode: View the reasoning process of DeepSeek models with interactive thinking bubbles
  • LoRA Support: Efficient fine-tuning and integration of LoRA (Low-Rank Adaptation) weights for tailored AI behavior.

Model Feature Support

| Model Family | Function Calling | Thinking Mode | Multimodal (Vision) | Notes |
|---|---|---|---|---|
| Gemma 3 Nano | ✅ | ❌ | ✅ | Full vision + function calling support |
| Gemma-3 1B | ❌ | ❌ | ❌ | Text-only models |
| Gemma-2 | ❌ | ❌ | ❌ | Text-only models |
| DeepSeek | ✅ | ✅ | ❌ | Both function calling and thinking mode |
| Qwen2.5 | ✅ | ❌ | ❌ | Function calling support |

Installation

  1. Add flutter_gemma to your pubspec.yaml:

    dependencies:
      flutter_gemma: ^0.10.1
    
  2. Run flutter pub get to install.

Setup

  1. Download the model and, optionally, LoRA weights: obtain a pre-trained Gemma model (recommended: 2b or 2b-it) from Kaggle.
  2. Platform-specific setup:

iOS

  • Set minimum iOS version in Podfile:
platform :ios, '16.0'  # Required for MediaPipe GenAI
  • Enable file sharing in Info.plist:
<key>UIFileSharingEnabled</key>
<true/>
  • Add network access description in Info.plist (for development):
<key>NSLocalNetworkUsageDescription</key>
<string>This app requires local network access for model inference services.</string>
  • Enable performance optimization in Info.plist (optional):
<key>CADisableMinimumFrameDurationOnPhone</key>
<true/>
  • Add memory entitlements in Runner.entitlements (for large models):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>com.apple.developer.kernel.extended-virtual-addressing</key>
	<true/>
	<key>com.apple.developer.kernel.increased-memory-limit</key>
	<true/>
	<key>com.apple.developer.kernel.increased-debugging-memory-limit</key>
	<true/>
</dict>
</plist>
  • Change the linking type of pods to static in Podfile:
use_frameworks! :linkage => :static

Android

  • To run the model on the GPU, you need to add OpenCL support to AndroidManifest.xml. If you plan to use only the CPU, you can skip this step.

Add the following to AndroidManifest.xml, above the closing </application> tag:

 <uses-native-library
     android:name="libOpenCL.so"
     android:required="false"/>
 <uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
 <uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>

Web

  • Web currently supports only GPU backend models; CPU backend models are not yet supported by MediaPipe.

  • Multimodal support (images) is in development for the web platform.

  • Add dependencies to the index.html file in the web folder:

  <script type="module">
  import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';
  window.FilesetResolver = FilesetResolver;
  window.LlmInference = LlmInference;
  </script>

Usage

The API splits functionality into two parts:

  • ModelFileManager: Manages model and LoRA weights file handling.
  • InferenceModel: Handles model initialization and response generation.

To get started:

  • Import and access the plugin:
import 'package:flutter_gemma/flutter_gemma.dart';

final gemma = FlutterGemmaPlugin.instance;
  • Managing Model Files with ModelFileManager
final modelManager = gemma.modelManager;

Place the model in the assets or upload it to a network drive, such as Firebase.

Attention: You do not need to load the model every time the application starts. It is stored in the system files, so loading only needs to be done once. Please review the example application carefully. Use the install and download methods shown below (such as installModelFromAsset and downloadModelFromNetwork) only when you need to copy the model to the device.
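
The "load once" advice above can be sketched as a small guard. This is only an illustration: installOnce and its two callbacks are hypothetical names, not part of the plugin. How you decide whether the model is already on the device (for example, a flag you persist yourself) is an assumption left to the caller.

```dart
/// Hypothetical helper: runs [install] only when [isInstalled] reports false.
/// Returns true when an installation was actually performed.
Future<bool> installOnce({
  required Future<bool> Function() isInstalled,
  required Future<void> Function() install,
}) async {
  if (await isInstalled()) {
    return false; // Model is already on the device; skip the expensive copy.
  }
  await install();
  return true;
}
```

In an app, install could wrap one of the model manager calls described in this section, and isInstalled could read a flag saved after the first successful run.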


1. Loading Models from assets (available only in debug mode):

Don't forget to add your model to pubspec.yaml.

  1. Loading from assets (loraPath is optional)
await modelManager.installModelFromAsset('model.bin', loraPath: 'lora_weights.bin');
  2. Loading from assets with Progress Status (loraPath is optional)
modelManager.installModelFromAssetWithProgress('model.bin', loraPath: 'lora_weights.bin').listen(
  (progress) {
    print('Loading progress: $progress%');
  },
  onDone: () {
    print('Model loading complete.');
  },
  onError: (error) {
    print('Error loading model: $error');
  },
);
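
If you prefer to await completion instead of registering listen callbacks, a progress stream can be consumed with await for. The sketch below is plain Dart and assumes the stream emits integer percentages, as the print statement above suggests; waitForInstall and onProgress are hypothetical names, not plugin API.

```dart
/// Hypothetical helper: consumes a progress stream and returns the last
/// percentage seen. Stream errors propagate to the caller as exceptions.
Future<int> waitForInstall(
  Stream<int> progress, {
  void Function(int percent)? onProgress,
}) async {
  var last = 0;
  await for (final percent in progress) {
    last = percent;
    onProgress?.call(percent); // e.g. update a progress bar
  }
  return last;
}
```

Wrapping the call in try/catch then plays the role of the onError callback, and the code after the await plays the role of onDone.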

2. Loading Models from network:

  • For web usage, you will also need to enable CORS (Cross-Origin Resource Sharing) for your network resource. To enable CORS in Firebase, follow the guide in the Firebase documentation: Setting up CORS

  1. Loading from the network (loraUrl is optional)
await modelManager.downloadModelFromNetwork('https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin');
  2. Loading from the network with Progress Status (loraUrl is optional)
modelManager.downloadModelFromNetworkWithProgress('https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin').listen(
  (progress) {
    print('Loading progress: $progress%');
  },
  onDone: () {
    print('Model loading complete.');
  },
  onError: (error) {
    print('Error loading model: $error');
  },
);

3. Loading LoRA Weights

  1. Loading LoRA weights from the network
await modelManager.downloadLoraWeightsFromNetwork('https://example.com/lora_weights.bin');
  2. Loading LoRA weights from assets
await modelManager.installLoraWeightsFromAsset('lora_weights.bin');

4. Model Management

You can set the model and weights paths manually:
await modelManager.setModelPath('model.bin');
await modelManager.setLoraWeightsPath('lora_weights.bin');

You can delete the model and weights from the device. Deleting the model or LoRA weights will automatically close and clean up the inference. This ensures that there are no lingering resources or memory leaks when switching models or updating files.

await modelManager.deleteModel();
await modelManager.deleteLoraWeights();

5. Initialize:

Before performing any inference, you need to create a model instance. This ensures that your application is ready to handle requests efficiently.

Text-Only Models:

final inferenceModel = await FlutterGemmaPlugin.instance.createModel(
  modelType: ModelType.gemmaIt, // Required, model type to create
  preferredBackend: PreferredBackend.gpu, // Optional, backend type, default is PreferredBackend.gpu
  maxTokens: 512, // Optional, default is 1024
  loraRanks: [4, 8], // Optional, LoRA rank configuration for fine-tuned models
);

🖼️ Multimodal Models (NEW!):

final inferenceModel = await FlutterGemmaPlugin.instance.createModel(
  modelType: ModelType.gemmaIt, // Required, model type to create
  preferredBackend: PreferredBackend.gpu, // Optional, backend type
  maxTokens: 4096, // Recommended for multimodal models
  supportImage: true, // Enable image support
  maxNumImages: 1, // Optional, maximum number of images per message
  loraRanks: [4, 8], // Optional, LoRA rank configuration for fine-tuned models
);

6. Using Sessions for Single Inferences:

If you need to generate individual responses without maintaining a conversation history, use sessions. Sessions allow precise control over inference and must be properly closed to avoid memory leaks.

  1. Text-Only Session:
final session = await inferenceModel.createSession(
  temperature: 1.0, // Optional, default: 0.8
  randomSeed: 1, // Optional, default: 1
  topK: 1, // Optional, default: 1
  // topP: 0.9, // Optional nucleus sampling parameter
  // loraPath: 'path/to/lora.bin', // Optional LoRA weights path
  // enableVisionModality: true, // Enable vision for multimodal models
);

await session.addQueryChunk(Message.text(text: 'Tell me something interesting', isUser: true));
String response = await session.getResponse();
print(response);

await session.close(); // Always close the session when done
  2. 🖼️ Multimodal Session (NEW!):
import 'dart:typed_data'; // For Uint8List

final session = await inferenceModel.createSession(
  enableVisionModality: true, // Enable image processing
);

// Text + Image message
final imageBytes = await loadImageBytes(); // Your image loading method
await session.addQueryChunk(Message.withImage(
  text: 'What do you see in this image?',
  imageBytes: imageBytes,
  isUser: true,
));

// Note: session.getResponse() returns String directly
String response = await session.getResponse();
print(response);

await session.close();
  3. Asynchronous Response Generation:
final session = await inferenceModel.createSession();
await session.addQueryChunk(Message.text(text: 'Tell me something interesting', isUser: true));

// Note: session.getResponseAsync() returns Stream<String>
try {
  // Consume the whole stream before closing the session
  await for (final String token in session.getResponseAsync()) {
    print(token);
  }
  print('Stream closed');
} catch (error) {
  print('Error: $error');
} finally {
  await session.close(); // Close only after the stream has finished
}
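
Streamed responses arrive token by token, so UI code usually needs to accumulate them into the full text. A minimal sketch in plain Dart follows; collectTokens and onPartial are hypothetical names, and the only assumption about the plugin is that the response stream is a Stream<String>, as noted above.

```dart
/// Hypothetical helper: concatenates streamed tokens into the full response.
/// [onPartial] receives the text accumulated so far after every token.
Future<String> collectTokens(
  Stream<String> tokens, {
  void Function(String partial)? onPartial,
}) async {
  final buffer = StringBuffer();
  await for (final token in tokens) {
    buffer.write(token);
    onPartial?.call(buffer.toString()); // e.g. refresh a text widget
  }
  return buffer.toString();
}
```

Because the function returns only after the stream ends, the caller can safely close the session once the returned future completes.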

7. Chat Scenario with Automatic Session Management

For chat-based applications, you can create a chat instance. Unlike sessions, the chat instance manages the conversation context and refreshes sessions when necessary.

Text-Only Chat:

final chat = await inferenceModel.createChat(
  temperature: 0.8, // Controls response randomness, default: 0.8
  randomSeed: 1, // Ensures reproducibility, default: 1
  topK: 1, // Limits vocabulary scope, default: 1
  // topP: 0.9, // Optional nucleus sampling parameter
  // tokenBuffer: 256, // Token buffer size, default: 256
  // loraPath: 'path/to/lora.bin', // Optional LoRA weights path
  // supportImage: false, // Enable image support, default: false
  // tools: [], // List of available tools, default: []
  // supportsFunctionCalls: false, // Enable function calling, default: false
  // isThinking: false, // Enable thinking mode, default: false
  // modelType: ModelType.gemmaIt, // Model type, default: ModelType.gemmaIt
);

🖼️ Multimodal Chat (NEW!):

final chat = await inferenceModel.createChat(
  temperature: 0.8, // Controls response randomness
  randomSeed: 1, // Ensures reproducibility
  topK: 1, // Limits vocabulary scope
  supportImage: true, // Enable image support in chat
  // tokenBuffer: 256, // Token buffer size for context management
);

🧠 Thinking Mode Chat (DeepSeek Models):

final chat = await inferenceModel.createChat(
  temperature: 0.8,
  randomSeed: 1,
  topK: 1,
  isThinking: true, // Enable thinking mode for DeepSeek models
  modelType: ModelType.deepSeek, // Specify DeepSeek model type
  // supportsFunctionCalls: true, // Enable function calling for DeepSeek models
);
  1. Synchronous Chat:
await chat.addQueryChunk(Message.text(text: 'User: Hello, who are you?', isUser: true));
ModelResponse response = await chat.generateChatResponse();
if (response is TextResponse) {
  print(response.token);
}

await chat.addQueryChunk(Message.text(text: 'User: Are you sure?', isUser: true));
ModelResponse response2 = await chat.generateChatResponse();
if (response2 is TextResponse) {
  print(response2.token);
}
  2. 🖼️ Multimodal Chat Example:
// Add text message
await chat.addQueryChunk(Message.text(text: 'Hello!', isUser: true));
ModelResponse response1 = await chat.generateChatResponse();
if (response1 is TextResponse) {
  print(response1.token);
}

// Add image message
final imageBytes = await loadImageBytes();
await chat.addQueryChunk(Message.withImage(
  text: 'Can you analyze this image?',
  imageBytes: imageBytes,
  isUser: true,
));
ModelResponse response2 = await chat.generateChatResponse();
if (response2 is TextResponse) {
  print(response2.token);
}

// Add image-only message
await chat.addQueryChunk(Message.imageOnly(imageBytes: imageBytes, isUser: true));
ModelResponse response3 = await chat.generateChatResponse();
if (response3 is TextResponse) {
  print(response3.token);
}
  3. Asynchronous Chat (Streaming):
await chat.addQueryChunk(Message.text(text: 'User: Hello, who are you?', isUser: true));

chat.generateChatResponseAsync().listen((ModelResponse response) {
  if (response is TextResponse) {
    print(response.token);
  } else if (response is FunctionCallResponse) {
    print('Function call: ${response.name}');
  } else if (response is ThinkingResponse) {
    print('Thinking: ${response.content}');
  }
}, onDone: () {
  print('Chat stream closed');
}, onError: (error) {
  print('Chat error: $error');
});
8. 🛠️ Function Calling

Enable your models to call external functions and integrate with other services. Note: Function calling is only supported by specific models - see the Model Support section below.

Step 1: Define Tools

Tools define the functions your model can call:

final List<Tool> _tools = [
  const Tool(
    name: 'change_background_color',
    description: "Changes the background color of the app. The color should be a standard web color name like 'red', 'blue', 'green', 'yellow', 'purple', or 'orange'.",
    parameters: {
      'type': 'object',
      'properties': {
        'color': {
          'type': 'string',
          'description': 'The color name',
        },
      },
      'required': ['color'],
    },
  ),
  const Tool(
    name: 'show_alert',
    description: 'Shows an alert dialog with a custom message and title.',
    parameters: {
      'type': 'object',
      'properties': {
        'title': {
          'type': 'string',
          'description': 'The title of the alert dialog',
        },
        'message': {
          'type': 'string',
          'description': 'The message content of the alert dialog',
        },
      },
      'required': ['title', 'message'],
    },
  ),
];
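
Because each tool's parameters map follows a JSON-Schema-like shape with a 'required' list, you can check a model's arguments against it before running anything. The sketch below is plain Dart and does not use any plugin types; missingRequiredArgs is a hypothetical helper name.

```dart
/// Hypothetical helper: returns the names listed under 'required' in
/// [parameters] that are absent (or null) in [args].
List<String> missingRequiredArgs(
  Map<String, dynamic> parameters,
  Map<String, dynamic> args,
) {
  final required =
      (parameters['required'] as List?)?.cast<String>() ?? const <String>[];
  return [
    for (final name in required)
      if (args[name] == null) name,
  ];
}
```

A non-empty result is a good reason to send an error response back to the model instead of executing the function.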

Step 2: Create Chat with Tools

final chat = await inferenceModel.createChat(
  temperature: 0.8,
  randomSeed: 1,
  topK: 1,
  tools: _tools, // Pass your tools
  supportsFunctionCalls: true, // Enable function calling (required for tools)
  // tokenBuffer: 256, // Adjust if needed for function calling
);

Step 3: Handle Different Response Types

The model can now return two types of responses:

// Add user message
await chat.addQueryChunk(Message.text(text: 'Change the background to blue', isUser: true));

// Handle async responses
chat.generateChatResponseAsync().listen((response) {
  if (response is TextResponse) {
    // Regular text token from the model
    print('Text: ${response.token}');
    // Update your UI with the text
  } else if (response is FunctionCallResponse) {
    // Model wants to call a function
    print('Function Call: ${response.name}(${response.args})');
    _handleFunctionCall(response);
  }
});

Step 4: Execute Function and Send Response Back

Future<void> _handleFunctionCall(FunctionCallResponse functionCall) async {
  // Execute the requested function
  Map<String, dynamic> toolResponse;
  
  switch (functionCall.name) {
    case 'change_background_color':
      final color = functionCall.args['color'] as String?;
      // Your implementation here
      toolResponse = {'status': 'success', 'message': 'Color changed to $color'};
      break;
    case 'show_alert':
      final title = functionCall.args['title'] as String?;
      final message = functionCall.args['message'] as String?;
      // Show alert dialog
      toolResponse = {'status': 'success', 'message': 'Alert shown'};
      break;
    default:
      toolResponse = {'error': 'Unknown function: ${functionCall.name}'};
  }
  
  // Send the tool response back to the model
  final toolMessage = Message.toolResponse(
    toolName: functionCall.name,
    response: toolResponse,
  );
  await chat.addQueryChunk(toolMessage);
  
  // The model will then generate a final response explaining what it did
  final finalResponse = await chat.generateChatResponse();
  if (finalResponse is TextResponse) {
    print('Model: ${finalResponse.token}');
  }
}

Function Calling Best Practices:

  • Use descriptive function names and clear descriptions
  • Specify required vs optional parameters
  • Always handle function execution errors gracefully
  • Send meaningful responses back to the model
  • The model will only call functions when explicitly requested by the user
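
One way to follow the error-handling advice above is to route every call through a dispatcher that always returns a response map, even when the function is unknown or throws. This is a sketch in plain Dart under stated assumptions: ToolHandler and dispatchTool are hypothetical names, and the map shapes mirror the toolResponse maps used in Step 4 rather than any format the plugin requires.

```dart
/// Hypothetical signature for a tool implementation.
typedef ToolHandler = Future<Map<String, dynamic>> Function(
    Map<String, dynamic> args);

/// Looks up [name] in [handlers], runs it, and converts failures into an
/// error map so the model always receives a meaningful response.
Future<Map<String, dynamic>> dispatchTool(
  Map<String, ToolHandler> handlers,
  String name,
  Map<String, dynamic> args,
) async {
  final handler = handlers[name];
  if (handler == null) {
    return {'error': 'Unknown function: $name'};
  }
  try {
    return await handler(args);
  } catch (e) {
    return {'error': 'Function $name failed: $e'};
  }
}
```

The returned map can then be passed as the response of a tool response message, so a failing function never leaves the model waiting without an answer.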
9. 🧠 Thinking Mode (DeepSeek Models)

DeepSeek models support "thinking mode" where you can see the model's reasoning process before it generates the final response. This provides transparency into how the model approaches problems.

Enable Thinking Mode:

final chat = await inferenceModel.createChat(
  temperature: 0.8,
  randomSeed: 1,
  topK: 1,
  isThinking: true, // Enable thinking mode
  modelType: ModelType.deepSeek, // Required for DeepSeek models
  supportsFunctionCalls: true, // DeepSeek also supports function calls
  tools: _tools, // Optional: add tools for function calling
  // tokenBuffer: 256, // Token buffer for context management
);

Handle Thinking Responses:

chat.generateChatResponseAsync().listen((response) {
  if (response is ThinkingResponse) {
    // Model's reasoning process
    print('Model is thinking: ${response.content}');
    // Show thinking bubble in UI
    _showThinkingBubble(response.content);
    
  } else if (response is TextResponse) {
    // Final response after thinking
    print('Final answer: ${response.token}');
    _updateFinalResponse(response.token);
    
  } else if (response is FunctionCallResponse) {
    // DeepSeek can also call functions while thinking
    print('Function call: ${response.name}');
    _handleFunctionCall(response);
  }
});

Thinking Mode Features:

  • ✅ Transparent Reasoning: See how the model thinks through problems
  • ✅ Interactive UI: Show/hide thinking bubbles with expandable content
  • ✅ Streaming Support: Thinking content streams in real-time
  • ✅ Function Integration: Models can think before calling functions
  • ✅ DeepSeek Optimized: Designed specifically for DeepSeek model architecture

Example Thinking Flow:

  1. User asks: "Change the background to blue and explain why blue is calming"

  2. Model thinks: "I need to change the color first, then explain the psychology"

  3. Model calls: change_background_color(color: 'blue')

  4. Model explains: "Blue is calming because it's associated with sky and ocean..."

10. Checking Token Usage

You can check the token size of a prompt before inference. The accumulated context should not exceed maxTokens to ensure smooth operation.

int tokenCount = await session.sizeInTokens('Your prompt text here');
print('Prompt size in tokens: $tokenCount');
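
The token count can feed a simple budget check before you send a prompt. The arithmetic below is a sketch, not the plugin's own logic: fitsInContext is a hypothetical helper, and the default reserve of 256 merely mirrors the tokenBuffer default shown in the chat options as an assumed amount of headroom for the reply.

```dart
/// Hypothetical helper: returns true when [promptTokens] still fit alongside
/// [usedTokens], leaving [reserve] tokens of headroom for the model's reply.
bool fitsInContext({
  required int promptTokens,
  required int usedTokens,
  required int maxTokens,
  int reserve = 256,
}) {
  return usedTokens + promptTokens + reserve <= maxTokens;
}
```

For example, with maxTokens of 1024 and 500 tokens already used, a 200-token prompt fits (956 in total), while a 300-token prompt does not (1056 in total).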
11. Closing the Model

When you no longer need to perform any further inferences, call the close method to release resources:

await inferenceModel.close();

If you need to use the inference again later, remember to call createModel again before generating responses.

🖼️ Message Types (NEW!)

The plugin now supports different types of messages:

// Text only
final textMessage = Message.text(text: "Hello!", isUser: true);

// Text + Image
final multimodalMessage = Message.withImage(
  text: "What's in this image?",
  imageBytes: imageBytes,
  isUser: true,
);

// Image only
final imageMessage = Message.imageOnly(imageBytes: imageBytes, isUser: true);

// Tool response (for function calling)
final toolMessage = Message.toolResponse(
  toolName: 'change_background_color',
  response: {'status': 'success', 'color': 'blue'},
);

// System information message
final systemMessage = Message.systemInfo(text: "Function completed successfully");

// Thinking content (for DeepSeek models)
final thinkingMessage = Message.thinking(text: "Let me analyze this problem...");

// Check if message contains image
if (message.hasImage) {
  print('This message contains an image');
}

// Create a copy of message
final copiedMessage = message.copyWith(text: "Updated text");

💬 Response Types (NEW!)

The model can return different types of responses depending on capabilities:

// Handle different response types
chat.generateChatResponseAsync().listen((response) {
  if (response is TextResponse) {
    // Regular text token from the model
    print('Text token: ${response.token}');
    // Use response.token to update your UI incrementally
    
  } else if (response is FunctionCallResponse) {
    // Model wants to call a function (Gemma 3 Nano, DeepSeek, Qwen2.5)
    print('Function: ${response.name}');
    print('Arguments: ${response.args}');
    
    // Execute the function and send response back
    _handleFunctionCall(response);
  } else if (response is ThinkingResponse) {
    // Model's reasoning process (DeepSeek models only)
    print('Thinking: ${response.content}');
    
    // Show thinking process in UI
    _showThinkingBubble(response.content);
  }
});

Response Types:

  • TextResponse: Contains a text token (response.token) for regular model output
  • FunctionCallResponse: Contains function name (response.name) and arguments (response.args) when the model wants to call a function
  • ThinkingResponse: Contains the model's reasoning process (response.content) for DeepSeek models with thinking mode enabled

🎯 Supported Models

Text-Only Models

🖼️ Multimodal Models (Vision + Text)

🛠️ Model Function Calling Support

Function calling is currently supported by the following models:

✅ Models with Function Calling Support

  • Gemma 3 Nano models (E2B, E4B) - Full function calling support
  • DeepSeek models - Function calling + thinking mode support
  • Qwen models - Full function calling support

❌ Models WITHOUT Function Calling Support

  • Gemma 3 1B models - Text generation only
  • Phi models - Text generation only

Important Notes:

  • When using unsupported models with tools, the plugin will log a warning and ignore the tools
  • Models will work normally for text generation even if function calling is not supported
  • Check the supportsFunctionCalls property in your model configuration

🌐 Platform Support

| Feature | Android | iOS | Web |
|---|---|---|---|
| Text Generation | ✅ | ✅ | ✅ |
| Image Input | ✅ | ✅ | ⚠️ |
| Function Calling | ✅ | ✅ | ✅ |
| GPU Acceleration | ✅ | ✅ | ✅ |
| Streaming Responses | ✅ | ✅ | ✅ |
| LoRA Support | ✅ | ✅ | ✅ |

  • ✅ = Fully supported
  • ⚠️ = In development

You can find a complete example in the example folder.

Important Considerations

  • Model Size: Larger models (such as 7b and 7b-it) might be too resource-intensive for on-device inference.
  • Function Calling Support: Gemma 3 Nano, DeepSeek, and Qwen2.5 models support function calling. Other models will ignore tools and log a warning.
  • Thinking Mode: Only DeepSeek models support thinking mode. Enable with isThinking: true and modelType: ModelType.deepSeek.
  • Multimodal Models: Gemma 3 Nano models with vision support require more memory and are recommended for devices with 8GB+ RAM.
  • iOS Memory Requirements: Large models require memory entitlements in Runner.entitlements and minimum iOS 16.0.
  • LoRA Weights: They provide efficient customization without the need for full model retraining.
  • Development vs. Production: For production apps, do not embed the model or LoRA weights within your assets. Instead, load them once and store them securely on the device or via a network drive.
  • Web Models: Currently, Web support is available only for GPU backend models. Multimodal support is in development.
  • Image Formats: The plugin automatically handles common image formats (JPEG, PNG, etc.) when using Message.withImage().
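
If you want to reject unsupported files before building an image message, you can inspect the file signature yourself. The sketch below is plain Dart and checks only the standard PNG and JPEG magic bytes; sniffImageFormat is a hypothetical helper and says nothing about how the plugin detects formats internally.

```dart
import 'dart:typed_data';

/// Hypothetical helper: returns 'png' or 'jpeg' based on the file signature,
/// or null when the bytes match neither format.
String? sniffImageFormat(Uint8List bytes) {
  const png = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
  const jpeg = [0xFF, 0xD8, 0xFF];

  // True when [bytes] begins with every byte of [signature].
  bool startsWith(List<int> signature) {
    if (bytes.length < signature.length) return false;
    for (var i = 0; i < signature.length; i++) {
      if (bytes[i] != signature[i]) return false;
    }
    return true;
  }

  if (startsWith(png)) return 'png';
  if (startsWith(jpeg)) return 'jpeg';
  return null;
}
```

A null result is a cue to show the user an error rather than passing the bytes to the model.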

🛟 Troubleshooting

Multimodal Issues:

  • Ensure you're using a multimodal model (Gemma 3 Nano E2B/E4B)
  • Set supportImage: true when creating both the model and the chat
  • Check device memory - multimodal models require more RAM

Performance:

  • Use GPU backend for better performance with multimodal models
  • Consider using CPU backend for text-only models on lower-end devices

Memory Issues:

  • iOS: Ensure Runner.entitlements contains memory entitlements (see iOS setup)
  • iOS: Set minimum platform to iOS 16.0 in Podfile
  • Reduce maxTokens if experiencing memory issues
  • Use smaller models (1B-2B parameters) for devices with <6GB RAM
  • Close sessions and models when not needed
  • Monitor token usage with sizeInTokens()

iOS Build Issues:

  • Ensure minimum iOS version is set to 16.0 in Podfile
  • Use static linking: use_frameworks! :linkage => :static
  • Clean and reinstall pods: cd ios && pod install --repo-update
  • Check that all required entitlements are in Runner.entitlements

Advanced Usage

ModelThinkingFilter (Advanced)

For advanced users who need to manually process model responses, the ModelThinkingFilter class provides utilities for cleaning model outputs:

import 'package:flutter_gemma/core/extensions.dart';

// Clean response based on model type
String cleanedResponse = ModelThinkingFilter.cleanResponse(
  rawResponse, 
  ModelType.deepSeek
);

// The filter automatically removes model-specific tokens like:
// - <end_of_turn> tags (Gemma models)
// - Special DeepSeek tokens
// - Extra whitespace and formatting

This is automatically handled by the chat API, but can be useful for custom inference implementations.
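
For a custom inference pipeline that only needs the Gemma end-of-turn marker removed, the cleanup can be approximated in a few lines of plain Dart. This is a simplified stand-in, not the implementation of ModelThinkingFilter: stripEndOfTurn is a hypothetical name, and it handles only the <end_of_turn> tag and surrounding whitespace.

```dart
/// Hypothetical helper: removes Gemma-style <end_of_turn> markers and trims
/// leading and trailing whitespace from a raw model response.
String stripEndOfTurn(String raw) {
  return raw.replaceAll('<end_of_turn>', '').trim();
}
```

Model-specific tokens for other families, such as DeepSeek, are not covered by this sketch.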

🚀 What's New

✅ 🛠️ Advanced Function Calling - Enable your models to call external functions and integrate with other services (Gemma 3 Nano, DeepSeek, and Qwen2.5 models)
✅ 🧠 Thinking Mode - View the reasoning process of DeepSeek models with interactive thinking bubbles
✅ 💬 Enhanced Response Types - New TextResponse, FunctionCallResponse, and ThinkingResponse types for better handling
✅ 🖼️ Multimodal Support - Text + Image input with Gemma 3 Nano models
✅ 📨 Enhanced Message API - Support for different message types including tool responses
✅ ⚙️ Simplified Setup - Automatic vision modality configuration
✅ 🌐 Cross-Platform - Works on Android, iOS, and Web (text-only)
✅ 💾 Memory Optimization - Better resource management for multimodal models

Coming Soon:

  • Enhanced web platform support for images
  • More multimodal model support
  • Video/Audio input capabilities

Publisher

mobilepeople.dev (verified publisher)


Dependencies

flutter, flutter_web_plugins, large_file_handler, path, path_provider, plugin_platform_interface, shared_preferences
