Image feature extraction. #213

Open · wants to merge 3 commits into base: release/v2.1.2

Conversation

NOSCOPEdev

mmproj model loading and image feature extraction update. You will need to load a vision model and its mmproj file. The settings are in the "LLM.cs" script under the "Advanced Options". You will also need llamalib 1.17 or higher.

Model used: llava-v1.6-mistral-7b.Q4_K_M, mmproj-model-f16
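
For illustration, a minimal sketch of what this setup might look like if driven from code; the MMPROJmodel field name comes from this PR's LLM.cs changes, while the model field and file names are assumptions based on the models listed above:

// Sketch only: point the LLM component at a vision model and its mmproj file
// (normally set in the inspector under "Advanced Options").
public LLM llm;

void ConfigureVision()
{
    llm.model = "llava-v1.6-mistral-7b.Q4_K_M.gguf";  // vision-capable base model
    llm.MMPROJmodel = "mmproj-model-f16.gguf";        // multimodal projector file
}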
amakropoulos self-requested a review on August 20, 2024, 16:33
amakropoulos (Collaborator) left a comment:

Thanks a lot for this PR!!
It needs some work before it can be merged; I have left some comments.

@@ -0,0 +1,38 @@
using UnityEngine;
amakropoulos (Collaborator):

This file should be moved into a sample dir inside the Samples~ folder, e.g. Samples~/ImageReceiver/ImageReceiver.cs.
Also rename it to ImageReceiver.cs :)

amakropoulos (Collaborator):

The same applies to the AndroidLlava.unity scene above.
Also rename it to Scene.unity, similarly to the other samples.


// This field is used to relay the image to the AI; it can be either a URL or a path to a file on your system.

public TextMeshProUGUI AnyImageData;
amakropoulos (Collaborator):

Please use a Text element instead of a TextMeshProUGUI element.
TextMeshProUGUI requires the TMP assets, which vary between Unity versions.
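
For illustration, the change would look something like this (a sketch, assuming the field keeps its name):

using UnityEngine.UI;

// Plain UI Text ships with every Unity version; no TMP package needed.
public Text AnyImageData;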

public TextMeshProUGUI AnyImageData;

// Should work with any script that calls the Chat function on the LLMCharacter script.
public AndroidDemo AD;
amakropoulos (Collaborator):

Copy and paste the SimpleInteraction.cs code and modify it.
This ensures that samples are independent from each other and users can install whichever they want.
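
As an illustration of that independence, a self-contained sample could hold its own LLMCharacter reference and build the query itself instead of relaying through AndroidDemo; this is a sketch, not the PR's code, and the class layout mirrors SimpleInteraction:

using UnityEngine;
using UnityEngine.UI;
using LLMUnity;

public class ImageReceiver : MonoBehaviour
{
    public LLMCharacter llmCharacter;  // assigned in the inspector, as in SimpleInteraction
    public Text AnyImageData;          // holds the image URL

    public async void SendImageToAI()
    {
        // Build the same text + image_url content array the PR sends, then chat
        // through the sample's own LLMCharacter instead of another sample's script.
        string query = "[{\"type\": \"text\", \"text\": \"What's in this image?\"},"
                     + "{\"type\": \"image_url\", \"image_url\": {\"url\": \"" + AnyImageData.text + "\"}}]";
        string reply = await llmCharacter.Chat(query);
        Debug.Log(reply);
    }
}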


public void SendImageToAI()
{
    AD.onInputFieldSubmit("[\r\n {\"role\": \"system\", \"content\": \"You are an assistant who perfectly describes images.\"},\r\n {\r\n \"role\": \"user\",\r\n \"content\": [\r\n {\"type\": \"text\", \"text\": \"What's in this image?\"},\r\n {\"type\": \"image_url\", \"image_url\": {\"url\": \"" + AnyImageData.text + "\" } }\r\n ]\r\n }\r\n]");
}
amakropoulos (Collaborator):

I see what you're doing here; it's better to define a function in Runtime/LLMCharacter.cs that takes over this part and can be reused, e.g.:

public async Task<string> ChatWithImage(string query, Uri url, Callback<string> callback = null, EmptyCallback completionCallback = null, bool addToHistory = true)
{
   URLContent urlText = new URLContent(){ url = url.ToString() };
   ImageURLContent urlContent = new ImageURLContent(){ type = "image_url", image_url = urlText };
   TextContent message = new TextContent(){ type = "text", text = query };

   string queryWithImage = "[" + JsonUtility.ToJson(message) + "," + JsonUtility.ToJson(urlContent) + "]";
   return await Chat(queryWithImage, callback, completionCallback, addToHistory);
}

public async Task<string> ChatWithImage(string query, string path, Callback<string> callback = null, EmptyCallback completionCallback = null, bool addToHistory = true)
{
   string queryWithImage = ...
   return await Chat(queryWithImage, callback, completionCallback, addToHistory);
}

and inside the Runtime/LLMInterface.cs

    [Serializable]
    public struct TextContent
    {
        public string type;
        public string text;
    }

    [Serializable]
    public struct ImageURLContent
    {
        public string type;
        public URLContent image_url;
    }

    [Serializable]
    public struct URLContent
    {
        public string url;
    }
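
For illustration, with that overload in place the sample's SendImageToAI could shrink to something like this (a sketch; assumes using System; for Uri and that AnyImageData holds the image URL):

// Sketch only: delegate the JSON assembly to the proposed ChatWithImage overload.
public async void SendImageToAI()
{
    string reply = await llmCharacter.ChatWithImage("What's in this image?", new Uri(AnyImageData.text));
    Debug.Log(reply);
}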

amakropoulos (Collaborator):

Instead of manually defining the "What's in this image?" text, you can use the existing text box in the SimpleInteraction sample.

if (remote) arguments += $" --port {port} --host 0.0.0.0";
if (numThreadsToUse > 0) arguments += $" -t {numThreadsToUse}";
if (loraPath != "") arguments += $" --lora \"{loraPath}\"";
if (MMPROJmodel != "") arguments += $" --mmproj \"{MMPROJmodel}\"";
amakropoulos (Collaborator):

Instead of copying a new LLM.cs file, modify Runtime/LLM.cs to add the MMPROJmodel.
The MMPROJmodel needs to be treated similarly to e.g. the loras: rather than providing it as text, it needs some additional functionality to load it and make sure it is added inside the builds.
I will take over this part because it is quite involved.
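
For context, a rough sketch of one piece of that functionality, resolving the mmproj file against StreamingAssets so it is found both in the editor and inside builds; the helper name is an assumption, not the project's actual API (assumes System.IO and UnityEngine are imported):

// Hypothetical helper: resolve the mmproj file the way model/lora files are
// resolved, so the same relative path works in the editor and in builds.
string ResolveMMPROJPath()
{
    if (string.IsNullOrEmpty(MMPROJmodel)) return "";
    return Path.Combine(Application.streamingAssetsPath, MMPROJmodel);
}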
