Added real-time video capture from camera

szczyglis-dev · Dec 10, 2023 · d3abf37
1 parent 611f9bd

Showing 33 changed files with 607 additions and 32 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # PYGPT v2
 
-Release: **2.0.13** build: **2023.12.10** | Official website: https://pygpt.net | Docs: https://pygpt.readthedocs.io
+Release: **2.0.14** build: **2023.12.10** | Official website: https://pygpt.net | Docs: https://pygpt.readthedocs.io
 
 PyPi: https://pypi.org/project/pygpt-net
 
@@ -29,6 +29,7 @@ You can download compiled version for Windows and Linux here: https://pygpt.net/
 - 6 modes of operation: Assistant, Chat, Vision, Completion, Image generation, Langchain.
 - Supports multiple models: `GPT-4`, `GPT-3.5`, and `GPT-3`, including any model accessible through `Langchain`.
 - Handles and stores the full context of conversations (short-term memory).
+- Real-time video camera capture in Vision mode.
 - Internet access via `Google Custom Search API`.
 - Speech synthesis via `Microsoft Azure TTS` and `OpenAI TTS`.
 - Speech recognition via `OpenAI Whisper`.
@@ -225,13 +226,24 @@ can be sent to the OpenAI API.
 
 This mode enables image analysis using the `GPT-4 Vision` model. Functioning much like the chat mode, 
 it also allows you to upload images or provide URLs to images. The vision feature can analyze both local 
-images and those found online.
+images and those found online. 
 
-**1) you can provide an image URL**
+Vision mode also includes real-time video capture from a camera. To enable capture, check the "Camera" option in the bottom-right corner; this turns on live capture from your camera. To capture an image and append it to the chat, click the video preview on the left side. You can also enable "Auto capture" mode, in which an image is captured and appended to the chat message every time you send a message.
+
+![v2_capture_enable](https://github.com/szczyglis-dev/py-gpt/assets/61396542/f2a29c21-caa7-4a77-a36e-951824415736)
+
+
+**1) Video camera real-time image capture:**
+
+![v2_capture1](https://github.com/szczyglis-dev/py-gpt/assets/61396542/7092fc58-d8eb-4d23-aa4c-8686eb3efdb0)
+
+![v2_capture_result](https://github.com/szczyglis-dev/py-gpt/assets/61396542/fff7e72d-3427-4dc2-b204-750d792d1782)
+
+**2) you can also provide an image URL:**
 
 ![v2_mode_vision](https://github.com/szczyglis-dev/py-gpt/assets/61396542/1e618d68-6c60-4826-82c5-87149523e989)
 
-**2) you can also upload your local images**
+**3) you can also upload your local images:**
 
 ![v2_mode_vision_upload](https://github.com/szczyglis-dev/py-gpt/assets/61396542/ee796ef5-706d-4dd8-bb02-dd28b7042a12)
 ## Langchain
@@ -954,6 +966,14 @@ brought up in the conversation.
 
 - `Auto-summary instruction`: Summary prompt for context auto-summary (GPT-3.5 is used for this)
 
+- `Vision: Camera`: Enables camera in Vision mode
+
+- `Vision: Auto capture`: Enables auto-capture on message send in Vision mode
+
+- `Vision: Camera capture width (px)`: Video capture resolution (width)
+
+- `Vision: Camera capture height (px)`: Video capture resolution (height)
+
 ## JSON files
 
 The configuration is stored in JSON files for easy manual modification outside of the application. 
@@ -986,6 +1006,7 @@ You can manually edit the configuration files in this directory:
 - `models.json` - stores models configurations.
 - `context.json` - maintains an index of contexts.
 - `context` - a directory for context files in `.json` format.
+- `capture` - a directory for images captured from the camera.
 - `history` - a directory for history logs in `.txt` format.
 - `img` - a directory for images generated with `DALL-E 3` and `DALL-E 2`, saved as `.png` files.
 - `output` - a directory for output files and files downloaded/generated by GPT.
@@ -1040,6 +1061,10 @@ may consume additional tokens that are not displayed in the main window.
 
 # CHANGELOG
 
+## 2.0.14 (2023-12-10)
+
+- Added real-time video capture from camera in "Vision" mode
+
 ## 2.0.13 (2023-12-10)
 
 - Fixed path resolving in "open in directory" option on Windows OS

diff --git a/docs/source/advanced.rst b/docs/source/advanced.rst
@@ -15,6 +15,7 @@ You can manually edit the configuration files in this directory:
 * ``models.json`` - stores models configurations.
 * ``context.json`` - maintains an index of contexts.
 * ``context`` - a directory for context files in `.json` format.
+* ``capture`` - a directory for images captured from the camera.
 * ``history`` - a directory for history logs in `.txt` format.
 * ``img`` - a directory for images generated with `DALL-E 3` and `DALL-E 2`, saved as `.png` files.
 * ``output`` - a directory for output files and files downloaded/generated by GPT.

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -9,7 +9,7 @@
 project = 'PYGPT'
 copyright = '2023, pygpt.net'
 author = 'szczyglis-dev, Marcin Szczygliński'
-release = '2.0.13'
+release = '2.0.14'
 
 # -- General configuration ---------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

diff --git a/docs/source/images/v2_capture1.png b/docs/source/images/v2_capture1.png
diff --git a/docs/source/images/v2_capture_enable.png b/docs/source/images/v2_capture_enable.png
diff --git a/docs/source/images/v2_capture_result.png b/docs/source/images/v2_capture_result.png
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -6,11 +6,11 @@
 PYGPT v2 - pygpt.net
 ====================
 
-| **Last update:** 2023-12-10 13:00
+| **Last update:** 2023-12-10 18:00
 | **Project website:** https://pygpt.net
 | **GitHub:** https://github.com/szczyglis-dev/py-gpt
 | **PyPI:** https://pypi.org/project/pygpt-net
-| **Release:** 2.0.13 (2023-12-10)
+| **Release:** 2.0.14 (2023-12-10)
 
 .. toctree::
    :maxdepth: 3

diff --git a/docs/source/intro.rst b/docs/source/intro.rst
@@ -24,6 +24,7 @@ Features
 * 6 modes of operation: Assistant, Chat, Vision, Completion, Image generation, Langchain.
 * Supports multiple models: ``GPT-4``, ``GPT-3.5``, and ``GPT-3``, including any model accessible through ``Langchain``.
 * Handles and stores the full context of conversations (short-term memory).
+* Real-time video camera capture in Vision mode.
 * Internet access via ``Google Custom Search API``.
 * Speech synthesis via ``Microsoft Azure TTS`` and ``OpenAI TTS``.
 * Speech recognition through ``OpenAI Whisper``.

diff --git a/docs/source/modes.rst b/docs/source/modes.rst
@@ -48,12 +48,25 @@ This mode enables image analysis using the ``GPT-4 Vision`` model. Functioning m
 it also allows you to upload images or provide URLs to images. The vision feature can analyze both local 
 images and those found online.
 
-**1) you can provide an image URL**
+Vision mode also includes real-time video capture from a camera. To enable capture, check the "Camera" option in the bottom-right corner; this turns on live capture from your camera. To capture an image and append it to the chat, click the video preview on the left side. You can also enable "Auto capture" mode, in which an image is captured and appended to the chat message every time you send a message.
+
+.. image:: images/v2_capture_enable.png
+   :width: 400
+
+**1) Video camera real-time image capture:**
+
+.. image:: images/v2_capture1.png
+   :width: 800
+
+.. image:: images/v2_capture_result.png
+   :width: 800
+
+**2) you can also provide an image URL:**
 
 .. image:: images/v2_mode_vision.png
    :width: 800
 
-**2) you can also upload your local images**
+**3) you can also upload your local images:**
 
 .. image:: images/v2_mode_vision_upload.png
    :width: 800

diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "pygpt-net"
-version = "2.0.13"
+version = "2.0.14"
 description = "Desktop AI Assistant powered by GPT-4, GPT-4V, GPT-3, Whisper, TTS and DALL-E 3 with chatbot, assistant, text completion, vision and image generation, real-time internet access, commands and code execution, files upload and download and more"
 readme = "README.md"
 authors = [{ name = "Marcin Szczygliński", email = "info@pygpt.net" }]
@@ -23,6 +23,7 @@ dependencies = [
     'langchain>=0.0.345',
     'langchain-experimental>=0.0.44',
     'openai>=1.3.7',
+    'opencv-python>=4.8.1.78',
     'packaging>=23.0',
     'PyAudio>=0.2.14',
     'pydub>=0.25.1',

diff --git a/requirements.txt b/requirements.txt
@@ -45,6 +45,7 @@ multidict==6.0.4
 mypy-extensions==1.0.0
 numpy==1.26.2
 openai==1.3.7
+opencv-python==4.8.1.78
 packaging==23.1
 pip-tools==7.3.0
 pkginfo==1.9.6
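
Because this commit introduces `opencv-python` as a new dependency, an installation that predates it may not have the `cv2` module available. A defensive check before enabling the camera could look like the sketch below. The helper name `camera_backend_available` is hypothetical and not part of the commit; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def camera_backend_available(module_name: str = "cv2") -> bool:
    """Return True if the given module can be found on the import path.

    Hypothetical helper: the default name "cv2" is the module that the
    opencv-python package provides.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if camera_backend_available():
        print("OpenCV found: camera capture can be enabled")
    else:
        print("OpenCV missing: install opencv-python to use camera capture")
```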

diff --git a/setup.py b/setup.py
@@ -1,6 +1,6 @@
 from setuptools import setup, find_packages
 
-VERSION = '2.0.13'
+VERSION = '2.0.14'
 DESCRIPTION = 'Desktop AI Assistant powered by GPT-4, GPT-4V, GPT-3, Whisper, TTS and DALL-E 3 with chatbot, assistant, text completion, ' \
               'vision and image generation, real-time internet access, commands and code execution, files upload and download and more'
 LONG_DESCRIPTION = 'Package containing a GPT-4, GPT-4V, GPT-3, Whisper, TTS and DALL-E 3 AI desktop assistant with chatbot, ' \
@@ -30,6 +30,7 @@
         'langchain>=0.0.345',
         'langchain-experimental>=0.0.44',
         'openai>=1.3.7',
+        'opencv-python>=4.8.1.78',
         'packaging>=23.0',
         'PyAudio>=0.2.14',
         'pydub>=0.25.1',

diff --git a/src/pygpt_net/CHANGELOG.txt b/src/pygpt_net/CHANGELOG.txt
@@ -1,3 +1,7 @@
+2.0.14 (2023-12-10)
+
+- Added real-time video capture from camera in "Vision" mode
+
 2.0.13 (2023-12-10)
 
 - Fixed path resolving in "open in directory" option on Windows OS

diff --git a/src/pygpt_net/__init__.py b/src/pygpt_net/__init__.py
@@ -6,14 +6,14 @@
 # GitHub:  https://github.com/szczyglis-dev/py-gpt   #
 # MIT License                                        #
 # Created By  : Marcin Szczygliński                  #
-# Updated Date: 2023.12.10 13:00:00                  #
+# Updated Date: 2023.12.10 17:00:00                  #
 # ================================================== #
 
 __author__ = "Marcin Szczygliński"
 __copyright__ = "Copyright 2023, Marcin Szczygliński"
 __credits__ = ["Marcin Szczygliński"]
 __license__ = "MIT"
-__version__ = "2.0.13"
+__version__ = "2.0.14"
 __build__ = "2023.12.10"
 __maintainer__ = "Marcin Szczygliński"
 __github__ = "https://github.com/szczyglis-dev/py-gpt"

diff --git a/src/pygpt_net/core/app.py b/src/pygpt_net/core/app.py
@@ -159,6 +159,7 @@ def post_setup(self):
     def update(self):
         """Called on update"""
         self.debugger.update()
+        self.controller.update()
 
     def set_status(self, text):
         """

diff --git a/src/pygpt_net/core/attachments.py b/src/pygpt_net/core/attachments.py
@@ -169,6 +169,18 @@ def clear_all(self):
         """
         self.items = {}
 
+    def has(self, mode):
+        """
+        Checks if mode has attachments
+
+        :param mode: mode
+        :return: True if exists
+        """
+        if mode not in self.items:
+            self.items[mode] = {}
+
+        return len(self.items[mode]) > 0
+
     def new(self, mode, name=None, path=None, auto_save=True):
         """
         Creates new attachment
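
The `has()` method added in this hunk creates an empty entry for an unknown mode as a side effect of the check. A variant that leaves `self.items` untouched might look like the following sketch. The `Attachments` class here is a stripped-down stand-in for illustration, not the project's actual class.

```python
class Attachments:
    """Minimal stand-in holding attachments per mode (illustrative only)."""

    def __init__(self):
        self.items = {}  # mode -> {attachment_id: attachment}

    def has(self, mode):
        """Return True if the mode has at least one attachment.

        Uses dict.get() so that querying an unknown mode does not
        add a new key to self.items.
        """
        return len(self.items.get(mode, {})) > 0


attachments = Attachments()
attachments.items["vision"] = {"a1": "capture.jpg"}
print(attachments.has("vision"))    # True
print(attachments.has("chat"))      # False
print("chat" in attachments.items)  # False: the check left the dict unchanged
```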

diff --git a/src/pygpt_net/core/camera.py b/src/pygpt_net/core/camera.py
@@ -0,0 +1,72 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# ================================================== #
+# This file is a part of PYGPT package               #
+# Website: https://pygpt.net                         #
+# GitHub:  https://github.com/szczyglis-dev/py-gpt   #
+# MIT License                                        #
+# Created By  : Marcin Szczygliński                  #
+# Updated Date: 2023.12.10 16:00:00                  #
+# ================================================== #
+
+import cv2
+
+from PySide6.QtCore import QObject, Signal
+
+
+class Camera:
+    def __init__(self, config=None):
+        """
+        Camera
+
+        :param config: config object
+        """
+        self.config = config
+        self.capture = None
+        self.current = None
+
+
+class CameraThread(QObject):
+    finished = Signal(object)
+    destroyed = Signal()
+    started = Signal()
+    stopped = Signal()
+
+    def __init__(self, window=None):
+        """
+        Camera capture thread
+        """
+        super().__init__()
+        self.window = window
+        self.initialized = False
+        self.capture = None
+        self.frame = None
+
+    def setup_camera(self):
+        """Initialize camera.
+        """
+        try:
+            self.capture = cv2.VideoCapture(0)
+            self.capture.set(cv2.CAP_PROP_FRAME_WIDTH, self.window.config.data['vision.capture.width'])
+            self.capture.set(cv2.CAP_PROP_FRAME_HEIGHT, self.window.config.data['vision.capture.height'])
+        except Exception as e:
+            print("Camera thread exception:", e)
+            self.finished.emit(e)
+
+    def run(self):
+        try:
+            if not self.initialized:
+                self.setup_camera()
+                self.initialized = True
+
+            print("Starting video capture thread....")
+            while True:
+                if self.window.is_closing or self.capture is None:
+                    break
+                _, frame = self.capture.read()
+                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+                frame = cv2.flip(frame, 1)
+                self.window.controller.camera.frame = frame  # update frame
+        except Exception as e:
+            print("Camera thread exception:", e)
+            self.finished.emit(e)
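
The loop in `run()` uses the frame returned by `read()` without checking the success flag, so a failed read would pass `None` to `cv2.cvtColor` and end the thread through the exception handler. The loop also never releases the capture device. The sketch below shows one way to guard against both. It is an illustration, not the commit's code: `capture_loop`, `should_stop`, `on_frame`, and `max_failures` are hypothetical names, and `capture` is assumed to be any object exposing `read()` and `release()` the way `cv2.VideoCapture` does.

```python
def capture_loop(capture, should_stop, on_frame, max_failures=10):
    """Read frames until should_stop() is true or too many reads fail in a row.

    capture      -- object with read() -> (ok, frame) and release(),
                    for example a cv2.VideoCapture instance
    should_stop  -- callable returning True when the loop should end
    on_frame     -- callable invoked with each successfully read frame
    max_failures -- consecutive failed reads tolerated before giving up
                    (an assumed default, tune as needed)
    Returns the number of frames delivered to on_frame.
    """
    delivered = 0
    failures = 0
    try:
        while not should_stop():
            ok, frame = capture.read()
            if not ok or frame is None:
                failures += 1
                if failures >= max_failures:
                    break  # the camera is probably disconnected
                continue
            failures = 0
            on_frame(frame)
            delivered += 1
    finally:
        capture.release()  # always free the device, even on errors
    return delivered
```

With OpenCV, `on_frame` could apply `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` and `cv2.flip(frame, 1)` before storing the frame, as the loop in this file does.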
diff --git a/src/pygpt_net/core/command.py b/src/pygpt_net/core/command.py
@@ -42,13 +42,12 @@ def get_prompt(self):
         8) Commands are listed one command per line and every command is described with syntax: "<name>": <action>, params: <params>
         9) Always use correct command name, e.g. if command name is "sys_exec" then use "sys_exec" and don't imagine other names, like "run" or something.
         10) With those commands you are allowed to run external commands and apps in user's system (environment)
-        11) Do not ask for command execution, just do it.
-        12) Always use defined syntax to prevent errors
-        13) Always choose the most appropriate command from list to perform the task, based on the description of the action performed by a given comment
-        14) Reply to the user in the language in which he started the conversation with you
-        15) Use ONLY params described in command definition, do NOT use any additional params not described on list
-        16) ALWAYS remember that any text content must appear at the beginning of your response and commands must only be included at the end.
-        17) Try to run commands executed in the user's system in the background if running them may prevent receiving a response (e.g. when it is a desktop application)
+        11) Always use defined syntax to prevent errors
+        12) Always choose the most appropriate command from list to perform the task, based on the description of the action performed by a given comment
+        13) Reply to the user in the language in which he started the conversation with you
+        14) Use ONLY params described in command definition, do NOT use any additional params not described on list
+        15) ALWAYS remember that any text content must appear at the beginning of your response and commands must only be included at the end.
+        16) Try to run commands executed in the user's system in the background if running them may prevent receiving a response (e.g. when it is a desktop application)
 
         Commands list:
         '''

diff --git a/src/pygpt_net/core/config.py b/src/pygpt_net/core/config.py
@@ -599,5 +599,10 @@ def install(self):
             if not os.path.exists(files_dir):
                 os.mkdir(files_dir)
 
+            # create img capture directory
+            capture_dir = os.path.join(self.path, 'capture')
+            if not os.path.exists(capture_dir):
+                os.mkdir(capture_dir)
+
         except Exception as e:
             print(e)
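
The existence check followed by `os.mkdir` in this hunk can be written more compactly with `os.makedirs(..., exist_ok=True)`, which also avoids a race between the test and the creation. A minimal sketch, where `ensure_capture_dir` is a hypothetical helper and `base_path` stands in for `self.path`:

```python
import os
import tempfile


def ensure_capture_dir(base_path: str) -> str:
    """Create <base_path>/capture if it does not exist and return its path."""
    capture_dir = os.path.join(base_path, "capture")
    os.makedirs(capture_dir, exist_ok=True)  # no error if it already exists
    return capture_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = ensure_capture_dir(tmp)
        ensure_capture_dir(tmp)  # calling again is harmless
        print(os.path.isdir(path))
```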
diff --git a/src/pygpt_net/core/controller/attachment.py b/src/pygpt_net/core/controller/attachment.py
@@ -215,6 +215,14 @@ def import_from_assistant(self, mode, assistant):
             return
         self.attachments.from_files(mode, assistant.files)
 
+    def has_attachments(self, mode):
+        """
+        Returns True if has attachments
+
+        :return: True if has attachments
+        """
+        return self.attachments.has(mode)
+
     def download(self, file_id):
         """
         Downloads file