Added real-time video capture from camera
szczyglis-dev committed Dec 10, 2023
1 parent 611f9bd commit d3abf37
Showing 33 changed files with 607 additions and 32 deletions.
33 changes: 29 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# PYGPT v2

Release: **2.0.13** build: **2023.12.10** | Official website: https://pygpt.net | Docs: https://pygpt.readthedocs.io
Release: **2.0.14** build: **2023.12.10** | Official website: https://pygpt.net | Docs: https://pygpt.readthedocs.io

PyPi: https://pypi.org/project/pygpt-net

@@ -29,6 +29,7 @@ You can download compiled version for Windows and Linux here: https://pygpt.net/
- 6 modes of operation: Assistant, Chat, Vision, Completion, Image generation, Langchain.
- Supports multiple models: `GPT-4`, `GPT-3.5`, and `GPT-3`, including any model accessible through `Langchain`.
- Handles and stores the full context of conversations (short-term memory).
- Real-time video capture from a camera in Vision mode.
- Internet access via `Google Custom Search API`.
- Speech synthesis via `Microsoft Azure TTS` and `OpenAI TTS`.
- Speech recognition via `OpenAI Whisper`.
@@ -225,13 +226,24 @@ can be sent to the OpenAI API.

This mode enables image analysis using the `GPT-4 Vision` model. Functioning much like the chat mode,
it also allows you to upload images or provide URLs to images. The vision feature can analyze both local
images and those found online.
images and those found online.

**1) you can provide an image URL**
Vision mode also includes real-time video capture from a camera. To enable capture, check the "Camera" option in the bottom-right corner. This enables real-time capture from your camera. To capture an image from the camera and append it to the chat, click the video preview on the left side. You can also enable "Auto capture" mode, in which an image is captured and appended to the chat message every time you send a message.

![v2_capture_enable](https://github.com/szczyglis-dev/py-gpt/assets/61396542/f2a29c21-caa7-4a77-a36e-951824415736)


**1) Video camera real-time image capture:**

![v2_capture1](https://github.com/szczyglis-dev/py-gpt/assets/61396542/7092fc58-d8eb-4d23-aa4c-8686eb3efdb0)

![v2_capture_result](https://github.com/szczyglis-dev/py-gpt/assets/61396542/fff7e72d-3427-4dc2-b204-750d792d1782)

**2) you can also provide an image URL:**

![v2_mode_vision](https://github.com/szczyglis-dev/py-gpt/assets/61396542/1e618d68-6c60-4826-82c5-87149523e989)

**2) you can also upload your local images**
**3) you can also upload your local images:**

![v2_mode_vision_upload](https://github.com/szczyglis-dev/py-gpt/assets/61396542/ee796ef5-706d-4dd8-bb02-dd28b7042a12)
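
The manual and "Auto capture" behavior described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the application's implementation: `VisionSession`, `grab_frame`, and the message dictionary are hypothetical names, and frames are represented as raw bytes so the sketch runs without a camera.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VisionSession:
    """Hypothetical model of manual capture and auto capture on send."""
    grab_frame: Callable[[], Optional[bytes]]  # returns None when no frame is available
    auto_capture: bool = False
    attachments: List[bytes] = field(default_factory=list)

    def capture(self) -> bool:
        """Manual capture (a click on the video preview): append one frame."""
        frame = self.grab_frame()
        if frame is None:
            return False
        self.attachments.append(frame)
        return True

    def send(self, text: str) -> dict:
        """Build a message; with auto capture on, grab a frame first."""
        if self.auto_capture:
            self.capture()
        message = {"text": text, "images": list(self.attachments)}
        self.attachments.clear()  # attachments belong to the message just sent
        return message
```

With `auto_capture=True`, every call to `send()` tries to attach a fresh frame in addition to any frames captured manually beforehand.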
## Langchain
@@ -954,6 +966,14 @@ brought up in the conversation.

- `Auto-summary instruction`: Summary prompt for context auto-summary (GPT-3.5 is used for this)

- `Vision: Camera`: Enables camera in Vision mode

- `Vision: Auto capture`: Enables auto-capture on message send in Vision mode

- `Vision: Camera capture width (px)`: Video capture resolution (width)

- `Vision: Camera capture height (px)`: Video capture resolution (height)
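
As a rough sketch of how the two resolution settings might be read from a configuration dictionary: the keys `vision.capture.width` and `vision.capture.height` appear in the camera code added by this commit, while the helper name `capture_resolution` and the default values below are assumptions made for illustration only.

```python
import json
from typing import Tuple

# Assumed fallback values; the application's real defaults are not shown in this commit.
DEFAULT_WIDTH = 800
DEFAULT_HEIGHT = 600


def capture_resolution(config: dict) -> Tuple[int, int]:
    """Return (width, height) in pixels, falling back to the assumed defaults."""
    width = int(config.get("vision.capture.width", DEFAULT_WIDTH))
    height = int(config.get("vision.capture.height", DEFAULT_HEIGHT))
    if width <= 0 or height <= 0:
        raise ValueError("capture resolution must be positive")
    return width, height


# Example: settings as they might appear in a JSON configuration file
config = json.loads('{"vision.capture.width": 1280, "vision.capture.height": 720}')
resolution = capture_resolution(config)
```

Validating the values before handing them to the camera backend keeps a mistyped setting from silently producing an unusable capture size.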

## JSON files

The configuration is stored in JSON files for easy manual modification outside of the application.
@@ -986,6 +1006,7 @@ You can manually edit the configuration files in this directory:
- `models.json` - stores models configurations.
- `context.json` - maintains an index of contexts.
- `context` - a directory for context files in `.json` format.
- `capture` - a directory for images captured from the camera.
- `history` - a directory for history logs in `.txt` format.
- `img` - a directory for images generated with `DALL-E 3` and `DALL-E 2`, saved as `.png` files.
- `output` - a directory for output files and files downloaded/generated by GPT.
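
The `capture` directory is created during installation if it does not exist yet. A minimal sketch of that idea follows, assuming `pathlib` rather than the `os.path` calls used in the commit; `ensure_capture_dir` is a hypothetical helper name, and a temporary directory stands in for the real configuration directory.

```python
import tempfile
from pathlib import Path


def ensure_capture_dir(base_dir) -> Path:
    """Create '<base_dir>/capture' if it is missing and return its path."""
    capture_dir = Path(base_dir) / "capture"
    capture_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return capture_dir


# Usage with a throwaway directory standing in for the configuration directory
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_capture_dir(tmp)
    second = ensure_capture_dir(tmp)  # calling again is safe
    created = first.is_dir() and first == second
```

Using `exist_ok=True` makes the call idempotent, so it can run on every startup without a separate existence check.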
@@ -1040,6 +1061,10 @@ may consume additional tokens that are not displayed in the main window.

# CHANGELOG

## 2.0.14 (2023-12-10)

- Added real-time video capture from camera in "Vision" mode

## 2.0.13 (2023-12-10)

- Fixed path resolving in "open in directory" option on Windows OS
1 change: 1 addition & 0 deletions docs/source/advanced.rst
@@ -15,6 +15,7 @@ You can manually edit the configuration files in this directory:
* ``models.json`` - stores models configurations.
* ``context.json`` - maintains an index of contexts.
* ``context`` - a directory for context files in `.json` format.
* ``capture`` - a directory for images captured from the camera.
* ``history`` - a directory for history logs in `.txt` format.
* ``img`` - a directory for images generated with `DALL-E 3` and `DALL-E 2`, saved as `.png` files.
* ``output`` - a directory for output files and files downloaded/generated by GPT.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -9,7 +9,7 @@
project = 'PYGPT'
copyright = '2023, pygpt.net'
author = 'szczyglis-dev, Marcin Szczygliński'
release = '2.0.13'
release = '2.0.14'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
Binary file added docs/source/images/v2_capture1.png
Binary file added docs/source/images/v2_capture_enable.png
Binary file added docs/source/images/v2_capture_result.png
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -6,11 +6,11 @@
PYGPT v2 - pygpt.net
====================

| **Last update:** 2023-12-10 13:00
| **Last update:** 2023-12-10 18:00
| **Project website:** https://pygpt.net
| **GitHub:** https://github.com/szczyglis-dev/py-gpt
| **PyPI:** https://pypi.org/project/pygpt-net
| **Release:** 2.0.13 (2023-12-10)
| **Release:** 2.0.14 (2023-12-10)
.. toctree::
:maxdepth: 3
1 change: 1 addition & 0 deletions docs/source/intro.rst
@@ -24,6 +24,7 @@ Features
* 6 modes of operation: Assistant, Chat, Vision, Completion, Image generation, Langchain.
* Supports multiple models: ``GPT-4``, ``GPT-3.5``, and ``GPT-3``, including any model accessible through ``Langchain``.
* Handles and stores the full context of conversations (short-term memory).
* Real-time video capture from a camera in Vision mode.
* Internet access via ``Google Custom Search API``.
* Speech synthesis via ``Microsoft Azure TTS`` and ``OpenAI TTS``.
* Speech recognition through ``OpenAI Whisper``.
17 changes: 15 additions & 2 deletions docs/source/modes.rst
@@ -48,12 +48,25 @@ This mode enables image analysis using the ``GPT-4 Vision`` model. Functioning m
it also allows you to upload images or provide URLs to images. The vision feature can analyze both local
images and those found online.

**1) you can provide an image URL**
Vision mode also includes real-time video capture from a camera. To enable capture, check the "Camera" option in the bottom-right corner. This enables real-time capture from your camera. To capture an image from the camera and append it to the chat, click the video preview on the left side. You can also enable "Auto capture" mode, in which an image is captured and appended to the chat message every time you send a message.

.. image:: images/v2_capture_enable.png
   :width: 400

**1) Video camera real-time image capture:**

.. image:: images/v2_capture1.png
   :width: 800

.. image:: images/v2_capture_result.png
   :width: 800

**2) you can also provide an image URL:**

.. image:: images/v2_mode_vision.png
   :width: 800

**2) you can also upload your local images**
**3) you can also upload your local images:**

.. image:: images/v2_mode_vision_upload.png
   :width: 800
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "pygpt-net"
version = "2.0.13"
version = "2.0.14"
description = "Desktop AI Assistant powered by GPT-4, GPT-4V, GPT-3, Whisper, TTS and DALL-E 3 with chatbot, assistant, text completion, vision and image generation, real-time internet access, commands and code execution, files upload and download and more"
readme = "README.md"
authors = [{ name = "Marcin Szczygliński", email = "info@pygpt.net" }]
@@ -23,6 +23,7 @@ dependencies = [
'langchain>=0.0.345',
'langchain-experimental>=0.0.44',
'openai>=1.3.7',
'opencv-python>=4.8.1.78',
'packaging>=23.0',
'PyAudio>=0.2.14',
'pydub>=0.25.1',
1 change: 1 addition & 0 deletions requirements.txt
@@ -45,6 +45,7 @@ multidict==6.0.4
mypy-extensions==1.0.0
numpy==1.26.2
openai==1.3.7
opencv-python==4.8.1.78
packaging==23.1
pip-tools==7.3.0
pkginfo==1.9.6
3 changes: 2 additions & 1 deletion setup.py
@@ -1,6 +1,6 @@
from setuptools import setup, find_packages

VERSION = '2.0.13'
VERSION = '2.0.14'
DESCRIPTION = 'Desktop AI Assistant powered by GPT-4, GPT-4V, GPT-3, Whisper, TTS and DALL-E 3 with chatbot, assistant, text completion, ' \
'vision and image generation, real-time internet access, commands and code execution, files upload and download and more'
LONG_DESCRIPTION = 'Package containing a GPT-4, GPT-4V, GPT-3, Whisper, TTS and DALL-E 3 AI desktop assistant with chatbot, ' \
@@ -30,6 +30,7 @@
'langchain>=0.0.345',
'langchain-experimental>=0.0.44',
'openai>=1.3.7',
'opencv-python>=4.8.1.78',
'packaging>=23.0',
'PyAudio>=0.2.14',
'pydub>=0.25.1',
4 changes: 4 additions & 0 deletions src/pygpt_net/CHANGELOG.txt
@@ -1,3 +1,7 @@
2.0.14 (2023-12-10)

- Added real-time video capture from camera in "Vision" mode

2.0.13 (2023-12-10)

- Fixed path resolving in "open in directory" option on Windows OS
4 changes: 2 additions & 2 deletions src/pygpt_net/__init__.py
@@ -6,14 +6,14 @@
# GitHub: https://github.com/szczyglis-dev/py-gpt #
# MIT License #
# Created By : Marcin Szczygliński #
# Updated Date: 2023.12.10 13:00:00 #
# Updated Date: 2023.12.10 17:00:00 #
# ================================================== #

__author__ = "Marcin Szczygliński"
__copyright__ = "Copyright 2023, Marcin Szczygliński"
__credits__ = ["Marcin Szczygliński"]
__license__ = "MIT"
__version__ = "2.0.13"
__version__ = "2.0.14"
__build__ = "2023.12.10"
__maintainer__ = "Marcin Szczygliński"
__github__ = "https://github.com/szczyglis-dev/py-gpt"
1 change: 1 addition & 0 deletions src/pygpt_net/core/app.py
@@ -159,6 +159,7 @@ def post_setup(self):
def update(self):
"""Called on update"""
self.debugger.update()
self.controller.update()

def set_status(self, text):
"""
12 changes: 12 additions & 0 deletions src/pygpt_net/core/attachments.py
@@ -169,6 +169,18 @@ def clear_all(self):
"""
self.items = {}

def has(self, mode):
"""
Checks if mode has attachments
:param mode: mode
:return: True if exists
"""
if mode not in self.items:
self.items[mode] = {}

return len(self.items[mode]) > 0

def new(self, mode, name=None, path=None, auto_save=True):
"""
Creates new attachment
72 changes: 72 additions & 0 deletions src/pygpt_net/core/camera.py
@@ -0,0 +1,72 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ================================================== #
# This file is a part of PYGPT package #
# Website: https://pygpt.net #
# GitHub: https://github.com/szczyglis-dev/py-gpt #
# MIT License #
# Created By : Marcin Szczygliński #
# Updated Date: 2023.12.10 16:00:00 #
# ================================================== #

import cv2

from PySide6.QtCore import QObject, Signal


class Camera:
    def __init__(self, config=None):
        """
        Camera
        :param config: config object
        """
        self.config = config
        self.capture = None
        self.current = None


class CameraThread(QObject):
    finished = Signal(object)
    destroyed = Signal()
    started = Signal()
    stopped = Signal()

    def __init__(self, window=None):
        """
        Camera capture thread
        :param window: main window object
        """
        super().__init__()
        self.window = window
        self.initialized = False
        self.capture = None
        self.frame = None

    def setup_camera(self):
        """Initialize camera."""
        try:
            # open the default camera (device index 0) and request the configured resolution
            self.capture = cv2.VideoCapture(0)
            self.capture.set(cv2.CAP_PROP_FRAME_WIDTH, self.window.config.data['vision.capture.width'])
            self.capture.set(cv2.CAP_PROP_FRAME_HEIGHT, self.window.config.data['vision.capture.height'])
        except Exception as e:
            print("Camera thread exception:", e)
            self.finished.emit(e)

    def run(self):
        """Read frames in a loop until the window is closing or no capture is available."""
        try:
            if not self.initialized:
                self.setup_camera()
                self.initialized = True

            print("Starting video capture thread....")
            while True:
                if self.window.is_closing or self.capture is None:
                    break
                _, frame = self.capture.read()
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV delivers BGR; convert to RGB
                frame = cv2.flip(frame, 1)  # mirror the image horizontally
                self.window.controller.camera.frame = frame  # update frame
        except Exception as e:
            print("Camera thread exception:", e)
            self.finished.emit(e)
13 changes: 6 additions & 7 deletions src/pygpt_net/core/command.py
@@ -42,13 +42,12 @@ def get_prompt(self):
8) Commands are listed one command per line and every command is described with syntax: "<name>": <action>, params: <params>
9) Always use correct command name, e.g. if command name is "sys_exec" then use "sys_exec" and don't imagine other names, like "run" or something.
10) With those commands you are allowed to run external commands and apps in user's system (environment)
11) Do not ask for command execution, just do it.
12) Always use defined syntax to prevent errors
13) Always choose the most appropriate command from list to perform the task, based on the description of the action performed by a given comment
14) Reply to the user in the language in which he started the conversation with you
15) Use ONLY params described in command definition, do NOT use any additional params not described on list
16) ALWAYS remember that any text content must appear at the beginning of your response and commands must only be included at the end.
17) Try to run commands executed in the user's system in the background if running them may prevent receiving a response (e.g. when it is a desktop application)
11) Always use defined syntax to prevent errors
12) Always choose the most appropriate command from list to perform the task, based on the description of the action performed by a given comment
13) Reply to the user in the language in which he started the conversation with you
14) Use ONLY params described in command definition, do NOT use any additional params not described on list
15) ALWAYS remember that any text content must appear at the beginning of your response and commands must only be included at the end.
16) Try to run commands executed in the user's system in the background if running them may prevent receiving a response (e.g. when it is a desktop application)
Commands list:
'''
5 changes: 5 additions & 0 deletions src/pygpt_net/core/config.py
@@ -599,5 +599,10 @@ def install(self):
if not os.path.exists(files_dir):
os.mkdir(files_dir)

# create img capture directory
capture_dir = os.path.join(self.path, 'capture')
if not os.path.exists(capture_dir):
os.mkdir(capture_dir)

except Exception as e:
print(e)
8 changes: 8 additions & 0 deletions src/pygpt_net/core/controller/attachment.py
@@ -215,6 +215,14 @@ def import_from_assistant(self, mode, assistant):
return
self.attachments.from_files(mode, assistant.files)

def has_attachments(self, mode):
"""
Returns True if the given mode has attachments
:param mode: mode
:return: True if has attachments
"""
return self.attachments.has(mode)

def download(self, file_id):
"""
Downloads file