Skip to content

Latest commit

 

History

History
60 lines (45 loc) · 5.56 KB

README.md

File metadata and controls

60 lines (45 loc) · 5.56 KB

vitrivr-VR

This repository contains the source code for vitrivr-VR, a Unity Engine based VR interface for multimedia retrieval using the Cineast retrieval engine.

Speech-to-text Word-Gesture Keyboard Immersive Results View Intuitive Results Exploration
vitrivr-VR-speech-to-text word-gesture-keyboard vitrivr-VR-view-results vitrivr-VR-drawer

Setup

Setup is very easy and should not involve much more than having a working OpenXR runtime and a compatible version of the Unity engine installed. There are a few things to be aware of:

  • Errors on first import: During first import there may be errors because Unity incorrectly loads the different versions of certain libraries included in itself and different packages. Simply close the editor and reopen the project to fix this issue.
  • MapBox: To use the map query formulation method using MapBox, follow the MapBox popup instructions to acquire an API-key. If you do not intend to use the map this step is not required.
  • DeepSpeech: To use the DeepSpeech speech-to-text functionality, follow the instructions on the DeepSpeech UPM repository to download and correctly place the required model file.
  • Whisper: To use the Whisper speech-to-text functionality (better and supporting more languages than DeepSpeech, but slower) download the ggml-tiny model weights file from Hugging Face (download) and place it in the directory Assets/StreamingAssets/Whisper/.

Usage

Configuration

vitrivr-VR relies on an instance of Cineast for feature transformation and retrieval. To configure this connection, create a JSON configuration file at Assets/cineastapi.json. A documentation of the parameters is available in the Cineast Unity Interface package.

Configuration of vitrivr-VR itself is done through Assets/vitrivr-vr.json, which is documented in the respective class.

Interaction

All UI objects (other than the cylindrical results displays) can be grabbed and moved using the grip button on a standard XR controller. Other interactions typically use the trigger on a standard XR controller. To open the settings menu within vitrivr-VR use the menu button (small button above trackpad on VIVE wands) on the left-hand XR controller. To use speech-to-text, have a text-field selected and press and hold the menu button on the right-hand XR controller.

VR Input

Due to the still rapidly evolving landscape of OpenXR plugins, libraries and backends, this project attempts to separate input logic from interaction logic wherever possible.

Currently, the following input and interaction setup is used:

  • Unity OpenXR Plugin for VR input from any OpenXR compliant backend
  • Custom Interaction System for direct interaction consisting of Interactors and Interactables

Contributing

Basic interactions should be implemented with the custom interaction system, conventional 2D UI interactions through the Unity UI.

Raw device input should be implemented using input actions from the new input system.

System Structure & Data Flow

For increased flexibility, vitrivr-VR is structured to allow easy switching of individual components.

Control Flow

At the core of vitrivr-VR is the QueryController, which sends a query to Cineast when the asynchronous function RunQuery is invoked. QueryTermProviders are required to provide the query terms for the query. Once the query results arrive, the QueryController will instantiate the provided type of QueryDisplay with the scored result list. The QueryDisplay will then instantiate the results in the form of MediaItemDisplays. Ultimately, individual MediaItemDisplays should also provide functionality for a detailed media view, but this has not yet been formalized into an interface.

Once a new query is started or the current query should be cleared, the QueryController initiates the required changes in the scene.

Component Responsibilities

  • QueryController: Sends queries to Cineast, instantiates QueryDisplays from query results and manages QueryDisplay instances.
  • QueryTermProvider: Provides the QueryController with query terms and UI components (or the user directly) with methods to specify these query terms.
  • QueryDisplay: Instantiates and arranges MediaItemDisplays in 3D space. May (or should) provide functionality to explore / browse query results.
  • MediaItemDisplay: Displays and allows detailed inspection of a scored media item.