jianzhi-1/LLMaOS

LLMaOS

A voice-controlled operating system that is general-purpose, low-latency, transparent and user-friendly, with search and screen-analysis capabilities.

Set Up

conda activate berkos
export OPENAI_API_KEY=<OPENAI_API_KEY>
export MISTRAL_API_KEY=<MISTRAL_API_KEY>
export NVIDIA_API_KEY=<NVIDIA_API_KEY>
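
The three exports above can be checked before launching anything. The following is only a minimal sketch: the helper name `missing_keys` is hypothetical and not part of this repository; it simply reports which of the three variables are unset or empty.

```python
import os
from typing import List, Mapping

# The three variables exported in the Set Up section.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MISTRAL_API_KEY", "NVIDIA_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` with no arguments inspects the current environment; an empty list means all three keys are present.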

Architecture

  1. User interface: This (application) layer handles voice control and is powered by OpenAI's RealTime API. In the RealTime session, user transcripts are decoded and sent to the Assembler for code generation.

  2. Assembler: The purpose of this (operating system-compiler) layer is to generate "assembly-like" instructions for the processor. Such instructions include any non-dangerous UNIX commands, device commands (LEFT_CLICK x y, KEYBOARD string) and the screen-processing command ANALYSIS.

  3. Processor: The processor executes the instructions generated by the assembler. For example, the special instruction ANALYSIS takes a screenshot and queries three AI models (NVIDIA's NeVA, Mistral AI's Pixtral, OpenAI's GPT) in parallel, using Python's asyncio, to extract information from it. The collated information is fed back to both the assembler and RealTime. This layer of LLMaOS deviates from traditional computer architecture in that instructions are generated on the fly, because an instruction can depend on the result of an earlier one. For example, ANALYSIS must run on an image before the x and y arguments of the next LEFT_CLICK instruction can be determined. Much like a motherboard, the processor can offload tasks to large models' API endpoints, which are analogous to specialised hardware accelerators.
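
The parallel fan-out described for ANALYSIS can be sketched with `asyncio.gather`. This is an illustration under stated assumptions, not the repository's code: the stub models below stand in for the real NeVA, Pixtral and GPT calls, and the names `make_stub_model`, `STUB_MODELS` and `analysis` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A "model" here is any async function that maps screenshot bytes to text.
Model = Callable[[bytes], Awaitable[str]]


def make_stub_model(name: str, delay: float) -> Model:
    """Build a placeholder model; a real one would call a vision API endpoint."""
    async def call(screenshot: bytes) -> str:
        await asyncio.sleep(delay)  # simulate network latency
        return f"{name} saw {len(screenshot)} bytes"
    return call


# Hypothetical stand-ins for the three models named in the Architecture section.
STUB_MODELS: Dict[str, Model] = {
    "neva": make_stub_model("neva", 0.03),
    "pixtral": make_stub_model("pixtral", 0.02),
    "gpt": make_stub_model("gpt", 0.01),
}


async def analysis(screenshot: bytes,
                   models: Dict[str, Model] = STUB_MODELS) -> Dict[str, str]:
    """Query every model concurrently and collate the answers by model name."""
    names = list(models)
    results = await asyncio.gather(
        *(models[name](screenshot) for name in names),
        return_exceptions=True,  # one failing model should not sink the rest
    )
    # Keep only successful answers; failed models are dropped from the result.
    return {name: result for name, result in zip(names, results)
            if not isinstance(result, BaseException)}
```

Because the calls run concurrently, the total wait is roughly that of the slowest model rather than the sum of all three. Running `asyncio.run(analysis(screenshot_bytes))` returns one entry per model that answered.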

Supported Use Cases

(1) "Play me the song Espresso"

  • LLMaOS launches Chrome, enters youtube.com in the URL bar, enters Espresso in the YouTube search bar, clicks the first non-ad entry, clicks Skip Ads, and enters fullscreen.

(2) "What is the score between Manchester City and Real Madrid?"

  • LLMaOS launches Chrome, enters google.com in the URL bar, enters "Man City vs Real Madrid" in the search bar, analyses the screen, and tells you the score.

(3) "When is the next Codeforces contest?"

  • LLMaOS launches Chrome, enters codeforces.com in the URL bar, analyses the screen, and tells you the time of the next contest.

LLMaOS is voice controlled, transparent (you keep a log of its "assembly-level" instructions and can see what it is doing), and possesses screen-processing capabilities.
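
To illustrate what such an instruction log could look like, here is a minimal sketch that parses the instruction forms named in the Architecture section (LEFT_CLICK x y, KEYBOARD string, ANALYSIS, and UNIX commands) and numbers them. The `Instruction` type, `parse_instruction` and `format_log` are hypothetical names, the log layout is an assumption, and no filtering of dangerous commands is attempted here.

```python
import shlex
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Instruction:
    kind: str               # "LEFT_CLICK", "KEYBOARD", "ANALYSIS" or "UNIX"
    args: Tuple[object, ...]


def parse_instruction(line: str) -> Instruction:
    """Parse one "assembly-like" line into an Instruction (assumed format)."""
    line = line.strip()
    if not line:
        raise ValueError("empty instruction")
    opcode, _, rest = line.partition(" ")
    if opcode == "LEFT_CLICK":
        x, y = rest.split()  # raises ValueError unless exactly two fields
        return Instruction("LEFT_CLICK", (int(x), int(y)))
    if opcode == "KEYBOARD":
        return Instruction("KEYBOARD", (rest,))  # keep the typed string whole
    if opcode == "ANALYSIS":
        return Instruction("ANALYSIS", ())
    # Anything else is treated as a UNIX command and split shell-style.
    return Instruction("UNIX", tuple(shlex.split(line)))


def format_log(lines: Iterable[str]) -> List[str]:
    """Return numbered, human-readable log entries for a run of instructions."""
    parsed = (parse_instruction(line) for line in lines)
    return [f"{i:04d} {ins.kind} {ins.args}" for i, ins in enumerate(parsed, start=1)]
```

Numbering the entries rather than timestamping them keeps the log deterministic, which makes a recorded session easy to compare against a replay.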