This LLM chat app is an open-source application developed using Streamlit and Docker. It is powered by various LLM APIs and has extra features to customize the user experience. It supports image and text prompt input as well as online search citations with URL links. Chat sessions are persisted in a MySQL database for retrieval and search. A retrieved chat session can be saved locally to an .HTML file or deleted from the database.
The short video below demonstrates some of the features. https://youtu.be/cHsequP0Wsw
Version 2.15 of this APP has upgraded `gpt-5.1-2025-11-13` and `gpt-5.1-2025-11-13-thinking` to `gpt-5.2-2025-12-11` and `gpt-5.2-2025-12-11-thinking`, respectively, where in the former the thinking effort is set to `low` and in the latter to `high`. GPT-5.2 is better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects.
Version 2.14 of this APP has made the following changes:

- Upgrade both `gpt-5-mini-2025-08-07` and `gpt-5-mini-2025-08-07-thinking` to `gpt-5.1-2025-11-13` and `gpt-5.1-2025-11-13-thinking`, where in the former the reasoning effort is set to `low` and in the latter to `high`. GPT-5.1 is a smarter, more conversational model that is warmer, more intelligent, and better at following your instructions. It is also faster on simple tasks and more persistent on complex ones.
- Upgrade both `gemini-2.0-flash` and `gemini-2.5-pro` to `gemini-3-pro-preview` and `gemini-3-pro-preview-thinking`, where in the former the thinking level is set to `low` and in the latter to `high`. Gemini-3-Pro is best for multimodal understanding and agentic and vibe coding, delivering richer visualizations and deeper interactivity, all built on a foundation of state-of-the-art reasoning.
- Implement streaming mode in both the `gpt-5.1-2025-11-13` and `gpt-5.1-2025-11-13-thinking` models.
- Add web-search capability to the `gemini-3-pro-preview-thinking` model.
- Upgrade the `google-genai` package to version `1.52.0` and `mistralai` to `1.9.11`.
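The reasoning-effort and streaming settings above can be sketched with the OpenAI Python SDK's Responses API. This is a minimal illustration, not this app's actual code: the parameter names follow OpenAI's public API, and the mapping of the app's "-thinking" option to a `high` effort on the same model is an assumption based on the notes above.

```python
# Sketch: streaming a reasoning-model response with a chosen reasoning
# effort. Assumes the `openai` package and an OPENAI_API_KEY in the
# environment; parameter names follow OpenAI's public Responses API.

def build_request(model: str, prompt: str, effort: str) -> dict:
    """Keyword arguments for a streaming Responses API call."""
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},  # "low" for the base entry, "high" for -thinking
        "stream": True,
    }

def stream_answer(prompt: str, thinking: bool = False) -> None:
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    client = OpenAI()
    # The app lists "-thinking" as a separate model option; here both options
    # map to one API model with different reasoning efforts (an assumption).
    kwargs = build_request(
        model="gpt-5.1-2025-11-13",
        prompt=prompt,
        effort="high" if thinking else "low",
    )
    for event in client.responses.create(**kwargs):
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
```

With `stream=True`, the SDK yields incremental events instead of one final object, which is what lets the chat window render tokens as they arrive.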
At present, this APP has six models that support web-search result citations:
- `gpt-5.2-2025-12-11`
- `gpt-5.2-2025-12-11-thinking`
- `claude-sonnet-4-5-20250929`
- `gemini-3-pro-preview`
- `gemini-3-pro-preview-thinking`
- `perplexity-sonar-pro`
Version 2.13 of this APP has replaced the `o3-mini-high` reasoning model with the `gpt-5-mini-2025-08-07` model, with its reasoning effort level set to `high` (listed as `gpt-5-mini-2025-08-07-thinking` in the model options).
Version 2.12 of this APP has upgraded the Claude model from `claude-sonnet-4-20250514` to `claude-sonnet-4-5-20250929`, which is the strongest model for coding, building complex agents, and using computers, and it shows substantial gains in reasoning and math.
Version 2.11 of this APP has made the following two changes:
- Upgrade the Gemini model `gemini-2.5-pro-preview-05-06` to the more stable `gemini-2.5-pro`.
- Upgrade the Qwen model `Qwen3-235b-a22b` to `qwen3-235b-a22b-2507`, which supports a native 262K context length and does not implement "thinking mode". Compared to its base variant, this version delivers significant gains in knowledge coverage, long-context reasoning, coding benchmarks, and alignment with open-ended tasks.
Version 2.10.1 of this APP improves the display of nested ordered or unordered lists in the saved .HTML file for most models.
Version 2.10 of this APP has made the following two changes:
- Upgrade the OpenAI model from `gpt-4.1-2025-04-14` to `gpt-5-mini-2025-08-07`, which is proficient in code generation, bug fixing and refactoring, instruction following, and tool calling. `gpt-5-mini-2025-08-07` is a more cost-efficient and faster version of GPT-5 with a larger token limit per minute. The response is in NON-STREAMING mode to potentially avoid organization/identity verification.
- Further improve the distinction between inline math and currency.
Version 2.9.3 of this APP has made the following four improvements:
- Make the URL of a web page in the `perplexity-sonar-pro` model's sources of search results also CLICKABLE in the saved .HTML file.
- Add handling of `executable code` in the `gemini-2.0-flash` model's response.
- Try extracting the actual HTML `<title>` of a web page and display it as the title of a CLICKABLE link (rather than just using a simplified label) in the `gemini-2.0-flash` model's sources of search results.
- Correctly display dollar amounts (e.g., $2.82 and $2.77) in chat session HTML output as well as in the saved .HTML file.
Version 2.9.2 of this APP adds a "Copied!" feedback when the copy button of a code block is clicked.
Version 2.9.1 of this APP improves source citation in gemini-2.0-flash and claude-sonnet-4-20250514 models.
Version 2.9 of this APP has made the following four changes:
- Upgrade the Claude model `claude-3-7-sonnet-20250219` to `claude-sonnet-4-20250514`, which is a significant upgrade to Claude Sonnet 3.7. It delivers superior coding while responding more precisely to your instructions.
- Upgrade the Claude thinking model `claude-3-7-sonnet-20250219-thinking` to `claude-sonnet-4-20250514-thinking`, which returns a summary of Claude's full thinking process and leverages a more sophisticated reasoning strategy.
- Upgrade the DeepSeek reasoning model to `DeepSeek-R1-0528`, which improves benchmark performance with enhanced front-end capabilities and reduced hallucinations.
- Add a spinner when waiting for the response of a thinking model, and further improve nested output in a .HTML file.
Version 2.8 of this APP has made the following four changes:
- Upgrade the thinking model `gemini-2.5-pro-preview-03-25` to `gemini-2.5-pro-preview-05-06`, which marks best-in-class frontend web development and a major leap in video understanding, with a knowledge cutoff in January 2025 and a context window of 1M tokens.
- Add the web search tool to the model `claude-3-7-sonnet-20250219`, which gives Claude direct access to real-time web content, allowing it to answer questions with up-to-date information.
- Improve the web-search result citation of the model `gemini-2.0-flash` by directly extracting the URLs and titles from the model response rather than relying on prompt engineering. The same practice is used in the newly added search tool of the model `claude-3-7-sonnet-20250219`.
- Improve the rendering of mathematical expressions in the model `gpt-4.1-2025-04-14`.
At present, this APP has four models that support web-search result citations:
- `gpt-4.1-2025-04-14`
- `claude-3-7-sonnet-20250219`
- `gemini-2.0-flash`
- `perplexity-sonar-pro`
Version 2.7 of this APP has made the following three changes:
- Improve the rendering of mathematical expressions for all models in chat messages and in saved .HTML files.
- Improve the rendering of nested lists in saved .HTML files.
- Upgrade the model `Qwen2.5-Max` to `Qwen3-235b-a22b`, which supports prompt-driven seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model. For maximum control, use explicit prompting patterns rather than relying on automatic mode switching.
At present, this APP has three models that support web-search result citations:
- `gpt-4o-2024-11-20`
- `gemini-2.0-flash`
- `perplexity-sonar-pro`
Version 2.6 of this APP has made the following two changes:
- Upgrade the model `gpt-4o-2024-11-20` to `gpt-4.1-2025-04-14`, which makes significant advancements in coding, instruction following, and long-context processing with a 1-million-token context window.
- Upgrade the thinking model `gemini-2.0-flash-thinking-exp-01-21` to `gemini-2.5-pro-preview-03-25`, which shows advanced coding capability and is state-of-the-art across a range of benchmarks requiring enhanced reasoning.
At present, this APP has four models with thinking/reasoning capability:
- `o3-mini-high`
- `claude-3-7-sonnet-20250219-thinking`
- `gemini-2.5-pro-preview-03-25`
- `DeepSeek-R1`
Version 2.5 of this APP has made the following changes:
- Add web search capability to the OpenAI model `gpt-4o-2024-11-20`. The `search_context_size` is set to `high` to provide the most comprehensive context retrieved from the web to help the tool formulate a response. The `openai` package needs to be upgraded to enable this feature.
- Change the prompts to the models `gemini-2.0-flash` and `perplexity-sonar-pro` to improve the format of web-search result citation.
- Add an MIT license file.
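The `search_context_size` setting described above can be sketched with the OpenAI Python SDK. The tool spec below follows OpenAI's public documentation for web search in the Responses API; treat the exact wiring as an illustrative assumption, not this app's actual code:

```python
# Sketch: enabling OpenAI web search with search_context_size="high".
# Assumes a recent `openai` package and an OPENAI_API_KEY in the
# environment; the tool spec follows OpenAI's public docs.

VALID_CONTEXT_SIZES = ("low", "medium", "high")

def web_search_tool(context_size: str = "high") -> dict:
    """Tool spec asking for the most comprehensive web context."""
    if context_size not in VALID_CONTEXT_SIZES:
        raise ValueError(f"context_size must be one of {VALID_CONTEXT_SIZES}")
    return {"type": "web_search_preview", "search_context_size": context_size}

def ask_with_search(question: str) -> str:
    from openai import OpenAI  # lazy import so the sketch loads without the SDK
    client = OpenAI()
    response = client.responses.create(
        model="gpt-4o-2024-11-20",
        tools=[web_search_tool("high")],
        input=question,
    )
    return response.output_text  # cited URLs arrive as annotations on the output
```

A larger context size improves citation coverage at the cost of latency and tokens, which is why this app pins it to `high`.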
At present, this APP has three models that support web-search result citations:
- `gpt-4o-2024-11-20`
- `gemini-2.0-flash`
- `perplexity-sonar-pro`
Version 2.4 of this APP has made the following changes:
- Add the OpenAI reasoning model `o3-mini-high`, which is hosted by OpenRouter.ai. With its reasoning effort set to high, this model delivers exceptional STEM capabilities, with particular strength in science, math, and coding.
- Add the Anthropic thinking model `claude-3-7-sonnet-20250219-thinking`. Extended thinking gives Claude 3.7 Sonnet enhanced reasoning capabilities for complex tasks, while also providing transparency into its step-by-step thought process. The maximum token limit for this model is 20,000, with a thinking token budget of 16,000.
- Upgrade the Anthropic model `claude-3-5-sonnet-20241022` to `claude-3-7-sonnet-20250219`, which shows particularly strong improvements in coding and front-end web development.
- Upgrade the Alibaba model `Qwen2.5-Coder-32B-Instruct` to `Qwen2.5-Max`, a large-scale Mixture-of-Experts (MoE) model that outperforms DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond. This model is hosted by OpenRouter.ai.
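The extended-thinking limits quoted above (20,000 max tokens, 16,000 thinking budget) can be sketched with the Anthropic Python SDK. Field names follow Anthropic's public Messages API; the wiring is an illustrative assumption, not this app's actual code:

```python
# Sketch: an extended-thinking call to Claude 3.7 Sonnet with the token
# limits quoted above. Assumes the `anthropic` package and an
# ANTHROPIC_API_KEY in the environment.

MAX_TOKENS = 20_000       # maximum token limit for this model (from the note above)
THINKING_BUDGET = 16_000  # thinking token budget (from the note above)

def thinking_config(budget: int = THINKING_BUDGET) -> dict:
    """Extended-thinking block for the Messages API."""
    return {"type": "enabled", "budget_tokens": budget}

def ask_claude_thinking(question: str) -> None:
    import anthropic  # lazy import so the sketch loads without the SDK
    client = anthropic.Anthropic()
    message = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=MAX_TOKENS,
        thinking=thinking_config(),
        messages=[{"role": "user", "content": question}],
    )
    for block in message.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking)  # step-by-step thought process
        elif block.type == "text":
            print(block.text)  # final answer
```

The thinking budget must stay below `max_tokens`, since the visible answer shares the same overall token limit.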
In version 2.4 of this APP, you can use the following 6 multimodal LLM models with both image and text input:
- `gpt-4o-2024-11-20`
- `claude-3-7-sonnet-20250219`
- `claude-3-7-sonnet-20250219-thinking`
- `pixtral-large-latest`
- `gemini-2.0-flash`
- `gemini-2.0-flash-thinking-exp-01-21`
Version 2.3 of this APP has made the following changes:
- Add the reasoning model `Deepseek-R1`, which is hosted by Together.ai. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
- Upgrade `gemini-2.0-flash-exp` to the latest production-ready `gemini-2.0-flash`, which delivers next-gen features and improved capabilities, including superior speed, native tool use, multimodal generation, and a 1M-token context window.
- Upgrade `gemini-2.0-flash-thinking-exp` to the latest `gemini-2.0-flash-thinking-exp-01-21`, which delivers enhanced abilities across math, science, and multimodal reasoning.
- Upgrade `perplexity-llama-3.1-sonar-huge-128k-online` to `perplexity-sonar-pro`, a premier search offering with search grounding that supports advanced queries and follow-ups. The legacy model `llama-3.1-sonar-huge-128k-online` will be deprecated and will no longer be available after 2/22/2025.
- Increase the default maximum number of tokens a model can generate from 4000 to 6000.
Version 2.2 of this APP has made one change:
- Show the model name of an assistant when saving a chat session to an .HTML file.
Version 2.1.1 of this APP has added the capability to prompt the following 5 multimodal LLM models with both image and text:
- `gpt-4o-2024-11-20`
- `claude-3-5-sonnet-20241022`
- `pixtral-large-latest`
- `gemini-2.0-flash-exp`
- `gemini-2.0-flash-thinking-exp`
- To include an image in a prompt, follow the steps below:
  - Click the `From Clipboard` button in the left pane to show the `Click to Paste from Clipboard` button in the central pane.
  - Use the screen capture tool of your computer to capture an image from your screen.
  - Click the `Click to Paste from Clipboard` button in the central pane to paste the image into the chat window (after browser permission is granted). This function is tested in Chrome and Edge.
  - Type your question and click the `Send` button to submit the question.
- A session that contains both image and text can be saved to a local .HTML file (after first loading the session) by clicking the `Save it to a .html file` button. If this APP is run in the `personal_chatgpt` folder by typing the command `streamlit run personal_chatgpt.py`, the associated images will be saved to a newly created folder `images` in the `personal_chatgpt` folder. If this APP is run in Docker, the images will be saved to the `Downloads` folder of your computer.
- To get the summary title of a session, an image is first sent to the `pixtral-large-latest` model (as an OCR model) to extract its text content. Starting from this version, the free API of OCRSpace's `OCR engine2` is no longer used as the OCR model.
At present, this APP has two models that support web-search result citations:
- `gemini-2.0-flash-exp`
- `perplexity-llama-3.1-sonar-huge-128k-online`
Version 1.10.0 of this APP has made two changes:
- Add a new model `gemini-2.0-flash-thinking-exp` that is trained to generate the "thinking process" the model goes through as part of its response. As a result, Thinking Mode is capable of stronger reasoning in its responses than the Gemini 2.0 Flash Experimental model. This model currently does not support tool usage like Google Search (from the Gemini 2.0 Flash web site).
- Prompt the `gemini-2.0-flash-exp` model to provide web-link citations of its Google Search results. The citation format is similar to that of the `perplexity-llama-3.1-sonar-huge-128k-online` model.
Version 1.9.0 of this APP has made one change:
- Leverage the `GoogleSearch` tool of `gemini-2.0-flash-exp` to improve the accuracy and recency of responses from the model. The model can decide when to use Google Search. This requires installing the new `google-genai` package from PyPI.
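Grounding Gemini with the `GoogleSearch` tool can be sketched with the `google-genai` package mentioned above. The client and type names follow the package's public API; treat the exact wiring as an illustrative assumption rather than this app's actual code:

```python
# Sketch: letting Gemini decide when to ground a response with Google
# Search, via the `google-genai` package. Assumes an API key in the
# environment; names follow the package's public API.

def ask_gemini_with_search(question: str) -> str:
    from google import genai  # lazy import so the sketch loads without the SDK
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        contents=question,
        config=types.GenerateContentConfig(
            # The model decides on its own when to issue a Google Search.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    return response.text  # grounding metadata (queries, sources) rides on the candidates
```

Because the tool is declarative, no search code runs client-side; the model requests grounding only when it judges the question needs fresh information.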
Version 1.8.0 of this APP has made one change:
- Change the Gemini model to `gemini-2.0-flash-exp`, which delivers improvements to multimodal understanding, coding, complex instruction following, and function calling.
Version 1.7.0 of this APP has made two changes:
- Change the OpenAI model to `gpt-4o-2024-11-20`, which has better creative writing ability and is better at working with uploaded files, providing deeper insights and more thorough responses (from an OpenAI X post).
- Change the Gemini model to `gemini-exp-1121`, which brings significant gains in coding performance, stronger reasoning capabilities, and improved visual understanding (from a Google AI X post). This model currently does not support Grounding with Google Search (as of Nov 28, 2024).
Version 1.6.0 of this APP has made one change:
- Add a new model `Qwen2.5-Coder-32B-Instruct`. This 32B model is developed by Alibaba Cloud and is available at Together.ai.
Version 1.5.0 of this APP has made one change:
- Add a new model `llama-3.1-nemotron-70b-instruct` from Nvidia. A free API key is available at https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-instruct by clicking "Build with this NIM".
Version 1.4.0 of this APP has made two changes:
- Change the Perplexity model to `llama-3.1-sonar-huge-128k-online`, which features source citation and clickable URL links.
- Change the Gemini model to `gemini-1.5-pro-002`, which features Grounding with Google Search.
This version currently also has the following features:
- Switch between the LLM models `claude-3-5-sonnet-20241022` and `mistral-large-latest` anytime in a chat session.
- Extract text from a screenshot image. This feature uses the Streamlit component `streamlit-paste-button` to paste an image from the clipboard after user consent (tested on Google Chrome and Microsoft Edge). The image is then sent to the free API of OCRSpace's OCR engine2 to extract the text with automatic Western-language detection.
- Extract the folder structure and file contents of a .zip file.
- Show the name of a model in the response of an LLM API call.
- Select the behavior of your model as either deterministic, conservative, balanced, diverse, or creative.
- Select the maximum number of tokens the model creates for each API call.
- Select a date range and a previous chat session in that range (by choosing from a session summary) and reload the messages of this chat session from a local MySQL database.
- Search previous chat sessions with keywords using MySQL full-text boolean search, and reload a matching session.
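The full-text boolean search above can be sketched as follows. The table and column names (`session`, `summary`, `session_id`) are hypothetical placeholders, and the `summary` column is assumed to carry a FULLTEXT index; this app's real schema may differ:

```python
# Sketch: keyword search over saved sessions with MySQL boolean-mode
# full-text search. Table/column names are hypothetical placeholders;
# the searched column needs a FULLTEXT index for MATCH ... AGAINST.

SEARCH_SQL = (
    "SELECT session_id, summary FROM session "
    "WHERE MATCH(summary) AGAINST (%s IN BOOLEAN MODE)"
)

def keywords_to_boolean(keywords: list[str]) -> str:
    """['docker', 'mysql'] -> '+docker +mysql': every keyword must be present."""
    return " ".join("+" + word for word in keywords)

def search_sessions(connection, keywords: list[str]):
    # `connection` would come from e.g. mysql.connector using st.secrets["mysql"].
    with connection.cursor() as cursor:
        cursor.execute(SEARCH_SQL, (keywords_to_boolean(keywords),))
        return cursor.fetchall()
```

In boolean mode, the `+` prefix makes each keyword mandatory; omitting it would match sessions containing any of the keywords instead.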
- Save the messages of a session as an HTML file on the local computer.
- Upload a file (or multiple files) from the local computer with a question (optional) and send it to an API call.
- Delete the messages of a loaded chat session from the database, and
- Finally, delete the contents in all tables in the database, if that is what you want to do.
To run this app, create a `secrets.toml` file under the directory `.streamlit` in the project folder `Personal-ChatGPT` and enter your LLM API keys and MySQL database credentials as shown in the code block below. The database credentials can be left as is to match those in the Docker compose file in the project folder.
```toml
# .streamlit/secrets.toml
OPENAI_API_KEY = "my_openai_key"
GOOGLE_API_KEY = "my_gemini_key"
MISTRAL_API_KEY = "my_mistral_key"
ANTHROPIC_API_KEY = "my_claude_key"
TOGETHER_API_KEY = "my_together_key"
PERPLEXITY_API_KEY = "my_perplexity_key"
NVIDIA_API_KEY = "my_nvidia_key"
OPENROUTER_API_KEY = "my_openrouter_key"

[mysql]
host = "mysql"
user = "root"
password = "my_password"
database = "chat"
```
If you have a MySQL database on your computer, another way to run this app without Docker is to `cd` into the `personal-chatgpt` directory, where there is a separate `.streamlit` folder, and create a `secrets.toml` file in this `.streamlit` folder. Enter the password of your local MySQL database instead of the `"my_password"` shown in the previous code block. Remember first to create a database with the name `chat`. The app can be run by typing the following command in the sub-directory `personal-chatgpt` in your virtual environment.
```shell
streamlit run personal_chatgpt.py
```
To clone the GitHub repository, type the following command.
```shell
git clone https://github.com/tonypeng1/Personal-ChatGPT.git
```
To create a Python virtual environment, check out version 2.15 of this APP, and install the project, type the following commands.
```shell
cd Personal-ChatGPT
python3 -m venv .venv
source .venv/bin/activate
git checkout v2.15
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install -e .
```
To create and run a Docker image, type the following commands in the project directory `Personal-ChatGPT`, where there is a file called `Dockerfile`.
```shell
docker build -t streamlit-mysql:2.15 .
docker compose up
```
A more detailed description of this APP can be found in the Medium article: https://medium.com/@tony3t3t/personal-llm-chat-app-using-streamlit-e3996312b744