Streamlit UI for redacting personally identifiable information (PII) from emails by proxying requests to a remote vLLM server. The frontend sends each email to the model with a built‑in system prompt that replaces sensitive spans with [redacted] while keeping all other text verbatim.
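The round trip looks roughly like the sketch below, which assumes the standard `openai` Python client pointed at the vLLM endpoint; the prompt text and the `redact` helper are illustrative, not the app's actual code.

```python
# Sketch of the redaction call the frontend makes against an OpenAI-compatible
# vLLM endpoint. The system prompt here is illustrative; the app ships its own.
import os
from openai import OpenAI

REDACTION_PROMPT = (
    "Replace every span of personally identifiable information (names, "
    "email addresses, phone numbers, street addresses) with [redacted]. "
    "Return all other text verbatim."
)

client = OpenAI(
    base_url=os.environ["VLLM_BASE_URL"],  # e.g. https://xxxx.proxy.runpod.net/v1
    api_key=os.environ["OPENAI_API_KEY"],  # vLLM accepts any non-empty string
)

def redact(email_body: str) -> str:
    response = client.chat.completions.create(
        model=os.environ["MODEL_ID"],
        messages=[
            {"role": "system", "content": REDACTION_PROMPT},
            {"role": "user", "content": email_body},
        ],
        temperature=0.0,  # deterministic output suits redaction
    )
    return response.choices[0].message.content
```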
- Anaconda or Miniconda
- Python 3.11 (installed via Conda)
- (Optional) NVIDIA GPU + CUDA drivers if you want to host your own vLLM inference server
- Hugging Face access token with permission to download the target model
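If you are unsure whether your token is valid, a quick check like the following can save a failed pod launch; it assumes `huggingface_hub` is installed and that the token is exposed via `HUGGING_FACE_HUB_TOKEN`.

```python
# Optional sanity check: confirm the Hugging Face token authenticates before
# you rent a GPU pod. Assumes huggingface_hub is installed.
import os
from huggingface_hub import whoami

info = whoami(token=os.environ["HUGGING_FACE_HUB_TOKEN"])  # or pass the token string directly
print(f"Authenticated as: {info['name']}")
```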
```powershell
# 1. Clone the repo and enter it
git clone <your fork or repo URL>
cd inference_engine

# 2. Create/activate the Conda env (only once)
conda create -n private-inference python=3.11 -y
conda activate private-inference

# 3. Install app dependencies
pip install -r requirements.txt

# 4. Configure environment variables
copy .env.example .env   # edit the values to point at your vLLM endpoint + key

# 5. Run the Streamlit UI
streamlit run app/streamlit_app.py --logger.level info
```

The Makefile mirrors these steps if you prefer `make env`, `make deps`, and `make chat` (make sure `conda run` can find the `llm-chat` environment, or override it with `ENV_NAME=private-inference make chat`).
- `VLLM_BASE_URL` – OpenAI-compatible base URL (e.g., `https://xxxx.proxy.runpod.net/v1`)
- `OPENAI_API_KEY` – any non-empty string; vLLM just requires the header to be present
- `MODEL_ID` – model name known to the vLLM server (e.g., `ibm-granite/granite-4.0-h-1b`)

Copy `.env.example` to `.env` and edit those values before starting Streamlit.
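As a quick sanity check before launching, a snippet like the following verifies that all three settings are readable; it assumes `python-dotenv` is installed, and the app itself may load its configuration differently.

```python
# Verify the .env file provides the three settings the app needs.
# Assumes python-dotenv is installed; hypothetical helper, not part of the app.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

missing = [name for name in ("VLLM_BASE_URL", "OPENAI_API_KEY", "MODEL_ID")
           if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing required settings: {', '.join(missing)}")
print("All required settings are present.")
```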
- Activate the Conda environment: `conda activate private-inference`
- Optional: `pip install -r requirements.txt` if dependencies changed
- Launch Streamlit: `streamlit run app/streamlit_app.py --logger.level info`
- Watch the PowerShell window for logs confirming each request (HTTP 200 / errors)
Use the official RunPod vLLM inference template (preloaded with GPU drivers and vLLM):
- Template link: RunPod vLLM Inference Server
Steps:

- Click the template link and choose an appropriate GPU pod type.
- Provide your `HUGGING_FACE_HUB_TOKEN` as an environment variable so the model can download.
- Set the container command/args to something similar to:

  ```
  python3 -m vllm.entrypoints.openai.api_server \
    --model ibm-granite/granite-4.0-h-1b \
    --host 0.0.0.0 \
    --port 8000
  ```

- Start the pod and wait for the health indicator to turn green.
- Copy the forwarded URL (`https://<pod-id>-8000.proxy.runpod.net/v1`) into `VLLM_BASE_URL` in your local `.env`; a quick connectivity check is sketched below.
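Before wiring the URL into the app, a short smoke test such as this can confirm the endpoint responds; it uses the `openai` client, and the URL is the placeholder from the step above.

```python
# One-off smoke test: confirm the pod's OpenAI-compatible endpoint is up and
# serving the expected model before pointing the Streamlit app at it.
from openai import OpenAI

client = OpenAI(
    base_url="https://<pod-id>-8000.proxy.runpod.net/v1",  # your forwarded URL
    api_key="anything",  # vLLM only checks that the header is present
)

for model in client.models.list():
    print(model.id)  # should include ibm-granite/granite-4.0-h-1b
```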
You can also build/push your own container via the provided Dockerfile if you need custom dependencies, then point RunPod at that image instead of the template default.