A Streamlit tool for keyword and semantic search over a digitized archival corpus in a low-resource classical language, with Möllendorff transliteration and AI-assisted sentence-level translation.
- BM25 keyword search (script / English / transliteration aware)
- Optional semantic search (small 384-dim embeddings)
- Möllendorff transliteration (š/č/ž + ASCII option)
- Batch translate + CSV export
- Caching for speed & lower translation cost
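
To make the keyword-search feature concrete, here is a minimal pure-Python sketch of BM25 ranking. It is not the code in `app.py`: the `BM25` class, the whitespace `tokenize` helper, and the `k1`/`b` defaults are assumptions chosen for illustration, and the sample sentences are toy transliterated strings rather than corpus data.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A real corpus would likely need script-aware tokenization.
    return text.lower().split()


class BM25:
    """Illustrative Okapi BM25 scorer over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # length-normalization strength (assumed default)
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.doc_len = [len(t) for t in self.tokens]
        self.avgdl = sum(self.doc_len) / len(docs) if docs else 0.0

        # Document frequency: number of documents containing each term.
        df = Counter()
        for t in self.tokens:
            df.update(set(t))
        n = len(docs)
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in df.items()
        }

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            length_ratio = self.doc_len[i] / self.avgdl
            norm = f + self.k1 * (1 - self.b + self.b * length_ratio)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query, top_k=5):
        # Return (doc_index, score) pairs, best first, skipping non-matches.
        scored = [(self.score(query, i), i) for i in range(len(self.tokens))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(i, s) for s, i in scored[:top_k]]


docs = ["han i bithe", "amba gurun i han", "bithe arambi"]
index = BM25(docs)
for doc_id, score in index.search("han"):
    print(doc_id, round(score, 3), docs[doc_id])
```

Both matching sentences contain the query term once, so the shorter one ranks first because of length normalization. Making search "script / English / transliteration aware" would presumably mean indexing each form of a sentence, which this sketch does not attempt.
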
conda create -n manwen311 python=3.11 -y
conda activate manwen311
python -m pip install -r requirements.txt # alternatively, create the environment from environment.yml
python preprocess.py # builds .cache from your CSV
python -m streamlit run app.py
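
The caching feature listed above can be pictured as a small on-disk lookup keyed by a hash of each sentence, so that repeated requests do not trigger a new translation call. The sketch below is one possible shape, not the project's implementation: the `TranslationCache` class, the JSON file layout, the `model` label, and the `translate` callable are all hypothetical, and the actual contents of `.cache` may differ.

```python
import hashlib
import json
from pathlib import Path


class TranslationCache:
    """Illustrative JSON-backed cache for sentence-level translations."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    @staticmethod
    def key(sentence, model):
        # Include the model label so switching models does not reuse
        # translations produced by a different one.
        raw = f"{model}\n{sentence}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_or_translate(self, sentence, model, translate):
        k = self.key(sentence, model)
        if k not in self.data:
            # Cache miss: call the (assumed) translate function once,
            # then persist the result for later runs.
            self.data[k] = translate(sentence)
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(
                json.dumps(self.data, ensure_ascii=False), encoding="utf-8"
            )
        return self.data[k]
```

A caller would pass its own translation function, for example `cache.get_or_translate(sentence, "some-model", my_translate)`, where `my_translate` stands in for whatever translation backend is configured. Rewriting the whole JSON file on every miss is simple but slow for large batches; a real implementation might write in bulk or use SQLite instead.
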