Merge branch 'main' into vector-database

milistu · May 27, 2024 · e0cbd6c
2 parents 2989a18 + e53e3be
commit e0cbd6c
Showing 10 changed files with 218 additions and 290 deletions.
diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
@@ -6,7 +6,7 @@ on:
       - main
 
 jobs:
-  changed_files:
+  test_router:
     runs-on: ubuntu-latest  # windows-latest || macos-latest
     name: Test Router
     steps:
@@ -19,28 +19,42 @@ jobs:
         with:
           files: router/**
           # files_ignore: docs/static.js
+
+      - name: Install Dependencies
+        if: steps.changed-files-router.outputs.any_changed == 'true'
+        run: pip install -r requirements.txt
 
-      - name: Run step if any file(s) in the router folder change
+      - name: Run Tests if any file(s) in the router folder change
         if: steps.changed-files-router.outputs.any_changed == 'true'
         env:
           ALL_CHANGED_FILES: ${{ steps.changed-files-router.outputs.all_changed_files }}
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-        run: |
-          pip install -r requirements.txt 
-          python -m unittest tests/test_router.py
-        
+          LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
+          LANGFUSE_PUBLIC_KEY: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
+          LANGFUSE_HOST: ${{ secrets.LANGFUSE_HOST }}
+        run: python -m unittest tests/test_router.py
+
+  test_database:
+    runs-on: ubuntu-latest  # windows-latest || macos-latest
+    name: Test Database
+    steps:
+      - uses: actions/checkout@v4
       # Test Database
       - name: Get changed files in the database folder
         id: changed-files-database
         uses: tj-actions/changed-files@v44
         with:
           files: database/**
+
+      - name: Install Dependencies
+        if: steps.changed-files-database.outputs.any_changed == 'true'
+        run: pip install -r requirements.txt
 
-      - name: Run step if any file(s) in the database folder change
+      - name: Run Tests if any file(s) in the database folder change
         if: steps.changed-files-database.outputs.any_changed == 'true'
         env:
           ALL_CHANGED_FILES: ${{ steps.changed-files-database.outputs.all_changed_files }}
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-        run: |
-          pip install -r requirements.txt 
-          python -m unittest tests/test_database.py
+          QDRANT_API_KEY: ${{ secrets.QDRANT_API_KEY }}
+          QDRANT_CLUSTER_URL: ${{ secrets.QDRANT_CLUSTER_URL }}
+        run: python -m unittest tests/test_database.py
diff --git a/README.md b/README.md
@@ -1,4 +1,10 @@
-# Legal ChatBot Documentation
+# Legal ChatBot 👩‍⚖️
+
+Legal ChatBot helps users navigate legal documents.
+
+It combines RAG (Retrieval-Augmented Generation) with a knowledge base of law articles, so it can reference the relevant legal texts during a conversation. It provides an interactive way to query legal information for professionals, students, and anyone who needs quick insight into legal matters.
+
+Setup involves **Poetry** for dependency management, **Qdrant** as the vector database, and **Langfuse** for tracing and monitoring the chatbot's LLM calls.
 
 ## Setting Up the Project
 

diff --git a/app.py b/app.py
@@ -56,6 +56,7 @@ def response_generator(query: str):
     # Route query
     collections = semantic_query_router(
         client=openai_client,
+        model=config["openai"]["gpt_model"]["router"],
         query=query,
         prompt=ROUTER_PROMPT,
         temperature=config["openai"]["gpt_model"]["temperature"],
@@ -82,7 +83,7 @@ def response_generator(query: str):
 
     stream = get_answer(
         client=openai_client,
-        model=config["openai"]["gpt_model"]["name"],
+        model=config["openai"]["gpt_model"]["llm"],
         temperature=config["openai"]["gpt_model"]["temperature"],
         messages=get_messages(
             context=context, query=query, conversation=st.session_state.messages
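
The two hunks above read the model name from separate `router` and `llm` keys under `config["openai"]["gpt_model"]`. A minimal sketch of what that lookup could look like follows. The model names and the temperature value are placeholders, and `model_for` is a hypothetical helper that does not appear in the commit.

```python
# Hypothetical config layout inferred from the keys used in the diff.
# The values are placeholders, not the project's real settings.
config = {
    "openai": {
        "gpt_model": {
            "router": "<router-model-name>",  # model used by the query router
            "llm": "<answer-model-name>",     # model used to generate the answer
            "temperature": 0,                 # assumed value, shared by both calls
        }
    }
}


def model_for(config: dict, role: str) -> str:
    """Return the model name configured for a role ("router" or "llm")."""
    if role not in ("router", "llm"):
        raise ValueError(f"unknown role: {role!r}")
    return config["openai"]["gpt_model"][role]


router_model = model_for(config, "router")
answer_model = model_for(config, "llm")
```

Keeping the two names under separate keys lets the router use a different model from the one that writes the final answer without touching the call sites.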

diff --git a/chat-dev.ipynb b/chat-dev.ipynb
@@ -225,7 +225,7 @@
    "outputs": [],
    "source": [
     "response = openai_client.chat.completions.create(\n",
-    "    model=config[\"openai\"][\"gpt_model\"][\"name_light\"],\n",
+    "    model=config[\"openai\"][\"gpt_model\"][\"router\"],\n",
     "    temperature=config[\"openai\"][\"gpt_model\"][\"temperature\"],\n",
     "    messages=messages,\n",
     ")"

diff --git a/llm/prompts.py b/llm/prompts.py
@@ -17,6 +17,7 @@
 Tvoj zadatak je da identifikuješ potrebe klijenta i na osnovu toga pružite najrelevantnije informacije. 
 Kada pružaš odgovore ili savete, naglasiti iz kojeg tačno pravnog člana dolazi informacija i obavezno obezbedi link ka tom članu kako bi klijent mogao dodatno da se informiše. 
 Cilj je da komunikacija bude efikasna i da klijent oseti da je u dobrim rukama.
+Korisnik može da postavi pitanje na bilo kom jeziku i tvoj zadatak je da na pitanje odgovoriš na istom jeziku kao i pitanje korisnika.
 
 Format odgovora:
 - Ispod naslova **Sažetak** prvo odgovori kratko i direktno na pitanje klijenta koristeći laičke izraze bez složene pravne terminologije.

diff --git a/router/router_prompt.py b/router/router_prompt.py
@@ -11,15 +11,17 @@
  - Zakon o zaštiti potrošača osigurava da potrošači u Srbiji imaju prava na sigurnost i kvalitet proizvoda i usluga. Zakon propisuje obaveze trgovaca u pogledu pravilnog informisanja potrošača o proizvodima, uslugama, cenama i pravu na reklamaciju. Takođe, uključuje prava potrošača na odustanak od kupovine unutar određenog roka i prava u slučaju neispravnosti proizvoda. 
 - porodicni_zakon
  - Porodični zakon reguliše pravne odnose unutar porodice, uključujući brak, roditeljstvo, starateljstvo, hraniteljstvo i usvojenje. Zakon definiše prava i obaveze bračnih partnera, kao i prava dece i roditeljske odgovornosti. Takođe se bavi pitanjima nasleđivanja i alimentacije. 
+- nema_zakona
+ - Korisnikovo pitanje ne odgovara ni jednom zakonu.
 
 **FORMAT ODGOVORA:**
 - Odgovor vratiti u JSON formatu.
 - Odgovor treba da sadrzi samo JSON output, bez dodataka.
 - Odgovor mora da bude string koji moze da se ucita uz pomoc komande json.loads().
-- Imena zakona mogu biti samo sledeca: zakon_o_radu, zakon_o_porezu_na_dohodak_gradjana, zakon_o_zastiti_podataka_o_licnosti, zakon_o_zastiti_potrosaca, porodicni_zakon.
+- Imena zakona mogu biti samo sledeca: zakon_o_radu, zakon_o_porezu_na_dohodak_gradjana, zakon_o_zastiti_podataka_o_licnosti, zakon_o_zastiti_potrosaca, porodicni_zakon, nema_zakona.
 - Jedno pitanje korisnika moze da se odnosi na vise zakona.
 - Ukoliko mislis da zakon odgovara korisnikovom pitanju ali nisi 100% siguran onda ga svakako stavi u odgovor.
-- Ukoliko korisnikovo pitanje ne odgovara ni jednom zakonu vrati genericki string: "nema_zakona".
+- Ukoliko korisnikovo pitanje ne odgovara ni jednom zakonu vrati listu sa generickim stringom: ["nema_zakona"].
 - Zakone uvek moras vracati kao listu stringova bez obzira da li ih je 1 ili vise.
 - Primer JSON odgovora:
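
The prompt rules above require a `json.loads()`-compatible reply that is always a list of allowed law names, with `["nema_zakona"]` when nothing matches. A caller could enforce those rules roughly as sketched below. This assumes the reply is a bare JSON list of strings (the prompt's own example is not visible in this hunk), and `parse_router_response` is a hypothetical helper, not a function from the repository.

```python
import json

# Law names allowed by the prompt, including the "no matching law" marker.
ALLOWED_LAWS = {
    "zakon_o_radu",
    "zakon_o_porezu_na_dohodak_gradjana",
    "zakon_o_zastiti_podataka_o_licnosti",
    "zakon_o_zastiti_potrosaca",
    "porodicni_zakon",
    "nema_zakona",
}
FALLBACK = ["nema_zakona"]


def parse_router_response(raw: str) -> list[str]:
    """Parse the router reply into a list of allowed law names.

    Assumption: the reply is a bare JSON list of strings. Anything that
    cannot be parsed or validated falls back to ["nema_zakona"].
    """
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return list(FALLBACK)
    if isinstance(parsed, str):
        # Tolerate the older prompt wording, which asked for a bare string.
        parsed = [parsed]
    if not isinstance(parsed, list):
        return list(FALLBACK)
    laws = [
        name
        for name in parsed
        if isinstance(name, str) and name in ALLOWED_LAWS and name != "nema_zakona"
    ]
    return laws or list(FALLBACK)
```

Returning a list in every case means downstream code can iterate over the result without first checking whether it received a string or a list, which is the inconsistency this hunk removes from the prompt.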
 

diff --git a/scraper/README.md b/scraper/README.md
@@ -0,0 +1,31 @@
+# Scraper
+
+This script scrapes law articles from a list of URLs and saves them as JSON files.
+
+## Usage
+
+To run the script, use the following command:
+
+```bash
+python scraper/scraper.py --file scraper/urls.txt --output-dir laws_test
+```
+
+## Arguments
+- `--url`: A single URL to scrape.
+- `--file`: Path to a text file containing URLs separated by newlines.
+- `--output-dir`: Directory to save the JSON files (default is `scraper/laws`).
+
+## Example
+To scrape law articles from a single URL (example: Serbian Labor Law) and save the output in the `scraper/laws` directory:
+```bash
+python scraper/scraper.py --url "https://www.paragraf.rs/propisi/zakon_o_radu.html" --output-dir scraper/laws
+```
+
+To scrape law articles from a list of URLs in `urls.txt` and save the output in the `scraper/laws` directory:
+```bash
+python scraper/scraper.py --file scraper/urls.txt --output-dir scraper/laws
+```
+> ⚠️ _**Note**: Ensure you are in the root directory of the project before running the script._
+
+## Output
+The output JSON files will be saved in the specified output directory, with each file named after the corresponding URL's stem.
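
The README above says URLs are read from a newline-separated file and that each output file is named after its URL's stem. A minimal sketch of that naming rule follows. `read_urls` and `output_path_for` are hypothetical helpers written for illustration, and the real `scraper/scraper.py` may do this differently.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def read_urls(text: str) -> list[str]:
    """Split newline-separated URLs, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def output_path_for(url: str, output_dir: str = "scraper/laws") -> Path:
    """Build the JSON output path from the stem of the URL's last path segment."""
    stem = PurePosixPath(urlparse(url).path).stem
    return Path(output_dir) / f"{stem}.json"


urls = read_urls("https://www.paragraf.rs/propisi/zakon_o_radu.html\n\n")
paths = [output_path_for(url) for url in urls]
```

Under this rule the Labor Law URL from the example maps to `zakon_o_radu.json` inside the chosen output directory.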