
A Tchoung té


The name comes from the Yemba language and means "association" or "group".

The objective of the project is to federate the metadata of all Cameroonian associations in France to make them more accessible to the community.

Functional Context

Presentation video (in French)

If you want to do data analysis, the latest raw database of Cameroonian associations is accessible here.

We also maintain a public dashboard to visualize the associations here.

Technical context

If you are here, it means you are interested in an in-house deployment of the solution. Follow the guide :)!

Prerequisites

  • Create a Sourcegraph account and get credentials to use CodyAI
  • Devspace installed locally
  • Have admin access on a Gogocarto
  • Go through the Gogocarto tutorials
  • Locally install all tools (the init and command scripts from the .gitpod.yml file), or use a ready-made development environment on Gitpod:

Open in Gitpod

Deployment

Execute the filter-cameroon.ipynb and enrich-database.ipynb notebooks:

  pipenv shell
  secretsfoundry run --script 'python filter-cameroon.py'
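The filtering logic itself is not shown in this README; as a rough illustration, keyword-based selection of Cameroon-related associations could look like the sketch below. The field names (`titre`, `objet`) and the keyword list are assumptions for illustration, not the actual filter-cameroon.py logic:

```python
import csv
import io

# Illustrative keywords; the real script's criteria may differ.
KEYWORDS = ("cameroun", "cameroon")

def filter_cameroon(rows):
    """Keep association records whose title or object mentions Cameroon."""
    kept = []
    for row in rows:
        text = f"{row.get('titre', '')} {row.get('objet', '')}".lower()
        if any(keyword in text for keyword in KEYWORDS):
            kept.append(row)
    return kept

sample = io.StringIO(
    "titre,objet\n"
    "Association des Camerounais de Lyon,entraide\n"
    "Club de bridge,loisirs\n"
)
rows = list(csv.DictReader(sample))
print(len(filter_cameroon(rows)))  # → 1
```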

Finally, use the resulting CSV file as a data source in Gogocarto and customize it. You can, for example, define icons by category (social object); ours are in html/icons.

They were built from these base icons: https://thenounproject.com/behanzin777/kit/favorites/

Update database

  csvdiff ref-rna-real-mars-2022.csv rna-real-mars-2022-new.csv -p 1 --columns 1 --format json | jq '.Additions' > experiments/update-database/diff.csv
  python3 main.py
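main.py itself is not reproduced here; a minimal sketch of how the diff additions could be merged back into the reference table, assuming csvdiff's JSON `Additions` field is a list of raw CSV rows (the real script may work differently):

```python
import csv
import io
import json

def apply_additions(base_csv, additions_json):
    """Append rows from csvdiff's Additions array to the base table."""
    rows = list(csv.reader(io.StringIO(base_csv)))
    header, body = rows[0], rows[1:]
    for line in json.loads(additions_json):
        # Each addition is assumed to be one CSV-formatted row.
        body.extend(csv.reader(io.StringIO(line)))
    return [header] + body

base = "id,name\n1,Asso A\n"
diff = json.dumps(["2,Asso B"])
merged = apply_additions(base, diff)
print(len(merged))  # → 3 (header + 2 rows)
```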

Start the chatbot

  cd etl/
  secretsfoundry run --script "chainlit run experiments/ui.py"

Deploy the chatbot

 devspace deploy

Evaluation

RAG base evaluation dataset

The list of runs, runs.csv, was built by fetching all runs from the beginning with:

export LANGCHAIN_API_KEY=<key>
cd evals/
python3 rag-evals.py save_runs --days 400

Then we used lilac to find the most interesting questions by clustering them per topic/category. The "Associations in France" cluster was the one chosen, and we also deleted some rows due to irrelevance.

The clustering repartition is available here: Clustering Repartition

Finally, you just need to do:

export LANGCHAIN_API_KEY=<key>
cd evals/
python3 rag.py ragas_eval tchoung-te --run_ids_file=runs.csv
python3 rag.py deepeval tchoung-te --run_ids_file=runs.csv

RAG offline evaluation

Whenever you change a parameter that can affect the RAG pipeline, you can execute all inputs present in evals/base_ragas_evaluation.csv, using LangSmith to track them. Then fetch the runs and execute the command above. As there are only 27 elements, you can compare results manually.
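For the manual comparison, pairing each question's score across two evaluation runs is enough at this scale. A small sketch (the `question` and `faithfulness` column names are illustrative; adapt them to the columns your eval scripts actually emit):

```python
import csv
import io

def side_by_side(before_csv, after_csv, key="question", metric="faithfulness"):
    """Pair each question's metric value across two eval result files."""
    def load(text):
        return {r[key]: float(r[metric]) for r in csv.DictReader(io.StringIO(text))}
    before, after = load(before_csv), load(after_csv)
    return {q: (before.get(q), after.get(q)) for q in before}

a = "question,faithfulness\nQ1,0.8\nQ2,0.6\n"
b = "question,faithfulness\nQ1,0.9\nQ2,0.5\n"
for question, (old, new) in side_by_side(a, b).items():
    print(question, old, "->", new)
```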

Backtesting the prompt

 cd etl/
 python3 backtesting_prompt.py

On LangSmith, create the dataset against which you want to test the new prompt. Specify the dataset name in the file, then run it as above to backtest the new prompt and see its results on the dataset.
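Conceptually, the backtest just replays every dataset example through the candidate prompt and collects the answers for review. A stubbed sketch of that loop (`run_model` here is a stand-in for the actual chain invocation done via LangSmith in backtesting_prompt.py):

```python
def backtest(dataset, prompt_template, run_model):
    """Run every example through the candidate prompt and collect answers."""
    results = []
    for example in dataset:
        prompt = prompt_template.format(question=example["question"])
        results.append({"question": example["question"], "answer": run_model(prompt)})
    return results

# Stub model call for illustration; the real script calls the deployed chain.
def run_model(prompt):
    return f"echo: {prompt}"

dataset = [{"question": "Where is the association based?"}]
out = backtest(dataset, "Answer concisely: {question}", run_model)
print(out[0]["answer"])
```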

Contributors ✨

Thanks goes to these wonderful people (emoji key):


Ghislain TAKAM

🔣

pdjiela


DimitriTchapmi


GNOKAM

🔣

fabiolatagne97

🔣

hsiebenou

🔣 ⚠️

Flomin TCHAWE

💻 🔣

Bill Metangmo

💻 🔣 🤔 ⚠️

dimitrilexi

🔣

ngnnpgn

🔣

Tchepga Patrick

🔣

This project follows the all-contributors specification. Contributions of any kind are welcome!