Maya is a virtual assistant chatbot built to support users of the Child Protection Information Management Service (CPIMS). It is designed to give users quick and accurate answers to common questions related to child protection.
- Oriel Kiplangat - kiplangatoriel@gmail.com
- Gilks Moseti - gilksmoseti@gmail.com
- Aloys Aboge Jr - junioraboge@gmail.com
- Bildad Moses Okoth - bildadmoses8@gmail.com
The client for this CPIMS user-support virtual assistant is likely an organization that works with children and uses CPIMS. The client needs an automated solution to handle repetitive user support requests, which take up a lot of their service desk staff's time. The virtual assistant should be user-friendly, accessible, and available 24/7, handle a variety of requests, and provide fast and accurate answers. Input and feedback from CPIMS users are critical to ensuring the virtual assistant meets the client's needs.
We identified the client's needs by gathering data from WhatsApp chats containing inputs and feedback from CPIMS users. Many users asked similar questions, so the help desk had to repeat the same answers over time. The virtual assistant meets this need by answering these common questions automatically.
Data was collected from chats exported from the following WhatsApp groups:
• Nairobi County CPIMS
• Msa CCI’s & DCS CPMIS WhatsApp Chat with CPIMS WAJIR TEAM
• WhatsApp Chat with INSTITUTIONS CPIMS GROUP
• WhatsApp Chat with BUNGOMA CPIMS GROUP
Data acquisition was straightforward, as the CPIMS team provided the dataset directly.
The cleaned and prepared data was analyzed to identify patterns, trends, and insights that could inform the development of the virtual assistant.
To identify common themes and patterns in user queries, this analysis could include techniques such as:
● data visualization
● clustering
● natural language processing (NLP)
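As a much simpler stand-in for the NLP step, counting the most frequent words across queries already surfaces recurring themes. This is plain term-frequency analysis, not clustering; the stop-word list and function name below are hypothetical, and a fuller analysis would use a proper NLP library.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "to", "i", "my", "is", "how", "do", "can", "in", "on", "of", "for"}

def common_terms(queries, top_n=5):
    """Return the top_n most frequent non-stop-word tokens across all queries."""
    counts = Counter()
    for query in queries:
        tokens = re.findall(r"[a-z']+", query.lower())  # lowercase word tokens
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)

if __name__ == "__main__":
    sample = [
        "How do I reset my password?",
        "Password reset not working",
        "Cannot log in to CPIMS",
        "I forgot my password",
    ]
    print(common_terms(sample, top_n=3))
```

On the sample queries, "password" and "reset" dominate, which matches the kind of recurring theme (password resets) the team found in the real chats.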
The data was cleaned and prepared for analysis to ensure that it was consistent, accurate, and complete. This involved:
❖ removing duplicates
❖ fixing errors
❖ standardizing data formats

The data for the chatbot was sourced from exported WhatsApp chats, which were then sorted into an Excel spreadsheet using Pandas. After removing duplicates, the data was divided among the four team members for cleaning. Each member cleaned a portion of the data, and the resulting clean data was then sorted into queries and responses.
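The cleaning steps above (standardizing formats, dropping empty rows and "This message was deleted" placeholders, removing duplicates) can be sketched in plain Python. The team did this in Pandas and Excel; this is only an illustration, and the `clean_messages` name is hypothetical.

```python
import re

# Placeholder WhatsApp writes for deleted messages, as seen in the source data.
DELETED_PLACEHOLDER = "this message was deleted"

def clean_messages(messages):
    """Standardize whitespace, drop empty/deleted messages, and remove duplicates."""
    seen = set()
    cleaned = []
    for raw in messages:
        if raw is None:  # empty cell
            continue
        text = re.sub(r"\s+", " ", raw).strip()  # collapse runs of whitespace
        key = text.lower()  # case-insensitive key for duplicate detection
        if not text or key == DELETED_PLACEHOLDER or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)  # keep the first occurrence in its original casing
    return cleaned
```

Comparing on a lowercased key means "Hello" and "hello" count as duplicates, while the first spelling seen is the one kept.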
- We used a tool called Pushbullet to convert the WhatsApp .txt export to Excel format.
- We removed messages with empty cells and irrelevant data, such as “This message was deleted”, which was repeated many times.
- We then read through the 4,321 rows of data to identify the most frequently asked questions.
- Finally, we created new datasets pairing the questions with their respective answers.
This gave us a clear view of which questions were asked repeatedly and which were not. Afterwards, we converted the Excel file to JSON, which we used for subsequent steps such as model training.
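The team filtered the frequent questions by reading the rows manually. A rough automated version is sketched below under two assumptions not stated in the source: a message is treated as a question if it ends with "?", and the answer is the next message posted by a designated admin account. Both function names and the `admin` parameter are hypothetical.

```python
import json
from collections import Counter

def frequent_questions(messages, top_n=10):
    """Return the top_n most frequent questions (heuristic: text ending in '?')."""
    questions = [m.strip().lower() for m in messages if m.strip().endswith("?")]
    return Counter(questions).most_common(top_n)

def pair_questions_with_answers(rows, admin):
    """Pair each user question with the next message sent by the admin account."""
    pairs = []
    pending = None  # most recent unanswered question; a newer question replaces it
    for sender, text in rows:
        text = text.strip()
        if sender != admin and text.endswith("?"):
            pending = text
        elif sender == admin and pending is not None:
            pairs.append({"question": pending, "answer": text})
            pending = None
    return pairs

if __name__ == "__main__":
    rows = [
        ("user_a", "How do I reset my password?"),
        ("admin", "(sample admin reply)"),
        ("user_b", "Thanks"),
    ]
    # Serialize the question/answer pairs to JSON for the later training steps.
    print(json.dumps(pair_questions_with_answers(rows, admin="admin"), indent=2))
```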
1. The WhatsApp .txt file initially contained a lot of dirty data, with each line holding a date, time, username, and message.
2. We labeled each column with its respective tag.
3. We trimmed the columns we didn't need and kept only the messages column.
4. We removed duplicates, messages with empty cells, and irrelevant data such as “This message was deleted”, which was repeated many times. This standardization reduced the data to a size that could be handled easily.
5. We then divided the dataset into categories based on recurring queries and the admin's subsequent responses.
6. We converted the data to JSON, a format that is easier to feed to our model, Maya.
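Steps 1 to 3 above (splitting each exported line into date, time, username, and message) can be sketched with a regular expression. This assumes the Android-style export layout `12/03/2023, 14:05 - Name: message`; iOS exports and some locales use a different layout and would need another pattern. The `parse_whatsapp_export` name is hypothetical.

```python
import re

# Assumed Android-style export header: "12/03/2023, 14:05 - Jane: message text".
HEADER = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}(?:\s?[AaPp][Mm])?) - (.*)$")

def parse_whatsapp_export(lines):
    """Split exported chat lines into date, time, username and message fields."""
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        match = HEADER.match(line)
        if match:
            date, time, rest = match.groups()
            sender, sep, message = rest.partition(": ")
            if not sep:
                # System notice (e.g. someone left): no "sender: " prefix, so skip it.
                current = None
                continue
            current = {"date": date, "time": time, "username": sender, "message": message}
            records.append(current)
        elif current is not None:
            # Lines without a timestamp continue the previous multi-line message.
            current["message"] += "\n" + line
    return records
```

Keeping only the messages column (step 3) is then `[r["message"] for r in records]`.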
We took the messy data we had, which was pure text mixed with emojis and emoticons, and split it into two separate features with similar characteristics, queries and responses, which we then filtered by relevance. We transformed our raw data into features for building a machine learning model in the following steps:
I. Data Selection: We selected the most relevant questions from our cleaned dataset. The messages included queries and possible responses.
II. Feature Extraction: We extracted the queries from the dataset according to their nature, then transformed them into JSON format for training the machine learning model.
III. Feature Engineering: Based on insights from data exploration and analysis, we created a data model for the virtual assistant that includes a list of common user queries and their appropriate responses. It was designed around machine learning (ML) and NLP algorithms so that the virtual assistant can recognize and respond to user queries accurately and efficiently.
IV. Feature Selection: The queries were selected and grouped by how they related to each other, yielding categories such as greetings and resetting passwords.
V. Training the Model: The data model was trained and tested on historical data and simulated user queries to ensure that it accurately recognizes and responds to common user queries. The model will need continual refinement based on feedback from user interactions with the virtual assistant.
VI. Model Evaluation: After training on the labeled dataset, the model achieved an accuracy of 94.5%, indicating that it correctly classifies most users' support requests and problems.
VII. Model Deployment: The trained model was integrated into a web-based chatbot, as described below.
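The accuracy reported in the Model Evaluation step is the fraction of evaluation queries whose predicted category matches the true one, i.e. correct predictions divided by total predictions. The source does not say how the evaluation set was split; the sketch below only shows the calculation, and the labels are hypothetical tags modeled on the categories mentioned above.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true intent labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Toy example: 3 of 4 predictions are correct, so accuracy is 0.75 (75%).
predicted = ["greetings", "resetting_passwords", "resetting_passwords", "greetings"]
actual = ["greetings", "resetting_passwords", "greetings", "greetings"]
print(accuracy(predicted, actual))
```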
- Python
- Pandas
- TensorFlow
- Flask
- Tailwind CSS
The clean data was used to train the chatbot using TensorFlow. The resulting model was then integrated into the web-based chatbot using FastAPI and Flask.
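The source does not describe the model's input format. A common approach for TensorFlow intent classifiers, assumed here rather than taken from the project, is to store categories in an intents JSON file (tag, example patterns, responses) and encode each pattern as a bag-of-words vector with a one-hot label. The file layout, tags, and function names below are hypothetical.

```python
import json
import re

# Hypothetical intents layout; the project's actual JSON schema may differ.
INTENTS_JSON = """
{"intents": [
  {"tag": "greetings", "patterns": ["hello", "good morning"],
   "responses": ["Hello! How can I help?"]},
  {"tag": "resetting_passwords", "patterns": ["how do I reset my password", "forgot password"],
   "responses": ["(sample reset instructions)"]}
]}
"""

def tokenize(text):
    """Lowercase and split text into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_training_data(intents):
    """Turn intents into bag-of-words input vectors and one-hot label vectors."""
    vocab = sorted({tok for it in intents for p in it["patterns"] for tok in tokenize(p)})
    tags = [it["tag"] for it in intents]
    xs, ys = [], []
    for it in intents:
        for pattern in it["patterns"]:
            tokens = set(tokenize(pattern))
            xs.append([1 if word in tokens else 0 for word in vocab])  # word present?
            ys.append([1 if tag == it["tag"] else 0 for tag in tags])  # one-hot tag
    return vocab, tags, xs, ys

intents = json.loads(INTENTS_JSON)["intents"]
vocab, tags, xs, ys = build_training_data(intents)
```

Vectors like `xs` and `ys` could then be fed to a small dense network with a softmax output over the tags; at inference time, a user's message is encoded with the same vocabulary and the highest-scoring tag selects a response.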
The user interface for the chatbot was designed using Tailwind CSS, a utility-first CSS framework. The interface is simple and easy to use, with a chat window for entering queries and receiving responses from the chatbot.
After forking the repository to your account:
- Clone the project from GitHub:
git clone https://github.com/your-username/repository_name.git
- Navigate to the project directory:
cd repository_name
- Navigate to the backend directory:
cd backend
- Install the required dependencies:
pip install -r requirements.txt
- Run the backend application:
python3 app.py
The Flask server will be running on:
http://localhost:5000/api
Feel free to test the server using Postman or the frontend application. In Postman:
- Create a POST request with the URL set to http://localhost:5000/api
- Under Body, select raw.
- Set the format to JSON and enter the following object:
{ "text": "hello" }
- Send the request and wait for the server's response.
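The same request can be sent from a script using only the Python standard library. This is a sketch: the URL and `{"text": ...}` body come from the steps above, but the function names are hypothetical and the source does not document the response shape, so decoding the reply as JSON is an assumption.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/api"  # endpoint from the setup steps above

def build_chat_request(text, url=API_URL):
    """Build a POST request whose JSON body is {"text": <text>}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def send_message(text, url=API_URL, timeout=10):
    """Send a message to the running backend and decode the reply (assumed JSON)."""
    with urllib.request.urlopen(build_chat_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `send_message("hello")` performs the same request as the Postman steps.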
To use the chatbot, simply type a question or query into the chat window and hit "Enter" or click the "Send" button. The chatbot will respond with a relevant answer or suggestion.
- Much of the data provided was similar in terms of both the questions asked and the answers given, e.g. about resetting passwords. This greatly reduced the variety of questions and responses we could feed our chatbot.
- The data we were provided with proved difficult to work with, as the format (an exported WhatsApp chat) took a lot of time to clean.
- Converting the file into a suitable format took several steps, e.g. .txt to .xlsx to .csv to .json.
- Transferring the bot's output to the web required a lot of time and careful implementation.
- Deployment costs: using TensorFlow made deployment more difficult, and the alternative is to pay for hosting.