The presented examples demonstrate how LLM can be utilized for:
- Extracting the brief essence from texts
- Clustering texts into categories based on their content
- Forming descriptions and characteristics of categories
The results obtained can be leveraged by businesses, for instance, to understand the most common inquiries made to customer service centers or technical support by clients and company employees.
GPT 3.5 and GPT 4 were used depending on the volume of texts and the complexity of the task, as well as the final processing cost.
Additionally, on large datasets, KMeans was employed for clustering and RuBERT tiny 2 was used for generating text embeddings.
To get image descriptions from your chat, first, you need to set your OpenAI API key environment variable on your OS. Just run the following script in your command line and specify your API key:
bash setup_openai_key.sh
To retrieve your chat history in Telegram, go to the chat interface, click on the three dots for options at the top right corner, and select "Export chat history". Next, make sure to select "Format": JSON and other necessary parameters as needed. Specify the save path as "Path" to the root of this project, and you will have a similar folder named source with chat data.
Then, you can run qa_extract.py:
python3 qa_extract.py
and the resulting qa.json file will appear in the data folder.