Welcome to the repository of the BUGA 23 Chatbot Interaction Dataset. This dataset comprises conversations recorded from March to September 2023 during the Bundesgartenschau (BUGA) 2023 event in Mannheim, Germany. It features 4,423 interactions between visitors and a virtual medical chatbot presented through an avatar within a telephone booth setup.
The dataset includes:
- CSV Format Files: Each conversation is structured in CSV format, detailing the dialogue between the chatbot and users.
- Timeframe: Data was collected from March to September 2023.
- Focus on Medical Queries: The chatbot was primarily prompted for medical questions, offering insights into public health-related inquiries and responses.
- Languages: Users could choose to hold their conversation in German, English, or Spanish. The language is marked at the end of the file name.
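The file-name convention and CSV layout described above can be explored with a short script. The following is a minimal sketch, not the project's own tooling. It assumes the language marker is the last underscore-separated token of the file name and that the markers are `de`, `en`, and `es`; both are assumptions, as are the helper names `language_from_filename`, `load_conversation`, and `count_by_language`. Check the actual file names in the data folder and adjust accordingly.

```python
import csv
from collections import Counter
from pathlib import Path

# ASSUMPTION: the language marker is the last underscore-separated token of
# the file stem (e.g. "conversation_0001_de.csv") and uses these codes.
# The README only states that the language is marked at the end of the name.
KNOWN_LANGUAGES = {"de": "German", "en": "English", "es": "Spanish"}


def language_from_filename(path):
    """Return the assumed language code of a file name, or None if unrecognized."""
    stem = Path(path).stem
    suffix = stem.rsplit("_", 1)[-1].lower()
    return suffix if suffix in KNOWN_LANGUAGES else None


def load_conversation(path):
    """Read one conversation CSV into a list of rows (each row a list of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def count_by_language(data_dir):
    """Count the CSV files in data_dir per detected language code."""
    counts = Counter()
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        counts[language_from_filename(csv_path)] += 1
    return counts
```

Calling `count_by_language` on the data folder gives a quick overview of how many conversations exist per language; files whose names do not match the assumed pattern are counted under `None`, which makes a wrong assumption easy to spot.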
The data was collected in a controlled environment where participants interacted with a virtual doctor-avatar through a specially designed telephone booth. This unique setup provided a semi-private space, encouraging more open and detailed conversations.
- Anonymization: All personal information has been removed or anonymized to protect the privacy of the participants.
- Consent: Participants were informed about the data collection, and consent was obtained prior to the interactions.
- Compliance: The dataset adheres to applicable privacy and data protection laws.
This dataset is invaluable for research in fields such as:
- Natural Language Processing (NLP)
- Human-Computer Interaction (HCI)
- Medical Informatics
- Conversational AI Analysis
- Clone the Repository: git clone https://github.com/fabianslife/BUGA23_Chatbot_Dataset.git
- Navigate to the Data Folder.
- Data Analysis: Import the CSV files into your preferred data analysis tool.

Contributing
We welcome contributions to enhance the dataset's quality and documentation. Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
We used Microsoft Azure to detect the spoken language. Unfortunately, this frequently led to misunderstandings or cut-offs of the spoken user dialogue. We are currently working on recreating the parts of this dataset that might be unstructured.
This project is licensed under the MIT License.
- Bundesgartenschau (BUGA) 2023 Mannheim
- Participants and volunteers
- Institute for AI in Medicine of the University Hospital Gießen and Marburg (UKGM)