The 'Tage der deutschsprachigen Literatur' (Days of German-Language Literature) in Klagenfurt, Austria, is a major literary festival and essentially the only one broadcast on German television. The event consists of 30-minute readings by invited writers, who are evaluated by a panel of critics. It culminates in a live vote by the critics to decide which writers receive an award, most notably the Ingeborg-Bachmann-Preis. For several German-speaking writers, this event has marked the start or the end of their literary career.
This project consists of the following parts:
- Scrape data about the event (Wikipedia, official website of Bachmannpreis) using BeautifulSoup
- Use Goodreads API and geopy to acquire additional data
- Data wrangling with pandas
- Set up database with PostgreSQL
- Sentiment analysis of the jury discussion (with spaCy and SentiWS)
- NER and POS-tagging for texts using flair
- Topic modeling with NMF and TF-IDF
- Feature engineering for prediction within pandas
- Predict winning authors with a Random Forest model using auto-sklearn
- Deploy website using Flask and Chart.js
- Live website: http://countbachmann.herokuapp.com
- Deploy chatbot for user interaction (Rasa)
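
To illustrate the TF-IDF weighting that the topic-modeling step relies on, here is a minimal, dependency-free sketch. It is not the project's code: the function name `tfidf`, the toy sentences, and the plain `tf * log(N / df)` formula are assumptions chosen for illustration. Libraries such as scikit-learn use smoothed variants of this formula, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the plain textbook weighting tf * log(N / df); this is an
    illustrative assumption, not the project's actual configuration.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up sentences (not data from the competition texts).
docs = [
    "der see ist still".split(),
    "der vater schweigt".split(),
    "der see und der vater".split(),
]
weights = tfidf(docs)
```

A term that occurs in every document, such as "der" here, gets weight zero, while a term unique to one document gets the highest weight there. A matrix of such weights (documents by terms) is the kind of non-negative input that NMF factorizes into topics.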