Code for tf-idf and topic modelling
The code in this repo lets engineers explore how parameter choices, such as the number of topics, affect the output of tf-idf pipelines. It is adapted, using ChatGPT, from the "Topic extraction" example on the scikit-learn website.
Users can run the code with either of the two provided .txt files:
- An extract of accident reports (documents with multiple sentences)
- Short text work orders
We encourage users to experiment with the pipeline parameters, specifically n_features, n_topics, and n_top_words.
We present the output of the NMF and LDA models using easy-to-interpret graphics.