tfidf

Code for tfidf and topic modelling

The code in this repo was written to let engineers explore how parameter choices, such as the number of topics, affect the output of tf-idf pipelines. It is adapted, with the help of ChatGPT, from the topic extraction example on the scikit-learn website.

Users can run the code on either of the two provided .txt files:

An extract of accident reports (documents with multiple sentences)
Short text work orders
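Both inputs need to be turned into a list of documents before vectorizing. The sketch below assumes each document sits on its own line of the .txt file. That layout is an assumption, not something this README states. `load_documents` is a hypothetical helper and may differ from how the repo reads its files.

```python
from pathlib import Path


def load_documents(path):
    """Read a .txt file and return one document per non-empty line.

    Hypothetical helper: assumes each line holds one accident report
    or one work order. Adjust if the provided files use another layout.
    """
    text = Path(path).read_text(encoding="utf-8")
    # Strip surrounding whitespace and drop blank lines.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For example, `docs = load_documents("work_orders.txt")`. The filename here is a placeholder, not one of the repo's actual file names.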

We encourage users to experiment with the pipeline's parameters, specifically n_features, n_topics and n_top_words.

We present the output of the NMF and LDA models using easy-to-interpret graphics.
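The third parameter, n_top_words, controls how many terms are shown per topic. The sketch below assumes the topic-term weights come from a fitted model's `components_` attribute. It also assumes the term names come from the vectorizer's `get_feature_names_out()`. The helper `top_words` is hypothetical and not taken from the repo.

```python
import numpy as np


def top_words(components, feature_names, n_top_words=10):
    """Return the n_top_words highest-weighted (term, weight) pairs per topic.

    components: array of shape (n_topics, n_features), e.g. the
    components_ of a fitted NMF or LatentDirichletAllocation model.
    feature_names: term names aligned with the columns of components.
    """
    feature_names = np.asarray(feature_names)
    topics = []
    for weights in np.asarray(components):
        # Sort term indices by weight, largest first, and keep the top n.
        top_idx = weights.argsort()[::-1][:n_top_words]
        topics.append([(str(feature_names[i]), float(weights[i])) for i in top_idx])
    return topics
```

Each topic's list can then be drawn as a horizontal bar chart, for instance with matplotlib's `barh`. This is one plausible way to produce the kind of graphics mentioned here.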
