tfidf

Code for tf-idf and topic modelling

The code in this repo lets engineers explore how parameter choices, such as the number of topics, affect the output of tf-idf pipelines. We adapted the code, using ChatGPT, from the topic extraction example on the scikit-learn website.

Users can run the code with one of two provided .txt files:

- An extract of accident reports (documents with multiple sentences)
- Short text work orders

We encourage users to experiment with the pipeline parameters, specifically n_features, n_topics, and n_top_words.

We present the output of the NMF and LDA models using easy-to-interpret graphics.
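One common way to draw such graphics is a horizontal bar chart of the top words for each topic. The sketch below assumes matplotlib and is not necessarily how the notebook draws its figures. The `plot_top_words` helper and the word weights are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt


def plot_top_words(topics, title):
    """Draw one horizontal bar chart per topic.

    topics is a list of topics, each a list of (word, weight) pairs
    sorted by descending weight.
    """
    fig, axes = plt.subplots(
        1, len(topics), figsize=(4 * len(topics), 3), squeeze=False
    )
    for idx, (ax, topic) in enumerate(zip(axes[0], topics), start=1):
        words = [word for word, _ in topic]
        weights = [weight for _, weight in topic]
        ax.barh(words, weights)
        ax.invert_yaxis()  # put the highest-weighted word at the top
        ax.set_title(f"Topic {idx}")
    fig.suptitle(title)
    fig.tight_layout()
    return fig


# Made-up weights for illustration only; real values would come from the
# components_ attribute of a fitted NMF or LDA model.
example_topics = [
    [("pump", 0.9), ("seal", 0.7), ("bearing", 0.3)],
    [("engine", 0.8), ("oil", 0.6), ("filter", 0.2)],
]
fig = plot_top_words(example_topics, "Top words per topic (illustrative)")
```

Calling `fig.savefig("topics.png")` afterwards would write the chart to disk.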

