Text-Analyzer-using-NLTK

A Python script (written as a class assignment) that works in two phases:

The indexing phase:

  • word tokenization
  • line tokenization
  • stop-word removal
  • word stemming
  • word lemmatization
  • word labeling
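
The indexing steps above could be sketched roughly as follows. This is only an illustration, not the script's actual code: the function names `index_text` and `lemmatize_and_tag`, the tiny `STOP_WORDS` set, and the reading of "word labeling" as part-of-speech tagging are all assumptions. The NLTK classes used (`RegexpTokenizer`, `LineTokenizer`, `PorterStemmer`, `WordNetLemmatizer`, `pos_tag`) are real; the lemmatizer and tagger need their data packages fetched with `nltk.download` before they can be called.

```python
from nltk import pos_tag
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import LineTokenizer, RegexpTokenizer

# Tiny illustrative stop list (an assumption). A fuller list is available
# from nltk.corpus.stopwords once the 'stopwords' data has been downloaded.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

word_tokenizer = RegexpTokenizer(r"\w+")  # keeps runs of word characters
line_tokenizer = LineTokenizer()          # splits on newlines, drops blank lines
stemmer = PorterStemmer()


def index_text(text):
    """Hypothetical helper: tokenize, drop stop words, and stem."""
    lines = line_tokenizer.tokenize(text)
    words = [w.lower() for w in word_tokenizer.tokenize(text)]
    kept = [w for w in words if w not in STOP_WORDS]
    stems = [stemmer.stem(w) for w in kept]
    return {"lines": lines, "words": words, "kept": kept, "stems": stems}


def lemmatize_and_tag(words):
    """Hypothetical helper: lemmatize and part-of-speech tag a word list.

    Requires the WordNet and tagger data packages (see nltk.download).
    """
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(w) for w in words]
    return lemmas, pos_tag(words)
```

For example, `index_text("The cats are running\nin the park")["stems"]` yields the stemmed content words of both lines.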

The search phase:

  • getting the list of documents containing a given word
  • getting the number of occurrences of a given word in each returned document
  • getting the weight of a given word in each returned document
  • getting the tf-idf of a given word in each returned document
  • getting the most relevant document for a given word
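
The search steps above might look something like the sketch below. It is a minimal illustration under stated assumptions, not the script's actual method: the function name `search` and the `docs` layout (document name mapped to a token list) are hypothetical, "weight" is assumed to mean the term frequency normalized by document length, and tf-idf is assumed to be that weight multiplied by `log(N / df)`, where `N` is the number of documents and `df` the number of documents containing the word.

```python
import math


def search(docs, word):
    """Hypothetical helper: per-document statistics for `word`.

    docs maps a document name to its list of (already indexed) tokens.
    Returns (results, best), where results maps each matching document to
    its count, weight, and tf-idf, and best is the highest-scoring name.
    """
    # Documents containing the word, with the number of occurrences in each.
    hits = {name: toks.count(word) for name, toks in docs.items() if word in toks}
    if not hits:
        return {}, None

    # Inverse document frequency (assumed form: natural log of N / df).
    idf = math.log(len(docs) / len(hits))

    results = {}
    for name, count in hits.items():
        weight = count / len(docs[name])  # assumed: length-normalized frequency
        results[name] = {"count": count, "weight": weight, "tfidf": weight * idf}

    # Most relevant document: the one with the highest tf-idf score.
    best = max(results, key=lambda name: results[name]["tfidf"])
    return results, best
```

Note that with this form of idf, a word that appears in every document scores zero everywhere, so a smoothed variant may be preferable for very small collections.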