This script extracts sentences containing the keyword 'thời gian' ("time") from news articles. Change the keyword to any word or phrase you like.

The methodology of my dissertation:

  1. Sample news articles from Google
  2. Retrieve the articles with #LancsBox (http://corpora.lancs.ac.uk/lancsbox/)
  3. Use this script to extract the sentences containing the keyword (e.g. 'thời gian')
  4. Use VnTagger by Le Hong Phuong (https://github.com/scorpion1206/VnTagger) to POS-tag the sentences
  5. Analyse the tagged sentences manually
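
For illustration, step 3 might look roughly like the sketch below. It is not the script in this repository: the function names, the punctuation-based sentence splitter, and the sample text are all assumptions. It normalises both the text and the keyword to Unicode NFC first, because Vietnamese diacritics can be stored either precomposed or as combining marks, and a plain substring test would otherwise miss some matches.

```python
import re
import unicodedata

KEYWORD = "thời gian"  # change to any word or phrase


def split_sentences(text):
    """Naive splitter (an assumption): break after ., !, ? or … followed by whitespace."""
    parts = re.split(r"(?<=[.!?…])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_sentences(text, keyword=KEYWORD):
    """Return the sentences of `text` that contain `keyword`, ignoring case."""
    # Normalise to NFC so precomposed and combining-mark spellings compare equal.
    text = unicodedata.normalize("NFC", text)
    needle = unicodedata.normalize("NFC", keyword).lower()
    return [s for s in split_sentences(text) if needle in s.lower()]


# Hypothetical sample text, not taken from the corpus.
sample = "Tôi không có thời gian. Hôm nay trời đẹp! Thời gian là vàng."
for sentence in extract_sentences(sample):
    print(sentence)
```

A punctuation-based splitter will mis-split abbreviations and decimal numbers, so real news text may need a proper sentence tokenizer before this step.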