This script extracts sentences containing the keyword 'thời gian' (Vietnamese for "time") from news articles. You can change the keyword to any term you like.
The methodology of my dissertation:
- Sample the news articles from Google
- Retrieve the articles with LancsBox (http://corpora.lancs.ac.uk/lancsbox/)
- Use this script to extract the sentences containing the keyword (e.g. 'thời gian')
- Use VNTagger by Le Hong Phuong (https://github.com/scorpion1206/VnTagger) to POS-tag the sentences
- Analyse manually
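
The extraction step above can be sketched roughly as follows. This is a minimal illustration, not the script in this repository: the function names `split_sentences` and `extract_sentences`, the punctuation-based sentence splitter, and the Unicode normalisation step are all assumptions made for the example.

```python
import re
import unicodedata

KEYWORD = "thời gian"  # change this to any keyword you like


def split_sentences(text):
    """Naive splitter (an assumption): break after ., !, ? or … followed by whitespace."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


def extract_sentences(text, keyword=KEYWORD):
    """Return the sentences of `text` that contain `keyword`, ignoring case."""
    # NFC normalisation makes composed and decomposed Vietnamese diacritics compare equal.
    def norm(s):
        return unicodedata.normalize("NFC", s).casefold()

    key = norm(keyword)
    return [s for s in split_sentences(text) if key in norm(s)]


# Hypothetical sample text, for illustration only.
sample = "Tôi không có thời gian. Hôm nay trời đẹp! Thời gian là vàng bạc."
matches = extract_sentences(sample)
for sentence in matches:
    print(sentence)
```

Matching here is by plain substring, so the keyword is also found inside longer expressions. A real corpus may need a more careful sentence splitter, for example one that handles abbreviations and quoted speech.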