
Hyperpartisan-News-Detection

License: MIT

This repository contains the code for our SemEval-2019 Task 4 (Hyperpartisan News Detection) paper, "Hyperpartisan News Detection by De-noising Weakly-labeled Data".

This code has been written using PyTorch >= 0.4.1. If you find our idea or the resources in this repository useful, please cite the following paper:

@inproceedings{lee2019team,
  title={Team yeon-zi at SemEval-2019 Task 4: Hyperpartisan News Detection by De-noising Weakly-labeled Data},
  author={Lee, Nayeon and Liu, Zihan and Fung, Pascale},
  booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation},
  pages={1052--1056},
  year={2019}
}

Abstract

This paper describes our system submitted to SemEval-2019 Task 4: Hyperpartisan News Detection. We focus on removing the noise inherent in the hyperpartisanship dataset at both the data level and the model level, by leveraging semi-supervised pseudo-labels and the state-of-the-art BERT model. Our model achieves 75.8% accuracy on the final by-article dataset without ensemble learning.

Model Architecture

[Model architecture diagram]

Getting Started

The following scripts show how to train and test our model. This repository also contains character-based and URL-based features for further research.

Fine-tune BERT Language Model

We fine-tune the BERT language model on the large hyperpartisan news dataset.

First, process the hyperpartisan news dataset:

python process_data_for_bert_training.py
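For reference, here is a minimal sketch of what this preprocessing could look like, assuming the by-publisher articles ship as a SemEval-style XML file (the input path below is hypothetical; adjust it to the actual release). run_lm_finetuning.py expects one sentence per line, with a blank line between documents:

# Minimal preprocessing sketch; the input path and XML layout are
# assumptions based on the SemEval data release, not this repo's code.
import xml.etree.ElementTree as ET
from nltk.tokenize import sent_tokenize  # requires nltk's 'punkt' model

IN_PATH = "data_new/articles-training-bypublisher.xml"  # hypothetical path
OUT_PATH = "data_new/article_corpus.txt"

with open(OUT_PATH, "w", encoding="utf-8") as out:
    # iterparse streams the large XML file one element at a time
    for _, elem in ET.iterparse(IN_PATH, events=("end",)):
        if elem.tag != "article":
            continue
        text = " ".join(" ".join(elem.itertext()).split())  # flatten whitespace
        if text:
            # one sentence per line, blank line between documents
            for sent in sent_tokenize(text):
                out.write(sent + "\n")
            out.write("\n")
        elem.clear()  # release memory for already-processed articles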

Second, use the processed hyperpartisan news articles to fine-tune the BERT language model:

(run_lm_finetuning.py comes from https://github.com/huggingface/pytorch-pretrained-BERT)

python run_lm_finetuning.py --train_file=data_new/article_corpus.txt --output_dir=bert_model --bert_model=bert-base-uncased --do_train --on_memory
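The fine-tuned weights land in the --output_dir (bert_model). With pytorch-pretrained-BERT they can be loaded back by pointing from_pretrained at that directory; a sketch, assuming the directory contains the saved weights and model config the library expects:

# Sketch: load the fine-tuned LM back from --output_dir
# (pytorch-pretrained-BERT API; assumes bert_model/ holds both
# pytorch_model.bin and bert_config.json).
from pytorch_pretrained_bert import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
bert = BertModel.from_pretrained("bert_model")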

Train our model

The following commands carry out the two steps shown in the architecture.

Step 1: Train BERT + Classifier for de-noising

Use the by-article data to train the classifier (the BERT LM is frozen) to de-noise the by-publisher data:

python main.py --do_train --use_bert --batch_size=16
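Conceptually, this classifier is then used to filter the weakly-labeled by-publisher articles: an article is kept only when the classifier's prediction agrees with its publisher-level label. A minimal sketch of that filtering step, assuming a model that maps an encoded article to a hyperpartisan probability (this interface is hypothetical, not main.py's actual one):

import torch

def denoise(articles, weak_labels, model, threshold=0.5):
    """Keep by-publisher articles whose weak (publisher-level) label
    agrees with the by-article-trained classifier's prediction."""
    kept = []
    model.eval()
    with torch.no_grad():
        for article, weak_label in zip(articles, weak_labels):
            prob = model(article)            # hypothetical: P(hyperpartisan)
            pred = int(prob >= threshold)
            if pred == weak_label:           # prediction confirms weak label
                kept.append((article, weak_label))
    return kept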

Step 2: Train BERT + LSTM + Classifier on the de-noised by-publisher data

python main.py --do_train --train_cleaner_dataset --hidden_dim=300 --hidden_dim_tit=100 --batch_size=16 --weight_decay=1e-6
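For orientation, a minimal sketch of the Step 2 architecture (frozen BERT encoder, bidirectional LSTM over the token representations, linear classifier); the exact pooling, layer sizes, and title branch (--hidden_dim_tit) in main.py may differ:

import torch
import torch.nn as nn
from pytorch_pretrained_bert import BertModel

class BertLstmClassifier(nn.Module):
    """Sketch: frozen fine-tuned BERT -> BiLSTM -> linear classifier."""

    def __init__(self, bert_dir="bert_model", hidden_dim=300, n_classes=2):
        super(BertLstmClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(bert_dir)
        for p in self.bert.parameters():  # BERT LM stays frozen, as in Step 1
            p.requires_grad = False
        self.lstm = nn.LSTM(768, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, input_ids, attention_mask):
        # pytorch-pretrained-BERT returns (encoded_layers, pooled_output)
        encoded, _ = self.bert(input_ids, attention_mask=attention_mask,
                               output_all_encoded_layers=False)
        _, (h_n, _) = self.lstm(encoded)
        # concatenate the final forward and backward hidden states
        feats = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.out(feats)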

Test our model

Test the model on the by-article data:

python main.py --do_eval_bert_plus_lstm --train_cleaner_dataset --hidden_dim=300 --hidden_dim_tit=100 --batch_size=16
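The reported metric is plain accuracy on the by-article set. As a sketch, assuming a data loader that yields (input_ids, attention_mask, labels) batches (this loader interface is an assumption, not the repo's actual one):

import torch

def accuracy(model, data_loader, device="cpu"):
    """Sketch: fraction of by-article examples classified correctly."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for input_ids, attention_mask, labels in data_loader:
            logits = model(input_ids.to(device), attention_mask.to(device))
            preds = logits.argmax(dim=-1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total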