Network Structure:
- Embedding Layer: Converts tokenized text into dense vectors (40 dimensions)
- Stacked LSTM Layers:
  - First LSTM: 100 units with sequence return for temporal feature extraction
  - Second LSTM: 100 units for final sequence encoding
- Dropout Regularization: 30% dropout to prevent overfitting
- Output Layer: Sigmoid activation for binary classification
Training Configuration:
- Loss Function: Binary cross-entropy
- Optimizer: Adam
- Metrics: Accuracy
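The architecture and training configuration above can be sketched with the Keras Sequential API. This is a minimal sketch, not the project's exact code; it mirrors the listed hyperparameters and the build_model signature used in the usage example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(vocab_size, max_length):
    model = models.Sequential([
        # Embedding: dense 40-dimensional word vectors
        layers.Embedding(vocab_size, 40),
        # First LSTM returns the full sequence for the second LSTM to consume
        layers.LSTM(100, return_sequences=True),
        # Second LSTM produces the final sequence encoding
        layers.LSTM(100),
        # 30% dropout to reduce overfitting
        layers.Dropout(0.3),
        # Sigmoid output for binary (real vs. fake) classification
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model
```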
Normalizes variable-length text inputs to fixed max_length for batch processing.
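The padding step works like the following sketch (a plain-NumPy illustration; the real pipeline would typically use Keras' pad_sequences with padding='post' and truncating='post', which behaves the same way):

```python
import numpy as np

def pad_to_max_length(sequences, max_length):
    # Truncate long sequences and zero-pad short ones to a fixed width,
    # so variable-length texts form a single rectangular batch
    batch = np.zeros((len(sequences), max_length), dtype=np.int64)
    for i, seq in enumerate(sequences):
        trimmed = seq[:max_length]
        batch[i, :len(trimmed)] = trimmed
    return batch

pad_to_max_length([[4, 7, 9], [1]], max_length=5)
# → [[4, 7, 9, 0, 0], [1, 0, 0, 0, 0]]
```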
Implements custom threshold tuning to maximize classification accuracy beyond the default 0.5 cutoff:
import numpy as np
from sklearn.metrics import accuracy_score

def best_threshold_value(model, thresholds, X_test, y_test):
    # Tests multiple classification thresholds on held-out data
    probs = model.predict(X_test).ravel()
    # Returns accuracy for each threshold, enabling optimal
    # decision boundary selection
    return {t: accuracy_score(y_test, (probs >= t).astype(int))
            for t in thresholds}

Callbacks:
- EarlyStopping: Prevents overfitting by monitoring validation performance
- ModelCheckpoint: Saves best model weights during training
Model Hyperparameters:
- Embedding Dimension: 40 features per word
- LSTM Hidden Units: 100 per layer
- Dropout Rate: 0.3
- Vocabulary Size: Variable (based on dataset)
- Sequence Length: Variable (padded to max_length)
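The two variable hyperparameters could be derived from a tokenized corpus roughly as follows (a hypothetical helper; the function name and corpus representation are illustrative, not part of the original code):

```python
def corpus_stats(tokenized_texts):
    # Collect the set of distinct words across all documents
    vocab = {word for text in tokenized_texts for word in text}
    # +1 reserves index 0 for the padding token
    vocab_size = len(vocab) + 1
    # Longest document determines the padded sequence length
    max_length = max(len(text) for text in tokenized_texts)
    return vocab_size, max_length

corpus_stats([["fake", "news", "alert"], ["real", "news"]])
# → (5, 3)
```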
Installation:
pip install tensorflow scikit-learn pandas numpy

Usage Example:
# Build the model
model = build_model(vocab_size=10000, max_length=500)
# Train with callbacks
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=20,
          callbacks=[EarlyStopping(patience=3),
                     ModelCheckpoint('best_model.h5')])
# Optimize classification threshold
thresholds = np.arange(0.3, 0.8, 0.05)
results = best_threshold_value(model, thresholds, X_test, y_test)

The threshold optimization function enables fine-tuning of the decision boundary, typically improving accuracy over the default 0.5 threshold by accounting for class imbalance or misclassification costs.
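Selecting the best cutoff from the results can then look like the sketch below. It assumes best_threshold_value returns a mapping from threshold to accuracy, and uses stand-in model outputs in place of real predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score

probs = np.array([0.2, 0.45, 0.6, 0.9])   # stand-in model outputs
y_true = np.array([0, 1, 1, 1])

# Same shape of result as best_threshold_value: threshold -> accuracy
results = {t: accuracy_score(y_true, (probs >= t).astype(int))
           for t in np.arange(0.3, 0.8, 0.05)}

# Pick the threshold with the highest test accuracy
best_t = max(results, key=results.get)
```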
Future Improvements:
- Implement attention mechanisms for better interpretability
- Add bidirectional LSTM layers for context from both directions
- Experiment with pre-trained embeddings (GloVe, Word2Vec)
- Incorporate metadata features (source, publication date)
Dependencies:
- TensorFlow/Keras
- NumPy
- Pandas
- Scikit-learn
Note: This implementation focuses on binary classification (real vs. fake) and can be extended for multi-class fake news categorization.