This project demonstrates a binary sarcasm classifier for news headlines using TensorFlow and Keras. The repository contains three different model implementations:
- Basic Embedding with Global Average Pooling - A simple and efficient model for baseline performance
- Bidirectional LSTM - An advanced model architecture that captures sequence context in both directions
- 1D Convolutional Neural Network - A model that extracts local patterns and features from text
All models process raw text headlines, convert them into numerical sequences using text vectorization, and predict whether a headline is sarcastic 😏 or not sarcastic 📰.
- Overview
- Features
- Dataset
- Model Architectures
- Getting Started
- Code Structure
- Results and Performance Comparison
- TF.Data Pipeline Optimization
- Key Insights
- TensorFlow Embedding Projector
- Future Work
- Acknowledgements
- Contact
- Source: News Headlines Dataset for Sarcasm Detection
- Description: News headlines labeled for sarcasm detection
- Format: JSON file with headlines and binary labels (`0` = not sarcastic, `1` = sarcastic)
- Training Split: 20,000 samples for training, remainder for validation
- Preprocessing: Text standardization, tokenization, and padding to fixed length
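For reference, a minimal loading-and-splitting sketch, assuming the course-provided `sarcasm.json` (a single JSON array whose records carry `headline` and `is_sarcastic` fields); the variable names are illustrative:

```python
import json

import numpy as np

# Load the raw JSON file; each record has a "headline" string and an "is_sarcastic" label.
with open("sarcasm.json", "r") as f:
    records = json.load(f)

headlines = np.array([item["headline"] for item in records])
labels = np.array([item["is_sarcastic"] for item in records])

# First 20,000 examples for training, the remainder for validation.
TRAIN_SIZE = 20000
train_headlines, train_labels = headlines[:TRAIN_SIZE], labels[:TRAIN_SIZE]
val_headlines, val_labels = headlines[TRAIN_SIZE:], labels[TRAIN_SIZE:]
```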
Basic Embedding with Global Average Pooling:

Input Layer (32 tokens max)
↓
TextVectorization (10,000 vocab)
↓
Embedding Layer (16 dimensions)
↓
GlobalAveragePooling1D
↓
Dense Layer (24 units, ReLU)
↓
Dense Layer (1 unit, Sigmoid)
↓
Binary Classification Output
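A minimal Keras sketch of this architecture, assuming the parameters listed under Key Parameters below (10,000-token vocabulary, 32-token sequences, 16-dimensional embeddings); note that `TextVectorization` pads and truncates at the end of each sequence by default, so the 'pre' padding listed below is only approximated here:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LENGTH, EMBEDDING_DIM = 10000, 32, 16

# Maps raw headline strings to padded integer sequences of length 32.
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE, output_sequence_length=MAX_LENGTH)
# vectorize_layer.adapt(train_headlines)  # build the vocabulary from the training split

model = tf.keras.Sequential([
    vectorize_layer,                                   # raw string -> integer sequence
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    tf.keras.layers.GlobalAveragePooling1D(),          # average over all token positions
    tf.keras.layers.Dense(24, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # probability that the headline is sarcastic
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```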
Bidirectional LSTM:

Input Layer (32 tokens max)
↓
TextVectorization (10,000 vocab)
↓
Embedding Layer (16 dimensions)
↓
Bidirectional LSTM (32 units)
↓
Dense Layer (24 units, ReLU)
↓
Dense Layer (1 unit, Sigmoid)
↓
Binary Classification Output
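A corresponding sketch for this variant; here the model is written to consume already-vectorized integer sequences (the `TextVectorization` step from the previous sketch applied upstream), with the remaining parameters taken from Key Parameters below:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LENGTH, EMBEDDING_DIM, LSTM_UNITS = 10000, 32, 16, 32

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LENGTH,)),               # padded integer sequence of 32 token ids
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    # 32 units per direction -> 64-dimensional output after concatenation
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM_UNITS)),
    tf.keras.layers.Dense(24, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```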
1D Convolutional Neural Network:

Input Layer (32 tokens max)
↓
TextVectorization (10,000 vocab)
↓
Embedding Layer (16 dimensions)
↓
Conv1D (128 filters, kernel size 5, ReLU)
↓
GlobalMaxPooling1D
↓
Dense Layer (6 units, ReLU)
↓
Dense Layer (1 unit, Sigmoid)
↓
Binary Classification Output
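And a sketch of the convolutional variant, again taking already-vectorized sequences as input:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LENGTH, EMBEDDING_DIM = 10000, 32, 16

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LENGTH,)),               # padded integer sequence of 32 token ids
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    # 128 filters over 5-token windows pick up local n-gram patterns
    tf.keras.layers.Conv1D(128, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),              # keep the strongest response per filter
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```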
Key Parameters:
- Vocabulary Size: 10,000 tokens
- Max Sequence Length: 32 tokens
- Embedding Dimensions: 16
- LSTM Units: 32 (bidirectional, resulting in 64-dimensional output)
- Conv1D Filters: 128 with kernel size 5
- Training Examples: 20,000
- Padding Type: 'pre'
- Truncation Type: 'post'
- Python 3.x
- TensorFlow 2.x
- NumPy
- Matplotlib
- Jupyter Notebook
git clone https://github.com/yourusername/sarcasm-detection
cd sarcasm-detection
pip install -r requirements.txt
- Download the sarcasm dataset from Kaggle or use the provided `sarcasm.json` file.
- Open and run the desired notebook:
  - `C3_W2_Lab_2_sarcasm_classifier.ipynb` - Basic model with GlobalAveragePooling
  - `C3_W3_Lab_5_sarcasm_with_bi_LSTM.ipynb` - Advanced model with Bidirectional LSTM
  - `C3_W3_Lab_6_sarcasm_with_1D_convolutional.ipynb` - Model with 1D Convolutional layer
- Follow the notebook steps to (a minimal training sketch follows this list):
  - Load and preprocess the data
  - Build and compile the model
  - Train with customizable parameters
  - Evaluate performance and visualize results
- Export embedding weights for visualization in the TensorFlow Embedding Projector.
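For orientation, a hedged sketch of the train-and-evaluate step, assuming `model` is one of the architectures above and `train_ds` / `val_ds` are the batched `tf.data` datasets built in the notebooks (see the pipeline sketch further below):

```python
# Train for 10 epochs (as in the reported results) and keep the history for plotting.
history = model.fit(train_ds, validation_data=val_ds, epochs=10)

# Evaluate on the held-out validation split.
val_loss, val_accuracy = model.evaluate(val_ds)
print(f"Validation accuracy: {val_accuracy:.3f}")
```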
- `C3_W2_Lab_2_sarcasm_classifier.ipynb` - Basic model implementation with GlobalAveragePooling
- `C3_W3_Lab_5_sarcasm_with_bi_LSTM.ipynb` - Bidirectional LSTM model implementation
- `C3_W3_Lab_6_sarcasm_with_1D_convolutional.ipynb` - 1D Convolutional model implementation
- `sarcasm.json` - Dataset file (download separately)
- `requirements.txt` - List of dependencies
- `vecs.tsv` - Exported word vectors (generated after training)
- `meta.tsv` - Exported metadata (generated after training)
Basic Embedding with Global Average Pooling:
- Training Accuracy: ~96% after 10 epochs
- Validation Accuracy: ~84% (with some overfitting observed)
- Model Size: Fewest parameters of the three, computationally efficient
- Training Speed: Fastest training time
- Advantages: Simple architecture, good baseline performance
Bidirectional LSTM:
- Training Accuracy: ~97% after 10 epochs
- Validation Accuracy: ~84-85%
- Model Size: More parameters (174,129) but still lightweight
- Training Speed: Slower than the basic model but captures sequential information
- Advantages: Better captures word order and context in both directions
1D Convolutional Neural Network:
- Training Accuracy: ~98% after 10 epochs
- Validation Accuracy: ~85-86%
- Model Size: 139,399 parameters (less than LSTM, more than basic model)
- Training Speed: Faster than LSTM but slower than the basic model
- Advantages: Captures local n-gram patterns effectively, good at detecting key phrases
The notebooks generate visualizations showing:
- Accuracy progression over epochs
- Loss reduction during training
- Training vs. validation performance comparison
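A typical plotting snippet for these curves, assuming the `history` object returned by `model.fit` in the training sketch above:

```python
import matplotlib.pyplot as plt

def plot_metric(history, metric):
    """Plot training vs. validation curves for one metric."""
    plt.plot(history.history[metric], label=f"train {metric}")
    plt.plot(history.history[f"val_{metric}"], label=f"val {metric}")
    plt.xlabel("Epoch")
    plt.ylabel(metric)
    plt.legend()
    plt.show()

plot_metric(history, "accuracy")
plot_metric(history, "loss")
```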
All three models use TensorFlow's efficient data pipeline (`tf.data.Dataset`) with the following optimizations:
- Data caching
- Prefetching
- Shuffling with buffer
- Batching
- Efficient sequence padding
This results in faster training times and better resource utilization.
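A representative pipeline with these optimizations, assuming the `train_headlines` / `train_labels` arrays from the loading sketch above; the shuffle buffer and batch size are illustrative and may differ from the notebooks:

```python
import tensorflow as tf

SHUFFLE_BUFFER = 1000   # assumed shuffle buffer size
BATCH_SIZE = 32         # assumed batch size

train_ds = (
    tf.data.Dataset.from_tensor_slices((train_headlines, train_labels))
    .cache()                      # keep examples in memory after the first pass
    .shuffle(SHUFFLE_BUFFER)      # randomize example order each epoch
    .batch(BATCH_SIZE)            # group examples into batches
    .prefetch(tf.data.AUTOTUNE)   # overlap data preparation with training
)

val_ds = (
    tf.data.Dataset.from_tensor_slices((val_headlines, val_labels))
    .cache()
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)
```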
- Bidirectional LSTM captures word order and context in both directions, potentially improving performance on sequence-sensitive tasks
- 1D Convolutional layers with GlobalMaxPooling effectively identify the most important n-gram features in the text
- Global Average Pooling model provides a simpler architecture with fewer parameters
- Text Vectorization layer provides efficient preprocessing integrated into the model
- All models show signs of overfitting after several epochs
- Validation accuracy is similar across all three models (~84-86%), suggesting that for this specific dataset, the architecture choices provide marginal improvements
- Hyperparameter tuning opportunities exist for vocabulary size, embedding dimensions, LSTM units, convolutional filters, and dense layer architecture
Visualize the learned word embeddings with the TensorFlow Embedding Projector:
- After training, export embedding weights and metadata to `vecs.tsv` and `meta.tsv` (see the export sketch after this list).
- Upload these files to the Embedding Projector.
- Explore word relationships and clusters in the learned embedding space.
- Discover how the model represents sarcastic vs. non-sarcastic language patterns.
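One way to write the two files, assuming the trained `model` and the `vectorize_layer` from the sketches above:

```python
import io

import tensorflow as tf

# Grab the learned embedding matrix (vocab_size x embedding_dim) from the trained model.
embedding_layer = next(layer for layer in model.layers
                       if isinstance(layer, tf.keras.layers.Embedding))
weights = embedding_layer.get_weights()[0]
vocabulary = vectorize_layer.get_vocabulary()

with io.open("vecs.tsv", "w", encoding="utf-8") as out_v, \
     io.open("meta.tsv", "w", encoding="utf-8") as out_m:
    for index in range(1, len(vocabulary)):          # index 0 is the padding token
        out_m.write(vocabulary[index] + "\n")
        out_v.write("\t".join(str(x) for x in weights[index]) + "\n")
```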
- Experiment with different architectures (GRU, Transformer-based models)
- Implement regularization techniques to reduce overfitting (dropout, L2 regularization); see the sketch after this list
- Stack multiple Bidirectional LSTM layers for deeper context understanding
- Try hybrid models combining CNN and RNN features
- Experiment with different Conv1D filter sizes and number of filters
- Try different vocabulary sizes and embedding dimensions
- Add attention mechanisms for better context understanding
- Explore transfer learning with pre-trained embeddings (Word2Vec, GloVe)
- Multi-class classification for different types of sarcasm
- Ensemble multiple model types for potentially better performance
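As a starting point for the regularization item above, a hedged sketch that adds dropout and L2 weight decay to the basic model's dense head; the dropout rate and penalty are illustrative, not values from the notebooks:

```python
import tensorflow as tf

VOCAB_SIZE, MAX_LENGTH, EMBEDDING_DIM = 10000, 32, 16

regularized_model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LENGTH,)),               # padded integer sequence of 32 token ids
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.3),                      # illustrative dropout rate
    tf.keras.layers.Dense(24, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
regularized_model.compile(loss="binary_crossentropy", optimizer="adam",
                          metrics=["accuracy"])
```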
Special thanks to:
- Andrew Ng for creating the Deep Learning AI curriculum
- Laurence Moroney for excellent instruction and developing the course materials
- The creators of the News Headlines Dataset for Sarcasm Detection
- These notebooks were created as part of the TensorFlow Developer Certificate program by DeepLearning.AI
For inquiries about this project:
© 2025 Melissa Slawsky. All Rights Reserved.