SATHEESH-MEADI/DATA_690_NLP

🌟 NLP Coursework Repository

📌 Table of Contents

  1. Introduction
  2. Coursework Highlights
  3. Features and Achievements
  4. Visual Showcase
  5. Packages and Tools
  6. Future Work
  7. Versioning
  8. Contributing
  9. Author

📝 Introduction

This repository documents my journey through 10 weeks of NLP assignments as part of my DATA 690 course. Covering a broad spectrum of NLP concepts and applications, it showcases theoretical explorations, practical implementations, and creative experiments with text processing, feature engineering, embeddings, and advanced topics like sentiment analysis and topic modeling.

🎯 Mission: To make complex NLP concepts accessible and implementable through well-organized code and insightful results.


🔑 Coursework Highlights

Week 1: Introduction to NLP

  • Explored foundational concepts and real-world applications.

Week 2: Text Mining & Analytics

  • Techniques for mining insights from raw text.
  • Key focus: Preprocessing and understanding textual patterns.

Week 3: Preparing Text for Analysis

  • Implemented tokenization, lemmatization, and text cleaning pipelines.
  • Applied feature extraction techniques to the cleaned text.
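
A cleaning and tokenization step like the one described for Week 3 can be sketched with the standard library alone. This is an illustrative sketch, not the code from the assignment notebooks: the function names `clean_text` and `tokenize` and the tiny `STOPWORDS` set are assumptions made for the example, and lemmatization is left out because it needs a library such as NLTK or SpaCy.

```python
import re
from typing import List

# Tiny illustrative stopword list (an assumption for this sketch;
# real pipelines usually take the list from NLTK or SpaCy).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are"}


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace


def tokenize(text: str) -> List[str]:
    """Split cleaned text on whitespace and drop stopwords."""
    return [tok for tok in clean_text(text).split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(tokenize("The cats are running to https://example.com, quickly!"))
```

Keeping cleaning and tokenization as separate functions makes it easy to swap in a library tokenizer or add a lemmatization step later without touching the cleaning rules.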

Week 4: Feature Engineering & Word Embeddings

  • Created embeddings and measured text similarity using cosine similarity.
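
To make the similarity measure concrete, here is a minimal sketch of cosine similarity between two texts. It assumes simple word-count vectors in place of learned embeddings, so that it runs without extra dependencies; the function name `cosine_similarity` is chosen for this example and is not taken from the notebooks.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product only needs the words that appear in both texts.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("the cat sat", "the cat ran"))
```

The same formula applies to dense embedding vectors: the score is the dot product divided by the product of the two vector norms, and cosine distance is one minus that score.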

Week 5: Parsing & POS Tagging

  • Built custom parsers and trained POS tagging models.

Week 6: Text Summarization

  • Developed extractive and abstractive summarization systems.

Week 7: Generating Text

  • Utilized Markov Chains and neural networks for text generation.
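
The Markov chain approach can be sketched in a few lines. This is a hedged, word-level illustration rather than the assignment's implementation: the names `build_chain` and `generate`, the `order` and `seed` parameters, and the toy corpus are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each tuple of `order` consecutive words to the words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=10, seed=None):
    """Walk the chain from a random start state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was never followed by anything
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran"
    print(generate(build_chain(corpus), length=8, seed=0))
```

Storing repeated followers in a list means more frequent transitions are sampled more often, which approximates the transition probabilities without computing them explicitly.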

Week 8: Sentiment Analysis

  • Analyzed sentiments with supervised models and fine-tuned transformers.

Week 9: Text Similarity & Clustering

  • Conducted clustering of documents and semantic similarity scoring.

Week 10: Text Classification

  • Built models to classify text into categories using embeddings and traditional ML methods.

🎉 Features and Achievements

  • Interactive Experiments: Implemented live demos of NLP models in Google Colab Notebooks.
  • Custom Pipelines: Designed modular workflows for text processing.
  • Comprehensive Examples: Each assignment includes examples and results for reproducibility.
  • Visualization Tools: Used Matplotlib and Plotly for visual representations.

📸 Visual Showcase

Sample Workflow Output

  • Preprocessed data visualization.
  • Embedding plots.
  • Sentiment analysis heatmaps.

🛠️ Packages and Tools

  • Core Libraries: NLTK, SpaCy, TextBlob, Pandas, NumPy.
  • Advanced Models: Hugging Face Transformers, BERT, PubMedBERT.
  • Visualization: Matplotlib, Seaborn, Plotly.
  • Development Tools: Jupyter Notebook, VS Code, Streamlit.

🚀 Future Work

  • Integrate advanced LLMs for deeper contextual understanding.
  • Experiment with real-world datasets from APIs like Twitter or Wikipedia.
  • Build a Streamlit-based interface for showcasing NLP models interactively.

📌 Versioning

  • Version 1.0: Includes completed coursework from weeks 1–10 with all implemented assignments.
  • Future Updates: Planned enhancements for interactivity and visualization.

🤝 Contributing

Contributions are welcome! Submit your pull requests or raise issues for suggestions. Together, we can enhance this repository to benefit NLP enthusiasts.


👩‍💻 Author

Satheesh Meadi
Data Science Master’s Student | NLP Enthusiast
📧 Email: smeadi1@umbc.edu
📚 LinkedIn: https://www.linkedin.com/in/satheesh-meadi/

