Big-Data-Subject

Projects Done While Studying Big Data at IUH

This repository contains projects completed while studying Big Data at the Industrial University of Ho Chi Minh City (IUH). Each project applies a different set of big data technologies, frameworks, and methodologies to a practical problem.

Overview

During my studies at IUH, I explored multiple aspects of Big Data, including data collection, storage, processing, analysis, and visualization. These projects demonstrate my understanding and application of big data tools and concepts.

Projects

Web Crawling & Data Scraping
- Built a crawler to extract data from various news websites using BeautifulSoup and Scrapy.
- Processed and stored data in CSV and MongoDB for further analysis.
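
As a rough illustration of the extract-and-store step, the sketch below pulls headline text out of an HTML page and serializes it as CSV. It uses Python's built-in `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages. The sample markup, the choice of `h2` as the headline tag, and the names `HeadlineParser`, `extract_headlines`, and `headlines_to_csv` are assumptions made for illustration, not code taken from this repository.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real crawler would download this HTML instead.
SAMPLE_HTML = """
<html><body>
  <h2 class="title">First headline</h2>
  <p>Body text that should be ignored.</p>
  <h2 class="title">Second headline</h2>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects the text inside every <h2> element (assumed headline tag)."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        # Only keep text that appears between <h2> and </h2>.
        if self.in_headline:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return the stripped text of every headline found in the HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    return [headline.strip() for headline in parser.headlines]


def headlines_to_csv(headlines):
    """Serialize headlines as CSV text with a single 'headline' column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["headline"])
    for headline in headlines:
        writer.writerow([headline])
    return buffer.getvalue()


if __name__ == "__main__":
    print(headlines_to_csv(extract_headlines(SAMPLE_HTML)))
```

In an actual crawler the HTML would come from an HTTP response, and the rows could be written to a file or inserted into MongoDB instead of being returned as a string.
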
Data Processing with Hadoop & Spark
- Used Apache Hadoop (HDFS, MapReduce) and Apache Spark (PySpark) to process large datasets.
- Performed ETL (Extract, Transform, Load) operations on structured and unstructured data.
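
The MapReduce model mentioned above can be sketched in plain Python with a word count, the usual introductory example. This is a single-process imitation of the map, shuffle, and reduce phases, not Hadoop or PySpark code; the function names and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    sample = ["big data at IUH", "Big data with Spark"]
    print(word_count(sample))
```

On a cluster the map and reduce functions run in parallel across partitions of the input and the shuffle moves data between nodes; the logic of each phase stays the same.
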
Real-Time Data Streaming
- Implemented a real-time data processing pipeline using Apache Kafka and Spark Streaming.
- Analyzed and visualized live data streams from Twitter and IoT sensors.
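
One common operation in a streaming pipeline is a windowed aggregate. The sketch below computes a per-sensor average over fixed (tumbling) time windows in plain Python. It processes a finite list of events rather than a live Kafka topic, and the event layout `(timestamp_seconds, sensor_id, value)`, the function name, and the sample readings are all assumptions for illustration.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average readings per (window_start, sensor_id).

    `events` is an iterable of (timestamp_seconds, sensor_id, value) tuples.
    Each event falls into exactly one window of length `window_seconds`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, sensor_id, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        key = (window_start, sensor_id)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    # Hypothetical sensor readings: (timestamp in seconds, sensor id, value).
    sample_events = [
        (0, "s1", 20.0),
        (5, "s1", 22.0),
        (12, "s1", 30.0),
        (3, "s2", 10.0),
    ]
    for key, average in sorted(tumbling_window_average(sample_events, 10).items()):
        print(key, average)
```

A streaming engine performs the same grouping incrementally as events arrive and also has to decide when a window is complete, which this batch sketch does not model.
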
Machine Learning on Big Data
- Built predictive models using ML algorithms in Scikit-Learn and Spark MLlib.
- Applied classification, regression, and clustering on large datasets.
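
To make the regression case concrete, here is a minimal sketch of ordinary least squares for a single feature, written in plain Python rather than Scikit-Learn or Spark MLlib. The function names and the toy data are hypothetical; the libraries named above provide the same fit for many features and much larger datasets.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Toy data lying exactly on y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept, predict(slope, intercept, 5))
```
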
Big Data Visualization
- Created interactive dashboards with Tableau and Power BI.
- Used Matplotlib, Seaborn, and D3.js for data visualization.

Technologies Used

Programming Languages: Python, Java, Scala
Big Data Frameworks: Hadoop, Spark, Kafka
Databases: MongoDB, MySQL, PostgreSQL
Web Scraping Tools: BeautifulSoup, Scrapy
Machine Learning: Scikit-learn, TensorFlow, Spark MLlib
Visualization Tools: Tableau, Power BI, Matplotlib, Seaborn, D3.js

Installation & Setup

To run the projects locally:

Clone the repository:

```
git clone https://github.com/tbm077861/Big-Data-Subject.git
cd Big-Data-Subject
```

Set up a virtual environment (optional but recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Usage

Each project has its own folder with detailed instructions. Navigate to a specific project and follow the README inside for setup and execution.

Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve the projects.

License

This repository is licensed under the MIT License. See LICENSE for details.
