This project demonstrates a pipeline for scraping text data from web pages, cleaning the data, extracting features using TF-IDF, and performing sentiment analysis with the `TextBlob` and `nltk` libraries. The results are then saved into a CSV file for further analysis.
The script performs the following tasks:
- Scrapes data from a given website.
- Cleans the text data by removing special characters, punctuation, and extra spaces.
- Extracts features from the cleaned data using the `TfidfVectorizer`.
- Analyzes the sentiment of the cleaned text data.
- Saves the cleaned data, sentiment scores, and feature extraction results into a CSV file.
The following Python libraries are required:

- `numpy`
- `pandas`
- `requests`
- `beautifulsoup4`
- `re` (Python standard library, no installation needed)
- `nltk`
- `scikit-learn`
- `textblob`
Make sure to install the required libraries using the following command:
```bash
pip install numpy pandas requests beautifulsoup4 nltk scikit-learn textblob
```
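Some `nltk` features also depend on data files that `pip` does not install; a minimal sketch of the one-time downloads this pipeline relies on (`stopwords` for the vectorizer's stop-word list, `vader_lexicon` for the `SentimentIntensityAnalyzer`):

```python
import nltk

# One-time downloads for the corpora used later in the pipeline
nltk.download('stopwords')      # stop-word list passed to TfidfVectorizer
nltk.download('vader_lexicon')  # lexicon required by SentimentIntensityAnalyzer
```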
- Clone the repository:

```bash
git clone https://github.com/Sherryyy00/Shipment-Analysis.git
```

- Navigate to the project directory:

```bash
cd Shipment-Analysis
```

- Install the required Python packages:

```bash
pip install -r requirements.txt
```
- **Web Scraping**: The script uses the `requests` library to fetch the content of a webpage and `BeautifulSoup` for parsing the HTML. It collects all hyperlinks from the page and then retrieves the text content from each link (a sketch of that second step follows the code below).

```python
import requests
from bs4 import BeautifulSoup as bs

# Fetch the page and parse its HTML
url = 'https://www.sciencedirect.com/science/article/abs/pii/S1361920999000309'
response = requests.get(url).text
soup = bs(response, "html.parser")

# Collect every hyperlink on the page
link = [a['href'] for a in soup.find_all('a', href=True)]
```
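The original snippet stops at collecting links; a hedged sketch of fetching the text from each one, assuming relative links are skipped and failed requests are ignored (the variable name `data_text` matches the cleaning step below):

```python
# Hypothetical follow-up: fetch each absolute link and keep its visible text
data_text = []
for href in link:
    if not href.startswith('http'):
        continue  # skip relative and non-HTTP links
    try:
        page = requests.get(href, timeout=10).text
        data_text.append(bs(page, "html.parser").get_text(separator=" "))
    except requests.RequestException:
        pass  # ignore links that fail to load
```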
- **Data Cleaning**: The scraped text is cleaned by removing non-alphanumeric characters, punctuation, and extra spaces. The cleaned data is then stored in a pandas DataFrame (sketched after the code below).

```python
import re

# Replace every non-word character with a space
for j in range(len(data_text)):
    data_text[j] = re.sub(r'\W', " ", data_text[j])
    # Further cleaning steps
```
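The "further cleaning steps" are elided in the original; a hedged sketch of what they might look like, together with loading the result into the `cleaned` DataFrame column that the later steps read from (the column name comes from the snippets below; the exact cleaning rules are assumptions):

```python
import pandas as pd

# Hypothetical continuation of the cleaning loop
for j in range(len(data_text)):
    data_text[j] = re.sub(r'\s+', " ", data_text[j]).strip().lower()  # collapse extra spaces

# Store the cleaned documents in the DataFrame the later steps expect
df = pd.DataFrame({'cleaned': data_text})
```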
- **Feature Extraction**: TF-IDF (Term Frequency-Inverse Document Frequency) is used for feature extraction. The `TfidfVectorizer` transforms the cleaned text into numerical feature vectors for use in machine learning or other analysis.

```python
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

# Keep the 1500 strongest terms; ignore terms in fewer than 5 or more than 70% of documents
tfidfconverter = TfidfVectorizer(max_features=1500, min_df=5, max_df=0.7,
                                 stop_words=stopwords.words('english'))
x = tfidfconverter.fit_transform(df['cleaned']).toarray()
```
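To inspect which terms the vectorizer kept, scikit-learn's `get_feature_names_out()` (available since version 1.0; older releases use `get_feature_names()`) maps the columns of `x` back to terms:

```python
# Each column of x corresponds to one vocabulary term
terms = tfidfconverter.get_feature_names_out()
print(len(terms), terms[:10])  # vocabulary size and a sample of terms
```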
- **Sentiment Analysis**: The sentiment of the cleaned text is analyzed using two methods:
  - `TextBlob` for polarity and subjectivity scores.
  - `SentimentIntensityAnalyzer` from the `nltk` library for detailed sentiment scores (negative, positive, neutral, compound).

```python
from textblob import TextBlob

# Polarity ranges over [-1, 1], subjectivity over [0, 1]
df['Polarity'] = df["cleaned"].apply(lambda x: TextBlob(x).sentiment.polarity)
df['Subjectivity'] = df["cleaned"].apply(lambda x: TextBlob(x).sentiment.subjectivity)
```
The `SentimentIntensityAnalyzer` is used to calculate the remaining sentiment metrics:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create the analyzer once instead of on every iteration
sia = SentimentIntensityAnalyzer()
neg, pos, neu, com = [], [], [], []
for i in range(len(df.index)):
    score = sia.polarity_scores(df['cleaned'][i])
    neg.append(score['neg'])
    pos.append(score['pos'])
    neu.append(score['neu'])
    com.append(score['compound'])
```
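Since the CSV is expected to contain these scores, here is a small sketch of attaching them to the DataFrame (the column names are assumptions, not taken from the script):

```python
# Hypothetical column names for the VADER scores
df['Negative'] = neg
df['Positive'] = pos
df['Neutral'] = neu
df['Compound'] = com
```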
- **Saving to CSV**: The results, including cleaned text, sentiment analysis scores, and extracted features, are saved into a CSV file (a sketch of merging in the TF-IDF features follows the code below).

```python
df.to_csv('Feature Extraction.csv')
```
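The snippet above writes only the DataFrame columns; if the TF-IDF matrix `x` is meant to land in the same file, one hedged way to merge it in (this layout is an assumption, not necessarily what the script does):

```python
# Hypothetical merge: one column per TF-IDF term alongside the sentiment scores
features = pd.DataFrame(x, columns=tfidfconverter.get_feature_names_out())
pd.concat([df, features], axis=1).to_csv('Feature Extraction.csv', index=False)
```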
Make sure you have installed the required dependencies, then run the script:

```bash
python sentiment_analysis.py
```

The results will be saved in a file named `Feature Extraction.csv`, containing:
- Cleaned text
- Sentiment scores (polarity, subjectivity, positive, negative, neutral, compound)
- Extracted TF-IDF features
This project provides a complete pipeline from web scraping to text processing and sentiment analysis, ideal for applications in natural language processing and data mining.