Skip to content

TianyiZ0829/Text-Classification-with-BERT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

Embedding_AI-in-Finance

Author: Tianyi Zhu

Project Overview

This project focuses on fine-tuning the BERT model for text classification tasks, specifically using business descriptions as input data. The implementation includes data preprocessing, tokenization, dataset splitting, and fine-tuning BERT for classification.


Table of Contents

  1. Project Workflow
  2. Requirements
  3. Installation
  4. Usage
  5. File Structure
  6. Results
  7. License

Project Workflow

Step 1: Data Preprocessing

  • Clean and process the input data (business descriptions).
  • Tokenize text using Hugging Face Transformers library.

Step 2: Dataset Splitting

  • Split the dataset into training, validation, and test sets for model evaluation.

Step 3: Fine-Tuning BERT

  • Load the pretrained BERT model.
  • Fine-tune the model for sequence classification tasks using PyTorch and Hugging Face.

Step 4: Evaluation and Results

  • Evaluate the fine-tuned model on test data.
  • Visualize the results using matplotlib and seaborn.

Requirements

Ensure you have the following Python libraries installed:

  • transformers
  • torch
  • numpy
  • pandas
  • matplotlib
  • seaborn
  • tqdm

Installation

  1. Clone this repository:

    git clone https://github.com/your-username/AS12-BERT-Classification.git
    cd AS12-BERT-Classification
  2. Install the required dependencies:

    pip install -r requirements.txt

Usage

  1. Run the notebook:
    Open the Jupyter Notebook AS12.ipynb in a Jupyter environment:

    jupyter notebook AS12.ipynb
  2. Steps in the Notebook:

    • Data preprocessing
    • Tokenization
    • Fine-tuning BERT
    • Model evaluation and results
  3. Input Data: Ensure you have the business description data in the correct format.


File Structure

AS12-BERT-Classification/
│-- AS12.ipynb              # Main Jupyter Notebook for implementation
│-- data/                   # Folder to store input data
│-- results/                # Folder to save outputs and visualizations
│-- models/                 # Folder to save fine-tuned models
│-- README.md               # Project documentation
│-- requirements.txt        # Required dependencies

Results

  • Results of fine-tuning the BERT model, including evaluation metrics, are documented in the notebook.
  • Visualizations include confusion matrices and classification accuracy plots.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published