🧠 Text Quantization,NER,Standardization Pipeline

This repository implements a multi-step pipeline for understanding and structuring raw text (reviews, in this case). The goal is to move from raw text to structured insights through sentiment analysis, keyphrase extraction, and label standardization. The entire workflow is carried out using Qwen3 LLM via Ollama, enabling tool call, ensuring structured output at the end.
The workflow is implemented on Open Source "yelp-restaurant-reviews" data, but can be extended to convert any raw text to quantified standardized structured output.

📋 Pipeline Overview

The processing pipeline operates in three main stages:

1️⃣ Basic Sentiment Analysis

Goal: Rate the entire review on a scale from 0 (very negative) to 5 (very positive).

Theoretical Context:

Sentiment Analysis: Assigning an overall polarity score to a review.
Common techniques include zero-shot classification, review embedding, or trained sentiment classifiers.

2️⃣ Quantitative Named Entity Recognition (NER)

Goal: Extract opinionated phrases like "slow service" or "great food", and identify references to specific entities (e.g., "ice cream", "burger").

Theoretical Context:

Keyphrase Extraction / Aspect Extraction: Pulling out meaningful aspects from text.
Aspect-Based Sentiment Analysis (ABSA): Attaching sentiment to specific targets like "food", "service".
Opinion Mining: Identifying subjective expressions.
Named Entity Recognition (NER): Capturing references to named items (e.g., dishes, products).

3️⃣ Standardization / Canonicalization

Goal: Map diverse phrasings (e.g., "the service was slow" vs "slow service") to a single standardized label.

Theoretical Context:

Text Normalization / Canonicalization: Reducing linguistic variance across extracted phrases.
Synonym Resolution / Clustering: Grouping semantically similar expressions under unified tags.

📋 Pipeline Flow

flowchart TD
    A[Start] --> B[Receive Review Input]
    B --> C[Ollama Tool Call: Quantification & NER]
    C --> Pos[Positive Points]
    C --> Neg[Pain Points]

    Pos --> PosDishes[Positive Dishes]
    Pos --> PosOther[Other Positive Points]
    Neg --> NegDishes[Negative Dishes]
    Neg --> NegOther[Other Negative Points]

    PosOther --> CollectPos[Collect All Positive Points]
    NegOther --> CollectNeg[Collect All Negative Points]

    CollectPos --> SPos[Ollama Tool Call: Label Standardization - Positive]
    CollectNeg --> SNeg[Ollama Tool Call: Label Standardization - Negative]

    SPos --> StdPos[Standardized Positive Points]
    StdPos --> MapPos[Reduced/Standarized Positive Points]
    CollectPos --> MapPos
    MapPos --> CatPos[Categorized Positive Points]

    SNeg --> StdNeg[Standardized Negative Points]
    StdNeg --> MapNeg[Reduced/Standarized Negative Points]
    CollectNeg --> MapNeg
    MapNeg --> CatNeg[Categorized Negative Points]

    %% Dotted box: NER phase
    subgraph NER [ Named Entity Recognition ]
        style NER stroke-dasharray: 5
        Pos
        Neg
        PosDishes
        PosOther
        NegDishes
        NegOther
    end

    %% Encircle Standardization & Categorization process
    subgraph Standardization [ Point Standardization & Categorization ]
        style Standardization stroke: #999,stroke-width:2px
        PosOther
        NegOther
        CollectPos
        CollectNeg
        SPos
        SNeg
        StdPos
        StdNeg
        MapPos
        MapNeg
        CatPos
        CatNeg
    end

    %% Apply class to important nodes
    class C,SPos,SNeg highlight;

    %% Define highlight class with dark grey background and white text
    classDef highlight fill:#333,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold;

Sample Output:

A structured JSON representation of the review, including:

{
  "rating": 4,
  "positive_points": ["delicious food", "friendly staff"],
  "pain_points": ["slow service"],
  "categories": {
    "food": ["delicious food"],
    "service": ["slow service"]
  }
}

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
Quantitative_Named_Entity_Recognition		Quantitative_Named_Entity_Recognition
Standardization_Canonicalization		Standardization_Canonicalization
basic_sentiment_analysis		basic_sentiment_analysis
End_2_End_Quantification_Development_GoogleCollab.ipynb		End_2_End_Quantification_Development_GoogleCollab.ipynb
README.md		README.md
Yelp Restaurant Reviews.csv		Yelp Restaurant Reviews.csv
requirements.txt		requirements.txt
sample_word_cloud.jpg		sample_word_cloud.jpg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧠 Text Quantization,NER,Standardization Pipeline

📋 Pipeline Overview

1️⃣ Basic Sentiment Analysis

Theoretical Context:

2️⃣ Quantitative Named Entity Recognition (NER)

Theoretical Context:

3️⃣ Standardization / Canonicalization

Theoretical Context:

📋 Pipeline Flow

Sample Output:

About

Uh oh!

Releases

Packages

Languages

devanjanmishra/Canonicalization-via-Ollama-LLMs

Folders and files

Latest commit

History

Repository files navigation

🧠 Text Quantization,NER,Standardization Pipeline

📋 Pipeline Overview

1️⃣ Basic Sentiment Analysis

Theoretical Context:

2️⃣ Quantitative Named Entity Recognition (NER)

Theoretical Context:

3️⃣ Standardization / Canonicalization

Theoretical Context:

📋 Pipeline Flow

Sample Output:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages