
Using an Ollama LLM to reduce diverse phrasings such as “slow service” and “the service was slow” to a single standardized label.


devanjanmishra/Canonicalization-via-Ollama-LLMs


🧠 Text Quantization, NER, and Standardization Pipeline

This repository implements a multi-step pipeline for understanding and structuring raw text (reviews, in this case). The goal is to move from raw text to structured insights through sentiment analysis, keyphrase extraction, and label standardization. The entire workflow runs on the Qwen3 LLM served through Ollama, using tool calls so that the final output is structured.
The workflow is implemented on the open-source "yelp-restaurant-reviews" dataset, but it can be extended to convert any raw text into quantified, standardized, structured output.


📋 Pipeline Overview

The processing pipeline operates in three main stages:

Goal: Rate the entire review on a scale from 0 (very negative) to 5 (very positive).

Theoretical Context:

  • Sentiment Analysis: Assigning an overall polarity score to a review.
  • Common techniques include zero-shot classification, review embeddings, and trained sentiment classifiers.
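
As a rough illustration of how the rating step could be expressed as a tool call, the sketch below defines a function-style tool schema and a small validator for the returned arguments. The tool name `rate_review`, the `rating` field, and the helper `parse_rating` are hypothetical and are not taken from this repository.

```python
from typing import Any

# Hypothetical tool definition (illustrative names, not the repository's own).
# It follows the common JSON-schema "function" layout used for LLM tool calling.
RATE_REVIEW_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "rate_review",
        "description": "Rate the overall sentiment of a review from 0 to 5.",
        "parameters": {
            "type": "object",
            "properties": {
                "rating": {
                    "type": "integer",
                    "minimum": 0,
                    "maximum": 5,
                    "description": "0 = very negative, 5 = very positive",
                },
            },
            "required": ["rating"],
        },
    },
}


def parse_rating(arguments: dict[str, Any]) -> int:
    """Validate tool-call arguments and return an integer rating in [0, 5]."""
    rating = arguments.get("rating")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        raise ValueError(f"rating must be a number, got {rating!r}")
    # Clamp out-of-range values instead of trusting the model blindly.
    return max(0, min(5, round(rating)))
```

With the `ollama` Python client, a definition like this would typically be passed through the `tools` argument of `ollama.chat`, and the arguments of the returned tool call would then go through `parse_rating`.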

Goal: Extract opinionated phrases like "slow service" or "great food", and identify references to specific entities (e.g., "ice cream", "burger").

Theoretical Context:

  • Keyphrase Extraction / Aspect Extraction: Pulling out meaningful aspects from text.
  • Aspect-Based Sentiment Analysis (ABSA): Attaching sentiment to specific targets such as "food" or "service".
  • Opinion Mining: Identifying subjective expressions.
  • Named Entity Recognition (NER): Capturing references to named items (e.g., dishes, products).
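
The extraction step separates dish mentions from other opinionated points, for both positive and negative sentiment. A minimal sketch of cleaning such a tool-call result follows; the field names (`positive_dishes`, `positive_other`, `negative_dishes`, `negative_other`) and the `Extraction` class are assumptions chosen to mirror the pipeline flow, not the repository's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumed field names, mirroring the four buckets of the extraction step.
FIELDS = ("positive_dishes", "positive_other", "negative_dishes", "negative_other")


@dataclass
class Extraction:
    positive_dishes: list[str] = field(default_factory=list)
    positive_other: list[str] = field(default_factory=list)
    negative_dishes: list[str] = field(default_factory=list)
    negative_other: list[str] = field(default_factory=list)


def _clean(items: Any) -> list[str]:
    """Lowercase, collapse whitespace, drop non-strings and duplicates (order kept)."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for item in items or []:
        if not isinstance(item, str):
            continue
        text = " ".join(item.split()).lower()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


def parse_extraction(arguments: dict[str, Any]) -> Extraction:
    """Build an Extraction from raw tool-call arguments, tolerating missing keys."""
    return Extraction(**{name: _clean(arguments.get(name)) for name in FIELDS})
```

Normalizing case and whitespace at this point reduces trivial duplicates before the standardization step has to deal with genuinely different phrasings.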

Goal: Map diverse phrasings (e.g., "the service was slow" vs "slow service") to a single standardized label.

Theoretical Context:

  • Text Normalization / Canonicalization: Reducing linguistic variance across extracted phrases.
  • Synonym Resolution / Clustering: Grouping semantically similar expressions under unified tags.
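
Once the LLM has proposed a standardized label for each raw phrase, applying that mapping is a simple lookup. The sketch below assumes the standardization call returns a dictionary from raw phrase to label; the function name `apply_standardization` and the example mapping are illustrative only.

```python
from collections import Counter


def apply_standardization(points: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace each phrase with its standardized label, matching case-insensitively.

    Phrases without an entry in the mapping are kept unchanged.
    """
    normalized = {raw.strip().lower(): label for raw, label in mapping.items()}
    return [normalized.get(p.strip().lower(), p) for p in points]


# Illustrative mapping, as a standardization call might return it (assumed shape).
mapping = {
    "slow service": "slow service",
    "the service was slow": "slow service",
}
points = ["slow service", "The service was slow", "great food"]

labels = apply_standardization(points, mapping)
label_counts = Counter(labels)  # frequency of each standardized label
```

Counting the standardized labels afterwards is what makes the reduction useful: both phrasings now contribute to the same "slow service" tally.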

📋 Pipeline Flow

flowchart TD
    A[Start] --> B[Receive Review Input]
    B --> C[Ollama Tool Call: Quantification & NER]
    C --> Pos[Positive Points]
    C --> Neg[Pain Points]

    Pos --> PosDishes[Positive Dishes]
    Pos --> PosOther[Other Positive Points]
    Neg --> NegDishes[Negative Dishes]
    Neg --> NegOther[Other Negative Points]

    PosOther --> CollectPos[Collect All Positive Points]
    NegOther --> CollectNeg[Collect All Negative Points]

    CollectPos --> SPos[Ollama Tool Call: Label Standardization - Positive]
    CollectNeg --> SNeg[Ollama Tool Call: Label Standardization - Negative]

    SPos --> StdPos[Standardized Positive Points]
    StdPos --> MapPos[Reduced/Standardized Positive Points]
    CollectPos --> MapPos
    MapPos --> CatPos[Categorized Positive Points]

    SNeg --> StdNeg[Standardized Negative Points]
    StdNeg --> MapNeg[Reduced/Standardized Negative Points]
    CollectNeg --> MapNeg
    MapNeg --> CatNeg[Categorized Negative Points]

    %% Dotted box: NER phase
    subgraph NER [ Named Entity Recognition ]
        style NER stroke-dasharray: 5
        Pos
        Neg
        PosDishes
        PosOther
        NegDishes
        NegOther
    end

    %% Encircle Standardization & Categorization process
    subgraph Standardization [ Point Standardization & Categorization ]
        style Standardization stroke: #999,stroke-width:2px
        PosOther
        NegOther
        CollectPos
        CollectNeg
        SPos
        SNeg
        StdPos
        StdNeg
        MapPos
        MapNeg
        CatPos
        CatNeg
    end

    %% Apply class to important nodes
    class C,SPos,SNeg highlight;

    %% Define highlight class with dark grey background and white text
    classDef highlight fill:#333,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold;
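
The flow above can be approximated in code by treating the two LLM tool calls as injected functions. This is only a sketch under stated assumptions: `run_pipeline`, the dictionary keys, and the stand-in functions `fake_extract` and `fake_standardize` are hypothetical, and in practice the stand-ins would be replaced by real Ollama calls.

```python
from typing import Callable

Extractor = Callable[[str], dict[str, list[str]]]
Standardizer = Callable[[list[str]], dict[str, str]]


def run_pipeline(reviews: list[str], extract: Extractor, standardize: Standardizer) -> list[dict]:
    """Extract points per review, standardize the non-dish points, map them back."""
    extractions = [extract(review) for review in reviews]

    # Collect all non-dish points across reviews, separately per polarity.
    collected_pos = sorted({p for e in extractions for p in e.get("positive_other", [])})
    collected_neg = sorted({p for e in extractions for p in e.get("negative_other", [])})

    # One standardization call per polarity, returning raw phrase -> label.
    pos_map = standardize(collected_pos)
    neg_map = standardize(collected_neg)

    results = []
    for review, e in zip(reviews, extractions):
        results.append({
            "review": review,
            "positive_dishes": e.get("positive_dishes", []),
            "negative_dishes": e.get("negative_dishes", []),
            "positive_points": [pos_map.get(p, p) for p in e.get("positive_other", [])],
            "pain_points": [neg_map.get(p, p) for p in e.get("negative_other", [])],
        })
    return results


def fake_extract(review: str) -> dict[str, list[str]]:
    # Stand-in for the "Quantification & NER" tool call.
    phrase = review.lower().rstrip(".")
    if "slow" in phrase:
        return {"negative_other": [phrase]}
    return {"positive_other": [phrase], "positive_dishes": ["burger"]}


def fake_standardize(points: list[str]) -> dict[str, str]:
    # Stand-in for the "Label Standardization" tool call.
    return {p: "slow service" for p in points if "slow" in p}


results = run_pipeline(
    ["Slow service.", "The service was slow.", "Great burger."],
    fake_extract,
    fake_standardize,
)
```

Passing the two calls in as functions keeps the orchestration testable without a running model, and dish mentions bypass standardization just as they do in the flowchart.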


Sample Output:

Each review yields a structured JSON representation, for example:

{
  "rating": 4,
  "positive_points": ["delicious food", "friendly staff"],
  "pain_points": ["slow service"],
  "categories": {
    "food": ["delicious food"],
    "service": ["slow service"]
  }
}
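
A small consistency check can be useful when consuming output of this shape. The sketch below loads the sample above and lists any points that were not assigned to a category; the helper name `uncategorized_points` is hypothetical.

```python
import json

SAMPLE = """
{
  "rating": 4,
  "positive_points": ["delicious food", "friendly staff"],
  "pain_points": ["slow service"],
  "categories": {
    "food": ["delicious food"],
    "service": ["slow service"]
  }
}
"""


def uncategorized_points(result: dict) -> list[str]:
    """Return positive and pain points that appear in no category."""
    categorized = {
        phrase
        for phrases in result.get("categories", {}).values()
        for phrase in phrases
    }
    all_points = result.get("positive_points", []) + result.get("pain_points", [])
    return [p for p in all_points if p not in categorized]


result = json.loads(SAMPLE)
print(uncategorized_points(result))  # ['friendly staff']
```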

