
LLM Services API


Version 0.0.4 | License: MIT | Built with Python

LLM Services API is a FastAPI-based application that exposes a suite of natural language processing services, backed by machine learning models from Hugging Face's transformers library, through a REST API. The application is designed to run in a Docker container and provides endpoints for text summarization, sentiment analysis, named entity recognition, paraphrasing, keyword extraction, and embedding generation. Every endpoint is secured with an API key supplied as a Bearer <token> in the Authorization header, ensuring that only authorized clients can access the service.

Model selection is flexible: command-line arguments and a configuration file, models_config.json, let you specify a different Hugging Face model for each NLP task, so you can choose lightweight models for low-resource environments or more powerful models for demanding workloads.
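The exact schema of models_config.json is not documented here, but a plausible sketch, with one key per task mirroring the command-line options listed under Options below (the key names and model choices are illustrative assumptions, not the file's confirmed format), might look like:

{
  "embedding_model": "all-MiniLM-L6-v2",
  "summarization_model": "facebook/bart-large-cnn",
  "sentiment_model": "distilbert-base-uncased-finetuned-sst-2-english",
  "ner_model": "dbmdz/bert-large-cased-finetuned-conll03-english",
  "paraphrase_model": "t5-base",
  "keyword_model": "all-MiniLM-L6-v2"
}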

Updates

0.0.4

  • Tokenization: Convert input text into a list of token IDs so you can inspect and manipulate text at the token level (default model: all-MiniLM-L6-v2).
  • Detokenization: Reconstruct the original text from a list of token IDs, reversing the tokenization process (default model: all-MiniLM-L6-v2).

0.0.3

  • Adaptive Throttling: Implemented an adaptive throttling mechanism that asks clients to back off via the Retry-After header when errors occur due to high request frequency or processing failures. The delay is adjusted dynamically based on the client's request rate and error history.
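Since the server signals throttled clients through Retry-After, a well-behaved caller should sleep for the indicated interval before retrying. A minimal sketch in Python, assuming throttled responses use HTTP 429 (the endpoint, key, and status code are assumptions, not confirmed behavior):

import time
import requests

API_URL = "http://localhost:5000/summarize"            # placeholder endpoint
HEADERS = {"Authorization": "Bearer your-key-here"}    # API key from your .env file

def post_with_backoff(payload, max_retries=5):
    # POST to the API, sleeping for Retry-After seconds whenever throttled.
    for _ in range(max_retries):
        resp = requests.post(API_URL, json=payload, headers=HEADERS)
        if resp.status_code != 429:        # not throttled: return the result
            resp.raise_for_status()
            return resp.json()
        delay = float(resp.headers.get("Retry-After", 1))
        time.sleep(delay)                  # honor the server's requested delay
    raise RuntimeError("gave up after repeated throttling")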

0.0.2

  • OpenAI-Compatible Embeddings: Provides an endpoint that mimics the OpenAI embedding API, allowing easy integration with existing systems expecting OpenAI-like responses.
  • Configurable Model Loading: Customize which Hugging Face NLP models are loaded by providing command-line arguments or configuring the models_config.json file. This flexibility allows the application to adapt to different resource environments or use cases.

Features

  • Text Summarization: Generate concise summaries of long texts (default model: BART).
  • Sentiment Analysis: Determine the sentiment of text inputs (default model: DistilBERT).
  • Named Entity Recognition (NER): Identify entities within text and sort them by frequency (default model: dbmdz/bert-large-cased-finetuned-conll03-english).
  • Paraphrasing: Rephrase sentences to produce semantically similar outputs (default model: T5).
  • Keyword Extraction: Extract important keywords from text, with a customizable output count (default model: KeyBERT).
  • Embedding Generation: Create vector representations of text (default model: SentenceTransformers all-MiniLM-L6-v2).
  • Caching with LRU: Frequently repeated computations, such as embedding generation and tokenization, are cached using a Least Recently Used (LRU) strategy, reducing response times for repeated requests; see the sketch after this list.
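The caching code itself is not shown in this README; in Python, functools.lru_cache is a common way to implement this pattern, so a hedged sketch (the function name and cache size are illustrative, not the project's actual implementation) might be:

from functools import lru_cache
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

@lru_cache(maxsize=1024)      # least recently used entries are evicted first
def embed_cached(text: str):
    # lru_cache needs hashable inputs and outputs, so the text arrives as a
    # plain str and the embedding is returned as an immutable tuple.
    return tuple(model.encode(text))

Repeated calls with the same text then hit the cache instead of re-running the model, which is what makes repeated embedding and tokenization requests fast.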

Dependencies

  • Python 3.7+
  • FastAPI
  • Uvicorn
  • spaCy
  • transformers
  • sentence-transformers
  • keybert
  • torch
  • python-dotenv (for environment variable management)

Installation

To get started with the LLM Services API, follow these steps:

  1. Clone the Repository:
git clone https://github.com/samestrin/llm-services-api.git
cd llm-services-api
  2. Create a Virtual Environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
  3. Install the Dependencies:
pip install -r requirements.txt
  4. Download the SpaCy Model:
python -m spacy download en_core_web_sm
  5. Create Your .env File:
echo "API_KEY=your-key-here" > .env
  6. Run the Application Locally:

You can run the application locally in two ways:

  • Using Uvicorn:

This is the recommended method for development; the --reload flag restarts the server whenever the code changes and should be dropped in production.

uvicorn main:app --reload --port 5000
  • Using Python:

This method allows you to pass command-line arguments for customizing models.

python main.py --embedding-model all-MiniLM-L6-v2 --summarization-model facebook/bart-large-cnn

Replace the model names with the Hugging Face models you wish to use; each NLP task has its own argument (see Options below), so you can mix and match models per task.

Options

  -h, --help                                  Show this help message and exit
  --embedding-model EMBEDDING_MODEL           Specify embedding model
  --summarization-model SUMMARIZATION_MODEL   Specify summarization model
  --sentiment-model SENTIMENT_MODEL           Specify sentiment analysis model
  --ner-model NER_MODEL                       Specify named entity recognition model
  --paraphrase-model PARAPHRASE_MODEL         Specify paraphrasing model
  --keyword-model KEYWORD_MODEL               Specify keyword extraction model

Running with Docker

To run the application in a Docker container, follow these steps:

  1. Build the Docker Image:
docker build -t llm-services-api .
  2. Run the Docker Container:
docker run -p 5000:5000 llm-services-api

The application will be accessible at http://localhost:5000.
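Note that the API key lives in your .env file; if the image does not already include it, one way to supply it at runtime (assuming the app reads the key from the environment) is Docker's --env-file flag:

docker run --env-file .env -p 5000:5000 llm-services-api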

Usage

The API provides several endpoints for various NLP tasks. Below is a summary of the available endpoints:
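Every request must carry the API key from your .env file as a Bearer token. A minimal call against /summarize, using Python's requests (the URL and key are placeholders; the same pattern applies to the other endpoints):

import requests

resp = requests.post(
    "http://localhost:5000/summarize",                  # assumes a local deployment
    headers={"Authorization": "Bearer your-key-here"},  # API key from your .env file
    json={"text": "Your text here"},
)
print(resp.json())  # {"summary": "..."}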

Endpoints

1. Text Summarization

  • Endpoint: /summarize
  • Method: POST
  • Request Body:
{
  "text": "Your text here"
}
  • Response:
{
  "summary": "The generated summary of the provided text."
}

2. Sentiment Analysis

  • Endpoint: /sentiment
  • Method: POST
  • Request Body:
{
  "text": "Your text here"
}
  • Response:
{
  "sentiment": [
    {
      "label": "POSITIVE",  # or "NEGATIVE"
      "score": 0.99
    }
  ]
}

3. Named Entity Recognition

  • Endpoint: /entities
  • Method: POST
  • Request Body:
{
  "text": "Your text here"
}
  • Response:
{
  "entities": [
    {
      "entity": "PERSON",
      "word": "John Doe",
      "frequency": 3
    },
    ...
  ]
}

4. Paraphrasing

  • Endpoint: /paraphrase
  • Method: POST
  • Request Body:
{
  "text": "Your text here"
}
  • Response:
{
  "paraphrased_text": "The paraphrased version of the input text."
}

5. Keyword Extraction

  • Endpoint: /extract_keywords
  • Method: POST
  • Query Parameters:
    • num_keywords: Optional, defaults to 5. Specifies the number of keywords to extract.
  • Request Body:
{
  "text": "Your text here"
}
  • Response:
{
  "keywords": [
    {
      "keyword": "important keyword",
      "score": 0.95
    },
    ...
  ]
}
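Note that num_keywords travels as a query parameter rather than in the JSON body; for example (the URL and key are placeholders):

import requests

resp = requests.post(
    "http://localhost:5000/extract_keywords",
    params={"num_keywords": 10},                        # query parameter, not JSON body
    headers={"Authorization": "Bearer your-key-here"},
    json={"text": "Your text here"},
)
print(resp.json()["keywords"])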

6. Embedding Generation

  • Endpoint: /embed
  • Method: POST
  • Request Body:
{
  "text": "Your text here"
}
  • Response:
{
    "embedding": [0.1, 0.2, 0.3, ...] # Array of float numbers representing the text embedding
}

7. OpenAI-Compatible Embedding

  • Endpoint: /v1/embeddings
  • Method: POST
  • Request Body:
{
  "input": "Your text here",
  "model": "all-MiniLM-L6-v2"  # or another supported model
}
  • Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.006929283495992422, -0.005336422007530928, ...],  # Embedding array
    }
  ],
  "model": "all-MiniLM-L6-v2",
  "usage": {
    "prompt_tokens": 5,  # Number of tokens in the input
    "total_tokens": 5    # Total number of tokens processed
  }
}
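Because this endpoint mimics the OpenAI embeddings API, you can likely point the official openai Python client at it by overriding base_url. An untested sketch (whether the server accepts the client's auth header exactly as OpenAI formats it is an assumption):

from openai import OpenAI

# base_url redirects the client to the local service; api_key is your own key.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="your-key-here")

resp = client.embeddings.create(model="all-MiniLM-L6-v2", input="Your text here")
print(resp.data[0].embedding[:5])    # first few dimensions of the vector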

8. Tokenization

  • Endpoint: /tokenize
  • Method: POST
  • Request Body:
{
  "text": "Your text here",
  "model": "all-MiniLM-L6-v2"  # Optional, specify a model for tokenization
}
  • Response:
{
  "tokens": [101, 7592, 999, ...]  # Array of token IDs representing the text
}

This endpoint tokenizes input text using the specified model; if the model field is omitted, the default embedding model, all-MiniLM-L6-v2, is used.

9. Detokenization

  • Endpoint: /detokenize
  • Method: POST
  • Request Body:
{
  "tokens": [101, 2023, 2003, 2019, 2742, 6251, 2000, 19204, 1012, 102],  # List of token IDs
  "model": "all-MiniLM-L6-v2"  # Optional, specify a model for detokenization
}
  • Response:
{
  "text": "This is an example sentence to tokenize."  # The reconstructed text
}
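A round trip through /tokenize and /detokenize should reconstruct the input, up to any normalization the tokenizer applies (such as lowercasing). A hedged sketch (the URL and key are placeholders):

import requests

BASE = "http://localhost:5000"                       # assumes a local deployment
HEADERS = {"Authorization": "Bearer your-key-here"}  # API key from your .env file

tokens = requests.post(
    f"{BASE}/tokenize", headers=HEADERS,
    json={"text": "This is an example sentence to tokenize."},
).json()["tokens"]

text = requests.post(
    f"{BASE}/detokenize", headers=HEADERS,
    json={"tokens": tokens},
).json()["text"]

print(text)  # should closely match the original sentence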

Contribute

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.
