Releases: Fangzhou-Code/Utils

1.0.6

15 Dec 10:39

v1.0.6

Rebuild

1.0.5

15 Dec 10:17

v1.0.5

Update for new name

1.0.4

15 Dec 10:03
3ca9f13

v1.0.4

Update __init__.py

1.0.3

15 Dec 10:01

v1.0.3

Update MANIFEST.in

1.0.2

15 Dec 09:38
07b698a

Update __init__.py to import correctly

1.0.1

15 Dec 09:27
43d962a

EnhancedLocalEmbeddings is a versatile tool for generating text embeddings using local models. It supports Hugging Face Transformers and SentenceTransformer, providing flexibility and efficiency for text processing tasks.

Key Features:

  • Dual Framework Support: Seamlessly works with Hugging Face models or SentenceTransformer, adapting to user needs.
  • Customizable Output: Allows setting output embedding dimensions to suit specific applications.
  • Multiple Text Modes: Supports single text, batch, and asynchronous embeddings.
  • Batch Efficiency: Optimized for embedding large datasets with batch processing.
  • Plug-and-Play Design: Easy integration into existing pipelines for applications like search, classification, and semantic analysis.

Ideal for developers and researchers seeking efficient, local embedding solutions without relying on external APIs.
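
The batch efficiency described above comes down to chunking a large text list so each model call stays bounded. A minimal sketch of that loop, where `embed_fn` and `batch_size` are illustrative stand-ins rather than the package's actual internals:

```python
from typing import Callable, List

# Hypothetical sketch of batched embedding: split a large list of texts
# into fixed-size chunks so each model forward pass stays bounded.
# `embed_fn` stands in for a real model call; the chunking is the point.
def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        embeddings.extend(embed_fn(chunk))  # one model call per chunk
    return embeddings

if __name__ == "__main__":
    # Toy embed_fn (length-based vector) used only to show the data flow.
    fake_embed = lambda chunk: [[float(len(t))] for t in chunk]
    print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))
```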

1.0.0

15 Dec 08:23
5b08488

Choose a tag to compare

The EnhancedLocalEmbeddings tool provides a flexible and efficient solution for generating text embeddings using local models, supporting both Hugging Face Transformers and SentenceTransformer frameworks. It is designed for tasks requiring robust text representation in a variety of applications, including natural language processing, search, and recommendation systems.

Key Features:

  1. Dual Framework Support:

    • Seamless integration with Hugging Face models and SentenceTransformer.
    • Automatically determines the appropriate framework based on the model and tokenizer paths.
  2. Customizable Output:

    • Allows users to specify output dimensions for embeddings, offering control over the feature vector size.
  3. Multiple Modes of Operation:

    • Supports embedding single texts, multiple documents, or queries.
    • Offers both synchronous (embed_text, embed_documents) and asynchronous (aembed_text, aembed_documents) methods for flexibility in real-time and batch processing workflows.
  4. Batch Processing:

    • Efficient batch embedding for multiple texts, optimizing computational resources and processing time.
  5. Model Flexibility:

    • Leverages Hugging Face's AutoModel and AutoTokenizer for transformer-based models.
    • Supports SentenceTransformer for specialized embedding tasks.
  6. Ease of Use:

    • Intuitive API design, including callable instances for embedding multiple texts with __call__.
    • Provides tools for embedding queries (embed_query) and embedding in batches (embed_batch).
  7. Plug-and-Play:

    • Easily integrates into existing machine learning or natural language processing pipelines.
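
The path-based framework selection in point 1 could be implemented as a filesystem heuristic. The marker file checked below is an assumption for illustration (SentenceTransformer checkpoints typically ship a modules.json alongside config.json), not necessarily the package's actual rule:

```python
import os

# Hypothetical heuristic: choose a backend by inspecting the model directory.
# SentenceTransformer checkpoints usually include `modules.json`;
# plain Hugging Face checkpoints ship `config.json` without it.
def detect_framework(model_path: str) -> str:
    if os.path.isfile(os.path.join(model_path, "modules.json")):
        return "sentence-transformers"
    return "transformers"
```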

Example Use Cases:

  • Search and Retrieval: Generate text embeddings for ranking and retrieving documents based on similarity.
  • Text Clustering and Classification: Utilize embeddings for clustering similar texts or training classifiers.
  • Semantic Matching: Match user queries with relevant documents or responses in a semantic space.
  • Large-scale NLP Applications: Efficiently process and analyze large datasets with batch embeddings.
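
The search and semantic-matching use cases above reduce to ranking document vectors by similarity to a query vector. A dependency-free cosine-similarity ranking (the function names here are illustrative, not part of the package's API):

```python
import math
from typing import List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_similarity(
    query_vec: List[float], doc_vecs: List[List[float]]
) -> List[Tuple[int, float]]:
    # Return (doc_index, score) pairs, best match first.
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```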

Technical Details:

  • Built using transformers, sentence-transformers, and torch for high performance.
  • Provides fallback mechanisms for compatibility with different model types.
  • Handles tokenization, truncation, and padding internally for hassle-free embedding generation.
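
Internal padding handling usually pairs with mask-aware mean pooling, so pad positions do not dilute the sentence vector. A pure-Python sketch of that pooling step (the token vectors and mask are illustrative inputs; real code would operate on torch tensors):

```python
from typing import List

# Mask-aware mean pooling: average only token vectors whose mask is 1,
# so padding positions contribute nothing to the sentence embedding.
def masked_mean_pool(
    token_vecs: List[List[float]], attention_mask: List[int]
) -> List[float]:
    dim = len(token_vecs[0])
    sums = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vecs, attention_mask):
        if keep:
            count += 1
            for j in range(dim):
                sums[j] += vec[j]
    return [s / max(count, 1) for s in sums]
```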

This tool is designed for developers and researchers requiring precise, efficient, and customizable embedding capabilities in local environments, eliminating the dependency on remote APIs.
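
The synchronous/asynchronous method pairing described above (embed_documents alongside aembed_documents) can be built by offloading the blocking model call to a worker thread. A sketch using asyncio.to_thread; the delegation pattern is an assumption about how the async variants might be organized:

```python
import asyncio
from typing import Callable, List

# Hypothetical async wrapper: run a blocking embed function in a thread
# so the event loop stays responsive during model inference.
async def aembed_documents(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
) -> List[List[float]]:
    return await asyncio.to_thread(embed_fn, texts)

if __name__ == "__main__":
    # Toy embed_fn used only to demonstrate the async call path.
    fake = lambda ts: [[float(len(t))] for t in ts]
    print(asyncio.run(aembed_documents(["hi", "there"], fake)))
```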