Releases: Fangzhou-Code/Utils
1.0.6
1.0.5
v1.0.5 Update for the new name
1.0.4
v1.0.4 Update __init__.py
1.0.3
v1.0.3 Update MANIFEST.in
1.0.2
Update __init__.py to import correctly
1.0.1
EnhancedLocalEmbeddings is a versatile tool for generating text embeddings using local models. It supports Hugging Face Transformers and SentenceTransformer, providing flexibility and efficiency for text processing tasks.
Key Features:
- Dual Framework Support: Seamlessly works with Hugging Face models or SentenceTransformer, adapting to user needs.
- Customizable Output: Allows setting output embedding dimensions to suit specific applications.
- Multiple Text Modes: Supports single-text, batch, and asynchronous embeddings (a usage sketch follows these notes).
- Batch Efficiency: Optimized for embedding large datasets with batch processing.
- Plug-and-Play Design: Easy integration into existing pipelines for applications like search, classification, and semantic analysis.
Ideal for developers and researchers seeking efficient, local embedding solutions without relying on external APIs.
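A minimal usage sketch under assumptions: the import path and the constructor arguments (`model_path`, `output_dim`) are hypothetical, since the notes only say that a local model and an output embedding dimension can be configured; the `embed_text` and `embed_documents` method names come from the 1.0.0 notes below.

```python
# Hypothetical import path; the release notes do not document the package layout.
from utils import EnhancedLocalEmbeddings

# Constructor arguments are assumptions: the notes only state that a local model
# can be selected and that the output embedding dimension is configurable.
embedder = EnhancedLocalEmbeddings(
    model_path="sentence-transformers/all-MiniLM-L6-v2",
    output_dim=384,
)

# Single text -> one embedding vector.
vector = embedder.embed_text("Local embeddings without external APIs.")

# Batch of documents -> one vector per document.
vectors = embedder.embed_documents([
    "First document.",
    "Second document.",
])
```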
1.0.0
The EnhancedLocalEmbeddings tool provides a flexible and efficient solution for generating text embeddings using local models, supporting both Hugging Face Transformers and SentenceTransformer frameworks. It is designed for tasks requiring robust text representation in a variety of applications, including natural language processing, search, and recommendation systems.
Key Features:
- Dual Framework Support:
  - Seamless integration with Hugging Face models and SentenceTransformer.
  - Automatically determines the appropriate framework based on the model and tokenizer paths.
- Customizable Output:
  - Allows users to specify output dimensions for embeddings, offering control over the feature vector size.
- Multiple Modes of Operation:
  - Supports embedding single texts, multiple documents, or queries.
  - Offers both synchronous (`embed_text`, `embed_documents`) and asynchronous (`aembed_text`, `aembed_documents`) methods for flexibility in real-time and batch processing workflows.
- Batch Processing:
  - Efficient batch embedding for multiple texts, optimizing computational resources and processing time.
- Model Flexibility:
  - Leverages Hugging Face's `AutoModel` and `AutoTokenizer` for transformer-based models.
  - Supports SentenceTransformer for specialized embedding tasks.
- Ease of Use:
  - Intuitive API design, including callable instances for embedding multiple texts with `__call__`.
  - Provides tools for embedding queries (`embed_query`) and embedding in batches (`embed_batch`); a usage sketch follows this list.
- Plug-and-Play:
  - Easily integrates into existing machine learning or natural language processing pipelines.
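A sketch of the synchronous, asynchronous, and callable entry points named above. The method names (`embed_query`, `embed_batch`, `aembed_text`, `aembed_documents`, `__call__`) come from these notes; the import path and constructor arguments are assumptions.

```python
import asyncio

from utils import EnhancedLocalEmbeddings  # hypothetical import path

# Constructor arguments are assumed; only the method names below are documented.
embedder = EnhancedLocalEmbeddings(model_path="sentence-transformers/all-MiniLM-L6-v2")

texts = ["first sentence", "second sentence"]

# Callable instance: __call__ embeds multiple texts at once.
batch_vectors = embedder(texts)

# Query- and batch-oriented helpers named in the notes.
query_vector = embedder.embed_query("how do I embed text locally?")
batched = embedder.embed_batch(texts)

# Asynchronous counterparts for real-time or concurrent workflows.
async def main():
    one = await embedder.aembed_text(texts[0])
    many = await embedder.aembed_documents(texts)
    return one, many

asyncio.run(main())
```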
Example Use Cases:
- Search and Retrieval: Generate text embeddings for ranking and retrieving documents based on similarity (a similarity-ranking sketch follows this list).
- Text Clustering and Classification: Utilize embeddings for clustering similar texts or training classifiers.
- Semantic Matching: Match user queries with relevant documents or responses in a semantic space.
- Large-scale NLP Applications: Efficiently process and analyze large datasets with batch embeddings.
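A hedged sketch of the search-and-retrieval use case: documents and a query are embedded, then ranked by cosine similarity with NumPy. It assumes `embed_documents` returns one vector per document and `embed_query` returns a single vector; the actual return types are not stated in the notes, and the import path and constructor are hypothetical.

```python
import numpy as np

from utils import EnhancedLocalEmbeddings  # hypothetical import path

embedder = EnhancedLocalEmbeddings(model_path="sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "Local embedding models avoid sending data to remote APIs.",
    "SentenceTransformer provides pretrained sentence encoders.",
    "Batch processing reduces per-text overhead.",
]

# Assumed return shapes: a list of vectors for documents, a single vector for the query.
doc_vectors = np.asarray(embedder.embed_documents(docs), dtype=np.float32)
query_vector = np.asarray(embedder.embed_query("Why run embeddings locally?"), dtype=np.float32)

# Cosine similarity between the query and every document.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)

# Rank documents from most to least similar.
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```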
Technical Details:
- Built using `transformers`, `sentence-transformers`, and `torch` for high performance.
- Provides fallback mechanisms for compatibility with different model types.
- Handles tokenization, truncation, and padding internally for hassle-free embedding generation; a generic sketch of this flow follows.
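For context on the tokenization and pooling step, here is a generic sketch of how transformer-based embeddings are commonly produced with `AutoTokenizer` and `AutoModel`. It is not the tool's actual internal implementation, which the notes do not show; the checkpoint name is only an example.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example checkpoint; any local Hugging Face model path would do.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

texts = ["padding and truncation happen here", "so callers pass plain strings"]

# Tokenization with truncation and padding, as the notes describe.
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool token embeddings, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
summed = (outputs.last_hidden_state * mask).sum(dim=1)
embeddings = summed / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # (num_texts, hidden_size)
```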
This tool is designed for developers and researchers requiring precise, efficient, and customizable embedding capabilities in local environments, eliminating the dependency on remote APIs.