Commit

Merge pull request #46 from pinecone-io/seperate-openai-extra
add openai extra
acatav authored Sep 12, 2023
2 parents 281d1f2 + d4dda88 commit 53c1e19
Showing 4 changed files with 19 additions and 11 deletions.
12 changes: 9 additions & 3 deletions README.md
@@ -27,6 +27,11 @@ If you wish to use `SentenceTransformerEncoder` dense encoder, you will need to
pip install pinecone-text[dense]
```

+If you wish to use `OpenAIEncoder` dense encoder, you will need to install the `openai` extra:
+```bash
+pip install pinecone-text[openai]
+```
+
## Sparse Encoding

To convert your own text corpus to sparse vectors, you can either use [BM25](https://www.pinecone.io/learn/semantic-search/#bm25) or [SPLADE](https://www.pinecone.io/learn/splade/).
@@ -41,7 +46,7 @@ To encode your documents and queries using BM25 as vector for dot product search
> When conducting a search, you may come across queries that contain terms not found in the training corpus but are present in the database. To address this scenario, BM25Encoder uses a default document frequency value of 1 when encoding such terms.
#### Usage

-For an end-to-end example, you can refer to our Quora dataset generation with BM25 [notebook](https://github.com/pinecone-io/examples/blob/master/pinecone/sparse/bm25/bm25-vector-generation.ipynb).
+For an end-to-end example, you can refer to our Quora dataset generation with BM25 [notebook](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/search/semantic-search/sparse/bm25/bm25-vector-generation.ipynb).

```python
from pinecone_text.sparse import BM25Encoder
@@ -91,7 +96,7 @@ Currently the `SpladeEncoder` class supprts only the [naver/splade-cocondenser-e

#### Usage

-For an end-to-end example, you can refer to our Quora dataset generation with SPLADE [notebook](https://github.com/pinecone-io/examples/blob/master/pinecone/sparse/splade/splade-vector-generation.ipynb).
+For an end-to-end example, you can refer to our Quora dataset generation with SPLADE [notebook](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/search/semantic-search/sparse/splade/splade-vector-generation.ipynb).

```python
from pinecone_text.sparse import SpladeEncoder
@@ -138,8 +143,9 @@ encoder.encode_queries(["Who jumped over the lazy dog?"])

### OpenAI models

-When using the `OpenAIEncoder`, you need to provide an API key for the OpenAI API, and store it in the `OPENAI_API_KEY` environment variable.
+When using the `OpenAIEncoder`, you need to provide an API key for the OpenAI API, and store it in the `OPENAI_API_KEY` environment variable before you import the encoder.

+By default the encoder will use `text-embedding-ada-002` as recommended by OpenAI. You can also specify a different model name using the `model_name` parameter.
#### Usage
```python
from pinecone_text.dense.openai_encoder import OpenAIEncoder
6 changes: 0 additions & 6 deletions pinecone_text/dense/__init__.py
@@ -1,6 +0,0 @@
-"""
-Sentance Transformers are a class of models that take a sentence as input and output a vector representation of the sentence.
-These models are useful for tasks such as semantic search, clustering, and classification. The sentence transformer models are
-the work of the research team led by Nils Reimers at the University of Stuttgart. For more information, see the [Sentence Transformers paper](https://arxiv.org/abs/1908.10084).
-"""
9 changes: 8 additions & 1 deletion pinecone_text/dense/openai_encoder.py
@@ -1,7 +1,14 @@
-import openai
 from typing import Union, List
 from pinecone_text.dense.base_dense_ecoder import BaseDenseEncoder

+try:
+    import openai
+except (OSError, ImportError, ModuleNotFoundError) as e:
+    raise ImportError(
+        "Failed to import openai. Make sure you install openai extra dependencies by running: "
+        "`pip install pinecone-text[openai]"
+    ) from e
+

 class OpenAIEncoder(BaseDenseEncoder):
     """
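
Note on the hunk above: the guarded import follows a common optional-dependency pattern, where a missing extra is turned into an `ImportError` that names the install command. A minimal, generalized sketch of that pattern is shown below. The helper name `require_extra` and the nonexistent module name are hypothetical, introduced only for illustration; neither is part of `pinecone-text`.

```python
import importlib


def require_extra(module_name: str, extra: str, package: str = "pinecone-text"):
    """Import `module_name`, or raise ImportError naming the extra to install.

    Hypothetical helper for illustration; not part of pinecone-text.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:  # ModuleNotFoundError is a subclass of ImportError
        raise ImportError(
            f"Failed to import {module_name}. Install the extra dependencies by running: "
            f"`pip install {package}[{extra}]`"
        ) from e


# A module that is always available imports normally.
json_mod = require_extra("json", extra="openai")

# A deliberately nonexistent module name triggers the guided error message.
try:
    require_extra("module_that_does_not_exist_xyz", extra="openai")
except ImportError as err:
    message = str(err)
```

Chaining with `from e` keeps the original failure in the traceback, so the underlying cause stays visible alongside the install hint.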
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pinecone-text"
-version = "0.5.3"
+version = "0.5.4"
description = "Text utilities library by Pinecone.io"
authors = ["Pinecone.io"]
readme = "README.md"
@@ -19,6 +19,7 @@ openai = { version = "^0.27.3", optional = true }
[tool.poetry.extras]
splade = ["torch", "transformers", "sentence-transformers"]
dense = ["torch", "transformers", "sentence-transformers", "openai"]
+openai = ["openai"]

[tool.poetry.group.dev]
optional = true