
feat: Add AzureOpenAIEncoder #73

Merged
14 commits merged into aurelio-labs:main on Jan 13, 2024

Conversation

@mckeown12 commented Jan 5, 2024

Azure's OpenAI deployments require a few extra parameters in order to initialize.
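
(For context, a rough sketch of what constructing the proposed encoder might look like; the field names come from the zure.py snippet quoted later in this thread, while the environment variable names and the deployment name are assumptions, not values from the PR.)

import os

from semantic_router.encoders import AzureOpenAIEncoder

# Field names follow the zure.py snippet discussed below; the deployment name
# "text-embedding-ada-002" is only an assumed example.
encoder = AzureOpenAIEncoder(
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version="2023-05-15",
    deployment_name="text-embedding-ada-002",
    model="text-embedding-ada-002",
)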

@stepanogil commented Jan 5, 2024

openai 1.x uses a client, like so:

import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2023-05-15",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-ada-002",
)

print(response.model_dump_json(indent=2))

I think this PR is written against the pre-1.0 API. Source: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python

Is there a particular reason why you're using that version, @mckeown12?

I need this as well, but I think we should go with the version 1 API.

@jamescalam (Member)

Hey @mckeown12 this is great, thanks for the PR!

I'm not super familiar with best practices for Azure OpenAI, but I'd also prefer whatever the most up-to-date approach is, if that's what you're recommending, @stepanogil?

@mckeown12 (Author)

Thanks all for looking at this so quickly! @stepanogil I believe my PR is using the pinned openai library version 1.5.0 from the poetry.lock file, so I'm not sure where you're seeing pre-1.0 code. The init of AzureOpenAIEncoder instantiates a client almost exactly the way you showed. As a side note, we've been using my fork in an internal company project and are loving semantic-router so far!

@stepanogil

@mckeown12 I was referring to the file you added, semantic_router/encoders/zure.py (is the filename a typo? :D), which contains the following:


assert (
    self.api_key is not None
    and self.deployment_name is not None
    and self.azure_endpoint is not None
    and self.api_version is not None
    and self.model is not None
)

try:
    self.client = openai.AzureOpenAI(
        azure_deployment=str(deployment_name),
        api_key=str(api_key),
        azure_endpoint=str(azure_endpoint),
        api_version=str(api_version),
        # _strict_response_validation=True,
    )
except Exception as e:
    raise ValueError(f"OpenAI API client failed to initialize. Error: {e}")

deployment_name is no longer a required field for instantiating an Azure OpenAI client in OpenAI Python 1.x, and api_type is not used anymore. Only the following params are required:

api_key
api_version
azure_endpoint

Check the link I've provided and scroll down a little bit. On that page, compare the code samples for OpenAI Python 0.28.1 versus OpenAI Python 1.x:

https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python-new
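
(For reference, a minimal sketch of what the client construction in zure.py could look like once only the 1.x-required parameters are passed; this is a sketch under the assumptions above, not the merged implementation.)

# Sketch: openai>=1.0 needs only api_key, api_version, and azure_endpoint;
# azure_deployment may still be passed when known, but is no longer required.
try:
    self.client = openai.AzureOpenAI(
        api_key=str(api_key),
        api_version=str(api_version),
        azure_endpoint=str(azure_endpoint),
    )
except Exception as e:
    raise ValueError(f"OpenAI API client failed to initialize. Error: {e}")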

@mckeown12 (Author)

So zure.py is more of a nasty hack than a typo: I was getting circular import issues because Black formats imports alphabetically, putting azure ahead of base in semantic_router/encoders/__init__.py. If you have a better solution, I'm all ears.
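
(For what it's worth, one possible way to keep an azure.py filename, assuming the reordering actually comes from an import sorter such as isort or ruff rather than Black itself, which does not sort imports: pin the base import in place and let everything else sort normally. A sketch, not a tested fix.)

# semantic_router/encoders/__init__.py  (sketch)
# Keep BaseEncoder first and tell the sorter to leave this line alone, so the
# modules imported below can resolve it without a cycle.
from semantic_router.encoders.base import BaseEncoder  # isort: skip

from semantic_router.encoders.azure import AzureOpenAIEncoder
from semantic_router.encoders.openai import OpenAIEncoder

# Alternatively, have azure.py import BaseEncoder from
# semantic_router.encoders.base directly instead of from the package,
# which removes the cycle regardless of the ordering here.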

I guess azure_deployment is optional in 1.x because you can only have one deployment of a particular model at a particular endpoint and because you must pass model to client.embeddings.create? I can update the PR to reflect that. I also didn't handle any of the azure_ad_token pieces as I'm not sure how all that works...
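
(A minimal sketch of that call pattern, reusing the client from the earlier snippet; the deployment name here is assumed to match the model name.)

# Sketch: with openai>=1.0 the Azure deployment is referenced via the `model`
# argument on the embeddings call rather than on the client itself.
docs = ["first document", "second document"]  # example inputs
response = client.embeddings.create(
    input=docs,
    model="text-embedding-ada-002",  # assumption: deployment named after the model
)
embeddings = [record.embedding for record in response.data]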

@szelok commented Jan 5, 2024

@jamescalam @simjak - my two cents: I think there will be more PRs to add custom embeddings (and the same will go for LLMs for dynamic routes). Would it make sense to integrate with LangChain embeddings? i.e., add a LangChainEncoder, inject a LangChain embeddings object, and delegate the actual embed call to it, like below:

from langchain_core.embeddings import Embeddings

from semantic_router.encoders import BaseEncoder


class LangChainEncoder(BaseEncoder):
    client: Embeddings
    ...

    def __call__(self, docs: list[str]) -> list[list[float]]:
        return self.client.embed_documents(docs)

Usage

from langchain.embeddings import AzureOpenAIEmbeddings

from semantic_router.encoders import LangChainEncoder

az_em = AzureOpenAIEmbeddings(...)
az_encoder = LangChainEncoder(client=az_em)

I think it will help you focus your effort on the 'semantic' part.

@stepanogil

I guess azure_deployment is optional in 1.x because you can only have one deployment of a particular model at a particular endpoint and because you must pass model to client.embeddings.create? I can update the PR to reflect that. I also didn't handle any of the azure_ad_token pieces as I'm not sure how all that works...

It's my guess as well. I haven't tried deploying 2 instances of the same model in the same resource.

@jamescalam (Member)

@mckeown12 I'm not sure about the circular imports; we should find a better way of handling that. I will test. In its current state, does this PR for AzureOpenAI work for you both, @mckeown12 and @stepanogil?

@jamescalam @simjak - my two cents: I think there will be more PRs to add custom embeddings (and the same will go for LLMs for dynamic routes). Would it make sense to integrate with LangChain embeddings? i.e., add a LangChainEncoder, inject a LangChain embeddings object, and delegate the actual embed call to it, like below:

...

I think it will help you focus your effort on the 'semantic' part.

@szelok I think you have a good point here. We can add support for the "essential" encoders, namely OpenAI + Azure OpenAI, Cohere, HF, and FastEmbed, and support the rest indirectly via LangChain. We'll get that in soon.

@stepanogil

@jamescalam I think @mckeown12 is still making adjustments to make azure_deployment optional when instantiating an Azure OpenAI client.

@mckeown12 (Author)

@jamescalam I think @mckeown12 is still making adjustments to make azure_deployment optional when instantiating an Azure OpenAI client.

Just finished this and updated the PR. Let me know how y'all want to handle the circular imports issue with the ordering in semantic_router/encoders/__init__.py and I can update accordingly.

@jamescalam changed the title from "Add AzureOpenAIEncoder" to "feat: Add AzureOpenAIEncoder" on Jan 8, 2024
@jamescalam assigned jamescalam and ashraq1455, then unassigned jamescalam, on Jan 8, 2024

@jamescalam (Member) left a comment

@mckeown12 nice! I ran the tests and linters; could you get those passing? @ashraq1455 is going to look into the circular import issue and see if we can find a good solution, thanks :)

@mckeown12 (Author)

@jamescalam, thanks for reviewing! I just merged main again, ran the linter, and got the tests passing locally. Not sure how to rerun the workflows though... Is that something y'all have to trigger?

codecov bot commented Jan 9, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (cc9cc6b) 88.62% compared to head (3bf5c22) 89.03%.

File                               Patch %   Missing lines
semantic_router/encoders/zure.py   94.44%    4 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
+ Coverage   88.62%   89.03%   +0.41%     
==========================================
  Files          23       24       +1     
  Lines         967     1040      +73     
==========================================
+ Hits          857      926      +69     
- Misses        110      114       +4     


@jamescalam (Member) left a comment

Okay, we made it! @mckeown12 this is epic, thank you for the PR 🔥

@jamescalam merged commit 666b361 into aurelio-labs:main on Jan 13, 2024
8 checks passed
@arashaga commented Jan 16, 2024

When I run the Dynamic Routes example it does not work with AzureOpenAI; it appears that only OpenAILLM is supported there. Can we have a fix for this? You can reproduce the issue by running the example notebook for Dynamic Routes. Thank you.
