add lepton, quick start notebook, examples
hieuminh65 committed Mar 17, 2024
1 parent d7d5a60 commit 83ab759
Showing 8 changed files with 265 additions and 12 deletions.
40 changes: 29 additions & 11 deletions README.md
An easy-to-use LLM API for state-of-the-art providers, with cost and performance comparison.

## Features
- **Easy-to-use**: A simple, consistent API for state-of-the-art language models from different providers.
- **Comparison**: Compare the cost and performance of different providers and models, so you can choose the best fit for your use case.
- **Log**: Log the response and cost of each request to a log file.
- **Providers**: Support for both open-source and closed-source providers.
- **Result**: See the actual time taken by each request, especially when you don't trust the published benchmarks.

## Installation
#### 1. Install the package
```bash
pip3 install api4all
```

#### 2. Create and activate a virtual environment (optional but recommended)
- Unix / macOS
```bash
python3 -m venv venv
source venv/bin/activate
```

## Quick Start
#### 1. Put the API keys of the providers you want to test in a `.env` file.
```bash
TOGETHER_API_KEY=xxx
OPENAI_API_KEY=xxx
```

or export them as environment variables:

```bash
export TOGETHER_API_KEY=xxx
export OPENAI_API_KEY=xxx
```
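api4all presumably reads these keys from the process environment; a small sketch (the `missing_keys` helper is illustrative, not part of the library) to check which keys are still unset before creating an engine:

```python
import os

def missing_keys(required):
    """Return the provider API keys that are not set in the environment."""
    return [name for name in required if not os.environ.get(name)]

os.environ["TOGETHER_API_KEY"] = "xxx"  # placeholder value for illustration
print(missing_keys(["TOGETHER_API_KEY", "A_KEY_THAT_IS_NOT_SET"]))
```

Run this before `EngineFactory.create_engine` to fail fast on a missing key instead of at request time.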

#### 2. Run the code
```python
from api4all import EngineFactory

messages = [
    {"role": "system",
     "content": "You are a helpful assistant for my Calculus class."},
    {"role": "user",
     "content": "What is the current status of the economy?"}
]
engine = EngineFactory.create_engine(provider="together",
model="google/gemma-7b-it",
messages=messages,
temperature=0.9,
max_tokens=1028,
)

response = engine.generate_response()

print(response)
```

- There are some examples in the [examples](examples) folder or <a href="https://colab.research.google.com/drive/1nMGqoWIkL2xLlaSE54vOHhpffaHpihY3?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a> to test them in Google Colab.

#### 3. Check the [log file](logfile.log) for the response and the cost of the request.
```log
Request ID - fa8cebd0-265a-44b2-95d7-6ff1588d2c87
create at: 2024-03-15 16:38:18,129
INFO - SUCCESS
```

| Provider | Free Credit | Rate Limit | API Key Name | Provider String |
| ------ | ------ | ------ | ------ | ------ |
| [Replicate](https://replicate.com) | Free to try | 50 Requests / Second | REPLICATE_API_KEY | "replicate" |
| [Fireworks](https://fireworks.ai) | $1 | 600 Requests / Minute | FIREWORKS_API_KEY | "fireworks" |
| [Deepinfra](https://deepinfra.com) | Free to try | 200 Concurrent request | DEEPINFRA_API_KEY | "deepinfra" |
| [Lepton](https://www.lepton.ai) | $10 | 10 Requests / Minute | LEPTON_API_KEY | "lepton" |
| ------ | ------ | ------ | ------ | ------ |
| [Google AI (Vertex AI)](https://ai.google.dev) | Unlimited | 60 Requests / Minute | GOOGLE_API_KEY | "google" |
| [OpenAI](http://openai.com) | &#x2715; | 60 Requests / Minute | OPENAI_API_KEY | "openai" |
| [Mistral AI](https://mistral.ai) | Free to try | 5 Requests / Second | MISTRAL_API_KEY | "mistral" |
| [Anthropic](https://www.anthropic.com) | Free to try | 5 Requests / Minute | ANTHROPIC_API_KEY | "anthropic" |


- **Free to try**: No credit card required, but limited to a certain number of tokens.
- Rate limits are based on each provider's free plan; the actual limit may differ depending on the plan you choose.
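Since each provider enforces its own rate limit, a client-side throttle can keep bursts under the quota. A minimal sliding-window sketch (not part of api4all):

```python
import time

class Throttle:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self._stamps = []

    def wait(self):
        now = time.monotonic()
        # Keep only the timestamps still inside the window.
        self._stamps = [t for t in self._stamps if now - t < self.period]
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self._stamps[0]))
        self._stamps.append(time.monotonic())

# e.g. Lepton's free-tier limit of 10 requests / minute
throttle = Throttle(max_calls=10, period=60.0)
```

Call `throttle.wait()` immediately before each `engine.generate_response()` to stay under the quota.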

### Open-source models
| Provider | Mixtral-8x7b-Instruct-v0.1 | Gemma 7B it | Mistral-7B-Instruct-v0.1 | LLaMA2-70b |
| ------ | ------ | ------ | ------ | ------ |
| [Replicate](https://replicate.com) | $0.3-$1 | &#x2715; | $0.05-$0.25 | $0.65-$2.75
| [Fireworks](https://fireworks.ai) | $0.5-$0.5 | $0.2-$0.2 | $0.2-$0.2 | $0.9-$0.9
| [Deepinfra](https://deepinfra.com) | $0.27-$0.27 | &#x2715; | &#x2715; | $0.7-$0.9
| [Lepton](https://www.lepton.ai) | $0.5-$0.5 | &#x2715; | &#x2715; | $0.8-$0.8
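Assuming the prices in these tables are USD per 1M input / output tokens (the usual convention, though the tables do not state it explicitly), a request's cost can be estimated as:

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimated USD cost, with prices quoted per 1M tokens (assumption)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Lepton's Mixtral-8x7b row above: $0.5 input, $0.5 output
print(estimate_cost(1_200, 256, 0.5, 0.5))  # → 0.000728
```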

### Closed-source models
#### 1. Mistral AI
| Model | Input Price | Output Price | Context Length | Model String |
| ------ | ------ | ------ | ------ | ------ |
| Google Gemini 1.0 Pro | $0 | $0 | 32,768 | "google/gemini-1.0-pro" |



## Contributing
Contributions are welcome. If you see updated pricing, new models, new providers, or any other changes, feel free to open an issue or a pull request.


## Problems from the providers and Solutions

#### Error with Gemini 1.0 Pro
```text
ValueError: The `response.text` quick accessor only works when the response contains a valid `Part`, but none was returned. Check the `candidate.safety_ratings` to see if the response was blocked.
```
**Solution**: The output is larger than your `max_tokens` limit. Increase `max_tokens`.
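One way to apply that fix programmatically is to retry with a doubled token budget when the error appears. A hedged sketch, where `generate` is a toy stand-in for the actual provider call:

```python
def generate(prompt, max_tokens):
    """Stand-in for a provider call; fails like Gemini when the budget is too small."""
    if max_tokens < 512:  # toy failure threshold, for illustration only
        raise ValueError("The `response.text` quick accessor only works ...")
    return "ok"

def generate_with_larger_budget(prompt, max_tokens=256, ceiling=4096):
    while True:
        try:
            return generate(prompt, max_tokens)
        except ValueError:
            if max_tokens >= ceiling:
                raise  # give up once the ceiling is reached
            max_tokens *= 2  # double the budget and retry

print(generate_with_larger_budget("Explain calculus"))  # → ok
```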
14 changes: 14 additions & 0 deletions api4all/data/constant_data.py
"output": 0.27
}
},
"lepton": {
"name": "mixtral-8x7b",
"price": {
"input": 0.5,
"output": 0.5
}
},
"mistral": {
"name": "open-mistral-7b",
"price": {
"input": 0.7,
"output": 0.9
}
},
"lepton": {
"name": "llama2-70b",
"price": {
"input": 0.8,
"output": 0.8
}
}
},
"context-length": 4096
79 changes: 78 additions & 1 deletion api4all/engines/engines.py
from mistralai.client import MistralClient
import google.generativeai as genai

__all__ = ["GroqEngine", "AnyscaleEngine", "TogetherEngine", "FireworksEngine", "ReplicateEngine", "DeepinfraEngine", "OpenaiEngine", "AnthropicEngine", "MistralEngine", "GoogleEngine", "LeptonEngine"]


#-----------------------------------------GROQ-----------------------------------------#
return response


#-----------------------------------------Lepton-----------------------------------------#
@EngineFactory.register_engine('lepton')
class LeptonEngine(TextEngine):
def __init__(self,
model: str,
provider: str = "lepton",
temperature: Optional[float] = ModelConfig.DEFAULT_TEMPERATURE,
max_tokens: Optional[int] = ModelConfig.DEFAULT_MAX_TOKENS,
top_p: Optional[float] = ModelConfig.DEFAULT_TOP_P,
stop: Union[str, List[str], None] = ModelConfig.DEFAULT_STOP,
messages: Optional[List[Dict[str, str]]] = ModelConfig.MESSAGES_EXAMPLE
) -> None:
super().__init__(model, provider, temperature, max_tokens, top_p, stop, messages)

self._api_key = self._keys.get_api_keys("LEPTON_API_KEY")
if self._api_key is None:
self.logger.error(f"API key not found for {self.provider}")
raise ValueError(f"API key not found for {self.provider}")

self._api_name = dataEngine.getAPIname(self.model, self.provider)

# Set up the client
self._set_up_client()


def _set_up_client(self):
self.client = openai.OpenAI(base_url=f"https://{self._api_name}.lepton.run/api/v1/",
api_key = self._api_key)


def generate_response(self,
**kwargs: Any
) -> Union[str, None]:
"""
This method is used to generate a response from the AI model.
"""

start_time = time.time()

try:
completion = self.client.chat.completions.create(
messages=self.messages,
model=self._api_name,
temperature=self.temperature,
max_tokens=self.max_tokens,
top_p=self.top_p,
stop=self.stop
)
except Exception as e:
print(f"Error generating response: {e}")
self.logger.error(f"Error generating response of provider {self.provider}: {e}")
return None

actual_time = time.time() - start_time

content = completion.choices[0].message.content
input_tokens = completion.usage.prompt_tokens
output_tokens = completion.usage.completion_tokens
execution_time = None
cost = dataEngine.calculate_cost(self.provider, self.model, input_tokens, output_tokens)

response = TextResponse(
content=content,
cost=cost,
execution_time=execution_time,
actual_time=actual_time,
input_tokens=input_tokens,
output_tokens=output_tokens,
provider=self.provider
)

log_response(self.logger, "SUCCESS", response)

return response


#-----------------------------------------OpenAI-----------------------------------------#
@EngineFactory.register_engine('openai')
class OpenaiEngine(TextEngine):
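The new `LeptonEngine` above talks to an OpenAI-compatible endpoint whose host embeds the deployment name; the URL construction can be sanity-checked in isolation (the helper name is illustrative, not part of the library):

```python
def lepton_base_url(api_name):
    # Mirrors the f-string in LeptonEngine._set_up_client.
    return f"https://{api_name}.lepton.run/api/v1/"

print(lepton_base_url("mixtral-8x7b"))  # → https://mixtral-8x7b.lepton.run/api/v1/
```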
12 changes: 12 additions & 0 deletions examples/.env.example
GROQ_API_KEY=gsk_asdsaxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxx
ANYSCALE_API_KEY=esecret_xxxxxxxx
GOOGLE_API_KEY=xxxxxxxxx
MISTRAL_API_KEY=xxxxxxxxx
TOGETHER_API_KEY=xxxxxxxxx
ANTHROPIC_API_KEY=sk-xxxxxxxx
FIREWORKS_API_KEY=rxxxxxxxx
REPLICATE_API_KEY=rxxx
LEPTON_API_KEY=xxxxxxxxx
DEEPINFRA_API_KEY=xxxxxxx
30 changes: 30 additions & 0 deletions examples/quick-start-with-env-file.py
from api4all import EngineFactory

# All the API keys should be in the .env file in the same directory as this file

messages = [
{"role": "system",
"content": "You are a helpful assistant for my Calculus class."},
{"role": "user",
"content": "What is the current status of the economy?"},
{"role": "assistant",
"content": "I'm sorry, but as a Calculus assistant, I don't have the ability to provide real-time economic updates. However, I can help you understand economic concepts from a mathematical perspective. For example, I can explain how calculus is used in economics for optimization and understanding change."},
{"role": "user",
"content": "Oh, I see. Can you explain how calculus is used in economics?"},
{"role": "assistant",
"content": "Sure! In economics, calculus is used for optimization. For example, businesses often want to maximize profits or minimize costs. With calculus, we can find the 'optimal' point by setting the derivative of the profit or cost function to zero and solving for the variable. Calculus is also used to understand how economic quantities change. For example, the derivative of a function gives the rate of change of the function, which can represent things like the change in cost for producing one more unit of a good (marginal cost), or the change in revenue from selling one more unit of a good (marginal revenue)."},
{"role": "user",
"content": "Interesting. Can you tell me more about the Fundamental Theorem of Calculus?"}
]


# engine = EngineFactory.create_engine(provider="google", model="google/gemini-1.0-pro", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
engine = EngineFactory.create_engine(provider="together", model="mistralai/Mixtral-8x7B-Instruct-v0.1", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="anthropic", model="anthropic/claude-3-haiku", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="mistral", model="mistral/mistral-small-latest", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)


response = engine.generate_response()

# See the response and also check out the log in logfile.log
print(response)
34 changes: 34 additions & 0 deletions examples/quick-start.py
from api4all import EngineFactory
import os

os.environ["TOGETHER_API_KEY"] = "xxxxx" # Replace with your API key
os.environ["GOOGLE_API_KEY"] = "xxxxx"
os.environ["ANTHROPIC_API_KEY"] = "xxxxx"
os.environ["MISTRAL_API_KEY"] = "xxxxx"

messages = [
{"role": "system",
"content": "You are a helpful assistant for my Calculus class."},
{"role": "user",
"content": "What is the current status of the economy?"},
{"role": "assistant",
"content": "I'm sorry, but as a Calculus assistant, I don't have the ability to provide real-time economic updates. However, I can help you understand economic concepts from a mathematical perspective. For example, I can explain how calculus is used in economics for optimization and understanding change."},
{"role": "user",
"content": "Oh, I see. Can you explain how calculus is used in economics?"},
{"role": "assistant",
"content": "Sure! In economics, calculus is used for optimization. For example, businesses often want to maximize profits or minimize costs. With calculus, we can find the 'optimal' point by setting the derivative of the profit or cost function to zero and solving for the variable. Calculus is also used to understand how economic quantities change. For example, the derivative of a function gives the rate of change of the function, which can represent things like the change in cost for producing one more unit of a good (marginal cost), or the change in revenue from selling one more unit of a good (marginal revenue)."},
{"role": "user",
"content": "Interesting. Can you tell me more about the Fundamental Theorem of Calculus?"}
]


# engine = EngineFactory.create_engine(provider="google", model="google/gemini-1.0-pro", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
engine = EngineFactory.create_engine(provider="together", model="mistralai/Mixtral-8x7B-Instruct-v0.1", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="anthropic", model="anthropic/claude-3-haiku", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="mistral", model="mistral/mistral-small-latest", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)


response = engine.generate_response()

# See the response and also check out the log in logfile.log
print(response)
1 change: 1 addition & 0 deletions img/colab.svg
67 changes: 67 additions & 0 deletions notebooks/api4all_quickstart.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UMV-mLLMafW1"
},
"outputs": [],
"source": [
"%pip install api4all -q"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b_1w6U_cycvS"
},
"source": [
"## Run\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "N8MBeKVoahc9"
},
"outputs": [],
"source": [
"from api4all import EngineFactory\n",
"import os\n",
"os.environ[\"TOGETHER_API_KEY\"] = \"xxx\"\n",
"os.environ[\"MISTRAL_API_KEY\"] = \"xxxx\"\n",
"\n",
"messages = [\n",
" {\"role\": \"system\",\n",
" \"content\": \"You are a helpful assistent for the White House\"},\n",
" {\"role\": \"user\",\n",
" \"content\": \"What is the current status of the economy?\"}\n",
"]\n",
"\n",
"engine = EngineFactory.create_engine(provider=\"together\", model=\"google/gemma-7b-it\", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)\n",
"# engine = EngineFactory.create_engine(provider=\"mistral\", model=\"mistral/mistral-small-latest\", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)\n",
"\n",
"response = engine.generate_response()\n",
"\n",
"# See the response and also check the logfile.log\n",
"print(response)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
