add lepton, quick start notebook, examples
hieuminh65 committed Mar 17, 2024
1 parent d7d5a60 commit 83ab759
Showing 8 changed files with 265 additions and 12 deletions.
40 changes: 29 additions & 11 deletions README.md
An easy-to-use LLM API for state-of-the-art providers, with cost and performance comparison.

## Features
- **Easy-to-use**: A simple, consistent API for state-of-the-art language models from different providers.
- **Comparison**: Compare the cost and performance of different providers and models, so you can choose the best fit for your use case.
- **Log**: Log the response and cost of each request to a log file.
- **Providers**: Support for both open-source and closed-source providers.
- **Result**: See the actual time taken by each request, especially when you don't trust the published benchmarks.

## Installation
#### 1. Install the package
```bash
pip3 install api4all
```

#### 2. Create and activate a virtual environment (optional but recommended)
- Unix / macOS
```bash
python3 -m venv venv
source venv/bin/activate
```

## Quick Start
#### 1. Put the API keys of the providers you want to test in a `.env` file.
```bash
TOGETHER_API_KEY=xxx
OPENAI_API_KEY=xxx
```

or export them as environment variables:

```bash
export TOGETHER_API_KEY=xxx
export OPENAI_API_KEY=xxx
```
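api4all presumably reads these keys from the process environment; a small sketch (the `missing_keys` helper is illustrative, not part of the library) to check which keys are still unset before creating an engine:

```python
import os

def missing_keys(required):
    """Return the provider API keys that are not set in the environment."""
    return [name for name in required if not os.environ.get(name)]

os.environ["TOGETHER_API_KEY"] = "xxx"  # placeholder value for illustration
print(missing_keys(["TOGETHER_API_KEY", "A_KEY_THAT_IS_NOT_SET"]))
```

Run this before `EngineFactory.create_engine` to fail fast on a missing key instead of at request time.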

#### 2. Run the code
```python
from api4all import EngineFactory

messages = [
    {"role": "system",
     "content": "You are a helpful assistant for my Calculus class."},
    {"role": "user",
     "content": "What is the current status of the economy?"}
]
engine = EngineFactory.create_engine(provider="together",
model="google/gemma-7b-it",
messages=messages,
temperature=0.9,
max_tokens=1028,
)

response = engine.generate_response()

print(response)
```

- There are some examples in the [examples](examples) folder or <a href="https://colab.research.google.com/drive/1nMGqoWIkL2xLlaSE54vOHhpffaHpihY3?usp=sharing"><img src="img/colab.svg" alt="Open In Colab"></a> to test them in Google Colab.

#### 3. Check the [log file](logfile.log) for the response and the cost of the request.
```log
Request ID - fa8cebd0-265a-44b2-95d7-6ff1588d2c87
create at: 2024-03-15 16:38:18,129
INFO - SUCCESS
```

| Provider | Free Credit | Rate Limit | API Key Name | Provider String |
| ------ | ------ | ------ | ------ | ------ |
| [Replicate](https://replicate.com) | Free to try | 50 Requests / Second | REPLICATE_API_KEY | "replicate" |
| [Fireworks](https://fireworks.ai) | $1 | 600 Requests / Minute | FIREWORKS_API_KEY | "fireworks" |
| [Deepinfra](https://deepinfra.com) | Free to try | 200 Concurrent request | DEEPINFRA_API_KEY | "deepinfra" |
| [Lepton](https://www.lepton.ai) | $10 | 10 Requests / Minute | LEPTON_API_KEY | "lepton" |
| ------ | ------ | ------ | ------ | ------ |
| [Google AI (Vertex AI)](https://ai.google.dev) | Unlimited | 60 Requests / Minute | GOOGLE_API_KEY | "google" |
| [OpenAI](http://openai.com) | &#x2715; | 60 Requests / Minute | OPENAI_API_KEY | "openai" |
| [Mistral AI](https://mistral.ai) | Free to try | 5 Requests / Second | MISTRAL_API_KEY | "mistral" |
| [Anthropic](https://www.anthropic.com) | Free to try | 5 Requests / Minute | ANTHROPIC_API_KEY | "anthropic" |


- **Free to try**: No credit card required, but limited to a certain number of tokens.
- Rate limits are based on each provider's free plan; the actual limit may differ depending on the plan you choose.
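Since each provider enforces its own rate limit, a client-side throttle can keep bursts under the quota. A minimal sliding-window sketch (not part of api4all):

```python
import time

class Throttle:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self._stamps = []

    def wait(self):
        now = time.monotonic()
        # Keep only the timestamps still inside the window.
        self._stamps = [t for t in self._stamps if now - t < self.period]
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self._stamps[0]))
        self._stamps.append(time.monotonic())

# e.g. Lepton's free-tier limit of 10 requests / minute
throttle = Throttle(max_calls=10, period=60.0)
```

Call `throttle.wait()` immediately before each `engine.generate_response()` to stay under the quota.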

### Open-source models
| Provider | Mixtral-8x7b-Instruct-v0.1 | Gemma 7B it | Mistral-7B-Instruct-v0.1 | LLaMA2-70b |
| ------ | ------ | ------ | ------ | ------ |
| [Replicate](https://replicate.com) | $0.3-$1 | &#x2715; | $0.05-$0.25 | $0.65-$2.75
| [Fireworks](https://fireworks.ai) | $0.5-$0.5 | $0.2-$0.2 | $0.2-$0.2 | $0.9-$0.9
| [Deepinfra](https://deepinfra.com) | $0.27-$0.27 | &#x2715; | &#x2715; | $0.7-$0.9
| [Lepton](https://www.lepton.ai) | $0.5-$0.5 | &#x2715; | &#x2715; | $0.8-$0.8
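Assuming the prices in these tables are USD per 1M input / output tokens (the usual convention, though the tables do not state it explicitly), a request's cost can be estimated as:

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimated USD cost, with prices quoted per 1M tokens (assumption)."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Lepton's Mixtral-8x7b row above: $0.5 input, $0.5 output
print(estimate_cost(1_200, 256, 0.5, 0.5))  # → 0.000728
```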

### Closed-source models
#### 1. Mistral AI
| Model | Input Price | Output Price | Context Length | Model String |
| ------ | ------ | ------ | ------ | ------ |
| Google Gemini 1.0 Pro | $0 | $0 | 32,768 | "google/gemini-1.0-pro" |



## Contributing
Contributions are welcome. If you see updated pricing, new models, new providers, or any other changes, feel free to open an issue or a pull request.


## Problems from the providers and Solutions

#### Error with Gemini 1.0 Pro
```text
ValueError: The `response.text` quick accessor only works when the response contains a valid `Part`, but none was returned. Check the `candidate.safety_ratings` to see if the response was blocked.
```
**Solution**: The output is larger than your `max_tokens` limit. Increase `max_tokens`.
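One way to apply that fix programmatically is to retry with a doubled token budget when the error appears. A hedged sketch, where `generate` is a toy stand-in for the actual provider call:

```python
def generate(prompt, max_tokens):
    """Stand-in for a provider call; fails like Gemini when the budget is too small."""
    if max_tokens < 512:  # toy failure threshold, for illustration only
        raise ValueError("The `response.text` quick accessor only works ...")
    return "ok"

def generate_with_larger_budget(prompt, max_tokens=256, ceiling=4096):
    while True:
        try:
            return generate(prompt, max_tokens)
        except ValueError:
            if max_tokens >= ceiling:
                raise  # give up once the ceiling is reached
            max_tokens *= 2  # double the budget and retry

print(generate_with_larger_budget("Explain calculus"))  # → ok
```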
14 changes: 14 additions & 0 deletions api4all/data/constant_data.py
"output": 0.27
}
},
"lepton": {
"name": "mixtral-8x7b",
"price": {
"input": 0.5,
"output": 0.5
}
},
"mistral": {
"name": "open-mistral-7b",
"price": {
"input": 0.7,
"output": 0.9
}
},
"lepton": {
"name": "llama2-70b",
"price": {
"input": 0.8,
"output": 0.8
}
}
},
"context-length": 4096
79 changes: 78 additions & 1 deletion api4all/engines/engines.py
from mistralai.client import MistralClient
import google.generativeai as genai

__all__ = ["GroqEngine", "AnyscaleEngine", "TogetherEngine", "FireworksEngine", "ReplicateEngine", "DeepinfraEngine", "OpenaiEngine", "AnthropicEngine", "MistralEngine", "GoogleEngine", "LeptonEngine"]


#-----------------------------------------GROQ-----------------------------------------#
return response


#-----------------------------------------Lepton-----------------------------------------#
@EngineFactory.register_engine('lepton')
class LeptonEngine(TextEngine):
def __init__(self,
model: str,
provider: str = "lepton",
temperature: Optional[float] = ModelConfig.DEFAULT_TEMPERATURE,
max_tokens: Optional[int] = ModelConfig.DEFAULT_MAX_TOKENS,
top_p: Optional[float] = ModelConfig.DEFAULT_TOP_P,
stop: Union[str, List[str], None] = ModelConfig.DEFAULT_STOP,
messages: Optional[List[Dict[str, str]]] = ModelConfig.MESSAGES_EXAMPLE
) -> None:
super().__init__(model, provider, temperature, max_tokens, top_p, stop, messages)

self._api_key = self._keys.get_api_keys("LEPTON_API_KEY")
if self._api_key is None:
self.logger.error(f"API key not found for {self.provider}")
raise ValueError(f"API key not found for {self.provider}")

self._api_name = dataEngine.getAPIname(self.model, self.provider)

# Set up the client
self._set_up_client()


def _set_up_client(self):
self.client = openai.OpenAI(base_url=f"https://{self._api_name}.lepton.run/api/v1/",
api_key = self._api_key)


def generate_response(self,
**kwargs: Any
) -> Union[str, None]:
"""
This method is used to generate a response from the AI model.
"""

start_time = time.time()

try:
completion = self.client.chat.completions.create(
messages=self.messages,
model=self._api_name,
temperature=self.temperature,
max_tokens=self.max_tokens,
top_p=self.top_p,
stop=self.stop
)
except Exception as e:
print(f"Error generating response: {e}")
self.logger.error(f"Error generating response of provider {self.provider}: {e}")
return None

actual_time = time.time() - start_time

content = completion.choices[0].message.content
input_tokens = completion.usage.prompt_tokens
output_tokens = completion.usage.completion_tokens
execution_time = None
cost = dataEngine.calculate_cost(self.provider, self.model, input_tokens, output_tokens)

response = TextResponse(
content=content,
cost=cost,
execution_time=execution_time,
actual_time=actual_time,
input_tokens=input_tokens,
output_tokens=output_tokens,
provider=self.provider
)

log_response(self.logger, "SUCCESS", response)

return response


#-----------------------------------------OpenAI-----------------------------------------#
@EngineFactory.register_engine('openai')
class OpenaiEngine(TextEngine):
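The new `LeptonEngine` above talks to an OpenAI-compatible endpoint whose host embeds the deployment name; the URL construction can be sanity-checked in isolation (the helper name is illustrative, not part of the library):

```python
def lepton_base_url(api_name):
    # Mirrors the f-string in LeptonEngine._set_up_client.
    return f"https://{api_name}.lepton.run/api/v1/"

print(lepton_base_url("mixtral-8x7b"))  # → https://mixtral-8x7b.lepton.run/api/v1/
```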
12 changes: 12 additions & 0 deletions examples/.env.example
GROQ_API_KEY=gsk_asdsaxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxx
ANYSCALE_API_KEY=esecret_xxxxxxxx
GOOGLE_API_KEY=xxxxxxxxx
MISTRAL_API_KEY=xxxxxxxxx
TOGETHER_API_KEY=xxxxxxxxx
ANTHROPIC_API_KEY=sk-xxxxxxxx
FIREWORKS_API_KEY=rxxxxxxxx
REPLICATE_API_KEY=rxxx
LEPTON_API_KEY=xxxxxxxxx
DEEPINFRA_API_KEY=xxxxxxx
30 changes: 30 additions & 0 deletions examples/quick-start-with-env-file.py
from api4all import EngineFactory

# All the API keys should be in the .env file in the same directory as this file

messages = [
{"role": "system",
"content": "You are a helpful assistant for my Calculus class."},
{"role": "user",
"content": "What is the current status of the economy?"},
{"role": "assistant",
"content": "I'm sorry, but as a Calculus assistant, I don't have the ability to provide real-time economic updates. However, I can help you understand economic concepts from a mathematical perspective. For example, I can explain how calculus is used in economics for optimization and understanding change."},
{"role": "user",
"content": "Oh, I see. Can you explain how calculus is used in economics?"},
{"role": "assistant",
"content": "Sure! In economics, calculus is used for optimization. For example, businesses often want to maximize profits or minimize costs. With calculus, we can find the 'optimal' point by setting the derivative of the profit or cost function to zero and solving for the variable. Calculus is also used to understand how economic quantities change. For example, the derivative of a function gives the rate of change of the function, which can represent things like the change in cost for producing one more unit of a good (marginal cost), or the change in revenue from selling one more unit of a good (marginal revenue)."},
{"role": "user",
"content": "Interesting. Can you tell me more about the Fundamental Theorem of Calculus?"}
]


# engine = EngineFactory.create_engine(provider="google", model="google/gemini-1.0-pro", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
engine = EngineFactory.create_engine(provider="together", model="mistralai/Mixtral-8x7B-Instruct-v0.1", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="anthropic", model="anthropic/claude-3-haiku", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="mistral", model="mistral/mistral-small-latest", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)


response = engine.generate_response()

# See the response and also check out the log in logfile.log
print(response)
34 changes: 34 additions & 0 deletions examples/quick-start.py
from api4all import EngineFactory
import os

os.environ["TOGETHER_API_KEY"] = "xxxxx" # Replace with your API key
os.environ["GOOGLE_API_KEY"] = "xxxxx"
os.environ["ANTHROPIC_API_KEY"] = "xxxxx"
os.environ["MISTRAL_API_KEY"] = "xxxxx"

messages = [
{"role": "system",
"content": "You are a helpful assistant for my Calculus class."},
{"role": "user",
"content": "What is the current status of the economy?"},
{"role": "assistant",
"content": "I'm sorry, but as a Calculus assistant, I don't have the ability to provide real-time economic updates. However, I can help you understand economic concepts from a mathematical perspective. For example, I can explain how calculus is used in economics for optimization and understanding change."},
{"role": "user",
"content": "Oh, I see. Can you explain how calculus is used in economics?"},
{"role": "assistant",
"content": "Sure! In economics, calculus is used for optimization. For example, businesses often want to maximize profits or minimize costs. With calculus, we can find the 'optimal' point by setting the derivative of the profit or cost function to zero and solving for the variable. Calculus is also used to understand how economic quantities change. For example, the derivative of a function gives the rate of change of the function, which can represent things like the change in cost for producing one more unit of a good (marginal cost), or the change in revenue from selling one more unit of a good (marginal revenue)."},
{"role": "user",
"content": "Interesting. Can you tell me more about the Fundamental Theorem of Calculus?"}
]


# engine = EngineFactory.create_engine(provider="google", model="google/gemini-1.0-pro", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
engine = EngineFactory.create_engine(provider="together", model="mistralai/Mixtral-8x7B-Instruct-v0.1", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="anthropic", model="anthropic/claude-3-haiku", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)
# engine = EngineFactory.create_engine(provider="mistral", model="mistral/mistral-small-latest", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)


response = engine.generate_response()

# See the response and also check out the log in logfile.log
print(response)
1 change: 1 addition & 0 deletions img/colab.svg
67 changes: 67 additions & 0 deletions notebooks/api4all_quickstart.ipynb
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UMV-mLLMafW1"
},
"outputs": [],
"source": [
"%pip install api4all -q"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b_1w6U_cycvS"
},
"source": [
"## Run\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "N8MBeKVoahc9"
},
"outputs": [],
"source": [
"from api4all import EngineFactory\n",
"import os\n",
"os.environ[\"TOGETHER_API_KEY\"] = \"xxx\"\n",
"os.environ[\"MISTRAL_API_KEY\"] = \"xxxx\"\n",
"\n",
"messages = [\n",
" {\"role\": \"system\",\n",
" \"content\": \"You are a helpful assistent for the White House\"},\n",
" {\"role\": \"user\",\n",
" \"content\": \"What is the current status of the economy?\"}\n",
"]\n",
"\n",
"engine = EngineFactory.create_engine(provider=\"together\", model=\"google/gemma-7b-it\", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)\n",
"# engine = EngineFactory.create_engine(provider=\"mistral\", model=\"mistral/mistral-small-latest\", messages=messages, temperature=0.5, max_tokens=256, top_p=0.9, stop=None)\n",
"\n",
"response = engine.generate_response()\n",
"\n",
"# See the response and also check the logfile.log\n",
"print(response)"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
