The Ollama Python library provides the easiest way to integrate Python 3.8+ projects with [Ollama](https://github.com/ollama/ollama).

+ ## Prerequisites
+
+ - [Ollama](https://ollama.com/download) should be installed and running
+ - Pull a model to use with the library: `ollama pull <model>` e.g. `ollama pull llama3.2`
+   - See [Ollama.com](https://ollama.com/search) for more information on the models available.
+

## Install

```sh
pip install ollama
```

@@ -11,25 +17,34 @@ pip install ollama
## Usage

```python
- import ollama
- response = ollama.chat(model='llama3.1', messages=[
+ from ollama import chat
+ from ollama import ChatResponse
+
+ response: ChatResponse = chat(model='llama3.2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])
+ # or access fields directly from the response object
+ print(response.message.content)
```

+ See [_types.py](ollama/_types.py) for more information on the response types.
+

## Streaming responses

- Response streaming can be enabled by setting `stream=True`, modifying function calls to return a Python generator where each part is an object in the stream.
+ Response streaming can be enabled by setting `stream=True`.
+
+ > [!NOTE]
+ > Streaming Tool/Function calling is not yet supported.

```python
- import ollama
+ from ollama import chat

- stream = ollama.chat(
-     model='llama3.1',
+ stream = chat(
+     model='llama3.2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
@@ -38,20 +53,68 @@ for chunk in stream:
  print(chunk['message']['content'], end='', flush=True)
```
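
The streamed parts can also be collected into the complete reply instead of being printed as they arrive. A minimal sketch (the incremental text lives in each part's `message.content`, as in the example above):

```python
from ollama import chat

# Collect the incremental fragments into the full response text.
full_reply = ''
for part in chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
):
    full_reply += part['message']['content']

print(full_reply)
```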

+ ## Custom client
+ A custom client can be created by instantiating `Client` or `AsyncClient` from `ollama`.
+
+ All extra keyword arguments are passed into the [`httpx.Client`](https://www.python-httpx.org/api/#client).
+
+ ```python
+ from ollama import Client
+ client = Client(
+   host='http://localhost:11434',
+   headers={'x-some-header': 'some-value'}
+ )
+ response = client.chat(model='llama3.2', messages=[
+   {
+     'role': 'user',
+     'content': 'Why is the sky blue?',
+   },
+ ])
+ ```
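
Since the extra keyword arguments are handed to `httpx.Client`, transport options such as a request timeout can be set the same way. A minimal sketch (`timeout` here is an `httpx.Client` option, not a field defined by this library):

```python
from ollama import Client

client = Client(
  host='http://localhost:11434',
  timeout=10.0,  # forwarded to httpx.Client: fail requests slower than 10 seconds
)
response = client.chat(model='llama3.2', messages=[
  {'role': 'user', 'content': 'Why is the sky blue?'},
])
print(response['message']['content'])
```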
+
+ ## Async client
+
+ The `AsyncClient` class is used to make asynchronous requests. It can be configured with the same fields as the `Client` class.
+
+ ```python
+ import asyncio
+ from ollama import AsyncClient
+
+ async def chat():
+   message = {'role': 'user', 'content': 'Why is the sky blue?'}
+   response = await AsyncClient().chat(model='llama3.2', messages=[message])
+
+ asyncio.run(chat())
+ ```
+
+ Setting `stream=True` modifies functions to return a Python asynchronous generator:
+
+ ```python
+ import asyncio
+ from ollama import AsyncClient
+
+ async def chat():
+   message = {'role': 'user', 'content': 'Why is the sky blue?'}
+   async for part in await AsyncClient().chat(model='llama3.2', messages=[message], stream=True):
+     print(part['message']['content'], end='', flush=True)
+
+ asyncio.run(chat())
+ ```
+

## API

The Ollama Python library's API is designed around the [Ollama REST API](https://github.com/ollama/ollama/blob/main/docs/api.md).

### Chat

```python
- ollama.chat(model='llama3.1', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
+ ollama.chat(model='llama3.2', messages=[{'role': 'user', 'content': 'Why is the sky blue?'}])
```

### Generate

```python
- ollama.generate(model='llama3.1', prompt='Why is the sky blue?')
+ ollama.generate(model='llama3.2', prompt='Why is the sky blue?')
```

### List

```python
ollama.list()
```

@@ -63,14 +126,14 @@ ollama.list()

### Show

```python
- ollama.show('llama3.1')
+ ollama.show('llama3.2')
```

### Create

```python
modelfile='''
- FROM llama3.1
+ FROM llama3.2
SYSTEM You are mario from super mario bros.
'''

ollama.create(model='example', modelfile=modelfile)
```

@@ -80,37 +143,37 @@ ollama.create(model='example', modelfile=modelfile)

### Copy

```python
- ollama.copy('llama3.1', 'user/llama3.1')
+ ollama.copy('llama3.2', 'user/llama3.2')
```

### Delete

```python
- ollama.delete('llama3.1')
+ ollama.delete('llama3.2')
```

### Pull

```python
- ollama.pull('llama3.1')
+ ollama.pull('llama3.2')
```

### Push

```python
- ollama.push('user/llama3.1')
+ ollama.push('user/llama3.2')
```

### Embed

```python
- ollama.embed(model='llama3.1', input='The sky is blue because of rayleigh scattering')
+ ollama.embed(model='llama3.2', input='The sky is blue because of rayleigh scattering')
```

### Embed (batch)

```python
- ollama.embed(model='llama3.1', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
+ ollama.embed(model='llama3.2', input=['The sky is blue because of rayleigh scattering', 'Grass is green because of chlorophyll'])
```

### Ps

@@ -119,50 +182,6 @@ ollama.embed(model='llama3.1', input=['The sky is blue because of rayleigh scatt
```python
ollama.ps()
```

- ## Custom client
-
- A custom client can be created with the following fields:
-
- - `host`: The Ollama host to connect to
- - `timeout`: The timeout for requests
-
- ```python
- from ollama import Client
- client = Client(host='http://localhost:11434')
- response = client.chat(model='llama3.1', messages=[
-   {
-     'role': 'user',
-     'content': 'Why is the sky blue?',
-   },
- ])
- ```
-
- ## Async client
-
- ```python
- import asyncio
- from ollama import AsyncClient
-
- async def chat():
-   message = {'role': 'user', 'content': 'Why is the sky blue?'}
-   response = await AsyncClient().chat(model='llama3.1', messages=[message])
-
- asyncio.run(chat())
- ```
-
- Setting `stream=True` modifies functions to return a Python asynchronous generator:
-
- ```python
- import asyncio
- from ollama import AsyncClient
-
- async def chat():
-   message = {'role': 'user', 'content': 'Why is the sky blue?'}
-   async for part in await AsyncClient().chat(model='llama3.1', messages=[message], stream=True):
-     print(part['message']['content'], end='', flush=True)
-
- asyncio.run(chat())
- ```

## Errors
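
Errors are raised if a request returns an error status or if an error is detected while streaming. A minimal sketch of handling them with the library's `ResponseError`, which carries the server's error message and HTTP status code:

```python
import ollama

model = 'does-not-yet-exist'

try:
  ollama.chat(model)
except ollama.ResponseError as e:
  print('Error:', e.error)
  if e.status_code == 404:
    # The model has not been pulled yet; fetch it and retry as needed.
    ollama.pull(model)
```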