274 changes: 274 additions & 0 deletions README.md
@@ -84,6 +84,19 @@ Stream GPT-5 chats with the Responses API, initiate Realtime WebRTC conversation
- [Vision in a thread](#vision-in-a-thread)
- [Runs involving function tools](#runs-involving-function-tools)
- [Exploring chunks used in File Search](#exploring-chunks-used-in-file-search)
- [Evals](#evals)
- [Create an Eval](#create-an-eval)
- [Retrieve an Eval](#retrieve-an-eval)
- [List Evals](#list-evals)
- [Update an Eval](#update-an-eval)
- [Delete an Eval](#delete-an-eval)
- [Create an Eval Run](#create-an-eval-run)
- [List Eval Runs](#list-eval-runs)
- [Retrieve an Eval Run](#retrieve-an-eval-run)
- [Cancel an Eval Run](#cancel-an-eval-run)
- [Delete an Eval Run](#delete-an-eval-run)
- [List Output Items](#list-output-items)
- [Retrieve an Output Item](#retrieve-an-output-item)
- [Image Generation](#image-generation)
- [DALL·E 2](#dalle-2)
- [DALL·E 3](#dalle-3)
@@ -1669,6 +1682,267 @@ end.compact
client.messages.list(thread_id: thread_id)
```

### Evals

The [Evals API](https://platform.openai.com/docs/api-reference/evals) allows you to systematically evaluate the quality and performance of your AI models.

**Supported Endpoints:**
- `POST /v1/evals` - Create an evaluation
- `GET /v1/evals/{id}` - Retrieve an evaluation
- `GET /v1/evals` - List evaluations
- `POST /v1/evals/{id}` - Update an evaluation
- `DELETE /v1/evals/{id}` - Delete an evaluation
- `POST /v1/evals/{id}/runs` - Create an evaluation run
- `GET /v1/evals/{id}/runs/{run_id}` - Retrieve an evaluation run
- `GET /v1/evals/{id}/runs` - List evaluation runs
- `POST /v1/evals/{id}/runs/{run_id}/cancel` - Cancel an evaluation run
- `DELETE /v1/evals/{id}/runs/{run_id}` - Delete an evaluation run
- `GET /v1/evals/{id}/runs/{run_id}/output_items` - List output items
- `GET /v1/evals/{id}/runs/{run_id}/output_items/{item_id}` - Retrieve an output item

#### Create an Eval

Create an evaluation with testing criteria to assess model outputs:

```ruby
response = client.evals.create(
  parameters: {
    name: "Sentiment Analysis Eval",
    data_source_config: {
      type: "stored_completions",
      metadata: { usecase: "chatbot" }
    },
    testing_criteria: [
      {
        type: "label_model",
        model: "o3-mini",
        input: [
          {
            role: "developer",
            content: "Classify the sentiment of the following statement as one of 'positive', 'neutral', or 'negative'"
          },
          {
            role: "user",
            content: "Statement: {{item.input}}"
          }
        ],
        passing_labels: ["positive"],
        labels: ["positive", "neutral", "negative"],
        name: "Sentiment grader"
      }
    ],
    metadata: { team: "product", version: "1.0" }
  }
)
puts response["id"]
# => "eval_abc123"
```

#### Retrieve an Eval

Get details about a specific evaluation:

```ruby
eval_id = "eval_abc123"
response = client.evals.retrieve(id: eval_id)
puts response["name"]
# => "Sentiment Analysis Eval"
```

#### List Evals

List all evaluations with optional pagination:

```ruby
# List all evals
response = client.evals.list

# List with limit
response = client.evals.list(parameters: { limit: 10 })

# List with pagination
response = client.evals.list(parameters: { after: "eval_abc123", limit: 20 })
```
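List responses follow the standard OpenAI list shape, with `data`, `has_more`, and `last_id` fields, so you can walk every page with a small cursor loop. A minimal sketch, assuming that response shape (the `each_item` helper and its `fetch` callable are illustrative, not part of the gem):

```ruby
# Generic cursor pagination over an OpenAI-style list endpoint.
# `fetch` is any callable that accepts an `after:` cursor and returns a page
# hash containing "data", "has_more", and "last_id".
def each_item(fetch)
  after = nil
  results = []
  loop do
    page = fetch.call(after: after)
    results.concat(page["data"])
    break unless page["has_more"]

    after = page["last_id"]
  end
  results
end
```

With the real client you would wrap the list call, e.g. `fetch = ->(after:) { client.evals.list(parameters: { after: after, limit: 20 }.compact) }`, and `.compact` drops the `after` key on the first request.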

#### Update an Eval

Update an evaluation's metadata:

```ruby
response = client.evals.update(
  id: eval_id,
  parameters: {
    metadata: { version: "2.0", updated: "true" }
  }
)
```

#### Delete an Eval

Delete an evaluation:

```ruby
response = client.evals.delete(id: eval_id)
puts response["deleted"]
# => true
```

#### Create an Eval Run

Run an evaluation against a model with test data:

```ruby
response = client.evals.runs.create(
  eval_id: eval_id,
  parameters: {
    name: "gpt-4o-mini baseline",
    data_source: {
      type: "completions",
      input_messages: {
        type: "template",
        template: [
          {
            role: "system",
            content: "You are a sentiment analyzer. Respond with only: positive, neutral, or negative."
          },
          {
            role: "user",
            content: "{{item.input}}"
          }
        ]
      },
      sampling_params: {
        temperature: 0.7,
        max_completion_tokens: 50,
        top_p: 1.0
      },
      model: "gpt-4o-mini",
      source: {
        type: "file_content",
        content: [
          {
            item: {
              input: "I absolutely love this product! Best purchase ever.",
              ground_truth: "positive"
            }
          },
          {
            item: {
              input: "This is terrible. Very disappointed.",
              ground_truth: "negative"
            }
          },
          {
            item: {
              input: "It's okay, nothing special.",
              ground_truth: "neutral"
            }
          }
        ]
      }
    },
    metadata: { experiment: "baseline", date: "2024-01-15" }
  }
)
puts response["id"]
# => "evalrun_xyz789"
```

#### List Eval Runs

List all runs for a specific evaluation:

```ruby
# List all runs
response = client.evals.runs.list(eval_id: eval_id)

# List with limit
response = client.evals.runs.list(
  eval_id: eval_id,
  parameters: { limit: 10 }
)

# List with pagination
response = client.evals.runs.list(
  eval_id: eval_id,
  parameters: { after: "evalrun_abc123", limit: 20 }
)
```

#### Retrieve an Eval Run

Get details about a specific evaluation run:

```ruby
run_id = "evalrun_xyz789"
response = client.evals.runs.retrieve(
  eval_id: eval_id,
  id: run_id
)
puts response["status"]
# => "completed"
```
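Runs execute asynchronously, so after creating one you typically poll until it leaves the `queued`/`in_progress` states. A small sketch, assuming those status values (the `wait_for_run` helper and its `fetch` callable are illustrative, not part of the gem):

```ruby
# Poll an eval run until it reaches a terminal status.
# `fetch` is any callable returning the current run hash; with the real
# client: fetch = -> { client.evals.runs.retrieve(eval_id: eval_id, id: run_id) }
def wait_for_run(fetch, interval: 2, max_attempts: 60)
  max_attempts.times do
    run = fetch.call
    return run unless %w[queued in_progress].include?(run["status"])

    sleep interval
  end
  raise "Eval run did not finish within #{interval * max_attempts} seconds"
end
```

The interval and attempt cap are arbitrary defaults; tune them to your run sizes.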

#### Cancel an Eval Run

Cancel an in-progress evaluation run:

```ruby
response = client.evals.runs.cancel(
  eval_id: eval_id,
  id: run_id
)
puts response["status"]
# => "canceled"
```

#### Delete an Eval Run

Delete an evaluation run:

```ruby
response = client.evals.runs.delete(
  eval_id: eval_id,
  id: run_id
)
puts response["deleted"]
# => true
```

#### List Output Items

Retrieve the output items from an evaluation run:

```ruby
# List all output items
response = client.evals.runs.output_items.list(
  eval_id: eval_id,
  run_id: run_id
)

# List with pagination
response = client.evals.runs.output_items.list(
  eval_id: eval_id,
  run_id: run_id,
  parameters: { limit: 10, after: "item_abc123" }
)
```
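Each output item carries a `status` of `"pass"` or `"fail"`, so a run's pass rate falls out of a simple count over the page's `data` array. A minimal sketch (the `pass_rate` helper is illustrative, not part of the gem; in practice `items` would be `response["data"]` from the list call above):

```ruby
# Fraction of output items whose graders passed, as a float in 0.0..1.0.
def pass_rate(items)
  return 0.0 if items.empty?

  passed = items.count { |item| item["status"] == "pass" }
  passed.to_f / items.size
end
```

Note this only covers one page; for large runs, paginate through all output items before aggregating.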

#### Retrieve an Output Item

Get details about a specific output item:

```ruby
output_item_id = "item_abc123"
response = client.evals.runs.output_items.retrieve(
  eval_id: eval_id,
  run_id: run_id,
  id: output_item_id
)
puts response["status"]
# => "pass"
```

### Image Generation

Generate images using DALL·E 2 or DALL·E 3!
1 change: 1 addition & 0 deletions lib/openai.rb
@@ -22,6 +22,7 @@
require_relative "openai/batches"
require_relative "openai/usage"
require_relative "openai/conversations"
require_relative "openai/evals"

module OpenAI
  class Error < StandardError; end
4 changes: 4 additions & 0 deletions lib/openai/client.rb
@@ -109,6 +109,10 @@ def conversations
      @conversations ||= OpenAI::Conversations.new(client: self)
    end

    def evals
      @evals ||= OpenAI::Evals.new(client: self)
    end

    def azure?
      @api_type&.to_sym == :azure
    end
75 changes: 75 additions & 0 deletions lib/openai/evals.rb
@@ -0,0 +1,75 @@
module OpenAI
  class Evals
    def initialize(client:)
      @client = client
    end

    def create(parameters: {})
      @client.json_post(path: "/evals", parameters: parameters)
    end

    def retrieve(id:)
      @client.get(path: "/evals/#{id}")
    end

    def update(id:, parameters: {})
      @client.json_post(path: "/evals/#{id}", parameters: parameters)
    end

    def delete(id:)
      @client.delete(path: "/evals/#{id}")
    end

    def list(parameters: {})
      @client.get(path: "/evals", parameters: parameters)
    end

    def runs
      @runs ||= Runs.new(client: @client)
    end

    class Runs
      def initialize(client:)
        @client = client
      end

      def create(eval_id:, parameters: {})
        @client.json_post(path: "/evals/#{eval_id}/runs", parameters: parameters)
      end

      def retrieve(eval_id:, id:)
        @client.get(path: "/evals/#{eval_id}/runs/#{id}")
      end

      def list(eval_id:, parameters: {})
        @client.get(path: "/evals/#{eval_id}/runs", parameters: parameters)
      end

      def cancel(eval_id:, id:)
        @client.post(path: "/evals/#{eval_id}/runs/#{id}/cancel")
      end

      def delete(eval_id:, id:)
        @client.delete(path: "/evals/#{eval_id}/runs/#{id}")
      end

      def output_items
        @output_items ||= OutputItems.new(client: @client)
      end

      class OutputItems
        def initialize(client:)
          @client = client
        end

        def list(eval_id:, run_id:, parameters: {})
          @client.get(path: "/evals/#{eval_id}/runs/#{run_id}/output_items", parameters: parameters)
        end

        def retrieve(eval_id:, run_id:, id:)
          @client.get(path: "/evals/#{eval_id}/runs/#{run_id}/output_items/#{id}")
        end
      end
    end
  end
end