This tool is designed to generate datasets faster and more cost-effectively by making parallel calls to the Anthropic API, with optional prompt caching for Claude. It's ideal for processing large volumes of requests while respecting rate limits.
A key feature of this processor is its approach to token estimation. Since there's no open-source tokenizer model available for the new Claude models, we use an older tokenizer to make initial estimates. Once we receive a response from the API, we update these estimates with the actual token usage. This adaptive approach allows us to maintain efficient processing while adhering to Anthropic's rate limits.
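As a rough illustration of this idea (a sketch only; the class name and the characters-per-token heuristic are assumptions, not the script's actual code), an estimator can start from a cheap guess and correct itself with the usage reported in each response:

class AdaptiveTokenEstimator:
    """Guess token counts up front, then correct the guess with real API usage."""

    def __init__(self, chars_per_token: float = 3.5):
        self.chars_per_token = chars_per_token  # rough starting ratio

    def estimate(self, text: str) -> int:
        # Initial estimate, made before any response has been received.
        return max(1, int(len(text) / self.chars_per_token))

    def update(self, text: str, actual_input_tokens: int) -> None:
        # Blend in the ratio observed from the API's reported usage so that
        # later estimates drift toward the real tokenizer's behaviour.
        if actual_input_tokens > 0:
            observed = len(text) / actual_input_tokens
            self.chars_per_token = 0.9 * self.chars_per_token + 0.1 * observed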
- Parallel Processing: Maximize throughput with concurrent API requests.
- Rate Limiting: Stay within Anthropic's API limits for requests and tokens.
- Adaptive Token Estimation: Use initial estimates and update with actual usage.
- Caching Support: Optional caching for efficient processing of repeated content.
- Error Handling: Retry failed requests and log issues for easy debugging.
- Memory Efficient: Stream requests from file to handle large datasets.
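To give a feel for the pattern behind the Parallel Processing and Rate Limiting bullets above, here is a deliberately simplified sketch. It only caps concurrency with a semaphore; the actual script throttles per-minute request and token budgets and adds retries:

import asyncio
import json
import os

import aiohttp

API_URL = "https://api.anthropic.com/v1/messages"
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}


async def send(session: aiohttp.ClientSession, request: dict, limiter: asyncio.Semaphore) -> dict:
    async with limiter:  # only N requests in flight at once
        async with session.post(API_URL, headers=HEADERS, json=request) as resp:
            return await resp.json()


async def run_all(path: str, concurrency: int = 5) -> list[dict]:
    requests_ = [json.loads(line) for line in open(path)]
    limiter = asyncio.Semaphore(concurrency)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(send(session, r, limiter) for r in requests_))


# results = asyncio.run(run_all("examples/test_requests_to_parallel_process.jsonl"))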
This project was developed using Python 3.10.13. For optimal compatibility and performance, it is strongly recommended that you use the same version.
- Clone the repository:
git clone https://github.com/your-username/anthropic-parallel-processor.git
cd anthropic-parallel-processor
- Create a virtual environment:
python -m venv .venv
For macOS/Linux:
source .venv/bin/activate
For Windows:
.venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up your Anthropic API key. You have two options:
a. Create a `.env` file in the root of the project and add your API key:
ANTHROPIC_API_KEY=your-api-key-here
b. Or, set it as an environment variable in your terminal:
export ANTHROPIC_API_KEY=your-api-key-here
📎 Note: Replace 'your-api-key-here' with your actual Anthropic API key.
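For reference, reading the key in Python usually looks like the sketch below (whether this project loads `.env` via python-dotenv is an assumption; check requirements.txt):

# Sketch: load a .env file if present, then read the key from the environment.
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # harmless no-op when no .env file exists
api_key = os.getenv("ANTHROPIC_API_KEY")
if not api_key:
    raise RuntimeError("ANTHROPIC_API_KEY is not set")

With the key configured, you can run the processor as shown below; the second invocation enables prompt caching via --use_caching True.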
python api_request_parallel_processor.py \
--requests_filepath examples/test_requests_to_parallel_process.jsonl \
--save_filepath examples/data/test_requests_to_parallel_process_results.jsonl \
--request_url https://api.anthropic.com/v1/messages \
--max_requests_per_minute 40 \
--max_tokens_per_minute 16000 \
--max_attempts 5 \
--logging_level INFO
python api_request_parallel_processor.py \
--requests_filepath examples/test_caching_requests_to_parallel_process.jsonl \
--save_filepath examples/data/test_caching_requests_to_parallel_process_results.jsonl \
--request_url https://api.anthropic.com/v1/messages \
--use_caching True \
--max_requests_per_minute 40 \
--max_tokens_per_minute 16000 \
--max_attempts 5 \
--logging_level INFO
The minimum cacheable prompt length is:
- 1024 tokens for Claude 3.5 Sonnet and Claude 3 Opus
- 2048 tokens for Claude 3 Haiku
We suggest testing your prompt manually first to confirm that caching works. The script makes an initial caching call and reports whether caching was used; if it was not, you will get a warning followed by a 10-second delay before the parallel API calls start, giving you time to kill the process if needed.
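If you prefer to verify caching by hand before a long run, one option (a standalone sketch, not part of the script) is to send the same cached request twice and inspect the cache-related usage counters in the response. The beta header below was required while prompt caching was in beta and may no longer be necessary:

# Sketch: send the first request from the caching JSONL file twice and print
# the cache-related usage counters returned by the Messages API.
import json
import os

import requests

with open("examples/test_caching_requests_to_parallel_process.jsonl") as f:
    request = json.loads(f.readline())

headers = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "anthropic-beta": "prompt-caching-2024-07-31",  # prompt-caching beta header
    "content-type": "application/json",
}

for label in ("first call (expect a cache write)", "second call (expect a cache read)"):
    usage = requests.post(
        "https://api.anthropic.com/v1/messages", headers=headers, json=request
    ).json()["usage"]
    print(label, {k: v for k, v in usage.items() if "cache" in k})
# A working cache shows cache_creation_input_tokens > 0 on the first call and
# cache_read_input_tokens > 0 on the second.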
You can learn more about caching from the official Anthropic documentation: Prompt Caching
Users need to be aware of and check their own rate limits. The default settings in this script (40 requests per minute and 16,000 tokens per minute) are set to approximately 80% of the Tier 1 limits; depending on your tier, your limits may differ.
To check your Tier and rate limits:
- Go to the Anthropic Console
- Navigate to the Settings tab
- Look for the Limits tab in the sidebar on the left side
- Here you can see Rate limits for all models and your current tier
You can read more about rate limits and tiers in the official Anthropic documentation: Anthropic API Rate Limits
Make sure to adjust the `max_requests_per_minute` and `max_tokens_per_minute` configuration options according to your specific tier and needs, so you get good throughput without hitting rate limits.
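As a tiny worked example of that adjustment (the numbers are placeholders for whatever limits your tier actually grants; the ~80% headroom simply mirrors this script's defaults):

# Sketch: derive conservative flag values from your tier's published limits.
def conservative_limits(tier_rpm: int, tier_tpm: int, headroom: float = 0.8) -> tuple[int, int]:
    """Return (max_requests_per_minute, max_tokens_per_minute) with a safety margin."""
    return int(tier_rpm * headroom), int(tier_tpm * headroom)


# e.g. a tier allowing 50 requests/min and 20,000 tokens/min yields this script's defaults:
print(conservative_limits(50, 20_000))  # -> (40, 16000)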
The input file should be a JSONL file where each line is a JSON object representing a single API request. Here's an example structure:
{"model": "claude-3-5-sonnet-20240620", "max_tokens": 1024, "messages": [{"role": "user", "content": "Tell me a joke"}], "metadata": {"row_id": 1}}
For caching, use the following structure:
{
  "model": "claude-3-5-sonnet-20240620",
  "max_tokens": 1024,
  "system": [
    {
      "type": "text",
      "text": "You are an AI assistant tasked with analyzing blogs."
    },
    {
      "type": "text",
      "text": "<blog content here>",
      "cache_control": {"type": "ephemeral"}
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": "Analyze the main themes of this blog."
    }
  ]
}
You can generate JSONL files for API requests using Python. The following examples demonstrate one approach for the standard and caching scenarios; there are many other ways to build these files depending on your data sources, so treat them as a starting point:
To generate a JSONL file for standard requests:
import json

filename = "examples/test_requests_to_parallel_process.jsonl"
n_requests = 10

# Build one request per multiplication problem.
jobs = [
    {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "temperature": 0,
        "messages": [
            {
                "role": "user",
                "content": f"How much is 8 * {x}? Return only the result.\n Result:",
            }
        ],
    }
    for x in range(n_requests)
]

# Write one JSON object per line (JSONL).
with open(filename, "w") as f:
    for job in jobs:
        json_string = json.dumps(job)
        f.write(json_string + "\n")
For requests utilizing caching:
import json

filename = "examples/test_caching_requests_to_parallel_process.jsonl"

queries = [
    "<query/instruction_1>",
    "<query/instruction_2>",
    # ...
]

# Every job shares the same system prompt; the large block marked with
# cache_control is the part Anthropic will cache across requests.
jobs = [
    {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "temperature": 0,
        "system": [
            {
                "type": "text",
                "text": "You are an AI assistant tasked with... Your goal is to provide insightful information and knowledge.\n",
            },
            {
                "type": "text",
                "text": "<Large repetitive prompt you want to cache.>",
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [
            {
                "role": "user",
                "content": query,
            }
        ],
    }
    for query in queries
]

# Write one JSON object per line (JSONL).
with open(filename, "w") as f:
    for job in jobs:
        json_string = json.dumps(job)
        f.write(json_string + "\n")
Remember to replace `<Large repetitive prompt you want to cache.>` and `<query/instruction_X>` with your actual data.
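Once a file is written (by these snippets or any other tool), a quick sanity check can catch malformed lines before you start a long run; a minimal check (not part of the script) might look like this:

# Sketch: confirm every line is valid JSON and carries the required fields.
import json

with open("examples/test_requests_to_parallel_process.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        request = json.loads(line)
        for field in ("model", "max_tokens", "messages"):
            if field not in request:
                raise ValueError(f"line {lineno} is missing required field '{field}'")
print("all requests look well-formed")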
You can add a `metadata` key to each request object to include any additional information you want to associate with the request. This can be particularly useful for tracking or mapping requests back to your dataset. For example:
"metadata": {"row_id": 1, "source": "dataset_A", "category": "science"}
The metadata will be preserved in the output, allowing you to easily map the results back to your original data or include any other relevant information for post-processing.
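For instance, a request file can carry a row_id per source row so results are easy to join back afterwards (the rows and output filename below are hypothetical):

# Sketch: attach metadata while generating requests so results map back to your data.
import json

rows = [
    {"row_id": 1, "source": "dataset_A", "text": "<document 1>"},
    {"row_id": 2, "source": "dataset_A", "text": "<document 2>"},
]

with open("examples/requests_with_metadata.jsonl", "w") as f:
    for row in rows:
        job = {
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": f"Summarize:\n{row['text']}"}],
            # Preserved in the output file, so results can be joined on row_id.
            "metadata": {"row_id": row["row_id"], "source": row["source"]},
        }
        f.write(json.dumps(job) + "\n")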
The script accepts the following configuration options:

- `requests_filepath`: Path to the input JSONL file.
- `save_filepath`: Path for the output JSONL file (optional).
- `request_url`: Anthropic API endpoint (default: "https://api.anthropic.com/v1/messages").
- `api_key`: Your Anthropic API key (can be set as an environment variable).
- `max_requests_per_minute`: Target requests per minute (default: 40).
- `max_tokens_per_minute`: Target tokens per minute (default: 16,000).
- `max_attempts`: Number of retries for failed requests (default: 5).
- `logging_level`: Logging verbosity (default: INFO).
  - ERROR or 40: Logs when requests fail after all retries
  - WARNING or 30: Logs when requests hit rate limits or other errors
  - SUCCESS or 25: Logs successful operations (Loguru-specific level)
  - INFO or 20: Logs when requests start and the status at finish
  - DEBUG or 10: Logs various details as the loop runs to show when they occur
  - TRACE or 5: Logs very detailed information for debugging
- `use_caching`: Enable caching for repeated content (optional).
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
This project was inspired by and adapted from the OpenAI API Request Parallel Processor, which can be found in the OpenAI Cookbook. I've modified and extended the original script to work with Anthropic's API. Sources:
- Script: api_request_parallel_processor.py
- Repository: OpenAI Cookbook
We appreciate the work done by OpenAI in creating the original script, which served as a valuable starting point for this project.