Skip to content

Commit

Permalink
first commit
Browse files Browse the repository at this point in the history
  • Loading branch information
unnir committed Jul 11, 2024
1 parent a50e141 commit 287d890
Show file tree
Hide file tree
Showing 12 changed files with 1,050 additions and 1 deletion.
24 changes: 24 additions & 0 deletions .github/workflows/workflow.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
name: CI

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

jobs:
build:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.x'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Run tests
run: python -m unittest discover tests
50 changes: 50 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Python
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyCharm
.idea/

# VS Code
.vscode/

# Jupyter Notebook
.ipynb_checkpoints

# Environment
.env
.venv
env/
venv/

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db


testing.ipynb
testing.py
21 changes: 21 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Vadim Borisov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
133 changes: 132 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1 +1,132 @@
# augini
# Augini

Augini is a Python framework for generating synthetic tabular data using AI. It leverages the power of language models to create realistic, fictional data based on existing datasets.

## Installation

You can install Augini using pip:
```sh
pip install augini
```

## Quick Start

Here's a simple example of how to use Augini:

```python
from augini import Augini
import pandas as pd

# Initialize Augini
augini = Augini(api_key="your_api_key", use_openrouter=True)

# Create a sample DataFrame
data = {
'Place of Birth': ['New York', 'London', 'Tokyo'],
'Age': [30, 25, 40],
'Gender': ['Male', 'Female', 'Male']
}
df = pd.DataFrame(data)

# Add synthetic features
result_df = augini.augment_columns(df, 'NAME', 'OCCUPATION', 'FAVORITE_DRINK')

print(result_df)
```

## Features

- Generate synthetic data based on existing datasets
- Customizable prompts for data generation
- Support for both OpenAI API and OpenRouter
- Asynchronous processing for improved performance

## Extending and Enriching Data

Augini can be used to extend, augment, and enrich your datasets by adding synthetic features and bringing knowledge from language models to your data.

### Adding Multiple Features

You can add multiple features to your DataFrame:

```python
result_df = augini.augment_columns(df, 'Hobby', 'FavoriteColor', 'FavoriteMovie')
print(result_df)
```

### Custom Prompts for Feature Generation

Custom prompts allow you to generate specific features based on your needs:

```python
custom_prompt = "Based on the person's name and age, suggest a quirky pet for them. Respond with a JSON object with the key 'QuirkyPet'."
result_df = augini.augment_single(df, 'QuirkyPet', custom_prompt=custom_prompt)
print(result_df)
```

### Anonymizing Data

You can anonymize sensitive information in your dataset by generating synthetic data:

```python
anonymize_prompt = "Create an anonymous profile for the person based on their age and city. Respond with a JSON object with keys 'AnonymousName' and 'AnonymousEmail'."
result_df = augini.augment_single(df, 'AnonymousProfile', custom_prompt=anonymize_prompt)
print(result_df)
```

## Bringing Knowledge from LLMs

Leverage the knowledge embedded in language models to enhance your datasets:

### Generating Detailed Descriptions

```python
description_prompt = "Generate a detailed description for a person based on their age and city. Respond with a JSON object with the key 'Description'."
result_df = augini.augment_single(df, 'Description', custom_prompt=description_prompt)
print(result_df)
```

### Suggesting Recommendations

```python
recommendation_prompt = "Suggest a book and a movie for a person based on their age and city. Respond with a JSON object with keys 'RecommendedBook' and 'RecommendedMovie'."
result_df = augini.augment_single(df, 'Recommendations', custom_prompt=recommendation_prompt)
print(result_df)
```

## Full Example

Here's a full example demonstrating multiple features and custom prompts:

```python
from augini import Augini
import pandas as pd

# Initialize Augini
augini = Augini(api_key="your_api_key", use_openrouter=True)

# Create a sample DataFrame
data = {
'Name': ['Alice Johnson', 'Bob Smith', 'Charlie Davis'],
'Age': [28, 34, 45],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# Add multiple synthetic features
result_df = augini.augment_columns(df, 'Occupation', 'Hobby', 'FavoriteColor')

# Add a custom feature
custom_prompt = "Based on the person's name and age, suggest a quirky pet for them. Respond with a JSON object with the key 'QuirkyPet'."
result_df = augini.augment_single(result_df, 'QuirkyPet', custom_prompt=custom_prompt)

# Anonymize data
anonymize_prompt = "Create an anonymous profile for the person based on their age and city. Respond with a JSON object with keys 'AnonymousName' and 'AnonymousEmail'."
result_df = augini.augment_single(result_df, 'AnonymousProfile', custom_prompt=anonymize_prompt)

print(result_df)
```

## Contributing

We welcome contributions to enhance Augini! Feel free to open issues and submit pull requests on our GitHub repository.
4 changes: 4 additions & 0 deletions augini/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
from .core import Augini

__version__ = "0.1.0"
__all__ = ["Augini"]
Loading

0 comments on commit 287d890

Please sign in to comment.