Commit

chore: first full version supporting audio analysis and vector search
leggetter committed Sep 19, 2024
1 parent 91d9cd0 commit 5c93207
Showing 12 changed files with 604 additions and 85 deletions.
22 changes: 18 additions & 4 deletions .env-example
@@ -1,4 +1,18 @@
SECRET_KEY="put your secret here"
REPLICATE_API_TOKEN=
WEBHOOK_URL=
MONGODB_CONNECTION_URI=
# A secret used for signing session cookies
# https://flask.palletsprojects.com/en/2.3.x/config/#SECRET_KEY
SECRET_KEY=""

# MongoDB Atlas connection string
MONGODB_CONNECTION_URI=""

# Hookdeck Project API Key
# Hookdeck Dashboard -> Settings -> Secrets
HOOKDECK_PROJECT_API_KEY=""

# Replicate API Token
REPLICATE_API_TOKEN=""

# Hookdeck Source URLs
# These will be automatically populated for you in the next step
AUDIO_WEBHOOK_URL=""
EMBEDDINGS_WEBHOOK_URL=""
82 changes: 77 additions & 5 deletions README.md
@@ -1,12 +1,84 @@
# Index All The Things!

Analyzes an asset available at a URL and stores both a textual and an embedding representation of it in MongoDB.

A vector search can then be performed on the embeddings.

At present, the application supports analyzing audio assets and retrieving the transcribed contents. However, a framework is in place to support other asset types such as text, HTML, images, and video.
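
The vector search mentioned above could be expressed as a MongoDB Atlas `$vectorSearch` aggregation pipeline. The sketch below is illustrative only: the index name `vector_index` and the field names `embedding`, `url`, and `text` are assumptions, not names taken from this repository.

```python
from typing import Any, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    *,
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "embedding",           # hypothetical field holding the stored embedding
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that runs an Atlas vector search."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return only the (hypothetical) url and text fields plus the similarity score.
        {
            "$project": {
                "_id": 0,
                "url": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the resulting list could be passed to `collection.aggregate(...)`, where the query vector comes from the same embedding model that was used to index the assets.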

## How it works

The following diagram shows how assets are submitted within the Flask application, how they are processed by Replicate, and how the results are delivered via webhooks through Hookdeck.

![Sequence Diagram](docs/sequence-diagram.png)
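
As a rough illustration of the final step in that sequence, the function below sketches how a webhook payload describing a Replicate prediction might be folded into a stored asset document. It assumes the payload carries the prediction's `status`, `output`, and `error` fields; the document fields (`status`, `replicate_status`, `output`, `error`) and the status labels are hypothetical and are not taken from the application code.

```python
from typing import Any

# Statuses at which a Replicate prediction has stopped running.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def apply_prediction_webhook(
    asset: dict[str, Any], payload: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of the asset document updated from a webhook payload."""
    updated = dict(asset)
    status = payload.get("status")
    updated["replicate_status"] = status

    if status == "succeeded":
        # Keep the model output (for audio, the transcription result).
        updated["status"] = "PROCESSED"
        updated["output"] = payload.get("output")
    elif status in TERMINAL_STATUSES:
        # The prediction failed or was canceled.
        updated["status"] = "FAILED"
        updated["error"] = payload.get("error")
    else:
        # Any non-terminal status means the prediction is still in progress.
        updated["status"] = "PROCESSING"
    return updated
```

Keeping this logic in a pure function, separate from the Flask route that receives the webhook, makes it straightforward to test without a running server or database.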

## TODO:
## Prerequisites

- A free [Hookdeck account](https://dashboard.hookdeck.com/signup?ref=github-iatt)
- The [Hookdeck CLI installed](https://hookdeck.com/docs/cli?ref=github-iatt)
- A trial [MongoDB Atlas account]()
- [Python 3](https://www.python.org/downloads/)
- [Poetry](https://python-poetry.org/docs/#installation) for package management

## Development setup

### Dependencies

Activate the virtual environment:

```sh
poetry shell
```

Install dependencies:

```sh
poetry install
```

### Configuration

Create a `.env` file with the following configuration, filling in the empty values as indicated:

```
# A secret used for signing session cookies
# https://flask.palletsprojects.com/en/2.3.x/config/#SECRET_KEY
SECRET_KEY=""
# MongoDB Atlas connection string
MONGODB_CONNECTION_URI=""
# Hookdeck Project API Key
# Hookdeck Dashboard -> Settings -> Secrets
HOOKDECK_PROJECT_API_KEY=""
# Replicate API Token
REPLICATE_API_TOKEN=""
# Hookdeck Source URLs
# These will be automatically populated for you in the next step
AUDIO_WEBHOOK_URL=""
EMBEDDINGS_WEBHOOK_URL=""
```

Run the following to create Hookdeck connections to receive webhooks from Replicate:

```sh
poetry run python create-hookdeck-connections.py
```
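
Once the script has populated the webhook URLs, a small check like the one below could confirm that every setting has a value before the app starts. This is a sketch rather than part of the repository: the helper name `missing_settings` is hypothetical, and it assumes the `.env` values have already been loaded into the process environment (for example by a tool such as python-dotenv).

```python
import os
from collections.abc import Mapping

# Setting names taken from the .env example above.
REQUIRED_SETTINGS = (
    "SECRET_KEY",
    "MONGODB_CONNECTION_URI",
    "HOOKDECK_PROJECT_API_KEY",
    "REPLICATE_API_TOKEN",
    "AUDIO_WEBHOOK_URL",
    "EMBEDDINGS_WEBHOOK_URL",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_settings()` at startup and stopping with a clear message when the list is non-empty tends to be easier to debug than a connection error later on.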

### Run the app

Start the Flask development server:

```sh
poetry run python -m flask --app app --debug run
```

Create a local tunnel with the Hookdeck CLI:

```sh
hookdeck listen '*' 5000
```

- [x] Save each indexing request to MongoDB
- [x] List existing indexing request from MongoDB, including the indexing status
- [ ] Create a Vector indexing. This could be triggered via CDC.
- [ ] Create some sort of search functionality using the Vector
Navigate to `localhost:5000` in your web browser.