init migration
ritual-all committed Jan 18, 2024
0 parents commit 6f69975
Showing 35 changed files with 1,420 additions and 0 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/pr.yaml
@@ -0,0 +1,33 @@
# pre-commit workflow
#
# Ensures the codebase passes the pre-commit stack.

name: pre-commit

on: [pull_request]

jobs:
  pre-commit:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Run pre-commit hooks
        uses: pre-commit/action@v3.0.0
        with:
          extra_args: --all-files --show-diff-on-failure

      - name: Setup TFLint
        uses: terraform-linters/setup-tflint@v3
        with:
          tflint_version: v0.44.1

      - name: Show version
        run: tflint --version

      - name: Init TFLint
        run: tflint --init

      - name: Run TFLint
        run: tflint -f compact
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
# Secret configs
*.json
configs/encoded

# Deployment files
*.tar.gz

# Terraform
.terraform/
.terraform*
terraform.tfvars
terraform.tfstate*
12 changes: 12 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,12 @@
repos:
  # Default pre-commit hooks
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      # Ensure files end with a single newline
      - id: end-of-file-fixer
      # Prevent adding large files
      - id: check-added-large-files
        args: ["--maxkb=5000"]
      # Trim trailing whitespace
      - id: trailing-whitespace
16 changes: 16 additions & 0 deletions .tflint.hcl
@@ -0,0 +1,16 @@
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "aws" {
  enabled = true
  version = "0.28.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

plugin "google" {
  enabled = true
  version = "0.26.0"
  source  = "github.com/terraform-linters/tflint-ruleset-google"
}
32 changes: 32 additions & 0 deletions LICENSE
@@ -0,0 +1,32 @@
The Clear BSD License

Copyright (c) 2023 Origin Research Ltd
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted (subject to the limitations in the disclaimer below) provided that
the following conditions are met:

* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from this
software without specific prior written permission.

NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY
THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
111 changes: 111 additions & 0 deletions README.md
@@ -0,0 +1,111 @@
# Infernet Node Deployment

Deploy a cluster of heterogeneous [Infernet](https://github.com/ritual-net/infernet-node) nodes on Amazon Web Services (AWS) and/or Google Cloud Platform (GCP), using [Terraform](https://www.terraform.io/) for infrastructure procurement and [Docker Compose](https://docs.docker.com/compose/) for deployment.


### Setup
1. [Install Terraform](https://developer.hashicorp.com/terraform/install)
2. **Configure nodes**: Create a node configuration file **for each** node being deployed.
    - See [example configuration](configs/0.json.example).
    - Files must be named `0.json`, `1.json`, etc.
        - Misnamed files are ignored.
    - Files must be placed under the top-level `configs/` directory.
    - Each node *strictly* requires its own configuration `.json` file, even if the files are identical.
    - The number of `.json` files must match the `node_count` variable in `terraform.tfvars`.
        - Extra files are ignored.
    - For instructions on configuring nodes, refer to the [Infernet Node](https://github.com/ritual-net/infernet-node) repository.
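
The naming and counting rules above can be checked locally before running Terraform. The following is a minimal sketch, not part of this repository: the helper name `check_node_configs` is hypothetical, and it assumes that node indices start at `0` and that `node_count` is passed in by hand rather than parsed from `terraform.tfvars`.

```python
import re
from pathlib import Path

# Matches "0.json", "1.json", "12.json", but not "01.json" or "node.json".
CONFIG_NAME = re.compile(r"(0|[1-9]\d*)\.json")


def check_node_configs(config_dir, node_count):
    """Compare the numbered config files in config_dir against node_count.

    Returns a dict with two sorted lists of node indices:
      "missing": indices in range(node_count) that have no <index>.json file
      "extra":   numbered files whose index is >= node_count
    Files that do not follow the <index>.json pattern are skipped entirely.
    """
    present = set()
    for path in Path(config_dir).iterdir():
        match = CONFIG_NAME.fullmatch(path.name)
        if path.is_file() and match:
            present.add(int(match.group(1)))

    missing = [i for i in range(node_count) if i not in present]
    extra = sorted(i for i in present if i >= node_count)
    return {"missing": missing, "extra": extra}
```

Calling `check_node_configs("configs", 3)` from the repository root would report which of `0.json`, `1.json`, and `2.json` are absent. Treating a missing file as the only hard error, and merely listing extras, mirrors the wording above but is a design choice of this sketch.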

#### Infernet Router:
The Infernet Router REST server is configured automatically by Terraform. However, if you plan to use it, you need to understand its implications:
> **IMPORTANT:** When configuring a heterogeneous node cluster (i.e. `0.json`, `1.json`, etc. are not identical), container IDs should be reserved for a **unique container setup at the cluster level, i.e. across nodes (and thus `.json` files)**. That is because the router uses container IDs to make routing decisions between services running across the cluster.
>
> _Example:_ Consider nodes A and B, each running a single LLM inference container; node A runs `image1`, and node B runs `image2`. If we set `id: "llm-inference"` in both containers (`containers[0].id` attribute in `0.json`, `1.json`), the router will be **unable to disambiguate** between the two services, and will consider them interchangeable, _which they are not._ Any requests for `"llm-inference"` will be routed to either container, which is an error.
>
> Therefore, **re-using IDs across configuration files must imply an identical container configuration**, including image, environment variables, command, etc. This will explicitly tell the router which containers are interchangeable, and allow it to distribute requests for those containers across _all nodes running that container._

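
One way to catch an accidental ID clash before deploying is to compare container entries across all node configuration files. The sketch below is illustrative only and is not how the router itself works: `find_conflicting_ids` and `load_configs` are hypothetical helpers, and the sketch assumes that "identical configuration" means the whole container entry matches, which may be stricter than the router requires.

```python
import json
from pathlib import Path


def container_signature(container):
    """Serialize a container entry with sorted keys so equal entries compare equal."""
    return json.dumps(container, sort_keys=True)


def find_conflicting_ids(configs):
    """Find container IDs that are re-used with differing configurations.

    configs maps a file name (e.g. "0.json") to its parsed JSON content.
    Returns {container_id: [file names using that ID]} for every ID that
    appears with more than one distinct configuration.
    """
    seen = {}  # container id -> {signature -> [file names]}
    for name, config in configs.items():
        for container in config.get("containers", []):
            by_signature = seen.setdefault(container["id"], {})
            by_signature.setdefault(container_signature(container), []).append(name)

    return {
        container_id: sorted(f for files in by_signature.values() for f in files)
        for container_id, by_signature in seen.items()
        if len(by_signature) > 1
    }


def load_configs(config_dir="configs"):
    """Parse every .json file in config_dir (assumed layout: configs/0.json, ...)."""
    return {
        path.name: json.loads(path.read_text())
        for path in sorted(Path(config_dir).glob("*.json"))
    }
```

Running `find_conflicting_ids(load_configs())` from the repository root returns an empty dict when every re-used ID refers to the same container setup. For the example above, where both nodes use `"llm-inference"` with different images, both file names would be listed under that ID.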
### Deploy on AWS

1. Create an AWS service account for deployment:
```bash
cd procure/aws
chmod 700 create_service_account.sh
./create_service_account.sh
```
This will require local authentication with the AWS CLI. Add `access_key_id` and `secret_access_key` to your Terraform variables (see step 3).

2. Make a copy of the example configuration file [terraform.tfvars.example](procure/aws/terraform.tfvars.example):
```bash
cd procure/aws
cp terraform.tfvars.example terraform.tfvars
```

3. Configure your `terraform.tfvars` file. See [variables.tf](procure/aws/variables.tf) for config descriptions.

4. Run Terraform:
```bash
# Initialize
cd procure
make init provider=aws
# Print deployment plan
make plan provider=aws
# Deploy
make apply provider=aws
# WARNING: Destructive
# Destroy deployment
make destroy provider=aws
```

### Deploy on GCP


1. Create a GCP service account for deployment:
```bash
cd procure/gcp
chmod 700 create_service_account.sh
./create_service_account.sh
```
This will require local authentication with the GCP CLI, and create a local credentials file. Add the path to the credentials file (`gcp_credentials_file_path`) to your Terraform variables (see step 3).

2. Make a copy of the example configuration file [terraform.tfvars.example](procure/gcp/terraform.tfvars.example):
```bash
cd procure/gcp
cp terraform.tfvars.example terraform.tfvars
```
3. Configure your `terraform.tfvars` file. See [variables.tf](procure/gcp/variables.tf) for config descriptions.

4. Run Terraform:
```bash
# Initialize
cd procure
make init provider=gcp
# Print deployment plan
make plan provider=gcp
# Deploy
make apply provider=gcp
# WARNING: Destructive
# Destroy deployment
make destroy provider=gcp
```

### Using TFLint

```bash
# Install tflint
brew install tflint
# Install plugins
tflint --init
# Run on all directories
tflint --recursive
```

## License

[BSD 3-clause Clear](./LICENSE)
64 changes: 64 additions & 0 deletions configs/0.json.example
@@ -0,0 +1,64 @@
{
  "log_path": "infernet_node.log",
  "server": {
    "port": 4000
  },
  "chain": {
    "enabled": true,
    "rpc_url": "http://127.0.0.1:8545",
    "coordinator_address": "0x...",
    "trail_head_blocks": 4,
    "wallet": {
      "max_gas_limit": 100000,
      "private_key": "12345s"
    }
  },
  "docker": {
    "username": "username",
    "password": "password"
  },
  "redis": {
    "host": "localhost",
    "port": 6379
  },
  "forward_stats": true,
  "containers": [
    {
      "id": "container-1",
      "image": "org1/image1:tag1",
      "description": "Container 1 description",
      "external": true,
      "port": "4999",
      "allowed_addresses": [],
      "allowed_delegate_addresses": [],
      "allowed_ips": [
        "XX.XX.XX.XXX",
        "XX.XX.XX.XXX"
      ],
      "command": "--bind=0.0.0.0:3000 --workers=2",
      "env": {
        "KEY1": "VALUE1",
        "KEY2": "VALUE2"
      },
      "gpu": true
    },
    {
      "id": "container-2",
      "image": "org2/image2:tag2",
      "description": "Container 2 description",
      "external": false,
      "port": "4998",
      "allowed_addresses": [],
      "allowed_delegate_addresses": [],
      "allowed_ips": [
        "XX.XX.XX.XXX",
        "XX.XX.XX.XXX"
      ],
      "command": "--bind=0.0.0.0:3000 --workers=2",
      "env": {
        "KEY3": "VALUE3",
        "KEY4": "VALUE4"
      }
    }
  ]
}
56 changes: 56 additions & 0 deletions deploy/docker-compose.yaml
@@ -0,0 +1,56 @@
version: '3'

services:
  node:
    image: ritualnetwork/infernet-node:0.1.0
    ports:
      - "0.0.0.0:4000:4000"
    volumes:
      - type: bind
        source: ./config.json
        target: /app/config.json
      - node-logs:/logs
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - network
    restart:
      on-failure
    depends_on:
      - redis
    extra_hosts:
      - "host.docker.internal:host-gateway"
    stop_grace_period: 1m

  redis:
    image: redis:latest
    ports:
      - "6379:6379"
    networks:
      - network
    volumes:
      - ./redis.conf:/usr/local/etc/redis/redis.conf
      - redis-data:/data
    restart:
      on-failure

  fluentbit:
    image: fluent/fluent-bit:latest
    ports:
      - "24224:24224"
    environment:
      - FLUENTBIT_CONFIG_PATH=/fluent-bit/etc/fluent-bit.conf
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - /var/log:/var/log:ro
    networks:
      - network
    restart:
      on-failure

networks:
  network:


volumes:
  node-logs:
  redis-data:
38 changes: 38 additions & 0 deletions deploy/fluent-bit.conf
@@ -0,0 +1,38 @@
[SERVICE]
    Flush 1
    Daemon Off
    Log_Level info
    storage.path /tmp/fluentbit.log
    storage.sync normal
    storage.checksum on
    storage.backlog.mem_limit 5M

[INPUT]
    Name forward
    Listen 0.0.0.0
    Port 24224
    Storage.type filesystem

[OUTPUT]
    name stdout
    match *

[OUTPUT]
    Name pgsql
    Match stats.node
    Host meta-sink.ritual.net
    Port 5432
    User append_only_user
    Password ogy29Z4mRCLfpup*9fn6
    Database postgres
    Table node_stats

[OUTPUT]
    Name pgsql
    Match stats.live
    Host meta-sink.ritual.net
    Port 5432
    User append_only_user
    Password ogy29Z4mRCLfpup*9fn6
    Database postgres
    Table live_stats