feat(recovery): add crash recovery implementation #3491

Draft · wants to merge 1 commit into base: satp-dev
17 changes: 17 additions & 0 deletions packages/cactus-plugin-satp-hermes/docker-compose.yaml
@@ -0,0 +1,17 @@
version: '3.8'
Contributor:
We will need to make a few changes. We will have a satp runner image that encapsulates SATP. The Dockerfile is being developed by Bruno (please sync with him). In particular, please note which environment variables need to be fed to docker compose.

See suggestion below:

version: '3.8'

services:
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_HOST: ${DB_HOST}
      PGPORT: ${DB_PORT}
    ports:
      - "${DB_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

  satp-runner:
    image: satp-runner-gateway:latest  #to be published
    environment:
      # environment variables your SATP runner needs
    ports:
      - "3000:3000"  #  port mapping
    depends_on:
      - db

volumes:
  pgdata:

Additionally, we will need a load balancer, and other services for monitoring (@eduv09 will implement those). We would have something like this:

  nginx:
    image: nginx:latest
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "80:80"
    depends_on:
      - satp-runner

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER:-admin}
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin}
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
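The `${…}` variables referenced in the compose suggestion above would be supplied via an `.env` file next to `docker-compose.yaml`. A hypothetical sketch — the variable names match the suggestion, the values are placeholders only:

```
DB_NAME=satp
DB_USER=postgres
DB_PASSWORD=changeme
DB_HOST=db
DB_PORT=5432
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=admin
```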

services:
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_HOST: ${DB_HOST}
      PGPORT: ${DB_PORT}
    ports:
      - "${DB_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
16 changes: 13 additions & 3 deletions packages/cactus-plugin-satp-hermes/package.json
@@ -78,7 +78,16 @@
"watch": "tsc --build --watch",
"forge": "forge build ./src/solidity/*.sol --out ./src/solidity/generated",
"forge:test": "forge build ./src/test/solidity/contracts/*.sol --out ./src/test/solidity/generated",
"forge:all": "run-s 'forge' 'forge:test'"
"forge:all": "run-s 'forge' 'forge:test'",
"db:setup": "bash -c 'npm run db:destroy || true && run-s db:start db:migrate db:seed'",
Contributor:
I made many contributions to this code. Unfortunately the squash you did removed that history.

Please add me as a co-author to the squashed commit:
https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors
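Per the linked docs, co-authorship is recorded by adding a `Co-authored-by` trailer after a blank line at the end of the squashed commit message. A sketch — the name and email below are placeholders, not the actual reviewer's:

```
feat(recovery): add crash recovery implementation

Co-authored-by: Jane Doe <jane.doe@users.noreply.github.com>
```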

"db:destroy": "docker-compose down -v && npm run db:cleanup",
"db:start": "docker-compose up -d",
"db:stop": "docker-compose down",
"db:reset": "run-s db:destroy db:start db:migrate db:seed",
"db:migrate": "knex migrate:latest --knexfile src/knex/knexfile.js",
"db:migrate:production": "knex migrate:latest --env production --knexfile src/knex/knexfile.ts",
"db:seed": "knex seed:run --knexfile src/knex/knexfile.ts",
"db:cleanup": "find src/knex/data -name '.dev-*.sqlite3' -delete"
},
"jest": {
"moduleNameMapper": {
@@ -120,7 +129,7 @@
"kubo-rpc-client": "3.0.1",
"npm-run-all": "4.1.5",
"openzeppelin-solidity": "3.4.2",
"safe-stable-stringify": "^2.5.0",
"pg": "^8.8.0",
"secp256k1": "4.0.3",
"socket.io": "4.6.2",
"sqlite3": "5.1.5",
@@ -143,6 +152,7 @@
"@types/fs-extra": "11.0.4",
"@types/google-protobuf": "3.15.5",
"@types/node": "18.18.2",
"@types/pg": "8.6.5",
"@types/swagger-ui-express": "4.1.6",
"@types/tape": "4.13.4",
"@types/uuid": "10.0.0",
@@ -183,4 +193,4 @@
"runOnChangeOnly": true
}
}
}
}
17 changes: 16 additions & 1 deletion packages/cactus-plugin-satp-hermes/src/knex/knexfile-remote.ts
@@ -1,7 +1,9 @@
import path from "path";
import { v4 as uuidv4 } from "uuid";
import dotenv from "dotenv";

dotenv.config({ path: path.resolve(__dirname, "../../.env") });

// default configuration for knex
module.exports = {
  development: {
    client: "sqlite3",
@@ -13,4 +15,17 @@ module.exports = {
    },
    useNullAsDefault: true,
  },
  production: {
    client: "pg",
    connection: {
      host: process.env.DB_HOST,
      port: process.env.DB_PORT,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME,
    },
    migrations: {
      directory: path.resolve(__dirname, "migrations"),
    },
  },
};
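One detail worth noting in the production block: environment variables always arrive as strings, and `port` is passed through as-is. A hypothetical helper (not part of the plugin) that a caller could use to coerce `DB_PORT` to a number with the Postgres default as fallback:

```typescript
// Hypothetical helper: coerce a DB_PORT-style env var (string | undefined)
// to a valid TCP port number, falling back to Postgres's default 5432.
function parsePort(raw: string | undefined, fallback = 5432): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 && n < 65536 ? n : fallback;
}

console.log(parsePort("5433"));    // 5433
console.log(parsePort(undefined)); // 5432
```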
22 changes: 20 additions & 2 deletions packages/cactus-plugin-satp-hermes/src/knex/knexfile.ts
@@ -1,16 +1,34 @@
import path from "path";
import { v4 as uuidv4 } from "uuid";
import dotenv from "dotenv";

dotenv.config({ path: path.resolve(__dirname, "../../.env") });

// default configuration for knex
module.exports = {
  development: {
    client: "sqlite3",
    connection: {
      filename: path.resolve(__dirname, ".dev-" + uuidv4() + ".sqlite3"),
      filename: path.join(__dirname, "data", "/.dev-" + uuidv4() + ".sqlite3"),
    },
    migrations: {
      directory: path.resolve(__dirname, "migrations"),
    },
    seeds: {
      directory: path.resolve(__dirname, "seeds"),
    },
    useNullAsDefault: true,
  },
  production: {
    client: "pg",
    connection: {
      host: process.env.DB_HOST,
      port: process.env.DB_PORT,
      user: process.env.DB_USER,
      password: process.env.DB_PASSWORD,
      database: process.env.DB_NAME,
    },
    migrations: {
      directory: path.resolve(__dirname, "migrations"),
    },
  },
};

This file was deleted.

@@ -0,0 +1,16 @@
import { Knex } from "knex";

export function up(knex: Knex): Knex.SchemaBuilder {
  return knex.schema.createTable("logs", (table) => {
    table.string("sessionID").notNullable();
    table.string("type").notNullable();
    table.string("key").notNullable().primary();
    table.string("operation").notNullable();
    table.string("timestamp").notNullable();
    table.string("data").notNullable();
  });
}

export function down(knex: Knex): Knex.SchemaBuilder {
  return knex.schema.dropTable("logs");
}

This file was deleted.

@@ -0,0 +1,14 @@
import { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  return knex.schema.createTable("remote-logs", (table) => {
    table.string("hash").notNullable();
    table.string("signature").notNullable();
    table.string("signerPubKey").notNullable();
    table.string("key").notNullable().primary();
  });
}

export async function down(knex: Knex): Promise<void> {
  return knex.schema.dropTable("remote-logs");
}
@@ -0,0 +1,35 @@
// 20240821000000_seed_dev_logs.ts

import { Knex } from "knex";

export async function seed(knex: Knex): Promise<void> {
  // Check if we're in the development environment
  if (process.env.NODE_ENV !== "development") {
    console.log("Skipping seed: Not in development environment");
    return;
  }

  // Function to clear table if it exists
  async function clearTableIfExists(tableName: string) {
    if (await knex.schema.hasTable(tableName)) {
      await knex(tableName).del();
      console.log(`Cleared existing entries from ${tableName}`);
    } else {
      console.log(`Table ${tableName} does not exist, skipping clear`);
    }
  }

  // Clear existing entries if tables exist
  await clearTableIfExists("logs");
  await clearTableIfExists("remote-logs");

  // Insert a single deterministic log entry
  await knex("logs").insert({
    sessionID: "test-session-001",
    type: "info",
    key: "test-log-001",
    operation: "create",
    timestamp: "2024-08-21T12:00:00Z",
    data: JSON.stringify({ message: "This is a test log entry" }),
  });
}
@@ -9,5 +9,73 @@ service CrashRecovery {
  // util RPCs

  // step RPCs
  rpc RecoverV2Message (RecoverMessage) returns (RecoverUpdateMessage);
  rpc RecoverV2UpdateMessage (RecoverUpdateMessage) returns (RecoverSuccessMessage);
  rpc RecoverV2SuccessMessage (RecoverSuccessMessage) returns (google.protobuf.Empty) {};
  rpc RollbackV2Message (RollbackMessage) returns (RollbackAckMessage);
  rpc RollbackV2AckMessage (RollbackAckMessage) returns (google.protobuf.Empty) {};
}

message RecoverMessage {
  string sessionId = 1;
  string messageType = 2;
  string satpPhase = 3;
  int32 sequenceNumber = 4;
  bool isBackup = 5;
  string newIdentityPublicKey = 6;
  int64 lastEntryTimestamp = 7;
  string senderSignature = 8;
}

message RecoverUpdateMessage {
  string sessionId = 1;
  string messageType = 2;
  string hashRecoverMessage = 3;
  repeated LocalLog recoveredLogs = 4;
  string senderSignature = 5;
}

message RecoverSuccessMessage {
  string sessionId = 1;
  string messageType = 2;
  string hashRecoverUpdateMessage = 3;
  bool success = 4;
  repeated string entriesChanged = 5;
  string senderSignature = 6;
}

message RollbackMessage {
  string sessionId = 1;
  string messageType = 2;
  bool success = 3;
  repeated string actionsPerformed = 4;
  repeated string proofs = 5;
  string senderSignature = 6;
}

message RollbackAckMessage {
  string sessionId = 1;
  string messageType = 2;
  bool success = 3;
  repeated string actionsPerformed = 4;
  repeated string proofs = 5;
  string senderSignature = 6;
}

message LocalLog {
  string key = 1;
  string sessionId = 2;
  string data = 3;
  string type = 4;
  string operation = 5;
  string timestamp = 6;
}

message RollbackLogEntry {
  string session_id = 1;
  string stage = 2;
  string timestamp = 3;
  string action = 4; // action performed during rollback
  string status = 5; // status of rollback (e.g., SUCCESS, FAILED)
  string details = 6; // additional details or metadata about the rollback
}
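The `hashRecoverMessage` and `hashRecoverUpdateMessage` fields tie each reply to the message it answers. A sketch of one plausible linking rule — the serialization and hash algorithm here (JSON + SHA-256) are assumptions for illustration, not necessarily what SATP specifies:

```typescript
import { createHash } from "crypto";

// Simplified stand-in for RecoverMessage (subset of fields).
interface RecoverMessageLite {
  sessionId: string;
  sequenceNumber: number;
}

// Assumed rule: hash the serialized request so the counterparty can verify
// which RecoverMessage a RecoverUpdateMessage responds to.
function digest(msg: RecoverMessageLite): string {
  return createHash("sha256").update(JSON.stringify(msg)).digest("hex");
}

const req: RecoverMessageLite = { sessionId: "s1", sequenceNumber: 3 };
const update = { sessionId: req.sessionId, hashRecoverMessage: digest(req) };
console.log(update.hashRecoverMessage.length); // 64 (hex-encoded SHA-256)
```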
@@ -160,6 +160,8 @@ export class BLODispatcher {
    const res = Array.from(await this.manager.getSessions().keys());
    return res;
  }

  // TODO implement recovery handlers
  // get channel by caller; give needed client from orchestrator to handler to call
  // for all channels, find session id on request
  // TODO implement handlers GetAudit, Transact, Cancel, Routes
@@ -0,0 +1,2 @@
// handler to allow a user application to notify a gateway that it crashed and needs to be recovered. It "forces" an update of status with a counterparty gateway
// TODO update the spec with a RecoverForce message that is handled by this handler
@@ -0,0 +1,2 @@
// handler to allow a user application to force a rollback
// TODO update the spec with RollbackForce message that is handled by this handler