-
Notifications
You must be signed in to change notification settings - Fork 278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(recovery): add crash recovery implementation #3491
base: satp-dev
Are you sure you want to change the base?
feat(recovery): add crash recovery implementation #3491
Conversation
I will review this PR |
f9014b0
to
0de9744
Compare
@Yogesh01000100 please rebase with satp-dev (should not have conflicts) |
0de9744
to
4c0124d
Compare
@Yogesh01000100 please include documentation and tests, and update the description, as discussed. |
ce9a179
to
24b8eaf
Compare
24b8eaf
to
728e7cb
Compare
@Yogesh01000100 could you please squash the commits and rebase with latest version of satp-dev, prior to merge? |
728e7cb
to
1a55673
Compare
Signed-off-by: Yogesh01000100 <yogeshone678@gmail.com> chore(satp-hermes): improve DB management Signed-off-by: Rafael Belchior <rafael.belchior@tecnico.ulisboa.pt> chore(satp-hermes): crash recovery architecture Signed-off-by: Rafael Belchior <rafael.belchior@tecnico.ulisboa.pt> fix(recovery): enhance crash recovery and rollback implementation Signed-off-by: Yogesh01000100 <yogeshone678@gmail.com> refactor(recovery): consolidate logic and improve SATP message handling Signed-off-by: Yogesh01000100 <yogeshone678@gmail.com> feat(recovery): add rollback implementations Signed-off-by: Yogesh01000100 <yogeshone678@gmail.com> fix: correct return types and inits Signed-off-by: Yogesh01000100 <yogeshone678@gmail.com>
1a55673
to
21ad772
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks very good, but there are some changes to be done prior to merging.
Summarizing my comments:
- Add other authors to the commit
- Incorporate feedback from the logging process (namely un-hardcoding logs and adding more information)
- Implement RollbackState (for example, should state how many more steps are to be rolled-back, at any moment; what was rolledback already; estimated time to completion, etc)
- Please add tests that support the new feature
- Please add comprehensive documentation on this feature. Example: The readme of SATP should have a section on how to run the docker compose with several examples of configurations.
@@ -0,0 +1,17 @@ | |||
version: '3.8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to do a few changes. We will have a satp runner image that encapsulates satp. The docker file is being developed by Bruno (please sync with him). In particular, please note what are the necessary env variables that need to be fed to docker compose.
See suggestion below:
version: '3.8'
services:
db:
image: postgres:13
environment:
POSTGRES_DB: ${DB_NAME}
POSTGRES_USER: ${DB_USER}
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_HOST: ${DB_HOST}
PGPORT: ${DB_PORT}
ports:
- "${DB_PORT}:5432"
volumes:
- pgdata:/var/lib/postgresql/data
satp-runner:
image: satp-runner-gateway:latest #to be published
environment:
# environment variables your SATP runner needs
ports:
- "3000:3000" # port mapping
depends_on:
- db
volumes:
pgdata:
Additionally, we will need a load balancer, and other services for monitoring (@eduv09 will implement those). We would have something like this:
nginx:
image: nginx:latest
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
ports:
- "80:80"
depends_on:
- satp-runner
prometheus:
image: prom/prometheus:latest
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/usr/share/prometheus/console_libraries'
- '--web.console.templates=/usr/share/prometheus/consoles'
ports:
- "9090:9090"
grafana:
image: grafana/grafana:latest
volumes:
- grafana_data:/var/lib/grafana
environment:
- GF_SECURITY_ADMIN_USER=${GRAFANA_ADMIN_USER:-admin}
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin}
ports:
- "3000:3000"
depends_on:
- prometheus
@@ -78,7 +78,16 @@ | |||
"watch": "tsc --build --watch", | |||
"forge": "forge build ./src/solidity/*.sol --out ./src/solidity/generated", | |||
"forge:test": "forge build ./src/test/solidity/contracts/*.sol --out ./src/test/solidity/generated", | |||
"forge:all": "run-s 'forge' 'forge:test'" | |||
"forge:all": "run-s 'forge' 'forge:test'", | |||
"db:setup": "bash -c 'npm run db:destroy || true && run-s db:start db:migrate db:seed'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did many contributions to this code. Unfortunately the squash you did removed that history.
Please add me as a co-author to the squashed commit:
https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors
} | ||
|
||
// todo read from local log to get session data | ||
/*private async getSessions(): Map<string, SessionData> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented?
const logEntry = { | ||
sessionID: req.sessionId, | ||
type: "RECOVER_SUCCESS", | ||
key: "key", // generateKey(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented?
sessionID: req.sessionId, | ||
type: "ROLLBACK_", | ||
key: "key", //generateKey(), | ||
operation: "ROLLBACK_", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
throw new Error(`${fnTag}, session data is not correctly initialized`); | ||
} | ||
try { | ||
// TODO record the rollback on the log. Implement RollbackLogEntry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove todos when the task is completed. I see you are already recording the rollback log entry (despite being a dummy entry). Please un-hardcode it, and gather information from the current session
try { | ||
// TODO: Implement Stage 1 specific rollback logic | ||
|
||
// TODO: Record the rollback on the log. Implement RollbackLogEntry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please un-hardcode the log creation where applicable
try { | ||
// TODO: Implement Stage 2 specific rollback logic | ||
|
||
// TODO: Record the rollback on the log. Implement RollbackLogEntry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please un-hardcode the log creation where applicable.
For instance, details can be a subset of the execution trace to the moment. Remember, these logs should help developers finding bugs or understanding the state of the application at any moment
// TODO: Implement Stage 3 specific cleanup logic | ||
|
||
state.currentStage = "Stage3"; | ||
// TODO: Update other state properties as needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comments from other stages
@@ -1,4 +1,4 @@ | |||
// @generated by protoc-gen-es v1.7.2 with parameter "target=ts" | |||
// @generated by protoc-gen-es v1.8.0 with parameter "target=ts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us please fix the latest working version of the generator in package.json (if not done yet)
This PR addresses issue #3114
Changes
crash_recovery.proto
and related ts files; added core recovery logic (created functions not yet implemented).knexfile.ts
andknexfile-remote.ts
; added Docker Compose for production.