Commit

Merge branch 'update-sequence-docs' into decouple-flask-app
btylerburton committed Jan 15, 2025
2 parents 856d868 + 59381ff commit efc4b7b
Showing 17 changed files with 1,782 additions and 235 deletions.
2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/arcgis-1.svg
2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/dcat-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/etl_full_harvest-1.svg
1 change: 1 addition & 0 deletions docs/diagrams/mermaid/dest/etl_full_harvest.md
@@ -0,0 +1 @@
![diagram](./etl_full_harvest-1.svg)
1 change: 0 additions & 1 deletion docs/diagrams/mermaid/dest/etl_pipeline-1.svg

This file was deleted.

1 change: 0 additions & 1 deletion docs/diagrams/mermaid/dest/etl_pipeline.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/h20_compare_dcat-1.svg
2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/new_harvesting-1.svg
2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/old_harvesting-1.svg
2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/single_xml-1.svg
2 changes: 1 addition & 1 deletion docs/diagrams/mermaid/dest/waf_xml-1.svg
53 changes: 53 additions & 0 deletions docs/diagrams/mermaid/src/etl_ckan_sync_only.md
@@ -0,0 +1,53 @@
```mermaid
sequenceDiagram
autonumber
participant FA as Flask App
participant HDB as Harvest DB
participant DHR as Datagov Harvest Runner
participant MD as MDTranslator
participant HS as Agency<br>Harvest Source
participant CKAN
participant SES
note over FA: TRIGGER <br> via GH Action,<br>or manually via Flask app
FA->>+HDB: create harvest_job
HDB-->>-FA: returns harvest_job obj
FA->>+DHR: invoke harvest.py<br> with corresponding harvest_source config & <<job_id>>
DHR-->>-FA: returns OK
FA->>HDB: update job_status: in_progress
note over DHR: EXTRACT
DHR->>+HS: Fetch source from <<source_url>>
HS->>-DHR: return source
DHR->>+HDB: Fetch records from db
HDB-->>-DHR: Return active records<br>with corresponding <<harvest_source_id>><br>filtered by most recent TIMESTAMP
note over DHR: COMPARE
loop hash source record and COMPARE with active records' <<source_hash>>
DHR->>DHR: Generate lists to CREATE/UPDATE/DELETE
DHR->>HDB: Write records with status: create, update, delete
end
note over DHR: TRANSFORM<br>(optional)<br>*for non-dcat sources
loop items to transform
DHR->>+MD: MDTransform(dataset)
MD-->>-DHR: Transformed Item
alt Transform fails
DHR-->>HDB: Log failures as harvest_error with type: transform<br>update harvest_record status: error_transform
end
end
note over DHR: VALIDATE
loop VALIDATE items to create/update
DHR->>DHR: Validate against schema
alt Validation fails
DHR-->>HDB: Log failures as harvest_error with type: validation<br>update harvest_record status: error_validation
end
end
note over DHR: LOAD
loop SYNC items to create/update/delete
DHR->>CKAN: CKAN package_create (create), <br>package_update (update), <br>dataset_purge (delete)
alt Sync fails
DHR-->>HDB: Log failures as harvest_error with type: sync<br>UPDATE harvest_record to status: error_sync
end
end
note over DHR: REPORT
DHR->>HDB: POST harvest job metrics <br> UPDATE harvest_job to status: complete
DHR->>SES: Email job metrics (jobMetrics, notification_emails)
```
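The COMPARE step in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the function names, the use of SHA-256 over sorted-key JSON, and the identifier-keyed dictionaries are all assumptions made for the example. The diagram itself only states that each source record is hashed and compared with the active records' `source_hash`.

```python
import hashlib
import json


def hash_record(record: dict) -> str:
    """Return a stable SHA-256 hex digest of a record (assumed hashing scheme)."""
    # sort_keys makes the digest independent of key order in the source JSON.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compare(source_records: dict, active_hashes: dict) -> dict:
    """Split identifiers into create/update/delete lists.

    source_records: identifier -> record fetched from the harvest source
    active_hashes:  identifier -> source_hash of the active record in the DB
    """
    source_hashes = {ident: hash_record(rec) for ident, rec in source_records.items()}
    in_both = source_hashes.keys() & active_hashes.keys()
    return {
        # Present at the source but not in the DB.
        "create": sorted(source_hashes.keys() - active_hashes.keys()),
        # Present in both, but the content hash changed.
        "update": sorted(i for i in in_both if source_hashes[i] != active_hashes[i]),
        # Present in the DB but gone from the source.
        "delete": sorted(active_hashes.keys() - source_hashes.keys()),
    }
```

Records whose hash is unchanged appear in none of the three lists, so they would need no further work in later stages.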
53 changes: 53 additions & 0 deletions docs/diagrams/mermaid/src/etl_full_harvest.md
@@ -0,0 +1,53 @@
```mermaid
sequenceDiagram
autonumber
participant FA as Flask App
participant HDB as Harvest DB
participant DHR as Datagov Harvest Runner
participant MD as MDTranslator
participant HS as Agency<br>Harvest Source
participant CKAN
participant SES
note over FA: TRIGGER <br> via GH Action,<br>or manually via Flask app
FA->>+HDB: create harvest_job
HDB-->>-FA: returns harvest_job obj
FA->>+DHR: invoke harvest.py<br> with corresponding harvest_source config & <<job_id>>
DHR-->>-FA: returns OK
FA->>HDB: update job_status: in_progress
note over DHR: EXTRACT
DHR->>+HS: Fetch source from <<source_url>>
HS->>-DHR: return source
DHR->>+HDB: Fetch records from db
HDB-->>-DHR: Return active records<br>with corresponding <<harvest_source_id>><br>filtered by most recent TIMESTAMP
note over DHR: COMPARE
loop hash source record and COMPARE with active records' <<source_hash>>
DHR->>DHR: Generate lists to CREATE/UPDATE/DELETE
DHR->>HDB: Write records with status: create, update, delete
end
note over DHR: TRANSFORM<br>(optional)<br>*for non-dcat sources
loop items to transform
DHR->>+MD: MDTransform(dataset)
MD-->>-DHR: Transformed Item
alt Transform fails
DHR-->>HDB: Log failures as harvest_error with type: transform<br>update harvest_record status: error_transform
end
end
note over DHR: VALIDATE
loop VALIDATE items to create/update
DHR->>DHR: Validate against schema
alt Validation fails
DHR-->>HDB: Log failures as harvest_error with type: validation<br>update harvest_record status: error_validation
end
end
note over DHR: LOAD
loop SYNC items to create/update/delete
DHR->>CKAN: CKAN package_create (create), <br>package_update (update), <br>dataset_purge (delete)
alt Sync fails
DHR-->>HDB: Log failures as harvest_error with type: sync<br>UPDATE harvest_record to status: error_sync
end
end
note over DHR: REPORT
DHR->>HDB: POST harvest job metrics <br> UPDATE harvest_job to status: complete
DHR->>SES: Email job metrics (jobMetrics, notification_emails)
```
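The TRANSFORM, VALIDATE, and LOAD stages in the diagram above share one failure pattern: log a `harvest_error` with a type, then set the record's status to `error_<type>`. A minimal sketch of that pattern follows. The `HarvestRecord` and `ErrorLog` classes, the `run_stage` helper, and the `require_title` check are hypothetical stand-ins for the real models and schema validation.

```python
from dataclasses import dataclass, field


@dataclass
class HarvestRecord:
    """Hypothetical in-memory stand-in for a harvest_record row."""
    identifier: str
    dataset: dict
    status: str = "create"


@dataclass
class ErrorLog:
    """Hypothetical stand-in for the harvest_error table."""
    entries: list = field(default_factory=list)

    def log(self, identifier: str, error_type: str, message: str) -> None:
        self.entries.append(
            {"harvest_record_id": identifier, "type": error_type, "message": message}
        )


def run_stage(records, error_type, stage_fn, error_log):
    """Apply stage_fn to each record and return the records that passed.

    A failing record gets a logged error and the status error_<error_type>,
    matching the transform / validation / sync branches in the diagram.
    """
    passed = []
    for record in records:
        try:
            stage_fn(record)
        except Exception as exc:
            error_log.log(record.identifier, error_type, str(exc))
            record.status = f"error_{error_type}"
        else:
            passed.append(record)
    return passed


def require_title(record: HarvestRecord) -> None:
    """Toy validation rule (assumed); real validation would check a full schema."""
    if not record.dataset.get("title"):
        raise ValueError("missing required field: title")
```

Passing only the surviving records to the next stage keeps a record that failed validation from being synced.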
63 changes: 0 additions & 63 deletions docs/diagrams/mermaid/src/etl_pipeline.md

This file was deleted.

53 changes: 53 additions & 0 deletions docs/diagrams/mermaid/src/etl_restart_failed_harvest.md
@@ -0,0 +1,53 @@
```mermaid
sequenceDiagram
autonumber
participant FA as Flask App
participant HDB as Harvest DB
participant DHR as Datagov Harvest Runner
participant MD as MDTranslator
participant HS as Agency<br>Harvest Source
participant CKAN
participant SES
note over FA: TRIGGER <br> via GH Action,<br>or manually via Flask app
FA->>+HDB: create harvest_job
HDB-->>-FA: returns harvest_job obj
FA->>+DHR: invoke harvest.py<br> with corresponding harvest_source config & <<job_id>>
DHR-->>-FA: returns OK
FA->>HDB: update job_status: in_progress
note over DHR: EXTRACT
DHR->>+HS: Fetch source from <<source_url>>
HS->>-DHR: return source
DHR->>+HDB: Fetch records from db
HDB-->>-DHR: Return active records<br>with corresponding <<harvest_source_id>><br>filtered by most recent TIMESTAMP
note over DHR: COMPARE
loop hash source record and COMPARE with active records' <<source_hash>>
DHR->>DHR: Generate lists to CREATE/UPDATE/DELETE
DHR->>HDB: Write records with status: create, update, delete
end
note over DHR: TRANSFORM<br>(optional)<br>*for non-dcat sources
loop items to transform
DHR->>+MD: MDTransform(dataset)
MD-->>-DHR: Transformed Item
alt Transform fails
DHR-->>HDB: Log failures as harvest_error with type: transform<br>update harvest_record status: error_transform
end
end
note over DHR: VALIDATE
loop VALIDATE items to create/update
DHR->>DHR: Validate against schema
alt Validation fails
DHR-->>HDB: Log failures as harvest_error with type: validation<br>update harvest_record status: error_validation
end
end
note over DHR: LOAD
loop SYNC items to create/update/delete
DHR->>CKAN: CKAN package_create (create), <br>package_update (update), <br>dataset_purge (delete)
alt Sync fails
DHR-->>HDB: Log failures as harvest_error with type: sync<br>UPDATE harvest_record to status: error_sync
end
end
note over DHR: REPORT
DHR->>HDB: POST harvest job metrics <br> UPDATE harvest_job to status: complete
DHR->>SES: Email job metrics (jobMetrics, notification_emails)
```
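The LOAD step above maps each record action to a CKAN action: `package_create`, `package_update`, or `dataset_purge`. A small sketch of that dispatch follows. The `call_action` callable is a hypothetical hook standing in for whatever CKAN client the runner uses, and the payload shapes are assumptions for the example.

```python
# Mapping taken from the diagram: record action -> CKAN action name.
CKAN_ACTIONS = {
    "create": "package_create",
    "update": "package_update",
    "delete": "dataset_purge",
}


def sync_record(action: str, dataset: dict, call_action):
    """Send one record to CKAN through the injected call_action(name, payload).

    call_action is a hypothetical callable; in practice it would wrap the
    CKAN client. Deletes send only the dataset id (assumed payload shape).
    """
    ckan_action = CKAN_ACTIONS[action]  # raises KeyError for unknown actions
    payload = {"id": dataset["id"]} if action == "delete" else dataset
    return call_action(ckan_action, payload)
```

Injecting `call_action` keeps the dispatch logic testable without a running CKAN instance.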
53 changes: 53 additions & 0 deletions docs/diagrams/mermaid/src/etl_validation_only.md
@@ -0,0 +1,53 @@
```mermaid
sequenceDiagram
autonumber
participant FA as Flask App
participant HDB as Harvest DB
participant DHR as Datagov Harvest Runner
participant MD as MDTranslator
participant HS as Agency<br>Harvest Source
participant CKAN
participant SES
note over FA: TRIGGER <br> via GH Action,<br>or manually via Flask app
FA->>+HDB: create harvest_job
HDB-->>-FA: returns harvest_job obj
FA->>+DHR: invoke harvest.py<br> with corresponding harvest_source config & <<job_id>>
DHR-->>-FA: returns OK
FA->>HDB: update job_status: in_progress
note over DHR: EXTRACT
DHR->>+HS: Fetch source from <<source_url>>
HS->>-DHR: return source
DHR->>+HDB: Fetch records from db
HDB-->>-DHR: Return active records<br>with corresponding <<harvest_source_id>><br>filtered by most recent TIMESTAMP
note over DHR: COMPARE
loop hash source record and COMPARE with active records' <<source_hash>>
DHR->>DHR: Generate lists to CREATE/UPDATE/DELETE
DHR->>HDB: Write records with status: create, update, delete
end
note over DHR: TRANSFORM<br>(optional)<br>*for non-dcat sources
loop items to transform
DHR->>+MD: MDTransform(dataset)
MD-->>-DHR: Transformed Item
alt Transform fails
DHR-->>HDB: Log failures as harvest_error with type: transform<br>update harvest_record status: error_transform
end
end
note over DHR: VALIDATE
loop VALIDATE items to create/update
DHR->>DHR: Validate against schema
alt Validation fails
DHR-->>HDB: Log failures as harvest_error with type: validation<br>update harvest_record status: error_validation
end
end
note over DHR: LOAD
loop SYNC items to create/update/delete
DHR->>CKAN: CKAN package_create (create), <br>package_update (update), <br>dataset_purge (delete)
alt Sync fails
DHR-->>HDB: Log failures as harvest_error with type: sync<br>UPDATE harvest_record to status: error_sync
end
end
note over DHR: REPORT
DHR->>HDB: POST harvest job metrics <br> UPDATE harvest_job to status: complete
DHR->>SES: Email job metrics (jobMetrics, notification_emails)
```
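The REPORT step above posts job metrics before marking the job complete. One way to derive such metrics from final record statuses is sketched below. The `summarize_job` name and the metric keys are hypothetical, since the diagram does not specify the metrics schema.

```python
from collections import Counter


def summarize_job(statuses: list) -> dict:
    """Tally final record statuses into a metrics dict (assumed key names)."""
    counts = Counter(statuses)
    # Any status of the form error_<type> counts as an errored record.
    errored = sum(n for status, n in counts.items() if status.startswith("error_"))
    return {
        "records_total": len(statuses),
        "records_errored": errored,
        "by_status": dict(counts),
    }
```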

1 comment on commit efc4b7b

@github-actions

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 2 | 0 💤 | 0 ❌ | 0 🔥 | 5.298s ⏱️ |
