Commit
Merge branch 'update-sequence-docs' into decouple-flask-app
Showing 17 changed files with 1,782 additions and 235 deletions.
@@ -0,0 +1 @@
![diagram](./etl_full_harvest-1.svg)
@@ -0,0 +1,53 @@
```mermaid
sequenceDiagram
    autonumber
    participant FA as Flask App
    participant HDB as Harvest DB
    participant DHR as Datagov Harvest Runner
    participant MD as MDTranslator
    participant HS as Agency<br>Harvest Source
    participant CKAN
    participant SES
    note over FA: TRIGGER <br> via GH Action,<br>or manually via Flask app
    FA->>+HDB: create harvest_job
    HDB-->>-FA: returns harvest_job obj
    FA->>+DHR: invoke harvest.py<br> with corresponding harvest_source config & <<job_id>>
    DHR-->>-FA: returns OK
    FA->>HDB: update job_status: in_progress
    note over DHR: EXTRACT
    DHR->>+HS: Fetch source from <<source_url>>
    HS->>-DHR: return source
    DHR->>+HDB: Fetch records from db
    HDB-->>-DHR: Return active records<br>with corresponding <<harvest_source_id>><br>filtered by most recent TIMESTAMP
    note over DHR: COMPARE
    loop hash source record and COMPARE with active records' <<source_hash>>
        DHR->>DHR: Generate lists to CREATE/UPDATE/DELETE
        DHR->>HDB: Write records with status: create, update, delete
    end
    note over DHR: TRANSFORM<br>(optional)<br>*for non-dcat sources
    loop items to transform
        DHR->>+MD: MDTransform(dataset)
        MD-->>-DHR: Transformed Item
        alt Transform fails
            DHR-->>HDB: Log failures as harvest_error with type: transform<br>update harvest_record status: error_transform
        end
    end
    note over DHR: VALIDATE
    loop VALIDATE items to create/update
        DHR->>DHR: Validate against schema
        alt Validation fails
            DHR-->>HDB: Log failures as harvest_error with type: validation<br>update harvest_record status: error_validation
        end
    end
    note over DHR: LOAD
    loop SYNC items to create/update/delete
        DHR->>CKAN: CKAN package_create (create), <br>package_update (update), <br>dataset_purge (delete)
        alt Sync fails
            DHR-->>HDB: Log failures as harvest_error with type: sync<br>UPDATE harvest_record to status: error_sync
        end
    end
    note over DHR: REPORT
    DHR->>HDB: POST harvest job metrics <br> UPDATE harvest_job to status: complete
    DHR->>SES: Email job metrics (jobMetrics, notification_emails)
```
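The COMPARE step in the diagram (hash each source record, compare it with the active records' `source_hash`, and build CREATE/UPDATE/DELETE lists) could be sketched roughly as follows. This is an illustration only, not the code in `harvest.py`: the function names `record_hash` and `compare`, the use of SHA-256 over key-sorted JSON, and the identifier-keyed dictionaries are all assumptions made for the example.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Return a stable SHA-256 hex digest of a source record (assumed hashing scheme)."""
    # sort_keys makes the digest independent of key order in the source JSON
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare(source_records: dict, active_hashes: dict) -> dict:
    """Split record identifiers into create/update/delete lists.

    source_records: identifier -> record fetched from the harvest source
    active_hashes:  identifier -> source_hash of the active record in the Harvest DB
    (both shapes are hypothetical; the real schema may differ)
    """
    plan = {"create": [], "update": [], "delete": []}
    for identifier, record in source_records.items():
        if identifier not in active_hashes:
            # not seen before: needs to be created
            plan["create"].append(identifier)
        elif record_hash(record) != active_hashes[identifier]:
            # seen before, but the content changed
            plan["update"].append(identifier)
        # identical hash: record is unchanged, nothing to do
    # active in the DB but missing from the source: needs to be deleted
    plan["delete"] = [i for i in active_hashes if i not in source_records]
    return plan
```

With a source containing `a` (unchanged), `b` (modified), and `c` (new), and active records for `a`, `b`, and `d`, `compare` would place `c` under create, `b` under update, and `d` under delete; unchanged records are skipped, so later stages only touch what actually changed.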
efc4b7b