Readme.md fixup
robert-bryson committed Jan 3, 2024
1 parent b61562c commit c66c4c2
Showing 1 changed file with 61 additions and 43 deletions.
104 changes: 61 additions & 43 deletions README.md
@@ -5,68 +5,86 @@ transformation, and loading into the data.gov catalog.

## Features

The datagov-harvesting-logic offers the following features:

- Extract
- general purpose fetching and downloading of web resources.
- catered extraction to the following data formats:
- General purpose fetching and downloading of web resources.
- Catered extraction to the following data formats:
- DCAT-US
- Validation
- DCAT-US
- jsonschema validation using draft 2020-12.
- `jsonschema` validation using draft 2020-12.
- Load
- DCAT-US
- conversion of dcatu-us catalog into ckan dataset schema
- create, delete, update, and patch of ckan package/dataset
- Conversion of dcat-us catalog into ckan dataset schema
- Create, delete, update, and patch of ckan package/dataset

## Requirements

This project is using poetry to manage this project. Install [here](https://python-poetry.org/docs/#installation).
This project is using `poetry` to manage this project. Install [here](https://python-poetry.org/docs/#installation).

Once installed, `poetry install` installs dependencies into a local virtual environment.

## Testing

### CKAN load testing

- CKAN load testing doesn't require the services provided in the `docker-compose.yml`.
- [catalog-dev](https://catalog-dev.data.gov/) is used for ckan load testing.
- Create an api-key by signing into catalog-dev.
- Create an api-key by signing into catalog-dev.
- Create a `credentials.py` file at the root of the project containing the variable `ckan_catalog_dev_api_key` assigned to the api-key.
- run tests with the command `poetry run pytest ./tests/load/ckan`
- Run tests with the command `poetry run pytest ./tests/load/ckan`
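
A minimal sketch of what the `credentials.py` file described above might look like. The README only requires that the variable `ckan_catalog_dev_api_key` holds the api-key; reading it from an environment variable, and the variable name `CKAN_CATALOG_DEV_API_KEY`, are assumptions made here to keep the key out of version control.

```python
# credentials.py (at the root of the project)
import os

# Assumption: the api-key created on catalog-dev has been exported as
# CKAN_CATALOG_DEV_API_KEY (a hypothetical name). Falls back to an empty
# string when the variable is not set.
ckan_catalog_dev_api_key = os.environ.get("CKAN_CATALOG_DEV_API_KEY", "")
```

Assigning the key as a plain string literal also satisfies the requirement; in that case the file should stay out of commits.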

### Harvester testing
- These tests are found in `extract`, and `validate`. Some of them rely on services in the `docker-compose.yml`. run using docker `docker compose up -d` and with the command `poetry run pytest --ignore=./tests/load/ckan`.

- These tests are found in `extract`, and `validate`. Some of them rely on services in the `docker-compose.yml`. Run using docker `docker compose up -d` and with the command `poetry run pytest --ignore=./tests/load/ckan`.

If you followed the instructions for `CKAN load testing` and `Harvester testing` you can simply run `poetry run pytest` to run all tests.

## Comparison

- `./tests/harvest_sources/ckan_datasets_resp.json`
- Represents what ckan would respond with after querying for the harvest source name
- `./tests/harvest_sources/dcatus_compare.json`
- Represents a changed harvest source
- Created:
- datasets[0]

```diff
+ "identifier" = "cftc-dc10"
```

- Deleted:
- datasets[0]

```diff
- "identifier" = "cftc-dc1"
```

- Updated:
- datasets[1]

```diff
- "modified": "R/P1M"
+ "modified": "R/P1M Update"
```

- datasets[2]

```diff
- "keyword": ["cotton on call", "cotton on-call"]
+ "keyword": ["cotton on call", "cotton on-call", "update keyword"]
```

- datasets[3]

```diff
"publisher": {
"name": "U.S. Commodity Futures Trading Commission",
"subOrganizationOf": {
- "name": "U.S. Government"
+ "name": "Changed Value"
}
}
```

## Comparison
- ./tests/harvest_sources/ckan_datasets_resp.json
- represents what ckan would respond with after querying for the harvest source name
- ./tests/harvest_sources/dcatus_compare.json
- represents a changed harvest source
- what has been created?
- datasets[0]
- "identifier" = "cftc-dc10"
- what has been deleted?
- datasets[0]
- "identifier" = "cftc-dc1"
- what has been updated?
- datasets[1]
- from "modified": "R/P1M" to "modified": "R/P1M Update"
- datasets[2]
- from "keyword": ["cotton on call", "cotton on-call"]
- to "keyword": ["cotton on call", "cotton on-call", "update keyword"]
- datasets[3]
- from "publisher": {
"name": "U.S. Commodity Futures Trading Commission",
"subOrganizationOf": {
"name": "U.S. Government"
}
}
- to "publisher": {
"name": "U.S. Commodity Futures Trading Commission",
"subOrganizationOf": {
"name": "Changed Value"
}
}
- ./test/harvest_sources/dcatus.json
- represents an original harvest source prior to change occuring.
- `./test/harvest_sources/dcatus.json`
- Represents an original harvest source prior to change occuring.
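
The created, deleted, and updated categories listed above can be illustrated with a small sketch. This is not the project's `compare.py`; it assumes both sides have already been reduced to dictionaries keyed by `identifier` with the same shape, and the function names and the identifiers `cftc-dc2` and `cftc-dc3` are hypothetical.

```python
from typing import Any, Dict, List


def index_by_identifier(datasets: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Key each dataset by its "identifier" field (hypothetical helper)."""
    return {dataset["identifier"]: dataset for dataset in datasets}


def compare_sources(
    harvest: Dict[str, Dict[str, Any]], ckan: Dict[str, Dict[str, Any]]
) -> Dict[str, List[str]]:
    """Classify identifiers as created, deleted, or updated."""
    harvest_ids, ckan_ids = set(harvest), set(ckan)
    return {
        # Present in the harvest source but not yet in ckan.
        "create": sorted(harvest_ids - ckan_ids),
        # Present in ckan but gone from the harvest source.
        "delete": sorted(ckan_ids - harvest_ids),
        # Present on both sides with differing content.
        "update": sorted(i for i in harvest_ids & ckan_ids if harvest[i] != ckan[i]),
    }


# Illustrative data modelled on the changes listed above.
ckan_side = index_by_identifier([
    {"identifier": "cftc-dc1", "modified": "R/P1M"},
    {"identifier": "cftc-dc2", "modified": "R/P1M"},
    {"identifier": "cftc-dc3", "keyword": ["cotton on call", "cotton on-call"]},
])
harvest_side = index_by_identifier([
    {"identifier": "cftc-dc10", "modified": "R/P1M"},
    {"identifier": "cftc-dc2", "modified": "R/P1M Update"},
    {"identifier": "cftc-dc3",
     "keyword": ["cotton on call", "cotton on-call", "update keyword"]},
])

result = compare_sources(harvest_side, ckan_side)
print(result)
```

Whole-record equality is the simplest possible change test; a real comparison might instead hash each record or ignore fields that ckan rewrites on load.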

1 comment on commit c66c4c2

@github-actions

Coverage

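Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| harvester/\_\_init\_\_.py | 12 | 0 | 100% | |
| harvester/compare.py | 12 | 0 | 100% | |
| harvester/extract.py | 48 | 7 | 85% | |
| harvester/load.py | 100 | 10 | 90% | |
| harvester/transform.py | 13 | 7 | 46% | |
| harvester/utils/\_\_init\_\_.py | 2 | 0 | 100% | |
| harvester/utils/json.py | 4 | 0 | 100% | |
| harvester/utils/util.py | 7 | 0 | 100% | |
| harvester/validate/\_\_init\_\_.py | 2 | 0 | 100% | |
| harvester/validate/dcat_us.py | 24 | 3 | 88% | |
| TOTAL | 224 | 27 | 88% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 26 | 0 💤 | 0 ❌ | 0 🔥 | 17.321s ⏱️ |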
