Commit

Merge pull request #115 from unicef/fix/update-dpga-example
Documentation: Update DPGA API endpoint in tutorial and example code
merlos authored Dec 24, 2024
2 parents 872f525 + ade458e commit 000d8ec
Showing 28 changed files with 1,515 additions and 2,302 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -6,12 +6,12 @@ docs/_site
# mag-cli docs site
mag-cli/site

# helm repo created by helm-scripts/local-helm-repo.sh script
_helm-repo

# Dev config for testing helm scripts
helm-scripts/dev.config


# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
6 changes: 3 additions & 3 deletions docs/get-started/automate-data-ingestion.qmd
@@ -145,7 +145,7 @@ The next step is to create a pipeline using Dagster. A pipeline is just a piece
The first thing we need to do is to install Dagster.

```sh
-pip install dagster==1.6.4 dagster-webserver==1.6.4
+pip install dagster==1.9.3 dagster-webserver==1.9.3
```

:::{.callout-note}
@@ -187,7 +187,7 @@ from dagster import asset
@asset
def raw_dpgs() -> DataFrame:
    """ DPGs data from the API"""
-    dpgs_json_dict = requests.get("https://api.digitalpublicgoods.net/dpgs").json()
+    dpgs_json_dict = requests.get("https://app.digitalpublicgoods.net/api/dpgs").json()
    df = pd.DataFrame.from_dict(dpgs_json_dict)
    return df
@@ -286,7 +286,7 @@ from dagster import asset
def raw_dpgs() -> DataFrame:
    """ DPGs data from the API"""
    # Load from API
-    dpgs_json_dict = requests.get("https://api.digitalpublicgoods.net/dpgs").json()
+    dpgs_json_dict = requests.get("https://app.digitalpublicgoods.net/api/dpgs").json()
    # Convert to pandas dataframe
    df = pd.DataFrame.from_dict(dpgs_json_dict)
5 changes: 4 additions & 1 deletion docs/get-started/exploratory-analysis.qmd
@@ -61,13 +61,16 @@ Ok. so now we can start coding. Copy this code in the first cell and run the cel
This will install some python packages. You can run command-line commands by prepending '!' to the command.
Now, add a new cell, copy the code below and run the cell
```python
import requests
import pandas as pd
-dpgs_json_dict = requests.get("https://api.digitalpublicgoods.net/dpgs").json()
+# Download the API data and convert to a pandas DataFrame
+dpgs_json_dict = requests.get("https://app.digitalpublicgoods.net/api/dpgs").json()
df = pd.DataFrame.from_dict(dpgs_json_dict)
+# See what we got
df.head()
```
Binary file not shown.
1 change: 1 addition & 0 deletions examples/dpga-explorations/data/2024-12-24-dgps.json

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions examples/dpga-explorations/data/2024-12-24-dpgs.json

Large diffs are not rendered by default.

Binary file modified examples/dpga-explorations/data/latest-categories.parquet
Binary file modified examples/dpga-explorations/data/latest-clear-ownership.parquet
2 changes: 1 addition & 1 deletion examples/dpga-explorations/data/latest-dpgs.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion examples/dpga-explorations/data/latest-nominees.json

This file was deleted.

Binary file modified examples/dpga-explorations/data/latest-open-licenses.parquet
Binary file modified examples/dpga-explorations/data/latest-sdgs.parquet
3,224 changes: 1,197 additions & 2,027 deletions examples/dpga-explorations/dpg-explorations.ipynb

Large diffs are not rendered by default.

535 changes: 298 additions & 237 deletions examples/dpga-explorations/dpga-basic.ipynb

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions examples/dpga-pipeline/dpga-pipeline-full-example/README.md
@@ -2,9 +2,9 @@

This is a [Dagster](https://dagster.io) project that loads data form the DPGA pipeline and stores it in a Minio/S3 bucket.

-This version of the pipeline uses more advanced concepts such as IOManagers, ConfigurableResources and asset Metadata.
+This version of the pipeline uses more advanced concepts such as `IOManagers`, `ConfigurableResources` and asset `Metadata`.

-This is part of the getting started tutorial of [magasin](http://magasin.github.io/get-started/)
+This is part of the getting started tutorial of [magasin](http://magasin.unicef.io/get-started/)

## Usage

@@ -117,28 +117,6 @@ def apply_transformations(dpg_api_json_dict: Dict) -> DataFrame:

    return df

-#
-# Assets
-#
-
-@asset(io_manager_key="minio_json_io_manager")
-def raw_nominees(context: AssetExecutionContext, dpg_api: DPGResource) -> DataFrame:
-    """
-    This asset contains the raw list of nominees from the DPG API
-    """
-    # Get the json from the API
-    nominees_json_dict = dpg_api.get_list_from_dpga(stage="nominees").json()
-
-    df = apply_transformations(nominees_json_dict)
-
-    # Add some metadata
-    context.add_output_metadata(
-        metadata={
-            "number_of_nominees": len(df),
-            "preview": MetadataValue.md(df.head().to_markdown()),
-        }
-    )
-    return df

@asset(io_manager_key="minio_json_io_manager")
def raw_dpgs(context: AssetExecutionContext, dpg_api: DPGResource) -> DataFrame:
@@ -13,7 +13,7 @@ class DPGResource(ConfigurableResource):
        description=(
            "URL for the DPGA API"
        ),
-        default="https://api.digitalpublicgoods.net"
+        default="https://app.digitalpublicgoods.net/api"
    )

    def get_list_from_dpga(self, stage="dpgs") -> Response:
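
The body of `get_list_from_dpga` falls outside this hunk, so how the resource joins its base URL with `stage` is not visible here. The following is a minimal sketch of one way to do it, assuming the path is simply `<base URL>/<stage>`, which matches the `/api/dpgs` URL used elsewhere in this commit. `build_dpga_url` and `fetch_stage` are hypothetical helper names that do not appear in the repository, and the sketch uses only the standard library rather than `requests`.

```python
import json
from urllib.request import urlopen

# New default base URL from this commit (previously https://api.digitalpublicgoods.net).
DPGA_API_BASE_URL = "https://app.digitalpublicgoods.net/api"


def build_dpga_url(stage: str = "dpgs", base_url: str = DPGA_API_BASE_URL) -> str:
    """Join the base URL and the stage name (hypothetical helper, assumed path layout)."""
    # Strip a trailing slash so a configured base URL with or without one works.
    return f"{base_url.rstrip('/')}/{stage}"


def fetch_stage(stage: str = "dpgs", timeout: float = 30.0):
    """Download and decode the JSON for one stage.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, so a moved
    endpoint fails loudly instead of returning an error page.
    """
    with urlopen(build_dpga_url(stage), timeout=timeout) as response:
        return json.load(response)
```

Keeping the base URL in one constant (or, as in the resource above, one configurable field) means a future endpoint move needs a single change rather than an edit in every tutorial snippet.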
4 changes: 2 additions & 2 deletions examples/dpga-pipeline/dpga-pipeline-store-local/README.md
@@ -1,8 +1,8 @@
# dpga-pipeline-store-local

-This is the first version of the dpga pipeline that is run in the [second step](httpa://magasin.github.io/get-started/automate-data-ingestion.html) of the [getting started tutorial](httpa://magasin.github.io/get-started/).
+This is the first version of the dpga pipeline that is run in the [second step](https://magasin.unicef.io/get-started/automate-data-ingestion.html) of the [getting started tutorial](https://magasin.unicef.io/get-started/).

-Tested with `dagster==1.6.0` and `dagster-webserver==1.6.0`
+Tested with `dagster==1.9.3` and `dagster-webserver==1.9.3`

## Usage

@@ -7,7 +7,7 @@
@asset
def raw_dpgs() -> DataFrame:
    """ DPGs data from the API"""
-    dpgs_json_dict = requests.get("https://api.digitalpublicgoods.net/dpgs").json()
+    dpgs_json_dict = requests.get("https://app.digitalpublicgoods.net/api/dpgs").json()
    df = pd.DataFrame.from_dict(dpgs_json_dict)
    return df

4 changes: 2 additions & 2 deletions examples/dpga-pipeline/dpga-pipeline-store-minio/README.md
@@ -1,8 +1,8 @@
# dpga-pipeline-store-minio

-This is the first version of the dpga pipeline that is run in the [second step](httpa://magasin.github.io/get-started/automate-data-ingestion.html) of the [getting started tutorial](httpa://magasin.github.io/get-started/).
+This version of the Dagster DPGA pipeline that is run in the [second step](https://magasin.unicef.io/get-started/automate-data-ingestion.html) of the [getting started tutorial](https://magasin.unicef.io/get-started/).

-Tested with `dagster==1.6.0` and `dagster-webserver==1.6.0`
+Tested with `dagster==1.9.3` and `dagster-webserver==1.9.3`

## Usage

@@ -8,7 +8,7 @@
def raw_dpgs() -> DataFrame:
    """ DPGs data from the API"""
    # Load from API
-    dpgs_json_dict = requests.get("https://api.digitalpublicgoods.net/dpgs").json()
+    dpgs_json_dict = requests.get("https://app.digitalpublicgoods.net/api/dpgs").json()

    # Convert to pandas dataframe
    df = pd.DataFrame.from_dict(dpgs_json_dict)
