Merge pull request #159 from Urban-Analytics-Technology-Platform/update-flow-diagram

updates flow-diagrams and readme with package rename
andrewphilipsmith authored Dec 6, 2024
2 parents 3afaf20 + c1ec99c commit 76ff16b
Showing 7 changed files with 138 additions and 77 deletions.
32 changes: 20 additions & 12 deletions README.md
@@ -28,28 +28,36 @@

<!-- prettier-ignore-end -->

Popgetter is a convenience tool for downloading census data from a number of
different jurisdictions and coercing the data into common formats. The aim is
that city or region scale analysis can be easily
# What is popgetter?

Popgetter is a collection of tools designed to make it convenient to download
census data from a number of different jurisdictions and coerce the data into
common formats. The aim is that city or region scale analysis can be easily
[replicated](https://the-turing-way.netlify.app/reproducible-research/overview/overview-definitions.html#table-of-definitions-for-reproducibility)
for different geographies, using the most detailed, locally available data.

## What popgetter does and doesn't do
# What is poppusher?

This repo is "poppusher", which is one component of the popgetter project.
Poppusher is a pipeline which downloads data from a number of different
jurisdictions and then processes it into a common format. The data is then
stored in a cloud-based data store, which can be accessed by other components of
the popgetter project.

See the [flow diagram](flow-diagram.md) for more details.

## What the popgetter system does and doesn't do

**Popgetter DOES:**

For each of the implemented countries:

- Download the most detailed geometries, for which census data is available.
- Download the most detailed census available for selected variables (currently
focused on population and car ownership).
- Download the most detailed census available for most, if not all, variables
published by the census.
- Ensures that the geometries and census data join correctly.

**Popgetter WILL**

- present some standard metadata to allow the user to see which variables are
available and any possible trade-off between geographic and demographic
disaggregation.
- Presents some standard metadata to allow the user to explore which variables
are available.
- publish the data in a set of common file types (e.g. CloudGeoBuff, Parquet,
PMTiles).

2 changes: 1 addition & 1 deletion cli_usage.md → docs/dagster_cli_usage.md
@@ -1,4 +1,4 @@
# poppusher CLI
# Dagster CLI usage

For countries which have been ported to Dagster, the downloads can be invoked
via the [Dagster CLI](https://docs.dagster.io/_apidocs/cli).
110 changes: 110 additions & 0 deletions docs/flow-diagram.md
@@ -0,0 +1,110 @@
# Data flow diagram

The diagrams below show the flow of data through the system. The main purpose is
to highlight the distinction between the data preparation pipeline (`poppusher`)
and the data access components (`popgetter-*`).

```mermaid
---
title: Poppusher
---
graph LR
subgraph download [Download country data]
census_download@{ shape: text, label: "(customised for each country's census format)" }
aA(Scotland)
aB(Northern Ireland)
aC(Singapore)
aD(USA)
aE(Belgium)
aF(Australia)
end
raw(raw data)
aA & aB & aC & aD & aE & aF --> raw ==> ingest
subgraph ingest
bA(Convert to common file formats)
bB(Derive common metadata info)
bC(Derive common metrics)
bA --> bB
bB --> bA
bC --> bB
bB --> bC
end
direction TB
subgraph processed [Cloud hosted structure data store]
direction TB
dir_struct_docs@{ shape: text, label: "(**_See docs_**)" }
dA("`**countries**
(plain-text)`")
subgraph percountry [per-country files]
dCa("`**metadata**
(parquet)`")
dCb("`**metrics**
(parquet)`")
dCc("`**geometry**
- (FlatGeobuf)
- (GeoJSON)
- (PMTiles)`")
end
dir_struct_docs ~~~ dA
dA ~~~ percountry
click dir_struct_docs href "https://poppusher.readthedocs.io/en/latest/output_structure/" _blank
end
ingest ==> processed
```

```mermaid
---
title: Popgetter
---
graph LR
subgraph processed [Cloud hosted structure data store]
direction TB
dir_struct_docs@{ shape: text, label: "(**_See docs_**)" }
dA("`**countries**
(plain-text)`")
subgraph percountry [per-country files]
dCa("`**metadata**
(parquet)`")
dCb("`**metrics**
(parquet)`")
dCc("`**geometry**
- (FlatGeobuf)
- (GeoJSON)
- (PMTiles)`")
end
dir_struct_docs ~~~ dA
dA ~~~ percountry
click dir_struct_docs href "https://poppusher.readthedocs.io/en/latest/output_structure/" _blank
end
direction TB
subgraph clients
core("`**popgetter-core**
common part of all clients
- compiled to wasm.
- understands the directory structure
and downloads the data.
`")
direction TB
fA("`**popgetter-cli**
A commandline tool to query and download data`")
fB("`**popgetter-py**
Enables access from Python`")
fC("`**popgetter-browser**
A web interface for exploring the available data`")
fD("`**popgetter-llm**
An experimental natural language client using LLMs`")
core --> fA
core --> fB
core --> fC
core --> fD
end
processed ===> core
```
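
The data store shown in both diagrams (a plain-text `countries` listing plus per-country metadata, metrics and geometry files) could be modelled by a client roughly as follows. This is only a minimal sketch: the base URL, the file names, the file extensions and the helper names `country_files` and `parse_countries` are all hypothetical and are not taken from poppusher or popgetter-core. The real layout is described in the output structure docs linked from the diagram.

```python
from dataclasses import dataclass

# Geometry formats named in the diagram. The file extensions are assumptions.
GEOMETRY_EXTENSIONS = {
    "flatgeobuf": "fgb",
    "geojson": "geojson",
    "pmtiles": "pmtiles",
}


@dataclass(frozen=True)
class CountryFiles:
    """Locations of the per-country files (hypothetical naming scheme)."""

    metadata: str
    metrics: str
    geometry: dict[str, str]


def country_files(base_url: str, country_id: str) -> CountryFiles:
    """Build the file locations for one country under an assumed layout."""
    root = f"{base_url.rstrip('/')}/{country_id}"
    return CountryFiles(
        metadata=f"{root}/metadata.parquet",
        metrics=f"{root}/metrics.parquet",
        geometry={
            fmt: f"{root}/geometry.{ext}"
            for fmt, ext in GEOMETRY_EXTENSIONS.items()
        },
    )


def parse_countries(text: str) -> list[str]:
    """Parse a plain-text countries listing, assuming one identifier per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A client would first read the countries listing, then call `country_files` for each identifier it is interested in before downloading the Parquet or geometry files.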
63 changes: 0 additions & 63 deletions flow-diagram.md

This file was deleted.

5 changes: 5 additions & 0 deletions mkdocs.yml
@@ -1 +1,6 @@
site_name: Poppusher

plugins:
- search
- mermaid2:
javascript: https://unpkg.com/mermaid@11.4.0/dist/mermaid.esm.min.mjs
File renamed without changes.
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -76,7 +76,8 @@ dev = [
"pyright >=1.1.339" # Used for static type checking (mypy is not yet compatible with Dagster)
]
docs = [
"mkdocs >=1.6.0"
"mkdocs >=1.6.0",
"mkdocs-mermaid2-plugin >=1.2.1",
]

[project.urls]