
Use flatgeobuf data extracts #1047

Merged: 19 commits, Jan 4, 2024
Conversation

@spwoodcock (Member) commented Dec 16, 2023

Fixes #934 and addresses part of #381 and #889

Updates

Custom data extract:

  • If the user uploads a flatgeobuf, it is deserialised to GeoJSON for display on the map, and the file is uploaded to our S3.
  • If the user uploads GeoJSON, it is converted to flatgeobuf on the backend and the file is uploaded to our S3 (see the conversion sketch below).
  • In either case, a URL to the .fgb is stored in public.projects.data_extract_type (abusing this param for now; it should be migrated to public.projects.data_extract_url eventually).

Custom data extracts are stored in the FMTM S3 bucket and managed by us.
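
For illustration, a minimal sketch of the GeoJSON ↔ flatgeobuf conversion, assuming GeoPandas with GDAL's FlatGeobuf driver (GDAL >= 3.1) is available; the file names are placeholders, not the actual FMTM code paths:

```python
import geopandas as gpd

# GeoJSON -> flatgeobuf: read the uploaded file and rewrite it
# using GDAL's FlatGeobuf driver.
gdf = gpd.read_file("upload.geojson")
gdf.to_file("extract.fgb", driver="FlatGeobuf")

# flatgeobuf -> GeoJSON: the reverse direction, used when the extract
# must be displayed on the map or handed to task splitting.
gdf = gpd.read_file("extract.fgb")
gdf.to_file("extract.geojson", driver="GeoJSON")
```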

OSM data extract:

  • When the user continues from the 'Data Extract' page to the 'Split Tasks' page, the endpoint is called to generate a new data extract in the background.
    • This same endpoint can also accept a project_id param to return an already existing data extract URL.
  • The data extract is generated in flatgeobuf format and stored in the S3 bucket that is part of raw-data-api (see the request sketch below). The URL to this extract is stored in our database.
  • The data extract is converted to a GeoJSON file during project creation and passed as a param to the task splitting /task_split endpoint (eventually the splitting algorithm may accept flatgeobuf instead).

OSM data extracts are stored in the raw-data-api S3 bucket and managed by that service.
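
As a rough, hedged sketch of the raw-data-api interaction: the endpoint path, payload keys, and response fields below are assumptions based on raw-data-api's asynchronous snapshot pattern, not the exact FMTM code. bind_zip=False is the param discussed in this PR, telling the service to return a bare .fgb rather than a zip:

```python
import time

import requests

RAW_DATA_API = "https://api-prod.raw-data.hotosm.org/v1"  # assumed base URL

# Assumed payload shape: project AOI polygon, flatgeobuf output,
# and bind_zip=False so the result is an unzipped .fgb file.
payload = {
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[85.30, 27.70], [85.31, 27.70],
                         [85.31, 27.71], [85.30, 27.71], [85.30, 27.70]]],
    },
    "outputType": "fgb",
    "bind_zip": False,
}
task = requests.post(f"{RAW_DATA_API}/snapshot/", json=payload).json()

# Snapshots are processed asynchronously: poll the task status URL
# until it resolves, then store the download URL in our database.
while True:
    status = requests.get(f"{RAW_DATA_API}{task['track_link']}").json()
    if status["status"] in ("SUCCESS", "FAILURE"):
        break
    time.sleep(2)

extract_url = status["result"]["download_url"]  # assumed response field
```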

Checks

  • I have tested this with quite a few different scenarios, but please test this code thoroughly @nrjadkry and @varun2948 before we merge.
  • The frontend implementation is a bit messy and could probably do with cleaning up.
  • The backend tests also need to be updated accordingly; this should be pretty simple.

Future updates

As a consequence of this big change, we need to update a few things.

Frontend:

  • Improve the error handling and user feedback during this flow.
  • We need to block the user from generating splits via the 'task splitting algorithm' until the data extract generation completes.
  • This data extract should be loaded into OpenLayers directly in flatgeobuf format throughout the frontend. There are helpers for this. We can even filter the data returned based on the viewport, or only return the extract data for a given task area.
  • References to 'buildings' and 'lines' uploaded extracts should be consolidated into one reference (dataExtractFile), as the geometry types can be determined on the backend instead (see the sketch after this list).
    • On the project creation Data Extract page we can also remove the radio buttons for selecting the extract type. The user can just upload an extract and we handle the rest.
    • The extracts are stored externally and no longer in our database, which simplifies code on both the backend and frontend.
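
To illustrate determining geometry types on the backend, a minimal sketch; the helper name is hypothetical and the parsing assumes a standard GeoJSON FeatureCollection:

```python
import json


def detect_geometry_types(geojson_bytes: bytes) -> set[str]:
    """Hypothetical helper: collect the geometry types present in an
    uploaded extract, removing the need for a 'buildings' vs 'lines'
    selection on the frontend."""
    data = json.loads(geojson_bytes)
    return {
        feature["geometry"]["type"]
        for feature in data.get("features", [])
        if feature.get("geometry")
    }


# e.g. {'Polygon'} for a building extract, {'LineString'} for lines.
```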

Backend:

  • To do this quickly, I ended up reusing an endpoint that uses raw-data-api directly.
  • We have a partial implementation in place to do this via osm-rawdata instead: project_crud.get_data_extract_from_osm_rawdata.
  • We need to update osm-rawdata to support the bind_zip=False param, then use the package instead.

Note

We also need to consider regenerating the extract, as it only persists for 90 days.

This should be enough time for most projects; however, if the user calls the get_data_extract endpoint and the extract is missing (deleted), a new file should be generated (already implemented). A hedged sketch of this check follows below.

We just need to make sure this endpoint is called on the project detail page, to return the flatgeobuf URL and display it on the map.
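
For illustration, a sketch of that check; the field and helper names are assumptions, not the exact FMTM implementation:

```python
import requests


def get_or_regenerate_extract(project) -> str:
    """Hypothetical flow: return the stored .fgb URL if the object
    still exists in S3, otherwise trigger regeneration."""
    url = project.data_extract_type  # currently holds the .fgb URL (see above)

    # A HEAD request is enough to check whether the S3 object has expired.
    if requests.head(url, timeout=10).status_code == 200:
        return url

    # Extract expired (raw-data-api keeps files ~90 days): regenerate it.
    return generate_data_extract(project)  # assumed helper from this PR
```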

@nrjadkry (Contributor)

@spwoodcock I have pushed minor fixes here.
I have tested creating a project by drawing an AOI.
Other than that, I think this is good to go.

@nrjadkry (Contributor)

@spwoodcock, I can see that you have used the get_data_extracts API for downloading the data extracts. But I think we need to use osm-rawdata here too, since we will need to extract different data according to the category. Is there a reason you used the raw-data-api query directly?

@spwoodcock (Member, Author)

I commented on that in the original description; that was how it was implemented until now.

We had a function in crud to use osm-rawdata, but it was never actually used by an endpoint.

I updated the existing code that uses raw-data-api to add the config param bind_zip=False, which is essential here.

So basically we definitely want to update this to use osm-rawdata, but I wasn't sure if we need to update osm-rawdata to support the bind_zip param first. (I hope it's quite simple to pass in the extra config. Originally we also had to consider adding auth to osm-rawdata, but I worked it out with Kshitij, so we don't need that anymore.)

@spwoodcock (Member, Author)

I won't have time to fix the broken test or update to use osm-rawdata before the new year, so this will get merged in Jan (unless you are able to work on those, @nrjadkry) 👍

@spwoodcock (Member, Author)

Note: modify to address #1056
