
Use flatgeobuf data extracts #1047

Merged: 19 commits, Jan 4, 2024
Conversation

@spwoodcock (Member) commented Dec 16, 2023

Fixes #934 and addresses part of #381 and #889

Updates

Custom data extract:

  • If the user uploads a flatgeobuf, it is deserialised to GeoJSON for display on the map, and the file is uploaded to our S3.
  • If the user uploads GeoJSON, it is converted to flatgeobuf on the backend and the file is uploaded to our S3 (see the conversion sketch below).
  • In either case, a URL to the .fgb is stored in public.projects.data_extract_type (abusing this param for now; it should be migrated to public.projects.data_extract_url eventually).

Custom data extracts are stored in the FMTM S3 bucket and managed by us.
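
For illustration, a minimal sketch of the GeoJSON ↔ flatgeobuf conversion, assuming GeoPandas with GDAL's FlatGeobuf driver (GDAL >= 3.1) is available; the file names are placeholders, not the actual FMTM code paths:

```python
import geopandas as gpd

# GeoJSON -> flatgeobuf: read the uploaded file and rewrite it
# using GDAL's FlatGeobuf driver.
gdf = gpd.read_file("upload.geojson")
gdf.to_file("extract.fgb", driver="FlatGeobuf")

# flatgeobuf -> GeoJSON: the reverse direction, used when the extract
# must be displayed on the map or handed to task splitting.
gdf = gpd.read_file("extract.fgb")
gdf.to_file("extract.geojson", driver="GeoJSON")
```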

OSM data extract:

  • When the user continues from the 'Data Extract' page to the 'Split Tasks' page, the endpoint is called to generate a new data extract in the background.
    • This same endpoint can also accept a project_id param to return an already existing data extract URL.
  • The data extract is generated in flatgeobuf format and stored in the S3 bucket that is part of raw-data-api (see the request sketch below). The URL to this extract is stored in our database.
  • The data extract is converted to a GeoJSON file during project creation and passed as a param to the task splitting /task_split endpoint (eventually the splitting algorithm may accept flatgeobuf instead).

OSM data extracts are stored in the raw-data-api S3 bucket and managed by that service.
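
As a rough, hedged sketch of the raw-data-api interaction: the endpoint path, payload keys, and response fields below are assumptions based on raw-data-api's asynchronous snapshot pattern, not the exact FMTM code. bind_zip=False is the param discussed in this PR, telling the service to return a bare .fgb rather than a zip:

```python
import time

import requests

RAW_DATA_API = "https://api-prod.raw-data.hotosm.org/v1"  # assumed base URL

# Assumed payload shape: project AOI polygon, flatgeobuf output,
# and bind_zip=False so the result is an unzipped .fgb file.
payload = {
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[85.30, 27.70], [85.31, 27.70],
                         [85.31, 27.71], [85.30, 27.71], [85.30, 27.70]]],
    },
    "outputType": "fgb",
    "bind_zip": False,
}
task = requests.post(f"{RAW_DATA_API}/snapshot/", json=payload).json()

# Snapshots are processed asynchronously: poll the task status URL
# until it resolves, then store the download URL in our database.
while True:
    status = requests.get(f"{RAW_DATA_API}{task['track_link']}").json()
    if status["status"] in ("SUCCESS", "FAILURE"):
        break
    time.sleep(2)

extract_url = status["result"]["download_url"]  # assumed response field
```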

Checks

  • I have tested this with quite a few different scenarios, but please test this code thoroughly @nrjadkry and @varun2948 before we merge.
  • The frontend implementation is a bit messy and could probably do with cleaning up.
  • The backend tests also need to be updated accordingly; this should be pretty simple.

Future updates

As a consequence of this big change, we need to update a few things.

Frontend:

  • Improve the error handling and user feedback during this flow.
  • We need to block the user from generating splits via the 'task splitting algorithm' until the data extract generation completes.
  • This data extract should be loaded into OpenLayers directly in flatgeobuf format throughout the frontend. There are helpers for this. We can even filter the data returned based on the viewport, or only return the extract data for a given task area.
  • References to 'buildings' and 'lines' uploaded extracts should be consolidated into one reference (dataExtractFile), as the geometry types can be determined on the backend instead (see the sketch after this list).
    • On the project creation Data Extract page we can also remove the radio buttons for selecting the extract type. The user can just upload an extract and we handle the rest.
    • The extracts are stored externally and no longer in our database, which simplifies code on both the backend and frontend.
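
To illustrate determining geometry types on the backend, a minimal sketch; the helper name is hypothetical and the parsing assumes a standard GeoJSON FeatureCollection:

```python
import json


def detect_geometry_types(geojson_bytes: bytes) -> set[str]:
    """Hypothetical helper: collect the geometry types present in an
    uploaded extract, removing the need for a 'buildings' vs 'lines'
    selection on the frontend."""
    data = json.loads(geojson_bytes)
    return {
        feature["geometry"]["type"]
        for feature in data.get("features", [])
        if feature.get("geometry")
    }


# e.g. {'Polygon'} for a building extract, {'LineString'} for lines.
```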

Backend:

  • To do this quickly, I ended up reusing an endpoint that uses raw-data-api directly.
  • We have a partial implementation in place to do this via osm-rawdata instead: project_crud.get_data_extract_from_osm_rawdata.
  • We need to update osm-rawdata to support the bind_zip=False param, then use the package instead.

Note

We also need to consider regenerating the extract, as it only persists for 90 days.

This should be enough time for most projects; however, if the user calls the get_data_extract endpoint and the extract is missing (deleted), a new file should be generated (already implemented). A hedged sketch of this check follows below.

We just need to make sure this endpoint is called on the project detail page, to return the flatgeobuf URL and display it on the map.
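
For illustration, a sketch of that check; the field and helper names are assumptions, not the exact FMTM implementation:

```python
import requests


def get_or_regenerate_extract(project) -> str:
    """Hypothetical flow: return the stored .fgb URL if the object
    still exists in S3, otherwise trigger regeneration."""
    url = project.data_extract_type  # currently holds the .fgb URL (see above)

    # A HEAD request is enough to check whether the S3 object has expired.
    if requests.head(url, timeout=10).status_code == 200:
        return url

    # Extract expired (raw-data-api keeps files ~90 days): regenerate it.
    return generate_data_extract(project)  # assumed helper from this PR
```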

@nrjadkry (Contributor)

@spwoodcock I have pushed minor fixes here.
I have tested creating a project by drawing an AOI.
Other than that, I think this is good to go.

@nrjadkry (Contributor)

@spwoodcock, I can see that you have used the get_data_extracts API for downloading the data extracts. But I think we need to use osm-rawdata here too, since we will need to extract different data according to the category. Is there a reason you used the raw-data-api query directly?

@spwoodcock (Member, Author)

I commented on that in the original description; that was how it was implemented until now.

We had a function in crud to use osm-rawdata, but it was never actually used by an endpoint.

I updated the existing code that uses raw-data-api to add the config param bind_zip=False, which is essential here.

So basically we definitely want to update this to use osm-rawdata, but I wasn't sure if we need to update osm-rawdata to support the bind_zip param first. (I hope it's quite simple to pass in the extra config. Originally we also had to consider adding auth to osm-rawdata, but I worked it out with Kshitij, so we don't need that anymore.)

@spwoodcock (Member, Author)

I won't have time to fix the broken test or update to use osm-rawdata before the new year, so this will get merged in Jan (unless you are able to work on those, @nrjadkry) 👍

@spwoodcock (Member, Author)

Note: modify to address #1056
