Exploration: roadmap for explorers and mdims #3992

Open
pabloarosado opened this issue Feb 17, 2025 · 2 comments

@pabloarosado
Contributor

pabloarosado commented Feb 17, 2025

One-liner

Define our ETL workflow for Explorers and MDIMs while unifying tooling as much as possible.

(previous context: #3969)

Context: MDIM vs Explorers

We have different kinds of similar objects in etl/owid-content:

Object Type      | TODO
Grapher-based    | Find examples
CSV-based        | Find examples
Indicator-based  | Find examples
Multidim pages   | Find examples

While we want to adopt more and more MDIM pages, we will still have explorers around. This is because both objects are, conceptually, different things:

  • MDIM: It is a data page, which, like any other data page, speaks about one specific indicator. The only difference is that, in the MDIM case, the indicator has multiple dimensions.
  • Explorer: Can host multiple indicators with different meanings.

Therefore, we need to improve the data workflow to support both products.

Goals

1. MDIMs and Explorers should come from ETL

Given the context above, and after various discussions, we agree that we should move towards making both explorers and MDIMs ETL-based (export://explorers/ and export://multidim/, respectively).

NOTE: Ideally, the explorer config should live in a table in DB (similar to the multi_dim_data_pages table) instead of a tsv file in owid-content (but this is a separate issue).

  • Migrate explorers (one-off)
  • Explorers as MDIMs?
    • Some explorers may be converted into multidim pages when appropriate.
    • Are there any specific explorers that would be low-hanging fruit to convert into MDIMs?

2. Standardize the tooling used in explorers and MDIMs

These two objects are very similar, and ideally, they should rely on standard tooling to minimize the maintenance burden. This implies some additional transition work in the coming months.

  • Are there functions already developed for MDIM pages that could be reused in existing indicator-based explorers?
  • TODO: Test with COVID whether we can set up an explorer pipeline very similar to the one used for MDIMs.

3. Create a pleasant workflow experience for data scientists

@Marigold
Collaborator

I spent a good chunk of time browsing various explorers, and whoa... this isn't going to be easy. It feels like every explorer is unique, and there's no obvious way to have a single approach for everything. The only thing I can confidently say is that CSV-based explorers are bad (though that alone doesn’t justify spending time migrating them).

I'm still wrapping my head around everything, so take the following notes with a grain of salt.

1. MDIMs and Explorers Should Come from ETL

The main question is whether we'd allow editing explorers from Admin or not. If yes, we'd need either some kind of "override" in the Admin layer (either in owid-content or the DB) or a way to write changes back to ETL. (Remember that we did this for indicator metadata, and it's used very rarely.)

Explorers with many combinations, like minerals, are well suited for ETL, but more bespoke explorers, like migration, are much more complex. Then again, some people prefer YAML, while others prefer Python, and it's unclear whether we should enforce a single approach.
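
To illustrate why many-combination explorers are a natural fit for ETL, here is a minimal sketch that expands a handful of dimensions into one view per combination. The dimension names, the choices, and the `expand_views` helper are all hypothetical; this is not the actual minerals explorer config or an existing ETL function.

```python
from itertools import product


def expand_views(dimensions):
    """Return one view (a dict of dimension -> choice) per combination of choices."""
    names = list(dimensions)
    choice_lists = [dimensions[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*choice_lists)]


# Hypothetical dimensions, loosely inspired by a minerals-style explorer.
dimensions = {
    "mineral": ["copper", "lithium", "nickel"],
    "metric": ["production", "reserves"],
    "unit": ["tonnes", "share_of_global"],
}

views = expand_views(dimensions)
print(len(views))  # 3 * 2 * 2 = 12 combinations
print(views[0])
```

Writing twelve views by hand is already tedious, and the count grows multiplicatively with each new dimension. A bespoke explorer, where each view needs its own hand-tuned settings, gains much less from this kind of generation.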

2. Standardize the Tooling Used in Explorers and MDIMs

@lucasrodes has already done this with the COVID explorer and COVID MDIM. The explorer YAML representation is really close to MDIMs. I can imagine generating a similar config file that could power both MDIMs and (indicator-based) explorers. If we can make it work for COVID, where we’re already pretty close, then it should be doable for anything. But does this grand unification bring enough value?
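
As a rough sketch of what "one config powering both" could look like, the snippet below renders a single shared structure into an MDIM-like dict and into explorer-like rows. Every field name here (`dimensions`, `views`, `indicator`, `y_indicator`), the indicator paths, and both helper functions are assumptions made for illustration; they do not reflect the real MDIM or explorer schemas.

```python
# Hypothetical shared config: field names and indicator paths are illustrative only.
shared_config = {
    "title": "COVID-19",
    "dimensions": [
        {"slug": "metric", "choices": ["cases", "deaths"]},
        {"slug": "interval", "choices": ["daily", "weekly"]},
    ],
    "views": [
        {
            "dimensions": {"metric": "cases", "interval": "daily"},
            "indicator": "example_dataset#new_cases_daily",
        },
        {
            "dimensions": {"metric": "deaths", "interval": "weekly"},
            "indicator": "example_dataset#new_deaths_weekly",
        },
    ],
}


def to_mdim(config):
    """Render the shared config as a nested, MDIM-like structure."""
    return {
        "title": config["title"],
        "dimensions": config["dimensions"],
        "views": [
            {"dimensions": view["dimensions"], "indicators": {"y": view["indicator"]}}
            for view in config["views"]
        ],
    }


def to_explorer_rows(config):
    """Render the shared config as flat rows, one per view, as a TSV-style explorer might use."""
    slugs = [dim["slug"] for dim in config["dimensions"]]
    rows = []
    for view in config["views"]:
        row = {slug: view["dimensions"][slug] for slug in slugs}
        row["y_indicator"] = view["indicator"]
        rows.append(row)
    return rows


mdim = to_mdim(shared_config)
rows = to_explorer_rows(shared_config)
print(mdim["views"][0])
print(rows[0])
```

The point of the sketch is only that both outputs derive from the same list of views, so a change made once in the shared config would propagate to both products.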

I guess we need a couple more MDIMs to better decide where to put our energy.

Appendix

Some explorers I found interesting:

  • Water and Sanitation – CSV-based explorer, could be worth migrating to indicator-based.
  • Monkeypox – CSV-based explorer, more bespoke. Could it be migrated to ETL, and would it be worth it?

@lucasrodes
Member

Thanks for the summary, @Marigold! You touch on very valid points.

Just to disclose my bias up front, my dream is to migrate all explorers and have a standardized way of doing things in the MDIM/explorer space, as we have for data steps.

My take is that this might not provide much value in the short term, but it will in the long term. I'm especially concerned with the update flow, where I think we should assume that everything is ETL-powered. So I don't think this is super urgent, but it is a goal that would be great to reach in, say, 1-2 years' time.

In general, I think that deprecating CSV-based (and chart-based) explorers will help us maintain our infrastructure in the long run. When developing tools, it's annoying to have to account for all these edge cases that do not come from ETL.

1. MDIMs and Explorers Should Come from ETL

I think we should probably create an issue with all explorers and rank them somehow by type or complexity. Also, whenever attempting to "migrate" one, we should advertise it to avoid conflicts with other edits.

One risk here is that the data scientist in charge of an explorer might be used to their current pipeline, so we should make sure that the new indicator-based workflow is easy to understand and comes with appropriate tooling. I think it could make sense to do this after agreeing on some templating (as in MDIMs) in point 2 below.

2. Standardize the Tooling Used in Explorers and MDIMs

I am happy to look at the COVID explorer again and see how the MDIM tooling/approach can be applied there.

I think we may need some engineering work here to add some of the features that we now have on MDIMs (being able to reference indicators by catalogPath, display settings per view, etc.). Basically, it'd be nice to improve the explorer config API on the engineering side and align it more closely with MDIMs.
