@zhixuanjia commented Sep 28, 2025

Improvements to the generic scroll API in OpenAPI v3

  1. A scrollId is now returned for each entity in the response. This enables backward pagination and scrolling from any position in the result set.
  2. Facets are now part of the scroll API response. This brings the endpoint closer to its RestLI and GraphQL counterparts, which already support facets.
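
To make the per-entity cursor concrete, here is a minimal Python sketch that decodes one of the scrollId values from the test output below. It assumes the value is base64-encoded JSON, which holds for the sample in this PR but is not a documented contract. The helper name `decode_scroll_id` is hypothetical, and clients should normally treat scrollId as an opaque string.

```python
import base64
import json


def decode_scroll_id(scroll_id: str) -> dict:
    """Decode a scrollId into a dict. For inspection and debugging only."""
    # Assumption: the scrollId is base64-encoded JSON, as in the sample output.
    return json.loads(base64.b64decode(scroll_id))


# Per-entity scrollId of urn:li:corpuser:datahub, copied from the test output.
cursor = decode_scroll_id(
    "eyJzb3J0IjpbInVybjpsaTpjb3JwdXNlcjpkYXRhaHViIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0="
)

# The decoded cursor carries the sort values of the entity it belongs to.
print(cursor["sort"])
```

In this sample the decoded cursor holds the sort key of its own entity, which is what lets a client resume scrolling from that exact position.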

Test

curl -X POST 'http://localhost:8080/openapi/v3/entity/scroll?count=3' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJhY3RvclR5cGUiOiJVU0VSIiwiYWN0b3JJZCI6ImRhdGFodWIiLCJ0eXBlIjoiUEVSU09OQUwiLCJ2ZXJzaW9uIjoiMiIsImp0aSI6ImVhNzQ2NjI5LTUzMGYtNGNiYi1iOTg2LTE5MWEyYWZlNzllNSIsInN1YiI6ImRhdGFodWIiLCJleHAiOjE3NjI0NzM5MDgsImlzcyI6ImRhdGFodWItbWV0YWRhdGEtc2VydmljZSJ9.NR41fm8ICkB7WEdruK9AZA-CCuN8mSth9WLJmGGkmSA' \
  -d '{
        "aspects": [],
        "entities": []
  }' | jq
{
  "scrollId": "eyJzb3J0IjpbInVybjpsaTpkYXRhSHViSW5nZXN0aW9uU291cmNlOmRhdGFodWItZ2MiXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ==",
  "entities": [
    {
      "urn": "urn:li:corpuser:datahub",
      "corpUserKey": {
        "value": {
          "username": "datahub"
        }
      },
      "corpUserEditableInfo": {
        "value": {
          "teams": [],
          "skills": [],
          "pictureLink": "https://raw.githubusercontent.com/datahub-project/datahub/master/datahub-web-react/src/images/default_avatar.png"
        }
      },
      "corpUserInfo": {
        "value": {
          "active": true,
          "system": true,
          "title": "DataHub Root User",
          "displayName": "DataHub"
        }
      },
      "scrollId": "eyJzb3J0IjpbInVybjpsaTpjb3JwdXNlcjpkYXRhaHViIl0sInBpdElkIjpudWxsLCJleHBpcmF0aW9uVGltZSI6MH0="
    },
    {
      "urn": "urn:li:dataHubAccessToken:DE7TX5PvdX44V/y5swC3boFW17OO6+iY1MwNxmRbLPM=",
      "dataHubAccessTokenInfo": {
        "value": {
          "name": "test",
          "description": "test",
          "createdAt": 1759881908153,
          "actorUrn": "urn:li:corpuser:datahub",
          "expiresAt": 1762473908153,
          "ownerUrn": "urn:li:corpuser:datahub"
        }
      },
      "dataHubAccessTokenKey": {
        "value": {
          "id": "DE7TX5PvdX44V/y5swC3boFW17OO6+iY1MwNxmRbLPM="
        }
      },
      "scrollId": "eyJzb3J0IjpbInVybjpsaTpkYXRhSHViQWNjZXNzVG9rZW46REU3VFg1UHZkWDQ0Vi95NXN3QzNib0ZXMTdPTzYraVkxTXdOeG1SYkxQTT0iXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ=="
    },
    {
      "urn": "urn:li:dataHubIngestionSource:datahub-gc",
      "dataHubIngestionSourceKey": {
        "value": {
          "id": "datahub-gc"
        }
      },
      "dataHubIngestionSourceInfo": {
        "value": {
          "name": "datahub-gc",
          "schedule": {
            "timezone": "UTC",
            "interval": "0 1 * * *"
          },
          "source": {
            "type": "SYSTEM"
          },
          "type": "datahub-gc",
          "config": {
            "recipe": "{\"source\":{\"type\":\"datahub-gc\",\"config\":{\"cleanup_expired_tokens\":false,\"truncate_indices\":true,\"truncate_index_older_than_days\":30,\"dataprocess_cleanup\":{\"enabled\":false,\"retention_days\":10,\"delete_empty_data_jobs\":false,\"delete_empty_data_flows\":false,\"hard_delete_entities\":false,\"keep_last_n\":5,\"batch_size\":500,\"max_workers\":10},\"soft_deleted_entities_cleanup\":{\"retention_days\":10,\"enabled\":true,\"batch_size\":500,\"max_workers\":10,\"limit_entities_delete\":25000,\"runtime_limit_seconds\":7200},\"execution_request_cleanup\":{\"keep_history_min_count\":10,\"keep_history_max_count\":1000,\"keep_history_max_days\":90,\"batch_read_size\":100,\"enabled\":true,\"runtime_limit_seconds\":3600,\"limit_entities_delete\":10000,\"max_read_errors\":10}}}}",
            "extraArgs": {},
            "version": "",
            "debugMode": false,
            "executorId": "default"
          }
        }
      },
      "scrollId": "eyJzb3J0IjpbInVybjpsaTpkYXRhSHViSW5nZXN0aW9uU291cmNlOmRhdGFodWItZ2MiXSwicGl0SWQiOm51bGwsImV4cGlyYXRpb25UaW1lIjowfQ=="
    }
  ],
  "facets": [
    {
      "field": "entity",
      "aggregations": {
        "dataHubIngestionSource": 1,
        "corpuser": 1,
        "dataHubAccessToken": 1
      }
    }
  ],
  "totalCount": 172
}
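
As a rough sketch of how a client might consume this response, the Python snippet below looks up the cursor for a given entity, builds a follow-up request URL, and reads the facet counts. The response dict is a trimmed copy of the output above, with short placeholder strings standing in for the base64 cursors. The helper names are hypothetical, and passing the cursor as a `scrollId` query parameter is an assumption that this PR does not show.

```python
from typing import Optional
from urllib.parse import urlencode

# Trimmed copy of the response above. CURSOR_A/B/C are placeholders for the
# real base64 scrollId values; aspects are omitted.
response = {
    "scrollId": "CURSOR_C",
    "entities": [
        {"urn": "urn:li:corpuser:datahub", "scrollId": "CURSOR_A"},
        {"urn": "urn:li:dataHubAccessToken:DE7TX5PvdX44V/y5swC3boFW17OO6+iY1MwNxmRbLPM=",
         "scrollId": "CURSOR_B"},
        {"urn": "urn:li:dataHubIngestionSource:datahub-gc", "scrollId": "CURSOR_C"},
    ],
    "facets": [
        {"field": "entity",
         "aggregations": {"dataHubIngestionSource": 1, "corpuser": 1, "dataHubAccessToken": 1}},
    ],
    "totalCount": 172,
}


def cursor_for(resp: dict, urn: str) -> Optional[str]:
    """Return the per-entity scrollId for the given urn, or None if absent."""
    for entity in resp["entities"]:
        if entity["urn"] == urn:
            return entity["scrollId"]
    return None


def scroll_url(base_url: str, count: int, scroll_id: Optional[str] = None) -> str:
    """Build a scroll request URL. The scrollId parameter name is an assumption."""
    params = {"count": count}
    if scroll_id is not None:
        params["scrollId"] = scroll_id
    return f"{base_url}/openapi/v3/entity/scroll?{urlencode(params)}"


def facet_counts(resp: dict, field: str) -> dict:
    """Return the aggregation counts for one facet field, or an empty dict."""
    for facet in resp["facets"]:
        if facet["field"] == field:
            return facet["aggregations"]
    return {}


# Resume from the first entity instead of from the end of the page.
print(scroll_url("http://localhost:8080", 3, cursor_for(response, "urn:li:corpuser:datahub")))
print(facet_counts(response, "entity"))
```

In the sample output, the top-level scrollId is identical to the scrollId of the last entity, so using the top-level value matches the previous behavior of continuing from the end of the page.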


codecov bot commented Oct 4, 2025

Bundle Report

Changes will decrease total bundle size by 3.53kB (-0.01%) ⬇️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
| --- | --- | --- |
| datahub-react-web-esm | 28.58MB | -3.53kB (-0.01%) ⬇️ |

Affected Assets, Files, and Routes:

view changes for bundle: datahub-react-web-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
| --- | --- | --- | --- |
| assets/index-*.js | -3.53kB | 18.92MB | -0.02% |


codecov bot commented Oct 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@zhixuanjia changed the title from "feat(scroll entities): Improve Scroll entities API to enable advanced pagination" to "feat(OpenAPI v3): Improve generic scroll API to have advanced pagination and facets" on Oct 6, 2025
