Problem
The metadata scan has a hard limit (metadata-fetch-limit, default 1,000) applied at the storage scan level. INACTIVE (soft-deleted) rows consume this limit quota alongside active rows, but there is currently no way to monitor the true quota usage before silent truncation occurs.
When the total rows (ACTIVE + INACTIVE) in the metastore exceed the limit, active entities are silently dropped from the scan result, potentially causing serving issues.
Scenario
With metadata-fetch-limit = 1,000 (default) and a single database:
| Step | Metastore rows | Edge ACTIVE | Edge INACTIVE |
|---|---|---|---|
| Create 1,000 tables | 1,000 | 1,000 | 0 |
| Delete 500 tables | 1,000 | 500 | 500 |
| Create 1 new table | ⚠️ 1,001 | 501 | 500 |
After the last step, scanStorage() fetches 1,000 rows (sorted by key, capped by limit). One row is truncated — which may be the newly created active table. The isActive filter then removes ~500 INACTIVE edges, leaving DdlPage.count ≈ 500.
Result: An active table is silently missing from the scan, but DdlPage.count (≈500) gives no indication that truncation occurred.
Note: "Delete" here means the full 2-step soft delete (deactivate → delete), which sets the edge to INACTIVE while keeping the metastore row.
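The truncation mechanics can be reproduced with a small simulation (a sketch with illustrative names, not the real scan API; in this model the new table's key sorts last, so it is deterministically the row that gets cut):

```kotlin
// Minimal model of the scan: rows sorted by key, capped at the fetch limit,
// then filtered by isActive — matching the order of operations above.
data class Row(val key: String, val isActive: Boolean)

fun scan(rows: List<Row>, limit: Int): List<Row> =
    rows.sortedBy { it.key }      // storage returns rows in key order
        .take(limit)              // hard cap applied at the storage level
        .filter { it.isActive }   // INACTIVE rows removed only after the cap

fun main() {
    // 1,000 tables created, the first 500 soft-deleted, then one new table.
    val rows = (0 until 1_000).map { Row("t%04d".format(it), it >= 500) } +
        Row("t1000", isActive = true)            // the newly created table
    val visible = scan(rows, limit = 1_000)
    println("metastore rows  = ${rows.size}")    // 1001, over the limit
    println("visible active  = ${visible.size}") // 500
    println("t1000 present   = ${visible.any { it.key == "t1000" }}") // false
}
```

The new active table is silently absent from the result, while the visible count (500) stays comfortably under the limit and gives no truncation signal.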
No existing API can detect this
| API | What it exposes | Why it's insufficient |
|---|---|---|
| DDL API (DdlPage.count) | Edge-ACTIVE entity count (after isActive filter) | INACTIVE (deleted) rows are filtered out. In the scenario above, count ≈ 500 while true metastore rows = 1,001. No truncation signal. |
| Metastore dump API (/graph/v2/metastore/{global,local}) | True metastore row count | MetastoreInspector queries the database directly (bypasses AbstractLabel). Requires merging local + global dumps and phase-prefix filtering. Not storage-agnostic. |
| Actuator (/actuator/env) | metadata-fetch-limit config value | Limit value only. No row count information. |
| In-memory cache | Active entities loaded by updateAllMetadata() | Same as DDL API — only edge-ACTIVE entities after filter. |
Root cause: The true quota usage (scannedRowCount — rows returned from storage before isActive filter) is computed in AbstractLabel.scan() but discarded. No API exposes this value.
Proposed Solution
The server already scans the metastore periodically (updateAllMetadata()). During each scan, scannedRowCount (rows returned from storage, before the isActive filter) is computed in AbstractLabel.scan() but immediately discarded.
Proposal: Retain scannedRowCount and expose it through DdlPage.
This requires minimal changes — no scan logic modification, just passing along a count that is already computed:
- ScanResult: add a scannedRowCount: Int field
- AbstractLabel.scan(): pass scannedRowCount into ScanResult
- DdlPage: add scanCount: Long (default = count for backward compat)
- DdlService.getAll(): populate scanCount from the scan result
With scanCount exposed, monitoring becomes straightforward: scanCount / metadata-fetch-limit = quota usage.
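Sketched in Kotlin, the shape of the change might look like this (the field layouts of ScanResult and DdlPage are assumptions; only the two new fields are the point):

```kotlin
// Hypothetical shapes of the affected types, for illustration only.
data class ScanResult(
    val rows: List<String>,
    val hasNext: Boolean,
    val scannedRowCount: Int              // NEW: rows from storage, pre-filter
)

data class DdlPage(
    val entities: List<String>,
    val count: Int,
    val scanCount: Long = count.toLong()  // NEW: defaults to count for compat
)

// DdlService.getAll() would forward the pre-filter count into the page:
fun toPage(activeEntities: List<String>, scannedRowCount: Int): DdlPage =
    DdlPage(activeEntities, activeEntities.size, scannedRowCount.toLong())

fun main() {
    val page = toPage(List(500) { "t$it" }, scannedRowCount = 1_001)
    println("count=${page.count} scanCount=${page.scanCount}")
    // scanCount (1001) reveals the over-limit scan that count (500) hides.
}
```

The default value keeps existing DdlPage consumers and serialized payloads working unchanged; callers that never set scanCount simply see it equal to count.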
Note on LocalBackedJdbcHashLabel
LocalBackedJdbcHashLabel.scan() merges results from two stores:
local.zipWith(global) { a, b -> a + b } // DataFrame.plus()
Since ScanResult is consumed within AbstractLabel.scan() and only DataFrame is returned, the scannedRowCount from each store needs to be carried through the merge. This may require either:
- Passing scannedRowCount via DataFrame metadata (e.g., a stats field), or
- Adjusting LocalBackedJdbcHashLabel.scan() to sum row counts from both stores
This is an implementation detail to consider.
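The second option can be sketched as follows (Frame is a stand-in for the merged DataFrame; the real zipWith signature may differ):

```kotlin
// Stand-in for the result carried through AbstractLabel.scan(): merged rows
// plus the pre-filter count that would otherwise be lost in the zip.
data class Frame(val rows: List<String>, val scannedRowCount: Int)

// Merge local and global results, concatenating rows and summing the
// pre-filter counts so quota usage reflects both stores.
fun merge(local: Frame, global: Frame): Frame =
    Frame(
        rows = local.rows + global.rows,
        scannedRowCount = local.scannedRowCount + global.scannedRowCount
    )

fun main() {
    val local = Frame(listOf("a", "b"), scannedRowCount = 600)
    val global = Frame(listOf("c"), scannedRowCount = 450)
    val merged = merge(local, global)
    println(merged.scannedRowCount)  // 1050: both stores count toward the quota
}
```

Summing is the right combination here because the fetch limit is applied per storage scan, so each store's pre-filter rows consume quota independently.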
Alternatives Considered
- External monitoring only (shell script + existing DDL API): DdlPage.count alone cannot detect truncation when INACTIVE rows are present (see Scenario). Only viable as a rough approximation.
- Metastore dump API: Can show true row counts, but requires merging local + global stores and phase-prefix filtering. Also, MetastoreInspector queries the underlying database directly (bypassing AbstractLabel), so it is not storage-agnostic — a future storage change (e.g., JDBC → HBase) would break this approach.
Additional Context
- metadata-fetch-limit (default 1,000) is queryable via /actuator/env/kc.graph.metadata-fetch-limit
- ScanResult already carries hasNext (scan metadata) — carrying scannedRowCount is analogous
- DdlPage is only constructed from DdlService.getAll() (scan-based), so scanCount is not semantically out of place
Feedback on this approach is welcome.
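Once scanCount is available, the quota-usage check reduces to a ratio against the configured limit. A minimal sketch (function names and the 80% alert threshold are illustrative choices, not part of the proposal):

```kotlin
// Quota usage as scanCount / metadata-fetch-limit; a value >= 1.0 means
// the scan was already capped and truncation has occurred.
fun quotaUsage(scanCount: Long, fetchLimit: Int): Double =
    scanCount.toDouble() / fetchLimit

// Alert before truncation, at a configurable fraction of the limit.
fun shouldAlert(scanCount: Long, fetchLimit: Int, threshold: Double = 0.8): Boolean =
    quotaUsage(scanCount, fetchLimit) >= threshold

fun main() {
    println(quotaUsage(1_001, 1_000))   // over 1.0: truncation already happened
    println(shouldAlert(850, 1_000))    // true: 85% of the limit consumed
}
```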
Internal scan flow reference
```mermaid
sequenceDiagram
    participant DdlService
    participant Label as AbstractLabel.scan()
    participant SQL as scanStorage()
    participant Filter as .filter { isActive }
    DdlService->>Label: scan(ScanFilter(limit=metadata-fetch-limit))
    Label->>SQL: SELECT ... WHERE k LIKE '{prefix}%' LIMIT {limit}
    SQL-->>Label: allRows (ACTIVE + INACTIVE mixed)
    Note right of SQL: 📊 scannedRowCount = true quota usage
    Label->>Filter: allRows.filter { isActive }
    Filter-->>Label: rows (INACTIVE removed)
    Note right of Filter: ⚠️ scannedRowCount discarded here
    Label-->>DdlService: DataFrame(rows)
    DdlService->>DdlService: DdlPage(count = rows.size)
    Note over DdlService: ❌ Only post-filter count exposed
```