Add validation check for forward index disabled when enabling columnar segment creation #12838

@jackjlli (Member) commented Apr 9, 2024

Now that the column-major segment builder is enabled by default (PR #12770), a validation check should be added for cases where the forward index is disabled. Otherwise, segment generation fails and consumption is paused. It would be better to catch this early, when the table config is added or updated.

This PR adds a validation check that rejects a disabled forward index when columnar segment creation is enabled.

@jackjlli force-pushed the add-validation-check-for-forward-index-disabled-when-enabling-columnar-segment-creation branch from 4c2fca1 to c7f5887 on April 9, 2024 17:35
@Jackie-Jiang (Contributor) left a comment

Do we allow disabling the forward index for consuming segments? Can we add a test to verify that it works?

@jackjlli (Member, Author) commented Apr 9, 2024

> Do we allow disabling forward index for consuming segment? Can we add a test to verify if it works?

With the change in this PR, no, we won't allow disabling the forward index for consuming segments, because table creation will fail. columnMajorSegmentBuilderEnabled was set to true by default in the previous PR, and columnar segment generation doesn't work for consuming segments whose columns have the forward index disabled. Even without this validation check, columnar segment generation would still fail (check this line of code). The purpose of this PR is to fail fast instead of waiting for a segment commit request.

Regarding tests, I think the ones in TableConfigUtilsTest should be sufficient, since with this PR the failure is detected during table creation?

@Jackie-Jiang (Contributor)

What I was asking is: when we disable the forward index, will the config be picked up by the consuming segment? Will the consuming segment create only the inverted index, but not the forward index?

@Jackie-Jiang (Contributor)

I don't think the forward index can be disabled for mutable segments (see this line); I'm not sure how it is handled right now.

@jackjlli (Member, Author) commented Apr 9, 2024

Right, I was about to paste the same link. Since the forward index is enabled by default, I don't see a reason to explicitly disable it for realtime tables. And since columnMajorSegmentBuilderEnabled is on by default, every realtime segment completion will eventually fail, so I think it makes more sense to fail fast and bring this to the attention of table owners/Pinot admins. @Jackie-Jiang wdyt?

@Jackie-Jiang (Contributor)

My point is that the forward index cannot be disabled for consuming segments anyway, regardless of whether columnMajorSegmentBuilderEnabled is enabled. Have you tried disabling the forward index for any REALTIME table? Does it work?

@jackjlli (Member, Author)

I did test the case where the forward index is disabled for a realtime table. Here are the exception messages:

2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 200 bytes for: airlineStats__7__1__20240409T2331Z:$ts$DAY.sv.unsorted.fwd
2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 512 bytes for: airlineStats__7__1__20240409T2331Z:$ts$MONTH.dict
2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 200 bytes for: airlineStats__7__1__20240409T2331Z:$ts$MONTH.sv.unsorted.fwd
2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 512 bytes for: airlineStats__7__1__20240409T2331Z:$ts$WEEK.dict
2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 200 bytes for: airlineStats__7__1__20240409T2331Z:$ts$WEEK.sv.unsorted.fwd
2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 256 bytes for: airlineStats__7__1__20240409T2331Z:ActualElapsedTime.dict
2024/04/09 16:31:42.220 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 200 bytes for: airlineStats__7__1__20240409T2331Z:ActualElapsedTime.sv.unsorted.fwd
2024/04/09 16:31:42.221 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread_55] Allocating 256 bytes for: airlineStats__7__1__20240409T2331Z:AirTime.dict
...
2024/04/09 16:31:42.232 WARN [RealtimeSegmentDataManager_airlineStats__6__1__20240409T2331Z] [HelixTaskExecutor-message_handle_thread_54] Scheduling task to call controller to mark the segment as OFFLINE in Ideal State due to initialization error: 'Forward index is required'
2024/04/09 16:31:42.232 ERROR [216_7050 - SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_55] Caught exception in state transition OFFLINE -> CONSUMING for table: airlineStats_REALTIME, segment: airlineStats__7__1__20240409T2331Z
java.lang.IllegalArgumentException: Forward index is required
        at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:145) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl$IndexContainer.<init>(MutableSegmentImpl.java:1304) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.pinot.segment.local.indexsegment.mutable.MutableSegmentImpl.<init>(MutableSegmentImpl.java:364) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.pinot.core.data.manager.realtime.RealtimeSegmentDataManager.<init>(RealtimeSegmentDataManager.java:1529) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:462) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:241) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:91) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) ~[pinot-all-1.2.0-SNAPSHOT-jar-with-dependencies.jar:1.2.0-SNAPSHOT-c7f5887d55981764646d873b836eaed384ed4339]
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]

This is the fieldConfig I added to the realtime table config:

@@ -55,6 +57,19 @@
           "MONTH"
         ]
       }
+    },
+    {
+      "name": "AirTime",
+      "encodingType": "DICTIONARY",
+      "indexType": "INVERTED",
+      "indexTypes": [
+        "INVERTED"
+      ],
+      "indexes": null,
+      "properties": {
+        "forwardIndexDisabled": "true"
+      },
+      "tierOverwrites": null
     }
   ],
   "metadata": {

So I think it still makes sense to fail fast on table config update.

@Jackie-Jiang (Contributor)

Yeah, so basically we can add a check that the forward index cannot be disabled for real-time tables. It has nothing to do with enabling columnar segment creation.

@jackjlli force-pushed the add-validation-check-for-forward-index-disabled-when-enabling-columnar-segment-creation branch from c7f5887 to 75498ba on April 10, 2024 20:48
@jackjlli (Member, Author)

Ah, that's a good point! In the latest push I adjusted the logic to check only the table type. @Jackie-Jiang PTAL. Thanks!
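A minimal sketch of the revised check, keyed only on the table type as suggested in the review. It assumes a simplified model of the table config: the class `ForwardIndexValidationSketch`, the `FieldConfig` record, the `TableType` enum, the method `validateForwardIndexDisabled`, and the exception message are hypothetical stand-ins, not Pinot's actual `TableConfigUtils` code. Only the `forwardIndexDisabled` property and the REALTIME table type come from the discussion above. The idea is to reject the config at validation time rather than letting the consuming segment fail later with `Forward index is required`.

```java
import java.util.List;
import java.util.Map;

public class ForwardIndexValidationSketch {

  // Hypothetical stand-in for the table type; only REALTIME is checked.
  enum TableType { OFFLINE, REALTIME }

  // Hypothetical, simplified field config: a column name plus its "properties" map,
  // mirroring the "properties" object in the fieldConfig JSON above.
  record FieldConfig(String name, Map<String, String> properties) {}

  static final String FORWARD_INDEX_DISABLED = "forwardIndexDisabled";

  // Fails fast if any column of a REALTIME table disables its forward index,
  // because consuming (mutable) segments cannot be built without one.
  static void validateForwardIndexDisabled(TableType tableType, List<FieldConfig> fieldConfigs) {
    if (tableType != TableType.REALTIME || fieldConfigs == null) {
      return; // Nothing to check for OFFLINE tables or tables without field configs.
    }
    for (FieldConfig fieldConfig : fieldConfigs) {
      Map<String, String> properties = fieldConfig.properties();
      // Boolean.parseBoolean(null) returns false, so a missing property is treated as enabled.
      if (properties != null && Boolean.parseBoolean(properties.get(FORWARD_INDEX_DISABLED))) {
        throw new IllegalStateException(
            "Cannot disable forward index for column: " + fieldConfig.name() + " in a REALTIME table");
      }
    }
  }

  public static void main(String[] args) {
    List<FieldConfig> configs =
        List.of(new FieldConfig("AirTime", Map.of(FORWARD_INDEX_DISABLED, "true")));
    validateForwardIndexDisabled(TableType.OFFLINE, configs); // Accepted: not a REALTIME table.
    try {
      validateForwardIndexDisabled(TableType.REALTIME, configs);
      System.out.println("unexpected: validation passed");
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In this sketch, whether columnar segment creation is enabled plays no role, which matches the reviewer's point; OFFLINE table configs pass through unchanged.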

@Jackie-Jiang (Contributor) left a comment

LGTM with a minor comment.

@jackjlli force-pushed the add-validation-check-for-forward-index-disabled-when-enabling-columnar-segment-creation branch from 75498ba to 7dfd5c6 on April 16, 2024 20:46
@jackjlli merged commit 1d807df into master on Apr 17, 2024
20 checks passed
@jackjlli deleted the add-validation-check-for-forward-index-disabled-when-enabling-columnar-segment-creation branch on April 17, 2024 00:03