
remove parser/parserMap from DataSchema#19173

Merged
clintropolis merged 9 commits into apache:master from clintropolis:remove-parser-dataschema
Mar 18, 2026

Conversation

@clintropolis (Member) commented Mar 17, 2026

Description

Removes parser from DataSchema since parsers are no longer supported by any ingestion task type after #19109 and #19166

(same release note as #19166 since they are a bit related)

Release note

Support for the deprecated parser has been removed for streaming ingestion tasks such as Kafka and Kinesis. Operators must now specify inputSource/inputFormat on the ioConfig of the supervisor spec, and the dataSchema must not specify a parser. This migration should be done prior to upgrading to Druid 37 or newer.
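The migration the release note describes can be sketched as a before/after supervisor spec fragment. This is an illustrative sketch, not copied from the Druid docs: the dataSource, topic, column, and dimension names are made up, and surrounding supervisor fields (tuningConfig, consumerProperties, etc.) are omitted.

Before (rejected in Druid 37+):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {"column": "ts", "format": "iso"},
          "dimensionsSpec": {"dimensions": ["page", "user"]}
        }
      }
    },
    "ioConfig": {"topic": "example"}
  }
}
```

After (timestampSpec/dimensionsSpec moved onto the dataSchema, inputFormat onto the ioConfig):

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "example",
      "timestampSpec": {"column": "ts", "format": "iso"},
      "dimensionsSpec": {"dimensions": ["page", "user"]}
    },
    "ioConfig": {
      "topic": "example",
      "inputFormat": {"type": "json"}
    }
  }
}
```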

@@ -219,37 +216,31 @@ public abstract class SeekableStreamIndexTaskTestBase extends EasyMockSupport
OBJECT_MAPPER = new TestUtils().getTestObjectMapper();
OBJECT_MAPPER.registerSubtypes(new NamedType(JSONParseSpec.class, "json"));
OLD_DATA_SCHEMA = DataSchema.builder()
Contributor

Not so OLD anymore now is it

Does this need to exist? It looks like the tests that use it all have names like "xyzWithLegacyParser". Consider removing it and the tests that use it


public DataSchema(
String dataSource,
@Nullable TimestampSpec timestampSpec, // can be null in old task spec
Contributor

Should timestampSpec and dimensionsSpec be required now that parser is not allowed? They used to be computed on demand from parser if not specified. From reading the code, I think there was this weird thing where if you omitted all three of them, the error you'd get from getTimestampSpec() and getDimensionsSpec() would be an NPE with message "inputRowParser". Now those methods will just return null, which may be worse, since they could cause even stranger failures downstream.

Member Author

I originally had it like this, requiring both of these be set to non-null, but a bunch of tests everywhere failed, which caused me to look a bit closer at the logic.

The previous behavior was that if timestampSpec, dimensionsSpec, and parser were all null, that was fine and allowed, and they apparently got filled in later by callers with defaults, based on at least some of the passing tests that had neither set. What was not ok was timestampSpec and dimensionsSpec being set while parser was also set, or parser being set but having a null timestampSpec and dimensionsSpec. Idk if this is what was intended, but it seems like what it was.

I'm ok changing the logic to require them, but it would be a slight user facing behavior change. Thoughts?

Contributor

The previous behavior was that if both timestampSpec and dimensionsSpec are null, and parser was null, then it was fine and allowed

How did that work for timestampSpec? The old code was:

  public TimestampSpec getTimestampSpec()
  {
    if (timestampSpec == null) {
      timestampSpec = Preconditions.checkNotNull(getParser(), "inputRowParser").getParseSpec().getTimestampSpec();
    }
    return timestampSpec;
  }

If timestampSpec and parser are both not set, wouldn't calling getTimestampSpec() throw NPE? Maybe the tests weren't calling it, but I think in real production usage, it would always be called.

I suppose in the old code dimensionsSpec could be null though. In that case we would use an exclusions-based dimensionsSpec created by computeDimensionsSpec. We can keep that behavior.
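The NPE behavior discussed above can be reproduced with a minimal stand-in. This is a hedged sketch, not the real DataSchema: the class and field names are illustrative, and checkNotNull below mimics Guava's Preconditions.checkNotNull(reference, errorMessage), which throws a NullPointerException whose message is the second argument.

```java
// Sketch of the old lazy getTimestampSpec() pattern: with neither
// timestampSpec nor parser configured, the checkNotNull call throws an
// NPE whose message is just "inputRowParser".
class LazyTimestampSpecSketch
{
    // Stand-ins for the real Druid types; names are illustrative.
    static class TimestampSpec {}
    static class Parser {}

    // Mimics Guava's Preconditions.checkNotNull(reference, errorMessage).
    static <T> T checkNotNull(T reference, Object errorMessage)
    {
        if (reference == null) {
            throw new NullPointerException(String.valueOf(errorMessage));
        }
        return reference;
    }

    private TimestampSpec timestampSpec; // null: not configured
    private Parser parser;               // null: not configured

    public TimestampSpec getTimestampSpec()
    {
        if (timestampSpec == null) {
            // With parser also null, this is the line that blows up.
            checkNotNull(parser, "inputRowParser");
            // (the real code would then read the spec off the parser)
        }
        return timestampSpec;
    }

    public static void main(String[] args)
    {
        try {
            new LazyTimestampSpecSketch().getTimestampSpec();
        } catch (NullPointerException e) {
            System.out.println("NPE message: " + e.getMessage()); // prints "NPE message: inputRowParser"
        }
    }
}
```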

Member Author

How did that work for timestampSpec?

not sure, maybe it was only for tests that were not doing anything with it... will have a look when i fixup stuff

)
{
this(dataSource, timestampSpec, dimensionsSpec, aggregators, granularitySpec, transformSpec, projections);
InvalidInput.conditionalException(parserMap == null, "parser was removed in Druid 37, define the timestampSpec and dimensionSpec on the schema directly instead of nested inside the parser definition");
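The conditionalException call in the diff above throws when the condition is false. A minimal stand-in for that validation shape (this is not the real org.apache.druid.error.InvalidInput class, just a sketch of the pattern; names and the exception type are illustrative):

```java
// Hedged sketch of a conditional-validation helper: throw a descriptive
// exception when the condition does not hold, otherwise do nothing.
class ValidationSketch
{
    static class InvalidInputException extends RuntimeException
    {
        InvalidInputException(String message)
        {
            super(message);
        }
    }

    // Throws unless `condition` holds, carrying the formatted message.
    static void conditionalException(boolean condition, String messageFormat, Object... args)
    {
        if (!condition) {
            throw new InvalidInputException(String.format(messageFormat, args));
        }
    }

    public static void main(String[] args)
    {
        Object parserMap = new Object(); // simulate a spec that still carries a parser
        try {
            conditionalException(
                parserMap == null,
                "parser was removed, define the timestampSpec and dimensionsSpec on the schema directly"
            );
        } catch (InvalidInputException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```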
Contributor

Should have both a release note and an item in the documentation that for supervisors, any usages of parser need to be migrated to timestampSpec, dimensionsSpec, and (elsewhere in the spec) inputFormat -before- updating. After updating, the supervisor will fail to deserialize, so updating it will be trickier.

Btw, dimensionsSpec (spelling).

Member Author

oops, yea I marked this as release notes, but forgot to write one. It somewhat falls under the umbrella of #19166, but this was not removed in that PR because hadoop was still using it.

@clintropolis clintropolis merged commit fa55b6b into apache:master Mar 18, 2026
37 checks passed
@clintropolis clintropolis deleted the remove-parser-dataschema branch March 18, 2026 18:20
@github-actions github-actions bot added this to the 37.0.0 milestone Mar 18, 2026