
Enable dataset support for quidem tests #17699

Merged (76 commits, Feb 10, 2025)
Conversation

@kgyrtkirk (Member) commented Feb 5, 2025

Enables the use of datasets in druidtest URIs.

The datasets option should point to a directory; all JSON files in it become candidates to be interpreted as ingestions.

The system will not run a full ingestion, but recognizing an ingestion spec seemed like the easiest way to simplify usage.

Example:

!use druidtest://?componentSupplier=StandardComponentSupplier&datasets=sql/src/test/quidem/sampledataset
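For illustration, such a directory could contain an ingestion-style JSON file roughly like the one below. All field values are hypothetical; per the code discussed later in this thread, the parts that actually get read are the dataSchema, inputSource, and inputFormat:

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "sampledataset",
      "timestampSpec": { "column": "ts", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["country", "city"] }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "local",
        "baseDir": "sql/src/test/quidem/sampledataset",
        "filter": "*.json"
      },
      "inputFormat": { "type": "json" }
    }
  }
}
```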

Locations are relative to the project root, both for the datasets option and for the LocalInputSource instances looking up their inputs inside the ingestions.
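As a rough illustration of that resolution rule (class and method names here are hypothetical, not the PR's actual relativizeLocalInputSource):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: resolving a location given relative to the project
// root, as the datasets option and LocalInputSource lookups do conceptually.
public class ProjectRootResolver
{
  public static Path resolveAgainstProjectRoot(Path projectRoot, Path location)
  {
    // Absolute locations are kept as-is; relative ones are anchored at the root.
    if (location.isAbsolute()) {
      return location;
    }
    return projectRoot.resolve(location).normalize();
  }

  public static void main(String[] args)
  {
    Path root = Paths.get("/workspace/druid");
    System.out.println(resolveAgainstProjectRoot(root, Paths.get("sql/src/test/quidem/sampledataset")));
  }
}
```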

Setting datasets to a value suppresses the initialization of the component supplier's built-in datasets.

return walker;
}

@Provides
@LazySingleton
public List<TestDataSet> buildCustomTables(ObjectMapper objectMapper, TempDirProducer tdp,

Check notice (Code scanning / CodeQL): Useless parameter. The parameter 'tdp' is never used.
@imply-cheddar (Contributor) left a comment


A few comments, not major, but the Guice @Named annotation thing should probably be addressed. That's a pattern I'd prefer not to exist in the codebase.

  TestClusterQuerySegmentWalker(
-     Map<String, VersionedIntervalTimeline<String, ReferenceCountingSegment>> timelines,
+     @Named(TIMELINES_KEY) Map<String, VersionedIntervalTimeline<String, ReferenceCountingSegment>> timelines,
Contributor

The only "good" usage of a named annotation is when it is completely encapsulated inside a single module (i.e. there's a binding to the named item and a provider method in the same module that depends on the naming). This is because it's too easy for named things to proliferate, and it becomes hard to figure out how things are actually being set up.

In this case, it's unclear to me why you need the annotation at all. Though, from looking at it, it looks like you are trying to push a map around to multiple different things and have them all use the same map. In that case, I'd suggest that you create a relatively simple TestSegmentsBroker class which all of the places that would use this map can depend on and then you bind that. This makes it more clear that you have a singular object that multiple things are potentially updating the state of and reading the state from.
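The suggested pattern might look roughly like this minimal sketch. The value type is a plain String stand-in for Druid's VersionedIntervalTimeline, which isn't available here; only the shape of the idea is shown:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a TestSegmentsBroker: one bound singleton that owns the shared
// timelines map, so collaborators depend on the broker instead of a @Named map.
public class TestSegmentsBroker
{
  // String stands in for VersionedIntervalTimeline<String, ReferenceCountingSegment>.
  private final Map<String, String> timelines = new ConcurrentHashMap<>();

  public void addTimeline(String dataSource, String timeline)
  {
    timelines.put(dataSource, timeline);
  }

  public String getTimeline(String dataSource)
  {
    return timelines.get(dataSource);
  }

  public static void main(String[] args)
  {
    TestSegmentsBroker broker = new TestSegmentsBroker();
    broker.addTimeline("wikipedia", "timeline-stub");
    System.out.println(broker.getTimeline("wikipedia"));
  }
}
```

Bound as a singleton in Guice, this makes the shared mutable state explicit rather than hidden behind a named map binding.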

Member Author

In those static methods this map was used to back-communicate... I didn't want to make them much different, so I just added this @Named tag so it will be less likely to surprise someone :)
By using the constant as the name, I was thinking that people interested in its usage could look it up if needed.

I guess this will be dropped sometime later, and TestTimelineServerView could probably take its place?

But for now I've packed this Map into a TestSegmentsBroker.

Comment on lines +50 to +53
ObjectMapper om = objectMapper.copy();
om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
FakeIndexTask indexTask = om.readValue(src, FakeIndexTask.class);
FakeIngestionSpec spec = indexTask.spec;
Contributor

I probably would've used the object mapper to read it in as a Map<String, Object>, then get on the keys I want to check and use objectMapper.convertValue() to convert into the types that you want. What you've done works as well, but it's a little more difficult to read this code and figure out what structure it wants in the JSON.

Member Author

I went through a few of those approaches:
  public static TestDataSet makeDS(ObjectMapper objectMapper, File src)
  {
    try {
      Map<String, Object> map = objectMapper.readValue(
          src,
          new TypeReference<>(){}
      );
      Map<String, Object> spec = objectMapper.convertValue(
          Preconditions.checkNotNull(map.get("spec"), "spec not specified"),
          new TypeReference<>(){}
      );
      DataSchema dataSchema = objectMapper.convertValue(
          Preconditions.checkNotNull(spec.get("dataSchema"), "dataschema not specified"),
          DataSchema.class
      );
      Map<String, Object> ioConfig = objectMapper.convertValue(
          Preconditions.checkNotNull(spec.get("ioConfig"), "ioConfig not specified"),
          new TypeReference<>(){}
      );
      InputSource inputSource0 = objectMapper.convertValue(
          Preconditions.checkNotNull(ioConfig.get("inputSource"), "inputSource not specified"),
          InputSource.class
      );

      InputFormat inputFormat = objectMapper.convertValue(
          Preconditions.checkNotNull(ioConfig.get("inputFormat"), "inputFormat not specified"),
          InputFormat.class
      );
      InputSource inputSource = relativizeLocalInputSource(
          inputSource0, ProjectPathUtils.PROJECT_ROOT
      );
      TestDataSet dataset = new InputSourceBasedTestDataset(
          dataSchema,
          inputFormat,
          inputSource
      );
      return dataset;
    }
    catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static TestDataSet makeDS3(ObjectMapper objectMapper, File src)
  {
    try {
      Map<String, Object> map = objectMapper.readValue(
          src,
          new TypeReference<>(){}
      );
      Map<String, Object> spec = objectMapper.convertValue(
          Preconditions.checkNotNull(map.get("spec"), "spec not specified"),
          new TypeReference<>(){}
      );
      DataSchema dataSchema = objectMapper.convertValue(
          Preconditions.checkNotNull(spec.get("dataSchema"), "dataschema not specified"),
          DataSchema.class
      );
      ObjectMapper om = objectMapper.copy();
      om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
      MyIOConfigType ioConfig = om.convertValue(
          Preconditions.checkNotNull(spec.get("ioConfig"), "ioConfig not specified"),
          MyIOConfigType.class
      );

      InputSource inputSource = relativizeLocalInputSource(
          ioConfig.inputSource, ProjectPathUtils.PROJECT_ROOT
      );
      TestDataSet dataset = new InputSourceBasedTestDataset(
          dataSchema,
          ioConfig.inputFormat,
          inputSource
      );
      return dataset;
    }
    catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static TestDataSet makeDS2(ObjectMapper objectMapper, File src)
  {
    try {
      Map<String, Object> map = objectMapper.readValue(
          src,
          new TypeReference<>()
          {
          }
      );
      ObjectMapper om = objectMapper.copy();
      om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
      FakeIngestionSpec spec = om
          .convertValue(Preconditions.checkNotNull(map.get("spec"), "spec not specified"), FakeIngestionSpec.class);
      InputSource inputSource = relativizeLocalInputSource(
          spec.getIOConfig().inputSource, ProjectPathUtils.PROJECT_ROOT
      );
      TestDataSet dataset = new InputSourceBasedTestDataset(
          spec.getDataSchema(),
          spec.getIOConfig().inputFormat,
          inputSource
      );
      return dataset;
    }
    catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static TestDataSet makeDS1(ObjectMapper objectMapper, File src)
  {
    try {
      ObjectMapper om = objectMapper.copy();
      om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
      FakeIndexTask indexTask = om.readValue(src, FakeIndexTask.class);
      FakeIngestionSpec spec = indexTask.spec;
      InputSource inputSource = relativizeLocalInputSource(
          spec.getIOConfig().inputSource, ProjectPathUtils.PROJECT_ROOT
      );
      TestDataSet dataset = new InputSourceBasedTestDataset(
          spec.getDataSchema(),
          spec.getIOConfig().inputFormat,
          inputSource
      );
      return dataset;
    }
    catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

There are a few "middle" solutions which still use MyIOConfigType, but I think the cleanest is the original one. It might be harder to understand at first, but if something is missing later it will be easier to adapt.
There is also no need to prepare for missing values and such, as Jackson will do the usual checks.

@cryptoe (Contributor) left a comment

Changes LGTM. This will simplify testing a lot. Thanks.

@cryptoe cryptoe merged commit bba25a6 into apache:master Feb 10, 2025
74 checks passed
3 participants