-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable dataset support for quidem tests #17699
Conversation
This reverts commit 821c3d90a34e31347a86433eb5543d5fe8cdf0a5.
This reverts commit 336b7c0.
server/src/test/java/org/apache/druid/sql/calcite/util/datasets/NumFoo.java
Fixed
Show fixed
Hide fixed
server/src/test/java/org/apache/druid/sql/calcite/util/datasets/NumFoo.java
Fixed
Show fixed
Hide fixed
return walker; | ||
} | ||
|
||
@Provides | ||
@LazySingleton | ||
public List<TestDataSet> buildCustomTables(ObjectMapper objectMapper, TempDirProducer tdp, |
Check notice
Code scanning / CodeQL
Useless parameter Note test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, not major, but the Guice named annotation thing should probably be addressed. That's a pattern that I'd prefer not exist in the codebase.
TestClusterQuerySegmentWalker( | ||
Map<String, VersionedIntervalTimeline<String, ReferenceCountingSegment>> timelines, | ||
@Named(TIMELINES_KEY) Map<String, VersionedIntervalTimeline<String, ReferenceCountingSegment>> timelines, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only "good" usage of a named annotation is when it is completely encapsulated inside of a singular module (i.e. there's a binding to the named item and then a provider method in the same module that depends on the naming). This is because it's too easy for named things to proliferate and it becomes hard to figure out how things are actually being setup.
In this case, it's unclear to me why you need the annotation at all. Though, from looking at it, it looks like you are trying to push a map around to multiple different things and have them all use the same map. In that case, I'd suggest that you create a relatively simple TestSegmentsBroker
class which all of the places that would use this map can depend on and then you bind that. This makes it more clear that you have a singular object that multiple things are potentially updating the state of and reading the state from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in those static methods this map was used to back-communicate...didn't wanted to make them much different ; so just added this Named
tag so it will be less likely that it surprises someone :)
by using the constant as the name I was thinking that people interested in its usage could look it up if needed.
I guess sometimes later this will be dropped and probably TestTimelineServerView
could take its place?
butfor now I've packed this Map
into a TestSegmentsBroker
ObjectMapper om = objectMapper.copy(); | ||
om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel")); | ||
FakeIndexTask indexTask = om.readValue(src, FakeIndexTask.class); | ||
FakeIngestionSpec spec = indexTask.spec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably would've used the object mapper to read it in as a Map<String, Object>
, then get
on the keys I want to check and use objectMapper.convertValue()
to convert into the types that you want. What you've done works as well, but it's a little more difficult to read this code and figure out what structure it wants in the JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went thru a few of those approaches
java code
public static TestDataSet makeDS(ObjectMapper objectMapper, File src)
{
try {
Map<String, Object> map = objectMapper.readValue(
src,
new TypeReference<>(){}
);
Map<String, Object> spec = objectMapper.convertValue(
Preconditions.checkNotNull(map.get("spec"),"spec not specified"),
new TypeReference<>(){}
);
DataSchema dataSchema = objectMapper.convertValue(
Preconditions.checkNotNull(spec.get("dataSchema"), "dataschema not specified"),
DataSchema.class
);
Map<String, Object> ioConfig = objectMapper.convertValue(
Preconditions.checkNotNull(spec.get("ioConfig"), "ioConfig not specified"),
new TypeReference<>(){}
);
InputSource inputSource0 = objectMapper.convertValue(
Preconditions.checkNotNull(ioConfig.get("inputSource"), "inputSource not specified"),
InputSource.class
);
InputFormat inputFormat = objectMapper.convertValue(
Preconditions.checkNotNull(ioConfig.get("inputFormat"), "inputFormat not specified"),
InputFormat.class
);
InputSource inputSource = relativizeLocalInputSource(
inputSource0, ProjectPathUtils.PROJECT_ROOT
);
TestDataSet dataset = new InputSourceBasedTestDataset(
dataSchema,
inputFormat,
inputSource
);
return dataset;
}
catch (IOException e) {
throw new RuntimeException(e);
}
}
public static TestDataSet makeDS3(ObjectMapper objectMapper, File src)
{
try {
Map<String, Object> map = objectMapper.readValue(
src,
new TypeReference<>(){}
);
Map<String, Object> spec = objectMapper.convertValue(
Preconditions.checkNotNull(map.get("spec"),"spec not specified"),
new TypeReference<>(){}
);
DataSchema dataSchema = objectMapper.convertValue(
Preconditions.checkNotNull(spec.get("dataSchema"), "dataschema not specified"),
DataSchema.class
);
ObjectMapper om = objectMapper.copy();
om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
MyIOConfigType ioConfig = om.convertValue(
Preconditions.checkNotNull(spec.get("ioConfig"), "ioConfig not specified"),
MyIOConfigType.class
);
InputSource inputSource = relativizeLocalInputSource(
ioConfig.inputSource, ProjectPathUtils.PROJECT_ROOT
);
TestDataSet dataset = new InputSourceBasedTestDataset(
dataSchema,
ioConfig.inputFormat,
inputSource
);
return dataset;
}
catch (IOException e) {
throw new RuntimeException(e);
}
}
public static TestDataSet makeDS2(ObjectMapper objectMapper, File src)
{
try {
Map<String, Object> map = objectMapper.readValue(
src,
new TypeReference<>()
{
}
);
ObjectMapper om = objectMapper.copy();
om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
FakeIngestionSpec spec = om
.convertValue(Preconditions.checkNotNull(map.get("spec"), "spec not specified"), FakeIngestionSpec.class);
InputSource inputSource = relativizeLocalInputSource(
spec.getIOConfig().inputSource, ProjectPathUtils.PROJECT_ROOT
);
TestDataSet dataset = new InputSourceBasedTestDataset(
spec.getDataSchema(),
spec.getIOConfig().inputFormat,
inputSource
);
return dataset;
}
catch (IOException e) {
throw new RuntimeException(e);
}
}
public static TestDataSet makeDS1(ObjectMapper objectMapper, File src)
{
try {
ObjectMapper om = objectMapper.copy();
om.registerSubtypes(new NamedType(MyIOConfigType.class, "index_parallel"));
FakeIndexTask indexTask = om.readValue(src, FakeIndexTask.class);
FakeIngestionSpec spec = indexTask.spec;
InputSource inputSource = relativizeLocalInputSource(
spec.getIOConfig().inputSource, ProjectPathUtils.PROJECT_ROOT
);
TestDataSet dataset = new InputSourceBasedTestDataset(
spec.getDataSchema(),
spec.getIOConfig().inputFormat,
inputSource
);
return dataset;
}
catch (IOException e) {
throw new RuntimeException(e);
}
}
There are a few "middle" solutions which still use the MyIoConfigType
; but I think the cleanest is the original one - it might be harder to understand it at first - but if something will be missing later; it will be easier to adapt.
There is also no need to prepare for missing values and such things as jackson will do the usual checks.
This reverts commit 83ae6ac.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. This will simplify testing a lot. Thanks.
Enables to use
datasets
in druidtest uri-s.The
datasets
should point to a directory - all json files become candidates to be interpeted asingestiontions.
The system will not run a full ingestion - but to simplify usage recognizing an ingestion seemed like the easiest way.
Example:
Locations are relative to the projectroot - for both the
datasets
option and for theLocalInputSource
-es looking for their inputs inside the ingestions.Setting
datasets
to a value supresses the initialization of the componentsupplier's builtin datasets.