Hdfs datasource tasker implementation #91
base: main
Resolved review threads on outdated code:
src/main/java/com/teragrep/pth_06/ArchiveMicroStreamReader.java
src/main/java/com/teragrep/pth_06/HdfsTopicPartitionOffsetMetadata.java
src/main/java/com/teragrep/pth_06/planner/offset/DatasourceOffset.java
src/main/java/com/teragrep/pth_06/planner/offset/HdfsOffset.java
src/main/java/com/teragrep/pth_06/planner/offset/SerializedDatasourceOffset.java
src/main/java/com/teragrep/pth_06/scheduler/HdfsBatchSliceCollection.java
src/main/java/com/teragrep/pth_06/task/hdfs/HdfsRecordConversion.java
src/main/java/com/teragrep/pth_06/task/HdfsMicroBatchInputPartitionReader.java
src/main/java/com/teragrep/pth_06/task/hdfs/HdfsRecordConversionImpl.java
src/main/java/com/teragrep/pth_06/planner/HdfsQueryProcessor.java
        serializedKafkaOffset = new HashMap<>(offset.size());
        for (Map.Entry<TopicPartition, Long> entry : offset.entrySet()) {
            serializedKafkaOffset.put(entry.getKey().toString(), entry.getValue()); // offset
        }
        this.stub = stub;
    }

    public KafkaOffset(String s) {
        Gson gson = new Gson();
Could this be moved up so that it uses the primary constructor via this(...)?
Map<TopicPartition, Long> is not serializable, which is why it is converted to a serializable Map<String, Long> in the constructor. Because this conversion logic lives inside the constructors, KafkaOffset(String s), which is used by the serialization tests, can't simply be refactored into a secondary constructor.
The code must be refactored so that the conversion to the serializable map happens before the object is initialized; that way the constructor can be refactored into a secondary constructor.
Solved the identical issue in HdfsOffset in commit b928695.
Solved in commit cc07931.
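A minimal sketch of the refactoring discussed in this thread, assuming a simplified offset class: the primary constructor only assigns the already-serializable Map<String, Long>, and the String-based secondary constructor delegates to it with this(...). SketchOffset, SimpleTopicPartition, serializable() and decode() are hypothetical names, not the pth_06 API, and the "topic-0=5;topic-1=7" string format is an assumption standing in for the Gson JSON parsing in KafkaOffset(String s).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class OffsetDelegationSketch {

    // Hypothetical stand-in for Kafka's TopicPartition so the sketch needs no Kafka dependency.
    static final class SimpleTopicPartition {
        private final String topic;
        private final int partition;

        SimpleTopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topic + "-" + partition; // "topic-partition" shape
        }
    }

    // Hypothetical, simplified offset class; not the actual pth_06 KafkaOffset.
    static final class SketchOffset {
        private final Map<String, Long> serializedOffset;

        // Primary constructor: the only place that assigns fields.
        SketchOffset(Map<String, Long> serializedOffset) {
            this.serializedOffset = Collections.unmodifiableMap(new HashMap<>(serializedOffset));
        }

        // Secondary constructor: decoding runs inside this(...), before the object exists.
        SketchOffset(String encoded) {
            this(decode(encoded));
        }

        // The typed map is converted by the caller before construction. A second constructor
        // taking Map<SimpleTopicPartition, Long> would not compile next to the primary one,
        // because both parameter types erase to Map.
        static Map<String, Long> serializable(Map<SimpleTopicPartition, Long> offset) {
            Map<String, Long> result = new HashMap<>(offset.size());
            for (Map.Entry<SimpleTopicPartition, Long> entry : offset.entrySet()) {
                result.put(entry.getKey().toString(), entry.getValue());
            }
            return result;
        }

        // Assumed "key=value;key=value" format, standing in for Gson parsing.
        private static Map<String, Long> decode(String encoded) {
            Map<String, Long> result = new HashMap<>();
            if (encoded.isEmpty()) {
                return result;
            }
            for (String pair : encoded.split(";")) {
                int eq = pair.lastIndexOf('=');
                result.put(pair.substring(0, eq), Long.parseLong(pair.substring(eq + 1)));
            }
            return result;
        }

        Map<String, Long> offsets() {
            return serializedOffset;
        }
    }

    public static void main(String[] args) {
        Map<SimpleTopicPartition, Long> typed = new HashMap<>();
        typed.put(new SimpleTopicPartition("topic", 0), 5L);
        typed.put(new SimpleTopicPartition("topic", 1), 7L);

        SketchOffset fromMap = new SketchOffset(SketchOffset.serializable(typed));
        SketchOffset fromString = new SketchOffset("topic-0=5;topic-1=7");

        System.out.println(fromMap.offsets().equals(fromString.offsets())); // true
    }
}
```

Keeping all field assignment in the primary constructor means both creation paths end in identical state, which is what the serialization tests compare.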
Could this potentially use a decorator object or similar?
The class extends the abstract class Offset and implements the Serializable interface. AFAIK, because the class is serializable, it can't encapsulate the Map<TopicPartition, Long>, and the same goes for decorators.
It would be possible to use the transient keyword on the Map<TopicPartition, Long> field, which makes serialization ignore that field. That way Map<TopicPartition, Long> could be used as the constructor's input parameter, and a new method could be added to convert the Map<TopicPartition, Long> into a Map<String, Long> before serialization. This would at least remove the duplicated code for converting the map to the serializable format.
Implemented the serializable Map conversion in HdfsOffset and KafkaOffset in commit 014cc64.
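A minimal sketch of the transient-field approach described above, assuming plain Java serialization (ObjectOutputStream) rather than whatever mechanism Spark applies to Offset: the typed map is kept in a transient field and skipped, while a single conversion method produces the serializable Map<String, Long>. TransientOffset, PartitionKey and toSerializable() are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class TransientOffsetSketch {

    // Hypothetical key type that does NOT implement Serializable.
    static final class PartitionKey {
        private final String topic;
        private final int partition;

        PartitionKey(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return topic + "-" + partition;
        }
    }

    // Hypothetical offset class; not the pth_06 HdfsOffset or KafkaOffset.
    static final class TransientOffset implements Serializable {
        private static final long serialVersionUID = 1L;

        // Skipped by Java serialization because of the transient keyword.
        private final transient Map<PartitionKey, Long> offset;
        // Serializable copy, produced by the single conversion method below.
        private final HashMap<String, Long> serializedOffset;

        TransientOffset(Map<PartitionKey, Long> offset) {
            this.offset = offset;
            this.serializedOffset = toSerializable(offset);
        }

        private static HashMap<String, Long> toSerializable(Map<PartitionKey, Long> offset) {
            HashMap<String, Long> result = new HashMap<>(offset.size());
            for (Map.Entry<PartitionKey, Long> entry : offset.entrySet()) {
                result.put(entry.getKey().toString(), entry.getValue());
            }
            return result;
        }

        boolean hasTypedOffset() {
            return offset != null;
        }

        Map<String, Long> serializedOffset() {
            return serializedOffset;
        }
    }

    // Writes the object to bytes and reads it back, as a serialization test would.
    static TransientOffset roundTrip(TransientOffset original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TransientOffset) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<PartitionKey, Long> typed = new HashMap<>();
        typed.put(new PartitionKey("topic", 0), 42L);

        TransientOffset copy = roundTrip(new TransientOffset(typed));
        System.out.println(copy.serializedOffset()); // {topic-0=42}
        System.out.println(copy.hasTypedOffset());   // false
    }
}
```

The trade-off: after deserialization the transient field is null (constructors are not run), so only the Map<String, Long> view can be relied on from that point.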
…a and HdfsRecordConversionImpl.java to make all objects immutable. Time based inclusion refactoring is WIP.
…erializedDatasourceOffset.java to reduce the amount of if-statements.
…fset.java to remove null usages. Fixed error in initialOffset() method.
…empty BatchSliceCollection. Replaced initialization of BatchSliceCollection as null with StubBatchSliceCollection.
… inclusion functionality.
…tead of separate clear() method. Removed clear() from AvroRead interface.
…ructor. Refactored KafkaOffset constructors.
…ded missing hashCode() overrides and other minor fixes.
…ded missing default keywords to switches.
…factored Exceptions to conform to Checkstyle standards.
…ap conversion to HdfsOffset and KafkaOffset, solving code duplication of the conversion. Implemented additional OffsetInterface.java for HdfsOffset and KafkaOffset.
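One of the commits above replaces a null-initialized BatchSliceCollection with StubBatchSliceCollection. A minimal sketch of that stub (null object) idea, assuming a hypothetical SliceCollection interface with an isStub() check; the real BatchSliceCollection API is not shown in this thread, so every name below is illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StubCollectionSketch {

    // Hypothetical interface, not the pth_06 BatchSliceCollection API.
    interface SliceCollection {
        List<String> slices();

        boolean isStub();
    }

    static final class ListSliceCollection implements SliceCollection {
        private final List<String> slices;

        ListSliceCollection(List<String> slices) {
            this.slices = Collections.unmodifiableList(new ArrayList<>(slices));
        }

        @Override
        public List<String> slices() {
            return slices;
        }

        @Override
        public boolean isStub() {
            return false;
        }
    }

    // Stored where a null reference would otherwise be used.
    static final class StubSliceCollection implements SliceCollection {
        @Override
        public List<String> slices() {
            throw new UnsupportedOperationException("StubSliceCollection does not hold slices");
        }

        @Override
        public boolean isStub() {
            return true;
        }
    }

    public static void main(String[] args) {
        SliceCollection current = new StubSliceCollection(); // instead of null
        System.out.println(current.isStub()); // true

        current = new ListSliceCollection(Arrays.asList("slice-1", "slice-2"));
        System.out.println(current.isStub() ? 0 : current.slices().size()); // 2
    }
}
```

Callers test isStub() instead of comparing against null, and accidental use of the stub fails loudly with an exception rather than a NullPointerException further away.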
Branch force-pushed from 5d59cdf to d7550fa.
Rebased again onto the current main branch. Applied Checkstyle cleanup to the new files from the rebase.
HDFS datasource feature that allows querying semi-recent data from HDFS. The latest records, which are not yet present in HDFS, are queried from Kafka; older data is queried from S3 as usual.
The included planner and scheduler components were already reviewed outside GitHub.
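A minimal sketch of how a query window could be split across the three sources, assuming simple time boundaries: hdfsStart (earliest time HDFS covers) and kafkaStart (time after which records are not yet in HDFS) are hypothetical parameters, and the actual planner may split on Kafka offsets (compare HdfsOffset and KafkaOffset) rather than timestamps.

```java
import java.util.ArrayList;
import java.util.List;

public class TieredQuerySketch {

    enum Source { S3, HDFS, KAFKA }

    // One contiguous part of the query window, [from, to), served by a single source.
    static final class Segment {
        final Source source;
        final long from;
        final long to;

        Segment(Source source, long from, long to) {
            this.source = source;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return source + "[" + from + "," + to + ")";
        }
    }

    // Assumes hdfsStart <= kafkaStart. Old data goes to S3, semi-recent data to HDFS,
    // and data not yet written to HDFS to Kafka.
    static List<Segment> split(long from, long to, long hdfsStart, long kafkaStart) {
        List<Segment> segments = new ArrayList<>();
        addIfNonEmpty(segments, Source.S3, from, Math.min(to, hdfsStart));
        addIfNonEmpty(segments, Source.HDFS, Math.max(from, hdfsStart), Math.min(to, kafkaStart));
        addIfNonEmpty(segments, Source.KAFKA, Math.max(from, kafkaStart), to);
        return segments;
    }

    private static void addIfNonEmpty(List<Segment> segments, Source source, long from, long to) {
        if (from < to) {
            segments.add(new Segment(source, from, to));
        }
    }

    public static void main(String[] args) {
        System.out.println(split(0, 100, 40, 90)); // [S3[0,40), HDFS[40,90), KAFKA[90,100)]
        System.out.println(split(50, 80, 40, 90)); // [HDFS[50,80)]
    }
}
```

Because the segments are half-open and adjacent, every record in the window is read from exactly one source, which is the property that prevents duplicates between HDFS and Kafka.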
Includes:
Missing: