feat(java): Chunk by chunk predictive map serialization protocol (WIP) #1722
}
}
}
protected MethodHandle constructor;
Could you format the code with mvn spotless:apply? Those changes can then be removed.
Done!
Spotless version 2.41.1 does not support JDK 8; it requires at least JDK 11.
Yes, Spotless needs JDK 11 now.
Hi @Hen1ng, I noticed that we create a MapChunkWriter for serialization/deserialization every time we serialize a map. This introduces extra object allocation cost. And I found that we use
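To illustrate the allocation concern, here is a minimal sketch of keeping one writer per serializer and resetting it per call, instead of allocating a new one for every map. All class and method names here (ReusableChunkWriter, SketchMapSerializer, reset) are hypothetical and are not Fury's actual MapChunkWriter API; the byte layout is a toy one.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical writer; NOT Fury's MapChunkWriter. It only demonstrates reuse.
final class ReusableChunkWriter {
  private final ByteArrayOutputStream out = new ByteArrayOutputStream();
  private int entriesWritten;

  // Clears per-call state so the same instance can serve the next map.
  void reset() {
    out.reset();
    entriesWritten = 0;
  }

  // Toy layout: one length byte, the UTF-8 key bytes, then the low byte of the value.
  void write(String key, int value) {
    byte[] k = key.getBytes(StandardCharsets.UTF_8);
    out.write(k.length);
    out.write(k, 0, k.length);
    out.write(value);
    entriesWritten++;
  }

  int entriesWritten() {
    return entriesWritten;
  }

  byte[] toBytes() {
    return out.toByteArray();
  }
}

// Hypothetical serializer that owns a single writer for its whole lifetime.
final class SketchMapSerializer {
  // Allocated once, not once per serialize() call.
  private final ReusableChunkWriter writer = new ReusableChunkWriter();

  byte[] serialize(Map<String, Integer> map) {
    writer.reset();
    for (Map.Entry<String, Integer> e : map.entrySet()) {
      writer.write(e.getKey(), e.getValue());
    }
    return writer.toBytes();
  }

  ReusableChunkWriter writer() {
    return writer;
  }
}
```

One caveat with this design: a single shared writer is only safe if the serializer is not used concurrently and is not re-entered, for example by a map nested inside another map. A real implementation would need a small pool or a per-depth writer for that case.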
Another thing I found is that this PR does not seem to support predicting that consecutive entries share the same type. For example:
public static void main(String[] args) {
  Map<String, Integer> map = new HashMap<>(20);
  for (int i = 0; i < 20; i++) {
    map.put("Key" + i, i);
  }
  Fury fury = Fury.builder().withChunkSerializeMapEnable(true).build();
  byte[] result = null;
  for (int i = 0; i < 1000000000; i++) {
    result = fury.serialize(map);
  }
  fury.deserialize(result);
}
With this PR, we still need to write the key type and value type for every element. Could we optimize this in the current PR?
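The optimization being asked for can be sketched as follows: group consecutive entries whose key class and value class match into one chunk, so that type information is written once per chunk rather than once per entry. This is an illustrative sketch only. ChunkPlanner, Chunk, and MAX_CHUNK_SIZE are hypothetical names, and the one-byte chunk size limit is an assumption, not the wire format of this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical planner; it only counts how many type headers a chunked layout needs.
final class ChunkPlanner {
  // Assumption for this sketch: a chunk's entry count must fit in one unsigned byte.
  static final int MAX_CHUNK_SIZE = 255;

  // One chunk: the key/value classes shared by `size` consecutive entries.
  static final class Chunk {
    final Class<?> keyType;   // null when the key itself is null
    final Class<?> valueType; // null when the value itself is null
    int size;

    Chunk(Class<?> keyType, Class<?> valueType) {
      this.keyType = keyType;
      this.valueType = valueType;
    }
  }

  // Walks the map once and starts a new chunk whenever the predicted types miss
  // or the current chunk is full.
  static List<Chunk> plan(Map<?, ?> map) {
    List<Chunk> chunks = new ArrayList<>();
    Chunk current = null;
    for (Map.Entry<?, ?> e : map.entrySet()) {
      Class<?> k = e.getKey() == null ? null : e.getKey().getClass();
      Class<?> v = e.getValue() == null ? null : e.getValue().getClass();
      boolean predictionHolds =
          current != null
              && current.keyType == k
              && current.valueType == v
              && current.size < MAX_CHUNK_SIZE;
      if (!predictionHolds) {
        current = new Chunk(k, v); // a new header (types written once) starts here
        chunks.add(current);
      }
      current.size++;
    }
    return chunks;
  }
}
```

For the 20-entry Map<String, Integer> in the example above, this plan yields a single chunk, so the key and value types would be written once instead of 20 times.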
Benchmark (size) (tracking) Mode Cnt Score Error Units
generalWrite benchmark comparison; the benchmark code is below:
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256", "512"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  @Benchmark
  public void testGeneralChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(map);
  }

  @Benchmark
  public void testGeneralWrite() {
    final byte[] serialize = fury.serialize(map);
  }
}
Benchmark (size) (tracking) Mode Cnt Score Error Units
finalWrite benchmark comparison; the benchmark code is below:
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256", "512"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
    }
    bean = new Bean();
    bean.setMap(map);
  }

  @Benchmark
  public void testFinalChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(bean);
  }

  @Benchmark
  public void testFinalWrite() {
    final byte[] serialize = fury.serialize(bean);
  }
}
HnBenchmark.testNoFinalGenericChunkWrite 64 true avgt 15 5173.244 ± 551.374 ns/op
testNoFinalGenericWrite benchmark comparison; the benchmark code is below:
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  @Benchmark
  public void testNoFinalGenericChunkWrite() {
    final byte[] serialize = furyMapChunk.serialize(mapBean);
  }

  @Benchmark
  public void testNoFinalGenericWrite() {
    final byte[] serialize = fury.serialize(mapBean);
  }
}
Benchmark (size) (tracking) Mode Cnt Score Error Units
@State(Scope.Benchmark)
@BenchmarkMode({Mode.AverageTime})
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 3, time = 5)
@Threads(1)
@Fork(5)
public class HnBenchmark {
  private Fury furyMapChunk;
  private Fury fury;
  Map<Integer, Integer> map;
  MapBean mapBean = new MapBean();
  Map<BeanB, BeanB> beanBBeanBMap = new HashMap<>();
  Bean bean;

  @Param({"64", "128", "256"})
  int size;

  @Param({"true", "false"})
  boolean tracking = false;

  @Setup(Level.Trial)
  public void init() {
    furyMapChunk =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(true)
            .requireClassRegistration(false)
            .build();
    fury =
        Fury.builder()
            .withLanguage(Language.JAVA)
            .withRefTracking(tracking)
            .withCodegen(false)
            .withChunkSerializeMapEnable(false)
            .requireClassRegistration(false)
            .build();
    map = new HashMap<>(size);
    for (int i = 0; i < size; i++) {
      if (i == 0) {
        map.put(null, i);
        beanBBeanBMap.put(null, new BeanB());
        continue;
      }
      if (i == 10) {
        map.put(i, null);
        beanBBeanBMap.put(new BeanB(), null);
        continue;
      }
      map.put(i, i);
      beanBBeanBMap.put(new BeanB(), new BeanB());
    }
    bean = new Bean();
    bean.setMap(map);
    mapBean.setMap(beanBBeanBMap);
  }

  @Benchmark
  public void testGeneralChunkWriteWithNull() {
    final byte[] serialize = furyMapChunk.serialize(map);
  }

  @Benchmark
  public void testGeneralWriteWithNull() {
    final byte[] serialize = fury.serialize(map);
  }
}
The performance is basically the same, which is a little unexpected. Could you share some profiling data here?
}
}

private void javaChunkWriteWithKeySerializers(
This method is too big; the JVM JIT can't inline such a big method. Could we split it into multiple smaller methods, especially for the less frequent code paths?
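As a rough illustration of that suggestion: HotSpot limits inlining by bytecode size (see the MaxInlineSize and FreqInlineSize flags), so a common pattern is to keep the frequent path small and move rare branches into a separate method. The sketch below uses hypothetical names (SplitWriteSketch, writeEntries, writeNullableEntry) and a toy text format; it is not the code under review.

```java
import java.util.Map;

// Hypothetical example of splitting a writer into a small hot path and a cold path.
final class SplitWriteSketch {

  // Hot path: small loop body that handles the common non-null case inline.
  // Returns how many entries had to take the slow path.
  static int writeEntries(Map<String, Integer> map, StringBuilder out) {
    int slowPathCount = 0;
    for (Map.Entry<String, Integer> e : map.entrySet()) {
      String key = e.getKey();
      Integer value = e.getValue();
      if (key != null && value != null) {
        out.append(key).append('=').append(value.intValue()).append(';');
      } else {
        // Rare case: delegate so its bytecode does not bloat the hot method.
        writeNullableEntry(key, value, out);
        slowPathCount++;
      }
    }
    return slowPathCount;
  }

  // Cold path: null handling lives here and is only called for rare entries.
  private static void writeNullableEntry(String key, Integer value, StringBuilder out) {
    out.append(key == null ? "<null>" : key).append('=');
    if (value == null) {
      out.append("<null>");
    } else {
      out.append(value.intValue());
    }
    out.append(';');
  }
}
```

Whether a given split actually gets inlined depends on the JVM version and flags, so it is worth confirming with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining rather than assuming.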
@@ -1380,6 +1389,10 @@ public Class<? extends Serializer> getDefaultJDKStreamSerializerType() {
return config.getDefaultJDKStreamSerializerType();
}

public boolean isChunkSerializeMapEnabled() {
In the future this will be the only way to serialize maps, so we'd better not expose it as an option. We can add an option in AbstractMapSerializer directly, and remove that option once we support chunk-based serialization in JIT mode.
What does this PR do?
Implement chunk-based map serialization in #925. This PR doesn't provide JIT support; that will be implemented in a later PR.
Related issues
Does this PR introduce any user-facing change?
Benchmark