Add RawSource.asFlow transformer #487
base: develop
Conversation
I need some help with updateLegacyAbi. It doesn't work for the ...
- Introduced converter from `RawSource` to Kotlin Flows.
- Added `StreamingDecoder` interface and its implementation `DelimitingByteStreamDecoder` for processing byte streams with a specified delimiter.
- Added tests for the decoder and for flow behavior cases.
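To make the description concrete, here is a hypothetical sketch (not the PR's actual code) of the splitting step such a delimiting decoder performs, reduced to plain ByteArray operations with no kotlinx-io types; the name `splitByDelimiter` is invented for illustration:

```kotlin
// Hypothetical sketch of a delimiting decoder's core: split complete messages
// off the input and keep the unterminated tail buffered for the next read.
fun splitByDelimiter(input: ByteArray, delimiter: Byte): Pair<List<ByteArray>, ByteArray> {
    val messages = mutableListOf<ByteArray>()
    var start = 0
    for (i in input.indices) {
        if (input[i] == delimiter) {
            messages += input.copyOfRange(start, i) // message excludes the delimiter
            start = i + 1
        }
    }
    // bytes after the last delimiter stay buffered until more input arrives
    return messages to input.copyOfRange(start, input.size)
}

fun main() {
    val (messages, rest) = splitByDelimiter("a\nbb\nccc".encodeToByteArray(), '\n'.code.toByte())
    println(messages.map { it.decodeToString() }) // [a, bb]
    println(rest.decodeToString()) // ccc
}
```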
Force-pushed from f21bd51 to 246f77b
(Note: I'm not a maintainer of kotlinx-io, so my thoughts here are mostly based on my experience of working with kotlinx-io in cryptography-kotlin and rsocket-kotlin.)

Do you mind sharing more use cases apart from this one? It feels like the existing API already covers it:

```kotlin
inline fun Source.forEachLine(block: (String) -> Unit): Unit = use {
    while (true) block(it.readLine() ?: break)
}

val source = Buffer().apply {
    writeString("name:Alice\n")
    writeString("age:30\n")
    writeString("city:NewYork\n")
}
source.forEachLine {
    println("received: $it")
}

// or if a flow is needed - NOTE: the flow can be consumed only once!
flow {
    source.forEachLine {
        emit(it)
    }
}.map { line -> anyTransformation(line) }.collect { transformed ->
    println("received: $transformed")
}
```

Or, if you need delimiter-based reading, it's possible to use something like:

```kotlin
// these could be rather easily implemented via `indexOf`, inside or outside of kotlinx-io
public fun Source.readUntilDelimiter(delimiter: Char): String? = TODO()
public fun Source.readUntilDelimiter(delimiter: Byte): ByteString? = TODO()

// and a `forEach` variant
public fun Source.forEachDelimited(delimiter: Char, block: (String) -> Unit): Unit = use {
    while (true) block(it.readUntilDelimiter(delimiter) ?: break)
}

// and use it
Buffer().apply {
    writeString("first|second|third|")
}.forEachDelimited('|') {
    println("received: $it")
}
```

It's even possible to convert this into a `Flow` in the same way. In all of those examples (and in the PR), the most problematic part is that ...
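For illustration only, here is a pure-stdlib mock of the delimiter-reading pattern above: `ByteCursor` is a hypothetical stand-in for kotlinx-io's `Source` (not the real API), showing how `readUntilDelimiter` and a `forEachDelimited` variant compose:

```kotlin
// A tiny cursor over a ByteArray standing in for a Source, for demonstration.
class ByteCursor(private val bytes: ByteArray) {
    private var pos = 0

    fun exhausted(): Boolean = pos >= bytes.size

    // Reads bytes up to (and consuming) the next delimiter; null when exhausted.
    fun readUntilDelimiter(delimiter: Byte): String? {
        if (exhausted()) return null
        var i = pos
        while (i < bytes.size && bytes[i] != delimiter) i++
        val chunk = bytes.copyOfRange(pos, i).decodeToString()
        pos = if (i < bytes.size) i + 1 else i // skip the delimiter if one was found
        return chunk
    }
}

fun ByteCursor.forEachDelimited(delimiter: Char, block: (String) -> Unit) {
    while (true) block(readUntilDelimiter(delimiter.code.toByte()) ?: break)
}

fun main() {
    ByteCursor("first|second|third|".encodeToByteArray())
        .forEachDelimited('|') { println("received: $it") } // first, second, third
}
```

The `use {}`-based variants above additionally tie iteration to resource cleanup; this mock omits that, since a plain `ByteArray` holds no resources.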
I think maybe for introducing coroutines to kotlinx-io we'd want a sub-library, or something that's more integrated with async I/O calls on the OS, and we'd want to introduce something similar to the coroutine I/O types in Ktor. I like the idea of supporting search/transformations for ...
JakeWharton left a comment
I'm not sure what the motivating case is here, but this design feels both too inflexible and too inefficient in its current iteration. I also do not think the core artifact should expose a coroutines dependency.
```kotlin
    decoder: StreamingDecoder<T>,
    readBufferSize: Long = READ_BUFFER_SIZE
): Flow<T> =
    channelFlow {
```
This seems unnecessary as opposed to a normal `flow { }`.
```kotlin
        decoderClosed = true
    } catch (exception: IOException) {
        // IO error: try to emit remaining data, then close with error
        runCatching { decoder.onClose { send(it) } }.onSuccess { decoderClosed = true }
```
I'm not sure closing resources should be conflated with completion, since nothing differentiates the normal case vs. the error case.
```kotlin
 * Defaults to the newline character (`'\n'`).
 */
public class DelimitingByteStreamDecoder(
    public val delimiter: Byte = '\n'.code.toByte(),
```
Supporting only a single byte makes this pretty inflexible. It should probably take a `ByteString`, at the very least.
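As an illustration of what a multi-byte delimiter (e.g. CRLF) would require, here is a minimal byte-sequence search over plain ByteArrays; the name `indexOfDelimiter` is hypothetical and unrelated to kotlinx-io's `ByteString` API:

```kotlin
// Naive search for a multi-byte delimiter; returns the index of the first
// occurrence at or after `from`, or -1 when absent. Hypothetical helper.
fun indexOfDelimiter(bytes: ByteArray, delimiter: ByteArray, from: Int = 0): Int {
    if (delimiter.isEmpty()) return -1
    outer@ for (i in from..bytes.size - delimiter.size) {
        for (j in delimiter.indices) {
            if (bytes[i + j] != delimiter[j]) continue@outer
        }
        return i
    }
    return -1
}

fun main() {
    val data = "one\r\ntwo\r\n".encodeToByteArray()
    val crlf = "\r\n".encodeToByteArray()
    println(indexOfDelimiter(data, crlf)) // 3
    println(indexOfDelimiter(data, crlf, from = 5)) // 8
}
```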
```kotlin
    if (buffer.size > 0) {
        byteConsumer.invoke(buffer.readByteArray())
    }
    buffer.close()
```
This is a no-op
```kotlin
    var startIndex = 0
    for (i in bytes.indices) {
        if (bytes[i] == delimiter) {
            buffer.write(bytes, startIndex, i)
```
The design of this API causes a lot of data copies, undermining the overall design of the library, which is built around avoiding copies.
Here, the flow function creates a ByteArray and copies data out of the segment to invoke this function. This function then copies the data back into a new segment, and then proceeds to create a new ByteArray and copy all the data back out. This is a very inefficient usage of the backing mechanism.
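One copy-avoiding direction, sketched here as a plain-Kotlin assumption rather than a kotlinx-io API: record index ranges into the bytes already read and let the consumer decode each message lazily, so each byte is copied at most once (all names below are hypothetical):

```kotlin
// Hypothetical sketch: instead of materializing a ByteArray per message,
// record the index range of each message within the already-read bytes.
fun delimiterRanges(bytes: ByteArray, delimiter: Byte): List<IntRange> {
    val ranges = mutableListOf<IntRange>()
    var start = 0
    for (i in bytes.indices) {
        if (bytes[i] == delimiter) {
            ranges += start until i
            start = i + 1
        }
    }
    return ranges
}

fun main() {
    val data = "a|bb|".encodeToByteArray()
    // decode straight from the original array: no intermediate copy per message
    val parts = delimiterRanges(data, '|'.code.toByte())
        .map { data.decodeToString(it.first, it.last + 1) }
    println(parts) // [a, bb]
}
```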
First of all, thank you all for the feedback! 🤗
As noted in the description, decoders can be quite general and may handle tasks such as converting bytes to strings, applying gzip, buffering, and splitting. This is similar in spirit to Netty's ByteToMessageDecoder. My goal is to separate reading from the source from the transformation and consumption of the byte stream, each in its own coroutine context, so building pipelines becomes simpler. You're right that resource usage isn't fully optimized yet; the Buffer could be used more efficiently. And I agree that this API belongs in a separate artifact so the core doesn't depend on coroutines.
Force-pushed from eabbb19 to f09e06c
Add RawSource.asFlow transformer

- Introduced converter from `RawSource` to Kotlin Flows.
- Added `StreamingDecoder` interface and its implementation `DelimitingByteStreamDecoder` for processing byte streams with a specified delimiter. Inspired by Netty. Decoders can be very generic, e.g. they can handle decoding (byte to String, gzip), splitting, buffering, etc.
Related to: #486, #447, #454, #421