Piping consumer -> producer kafkacat for binary data #142
Comments
I have this issue too, and I've found a PR that is supposed to allow multi-byte delimiters on the consumer side (#150). However, it hasn't been merged yet, and for me the fork segfaults.
Hi! Thanks for sharing the shell script @mjuric. I might have missed something, but for me it's not working.
I just wonder if the
It supports multi-byte delimiters in consumer mode, but not when producing. At least that was the state as of a year ago when I wrote the hack; I'm not sure whether something has changed in the meantime.
Maybe a more general solution to this problem is to allow loading/dumping data in base64 format. It is inefficient, but robust and easy to implement.
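Base64 support is being proposed here, not describing an existing kafkacat feature, but the idea itself can be demonstrated in plain shell: encoding turns arbitrary bytes (including embedded newlines) into a single newline-free line, so the default single-character `\n` delimiter can no longer collide with message contents.

```shell
# Plain-shell demo of the base64 idea (NOT a kafkacat feature):
# encode a payload that contains a raw newline into one delimiter-safe line.
enc=$(printf 'line1\nline2' | base64 | tr -d '\n')
printf '%s\n' "$enc"        # a single line, safe to delimit with '\n'

# Decoding restores the exact original bytes.
printf '%s' "$enc" | base64 -d
```

The cost is the usual ~33% size overhead, which is the inefficiency mentioned above.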
The other approach I was hoping would work would be to use the JSON wrapper, but the producer doesn't seem to unwrap from it. If I do:
The binary gets encoded into newline-delimited JSON messages. But if I then pipe that into a producer with:
The messages on mytopic2 are still JSON encoded. If the producer honored the -J flag (ignoring the topic, partition, offset, and timestamp, and producing based on the decoded key and payload), you'd be able to handle binary data accordingly.
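The unwrap step being asked of the producer can be emulated outside kafkacat. Below is a hypothetical filter (not a kcat feature) run on a hand-written envelope line for illustration; note that even with such a filter, arbitrary binary payloads cannot be represented as plain JSON strings, which is part of why this route struggles with the binary case discussed here.

```shell
# Hypothetical unwrap filter (NOT a kcat feature): extract the payload
# field from each -J-style envelope line before re-producing.
# The envelope line below is hand-written for illustration.
echo '{"topic":"mytopic","partition":0,"offset":0,"key":"k1","payload":"hello"}' \
  | python3 -c '
import json, sys
for line in sys.stdin:
    sys.stdout.write(json.loads(line)["payload"])
'
```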
Also, I'm not sure a multi-character delimiter would be a huge help in my case, as I'm not sure I can identify a sequence of characters that could not appear in my messages.
It has been implemented on both consumer and producer as
I have created a pull request for this. Vote for me. #295
Bump. It's now 2023 and this is still not possible? It should be a no-brainer.
Hi,

Sometimes (usually for test purposes) I want to quickly copy a certain number of messages from one topic to another. If the messages are text, I can use `kafkacat` to do something like

This doesn't seem to work when the messages are binary, though, as any appearance of `\n` in the message byte stream will be interpreted as a delimiter by the producer.

I tried changing the delimiter to something unlikely to be found in the byte stream (e.g., an equivalent of `-D "__ENDMESSAGE__"`). The consumer understands this nicely, but it looks like the producer just takes the first character passed with the `-D` option, not the full string.

My current workaround is to set a delimiter on the consumer end with `-D`, then pipe the output to `awk` to write it out into a set of temporary files (one per message), and then run the producer with the list of those files (see here for the resulting shell script). This requires the files to hit the disk, though, and isn't quite as elegant as it could be.

My question -- is there a better/easier way to do this? If not, are there obstacles to extending `-D` in producer mode to take a string as a delimiter, rather than a single character?
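The temp-file workaround described above could look roughly like the following sketch. The broker address, topic names, and file names are illustrative (the real script is the one linked in the thread); multi-character `RS` is a GNU awk feature, and raw NUL bytes may not survive awk, so treat this as best-effort for binary data.

```shell
# Illustrative sketch of the consume -> split-to-files -> produce workaround.
DELIM='__ENDMESSAGE__'

# 1) Consume with a multi-byte delimiter and split into one file per
#    message (multi-character RS requires GNU awk).
kcat -C -b localhost:9092 -t src-topic -e -D "$DELIM" \
  | gawk -v RS="$DELIM" '{ printf "%s", $0 > sprintf("msg-%05d.bin", NR) }'

# 2) Produce each file as one message (kcat sends one message per
#    file argument in producer mode).
kcat -P -b localhost:9092 -t dst-topic msg-*.bin
```

The awk stage is what keeps each message's bytes intact between the two kcat invocations; the downside, as noted above, is that everything hits the disk.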