Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base64 streaming encoder #56

Closed
wants to merge 1 commit into from
Closed

Conversation

brianm
Copy link
Contributor

@brianm brianm commented Mar 19, 2018

Implements std::io::Write transforming inputs to base64, for use in io chains. Part of #20

Implementation is based on Golang's

@brianm
Copy link
Contributor Author

brianm commented Mar 19, 2018

Code is kind of rough, but I am kind of new to rust, so am fighting my understanding to get it polished. Help would be hugely appreciated.

I have the Read side as well, and will add it to the PR once we get the write side compiling (with your stricter lint rules, and with the module setup I don't really understand.

@marshallpierce
Copy link
Owner

Thanks; I should be able to take a close look at this in the next couple of days. Perhaps this can re-use ChunkedEncoder, or we can improve one or the other so we don't two similar-ish implementations of encoding partial input.

I'm fine with just getting the Write side sorted in this PR if that's OK with you. It might be easier to experiment with different structures, optimizations, etc if we aren't trying to change both ends. Or, if there's meaningful amounts of code to be shared or something like that, it's fine to do both in one if that's easier. Point is -- I wouldn't stress about squeezing Read into this if it's not a natural fit. Anyway, I'll probably branch off of your branch and tinker with things, some of which may be useful. :)

Food for thought -- we should leave room nomenclature-wise, etc, for the decoding side (Read and Write), too. Perhaps akin to https://docs.rs/flate2/1.0.1/flate2/ and its various submodules?

@brianm
Copy link
Contributor Author

brianm commented Mar 20, 2018

re: just the write side -- works for me. I dropped the read side of my impl in this gist, complete with debug output and lots of room for improvement :-)

@marshallpierce
Copy link
Owner

I've started tinkering in https://github.com/alicemaz/rust-base64/tree/brianm-encoder. I'll add benchmarks and some more tests next.

@marshallpierce
Copy link
Owner

Argh, line endings continue to be a pain. Go's encoder doesn't attempt to handle the line wrapping requirements of things like MIME-flavored base64, so of course this implementation doesn't either. I really regret ever adding them to this library. It enormously complicates things. Noted in https://github.com/marshallpierce/rust-base64-experimental to not do that again...

Anyway for this task I think it's probably acceptable to simply delegate to another Write proxy that just inserts line endings, and bar the usage of line wrapping in configs that are handled by the streaming encoder.

@brianm
Copy link
Contributor Author

brianm commented Mar 22, 2018

We can inspect the config and insert the Write proxy... if I understand correctly.

Btw, learned a ton looking at your cleanups, thank you :-)

@marshallpierce
Copy link
Owner

Happy to help. Feel free to interactively hassle me on irc, etc. :)

If we insert another layer then we couldn't have devirtualized types since the return type of some hypothetical builder would be polymorphic. :'(

I'm considering ripping line wrapping out of Config entirely and instead providing a line wrapping helper method for after-the-fact wrapping (which is what it does for normal encodes anyway, so no perf loss), and a line wrapping Write that could be layered similarly. We're still pre 1.0 and I don't mind breaking things. On the other hand, I have much more disruptive breakage in mind to support optional use of SIMD, so perhaps I should wait given that this library is fairly widely used...

I'm currently kinda stuck thinking about how to handle encoding the leftover 0-2 bytes. flush() feels wrong because the user's expectation of flush() is that it should be more or less harmless to call repeatedly, but it definitely isn't if we're potentially writing padding bytes into the middle of a stream. I suspect that flush() should be a no-op. However, it also feels wrong to implement Drop. What if, for instance, a user created a short lived encoder, used it on an underlying Write for a while, then lets it go out of scope and creates another one? That seems reasonable (i.e. when the user doesn't want to give up the ability to manipulate the Write for too long because our encoder grabs &mut), but if we use Drop to be the signal to write padding, we might be splatting padding where it wasn't needed.

On the other hand, if we have an extra finish_stuff_and_maybe_write_padding_or_whatever(), then inevitably users won't call it appropriately, especially since Rust users are not used to having to manually close resources.

If we do end up using Drop, we would need to have a way to swap out the underlying Write so we can relinquish control, which also sounds messy...

@marshallpierce
Copy link
Owner

Sorry for the delay, was traveling for a while. I'll be back on this in the next couple of days; I'd like to get this shipped in some form soon.

@brianm
Copy link
Contributor Author

brianm commented Mar 31, 2018

No need to apologize to me, I have likewise been busy. I hugely appreciate you paying attention this! Thank you!

@brianm
Copy link
Contributor Author

brianm commented Mar 31, 2018

re: finishing -- I have been of both thoughts. We don't have a close() to attach to, the closest rusty thing I know of is Drop so it speaks to that. OTOH -- flushing a base64 encoder sure seems like it should add the padding -- it is the only way it can flush.

I think I fall on the side of flush doing it, as otherwise we break flush's implied contract (that all buffered bytes get pushed through). We are going to have edge cases of possibly unexpected behavior either way, I think. We just need to document this pretty clearly, and consider panic'ing on write-after-flush.

The alt, of course, is that we make it configurable, which seems like a weak punt.

@marshallpierce
Copy link
Owner

We're kinda damned if we do, damned if we don't.

  1. Write padding on flush()
    • Breaks relatively typical Write usage that flush()s at intermediate points in the stream. I daresay
    • Consistent with flush() docs:
    • Flush this output stream, ensuring that all intermediately buffered contents reach their destination.

    • Panic on >1 flush() calls, or write() after flush()
    • flush() in drop() if not called already
  2. Write padding on drop()
    • The inverse of the above: somewhat counter to docs (we can flush everything but 1 or 2 bytes)
    • Allows multi-flush() code to work
  3. Have a separate flush_padded(), pending bikeshedding on name
    • Don't flush padded leftovers on normal flush()
    • Panic if called >1 or on subsequent write()
    • Call automatically in drop()

It looks like flate2 went with option 3, in spirit at least: https://github.com/alexcrichton/flate2-rs/blob/master/src/gz/write.rs

drop() calls try_finish(), which is public, but flush() does not.

@marshallpierce
Copy link
Owner

I asked Alex about the approach he took in flate2; he said it was fine for synchronous i/o and a pain for async, but didn't have any magical solutions to make async smoother. So, since no further inspiration has struck, I'm favoring option 3.

@brianm
Copy link
Contributor Author

brianm commented Apr 6, 2018

option 3 works, amounts to flush on drop for the naive case, which wil lprobbly cause things to normally work, and folks will look at docs in abnormal case.

@marshallpierce marshallpierce self-assigned this Apr 10, 2018
@marshallpierce
Copy link
Owner

Sorry for the continued delay; I'm really busy but I haven't forgotten about this! The open tab haunts me...

@marshallpierce
Copy link
Owner

I made a bit more progress on this a while back, and also discovered that Java's Base64 class doesn't do line wrapping either, which lends further weight to the idea of ripping that out (at least as a core part of encoding).

@marshallpierce
Copy link
Owner

I realize I'm letting the perfect be the enemy of the good here; hopefully the lack of this isn't ruining anyone's day/week/month.

@brianm
Copy link
Contributor Author

brianm commented Aug 30, 2018

It is okay, I am not blocked, my... much worse implementation solved my immediate problem.

Is their aught I can do to help?

@marshallpierce
Copy link
Owner

Find me an extra 4h/day? :)

@nox
Copy link
Contributor

nox commented Oct 2, 2018

What's the status on this?

@marshallpierce
Copy link
Owner

Embarrassingly delayed? :( The project that's been keeping me busy all summer asks every hour of me that I can give, but it will be done in the next few weeks, and then I can finally knock this off my to do list.

Basically what this needs is more tests, possibly #60 or something like it, and some naming reorganization to make room (ala flate2, for instance) for all 4 read|write x encode|decode options.

@nox
Copy link
Contributor

nox commented Oct 2, 2018

I kinda need this for Servo, do you want me to carry that patch to completion?

@marshallpierce
Copy link
Owner

Having someone actually need this is a good motivator! Can it wait until the weekend? If so, I will do my utmost to get to it then, and if not, you have my blessing to charge ahead. (I've been thinking about this feature off and on for quite a while, and I have some ideas I'd like to flesh out.)

@nox
Copy link
Contributor

nox commented Oct 2, 2018

It can wait for this weekend for sure. I can also review your code if you would like.

@marshallpierce
Copy link
Owner

That'd be excellent, looking forward to it.

@marshallpierce
Copy link
Owner

BTW, do you have thoughts on #60?

@nox
Copy link
Contributor

nox commented Oct 2, 2018

Well, it's a nice number, multiple of 2, 3 and 5.

Joke aside, nope, we use base64 mostly for data: URLs in Servo.

@marshallpierce
Copy link
Owner

Finally making headway on this in #74.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants