-
-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add char[] versions for NumberInput parseFloat, parseDouble, parseBigInteger #1229
Comments
Might make sense, esp. if performance benchmark shows meaningful improvement from avoided allocations (or rather, gc time). Challenges:
Then again, if and when we are moving to default to FastDoubleParser, it could be a win. Would def want to know something about improvement tho. Anyway, +1 for experimentation. |
Yeah. At least in the case of int/long, the JDK has static int parseStringAsInt(JsonParser parser) throws IOException {
if (parser.hasTextCharacters()) {
final int len = parser.getTextLength();
final CharSequence cs = CharBuffer.wrap(parser.getTextCharacters(), parser.getTextOffset(), len);
return Integer.parseInt(cs, 0, len, 10);
} else {
return Integer.parseInt(parser.getText());
}
} A bit surprising the JDK is missing similar methods for Would a PR adding these new methods to |
int/long is sort of less relevant as we implement optimized decoding ourselves and do not rely on JDK (for cases that matter -- parsing int/long from String is not on critical path). As to adding methods... I think performance numbers would be good. Can be stand-along micro-benchmark, no need to be integrated via parser (altho obv integrated one would be the ultimate thing). The reason I am slightly skeptical is that the real heavy part for |
Oh btw, forgot to say -- given that you do have high-performance streaming use case, this is probably quite relevant. So just to make sure: I am not against improvements like this. I just want to emphasize verifying improvements; in this case showing for your actual use case (replica thereof) would be absolutely awesome. I actually have -- interestingly enough -- need for fast |
You can use https://github.com/wrandelshofer/FastDoubleParser directly - It has support for Char Array, Char Sequence and String. We can update NumberInput too but I'd prefer if that was done with perf tests that prove that it improves Jackson performance. I'd prefer not to delay the 2.17 release so any Jackson changes in this area are likely not to be released until 2.18 or beyond. |
One thing I may have missed -- thank you @pjfanning for sort of hinting at -- And yeah, definitely none of changes would go in 2.17 but 2.18 or later. |
I understand there is nuance in assuming "public" means "public API". Looking through some related comments though, rallyhealth/weePickle#117 (comment) @pjfanning seems to suggest that I'm very familiar with https://github.com/wrandelshofer/FastDoubleParser (we use it as part of https://github.com/deephaven/deephaven-csv), it's great :D. Good point though, I can use it directly in the meantime. |
The methods that we have in NumberInput - we can't stop you using them. We are not going to add new methods to NumberInput unless they are useful for the internal Jackson code. |
Exactly, what @pjfanning said: primary user is Jackson itself, and anything that benefits it is potentially good idea. Making it more useful for other use cases is not a priority and could be considered counter-productive (due to added code to maintain, or to limit possibility of changes to API, implementation). So, I don't consider And even more importantly: embedded/shaded |
Thanks for the additional context. Ultimately, I think I'll be making a case for additional methods on /**
* Assumes {@link JsonToken#VALUE_STRING}, parses the string as a <b>float</b>.
*/
public float getStringAsFloat() throws IOException;
// ...
public double getStringAsDouble() throws IOException;
// ...
public BigInteger getStringAsBigInteger() throws IOException;
// ...
public BigDecimal getStringAsBigDecimal() throws IOException; I would assume that Edit: |
There are already |
I've got some of my own quantitative numbers motivating this. I'm essentially seeing an uplift from 14mm doubles / second -> 20mm doubles / second when using |
(Sorry if you saw my previous comment which I deleted, it was incorrect.) I think there's an opportunity to also improve the performance of Essentially, I'm in a position where the JsonParser API lets me parse double strings faster than doubles, which is an odd position to be in. (Unless I'm mistaken and there is a way for me to get |
@devinrsmith I am 100% in favor of ensuring that (for 2.18 branch once we get there) |
Ok. This is awkward: I was assuming This should be fixed, but timeline is such that it's too late for 2.17 -- after rc1 I think there's too much risk for edge case handling to go awry (deferred number handling was added for a reason and although I hope unit test coverage is sufficient I am not confident). |
With #1230 now merged, we could tackle this issue. But looking at Conceptually, there shouldn't be a problem with more eager access: if and when |
Re-created as #1284 to address performance issue; closing this one since I do not think new methods are needed, just better implementations of existing ones. |
@devinrsmith Now that #1284 is complete in So... if it was possible for you to test 2.18.0-SNAPSHOT (from Sonatype snapshot repo or build locally), that'd be great. |
@cowtowncoder Looks great, thanks! Quick test reading 100mm doubles out of an array, it looks like 2.18.0-SNAPSHOT parses doubles without extraneous allocation; went from ~6GiB to <100MiB. |
@devinrsmith Excellent! Thank you for super fast follow-up. |
I've got a high performance streaming use case where I'm handling some of the lower-level details. For example, translating a
VALUE_STRING
into adouble
. I'm relying onNumberInput
with the appropriateuseFastParser
config, and would like to avoid extraneousString
allocation when possible. TheBigDecimal
case is currently the most complete, as it works withJsonParser#hasTextCharacters
:The other similar cases aren't able to take advantage since the underlying
char[]
parsing methods aren't exposed:It would be great to add
char[]
versionsNumberInput#parseFloat
,NumberInput#parseDouble
, andNumberInput#parseBigInteger
.The text was updated successfully, but these errors were encountered: