-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix xs:float text number parse/unparse edge cases #1133
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -651,13 +651,19 @@ object NodeInfo extends Enum { | |
trait PrimNumericFloat extends PrimNumeric { self: Numeric.Kind => | ||
def min: Double | ||
def max: Double | ||
private lazy val minBD = new JBigDecimal(min) | ||
private lazy val maxBD = new JBigDecimal(max) | ||
def minStr: String | ||
def maxStr: String | ||
private lazy val minBD = new JBigDecimal(minStr) | ||
private lazy val maxBD = new JBigDecimal(maxStr) | ||
|
||
def isValid(n: java.lang.Number): Boolean = n match { | ||
case bd: JBigDecimal => { | ||
bd.compareTo(minBD) >= 0 && bd.compareTo(maxBD) <= 0 | ||
} | ||
case bi: JBigInt => { | ||
val bd = new JBigDecimal(bi) | ||
bd.compareTo(minBD) >= 0 && bd.compareTo(maxBD) <= 0 | ||
} | ||
case _ => { | ||
val d = n.doubleValue | ||
(d.isNaN || d.isInfinite) || (d >= min && d <= max) | ||
|
@@ -681,6 +687,10 @@ object NodeInfo extends Enum { | |
protected override def fromNumberNoCheck(n: Number): DataValueFloat = n.floatValue | ||
override val min = -JFloat.MAX_VALUE.doubleValue | ||
override val max = JFloat.MAX_VALUE.doubleValue | ||
// we cannot use min/max.toString for minStr/maxStr because min/max are doubles so | ||
// toString would have a precision different than Float.MaxValue.toString | ||
override val minStr = "-" + JFloat.MAX_VALUE.toString | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See comment below about minStr |
||
override val maxStr = JFloat.MAX_VALUE.toString | ||
override val width: MaybeInt = MaybeInt(32) | ||
} | ||
|
||
|
@@ -698,6 +708,8 @@ object NodeInfo extends Enum { | |
protected override def fromNumberNoCheck(n: Number): DataValueDouble = n.doubleValue | ||
override val min = -JDouble.MAX_VALUE | ||
override val max = JDouble.MAX_VALUE | ||
override val minStr = "-" + JDouble.MAX_VALUE.toString | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a reason these aren't just min.toString and max.toString? The fact that they are not suggests to me something funny is going on that requires us not to call toString on the min because it's negated or something. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, the scala> new java.math.BigDecimal(Float.MaxValue.doubleValue.toString)
val res0: java.math.BigDecimal = 3.4028234663852886E+38 That value is much smaller than the actual Float max of scala> new java.math.BigDecimal(Float.MaxValue.toString)
val res0: java.math.BigDecimal = 3.4028235E+38 |
||
override val maxStr = JDouble.MAX_VALUE.toString | ||
override val width: MaybeInt = MaybeInt(64) | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -140,8 +140,7 @@ | |
<xs:element name="floatTextFail" type="xs:float" | ||
dfdl:lengthKind="implicit" /> | ||
<xs:element name="floatTextDelim" type="xs:float" | ||
dfdl:representation="text" dfdl:lengthKind="delimited" | ||
dfdl:terminator="%SP;"/> | ||
dfdl:representation="text" dfdl:lengthKind="delimited" /> | ||
|
||
<xs:element name="nonNegIntBin" type="xs:nonNegativeInteger" | ||
dfdl:representation="binary" dfdl:lengthKind="explicit" dfdl:length="{ 16 }" /> | ||
|
@@ -3722,10 +3721,14 @@ | |
</tdml:infoset> | ||
</tdml:parserTestCase> | ||
|
||
<tdml:parserTestCase name="float_text_delim" root="floatTextDelim" | ||
<tdml:parserTestCase name="float_text_max" root="floatTextDelim" | ||
model="SimpleTypes-Embedded.dfdl.xsd" description="Section 5 Simple type-float - DFDL-5-008R"> | ||
|
||
<tdml:document><![CDATA[3.4028235E38]]></tdml:document> | ||
<!-- | ||
We would really prefer to use "3.4028235E38" as the document data (wich would parse exactly the same), | ||
but textNumberPattern is "#,##0.###" so this unparses with all 39 digits plus commas instead of using | ||
an exponent. In order to round trip correctly, we set the document to the more verbose form | ||
--> | ||
<tdml:document><![CDATA[340,282,350,000,000,000,000,000,000,000,000,000,000]]></tdml:document> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had to stare at this a while and go look at GeneralFormatBase to decide this is ok. Insert a comment that this is 39 digits long because the textNumberPattern is "#,###.###" which doesn't allow for use of an exponent, so it has to output all the integer digits. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do. This was needed to roundtrip--it parses correctly since it's lax parsing and supports exponents, but since the textNumberPattern is what you say, it unparses without exponents. |
||
<tdml:infoset> | ||
<tdml:dfdlInfoset> | ||
<floatTextDelim>3.4028235E38</floatTextDelim> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. Does constructing BigDecimal(string) work if the string uses exponential notation as in "3.4E38" ?
If so why didn't they provide a constructor that takes a float?
I guess it has to work, or your tests would be failing.
We could bypass all of this if the float is an integer because none of this is required if it's an integer. The issue with double conversion only matters if there are real non-zero fraction digits. That said, I think the only way to tell if a float is an integer is to round it to one and see if it is equal.
Nah. Not worth it. Plus I am not sure the interactions with the number pattern "@" character stuff, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, here's the grammar it supports: https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html#BigDecimal-java.lang.String-
ICU "natively" supports
BigInteger
,BigDecimal
,long
, anddouble
. AnyNumber
that isn't one of those is converted to adouble
(viaNumber.doubleValue
) and formats it using thedouble
logic:https://github.com/unicode-org/icu/blob/main/icu4j/main/core/src/main/java/com/ibm/icu/text/NumberFormat.java#L269-L289
This is fine for int/short/byte since they can all be exactly represented by a double. And it means float is supported, by when a float is converted to a double, more decimal places are sometimes needed when converted to a string. This Java and not ICU, but it shows the same issue more clearly:
So ICU's logic of converting to a double is maybe questionable--you can't just convert a float to a double and expect to get the same string. Note that if you then parse that double string as a float you should get the original float back, so the thing in the infoset should come out the same, but the data doesn't round trip.
I couldn't find a way to convert a float to a double in a way that would result in the same string value (without doing what I did), though it does seem like a better algorithm should exist.
Also, if a user is expecting floats, it probably is less likely integers to appear in the data/infoset, so it might be an optimization that isn't hit that often.
Probably the best thing to do is suggest that if you're dealing with text numbers containing decimals, then just always use xs:double--its much less likely to have precision issues and it avoids this hack and should be more performant.