Removing unintuitive coercions #8792
DavidSnider
started this conversation in
Ideas
Replies: 1 comment 3 replies
-
I would argue that for purposes of string concatenation, coercing numeric types to strings is intuitive and is something many languages do. If we could consider keeping that behavior in place it would be a big help, we certainly rely on this heavily in our Hack codebase. |
Beta Was this translation helpful? Give feedback.
3 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Definition of coercive operation for this context: an operation that changes a value to a different type/representation as part of an operation other than an explicit assignment. Note that this does not require writing back to the original reference.
General plan of action
Obviously don’t do this all at once. Start with when types don’t match (this is trivially true of “unary” operations that always coerce to a specific type). Once that is done, all non-container primitives are handled. On top of that, roll out the changes to each of these sections individually.
Once we get through all of this, we can figure out what exactly we want to do with containers and objects with the same type that get compared using
==
,<
, etc. We don’t need to front-load that solution, but we can discuss it contemporaneously. See the HIP for ==/===.Note that for cases where it is reasonable to have a binary operator where one expression is an int and the other is a float (such as
+
), we choose not to trigger logs.Note that we don’t mention cases where Class pointers and LazyClass pointers are coerced to strings if strings would not be further coerced.
String Concatenation (
.
,.=
, interpolation)Coercions occur when trying to concat/interpolate any of the following types:
null, bool, int, double, object (though not in www), resource
When to log
Log if either the LHS or RHS are not strings. In the case of ConcatN, log if any aren’t strings.
Open question: Should we log if they’re ints, floats? None? Some?
The Fix
For
.
, wrap problematic expression in PHPism_FIXME::stringCastForConcatenation which explicitly casts the result to a string.For
.=
if the RHS is the issue, do the same as for.
. If the LHS is the issue insert a statement right before this that is a simple assignment to stringCastForConcatenation.For interpolation, pull the expression out the interpolation into a temporary variable that is assigned to PHPism_FIXME::stringCastForInterpolation. Then interpolate that.
Bitwise operations (
&
,|
,^
,~
,<<
,>>
) and assignment equivalentsCoercions occur when trying to operate any of the following types:
&
,|
,^
: null, double, string (unless both operands are strings), AnyArray, Object, resource, classptr, lazyclassptr~
: double<<
,>>
: null, double, string, AnyArray, Object, resource, classptr, lazyclassptrWhen to log
For
&
,|
,^
, and assignment versions, log if either side are any of the coercive types and AND it is not the case that both sides are strings/lazyclassptr/classptr.For
~
, log if type is a doubleFor
<<
,>>
, log if type is any of the coercive typesThe Fix
For non-assignment versions, wrap problematic expression in PHPism_FIXME::intCastForBitwiseOp which explicitly casts the result to an int.
For assignment versions, if the RHS is the issue, do the same as for non-assignment versions. If the LHS is the issue add in an assignment in the previous statement that casts the LHS.
Pre/post increment/decrement
note: these are no longer usable within expressions. They should probably just be wholesale replaced with += and -=
This has different applications based on the situation
When to log
Log under the above situations when types change, or
The Fix
For the “pre” case, inject a statement immediately prior to this one that is an assignment to the result of PHPism_FIXME::coerceForIncOrDec. Alternatively, we could replace the pre-inc/dec with a call to something like PHPism_FIXME::coerceForAndIncDec which would accept either an int or a bool/enum to tell it whether to do the inc or dec case.
For the post case, remove the postincrement and inject a statement immediately following this one that is an assignment to the result of PHPism_FIXME::coerceForIncOrDec +/- 1.
coerceForIncOrDec works as follows, based on type of input:
Mathematical operations (
+
,-
,*
,/
,%
,**
) and assignment equivalents+
,-
,*
,/
Coercions occur when trying to operate any of the following types:
null, bool, classptr, lazyclassptr, string, object, resource
When to log
log if either type is one of the coercive types
The Fix
Wrap the expression in PHPism_FIXME::coerceForMath. For the assignment version where we have logs for the LHS, just do this as a simple assignment prior to the use of the operator.
coerceForMath works as follows, based on type of input:
%
Coercions occur when trying to operate any of the following types:
null, double, string (unless both operands are strings), AnyArray, Object, resource, Classptr, Lazyclassptr
When to log
Log if type is any of the coercive types
The Fix
Wrap the expression in PHPism_FIXME::coerceToIntForMod. For the assignment version where we have logs for the LHS, just do this as a simple assignment prior to the use of
%
.coerceToIntForMod is just a universal int cast.
**
Coercions occur when trying to operate any of the following types:
anyArray, null, bool, object, resource, string, classptr, lazyclassptr, funptr
When to log
Log if type is any of the coercive types
The Fix
Wrap the expression in PHPism_FIXME::coerceForPow. For the assignment version where we have logs for the LHS, just do this as a simple assignment prior to the use of
**
.coerceForPow is implemented as follows, based on the type of input:
Comparison operations (
<
,<=
,>
,>=
,<=>
)#define
str_to_num
if string is numeric, convert to int or float. Else, convert to 0#define
class_meth_check
!RuntimeOption::EvalHackArrDVArrs && RO::EvalIsCompatibleClsMethType
Note that for all of the classmeth and funptr stuff, we have to also handle the reified versions.
When coercions happen, based on types of LHS and RHS:
class_meth_check
convert LHS totrue
str_to_num
LHS.str_to_num
.class_meth_check
convert LHS totrue
and RHS tofalse
str_to_num
RHSstr_to_num
RHStrue
and RHS tofalse
class_meth_check
convert LHS totrue
and RHS tofalse
class_meth_check
convert LHS to varrayfalse
and RHS totrue
false
and RHS totrue
class_meth_check
convert LHS tofalse
and RHS totrue
false
and RHS totrue
class_meth_check
convert LHS totrue
and RHS tofalse
false
RuntimeOption::EvalHackArrDVArrs
→ convert LHS to vecfalse
false
class_meth_check
convert LHS tofalse
and RHS totrue
class_meth_check
convert LHS totrue
class_meth_check
convert JHS to varrayRuntimeOption::EvalHackArrDVArrs && RO::EvalIsCompatibleClsMethType
convert RHS to vectrue
and RHS tofalse
str_to
_num
true
true
and RHS tofalse
RuntimeOption::EvalHackArrDVArrs
convert LHS totrue
and RHS tofalse
class_meth_check
convert LHS totrue
and RHS tofalse
true
and RHS tofalse
false
When to log
log whenever a coercion happens, based on the above list
The Fix
Inject explicit casts using PHPism_FIXME::TCastForComp for each necessary T.
Equality operations (
==
,!=
)When coercions happen, based on types of LHS and RHS (and is different than previous section):
When to log
log whenever a coercion happens, based on the above (and the above, above) list.
The Fix
Inject explicit casts using PHPism_FIXME::TCastForEQ for each necessary T.
Switch
The HHBC emitted can contain both the switch/sswitch bytecodes AND the jmp BCs / unoptimized switch info.
Switch opcodes implement == internally effectively, so we’ll probably need to add a new
switcheq
opcode that we can emit for switches and change in line with the internal implementation of the switch opcodes. Alternatively, we can do switch at the exact same time as we do==
.Inside the opcodes, the following conversions happens, based on type of input:
switch
false
→ 0true
→ SPECIAL matches first non-zero intsswitch
both interpreter and jit shell out directly to the implementation of
==
(i.e. tvEqual)When to log
For the opcodes, log when a conversion happens
For the jmp and eq style, either
==
logging handle iteq
but wouldn’t withsame
.I think the first option is best, but that does require coupling the changes to switch with the changes to
==
There are two options for how to deal with the logs.
The Fix
OPTION 1:
For a given type of the case tested against
T
and an expression that will be coercedexpr
, appearing in the switch, wrap the expression with Phpism_FIXME::T
CastIfEQ(expr
,value_that_is_true_with_coercions
).T
CastIfEQ casts the first input to aT
ifexpr === value_that_is_true_with_coercions
.OPTION 2:
if we see a ‘1’ as the value for $x, we trigger a log. We then convert the switch to something like this:
Complications
OPTION 1:
If a switch tests against multiple types (more specifically when multiple cases are
==
each other), we could end up in an infinite loop of wrapping. We should simply fail to autofix when we don’t know that every case in a switch is the same type. Those should be relatively rare and be straightforwardly manually fixed.If the switched expression gets too large, it could become unwieldy, but we can also test against that if it’s a serious concern.
OPTION 2:
With this change, we would still trigger a log when $x is ‘1’, since the first case will match. If we flip the cases, then we have the opposite problem.
The solution here is to teach HHVM that if it hits the path where it would have logged, it should check the immediately following cases (specifically where there is no code between them) and if another matches under ===, don’t trigger a log.
Doing this efficiently will be fun.
Beta Was this translation helpful? Give feedback.
All reactions