FastDoom could benefit from optimising the modulo (%) operation. Internally, WC generates an idiv instruction for it.
The optimisation is similar to the existing divisions by a constant, because a modulo is just that division followed by a multiply and a subtract.
For example:
Modulo 10 for signed ints, as generated by GCC (fastcall convention, so x arrives in ecx):
int Mod10(int x)
mov eax, 1717986919
imul ecx
mov eax, edx
mov edx, ecx
sar edx, 31
sar eax, 2
sub eax, edx
lea edx, [eax+eax*4]
mov eax, ecx
add edx, edx
sub eax, edx
ret
; Unsigned version:
unsigned ModU10(unsigned x)
mov eax, ecx
mov edx, -858993459
mul edx
mov eax, edx
shr eax, 3
lea edx, [eax+eax*4]
mov eax, ecx
add edx, edx
sub eax, edx
ret
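For readers who prefer C to assembly, the two listings above can be sketched roughly as follows. This is only an illustration of the reciprocal-multiplication trick, not code from FastDoom: the function names `mod10_signed` and `mod10_unsigned` are made up here, and the signed version assumes that `>>` on a negative value is an arithmetic shift, which holds for GCC but is implementation-defined in standard C.

```c
#include <stdint.h>

/* Signed x % 10 without idiv.
   0x66666667 (1717986919) is ceil(2^34 / 10), the constant in the listing. */
int32_t mod10_signed(int32_t x)
{
    /* imul: full 64-bit product; the listing keeps the high half in edx */
    int64_t prod = (int64_t)x * 0x66666667LL;

    /* High 32 bits shifted right by 2, i.e. a shift by 34 in total.
       ASSUMPTION: arithmetic (sign-preserving) right shift. */
    int32_t q = (int32_t)(prod >> 34);

    /* The shift rounds toward minus infinity; add 1 for negative x so the
       quotient truncates toward zero like C's '/' operator. */
    q += (x < 0);

    /* remainder = x - 10 * quotient (lea + add + sub in the listing) */
    return x - q * 10;
}

/* Unsigned x % 10.
   0xCCCCCCCD (-858993459 as a signed 32-bit value) is ceil(2^35 / 10). */
uint32_t mod10_unsigned(uint32_t x)
{
    uint64_t prod = (uint64_t)x * 0xCCCCCCCDu;   /* mul */
    uint32_t q = (uint32_t)(prod >> 35);         /* high half, then shr 3 */
    return x - q * 10u;
}
```

The unsigned version needs no sign correction, which is where the shorter instruction sequence comes from.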
There are only a few places where x % constant is used, so the overall performance gain would be small.
There are also some missing optimisations: signed and unsigned divisions should be handled separately. When you divide a value that you know is non-negative (e.g. monster damage), an unsigned variant of the division should be used:
Example for unsigned Div10:
; Again fastcall convention:
unsigned Div10u(unsigned x)
mov eax, ecx
mov edx, -858993459
mul edx
mov eax, edx
shr eax, 3
ret
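To make the difference between the two variants concrete, here is a rough C sketch of both side by side. The names `div10u` and `div10s` are hypothetical (they are not the FastDoom DivXX() functions), and the signed version again assumes an arithmetic right shift for negative values, as GCC provides.

```c
#include <stdint.h>

/* Unsigned x / 10: one multiply and one shift, matching the listing above. */
uint32_t div10u(uint32_t x)
{
    /* 0xCCCCCCCD is ceil(2^35 / 10); the quotient is the product >> 35 */
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Signed x / 10: same idea, plus a correction step for negative inputs. */
int32_t div10s(int32_t x)
{
    /* 0x66666667 is ceil(2^34 / 10).
       ASSUMPTION: '>>' on a negative int64_t is an arithmetic shift. */
    int32_t q = (int32_t)(((int64_t)x * 0x66666667LL) >> 34);

    /* Extra work the unsigned version avoids: round toward zero. */
    return q + (x < 0);
}
```

If a value is known to be non-negative, both functions return the same result for it, so switching to the unsigned form only drops the correction step.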
The unsigned versions save a few instructions, and mul is presumably faster than imul.
I generated these listings with GCC using -O2 -march=i386 -mtune=generic.
They should not be hard to inline with OpenWatcom.
EDIT: I did not properly check all instances of DivXX() calls; maybe you never need the unsigned versions...
For modulo by a constant power of two (2, 4, 8, ..., 256, etc.), masking off the upper bits does the job. For other values, it may be worth building a separate lookup table. Multiplication and division on these old 1970s microprocessors were so slow that some CISCs didn't even bother with microcode for them, let alone dedicated hardware.
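A small sketch of the masking and lookup-table ideas mentioned above, in C. Everything here is illustrative: the names `mod8_mask`, `init_mod10_table` and `mod10_lookup` are made up, and the table assumes the input is known to fit in 0..255. Note that the mask only matches C's % for non-negative values (for example, -1 % 8 is -1, while -1 & 7 is 7).

```c
#include <stdint.h>

/* x % 8 for unsigned x: keep the low 3 bits, drop the rest. */
uint32_t mod8_mask(uint32_t x)
{
    return x & 7u;
}

/* Precomputed x % 10 for every value in 0..255 (ASSUMED input range). */
static uint8_t mod10_table[256];

/* Fill the table once at startup; the slow '%' only runs here. */
void init_mod10_table(void)
{
    for (int i = 0; i < 256; i++)
        mod10_table[i] = (uint8_t)(i % 10);
}

/* Afterwards, each modulo is a single indexed load. */
uint8_t mod10_lookup(uint8_t x)
{
    return mod10_table[x];
}
```

Whether a table beats the multiply-based sequence depends on the input range and on cache behaviour, so it would need measuring on the target machine.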