-
-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RV64_DYNAREC] Added AVX.0F 10/11 opcode #1967
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requested changes. Also, would be great to have clang-format
ted for all the newly added RV64 codes, thanks.
(Again, AVX infra for RV64 is not ready yet, things might change dramatically, would be great if you could wait for the infra to finish if you want to get involved. Meanwhile, SSE vector infra is very mature and welcomes contributions for new opcodes).
SSE_LOOP_MV_Q(x3); | ||
if(vex.l) { | ||
GETGY(); | ||
GETEY(x2, 0, 16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this seems not correct, how is the 16 calculated? Also, GETEY
seems (very) expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this seems not correct, how is the 16 calculated? Also,
GETEY
seems (very) expensive.
Because the operation is on 256 bits of data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be 24. but like I said, the GETEY is too expensive and probably wrong, just reuse the result from GETEX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running you current GETEY
after GETEX
will endup running geted
2 times for the opcode, wich might result is reading more bytes from the opcode than actually exist. You should really looks athow it's done on the ARM64 side: geted is only called at the GETEX side, not on GETEY.
Also, with what program or game are you testing your changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running you current
GETEY
afterGETEX
will endup runninggeted
2 times for the opcode, wich might result is reading more bytes from the opcode than actually exist. You should really looks athow it's done on the ARM64 side: geted is only called at the GETEX side, not on GETEY. Also, with what program or game are you testing your changed?
It is also possible to call it only on the GETGX side, the fixedaddress address is the same, but fixedaddress +=16 is correct, offset by 128 to get the value of ymm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running you current
GETEY
afterGETEX
will endup runninggeted
2 times for the opcode, wich might result is reading more bytes from the opcode than actually exist. You should really looks athow it's done on the ARM64 side: geted is only called at the GETEX side, not on GETEY. Also, with what program or game are you testing your changed?
A test demo I wrote myself
#include <stdio.h>
int main() {
float src[8] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
float dst[8] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
__asm__ volatile (
"vmovups (%1), %%ymm0\n"
"vmovups %%ymm0, (%0)\n"
:
: "r" (dst), "r" (src)
: "%ymm0"
);
printf("Source data: %f %f %f %f %f %f %f %f\n", src[0], src[1], src[2], src[3], src[4], src[5], src[6], src[7]);
printf("Destination data: %f %f %f %f %f %f %f %f\n", dst[0], dst[1], dst[2], dst[3], dst[4], dst[5], dst[6], dst[7]);
return 0;
}
~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a valid sample code (did you look at the box64 dump to see what is generated code for it?).
Feel free to add a few more opcode and add this the box64 test suite if it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a valid sample code (did you look at the box64 dump to see what is generated code for it?). Feel free to add a few more opcode and add this the box64 test suite if it makes sense.
Ok, geted should indeed be executed once, I've made the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the VMOVUPS didn't have SIB or Offset involved in this sample, that's why calling 2* times geted still worked. It would have crash and burn if an offset was involved in the opcode for example. Maybe try to use only 1 single array, instead of two?
(pro tips: use less -R dump.txt
to see the dump in all it's colorfull glory)
SSE_LOOP_MV_Q2(x3); | ||
if(vex.l) { | ||
GETGY(); | ||
GETEY(x2, 0, 16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this seems not correct, how is the 16 calculated? Also, GETEY seems (very) expensive.
#if STEP > 1 | ||
static const int8_t mask_shift8[] = { -7, -6, -5, -4, -3, -2, -1, 0 }; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not used.
Yes, it's not being used at the moment
switch(opcode) { | ||
|
||
case 0x10: | ||
INST_NAME("VMOVUPS Gx,Ex"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Gx,Ex/Gx, Ex/
if(!vex.l) YMM0(gd); | ||
break; | ||
case 0x11: | ||
INST_NAME("VMOVUPS Ex,Gx"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Ex,Gx/Ex, Gx/
SMREAD(); \ | ||
ed = 16; \ | ||
addr = geted(dyn, addr, ninst, nextop, &wback, a, x3, &fixedaddress, rex, NULL, I12, D); \ | ||
fixedaddress+=16; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do fixedaddress += 16
here, the value will overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do
fixedaddress += 16
here, the value will overflow.
When fetching 256 bits of data, it needs to be offset by 128 bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do
fixedaddress += 16
here, the value will overflow.
It won't overflow, I did test it with the demo and the result is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make sure you understand what the i12 is for.
gback = xEmu; \ | ||
gdoffset = offsetof(x64emu_t, ymm[gd]) | ||
|
||
#define GETEY(a, D, I12) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I12
is useless.
INST_NAME("VMOVUPS Gx,Ex"); | ||
nextop = F8; | ||
GETGX(); | ||
GETEX(x2, 0, 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8 is not correct, it should be 24.
this PR is still not correct, the extcache should be kept up to date. |
@zohanzephyr there are still some changes to be done on this PR. |
No description provided.