Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RV64_DYNAREC] Added AVX.0F 10/11 opcode #1967

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

zohanzephyr
Copy link
Contributor

No description provided.

Copy link
Collaborator

@ksco ksco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested changes. Also, would be great to have clang-formatted for all the newly added RV64 codes, thanks.

(Again, AVX infra for RV64 is not ready yet, things might change dramatically, would be great if you could wait for the infra to finish if you want to get involved. Meanwhile, SSE vector infra is very mature and welcomes contributions for new opcodes).

SSE_LOOP_MV_Q(x3);
if(vex.l) {
GETGY();
GETEY(x2, 0, 16);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this seems not correct, how is the 16 calculated? Also, GETEY seems (very) expensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this seems not correct, how is the 16 calculated? Also, GETEY seems (very) expensive.

Because the operation is on 256 bits of data

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be 24. but like I said, the GETEY is too expensive and probably wrong, just reuse the result from GETEX.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running you current GETEY after GETEX will endup running geted 2 times for the opcode, wich might result is reading more bytes from the opcode than actually exist. You should really looks athow it's done on the ARM64 side: geted is only called at the GETEX side, not on GETEY.
Also, with what program or game are you testing your changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running you current GETEY after GETEX will endup running geted 2 times for the opcode, wich might result is reading more bytes from the opcode than actually exist. You should really looks athow it's done on the ARM64 side: geted is only called at the GETEX side, not on GETEY. Also, with what program or game are you testing your changed?

It is also possible to call it only on the GETGX side, the fixedaddress address is the same, but fixedaddress +=16 is correct, offset by 128 to get the value of ymm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running you current GETEY after GETEX will endup running geted 2 times for the opcode, wich might result is reading more bytes from the opcode than actually exist. You should really looks athow it's done on the ARM64 side: geted is only called at the GETEX side, not on GETEY. Also, with what program or game are you testing your changed?

A test demo I wrote myself

#include <stdio.h>

int main() {
float src[8] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
float dst[8] = {0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};

__asm__ volatile (
    "vmovups (%1), %%ymm0\n"  
    "vmovups %%ymm0, (%0)\n"  
    :                        
    : "r" (dst), "r" (src)  
    : "%ymm0" 
);

printf("Source data: %f %f %f %f %f %f %f %f\n", src[0], src[1], src[2], src[3], src[4], src[5], src[6], src[7]);
printf("Destination data: %f %f %f %f %f %f %f %f\n", dst[0], dst[1], dst[2], dst[3], dst[4], dst[5], dst[6], dst[7]);

return 0;

}
~

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a valid sample code (did you look at the box64 dump to see what is generated code for it?).
Feel free to add a few more opcode and add this the box64 test suite if it makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a valid sample code (did you look at the box64 dump to see what is generated code for it?). Feel free to add a few more opcode and add this the box64 test suite if it makes sense.

Ok, geted should indeed be executed once, I've made the change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a valid sample code (did you look at the box64 dump to see what is generated code for it?). Feel free to add a few more opcode and add this the box64 test suite if it makes sense.

2024-10-28_17-33-11

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the VMOVUPS didn't have SIB or Offset involved in this sample, that's why calling 2* times geted still worked. It would have crash and burn if an offset was involved in the opcode for example. Maybe try to use only 1 single array, instead of two?

(pro tips: use less -R dump.txt to see the dump in all it's colorfull glory)

SSE_LOOP_MV_Q2(x3);
if(vex.l) {
GETGY();
GETEY(x2, 0, 16);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this seems not correct, how is the 16 calculated? Also, GETEY seems (very) expensive.

Comment on lines 59 to 61
#if STEP > 1
static const int8_t mask_shift8[] = { -7, -6, -5, -4, -3, -2, -1, 0 };
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not used.

Yes, it's not being used at the moment

switch(opcode) {

case 0x10:
INST_NAME("VMOVUPS Gx,Ex");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Gx,Ex/Gx, Ex/

if(!vex.l) YMM0(gd);
break;
case 0x11:
INST_NAME("VMOVUPS Ex,Gx");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Ex,Gx/Ex, Gx/

SMREAD(); \
ed = 16; \
addr = geted(dyn, addr, ninst, nextop, &wback, a, x3, &fixedaddress, rex, NULL, I12, D); \
fixedaddress+=16; \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do fixedaddress += 16 here, the value will overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do fixedaddress += 16 here, the value will overflow.

When fetching 256 bits of data, it needs to be offset by 128 bits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do fixedaddress += 16 here, the value will overflow.

It won't overflow, I did test it with the demo and the result is correct

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure you understand what the i12 is for.

gback = xEmu; \
gdoffset = offsetof(x64emu_t, ymm[gd])

#define GETEY(a, D, I12) \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I12 is useless.

INST_NAME("VMOVUPS Gx,Ex");
nextop = F8;
GETGX();
GETEX(x2, 0, 8);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8 is not correct, it should be 24.

@ksco
Copy link
Collaborator

ksco commented Oct 29, 2024

this PR is still not correct, the extcache should be kept up to date.

@ptitSeb
Copy link
Owner

ptitSeb commented Nov 1, 2024

@zohanzephyr there are still some changes to be done on this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants