I want to add SSE4 opcodes but need some help. Thank you! #692
Replies: 24 comments
-
Before trying the dynarec part, I suggest you first implement the interpreter part. It's easier, and it gives you a reference build (so inside … Now, in your code:
Now you are ready to unroll the opcode. q0 is XMM0, v0 is the reg operand "xmm1", and v1 is the r/m operand "xmm2".
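For reference, this is how the reg and r/m fields come out of a ModRM byte. In register form (mod == 3) they give the "xmm1"/"xmm2" operands above. This is a minimal sketch: `modrm_t` and `decode_modrm` are hypothetical names, not box86's own helpers. In 32-bit code, the 3-bit fields can only reach xmm0..xmm7.

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte: mod (bits 7-6), reg (bits 5-3), rm (bits 2-0). */
typedef struct {
    uint8_t mod;  /* 3 = register form, otherwise a memory operand */
    uint8_t reg;  /* the "reg" operand, e.g. xmm1 */
    uint8_t rm;   /* the "r/m" operand, e.g. xmm2 when mod == 3 */
} modrm_t;

/* Hypothetical helper: split a ModRM byte into its three fields. */
static modrm_t decode_modrm(uint8_t b)
{
    modrm_t m;
    m.mod = (uint8_t)((b >> 6) & 3);
    m.reg = (uint8_t)((b >> 3) & 7);
    m.rm  = (uint8_t)(b & 7);
    return m;
}
```

For example, `66 0F 38 40 CA` is `PMULLD xmm1, xmm2`. The ModRM byte 0xCA decodes to mod=3, reg=1, rm=2.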
-
Thank you!
-
That's the tricky part. Unless you specifically write a unit test (like the MMX one), there isn't much you can do. You need to run programs that use the opcodes and check that the behaviour is correct.
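One way to get such a check without a full program is to feed known inputs through the opcode and compare each 32-bit lane against values from a plain-C reference. This sketch assumes a hypothetical `xmm_t` register view. The "got" value would come from running the real instruction under the emulator, which is not shown here. PMULLD is used only as an example opcode.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 128-bit register view used only in this sketch. */
typedef union {
    int32_t  sd[4];
    uint32_t ud[4];
} xmm_t;

/* Reference semantics of PMULLD: low 32 bits of each signed 32x32 product. */
static xmm_t ref_pmulld(xmm_t a, xmm_t b)
{
    xmm_t r;
    for (int i = 0; i < 4; i++)
        r.ud[i] = (uint32_t)((int64_t)a.sd[i] * (int64_t)b.sd[i]);
    return r;
}

/* Compare a result with the expected value, printing every mismatching lane.
   Returns 1 when all four lanes match. */
static int check_xmm(const char *name, xmm_t got, xmm_t want)
{
    int ok = 1;
    for (int i = 0; i < 4; i++) {
        if (got.ud[i] != want.ud[i]) {
            printf("%s: lane %d got %08x want %08x\n",
                   name, i, (unsigned)got.ud[i], (unsigned)want.ud[i]);
            ok = 0;
        }
    }
    return ok;
}
```

Edge values are worth including in such tests, for example negative lanes and products that overflow 32 bits.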
-
@ptitSeb
Does q0 refer to an XMM register? Is it 128-bit?
-
So, q0 and v0 are just ints. The point is: ARM NEON has 32 doubleword (64-bit) registers (d0..d31), which can also be viewed as 16 quadword (128-bit) registers (q0..q15). SSE has 16 quadword registers, xmm0..xmm15. In the case of … Is that clearer?
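The aliasing can be modeled in C. Each Qn shares storage with D(2n) (low half) and D(2n+1) (high half), so one 128-bit XMM value fits exactly in one Q register. `neon_file_t` and the helpers are a toy model for illustration, not how box86 actually stores registers.

```c
#include <stdint.h>
#include <string.h>

/* ARMv7 NEON aliasing: quad register Qn shares storage with
   D(2n) (low 64 bits) and D(2n+1) (high 64 bits). */
static int q_to_d_low(int q)  { return 2 * q; }
static int q_to_d_high(int q) { return 2 * q + 1; }

/* Toy model of the register file: 32 x 64-bit D registers that can
   also be read as 16 x 128-bit Q registers (illustration only). */
typedef union {
    uint64_t d[32];
    struct { uint64_t lo, hi; } q[16];
} neon_file_t;

static void neon_file_clear(neon_file_t *f)
{
    memset(f, 0, sizeof *f);
}
```

Writing d6 and d7 therefore changes q3. This is why the dynarec has to track which view (D or Q) a register is being used as.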
-
@ptitSeb Thank you ptitSeb!
-
It's only because of …
-
For example, let's say …
-
@ptitSeb
-
I don't think that will do what you expect. The VMUL will do: …
I think this one needs a VTBL/VTBX instead.
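A lane-wise multiply cannot move bytes between positions. To show the difference, here is a scalar model of what VTBL and VTBX do: an 8-byte result, built from a table of one to four D registers. The exact x86 opcode being discussed is not recoverable from the thread, so this only illustrates the lookup semantics. `vtbl_model` is a hypothetical helper.

```c
#include <stdint.h>

/* Scalar model of NEON VTBL.8 / VTBX.8 with a table of `nregs` D registers
   (1..4, i.e. 8*nregs bytes). Each destination byte i takes table[idx[i]].
   Out-of-range indices give 0 for VTBL and leave dst[i] unchanged for VTBX. */
static void vtbl_model(uint8_t dst[8], const uint8_t *table, int nregs,
                       const uint8_t idx[8], int is_vtbx)
{
    int size = 8 * nregs;
    for (int i = 0; i < 8; i++) {
        if (idx[i] < size)
            dst[i] = table[idx[i]];
        else if (!is_vtbx)
            dst[i] = 0;
        /* VTBX with an out-of-range index: dst[i] keeps its old value */
    }
}
```

Choosing between the two depends on what the x86 opcode expects in the lanes that are not selected: zeros (VTBL) or the previous destination contents (VTBX).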
-
Again, this is a tricky opcode. Did you write the interpreter version first?
-
(You don't have to write the dynarec version; the interpreter version is enough to get things running. It will be slower, but it's a start.)
-
Sorry, I'm not familiar with the interpreter version.
-
The interpreter version is in …
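Here is an illustration of what an interpreter version can look like: a switch over a few `66 0F 38 xx` SSE4.1 opcodes in register form. `xmm_t`, `run_660f38`, `gx` (reg operand) and `ex` (r/m operand) are hypothetical names for this sketch. The real box86 interpreter may be organized differently.

```c
#include <stdint.h>

/* Hypothetical 128-bit register view. */
typedef union { int32_t sd[4]; uint32_t ud[4]; } xmm_t;

/* Execute "66 0F 38 op" with gx = reg operand (destination) and
   ex = r/m operand (source). Returns 0 for opcodes this sketch skips. */
static int run_660f38(uint8_t op, xmm_t *gx, const xmm_t *ex)
{
    int i;
    switch (op) {
    case 0x39: /* PMINSD: signed 32-bit minimum */
        for (i = 0; i < 4; i++)
            if (ex->sd[i] < gx->sd[i]) gx->sd[i] = ex->sd[i];
        return 1;
    case 0x3B: /* PMINUD: unsigned 32-bit minimum */
        for (i = 0; i < 4; i++)
            if (ex->ud[i] < gx->ud[i]) gx->ud[i] = ex->ud[i];
        return 1;
    case 0x3D: /* PMAXSD: signed 32-bit maximum */
        for (i = 0; i < 4; i++)
            if (ex->sd[i] > gx->sd[i]) gx->sd[i] = ex->sd[i];
        return 1;
    case 0x3F: /* PMAXUD: unsigned 32-bit maximum */
        for (i = 0; i < 4; i++)
            if (ex->ud[i] > gx->ud[i]) gx->ud[i] = ex->ud[i];
        return 1;
    case 0x40: /* PMULLD: low 32 bits of each signed product */
        for (i = 0; i < 4; i++)
            gx->ud[i] = (uint32_t)((int64_t)gx->sd[i] * (int64_t)ex->sd[i]);
        return 1;
    default:
        return 0; /* not handled in this sketch */
    }
}
```

Each case only uses lane i of both operands. Because of that, it still works when the reg and r/m operands are the same register.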
-
@ptitSeb
And here are some interpreter versions for which I don't know how to write the dynarec version, like this:
Are they right? Thank you!
-
For the first block: …
For the second block, my understanding is that the opcode should indeed be …
but again, that cannot be done with the simple multiply you used.
-
@ptitSeb
-
Help you with what?
After that …
-
Hi @ptitSeb,
Another problem: ARM has CRC32C instructions, but there seems to be a difference between the ARM instruction and the x86 one, since the documented polynomials are 0x11EDC6F41 and 0x1EDC6F41. What is the difference between them?
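The two hex values are the same Castagnoli (CRC-32C) polynomial. 0x11EDC6F41 = 2^32 + 0x1EDC6F41 writes the x^32 term explicitly. 0x1EDC6F41 leaves that term implied, which is the usual convention in 32-bit CRC code. Code that shifts LSB-first, as the x86 CRC32 instruction does, uses the bit-reversed constant 0x82F63B78.

A software reference, usable for an interpreter version and for checking a dynarec one, might look like the sketch below. Whether the ARM instruction matches bit for bit (bit ordering, no initial/final inversion) should still be confirmed against the ARM documentation before relying on it. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 0x1EDC6F41 with its 32 bits reversed, for LSB-first processing. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Reverse the bit order of a 32-bit value. */
static uint32_t reflect32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* One byte of CRC-32C, LSB-first, with no initial or final inversion
   (the raw update step, as used by the x86 CRC32 instruction). */
static uint32_t crc32c_u8(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    return crc;
}

/* Conventional CRC-32C of a buffer: start from ~0, invert at the end. */
static uint32_t crc32c_buf(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32c_u8(crc, p[i]);
    return ~crc;
}
```

The standard check value for CRC-32C over the ASCII string "123456789" is 0xE3069283. This makes a convenient first test for any implementation.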
-
Whether it's m16 or m32 depends on the type of segment the code is actually running from, and on the prefix. We are running from a 32-bit segment here, so it's m32, unless a 66 or 67 prefix is used. The ARM crc32c instruction is not available on every processor, so a test must be done before using it. Also, if it's different, it's different: don't use it.
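On Linux, one way to do that test is to read the hardware capability bits from the auxiliary vector with `getauxval`. This is a sketch under that assumption. The bit positions are the Linux values: HWCAP_CRC32 is AT_HWCAP bit 7 on arm64, and HWCAP2_CRC32 is AT_HWCAP2 bit 4 on 32-bit ARM. They are written as literals so the block does not depend on which headers define the names. box86 may already have its own CPU-feature detection, and `arm_has_crc32` is a hypothetical name.

```c
#if defined(__linux__) && (defined(__arm__) || defined(__aarch64__))
#include <sys/auxv.h>
#endif

/* Returns 1 if the kernel reports the ARM CRC32 extension, 0 otherwise. */
static int arm_has_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & (1UL << 7)) != 0;   /* HWCAP_CRC32 (arm64) */
#elif defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP2) & (1UL << 4)) != 0;  /* HWCAP2_CRC32 (32-bit ARM) */
#else
    return 0;  /* not ARM Linux: no ARM CRC32 instructions to use */
#endif
}
```

The result only needs to be computed once at startup and can then select between the native path and a software fallback.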
-
So, is this instruction m8 or m32?
-
m8 stays m8, no matter the size of the segment or any 66/67 prefix.
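In a 32-bit code segment, the source size for CRC32 (F2 0F 38 F0 / F1) follows two rules. F0 always reads one byte. F1 reads four bytes, or two with a 66 prefix. The multi-byte forms behave like feeding the bytes one at a time, lowest byte first. This sketch only looks at 66, because a 67 prefix changes the address size used to locate a memory operand, not the width of the data read. `crc32_src_bytes` and `emulate_crc32` are hypothetical helpers. `crc32c_u8` is the LSB-first step with polynomial 0x82F63B78, repeated here so the block stands alone.

```c
#include <stdint.h>

/* Raw CRC-32C byte step (LSB-first, reflected polynomial, no inversion). */
static uint32_t crc32c_u8(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Bytes read from the source by F2 0F 38 F0/F1 in a 32-bit code segment. */
static int crc32_src_bytes(uint8_t opcode, int has_66_prefix)
{
    if (opcode == 0xF0)
        return 1;                  /* r/m8: always one byte */
    return has_66_prefix ? 2 : 4;  /* 0xF1: r/m16 with 66, else r/m32 */
}

/* Update crc with the low `nbytes` bytes of src, lowest byte first. */
static uint32_t emulate_crc32(uint32_t crc, uint32_t src, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        crc = crc32c_u8(crc, (uint8_t)(src >> (8 * i)));
    return crc;
}
```

A handler would then call `emulate_crc32(dest, src, crc32_src_bytes(opcode, has66))` and write the result back to the 32-bit destination register.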
-
Thank you!
-
Converting this to a discussion.
-
Since many programs need SSE4, I want to add some of its operations.
Like this:
How do I check the value of V0? And how do I build an immediate value myself, or otherwise assign a value to V1?
Thank you!
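NEON's immediate VMOV only accepts a limited set of patterns, such as an 8-bit value shifted into place. A constant it cannot encode can be built another way: duplicate a core-register value into all lanes, then overwrite individual lanes. Storing the register back to memory is one way to check what it holds.

This sketch works at the intrinsic level using `<arm_neon.h>`, with a portable fallback for non-NEON builds. It does not show box86's dynarec emitter macros, which would generate the equivalent instructions (VDUP, VMOV to a scalar lane, VST1) directly. `build_const` is a hypothetical name.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Build the 128-bit value {c0, c1, c2, c3} (lane 0 = lowest 32 bits)
   and store its lanes to out[] so they can be inspected. */
static void build_const(uint32_t out[4], uint32_t c0, uint32_t c1,
                        uint32_t c2, uint32_t c3)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t v = vdupq_n_u32(c0);  /* all lanes = c0 (VDUP from a core register) */
    v = vsetq_lane_u32(c1, v, 1);    /* overwrite single lanes (VMOV Dd[x], Rt) */
    v = vsetq_lane_u32(c2, v, 2);
    v = vsetq_lane_u32(c3, v, 3);
    vst1q_u32(out, v);               /* store to memory to read the lanes back */
#else
    out[0] = c0;                     /* portable fallback for non-NEON builds */
    out[1] = c1;
    out[2] = c2;
    out[3] = c3;
#endif
}
```

When all four lanes are equal, the single `vdupq_n_u32` is enough. The per-lane inserts are only needed for mixed constants.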