Feature request: Add support for software floating point in the C-compiler #150
Comments
Hi Michael - I like your idea of a proprietary format a lot, and spending three words on one FP number might speed things up considerably. 32 bits for the mantissa is plenty, so we might want to drop the standard hidden-bit feature, i.e. store the leading 1 (the MSB) explicitly instead of treating it as implicitly set. This would simplify things further, as we would not have to reserve an exponent value such as 0 to denote the FP value zero. We could also define both the mantissa and the exponent as two's complement numbers, which would simplify the software implementation even more. Of course, this is at odds with nearly every other FP implementation, but the only problem I see is that we would have to change the C compiler so that it converts FP constants into our format. What do you think?
I think we should go for a format that makes the implementation easy (and fast). With 3 words, we have plenty of bits to use, so storing the mantissa as a full 32-bit two's complement number seems like a good idea. I would assume that changing the C compiler's handling of FP constants is a small task.
Sounds great, gentlemen :-)
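To make the three-word idea from the two comments above more concrete, here is a minimal C sketch. Everything in it is an assumption for illustration and not an agreed design: the names `sfp_t`, `sfp_normalize`, `sfp_mul` and `sfp_to_double`, the convention that the value equals `mant * 2^exp`, and the use of a 64-bit intermediate (which the real 16-bit target would have to emulate).

```c
#include <stdint.h>

/* Hypothetical 3-word format: one 16-bit word for the exponent and two
   words for the mantissa, both two's complement. Assumed meaning:
   value = mant * 2^exp. Zero is simply mant == 0, so no exponent value
   has to be reserved for it. */
typedef struct {
    int16_t exp;
    int32_t mant;
} sfp_t;

/* Bring a wide intermediate result back into the 32-bit mantissa with the
   leading magnitude bit stored explicitly (no hidden bit). */
sfp_t sfp_normalize(int64_t m, int exp)
{
    sfp_t r = {0, 0};
    if (m == 0)
        return r;
    /* Too wide for 32 bits: halve and bump the exponent. */
    while (m > INT32_MAX || m < INT32_MIN) {
        m /= 2;
        exp++;
    }
    /* Leading bit not yet at the top: double and lower the exponent. */
    while (m < 0x40000000 && m >= -0x40000000) {
        m *= 2;
        exp--;
    }
    r.exp = (int16_t)exp;   /* exponent overflow is not handled here */
    r.mant = (int32_t)m;
    return r;
}

/* Multiplication: multiply the mantissas, add the exponents, renormalize. */
sfp_t sfp_mul(sfp_t a, sfp_t b)
{
    return sfp_normalize((int64_t)a.mant * (int64_t)b.mant, a.exp + b.exp);
}

/* Conversion to double, only for checking results on a host machine. */
double sfp_to_double(sfp_t x)
{
    double v = (double)x.mant;
    int e = x.exp;
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v /= 2.0; e++; }
    return v;
}
```

For example, `sfp_mul(sfp_normalize(3, 0), sfp_normalize(-5, 0))` yields a value that `sfp_to_double` converts back to -15.0. Note that the sign needs no special treatment anywhere, which is the simplification the two's complement mantissa is meant to buy.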
About the compiler's floating point constant representation
For your convenience, I did the following experiment. I wrote this C program here:
When we are done with our implementation, we would expect this output:
Here is the assembler code that VBCC generates, so that you gentlemen can investigate the FP constant handling:
Obviously, our
And it also looks like, the compiler transforms floats to 64-bit before passing them to
Attached the
You compile it like this:
And because the floating point C library functions are not implemented yet, this is the output:
Dear Mirko, dear Michael -
As mentioned in our last meeting, adding software-emulated floating point is a first step before implementing hardware support. That way we can gauge the speed of the floating point calculations and better evaluate the need for hardware support.
We talked about different floating point formats, and I have here yet another suggestion. The idea is to choose a format that gives reasonable accuracy and avoids unnecessary bit shifts etc. So here goes:
Proposal for floating point format
Examples
In other words:
1.0 <= |Mantissa (real)| < 2.0.
Mantissa (binary) = (Mantissa (real) - 1) * 0x8000 for positive numbers.
Mantissa (binary) = |Mantissa (real)| * 0x8000 for negative numbers.
The value 0 is represented by setting the exponent = 0x0000.
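The two mantissa formulas above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names `encode_mantissa`, `decode_mantissa` and `is_zero` are hypothetical, the caller is assumed to pass a mantissa with 1.0 <= |m| < 2.0, and the exponent encoding is left out except for the zero check.

```c
#include <stdint.h>

/* Encode a real mantissa m with 1.0 <= |m| < 2.0 (precondition) into the
   16-bit mantissa word. Positive values land in 0x0000..0x7FFF, negative
   values in 0x8000..0xFFFF, so bit 15 effectively acts as the sign bit
   and the leading 1 is implied. */
uint16_t encode_mantissa(double m)
{
    if (m > 0.0)
        return (uint16_t)((m - 1.0) * 0x8000);
    return (uint16_t)(-m * 0x8000);
}

/* Inverse mapping: the low 15 bits are the fraction, bit 15 the sign. */
double decode_mantissa(uint16_t w)
{
    double frac = (double)(w & 0x7FFF) / 0x8000;
    return (w & 0x8000) ? -(1.0 + frac) : (1.0 + frac);
}

/* Zero is signalled by the exponent word alone. */
int is_zero(uint16_t exponent_word)
{
    return exponent_word == 0x0000;
}
```

With these definitions, a mantissa of 1.5 encodes to 0x4000 and -1.5 encodes to 0xC000; both decode back to the original value. The encoder truncates rather than rounds, which a real implementation might want to reconsider.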
Having 16 bits for the exponent is certainly a luxury, but this avoids some bit shifting.
What do you think? Is this too much? Should we prefer a 32-bit floating point number instead, with 8 bits for the exponent and 24 bits for the mantissa?