Fix segfaults at EOF for scan_includes, and add token/newline checks #1068
Can you give an example of the situations you were up against, where this was a significant issue? Nobody likes this processor, and IMO it ideally wouldn't be used at all, but since we have it, it's kept as simple as possible within reason, in order to make it fast and easier to think about, avoiding weird logic bugs that are later hard to fix as rgbds syntax evolves. The logic you introduce here isn't immediately obvious to me, which makes me a bit nervous. |
So, the segfaults themselves can happen because when ptr is set to 0 by the strchr, the for statement then increments ptr, so if (ptr) is true regardless. This means that you end up reading from bad addresses. The gotos exit the for loop once nothing more can be done. For the include/incbin tokenization, while talking to @Rangi42 about the segfaults themselves, it came to light that having macros named like "include_size" (or the opposite) was problematic, so tokenization should fix that. Lastly, I made it so that if there is a newline after an incbin/include and before a ", the search for a " is interrupted, as it could be problematic (Rangi was also talking about that when discussing include_size). Hope this clarifies everything! If we were stricter, I should also make it so that encountering a ; interrupts the " search. It is something I could easily add. |
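A minimal sketch of the failure mode described above, assuming a scanning loop that advances with `ptr++` and skips comments with `strchr`. The function `count_comments` is hypothetical and only illustrates the pattern; the real scan_includes code may be shaped differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts ';' comments in a NUL-terminated buffer.
 * Returns -1 if a comment is not terminated by a newline. */
int count_comments(const char *contents)
{
	int count = 0;

	for (const char *ptr = contents; *ptr; ptr++) {
		if (*ptr == ';') {
			count++;
			/* Skip to the end of the comment line. */
			ptr = strchr(ptr, '\n');
			if (!ptr) {
				/* Leave the loop right here. If we only broke out of
				 * an inner block, the for statement would still run
				 * ptr++, turning NULL into a non-NULL bad address
				 * that a later `if (ptr)` check cannot catch. */
				return -1;
			}
		}
	}
	return count;
}
```

The key point is that the NULL check has to happen before the loop's increment runs, which is why exiting the loop immediately (with a return here, or a goto in the pull request) is needed.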
@mid-kid The original minimal fix in Prism was by @aaaaaa123456789:
ptr = strchr(ptr, '\n');
if (!ptr) {
fprintf(stderr, "%s: no newline at end of file\n", filename);
+ free(contents);
+ return;
}
break;
And had this explanation:
There was some further discussion about the other changes but I'm not clear on which ones are essential for avoiding segfaults in actual input files, which ones are for the sake of different/better error messages, and which ones are just refactoring. @Lorenzooone maybe you can clarify? Anyway, in theory what we'd want is to use this regex (case-insensitive):
# Match INCLUDE or INCBIN with an optional label before it.
# Don't support edge cases like /* */ comments midway, or line continuations,
# or STR*() functions, or triple-quoted strings, or other rgbasm fanciness.
^
\s*
(?:
[A-Z_][A-Z0-9_]*::? # global label with one or two colons
| \.[A-Z_][A-Z0-9_]*(?:\s|::?) # local label with one or two colons or space
| : # anonymous label
)?
\s*
INC(?:LUDE|BIN)
\s*
"([^"]+)"
This would avoid bugs on code like this, which tries to include
MACRO include_footprint_top
INCBIN \1, 0, 2 * LEN_1BPP_TILE
println "hello world"
ENDM
include_footprint_top "gfx/footprints/bulbasaur.1bpp"
include_footprint_top "gfx/footprints/ivysaur.1bpp"
|
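A rough sketch of the stricter matching the regex above asks for, written as a hand-rolled check in C. It is a simplification under stated assumptions: it handles one line at a time, ignores the optional label part of the regex, and the names `starts_with_nocase` and `match_include_line` are hypothetical, not taken from the actual tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive prefix check; `word` must be uppercase.
 * Returns the length of `word` on a match, 0 otherwise. */
static size_t starts_with_nocase(const char *s, const char *word)
{
	size_t i;

	for (i = 0; word[i]; i++) {
		if (toupper((unsigned char)s[i]) != word[i])
			return 0;
	}
	return i;
}

/* Hypothetical helper: returns 1 and copies the quoted path into `out`
 * if `line` is INCLUDE/INCBIN directly followed by a quoted path. */
int match_include_line(const char *line, char *out, size_t out_size)
{
	const char *p = line;
	size_t n, len;

	while (*p == ' ' || *p == '\t')
		p++;
	n = starts_with_nocase(p, "INCLUDE");
	if (!n)
		n = starts_with_nocase(p, "INCBIN");
	if (!n)
		return 0;
	p += n;
	while (*p == ' ' || *p == '\t')
		p++;
	/* Anything other than a quote here (an identifier character as in
	 * include_size, a macro argument, a ';' comment, a newline) means
	 * this is not a directive with a literal path. */
	if (*p != '"')
		return 0;
	p++;
	/* The path must close on the same line and be non-empty. */
	len = strcspn(p, "\"\n");
	if (p[len] != '"' || len == 0 || len >= out_size)
		return 0;
	memcpy(out, p, len);
	out[len] = '\0';
	return 1;
}
```

With this check, a line such as `include_footprint_top "gfx/footprints/bulbasaur.1bpp"` is rejected because `INCLUDE` is followed by `_`, and `INCBIN \1, 0, 2 * LEN_1BPP_TILE` is rejected because no quote follows the keyword, so the search never runs on to the `"hello world"` on the next line.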
This is the anti-segfaults part: ee66031 (There is also a ++ being converted to a +1, but that's not important) |
@Lorenzooone Thanks for pointing out the multiple commits. I got rid of the stderr reporting for "comment without newline at EOF", "unterminated string", and "INCLUDE/INCBIN without file path", because none of them are strictly relevant to |
Looks good to me! You may also want to have special handling for INC(LUDE/BIN) followed by ; before a ", as comments should be treated in a special way. But I don't think it's a realistic case. Just something which may be good for consistency. |
So, I added back the "not strictly relevant" warning messages along with your new "no file path" one, because I hadn't realized they were already present and want to keep this change minimal.
Should they? Wouldn't this be appropriately covered by the "no file path after INCLUDE" warning?
|
Not with the current logic, from what I can see, as the tokenization just looks at the char right after the keyword, not at what it finds next. |
Hmm. Can you give an example input file that you think should have this special handling? |
@aaaaaa123456789 first noticed this for comments, but there are multiple.