Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfaults at EOF for scan_includes, and add token/newline checks #1068

Merged
merged 14 commits into from
Aug 14, 2023

Conversation

Lorenzooone
Copy link
Contributor

@Lorenzooone Lorenzooone commented Aug 13, 2023

@aaaaaa123456789 first noticed this for comments, but there are multiple.

@Rangi42 Rangi42 self-requested a review August 13, 2023 11:33
@Lorenzooone Lorenzooone changed the title Fix segfaults at EOF for scan_includes Fix segfaults at EOF for scan_includes + Newline protections Aug 13, 2023
@mid-kid
Copy link
Member

mid-kid commented Aug 14, 2023

Can you give an example of the situations you were up against, where this was a significant issue?

Nobody likes this processor, and IMO it ideally wouldn't be used at all, but since we have it, it's kept as simple as possible within reason, in order to make it fast and easier to think about, avoiding weird logic bugs that are later hard to fix as rgbds syntax evolves. The logic you introduce here isn't inmediately obvious to me, making me a bit nervous.

@Lorenzooone
Copy link
Contributor Author

Lorenzooone commented Aug 14, 2023

So, the segfaults themselves can happen because when ptr is set to 0 from the strchr, then the "for statement" increments ptr, making it so if(ptr) is true regrdless. This means that you end up reading from bad addresses. The gotos quit the for loop once nothing can be done.

For the include/incbin tokenization, while talking to @Rangi42 about the segfaults themselves, it came to light that having macros like "include_size" (or the opposite) was problematic, so tokenization should fix that.

Lastly, I made it so if there is a newline after an incbin/include and before a ", the process of searching for a " is interrupted, as it could be problematic (Rangi was also talking about that when discussing include_size).

Hope this clarifies everything!

If we were stricter, I should also make it so encountering a ; interrupts the " search process. It is something I could easily add.

@Rangi42
Copy link
Member

Rangi42 commented Aug 14, 2023

Can you give an example of the situations you were up against, where this was a significant issue?

@mid-kid The original minimal fix in Prism was by @aaaaaa123456789 :

 			ptr = strchr(ptr, '\n');
 			if (!ptr) {
 				fprintf(stderr, "%s: no newline at end of file\n", filename);
+				free(contents);
+				return;
 			}
 			break;

And had this explanation:

If the last line of a file is a comment, and the file doesn't end in a newline, then strchr(ptr, '\n') returns a null pointer. Since break only breaks out of the switch, not the outer for loop, this leads to undefined behavior in the loop test, since ptr - contents is doing pointer arithmetic on a null pointer.

There was some further discussion about the other changes but I'm not clear on which ones are essential for avoiding segfaults in actual input files, which ones are for the sake of different/better error messages, and which ones are just refactoring. @Lorenzooone maybe you can clarify?

Anyway, in theory what we'd want is to use this regex (case-insensitive):

# Match INCLUDE or INCBIN with an optional label before it.
# Don't support edge cases like /* */ comments midway, or line continuations,
# or STR*() functions, or triple-quoted strings, or other rgbasm fanciness.
^
\s*
(?:
    [A-Z_][A-Z0-9_]*::?           # global label with one or two colons
  | \.[A-Z_][A-Z0-9_]*(?:\s|::?)  # local label with one or two colons or space
  | :                             # anonymous label
)?
\s*
INC(?:LUDE|BIN)
\s*
"([^"]+)"

This would avoid bugs on code like this, which tries to include "Hello world":

MACRO include_footprint_top
  INCBIN \1, 0, 2 * LEN_1BPP_TILE
  println "hello world"
ENDM
  include_footprint_top "gfx/footprints/bulbasaur.1bpp"
  include_footprint_top "gfx/footprints/ivysaur.1bpp"

@Lorenzooone
Copy link
Contributor Author

Lorenzooone commented Aug 14, 2023

This is the anti-segfaults part: ee66031
This alone already fixes them. Both the one found by ax6, and the ones that are in the other switch cases.

(There is also a ++ being converted to a +1, but that's not important)

@Rangi42
Copy link
Member

Rangi42 commented Aug 14, 2023

@Lorenzooone Thanks for pointing out the multiple commits. I got rid of the stderr reporting for "comment without newline at EOF", "unterminated string", and "INCLUDE/INCBIN without file path", because none of them are strictly relevant to scan_includes' functionality. I also did some refactoring, reformatting, and added comments, which I hope clarify the parsing logic.

@Lorenzooone
Copy link
Contributor Author

Lorenzooone commented Aug 14, 2023

Looks good to me! You may also want to have special handling for INC(LUDE/BIN) followed by ; before a ", as comments should be treated in a special way. But I don't think it's a realistic case. Just something which may be good for consistency.

@Rangi42
Copy link
Member

Rangi42 commented Aug 14, 2023

So, I added back the "not strictly relevant" warning messages along with your new "no file path" one, because I hadn't realized they were already present and want to keep this change minimal.

You may also want to have special handling for INC(LUDE/BIN) followed by ; before a ", as comments should be treated in a special way. But I don't think it's a realistic case. Just something which may be good for consistency.

Should they? Wouldn't this be appropriately covered by the "no file path after INCLUDE" warning?

INCLUDE "foo.asm"
SomeLabel: INCLUDE ; lol "this" is commented
INCLUDE "bar.asm"

tools/scan_includes.c Outdated Show resolved Hide resolved
@Rangi42 Rangi42 changed the title Fix segfaults at EOF for scan_includes + Newline protections Fix segfaults at EOF for scan_includes, and add token/newline checks Aug 14, 2023
@Lorenzooone
Copy link
Contributor Author

So, I added back the "not strictly relevant" warning messages along with your new "no file path" one, because I hadn't realized they were already present and want to keep this change minimal.

You may also want to have special handling for INC(LUDE/BIN) followed by ; before a ", as comments should be treated in a special way. But I don't think it's a realistic case. Just something which may be good for consistency.

Should they? Wouldn't this be appropriately covered by the "no file path after INCLUDE" warning?

INCLUDE "foo.asm"
SomeLabel: INCLUDE ; lol "this" is commented
INCLUDE "bar.asm"

Not with the current logic, from what I can see. As the tokenization just looks at the char right after, not at what it instantly finds.

@Rangi42
Copy link
Member

Rangi42 commented Aug 14, 2023

Not with the current logic, from what I can see. As the tokenization just looks at the char right after, not at what it instantly finds.

Hmm. Can you give an example input file that you think should have this special handling?

@Rangi42 Rangi42 merged commit 8e71632 into pret:master Aug 14, 2023
1 check passed
Idain pushed a commit to Idain/pokecrystal that referenced this pull request Sep 11, 2023
Idain pushed a commit to Idain/pokecrystal that referenced this pull request Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants