Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 23999 - literal suffixes dont mix well with template instan… #15339

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

ntrel
Copy link
Contributor

@ntrel ntrel commented Jun 21, 2023

…tiations

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @ntrel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
23999 enhancement literal suffixes dont mix well with template instantiations

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15339"

Copy link
Contributor

@dkorpel dkorpel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a special case that's impossible to express in the lexical grammar.

@ntrel
Copy link
Contributor Author

ntrel commented Jun 21, 2023

@dkorpel Then can it just be a dmd diagnostic error rather than a language error?

@ntrel
Copy link
Contributor Author

ntrel commented Jun 21, 2023

Is the current dangling else error part of the grammar?

Edit: I see that's actually a warning, not an error.

@tim-dlang
Copy link
Contributor

This is a special case that's impossible to express in the lexical grammar.

It could be expressed in the lexical grammar by allowing arbitrary identifiers as a suffix and checking it later:

StringPostfix:
    Identifier

The lexer or another part of the compiler could later verify that the suffix is supported.

@ntrel
Copy link
Contributor Author

ntrel commented Jun 22, 2023

@tim-dlang That might work for strings, but for an integer literal and float literal to use too (to also solve the comment in bugzilla), that would make it ambiguous for a digit character followed by an IdentifierStart token, from the grammar POV.

That said, we already have 2 special case rules, which are both for floating point literals which don't obey maximal munch (see end of this section):
https://dlang.org/spec/lex.html#source_text

Those rules actually change the meaning of the tokens! So we could add a special rule saying:

$(P A $(GLINK StringLiteral), $(GLINK IntegerLiteral) or $(GLINK FloatLiteral)
which ends with an $(GLINK IdentifierStart) character cannot be followed by an 
*IdentifierStart* character at the start of the next token.)

And that rule would only forbid certain patterns rather than redefining them.

I would much rather do that and make it an error, as gcc and clang do. That's because people probably often don't use the -w switch to show warnings.

More importantly, we can never add any new literal suffixes without breaking code if we don't have an error.

@tim-dlang
Copy link
Contributor

@ntrel I don't think it would be ambigous for integer or float literals. The lexer would accept as many identifier characters as possible, but later error on invalid characters.

For comparison, C/C++ use multiple phases in the compiler. First a number is lexed as a pp-number, which allows an arbitrary suffix. A later phase distinguishes between integer-literal and floating-literal, which is more strict for the suffix, but can not split the token.
Multiple phases are not necessary for D, because there is no preprocessor.

But I also think it would be best to just add a special case and make it an error.

@dkorpel
Copy link
Contributor

dkorpel commented Jun 22, 2023

It could be expressed in the lexical grammar by allowing arbitrary identifiers as a suffix and checking it later:

That's pretty smart!

@dkorpel
Copy link
Contributor

dkorpel commented Jun 22, 2023

More importantly, we can never add any new literal suffixes without breaking code if we don't have an error.

Well, you would be breaking code now without adding new literal suffixes. That being said, I'd much rather add an error than a warning. Warnings are bad.

@ntrel ntrel marked this pull request as draft June 22, 2023 20:26
ntrel added a commit to ntrel/dlang.org that referenced this pull request Jun 22, 2023
ntrel added a commit to ntrel/dlang.org that referenced this pull request Jun 22, 2023
This is for dlang/dmd#15339.

I have ignored the ImaginarySuffix FloatLiteral variants,
as they are deprecated.
@ntrel
Copy link
Contributor Author

ntrel commented Jun 22, 2023

@tim-dlang The grammar you linked for floating-literal seems to require either a . or e, in D you can have float literals that don't: 1F - that would conflict with e.g. 1L integer literals. I think I have solved this:
dlang/dlang.org#3646

That grammar change doesn't try to disallow a hex literal from having an identifier immediately following it, because:

  • I would only disallow those if the last digit is a-f, not when it's a decimal digit.
  • It's not as confusing as the string/decimal suffix running into an identifier, as at least you have the leading 0x to remind you it's hex.

So if the spec pull is OK, I need to update this pull to remove the hex literal warning and make the others errors again.

@@ -1972,6 +1972,13 @@ class Lexer
case 'd':
t.postfix = *p;
p++;
// diagnose e.g. `@r"_"dtype var;`
if (!Ccompile && (isidchar(*p) || *p & 0x80))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for *p & 0x80 will also produce a warning for unicode line and paragraph separators. I don't know if anybody uses them, but they are currently allowed: https://dlang.org/spec/lex.html#end_of_line

  Foo!q{foo}c
  c;
  Foo!q{foo}b
  b;

@tim-dlang
Copy link
Contributor

@tim-dlang The grammar you linked for floating-literal seems to require either a . or e, in D you can have float literals that don't: 1F - that would conflict with e.g. 1L integer literals. I think I have solved this: dlang/dlang.org#3646

Yes, 1F makes it more complicated.

That grammar change doesn't try to disallow a hex literal from having an identifier immediately following it, because:

I think it would be more consistent if hex literals behave the same. Consider the following example:

  Foo!0x321On;

Depending on the font, you may not easily see if it is 0x231 On or 0x3210 n, so an error message could be helpful. The same problem exists for normal integer literals. Currently the pull request does not reject 321On.

@ntrel
Copy link
Contributor Author

ntrel commented Jun 23, 2023

@tim-dlang I'm going to focus just on suffixed literals for this pull. Also good point about the unicode whitespace characters, I think I'll just drop the unicode detection.

@ntrel ntrel marked this pull request as ready for review July 10, 2023 20:34
Also allow digit after string postfix or numeric suffix.
This could cause a false positive for unicode line endings.
ntrel added a commit to ntrel/dlang.org that referenced this pull request Jul 12, 2023
This is for dlang/dmd#15339.

I have ignored the ImaginarySuffix FloatLiteral variants,
as they are deprecated.
ntrel added a commit to ntrel/dlang.org that referenced this pull request Jul 14, 2023
This is for dlang/dmd#15339.

I have ignored the ImaginarySuffix FloatLiteral variants,
as they are deprecated.
@ntrel
Copy link
Contributor Author

ntrel commented Jul 14, 2023

This is ready to go now.

@dkorpel
Copy link
Contributor

dkorpel commented Jul 16, 2023

I'll ask what Walter thinks of this

@WalterBright
Copy link
Member

I note that the way this is implemented, we can never add any additional suffix characters. I suggest instead that the check should be for any suffixes that are not valid suffixes.

@@ -1973,6 +1973,13 @@ class Lexer
case 'd':
t.postfix = *p;
p++;
// disallow e.g. `@r"_"dtype var;`
if (!Ccompile && isalpha(*p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's already in not Ccompile land.

if (!Ccompile && isalpha(*p))
{
const loc = loc();
error(loc, "identifier character cannot follow string `%c` postfix without whitespace",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say "invalid suffix character %c", because other syntax is not an issue.

break;
default:
// disallow e.g. `Foo!5Luvar;`
if (!Ccompile && flags >= FLAGS.unsigned && isalpha(*p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need Ccompile check or flags check

continue;
default:
break;
break LIntegerSuffix;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this loop seems more complicated than necessary

gotSuffix = true;
}
// disallow e.g. `Foo!5fvar;`
if (!Ccompile && gotSuffix && isalpha(*p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think it would be a problem if Ccompile was true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think gotSuffix is needed, just check for invalid suffix alpha

@WalterBright
Copy link
Member

this code is fairly out of sync with the current lexer. Please rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants