Fix Issue 23999 - literal suffixes dont mix well with template instantiations #15339

ntrel · 2023-06-21T12:19:37Z

dlang-bot · 2023-06-21T12:19:42Z

Thanks for your pull request and interest in making D better, @ntrel! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close	Bugzilla	Severity	Description
✓	23999	enhancement	literal suffixes dont mix well with template instantiations

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub run digger -- build "master + dmd#15339"

dkorpel

This is a special case that's impossible to express in the lexical grammar.

ntrel · 2023-06-21T15:25:49Z

@dkorpel Then can it just be a dmd diagnostic error rather than a language error?

ntrel · 2023-06-21T15:28:21Z

Is the current dangling else error part of the grammar?

Edit: I see that's actually a warning, not an error.

tim-dlang · 2023-06-22T16:17:14Z

> This is a special case that's impossible to express in the lexical grammar.

It could be expressed in the lexical grammar by allowing arbitrary identifiers as a suffix and checking it later:

StringPostfix:
    Identifier

The lexer or another part of the compiler could later verify that the suffix is supported.
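
As a rough illustration of that idea (a sketch only, not dmd's lexer; `scanPostfix` and `isValidStringPostfix` are hypothetical names, and only ASCII identifier characters are considered), the lexer could take the whole identifier run after the closing delimiter as the postfix and validate it in a later step:

```d
import std.ascii : isAlpha, isAlphaNum;

// Hypothetical helper, not dmd code: `rest` is the source text directly
// after a string literal's closing delimiter. Returns the maximal run of
// ASCII identifier characters, treated as the candidate postfix.
string scanPostfix(string rest)
{
    // An identifier must begin with a letter or underscore.
    if (rest.length == 0 || !(isAlpha(rest[0]) || rest[0] == '_'))
        return "";
    size_t n = 1;
    while (n < rest.length && (isAlphaNum(rest[n]) || rest[n] == '_'))
        n++;
    return rest[0 .. n];
}

// Later check: only an empty postfix or `c`, `w`, `d` is accepted.
bool isValidStringPostfix(string postfix)
{
    return postfix.length == 0
        || postfix == "c" || postfix == "w" || postfix == "d";
}

unittest
{
    // `r"_"dtype var;`: the run after the quote is "dtype", which is rejected.
    assert(scanPostfix("dtype var;") == "dtype");
    assert(!isValidStringPostfix("dtype"));

    // `r"_"d type var;`: whitespace ends the run, leaving the valid postfix "d".
    assert(scanPostfix("d type var;") == "d");
    assert(isValidStringPostfix("d"));
}
```

With this shape the grammar stays simple, and the error is reported by the validation step rather than by splitting the token.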

ntrel · 2023-06-22T17:24:02Z

@tim-dlang That might work for strings, but if integer and float literals used the same approach (to also address the comment in Bugzilla), a digit character followed by an IdentifierStart character would become ambiguous from the grammar's point of view.

That said, we already have two special-case rules, both for floating-point literals, that don't obey maximal munch (see the end of this section):
https://dlang.org/spec/lex.html#source_text

Those rules actually change the meaning of the tokens! So we could add a special rule saying:

$(P A $(GLINK StringLiteral), $(GLINK IntegerLiteral) or $(GLINK FloatLiteral)
which ends with an $(GLINK IdentifierStart) character cannot be followed by an 
*IdentifierStart* character at the start of the next token.)

And that rule would only forbid certain patterns rather than redefining them.

I would much rather do that and make it an error, as gcc and clang do. That's because people probably often don't use the -w switch to show warnings.

More importantly, we can never add any new literal suffixes without breaking code if we don't have an error.

tim-dlang · 2023-06-22T17:54:18Z

@ntrel I don't think it would be ambiguous for integer or float literals. The lexer would accept as many identifier characters as possible, but later error on invalid characters.

For comparison, C/C++ use multiple phases in the compiler. First a number is lexed as a pp-number, which allows an arbitrary suffix. A later phase distinguishes between integer-literal and floating-literal, which is stricter about the suffix, but cannot split the token.
Multiple phases are not necessary for D, because there is no preprocessor.

But I also think it would be best to just add a special case and make it an error.

dkorpel · 2023-06-22T17:59:30Z

> It could be expressed in the lexical grammar by allowing arbitrary identifiers as a suffix and checking it later:

That's pretty smart!

dkorpel · 2023-06-22T18:01:35Z

> More importantly, we can never add any new literal suffixes without breaking code if we don't have an error.

Well, you would be breaking code now without adding new literal suffixes. That being said, I'd much rather add an error than a warning. Warnings are bad.

ntrel · 2023-06-22T21:00:13Z

@tim-dlang The grammar you linked for floating-literal seems to require either a . or an e. In D you can have float literals with neither, such as 1F, which would conflict with e.g. 1L integer literals. I think I have solved this:
dlang/dlang.org#3646

That grammar change doesn't try to disallow a hex literal from having an identifier immediately following it, because:

- I would only disallow those if the last digit is a-f, not when it's a decimal digit.
- It's not as confusing as the string/decimal suffix running into an identifier, as at least you have the leading 0x to remind you it's hex.

So if the spec pull is OK, I need to update this pull to remove the hex literal warning and make the others errors again.

tim-dlang · 2023-06-23T15:32:07Z

compiler/src/dmd/lexer.d

@@ -1972,6 +1972,13 @@ class Lexer
        case 'd':
            t.postfix = *p;
            p++;
+            // diagnose e.g. `@r"_"dtype var;`
+            if (!Ccompile && (isidchar(*p) || *p & 0x80))


Checking for *p & 0x80 will also produce a warning for unicode line and paragraph separators. I don't know if anybody uses them, but they are currently allowed: https://dlang.org/spec/lex.html#end_of_line

Foo!q{foo}c  c; Foo!q{foo}b  b;
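
To make the concern about `*p & 0x80` concrete, here is a small sketch (the helper name `highBitSet` is made up for illustration; it only mirrors the bit test from the diff, not the surrounding lexer). The UTF-8 encodings of U+2028 and U+2029 both start with the byte 0xE2, which has the high bit set, so the check would fire when one of these separators directly follows a postfix:

```d
// Mirrors the high-bit test from the diff: true for any byte >= 0x80,
// i.e. any byte that belongs to a multi-byte UTF-8 sequence.
bool highBitSet(char c)
{
    return (c & 0x80) != 0;
}

unittest
{
    // U+2028 LINE SEPARATOR is encoded in UTF-8 as E2 80 A8.
    string ls = "\u2028";
    assert(ls.length == 3);
    assert(cast(ubyte) ls[0] == 0xE2);
    assert(highBitSet(ls[0]));

    // U+2029 PARAGRAPH SEPARATOR is encoded as E2 80 A9.
    string ps = "\u2029";
    assert(highBitSet(ps[0]));

    // ASCII line endings and spaces do not trigger the test.
    assert(!highBitSet('\n'));
    assert(!highBitSet(' '));
}
```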

tim-dlang · 2023-06-23T16:02:08Z

> @tim-dlang The grammar you linked for floating-literal seems to require either a . or e, in D you can have float literals that don't: 1F - that would conflict with e.g. 1L integer literals. I think I have solved this: dlang/dlang.org#3646

Yes, 1F makes it more complicated.

> That grammar change doesn't try to disallow a hex literal from having an identifier immediately following it, because:

I think it would be more consistent if hex literals behave the same. Consider the following example:

  Foo!0x321On;

Depending on the font, you may not easily see whether it is 0x321 On or 0x3210 n, so an error message could be helpful. The same problem exists for normal integer literals. Currently the pull request does not reject 321On.

ntrel · 2023-06-23T16:14:42Z

@tim-dlang I'm going to focus just on suffixed literals for this pull. Also good point about the unicode whitespace characters, I think I'll just drop the unicode detection.

ntrel · 2023-07-14T18:09:26Z

This is ready to go now.

dkorpel · 2023-07-16T18:23:43Z

I'll ask what Walter thinks of this

WalterBright · 2024-03-21T06:11:37Z

I note that the way this is implemented, we can never add any additional suffix characters. I suggest instead that the check should be for any suffixes that are not valid suffixes.
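
One possible reading of that suggestion, sketched for decimal integer literals only (the names `letterRun` and `isValidIntegerSuffix` are hypothetical, this is not the dmd implementation, and float suffixes and exponents are deliberately out of scope): collect the whole run of letters after the digits and accept it only if it is a known suffix.

```d
import std.ascii : isAlpha;

// Hypothetical helper, not dmd code: `rest` is the source text directly
// after the digits of a decimal integer literal. Returns the run of ASCII
// letters that follows.
string letterRun(string rest)
{
    size_t n = 0;
    while (n < rest.length && isAlpha(rest[n]))
        n++;
    return rest[0 .. n];
}

// Accepts no suffix, or one of the integer suffix spellings
// L, u, U, Lu, LU, uL, UL.
bool isValidIntegerSuffix(string suffix)
{
    if (suffix.length == 0)
        return true;
    switch (suffix)
    {
        case "L", "u", "U", "Lu", "LU", "uL", "UL":
            return true;
        default:
            return false;
    }
}

unittest
{
    // `Foo!5Luvar;`: the letter run is "Luvar", which is not a valid suffix.
    assert(letterRun("Luvar;") == "Luvar");
    assert(!isValidIntegerSuffix("Luvar"));

    // `Foo!5Lu var;`: whitespace ends the run at the valid suffix "Lu".
    assert(isValidIntegerSuffix(letterRun("Lu var;")));
}
```

In this shape, supporting a new suffix later would mean adding one entry to the accepted set.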

WalterBright · 2024-03-21T06:12:15Z

compiler/src/dmd/lexer.d

@@ -1973,6 +1973,13 @@ class Lexer
        case 'd':
            t.postfix = *p;
            p++;
+            // disallow e.g. `@r"_"dtype var;`
+            if (!Ccompile && isalpha(*p))


it's already in not Ccompile land.

WalterBright · 2024-03-21T06:13:03Z

compiler/src/dmd/lexer.d

+            if (!Ccompile && isalpha(*p))
+            {
+                const loc = loc();
+                error(loc, "identifier character cannot follow string `%c` postfix without whitespace",


I'd say "invalid suffix character %c", because other syntax is not an issue.

WalterBright · 2024-03-21T06:13:43Z

compiler/src/dmd/lexer.d

+                break;
+            default:
+                // disallow e.g. `Foo!5Luvar;`
+                if (!Ccompile && flags >= FLAGS.unsigned && isalpha(*p))


Don't need Ccompile check or flags check

WalterBright · 2024-03-21T06:14:17Z

compiler/src/dmd/lexer.d

-                continue;
-            default:
-                break;
+                break LIntegerSuffix;


this loop seems more complicated than necessary

WalterBright · 2024-03-21T06:15:16Z

compiler/src/dmd/lexer.d

+            gotSuffix = true;
+        }
+        // disallow e.g. `Foo!5fvar;`
+        if (!Ccompile && gotSuffix && isalpha(*p))


don't think it would be a problem if Ccompile was true

don't think gotSuffix is needed, just check for invalid suffix alpha

WalterBright · 2024-03-21T06:16:26Z

this code is fairly out of sync with the current lexer. Please rebase.

dlang-bot added the Severity:Enhancement label Jun 21, 2023

dkorpel requested changes Jun 21, 2023

ntrel force-pushed the string-postfix-ident branch from 3bf4767 to e12bd3d Compare June 21, 2023 15:54

ntrel requested a review from dkorpel June 21, 2023 15:55

ntrel marked this pull request as draft June 22, 2023 20:26

ntrel added a commit to ntrel/dlang.org that referenced this pull request Jun 22, 2023

[spec/lex] Optional IdentifierStart after string/numeric literal suffix

a41afcf

This is for dlang/dmd#15339.

ntrel added a commit to ntrel/dlang.org that referenced this pull request Jun 22, 2023

[spec/lex] Optional IdentifierStart after string/numeric literal suffix

8d9bb15

This is for dlang/dmd#15339. I have ignored the ImaginarySuffix FloatLiteral variants, as they are deprecated.

ntrel mentioned this pull request Jun 22, 2023

[spec/lex] Optional IdentifierStart after string/numeric literal suffix dlang/dlang.org#3646

Open

tim-dlang reviewed Jun 23, 2023

ntrel force-pushed the string-postfix-ident branch from e12bd3d to 0a2e8d0 Compare June 23, 2023 16:08

dlang-bot added the Review:Needs Rebase label Jun 26, 2023

ntrel force-pushed the string-postfix-ident branch from 0a2e8d0 to 7a8459c Compare July 10, 2023 16:42

dlang-bot removed the Review:Needs Rebase label Jul 10, 2023

ntrel marked this pull request as ready for review July 10, 2023 20:34

ntrel added 5 commits July 12, 2023 10:42

Fix Issue 23999 - literal suffixes dont mix well with template instantiations

5c4d348

Ignore C files

188ac59

Update hex float test

cf700e6

Only error after numeric suffix

d558168

Also allow digit after string postfix or numeric suffix.

Remove unicode detection

ce69adf

This could cause a false positive for unicode line endings.

ntrel force-pushed the string-postfix-ident branch from 9681e65 to ce69adf Compare July 12, 2023 09:42

ntrel added a commit to ntrel/dlang.org that referenced this pull request Jul 12, 2023

[spec/lex] Optional IdentifierStart after string/numeric literal suffix

4b27269

This is for dlang/dmd#15339. I have ignored the ImaginarySuffix FloatLiteral variants, as they are deprecated.

ntrel added a commit to ntrel/dlang.org that referenced this pull request Jul 14, 2023

[spec/lex] Optional IdentifierStart after string/numeric literal suffix

e89fece

This is for dlang/dmd#15339. I have ignored the ImaginarySuffix FloatLiteral variants, as they are deprecated.

dlang-bot added the Review:stalled label Oct 15, 2023

WalterBright reviewed Mar 21, 2024

dlang-bot removed the Review:stalled label Mar 21, 2024

dlang-bot added the Review:stalled label Jun 19, 2024
