Proposal: Add action templates functionality #4345
base: dev
This enables users to write cross-target grammars which make use of actions, by enabling users to provide action templates on the command line. By providing different action templates for each target language, users can provide a different implementation of the action logic for each target. Signed-off-by: David Gregory <2992938+DavidGregory084@users.noreply.github.com>
Mmmm... this IS an "entirely new actions language". The fact that it leverages templates makes it look like it isn't (and it is elegant), but we'd still have to opine on the syntax and, more importantly, on the supported set of 'keywords', both of which are characteristics of a language...
Hmm, can you explain what you mean by that? The syntax is that of StringTemplate, and all of the template identifiers used are user-defined; they're simply supplied as StringTemplate groups.
Ah, I think I understand. I'm not proposing that there is a centrally managed collection of StringTemplate groups shipped with ANTLR; this PR adds a command-line option via which users can supply templates they have written themselves. Writing cross-target grammars is a rather advanced use case, and I think it's reasonable to expect users to learn StringTemplate syntax if they want to do that.
Why StringTemplate?
I have a few observations and questions.
I think the Fortran90 grammar is actually a good example in itself of the need for some kind of solution to target-agnostic actions. The target-agnostic format described there relies upon a preprocessing script. In many cases it would be really difficult for users to integrate such a script into their workflow (e.g. when the ANTLR generator is called by a build tool plugin or as part of some other complex workflow).
ST is already a dependency of the ANTLR tool, so I guess to some extent the question of whether to embed it into ANTLR has already been decided. The expansion of ST templates in this PR is completely optional and is only active if the user provides the actionTemplates option.
I suppose the boundary here is the point of template expansion, which happens early in the ANTLR generation process, just after parsing the grammar file itself. It's also where the most troubling interactions occur, in terms of showing users error messages relating to the template expansion process.
I think that is a moot point in this implementation, as no model binding is performed: the user has to write all of the groups that they want to use themselves.
Usually, I would say that ST should be supported outside the tool, but after thinking about this a bit, I think you are right. The main reason it should be supported in the tool is that people would say a grammar that requires further processing shouldn't be called an Antlr grammar. In addition, the grammar could be tagged as requiring a certain version of Antlr. I will try out some examples shortly.
Semantic predicates are actions, too. It's not expanding template references there, but it is working elsewhere.
Ah, thanks for trying it out. I will take a look at semantic predicates tomorrow.
Looks like you need to test for SEMPRED in addition to ACTION.

Otherwise, I think this solution is better than the others we've thought of up to now. It fixes the "p" vs "l" problem, as well as the "this." vs "$this->" vs "self." vs "this->" problem. And it doesn't result in a proliferation of

I don't think we have any more non-OO targets. The Go runtime is slammed into something that resembles an OO framework; it just wasn't implemented well to begin with. But ST actions could come in handy with a new target like C.

This doesn't take care of the target-specific code in local and return clauses, but we don't use those features in grammars-v4.

You have my vote. @parrt Please consider this PR. It goes a long way to fixing the problems in writing grammars over different targets.
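The fix requested here is that template expansion should run over semantic-predicate tokens as well as plain action tokens. Below is a hedged Java sketch of that rule. The `Kind` enum and `Tok` record are hypothetical stand-ins, not ANTLR's internal token types, and expansion is reduced to simple string replacement of `<name()>` calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical token kinds; only ACTION and SEMPRED carry action code.
enum Kind { STRING_LITERAL, ACTION, SEMPRED }

// Hypothetical token: a kind plus its raw text.
record Tok(Kind kind, String text) {}

final class ExpandActionsAndPredicates {
    // Semantic predicates {...}? are actions too, so both kinds are expanded.
    static boolean needsExpansion(Kind kind) {
        return kind == Kind.ACTION || kind == Kind.SEMPRED;
    }

    // Replaces <name()> calls in action-like tokens; other tokens are copied unchanged.
    static List<Tok> expandAll(List<Tok> tokens, Map<String, String> templates) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : tokens) {
            String text = t.text();
            if (needsExpansion(t.kind())) {
                for (Map.Entry<String, String> e : templates.entrySet()) {
                    text = text.replace("<" + e.getKey() + "()>", e.getValue());
                }
            }
            out.add(new Tok(t.kind(), text));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(
            new Tok(Kind.SEMPRED, "<self()>.isTypeName()"),
            new Tok(Kind.STRING_LITERAL, "'<self()>'"),
            new Tok(Kind.ACTION, "<self()>.depth++;"));
        for (Tok t : expandAll(tokens, Map.of("self", "this"))) {
            System.out.println(t.kind() + ": " + t.text());
        }
    }
}
```

The string-literal token is left alone, which matters if a literal happens to contain text that looks like a template call.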
A similar but more independent approach would be to write the grammar as a template and apply the transforms in order to produce a target-specific g4.
It's less convenient, but it does not require any change to the antlr4 tool.
The problem with writing the grammar as an STG-format file is that it is then not an Antlr grammar. The format could not be parsed using the grammars-v4 antlr4 grammar. Existing VSCode or VSIDE extensions would not work. People would not call it an Antlr grammar. The use of the template calls would not be restricted to just the actions. Not a good solution.

You could implement this as a separate command-line tool called "antlr4-plus", processing the .g4 as an ST-formatted file where ST attributes are referenced only in the actions, i.e. the proposed format for the grammar. This tool could parse, then render a normal Antlr4 grammar. VSCode and VSIDE extensions would still work. But people would not call the grammar an "Antlr4 grammar", but an "Antlr4-plus" grammar. It would be labeled a third-party tool, and any grammars in the format would probably get a new extension, like ".g4plus".

The template group file reference is directly listed as an option in the grammar. It's an explicitly listed dependency, which is a much better thing than hiding it. If the grammar option contains

Folks are very hesitant to use third-party tools. They won't know how to install them, or they'll screw up the install. Antlr4-tools is official and directly referenced on antlr.org; there's nothing further to do. Nobody trusts my Trash toolkit: years in development, used in grammars-v4, extremely useful. But it's not listed on antlr.org, won't ever get there, and is only modified by me. There are several other Antlr-based tools that are also not listed there, like XText (although that is Antlr3-based). I'm sure there are others.
@kaby76 Re the template not being a valid grammar, that's a fair point.
```diff
@@ -80,6 +80,7 @@ public class Grammar implements AttributeResolver {
         parserOptions.add("TokenLabelType");
         parserOptions.add("tokenVocab");
         parserOptions.add("language");
+        parserOptions.add("actionTemplates");
```
Not sure we want to support this as a grammar option; it should be a command-line option only.
```diff
@@ -1178,6 +1179,10 @@ public String getLanguage() {
         return getOptionString("language");
     }
 
+    public String getActionTemplates() {
```
same here
Well, I could solve the explicit dependency by applying a naming standard to the .g4's and .stg's, e.g., Python3Lexer.g4/Python3Lexer.stg and Python3Parser.g4/Python3Parser.stg. I already apply a coding standard for the location of target-specific files over in grammars-v4. It took years to straighten it all out, but all target-specific files are in the sub-directories Cpp/, CSharp/, Dart/, Go/, Java/, JavaScript/, PHP/, Python3/, and TypeScript/. So these .stg's would go into each of the sub-directories. That's fine because that's where the

But, why is an equivalent
All the more reason that it should be explicitly listed in the grammar file, and not pushed to the build tool. You won't have to change the build tool, except that the grammar now requires a new version of the Antlr tool, which people do anyway.
@kaby76 Sorry to disagree, but that would have to be a new entry in the pom file. I am not supportive of an approach that requires the tool to resolve the stg file path in the first place, and differently for each target.
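A sketch of the naming standard described above, in Java. The helper name `templateFor` is hypothetical; it only pairs `FooLexer.g4` with `FooLexer.stg` inside whichever target sub-directory the build script (not the ANTLR tool) has already chosen.

```java
import java.nio.file.Path;

final class StgNaming {
    // Hypothetical helper: FooLexer.g4 pairs with FooLexer.stg in the given
    // target directory (e.g. Java/, Python3/). The directory is chosen by the
    // caller; this method does not pick it based on the target.
    static Path templateFor(Path targetDir, String grammarFileName) {
        if (!grammarFileName.endsWith(".g4")) {
            throw new IllegalArgumentException("expected a .g4 file: " + grammarFileName);
        }
        String base = grammarFileName.substring(0, grammarFileName.length() - ".g4".length());
        return targetDir.resolve(base + ".stg");
    }

    public static void main(String[] args) {
        for (String target : new String[] {"Java", "Python3"}) {
            System.out.println(templateFor(Path.of(target), "Python3Parser.g4"));
        }
    }
}
```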
First, at least in grammars-v4, nothing in the .g4 would need to be changed per target. Just place the .stg's in the same directory as the .g4's. The command line to generate the target-specific recognizers would still be the same:

The target-specific nature of the templates is laid out in the coding standard (the directory structure), after years of arguing about it in grammars-v4. But, as far as I know, the Maven plugin does not work for other targets besides Java. For trgen, however, the target-specific files are overlaid on the files where the .g4's reside. If you really want to place the .g4's and .stg's in separate directories, the Antlr tool already has a way to do that with -lib.

Nothing in the pom.xml would need to change whatsoever other than now adding the
I agree that a target-specific option does not make a lot of sense in the grammar file. The only reason that I have done it this way is that the "language" option is already a grammar-level option, and "actionTemplates" seems to belong wherever "language" is defined. Happy to change it. I expect most folks would provide "actionTemplates" as a command-line option. EDIT: I see that @kaby76 has a slightly different pattern in mind, but both usage patterns would work with a grammar-level option, I think?
Not at the moment. I think that would require changes to the Maven plugin, and I haven't looked at that part of the project yet. The Gradle plugin already requires manually manipulating the command-line options for any advanced use cases; e.g. here's the invocation from my current project:

```kotlin
tasks.generateGrammarSource {
    // See: https://github.com/antlr/antlr4/issues/2335
    val outputDir = file("build/generated-src/antlr/main/org/mina_lang/parser")
    // the directory must exist or ANTLR bails
    doFirst { outputDir.mkdirs() }
    arguments = arguments + listOf(
        "-visitor",
        "-no-listener",
        // the lexer tokens file can't be found by the parser without this
        "-lib", outputDir.absolutePath,
        "-package", "org.mina_lang.parser")
}
```

Having said that, I don't really know which use cases would require providing different arguments to the same build tool. E.g. in my case, I will probably move my grammar to a top-level directory and symlink it into my Gradle source directories and my npm source directories. Then each build tool will invoke the ANTLR tool with different command-line arguments to generate the target that's relevant for that build tool: Java for my Gradle build, TypeScript for my npm build.
```java
assertEquals(
    "State: Generate; \n" +
    "error(211): " + grammarFile + ":2:14: error compiling action template: 3:3: invalid character '¢'\n" +
    "error(211): " + grammarFile + ":2:14: error compiling action template: 3:0: mismatched input ' ' expecting EOF\n",
```
This message is a little weird. I'm not sure why changing the action template to be multiline triggers this error message from StringTemplate.
Thanks for this. I'm all for making this a command-line-only option. We don't want to embed any target directory logic. Re a non-compiling parser when the option is missing, it doesn't sound difficult to detect a string template and throw an error if no template is provided.

On 29 Jul 2023, at 19:04, Terence Parr wrote:
Very interesting and well-thought-out proposal. I'm away from my computer, but I am positively inclined to approve something of this nature. So it sounds like the grammar itself would mention the name of a template file, and then the -lib would indicate a directory with a template file. But what is the convention for changing the name of the template according to the language? We definitely don't want to name the target language inside the grammar file, as that defeats the purpose. It seems to me that you might simply want to have a command-line option that indicates the specific template file to use, but then it means we don't need to specify the template file name inside the grammar. But that has the negative that users will be mystified when the generated code does not compile, as you point out.
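The missing-template check suggested in this exchange could look roughly like the following Java sketch: scan action text for anything that looks like a template call, and report an error when no template group was supplied. The call pattern is an assumption about how templates are referenced, and it is only a heuristic, since `<` also appears in comparisons and generics in ordinary action code. All names here are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

final class MissingTemplatesCheck {
    // Assumed call syntax: <name()> or <name(args)>; nested parentheses are not handled.
    private static final Pattern TEMPLATE_CALL =
        Pattern.compile("<[A-Za-z_][A-Za-z0-9_]*\\([^)]*\\)>");

    static boolean referencesTemplates(String actionText) {
        return TEMPLATE_CALL.matcher(actionText).find();
    }

    // Returns an error message if some action uses templates but no template
    // group was supplied; returns empty otherwise.
    static Optional<String> check(List<String> actions, Optional<String> actionTemplates) {
        if (actionTemplates.isPresent()) {
            return Optional.empty();
        }
        for (String action : actions) {
            if (referencesTemplates(action)) {
                return Optional.of("action uses templates but no actionTemplates group was given: {" + action + "}");
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> actions = List.of("count++;", "<print()>(\"hi\");");
        System.out.println(check(actions, Optional.empty()).orElse("ok"));
        System.out.println(check(actions, Optional.of("Actions.stg")).orElse("ok"));
    }
}
```

Failing early like this would turn a confusing compile error in the generated parser into a generation-time message that points at the offending action.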
If you want to declare the .stg/.g4 pairing outside the grammar file (in a bash script, makefile, Maven file, etc.), that's your choice. But please don't force the coding style on me. I would prefer to declare the actionTemplates option in the grammar, because it has to be specified somewhere. In my example, there is no mention of target in either of the .g4's:

If the pairing of top-level .stg with the .g4 is not specified in the grammar, I'll have to devise a way to know when to add the
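Yes, that's right: actionTemplates is currently a grammar-level option, but note that grammar-level options can always be provided to the ANTLR tool via the command line, e.g. antlr4 -DactionTemplates=Actions.stg.

It seems that @ericvergnaud is strongly against keeping this as a grammar-level option and @kaby76 is strongly for it, so I hope you can be a tie breaker. 😅

I will note, though, that I don't see any harm in leaving this at the grammar level if it suits use cases whereby the relative folder structure or -lib argument is used to determine where a fixed template file name mentioned in the grammar file is picked up from.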
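A minimal Java sketch of that precedence, assuming `-Dname=value` arguments override options declared in the grammar (the ANTLR tool's usage text describes `-D<option>=value` as setting or overriding a grammar-level option). The `effectiveOptions` helper is hypothetical and is not the tool's actual option handling.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OptionOverrides {
    // Starts from the grammar's own options, then lets each -Dname=value
    // argument override (or add) an option. Other arguments are ignored here.
    static Map<String, String> effectiveOptions(Map<String, String> grammarOptions, List<String> args) {
        Map<String, String> result = new HashMap<>(grammarOptions);
        for (String arg : args) {
            if (!arg.startsWith("-D")) continue;
            int eq = arg.indexOf('=');
            if (eq <= 2) continue; // missing name or missing '=': skipped in this sketch
            result.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Options as if declared in the grammar file (values are hypothetical).
        Map<String, String> fromGrammar = Map.of("language", "Java", "actionTemplates", "Actions.stg");
        Map<String, String> effective = effectiveOptions(fromGrammar,
            List.of("-visitor", "-Dlanguage=JavaScript", "-DactionTemplates=JavaScriptActions.stg"));
        System.out.println(effective.get("language") + " " + effective.get("actionTemplates"));
    }
}
```

Under this model the two positions are compatible: a grammar can name a default template file, and a build can still pass an explicit file name without the tool inferring anything from the target.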
Since -lib describes an input directory, I'm fine with the tool trying to locate the template file in that directory. But I will strongly oppose any logic where the template file name needs to be inferred from the target.

On 30 Jul 2023, at 16:17, David Gregory wrote:
So it sounds like the grammar itself would mention the name of a template file, and then the -lib would indicate a directory with a template file.
Yes, that's right, actionTemplates is currently a grammar-level option, but note that grammar-level options can always be provided to the ANTLR tool via the command line, i.e. antlr4 -DactionTemplates=Actions.stg.
It seems that @ericvergnaud is strongly against keeping this as a grammar-level option and @kaby76 is strongly for it, so I hope you can be a tie breaker. 😅
I will note though that I don't see any harm in leaving this at the grammar level if it suits use cases whereby the relative folder structure or -lib argument is used to determine where a fixed template file name mentioned in the grammar file is picked up from.
I agree! We need a change for finding the .stg along the -lib path.
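A sketch of that lookup in Java, under stated assumptions: the `actionTemplates` value is treated as an exact file name, and candidate directories are searched in a fixed order (here, the grammar's own directory and then the `-lib` directory). Consistent with the objection above, nothing is inferred from the target language. `TemplateLocator.locate` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

final class TemplateLocator {
    // Treats the actionTemplates value as an exact file name and returns the
    // first existing match among the candidate directories, in order.
    // Nothing is derived from the target language.
    static Optional<Path> locate(String templateFileName, List<Path> searchDirs) {
        for (Path dir : searchDirs) {
            Path candidate = dir.resolve(templateFileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path grammarDir = Files.createTempDirectory("grammar");
        Path libDir = Files.createTempDirectory("lib");
        // A one-template group file; the content is only illustrative.
        Files.writeString(libDir.resolve("Actions.stg"), "self() ::= \"this\"\n");
        // Assumed order: the grammar's own directory first, then the -lib directory.
        Optional<Path> found = locate("Actions.stg", List.of(grammarDir, libDir));
        System.out.println(found.map(p -> p.getParent().equals(libDir)).orElse(false));
    }
}
```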
@parrt This is not ready to merge, but I guess we have your green light to complete the work?
I am still waiting for the
Mmm... I think we would need to add support for this in the Maven and IntelliJ plugins at a minimum, and in antlr-lab.
@kaby76 I asked you a question about this in my last comment:
Which revision are you using?
I was using the latest. I always use the latest. Of course, mistakes sometimes happen, but not this time. I created a repo (https://github.com/kaby76/do-over-and-over.git) with a script (https://github.com/kaby76/do-over-and-over/blob/main/do-over-and-over.sh) that clones your repo, then builds and runs three different versions.
Cool, thanks for confirming. I'll take a look as soon as I can!
@DavidGregory084 Hi there. It's been a while. Any chance you will progress this?
@DavidGregory084 we'd like to include this feature in ANTLR5.
Apologies @ericvergnaud, I have not had a great deal of time for open source over the past few months. What are the timelines like for ANTLR5? I will try to make some time for this next week!
The overall timeline is flexible, but this feature will rapidly become critical due to the new architecture.
@ericvergnaud Apologies for the delay. @kaby76 was absolutely right about the cause of the issue, but I wanted to write a unit test to exercise the

I have incorporated @kaby76's fix now and added a test. Next I will migrate the PR to the antlr5 repo. Will you still accept this PR on antlr4 as well, or should we close it here?
@DavidGregory084 I wouldn't close it. Once it is successfully implemented in antlr5, its usefulness might be proven well enough for Ter to accept it.
Opened a PR for the port at antlr/antlr5#51
Note, templates could be applied elsewhere in the grammar, e.g.,
This is an idea about how to resolve #4067 without developing an entirely new actions language.
This PR enables users to write cross-target grammars which make use of actions, by enabling users to provide action templates as StringTemplate .stg group files on the command line.

By providing different action templates for each target language, users can provide a different implementation of the action logic for each target.
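To make the idea concrete, here is a minimal Java sketch of what expanding template references inside action text amounts to. It is not StringTemplate and not the PR's code: it assumes actions call templates with ST's `<name()>` syntax, handles only zero-argument calls, and uses a plain `Map` per target in place of a real `.stg` group. The class and template names (`ToyActionExpander`, `self`, `print`) are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for StringTemplate: it understands only
// zero-argument template calls of the form <name()> inside action text.
final class ToyActionExpander {
    private static final Pattern CALL =
        Pattern.compile("<([A-Za-z_][A-Za-z0-9_]*)\\(\\)>");

    // Replaces every <name()> in the action with the body that the
    // per-target template map supplies for that name.
    static String expand(String action, Map<String, String> templates) {
        Matcher m = CALL.matcher(action);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = templates.get(m.group(1));
            if (body == null) {
                throw new IllegalArgumentException("no template named '" + m.group(1) + "'");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(body));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical template bodies standing in for Java.stg and Javascript.stg.
        Map<String, String> javaTemplates = Map.of("self", "this", "print", "System.out.println");
        Map<String, String> jsTemplates = Map.of("self", "this", "print", "console.log");
        String action = "<print()>(<self()>.getText());";
        System.out.println(expand(action, javaTemplates)); // System.out.println(this.getText());
        System.out.println(expand(action, jsTemplates));   // console.log(this.getText());
    }
}
```

Because each target supplies its own group, the action text in the grammar stays identical across targets; only the template file passed to the tool changes.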
Java.stg:

Javascript.stg:

The example below is the motivating example for me: I have a grammar that I'd like to use in both a JVM-based compiler and a VS Code extension.

Example.g4:

Thanks to my employer @opencastsoftware for sponsoring this work!