Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Improve whitespace handling #4668

Open
macsux opened this issue Nov 13, 2024 · 0 comments
Open

Proposal: Improve whitespace handling #4668

macsux opened this issue Nov 13, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@macsux
Copy link
Contributor

macsux commented Nov 13, 2024

What problem are you trying to solve?

Current LST modelling of whitespace/comments is very awkward, difficult to reason about, and requires use of "padding" classes. The problem arises in a number of areas:

  • Since currently LSTs do not have a representation for tokens (ex: language specific keywords, semicolons, commas, brackets, etc), when whitespace need to be captured around those parts of syntax, the tree node has to have "additional" Space element to put it. We currently do this by wrapping the whole tree node into JRightPadded or JLeftPadded element which serves no other purpose other then to model this extra whitespace. It is not actually part of a logical "semantic" tree and is a transient node most do not care about. It's presence is why all LSTs expose it's underlying element for all get/with methods, and only give access to it via a "buddy" padding class.
  • Due to "left-side" whitespace by default approach, certain whitespace / comments end up belonging to nodes they don't logically belong to. Example
var a = 1; // about a
var b = 2;

would result in //about b comment being part of white space belonging to var b = 2;, which is clearly not what it's logically related to.

  • Since it is desirable to make LSTs work in multiple languages, this approach causes issues when other languages make use of tokens not accounted for in original language, often preventing their direct reuse and require them to be wrapped in language specific LST just to be able to deal with extra tokens and the requirement around extra white space modelling.
  • In some cases optional language specific tokens that result in specific keywords and associated whitespace to be emitted. This modeling is "awkward", inconsistent and implicit in rather then explicit in its effects. An example of this is varargs on VariableDeclarations used to model the handling of whitespace around ... token in cases like this void printNumbers(int... numbers). While the type is Space, the rendering of token itself ... is implicit since it is tied if the Space is null. An unsuspecting user may set it to Space.Empty assuming it would not any change in rendering, but implicit logic tied to this argument would cause new token to be rendered, even though it's not explicitly present in LST. This problem often comes up when modelling support for other languages as well where optional tokens and keywords may appear.

Describe the solution you'd like

  • Make Tree have two Space properties: leadingSpace and trailingSpace. This would clean white space modelling that doesn't require "wrapping" nodes of JLeftPadded and JRightPadded, and remove the need for padding class. It would also be possible to correctly associate white space to it's logical node.
  • Provide explicit support for tokens while making them abstract enough to be applied cross language by adding TokenSet tokens to Tree. Each token would have a semantic meaning (aka Kind) rather then explicit character to express it's intended use in LST composition. For example something that is a semicolon in java may be a comma in C#, yet their intended use is the same - separating multiple repeating sibling LST elements in a list type manner. This would facilitate for lossless transfer of token based information without having to capture it in any "wrapping nodes", and "generic" recipes can work with token information relying on its semantic meaning in expression composition rather then language specific semantics.

Another area where this would work well is ability to model optional, language specific tokens without having to "pollute" other language modelling (like java's) with things that don't appear there. Consider example of a method declaration between java and c#:

Java:

public void MyMethod()

C#:

public async void MyMethod()

Here we have async keyword that may appear in c# version. Atm, it requires the whole J.MethodDeclaration to be wrapped in Cs.MethodDeclaration just to allow for this extra keyword capture. However, if Tree contained "grab bag of tokens" such as TokenSet tokens, it can be used to place a Token of type Async (and the related whitespace associated with it).

C# specific LSTs can be augmented with get/with methods to help manipulate this keyword as if it was a property for more user friendly access, yet use the underlying "token grabbag" as the storage mechanism. Ex

public Token? AsyncKeyword => this.Tokens.Find(TokenKind.Async);
public MethodDeclaration WithAsyncKeyword(AsyncKeyword? token) => this.Tokens.Set(token);

(TokenSet.Set method would remove it from the TokenSet if value is null, or add/replace existing value)

@macsux macsux added the enhancement New feature or request label Nov 13, 2024
@macsux macsux changed the title Proposal: Improve whitespace and non-syntax related metadata handling Proposal: Improve whitespace handling Nov 13, 2024
@timtebeek timtebeek moved this to Backlog in OpenRewrite Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Backlog
Development

No branches or pull requests

1 participant