You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Current LST modelling of whitespace/comments is very awkward, difficult to reason about, and requires use of "padding" classes. The problem arises in a number of areas:
Since currently LSTs do not have a representation for tokens (ex: language specific keywords, semicolons, commas, brackets, etc), when whitespace need to be captured around those parts of syntax, the tree node has to have "additional" Space element to put it. We currently do this by wrapping the whole tree node into JRightPadded or JLeftPadded element which serves no other purpose other then to model this extra whitespace. It is not actually part of a logical "semantic" tree and is a transient node most do not care about. It's presence is why all LSTs expose it's underlying element for all get/with methods, and only give access to it via a "buddy" padding class.
Due to "left-side" whitespace by default approach, certain whitespace / comments end up belonging to nodes they don't logically belong to. Example
vara = 1; // about avarb = 2;
would result in //about b comment being part of white space belonging to var b = 2;, which is clearly not what it's logically related to.
Since it is desirable to make LSTs work in multiple languages, this approach causes issues when other languages make use of tokens not accounted for in original language, often preventing their direct reuse and require them to be wrapped in language specific LST just to be able to deal with extra tokens and the requirement around extra white space modelling.
In some cases optional language specific tokens that result in specific keywords and associated whitespace to be emitted. This modeling is "awkward", inconsistent and implicit in rather then explicit in its effects. An example of this is varargs on VariableDeclarations used to model the handling of whitespace around ... token in cases like this void printNumbers(int... numbers). While the type is Space, the rendering of token itself ... is implicit since it is tied if the Space is null. An unsuspecting user may set it to Space.Empty assuming it would not any change in rendering, but implicit logic tied to this argument would cause new token to be rendered, even though it's not explicitly present in LST. This problem often comes up when modelling support for other languages as well where optional tokens and keywords may appear.
Describe the solution you'd like
Make Tree have two Space properties: leadingSpace and trailingSpace. This would clean white space modelling that doesn't require "wrapping" nodes of JLeftPadded and JRightPadded, and remove the need for padding class. It would also be possible to correctly associate white space to it's logical node.
Provide explicit support for tokens while making them abstract enough to be applied cross language by adding TokenSet tokens to Tree. Each token would have a semantic meaning (aka Kind) rather then explicit character to express it's intended use in LST composition. For example something that is a semicolon in java may be a comma in C#, yet their intended use is the same - separating multiple repeating sibling LST elements in a list type manner. This would facilitate for lossless transfer of token based information without having to capture it in any "wrapping nodes", and "generic" recipes can work with token information relying on its semantic meaning in expression composition rather then language specific semantics.
Another area where this would work well is ability to model optional, language specific tokens without having to "pollute" other language modelling (like java's) with things that don't appear there. Consider example of a method declaration between java and c#:
Java:
publicvoidMyMethod()
C#:
publicasyncvoidMyMethod()
Here we have async keyword that may appear in c# version. Atm, it requires the whole J.MethodDeclaration to be wrapped in Cs.MethodDeclaration just to allow for this extra keyword capture. However, if Tree contained "grab bag of tokens" such as TokenSet tokens, it can be used to place a Token of type Async (and the related whitespace associated with it).
C# specific LSTs can be augmented with get/with methods to help manipulate this keyword as if it was a property for more user friendly access, yet use the underlying "token grabbag" as the storage mechanism. Ex
What problem are you trying to solve?
Current LST modelling of whitespace/comments is very awkward, difficult to reason about, and requires use of "padding" classes. The problem arises in a number of areas:
Space
element to put it. We currently do this by wrapping the whole tree node intoJRightPadded
orJLeftPadded
element which serves no other purpose other then to model this extra whitespace. It is not actually part of a logical "semantic" tree and is a transient node most do not care about. It's presence is why all LSTs expose it's underlying element for allget/with
methods, and only give access to it via a "buddy" padding class.would result in
//about b
comment being part of white space belonging tovar b = 2;
, which is clearly not what it's logically related to.varargs
onVariableDeclarations
used to model the handling of whitespace around...
token in cases like thisvoid printNumbers(int... numbers)
. While the type isSpace
, the rendering of token itself...
is implicit since it is tied if theSpace
is null. An unsuspecting user may set it toSpace.Empty
assuming it would not any change in rendering, but implicit logic tied to this argument would cause new token to be rendered, even though it's not explicitly present in LST. This problem often comes up when modelling support for other languages as well where optional tokens and keywords may appear.Describe the solution you'd like
Tree
have twoSpace
properties:leadingSpace
andtrailingSpace
. This would clean white space modelling that doesn't require "wrapping" nodes ofJLeftPadded
andJRightPadded
, and remove the need for padding class. It would also be possible to correctly associate white space to it's logical node.TokenSet tokens
toTree
. Each token would have a semantic meaning (akaKind
) rather then explicit character to express it's intended use in LST composition. For example something that is a semicolon in java may be a comma in C#, yet their intended use is the same - separating multiple repeating sibling LST elements in a list type manner. This would facilitate for lossless transfer of token based information without having to capture it in any "wrapping nodes", and "generic" recipes can work with token information relying on its semantic meaning in expression composition rather then language specific semantics.Another area where this would work well is ability to model optional, language specific tokens without having to "pollute" other language modelling (like java's) with things that don't appear there. Consider example of a method declaration between java and c#:
Java:
C#:
Here we have
async
keyword thatmay
appear in c# version. Atm, it requires the wholeJ.MethodDeclaration
to be wrapped inCs.MethodDeclaration
just to allow for this extra keyword capture. However, ifTree
contained "grab bag of tokens" such asTokenSet tokens
, it can be used to place a Token of typeAsync
(and the related whitespace associated with it).C# specific LSTs can be augmented with
get/with
methods to help manipulate this keyword as if it was a property for more user friendly access, yet use the underlying "token grabbag" as the storage mechanism. Ex(
TokenSet.Set
method would remove it from theTokenSet
if value is null, or add/replace existing value)The text was updated successfully, but these errors were encountered: