Consuming white spaces #134

kjbron · 2018-11-22T09:44:22Z

Hi,

I am trying to parse a very simple language with PEGTL. I think I have found the problem, but I don't understand why: whitespace is not ignored. I understand it must be possible to not ignore whitespace, so that indentation-aware languages can also be parsed, but I couldn't find a mechanism to "eat" whitespace by default. Given:

```cpp
#include <tao/pegtl.hpp>

struct kw_enum : tao::pegtl::string<'e', 'n', 'u', 'm'> { };
struct enum_decl : tao::pegtl::seq<kw_enum, tao::pegtl::identifier, tao::pegtl::one<';'>> { };
```

the following cannot be parsed:

```
enum thing;
```

If I add pegtl::space between each token explicitly, then it works. But it would be a major burden to do this throughout the entire grammar.
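To make the failure concrete without depending on the PEGTL headers, here is a small self-contained C++17 sketch. The rule templates `str`, `one`, `space`, `identifier` and `seq` are hypothetical toy stand-ins written in the PEGTL's style, not the library's own types, and `parses` is a hypothetical helper that checks whether a rule consumes the whole input. Because nothing in the grammar from the question matches whitespace, `enum_decl` fails on `enum thing;` but accepts `enumthing;`. Adding a single `space` only helps partly, since (like `pegtl::space`) it matches exactly one whitespace character.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy rules in the style of the PEGTL (NOT the PEGTL API): each rule is a
// type with a static match() that advances `pos` only on success.
template <char... Cs>
struct str {  // literal string, similar in spirit to pegtl::string<Cs...>
  static bool match(std::string_view in, std::size_t& pos) {
    constexpr char lit[] = {Cs...};
    if (in.substr(pos, sizeof...(Cs)) != std::string_view(lit, sizeof...(Cs))) return false;
    pos += sizeof...(Cs);
    return true;
  }
};

template <char C>
struct one {  // a single given character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == C) { ++pos; return true; }
    return false;
  }
};

struct space {  // exactly ONE whitespace character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) { ++pos; return true; }
    return false;
  }
};

struct identifier {  // [A-Za-z_][A-Za-z0-9_]*
  static bool match(std::string_view in, std::size_t& pos) {
    auto first = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; };
    auto other = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };
    if (pos >= in.size() || !first(in[pos])) return false;
    ++pos;
    while (pos < in.size() && other(in[pos])) ++pos;
    return true;
  }
};

template <typename... Rs>
struct seq {  // all sub-rules in order; rewind on failure
  static bool match(std::string_view in, std::size_t& pos) {
    const std::size_t start = pos;
    if ((Rs::match(in, pos) && ...)) return true;
    pos = start;
    return false;
  }
};

template <typename R>
bool parses(std::string_view in) {  // true if R consumes the whole input
  std::size_t pos = 0;
  return R::match(in, pos) && pos == in.size();
}

struct kw_enum : str<'e', 'n', 'u', 'm'> {};
// The grammar from the question: nothing in it matches whitespace.
struct enum_decl : seq<kw_enum, identifier, one<';'>> {};
// A single explicit `space` between the keyword and the name.
struct enum_decl_sp : seq<kw_enum, space, identifier, one<';'>> {};
```

With these definitions, `parses<enum_decl>("enum thing;")` is false while `parses<enum_decl>("enumthing;")` is true, and `enum_decl_sp` accepts `"enum thing;"` but rejects both `"enum  thing;"` and `"enum thing ;"`.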

How can whitespace be ignored/eaten/skipped for the entire grammar without specifying it explicitly? Do I need a new control class?

Answered by ColinH

ColinH · 2018-11-22T11:32:56Z · Maintainer

There is no way to generally ignore whitespace as it is one of the inherent properties of the PEG approach to not have a separate tokenisation pass.

I'm also unsure what exactly eating spaces for the "entire grammar" would actually mean; if you change the control class to skip whitespace after every successful rule match, you would probably allow whitespace in places where you don't really want it.
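The concern can be illustrated with another toy C++17 sketch (hypothetical rules, not the PEGTL API). Here the helper `succeed` stands in for a control class that skips whitespace after every successful rule match. Because `identifier` is itself composed of smaller rules (`identifier_first` and `star<identifier_other>`), the skipping also fires between its characters, so a blank in the middle of a name goes unnoticed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical control hook: skip whitespace after EVERY successful match.
inline bool succeed(std::string_view in, std::size_t& pos) {
  while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
  return true;
}

template <char C>
struct one {  // a single given character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == C) { ++pos; return succeed(in, pos); }
    return false;
  }
};

struct kw_enum {  // the literal "enum", matched as one rule
  static bool match(std::string_view in, std::size_t& pos) {
    if (in.substr(pos, 4) != "enum") return false;
    pos += 4;
    return succeed(in, pos);
  }
};

struct identifier_first {  // [A-Za-z_]
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos >= in.size()) return false;
    const unsigned char c = static_cast<unsigned char>(in[pos]);
    if (!std::isalpha(c) && c != '_') return false;
    ++pos;
    return succeed(in, pos);
  }
};

struct identifier_other {  // [A-Za-z0-9_]
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos >= in.size()) return false;
    const unsigned char c = static_cast<unsigned char>(in[pos]);
    if (!std::isalnum(c) && c != '_') return false;
    ++pos;
    return succeed(in, pos);
  }
};

template <typename... Rs>
struct seq {  // all sub-rules in order; rewind on failure
  static bool match(std::string_view in, std::size_t& pos) {
    const std::size_t start = pos;
    if ((Rs::match(in, pos) && ...)) return succeed(in, pos);
    pos = start;
    return false;
  }
};

template <typename R>
struct star {  // zero or more repetitions
  static bool match(std::string_view in, std::size_t& pos) {
    while (R::match(in, pos)) {}
    return succeed(in, pos);
  }
};

// Built from smaller rules, so the skipping also happens INSIDE it.
struct identifier : seq<identifier_first, star<identifier_other>> {};
struct enum_decl : seq<kw_enum, identifier, one<';'>> {};

template <typename R>
bool parses(std::string_view in) {  // true if R consumes the whole input
  std::size_t pos = 0;
  return R::match(in, pos) && pos == in.size();
}
```

`parses<enum_decl>` now returns true for `"enum thing;"` and `"enum thing ;"`, but also for `"enum th ing;"`, which is exactly the kind of unwanted acceptance described above.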

One could of course use another class template to control after which rules whitespace is to be skipped, but that might not be much easier than to embed the appropriate rules in the grammar.
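A sketch of that alternative, again with hypothetical toy rules rather than the PEGTL's real control interface: every sub-rule is matched through `match_rule`, which consults a trait `skip_ws_after<R>` and skips whitespace only after rules that opt in. Marking only complete tokens (`kw_enum`, `identifier`, `semi`) keeps whitespace out of identifiers, but every token still has to be listed explicitly.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical trait: after which rules is whitespace skipped? Default: none.
template <typename R>
struct skip_ws_after : std::false_type {};

inline void skip_ws(std::string_view in, std::size_t& pos) {
  while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// Every (sub-)rule is matched through here, loosely like a control hook.
template <typename R>
bool match_rule(std::string_view in, std::size_t& pos) {
  if (!R::match(in, pos)) return false;
  if constexpr (skip_ws_after<R>::value) skip_ws(in, pos);
  return true;
}

template <char C>
struct one {  // a single given character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == C) { ++pos; return true; }
    return false;
  }
};

struct kw_enum {  // the literal "enum"
  static bool match(std::string_view in, std::size_t& pos) {
    if (in.substr(pos, 4) != "enum") return false;
    pos += 4;
    return true;
  }
};

struct identifier_first {  // [A-Za-z_]
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos >= in.size()) return false;
    const unsigned char c = static_cast<unsigned char>(in[pos]);
    if (!std::isalpha(c) && c != '_') return false;
    ++pos;
    return true;
  }
};

struct identifier_other {  // [A-Za-z0-9_]
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos >= in.size()) return false;
    const unsigned char c = static_cast<unsigned char>(in[pos]);
    if (!std::isalnum(c) && c != '_') return false;
    ++pos;
    return true;
  }
};

template <typename... Rs>
struct seq {  // all sub-rules in order; rewind on failure
  static bool match(std::string_view in, std::size_t& pos) {
    const std::size_t start = pos;
    if ((match_rule<Rs>(in, pos) && ...)) return true;
    pos = start;
    return false;
  }
};

template <typename R>
struct star {  // zero or more repetitions
  static bool match(std::string_view in, std::size_t& pos) {
    while (match_rule<R>(in, pos)) {}
    return true;
  }
};

struct identifier : seq<identifier_first, star<identifier_other>> {};
struct semi : one<';'> {};
struct enum_decl : seq<kw_enum, identifier, semi> {};

// Opt in: skip whitespace only after complete tokens.
template <> struct skip_ws_after<kw_enum> : std::true_type {};
template <> struct skip_ws_after<identifier> : std::true_type {};
template <> struct skip_ws_after<semi> : std::true_type {};

template <typename R>
bool parses(std::string_view in) {  // true if R consumes the whole input
  std::size_t pos = 0;
  return match_rule<R>(in, pos) && pos == in.size();
}
```

`enum_decl` then accepts `"enum thing;"` and `"enum   thing ;\n"` but rejects `"enum th ing;"`. The list of specialisations is the part that grows with the grammar.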

Let us know if you find a solution that could be of general use; we also haven't really thought much about this question because our usual approach is to embed the whitespace rules throughout the grammar.

For example, look at how the wss and wsp rules are used in the taocpp/config grammar.
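The embedded style can be sketched as follows. The names `wss` and `wsp` are borrowed from the discussion, but their definitions here (zero or more, respectively one or more, whitespace characters) are assumptions for illustration and may not match the taocpp/config rules, which could also cover comments. The remaining rules are hypothetical toy stand-ins so that the C++17 example compiles without the PEGTL.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy stand-ins in the PEGTL style (NOT the PEGTL API).
template <char... Cs>
struct str {  // literal string
  static bool match(std::string_view in, std::size_t& pos) {
    constexpr char lit[] = {Cs...};
    if (in.substr(pos, sizeof...(Cs)) != std::string_view(lit, sizeof...(Cs))) return false;
    pos += sizeof...(Cs);
    return true;
  }
};

template <char C>
struct one {  // a single given character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == C) { ++pos; return true; }
    return false;
  }
};

struct space {  // exactly one whitespace character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) { ++pos; return true; }
    return false;
  }
};

struct identifier {  // [A-Za-z_][A-Za-z0-9_]*
  static bool match(std::string_view in, std::size_t& pos) {
    auto first = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) || c == '_'; };
    auto other = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) || c == '_'; };
    if (pos >= in.size() || !first(in[pos])) return false;
    ++pos;
    while (pos < in.size() && other(in[pos])) ++pos;
    return true;
  }
};

template <typename... Rs>
struct seq {  // all sub-rules in order; rewind on failure
  static bool match(std::string_view in, std::size_t& pos) {
    const std::size_t start = pos;
    if ((Rs::match(in, pos) && ...)) return true;
    pos = start;
    return false;
  }
};

template <typename R>
struct star {  // zero or more repetitions
  static bool match(std::string_view in, std::size_t& pos) {
    while (R::match(in, pos)) {}
    return true;
  }
};

template <typename R>
struct plus : seq<R, star<R>> {};  // one or more repetitions

template <typename R>
bool parses(std::string_view in) {  // true if R consumes the whole input
  std::size_t pos = 0;
  return R::match(in, pos) && pos == in.size();
}

// Assumed meanings of the names mentioned above (hypothetical definitions).
struct wss : star<space> {};  // whitespace, zero or more
struct wsp : plus<space> {};  // whitespace, one or more

struct kw_enum : str<'e', 'n', 'u', 'm'> {};
struct enum_decl : seq<kw_enum, wsp, identifier, wss, one<';'>> {};
struct grammar : seq<wss, enum_decl, wss> {};
```

Requiring `wsp` after the keyword rejects `"enumthing;"`, while `wss` allows optional blanks before the semicolon and around the declaration, so `"  enum\tthing ;\n"` parses and `"enum th ing;"` does not.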


brouhaha Nov 15, 2021

The link seems to be outdated. In case anyone else wants to look at it, I found the files at https://github.com/taocpp/config/tree/main/include/tao/config/internal: several grammar files there, including config_grammar.hpp, use the wss and wsp rules, and those rules are defined in json.hpp.

brouhaha · 2021-11-15T21:44:47Z

pyparsing is also a PEG parser, but it supports "ignore" rules, which can be used for whitespace and comments. I don't know how it prevents the ignore rules from being matched in inappropriate places, such as in the middle of identifiers, keywords, or numeric literals.


ColinH Nov 16, 2021
Maintainer

Looking at pyparsing, their approach doesn't seem to fit well with the PEGTL, since even pyparsing's simple "Hello, World!" example

...will also handle "Hello,World!", "Hello , World !", etc.

And I'm not quite sure how their ignore could be integrated either; they have a parsing engine that includes all the actions, which allows them to be customised together with aspects of the grammar, while the PEGTL takes a more bottom-up, highly modular approach.

It might still be possible to find some way of layering the desired functionality on top of the low-level building blocks of the PEGTL in a way that is compatible with its design philosophy, but it might not be that easy. Any suggestions?
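One possible layering, sketched with hypothetical toy rules in C++17 (not part of the PEGTL, and not a claim about how pyparsing is implemented): a wrapper `tok<R>` skips leading whitespace and then matches `R` as a single unit. A grammar written in terms of `tok<...>` accepts the spacing variants quoted above, while whitespace inside a token, such as `"Hel lo"`, still causes a failure.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy stand-ins in the PEGTL style (NOT the PEGTL API).
template <char C>
struct one {  // a single given character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && in[pos] == C) { ++pos; return true; }
    return false;
  }
};

struct space {  // exactly one whitespace character
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) { ++pos; return true; }
    return false;
  }
};

struct alpha {  // exactly one ASCII letter
  static bool match(std::string_view in, std::size_t& pos) {
    if (pos < in.size() && std::isalpha(static_cast<unsigned char>(in[pos]))) { ++pos; return true; }
    return false;
  }
};

template <typename... Rs>
struct seq {  // all sub-rules in order; rewind on failure
  static bool match(std::string_view in, std::size_t& pos) {
    const std::size_t start = pos;
    if ((Rs::match(in, pos) && ...)) return true;
    pos = start;
    return false;
  }
};

template <typename R>
struct star {  // zero or more repetitions
  static bool match(std::string_view in, std::size_t& pos) {
    while (R::match(in, pos)) {}
    return true;
  }
};

template <typename R>
struct plus : seq<R, star<R>> {};  // one or more repetitions

// Hypothetical layered rule: optional leading whitespace, then R as one unit.
template <typename R>
struct tok : seq<star<space>, R> {};

struct word : plus<alpha> {};
struct hello : seq<tok<word>, tok<one<','>>, tok<word>, tok<one<'!'>>, star<space>> {};

template <typename R>
bool parses(std::string_view in) {  // true if R consumes the whole input
  std::size_t pos = 0;
  return R::match(in, pos) && pos == in.size();
}
```

Whether this is "easier" depends on the grammar: the whitespace handling moves into one wrapper, but deciding which rules count as tokens is still a grammar-level choice.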
