How to use the PEGTL parser with a separate lexer? #235
Replies: 20 comments
-
The short answer is that the PEGTL is not currently in a shape to do this. I also couldn't say how deep the problems run since I haven't tried this myself ... yet. We can support you if you want to give it a try. It might only need small tweaks to the core library to make such an input possible, or it might be too much of a hassle and have to wait for the big input layer rewrite.
-
I think it should be possible. This is the first time I have tried the PEGTL, so maybe I don't understand what it all means, but I did some experiments, and it looks like it just doesn't increment the iterator right now; otherwise it should work:
-
Oops, I missed an "&" after the "auto". Now the last example works. I will try the rest now and report back.
-
@ColinH This is the full example: |
-
This looks reasonably promising, though there will be pitfalls around many corners. The actions are not called because your |
-
Here's some minimal code to get the actions with inputs working, too: https://pastebin.com/axqyPxUS
-
@d-frey We could check whether we want this in contrib or as an example; even though it is only partial, it appears to be useful for some things.
-
I think this could be useful for parsing binary files, like PNG. A funny thing would be to define a token that encapsulates one bit. But a higher level, like the header and chunks of PNG, would probably be more useful. It would make for a pretty nice-looking PNG reader.
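To make the one-bit token idea a bit more concrete, here is a rough sketch of what bit-level access underneath such tokens could look like. It is plain C++ and independent of the PEGTL; `bit_reader` is a hypothetical helper name, and the MSB-first bit order within each byte is an assumption that would have to be checked against the format being parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT part of the PEGTL: reads unsigned bit fields
// from a byte buffer, assuming MSB-first order within each byte.
class bit_reader {
public:
    explicit bit_reader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}

    // Number of bits that have not been consumed yet.
    std::size_t remaining() const { return bytes_.size() * 8 - bit_pos_; }

    // Reads `count` bits (at most 32) into `out`.
    // Returns false and consumes nothing if not enough bits remain.
    bool read(unsigned count, std::uint32_t& out) {
        if (count > 32 || count > remaining()) {
            return false;
        }
        std::uint32_t value = 0;
        for (unsigned i = 0; i < count; ++i) {
            const std::uint8_t byte = bytes_[bit_pos_ / 8];
            const unsigned shift = 7u - static_cast<unsigned>(bit_pos_ % 8);
            value = (value << 1) | ((byte >> shift) & 1u);
            ++bit_pos_;
        }
        out = value;
        return true;
    }

private:
    const std::vector<std::uint8_t>& bytes_;  // caller must keep the buffer alive
    std::size_t bit_pos_ = 0;                 // absolute position in bits
};
```

A one-bit token would then just be `read(1, v)`, and higher-level tokens (a header field, a chunk length) would be wider reads on top of the same cursor.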
-
We already do binary parsing with the normal inputs using the rules for binary values or similar, see |
-
That's nice, but it looks too tightly coupled to bytes, at least for the cbor parser: This concept would be difficult for true bitstream formats like SWF, see page 17 for an example:
-
I have now tested the token_action_input with a more complex grammar, and it works. But there are two problems. When I use
then it works. The other problem is when I try to create a tree and dot file like this:
I get an error about a missing type named |
-
The missing The position is more complicated to fix, that's where "the PEGTL is currently not well prepared for this" comes from. The contents of a position are hard-coded throughout the library although line and column numbers don't make sense when parsing tokens or binary data, at least not in the way they are currently managed. Doing this properly might require a lot of changes to the core library. If you just want to get something running you can keep your position function that returns some dummy values. Not pretty, but you can at least continue until you get to the next stumbling block - or get too annoyed by the position information being wrong.
-
I added the using statement, but I get another compile error now. This is the code: |
-
Adding the |
-
When trying to compile the pastebin code, I get this error message, with g++ 8.3.0 on Linux:
-
Compile command line: |
-
The |
-
Ok, I thought I solved it with the lines 137-140. At least this solved the problem for |
-
While using a plain pointer to a token as an iterator is enough for other parts of the PEGTL, the parse tree does need more information right now. Specifically, it will try to access a member from the iterator ( |
-
Here is a simple example:
https://stackoverflow.com/questions/65057971/how-to-use-the-pegtl-parser-with-a-separate-lexer
If I understand it correctly, I need to implement my own input type. I tried it like this:
https://pastebin.com/8rSkwCk1
But this results in 156 lines of compiler errors. Is there an easy way to use a vector of my own type as an input for the parser, and specifically how would I do this for my example?
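To make clearer what I am after, here is a minimal sketch of the idea in plain C++17, without any PEGTL code: rules are types with a static `match` function, and they run over a vector of tokens from a separate lexer instead of over characters. All names here (`token`, `token_kind`, `token_cursor`, `one`, `seq`, `star`, `parse_sum`) are made up for illustration and are not the PEGTL's API; a real PEGTL input type would need to provide much more than this cursor does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type produced by a separate lexer (not from the PEGTL).
enum class token_kind { number, plus, end };

struct token {
    token_kind kind;
    std::string text;
};

// Minimal cursor over the token vector; a stand-in for an input class.
struct token_cursor {
    explicit token_cursor(const std::vector<token>& t) : tokens(&t), pos(0) {}

    bool empty() const { return pos >= tokens->size(); }
    const token& peek() const { return (*tokens)[pos]; }
    void bump() { ++pos; }  // advance by exactly one token

    const std::vector<token>* tokens;
    std::size_t pos;
};

// Rule: match exactly one token of kind K.
template <token_kind K>
struct one {
    static bool match(token_cursor& in) {
        if (in.empty() || in.peek().kind != K) return false;
        in.bump();
        return true;
    }
};

// Rule: match all sub-rules in order; rewind the cursor on failure.
template <typename... Rules>
struct seq {
    static bool match(token_cursor& in) {
        const std::size_t saved = in.pos;
        if ((Rules::match(in) && ...)) return true;
        in.pos = saved;
        return false;
    }
};

// Rule: match Rule zero or more times (Rule must consume input on success).
template <typename Rule>
struct star {
    static bool match(token_cursor& in) {
        while (Rule::match(in)) {}
        return true;
    }
};

// Example grammar: number ( '+' number )* end
using sum = seq<one<token_kind::number>,
                star<seq<one<token_kind::plus>, one<token_kind::number>>>,
                one<token_kind::end>>;

inline bool parse_sum(const std::vector<token>& tokens) {
    token_cursor in(tokens);
    return sum::match(in);
}
```

With this, `parse_sum` accepts the token sequence for `1 + 2` followed by an `end` token and rejects `1 +` followed by `end`. What I would like is the same thing, but with the PEGTL's rules and actions instead of these hand-written ones.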