Joining Repetition of Rule into One #200
-
Good day,

First of all, thank you for a great library. I'm trying to parse a C++ string split across multiple lines, e.g.:

return "First"
"Second\"with middle quote"
"Third";

Is it possible to parse this to get

Thank you very much,
Replies: 4 comments
-
Multi-part C/C++ string literals like this are relatively easy to parse; there is also an unescape action for C-strings that can be used. What exactly does your input look like? Do you need to parse the strings in arbitrary C++ source?

PS: It appears that you forgot a backslash before one of the quotation marks; it should read
-
Thank you for your quick response. You're right about the missing backslash. The input looks like what I posted, nothing too fancy. If I can get the parser to apply the escape substitution, that would be great; otherwise I can do it afterwards.

Right now I've come up with the grammar below, which parses but gives me multiple lines in the parse tree. I would like to join them and substitute the escaped characters:

struct Semicolon : one<';'>
{};
struct Quote : one<'\"'>
{};
struct EscapedQuote : seq<one<'\\'>, Quote>
{};
struct Return : TAO_PEGTL_KEYWORD( "return" )
{};
struct String : star<not_at<Quote>, sor<EscapedQuote, any>>
{};
struct Strings : if_must<Return, star<not_at<Semicolon>, pad<seq<Quote, String, Quote>, space>>, Semicolon>
{};
-
There are multiple ways to achieve what you are looking for, but I can't decide which one is best for you. The fundamental problem, however, is that by default the PEGTL actions as well as the parse tree nodes only reference portions of the input. If the input contains `foo\"bar`, we cannot reference the string `foo"bar`, because that string does not exist anywhere in the input as a contiguous range. You'll have to assemble it yourself or use a convenience helper.

That said, you mentioned that you are working with a parse tree. So here's a small example I wrote, which also selects only the nodes you are actually interested in and omits the intermediate nodes:

#include <iostream>
#include <string>

#include <tao/pegtl.hpp>
#include <tao/pegtl/contrib/parse_tree.hpp>
#include <tao/pegtl/contrib/parse_tree_to_dot.hpp>

using namespace TAO_PEGTL_NAMESPACE;

namespace example
{
   // the grammar

   // clang-format off
   struct Semicolon : one<';'> {};
   struct Quote : one<'\"'> {};
   struct EscapedQuote : seq<one<'\\'>, Quote> {};
   struct Return : TAO_PEGTL_KEYWORD( "return" ) {};
   struct String : star<not_at<Quote>, sor<EscapedQuote, any> > {};
   struct Strings : if_must<Return, star<not_at<Semicolon>, pad<seq<Quote, String, Quote>, space> >, Semicolon> {};
   // clang-format on

   template< typename Rule >
   using MySelector = tao::pegtl::parse_tree::selector<
      Rule,
      tao::pegtl::parse_tree::store_content::on< any, String >,
      tao::pegtl::parse_tree::remove_content::on< Strings, Semicolon, EscapedQuote, Return > >;

}  // namespace example

int main( int argc, char** argv )
{
   if( argc != 2 ) {
      std::cerr << "Usage: " << argv[ 0 ] << " EXPR\n"
                << "Generate a 'dot' file from expression.\n\n"
                << "Example: " << argv[ 0 ] << " \"(2*a + 3*b) / (4*n)\" | dot -Tpng -o parse_tree.png\n";
      return 1;
   }
   argv_input in( argv, 1 );
   try {
      const auto root = parse_tree::parse< example::Strings, example::MySelector >( in );
      parse_tree::print_dot( std::cout, *root );
      return 0;
   }
   catch( const parse_error& e ) {
      const auto p = e.positions.front();
      std::cerr << e.what() << std::endl
                << in.line_at( p ) << std::endl
                << std::string( p.byte_in_line, ' ' ) << '^' << std::endl;
   }
   catch( const std::exception& e ) {
      std::cerr << e.what() << std::endl;
   }
   return 1;
}

After compilation, you can call it with

my_example 'return "foo" "b\"ar";' | dot -Tsvg -o parse_tree.svg && eog parse_tree.svg

(well, on my Linux machine, anyway). You can then walk the parse tree and collect the node content one by one. It's not efficient, but it should help to get you started.
-
This was the piece I was missing, and it makes perfect sense: since the rules just reference the underlying buffer, we can't remove the "Quote spaces Quote" that connects the two lines. So I'll have to check how to use said convenience helper.

Thank you very much for your support