
Commit

Merge pull request #40 from jwtowner/develop
Merge latest from develop branch
jwtowner authored Jul 4, 2024
2 parents 0005f37 + 3c11ef4 commit b6c08a2
Showing 15 changed files with 590 additions and 499 deletions.
31 changes: 24 additions & 7 deletions .clang-tidy
@@ -2,25 +2,42 @@
Checks:
- clang-diagnostic-*
- clang-analyzer-*
- -clang-analyzer-optin.core.EnumCastOutOfRange # interferes with enum bitfield flags
- android-*
- bugprone-*
- cert-*
- -cert-dcl21-cpp
- -cert-dcl21-cpp # this check is deprecated, it is no longer part of the CERT standard
- concurrency-*
- cppcoreguidelines-*
- -cppcoreguidelines-avoid-magic-numbers # revisit after new instruction scheme, maybe only disable for unicode tables
- -cppcoreguidelines-avoid-do-while # if removing do-while does not cause serious performance issues remove this check
- -cppcoreguidelines-avoid-goto # if removing goto does not cause serious performance issues remove this check
- -cppcoreguidelines-pro-bounds-* # requires gsl::at and std::span to suppress, would prefer Standard Library hardening approach
- -cppcoreguidelines-pro-type-union-access # remove after developing new instruction encoding scheme that doesn't use union
- darwin-*
- fuchsia-*
- google-*
- -google-build-using-namespace # would require too many individual using-declarations to satisfy
- -google-readability-braces-around-statements # adversely affects line count
- -google-runtime-int # revisit after new instruction scheme
- hicpp-*
- -hicpp-braces-around-statements
- -hicpp-avoid-goto # if removing goto does not cause serious performance issues remove this check
- -hicpp-braces-around-statements # adversely affects line count
- llvm-namespace-comment
- misc-*
- -misc-include-cleaner
- -misc-include-cleaner # brings in redundant headers that are already included
- modernize-*
- -modernize-use-trailing-return-type
- -modernize-use-constraints # C++20 feature
- -modernize-use-trailing-return-type # stylistic preference, revisit later
- performance-*
- portability-*
- readability-*
- -readability-braces-around-statements
- -readability-identifier-length
- -readability-braces-around-statements # adversely affects line count
- -readability-container-contains # C++20 feature
- -readability-function-cognitive-complexity # grammar::start() and basic_parser::parse() are complex, revisit or suppress only for these functions
- -readability-identifier-length # revisit later
- -readability-magic-numbers # revisit after new instruction scheme, maybe only disable for unicode tables
- -readability-qualified-auto # stylistic preference that unfortunately warns when marking 'auto*' as 'auto* const' or just 'auto const'
WarningsAsErrors: ''
HeaderFileExtensions:
- ''
@@ -277,7 +294,7 @@ CheckOptions:
misc-header-include-cycle.IgnoredFilesList: ''
misc-include-cleaner.DeduplicateFindings: 'true'
misc-include-cleaner.IgnoreHeaders: ''
misc-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic: 'false'
misc-non-private-member-variables-in-classes.IgnoreClassesWithAllMemberVariablesBeingPublic: 'true'
misc-non-private-member-variables-in-classes.IgnorePublicMemberVariables: 'false'
misc-throw-by-value-catch-by-reference.CheckThrowTemporaries: 'true'
misc-throw-by-value-catch-by-reference.WarnOnLargeObjects: 'false'
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,5 +1,5 @@
# Unicode Character Database files
tools/ucd/*.txt
tools/ucd/

# User-specific files
*.suo
8 changes: 5 additions & 3 deletions CHANGELOG.md
@@ -1,29 +1,31 @@
# Changelog

## Release v0.3.0 (Under Development)
## Release v0.3.0 (July 4, 2024)

* Added list repetition operator `e1 >> e2` to the DSL that is shorthand for `e1 > *(e2 > e1)`.
* Added support for parsing characters and character literals where applicable without explicitly needing to wrap them with `chr()` or `_cx`.
* Symbols now respect `caseless` mode, allowing for case-insensitive matching against symbol definitions.
* Allow for use of variables of all types in attribute bindings and removed the `lug::variable` template class that was used previously. Variable state is automatically saved and restored across rule boundaries.
* Allow for capturing text to a `lug::syntax` object or any string-like object that is convertible from `std::string_view`.
* Allow for capturing text to a `lug::syntax` object or any string-like object that is convertible from `std::string_view`, and renamed `syntax::capture` to `syntax::str` in order to match `std::sub_match::str`.
* Added `lug::source_options::interactive` flag that ignores `eoi` tokens for TTY input sources.
* Rewrote the expression function objects/lambdas as expression template classes. Allows for multiple passes over the expression tree as well as top-down and bottom-up traversal, which was needed when implementing attribute state tracking. This will also allow for additional optimizations to be implemented in the future.
* Renamed `syntactic_capture` to `semantic_capture_action` to reflect that it is executed during the semantic action evaluation phase.
* Make all variations of callables that return a non-void value that can be type-erased by `semantic_action` and `semantic_capture_action` push their result onto the attribute result stack.
* Removed `semantic_response` from the public API as it was only used internally inside of the parser.
* Attempting to bind a variable to a nonexistent value from the attribute result stack now throws an `attribute_stack_error`.
* `implicit_space_rule` no longer causes a compiler warning with Clang and now uses RAII to push/pop the thread-local white space rule for grammars.
* Moved `call_depth()`, `prune_depth()` and `escape()` functions into the `lug::environment` class since they are used exclusively during semantic action phase.
* Moved line/column tracking and current match/subject string views to `lug::environment` class, fully removing the environment's dependency on `lug::parser`.
* Turned `lug::parser` into an alias of a new `lug::basic_parser` template class parameterized with an input source strategy. This allows for parsing and capturing of text without making a copy of the input.
* Placed all DSL operator overloads inside of an inline namespace `operators` within `lug::language`. This allows only the operators to be imported into the current scope if desired.
* Enabled `-Wconversion` and `-Wshadow` warnings for Clang and GCC and fixed warnings.
* Full clang-tidy pass on all of the library headers and fixed all warnings.
* Added CMake build support and removed old MSVS solution and vcxproj files.
* Handle situation where compilation with RTTI is disabled.
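The list repetition entry above defines `e1 >> e2` as shorthand for `e1 > *(e2 > e1)`. The sketch below spells out that expansion by hand for a comma-separated list of numbers. It is plain C++17 and does not use the lug API; `match_number`, `match_comma`, `match_list` and `no_match` are hypothetical names chosen for illustration only.

```cpp
#include <cstddef>
#include <string_view>

// Sentinel returned when a sub-expression fails to match (hypothetical name).
inline constexpr std::size_t no_match = std::string_view::npos;

// e1: one or more ASCII digits. Returns the position just past the match.
constexpr std::size_t match_number(std::string_view s, std::size_t pos) noexcept
{
    std::size_t i = pos;
    while (i < s.size() && s[i] >= '0' && s[i] <= '9')
        ++i;
    return (i > pos) ? i : no_match;
}

// e2: a single comma separator.
constexpr std::size_t match_comma(std::string_view s, std::size_t pos) noexcept
{
    return (pos < s.size() && s[pos] == ',') ? pos + 1 : no_match;
}

// e1 > *(e2 > e1): one item, then zero or more (separator, item) pairs.
constexpr std::size_t match_list(std::string_view s, std::size_t pos) noexcept
{
    std::size_t end = match_number(s, pos);
    if (end == no_match)
        return no_match;
    for (;;) {
        std::size_t const sep = match_comma(s, end);
        if (sep == no_match)
            break;
        std::size_t const next = match_number(s, sep);
        if (next == no_match)
            break; // the (e2 > e1) pair failed as a whole, so the separator is not consumed
        end = next;
    }
    return end;
}
```

Because the repeated group is `(e2 > e1)`, a trailing separator with no item after it is left unconsumed, which is the usual PEG behaviour for this expansion.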

## Release v0.2.0 (June 21, 2024)

* Implemented new support for context-sensitive grammars with symbol tables and parsing conditions, based on the PEG extensions described in the paper *"A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages"* by Tetsuro Matsumura and Kimio Kuramitsu (2015).
* Implemented new support for context-sensitive grammars with symbol tables and parsing conditions based on the PEG extensions described in the paper *"A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages"* by Tetsuro Matsumura and Kimio Kuramitsu (2015).
* Added an XML Standard 1.0 matcher sample program demonstrating use of symbol tables.
* Finished the BASIC language interpreter sample program, which is now feature complete, using parsing conditions.
* Updated Unicode support to version 15.1.0 and automated Unicode table generation via Makefile build.
18 changes: 9 additions & 9 deletions Makefile
@@ -2,15 +2,15 @@
# See LICENSE file for copyright and license details

# distribution version
VERSION = 0.3.0-pre
VERSION = 0.3.0

# paths
PREFIX = /usr/local

# toolchain
CXXSTD = -std=c++17
CXXFLAGS = $(CXXSTD) -pedantic -Wall -Wconversion -Wextra -Wextra-semi -Wshadow -Wsign-conversion -Wsuggest-override -Wno-parentheses -Wno-logical-not-parentheses \
-Os -ffunction-sections -fdata-sections -I. $$(if [ "$(CI_BUILD)" = "1" ]; then echo "-Werror"; fi)
-Os -ffunction-sections -fdata-sections -I.
LDFLAGS = $(CXXSTD) -s
CLANGTIDY = clang-tidy

@@ -32,8 +32,8 @@ TOOLS = makeunicode
TOOLS_BIN = $(TOOLS:%=tools/%)
TOOLS_OBJ = $(TOOLS:%=tools/%.o)

# dependencies
DEPS = lug/lug.hpp lug/detail.hpp lug/error.hpp lug/unicode.hpp lug/utf8.hpp
# header dependencies
HEADERS = lug/detail.hpp lug/error.hpp lug/unicode.hpp lug/utf8.hpp lug/lug.hpp

# distribution files
DISTFILES = CHANGELOG.md LICENSE.md README.md CMakeLists.txt Makefile runtests.sh .clang-tidy .editorconfig .gitattributes .gitignore .github/ doc/ lug/ samples/ tests/ tools/
@@ -42,17 +42,17 @@ all: options samples tests

.cpp.o:
@echo CXX $<
@$(CXX) -c $(CXXFLAGS) -o $@ $<
@$(CXX) -c $(CXXFLAGS) $$(if [ "$(CI_BUILD)" = "1" ]; then echo "-Werror"; fi) -o $@ $<

$(SAMPLES_OBJ): $(DEPS)
$(SAMPLES_OBJ): $(HEADERS)

$(SAMPLES_BIN): $(SAMPLES_OBJ)
@echo LD $@
@$(CXX) -o $@ $@.o $(LDFLAGS)

samples: $(SAMPLES_BIN)

$(TESTS_OBJ): $(DEPS)
$(TESTS_OBJ): $(HEADERS)

$(TESTS_BIN): $(TESTS_OBJ)
@echo LD $@
@@ -64,9 +64,9 @@ check: tests
@sh runtests.sh "tests" $(TESTS_BIN)

lint:
@$(CLANGTIDY) --quiet $(CXXFLAGS:%=--extra-arg=%) lug/detail.hpp
@$(CLANGTIDY) --quiet $(CXXFLAGS:%=--extra-arg=%) $(HEADERS)

$(TOOLS_OBJ): $(DEPS)
$(TOOLS_OBJ): $(HEADERS)

$(TOOLS_BIN): $(TOOLS_OBJ)
@echo LD $@
9 changes: 4 additions & 5 deletions README.md
@@ -8,15 +8,14 @@ A C++ embedded domain specific language for expressing parsers as extended [pars

Features
---
- Natural syntax resembling external parser generator languages.
- Natural syntax resembling external parser generator languages, with support for attributes and semantic actions.
- Ability to handle context-sensitive grammars with symbol tables, conditions and syntactic predicates.
- Generated parsers are compiled to special-purpose bytecode and executed in a virtual parsing machine.
- Clear separation of syntactic and lexical rules, with the ability to customize implicit whitespace skipping.
- Support for direct and indirect left recursion, with precedence levels to disambiguate subexpressions with mixed left/right recursion.
- Extended PEG syntax to include attribute grammars and semantic actions.
- Ability to handle context-sensitive grammars with symbol tables, conditions, and syntactic predicates.
- Full support for UTF-8 text parsing, including Level 1 and partial Level 2 compliance with the UTS #18 Unicode Regular Expressions technical standard.
- Automatic tracking of line and column numbers, with customizable tab width and alignment.
- Header-only library utilizing C++17 language and library features.
- Header-only library utilizing C++17 language and library features. Forward compatible with C++20 and C++23.
- Relatively small with the goal of keeping total line count across all header files under 5000 lines of terse code.

It is based on research introduced in the following papers:
@@ -69,7 +68,7 @@ Syntax Reference
| One-or-More | `+e` | Repetition matching of expression *e* one or more times. |
| Optional | `~e` | Matches expression *e* zero or one times. |
| Positive Lookahead | `&e` | Matches without consuming input if expression *e* succeeds to match the input. |
| Negative Lookahead | `~e` | Matches without consuming input if expression *e* fails to match the input. |
| Negative Lookahead | `!e` | Matches without consuming input if expression *e* fails to match the input. |
| Cut Before | `--e` | Issues a cut instruction before the expression *e*. |
| Cut After | `e--` | Issues a cut instruction after the expression *e*. |
| Action Scheduling | `e < a` | Schedules a semantic action *a* to be evaluated if expression *e* successfully matches the input. |
37 changes: 17 additions & 20 deletions lug/detail.hpp
@@ -87,43 +87,43 @@ inline namespace bitfield_ops {
template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator~(T x) noexcept
{
return static_cast<T>(~static_cast<std::underlying_type_t<T>>(x)); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return static_cast<T>(~static_cast<std::underlying_type_t<T>>(x));
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator&(T x, T y) noexcept
{
return static_cast<T>(static_cast<std::underlying_type_t<T>>(x) & static_cast<std::underlying_type_t<T>>(y)); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return static_cast<T>(static_cast<std::underlying_type_t<T>>(x) & static_cast<std::underlying_type_t<T>>(y));
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator|(T x, T y) noexcept
{
return static_cast<T>(static_cast<std::underlying_type_t<T>>(x) | static_cast<std::underlying_type_t<T>>(y)); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return static_cast<T>(static_cast<std::underlying_type_t<T>>(x) | static_cast<std::underlying_type_t<T>>(y));
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator^(T x, T y) noexcept
{
return static_cast<T>(static_cast<std::underlying_type_t<T>>(x) ^ static_cast<std::underlying_type_t<T>>(y)); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return static_cast<T>(static_cast<std::underlying_type_t<T>>(x) ^ static_cast<std::underlying_type_t<T>>(y));
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
constexpr T& operator&=(T& x, T y) noexcept
{
return (x = x & y); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return (x = x & y);
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
constexpr T& operator|=(T& x, T y) noexcept
{
return (x = x | y); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return (x = x | y);
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
constexpr T& operator^=(T& x, T y) noexcept
{
return (x = x ^ y); // NOLINT(clang-analyzer-optin.core.EnumCastOutOfRange)
return (x = x ^ y);
}

} // namespace bitfield_ops
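The operators in this hunk are enabled through `std::void_t<decltype(T::is_bitfield_enum)>`, so they only take part in overload resolution for enumerations that declare an `is_bitfield_enum` member. Below is a minimal, self-contained sketch of that opt-in pattern. The `parse_flags` enumeration, its values, and the `has_flag` helper are hypothetical and are not taken from the library; the extra cast to the underlying type in `operator~` is a choice made for this sketch, not something the diff shows.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical flag type; the is_bitfield_enum enumerator is the opt-in marker.
enum class parse_flags : std::uint8_t
{
    none = 0,
    caseless = 1,
    lexeme = 2,
    is_bitfield_enum = 0x80
};

// Each operator is SFINAE-constrained on the marker, mirroring the hunk above.
template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator~(T x) noexcept
{
    using U = std::underlying_type_t<T>;
    // Narrow back to U first so the promoted int result stays in range.
    return static_cast<T>(static_cast<U>(~static_cast<U>(x)));
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator&(T x, T y) noexcept
{
    using U = std::underlying_type_t<T>;
    return static_cast<T>(static_cast<U>(x) & static_cast<U>(y));
}

template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr T operator|(T x, T y) noexcept
{
    using U = std::underlying_type_t<T>;
    return static_cast<T>(static_cast<U>(x) | static_cast<U>(y));
}

// Hypothetical helper: true when every bit of `flag` is set in `value`.
template <class T, class = std::void_t<decltype(T::is_bitfield_enum)>>
[[nodiscard]] constexpr bool has_flag(T value, T flag) noexcept
{
    return (value & flag) == flag;
}

// An enum class without the marker gets none of these operators, so an
// expression such as `color::red | color::green` would fail to compile.
enum class color { red, green };
```

Combining and clearing flags then reads like ordinary integer bit manipulation, for example `(parse_flags::caseless | parse_flags::lexeme) & ~parse_flags::caseless`.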
@@ -161,9 +161,6 @@ using enable_if_char_input_iterator_t = std::enable_if_t<
template <class It, class T = void>
using enable_if_char_contiguous_iterator_t = std::enable_if_t<is_char_contiguous_iterator_v<It>, T>;

template <class... Args>
constexpr void ignore(Args&&...) noexcept {} // NOLINT(cppcoreguidelines-missing-std-forward,hicpp-named-parameter,readability-named-parameter)

struct identity
{
template <class T>
@@ -224,42 +221,42 @@ template <class MemberPtrType, MemberPtrType MemberPtr, class ObjectIterator>
template <class T>
class dynamic_cast_if_base_of
{
std::remove_reference_t<T>& value_; // NOLINT(cppcoreguidelines-avoid-const-or-ref-data-members)
std::reference_wrapper<std::remove_reference_t<T>> value_;

public:
constexpr explicit dynamic_cast_if_base_of(std::remove_reference_t<T>& x) noexcept : value_{x} {}

template <class U, class = std::enable_if_t<std::is_base_of_v<std::decay_t<T>, std::decay_t<U>>>>
[[nodiscard]] constexpr operator U&() const // NOLINT(hicpp-explicit-conversions)
[[nodiscard]] constexpr operator U&() const noexcept(std::is_same_v<std::decay_t<T>, std::decay_t<U>>) // NOLINT(google-explicit-constructor,hicpp-explicit-conversions)
{
#ifndef LUG_NO_RTTI
if constexpr (std::is_same_v<std::decay_t<T>, std::decay_t<U>>)
#endif // LUG_NO_RTTI
return static_cast<std::remove_reference_t<U>&>(value_);
return static_cast<std::remove_reference_t<U>&>(value_.get());
#ifndef LUG_NO_RTTI
else
return dynamic_cast<std::remove_reference_t<U>&>(value_);
return dynamic_cast<std::remove_reference_t<U>&>(value_.get());
#endif // LUG_NO_RTTI
}
};

template <class Error>
class reentrancy_sentinel
{
bool& value; // NOLINT(cppcoreguidelines-avoid-const-or-ref-data-members)
std::reference_wrapper<bool> value_;

public:
constexpr explicit reentrancy_sentinel(bool& x)
: value{x}
: value_{x}
{
if (value)
if (value_.get())
throw Error();
value = true;
value_.get() = true;
}

~reentrancy_sentinel()
{
value = false;
value_.get() = false;
}

reentrancy_sentinel(reentrancy_sentinel const&) = delete;
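Both classes in this hunk replace a raw reference member with `std::reference_wrapper`, which appears to be what allows the `cppcoreguidelines-avoid-const-or-ref-data-members` suppressions to be dropped. The sketch below shows the same RAII guard shape in isolation. `reentry_error`, `reentrancy_guard` and the helper functions are hypothetical stand-ins written for this example, not the library's own types.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical error type thrown when the guarded region is entered twice.
struct reentry_error : std::runtime_error
{
    reentry_error() : std::runtime_error{"reentrant call"} {}
};

// RAII guard: sets the flag on entry, clears it on exit, throws if already set.
class reentrancy_guard
{
    std::reference_wrapper<bool> busy_; // rebindable wrapper instead of a bool& member

public:
    explicit reentrancy_guard(bool& busy)
        : busy_{busy}
    {
        if (busy_.get())
            throw reentry_error{};
        busy_.get() = true;
    }

    ~reentrancy_guard() { busy_.get() = false; }

    reentrancy_guard(reentrancy_guard const&) = delete;
    reentrancy_guard& operator=(reentrancy_guard const&) = delete;
};

// A nested guard on the same flag throws, and the outer guard still holds the flag.
inline bool nested_entry_throws()
{
    bool busy = false;
    reentrancy_guard const outer{busy};
    try {
        reentrancy_guard const inner{busy};
    } catch (reentry_error const&) {
        return busy;
    }
    return false;
}

// The flag is set inside the guarded scope and cleared once the guard is destroyed.
inline bool flag_released_after_scope()
{
    bool busy = false;
    {
        reentrancy_guard const guard{busy};
        if (!busy)
            return false;
    }
    return !busy;
}

// Usage check exercising both paths.
inline void check_guard()
{
    assert(nested_entry_throws());
    assert(flag_released_after_scope());
}
```

When the inner constructor throws, the inner object was never fully constructed, so its destructor does not run and the flag remains owned by the outer guard.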
@@ -341,7 +338,7 @@ inline std::size_t push_back_unique(Sequence& s, T&& x)
template <class Sequence>
[[nodiscard]] inline auto pop_back(Sequence& s) -> typename Sequence::value_type
{
typename Sequence::value_type result{std::move(s.back())};
typename Sequence::value_type result{std::move(s.back())}; // NOLINT(misc-const-correctness)
s.pop_back();
return result;
}
2 changes: 1 addition & 1 deletion lug/error.hpp
@@ -18,7 +18,7 @@ class reenterant_read_error : public lug_error { public: reenterant_read_error()
class parse_context_error : public lug_error { public: parse_context_error() : lug_error{"operation valid only inside calling context of parser::parse" } {} };
class accept_context_error : public lug_error{ public: accept_context_error() : lug_error{"operation valid only inside calling context of parser::accept"} {} };
class attribute_stack_error : public lug_error{ public: attribute_stack_error() : lug_error{"incompatible or invalid stack frame"} {} };
class bad_string_expression : public lug_error { public: bad_string_expression(const std::string& s = "invalid string or bracket expression") : lug_error{s} {} };
class bad_string_expression : public lug_error { public: explicit bad_string_expression(std::string const& s = "invalid string or bracket expression") : lug_error{s} {} };
class bad_character_class : public bad_string_expression { public: bad_character_class() : bad_string_expression{"invalid character class"} {} };
class bad_character_range : public bad_string_expression { public: bad_character_range() : bad_string_expression{"character range is reversed"} {} };
class bad_grammar : public lug_error { public: bad_grammar() : lug_error{"invalid or empty grammar"} {} };