Dynamic prepared queries #172

Open
alexpovel opened this issue Nov 10, 2024 · 0 comments

Currently, for compatibility with clap, each language provides static "prepared queries". For example, the definition for Python is:

/// Prepared tree-sitter queries for Python.
#[derive(Debug, Clone, Copy, ValueEnum)]
pub enum PreparedQuery {
    /// Comments.
    Comments,
    /// Strings (raw, byte, f-strings; interpolation not included).
    Strings,
    /// Module names in imports (incl. periods; excl. `import`/`from`/`as`/`*`).
    Imports,
    /// Docstrings (not including multi-line strings).
    DocStrings,
    /// Function names, at the definition site.
    FunctionNames,
    /// Function calls.
    FunctionCalls,
    /// Class definitions (in their entirety).
    Class,
    /// Function definitions (*all* `def` block in their entirety).
    Def,
    /// Async function definitions (*all* `async def` block in their entirety).
    AsyncDef,
    /// Function definitions inside `class` bodies.
    Methods,
    /// Function definitions decorated as `classmethod` (excl. the decorator).
    ClassMethods,
    /// Function definitions decorated as `staticmethod` (excl. the decorator).
    StaticMethods,
    /// `with` blocks (in their entirety).
    With,
    /// `try` blocks (in their entirety).
    Try,
    /// `lambda` statements (in their entirety).
    Lambda,
    /// Global, i.e. module-level variables.
    Globals,
    /// Identifiers for variables (left-hand side of assignments).
    VariableIdentifiers,
    /// Types in type hints.
    Types,
    /// Identifiers (variable names, ...).
    Identifiers,
}

Notice that the enum is a unit-only enum: its variants carry no associated data (none of them are tuple or struct variants). Each variant is later mapped to a tree-sitter query like:

impl PreparedQuery {
    #[allow(clippy::too_many_lines)]
    const fn as_str(self) -> &'static str {
        match self {
            Self::Comments => "(comment) @comment",

Notice how the result is actually a &'static str. It'd be super useful to have this be more dynamic. For example, a definition more like (abbreviated for the example):

#[derive(Debug, ValueEnum)]
enum PreparedQuery {
    Strings,
    Class(Option<String>),
}

which means:

  • we can query for a string in Python, such as "hello world": a string has no concept of "namedness", so it remains a unit variant

  • Python classes however do have a name:

    class TheName:
        ...

    The Option<String> now says:

    • if it's None, query for all classes, of any name
    • if it's Some(name_pattern), query only classes whose name matches the pattern

    The concept of "can be named" extends to functions, modules etc., while things like "comments" remain unnamed.

    Note: some things could carry multiple names. E.g., an assignment like x = 3 could be an enum variant of roughly Assignment(Option<String>, Option<String>), to say "the left side of the equals sign has to match .0, the right side .1". None would again mean "matches anything". This would be a nice-to-have.

    Note: the Option<String> could also be just String, with a default value of .*, aka "matches anything" regex pattern. I use this style here:

    srgn/src/main.rs

    Line 1063 in 635839b

    default_value = GLOBAL_SCOPE,

    aka the CLI argument is a String with a default_value, instead of an Option<String> with more logic attached to it. The former style is simpler and works.
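
A minimal sketch of the multi-name idea from the note above, under stated assumptions: `assignment_query` and `escape` are hypothetical helpers (not existing srgn code), and the `left`/`right` field names are assumed to match the `assignment` node of tree-sitter-python. A `#match?` predicate is emitted only for a side that carries a pattern, so `None` keeps meaning "matches anything":

```rust
/// Hypothetical helper (not part of srgn): escapes a regex so it can be
/// embedded in a double-quoted tree-sitter query string.
fn escape(pattern: &str) -> String {
    pattern.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Hypothetical helper: builds a query for Python assignments, optionally
/// constraining the left-hand and/or right-hand side by a regex.
/// Assumption: the grammar exposes `left` and `right` fields on `assignment`.
fn assignment_query(lhs: Option<&str>, rhs: Option<&str>) -> String {
    let mut query =
        String::from("((assignment left: (_) @lhs right: (_) @rhs) @assignment");

    // Add one predicate per side that was given a pattern; skip `None` sides.
    for (capture, pattern) in [("lhs", lhs), ("rhs", rhs)] {
        if let Some(pattern) = pattern {
            query.push_str(&format!(" (#match? @{capture} \"{}\")", escape(pattern)));
        }
    }

    query.push(')');
    query
}
```

With the `String`-plus-default style instead, both parameters would be plain `&str` defaulting to `.*`, and both predicates would always be emitted; that is simpler, at the cost of running a regex that cannot fail to match.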

So when we extract a tree-sitter query later on, it would look more like:

impl PreparedQuery {
    fn as_str(&self) -> String {
        match self {
            Self::Strings => "(string_content) @string".into(),
            Self::Class(None) => "(class_definition) @class".into(), // ANY class
            // Only classes whose `name` matches the `pattern`. Needs `format!`
            // to interpolate; a `"` inside `pattern` would have to be escaped.
            Self::Class(Some(pattern)) => format!(
                r#"(class_definition name: (identifier) @x (#match? @x "{pattern}"))"#
            ),
        }
    }
}

which would open a whole new level of usage. Ideally, this would be a drop-in replacement for

srgn/src/main.rs

Line 1505 in 635839b

python: Vec<python::PreparedQuery>,

which would continue to "just work", just with added benefits. The CLI would then look like:

$ srgn --python strings                     # find all strings in Python source code
$ srgn --python class                       # find all `class`es, anywhere
$ srgn --python class 'Test.+'              # find all `class`es whose name matches this regex
$ srgn --python class -- 'hello .+'         # find the regex 'hello .+' in _any_ class; `--` disambiguates positional arg
$ srgn --python class 'Meta.+' -- 'bye .+'  # find the regex 'bye .+' _only_ inside of classes matching the regex

This seems pretty dynamic, so I'm not sure it can work. It mainly hinges on clap-rs/clap#2621.
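
To make the intended grammar of the five invocations above concrete, here is a hand-rolled sketch that uses only the standard library. It is not how clap would do it, and `Parsed` and `parse` are hypothetical names; the point is only to show how `--` separates the optional name pattern from the positional scope regex:

```rust
#[derive(Debug, PartialEq)]
enum PreparedQuery {
    Strings,
    Class(Option<String>),
}

/// Hypothetical result type: the language query plus the positional regex.
#[derive(Debug, PartialEq)]
struct Parsed {
    query: PreparedQuery,
    scope: Option<String>,
}

/// Hypothetical parser for the arguments following `srgn`.
fn parse(args: &[&str]) -> Result<Parsed, String> {
    // Everything after the first `--` is positional.
    let (before, after) = match args.iter().position(|arg| *arg == "--") {
        Some(i) => (&args[..i], &args[i + 1..]),
        None => (args, &args[args.len()..]),
    };

    let scope = match after {
        [] => None,
        [regex] => Some(regex.to_string()),
        _ => return Err("expected at most one positional regex".to_string()),
    };

    // A value directly after `class` is read as the name pattern.
    let query = match before {
        ["--python", "strings"] => PreparedQuery::Strings,
        ["--python", "class"] => PreparedQuery::Class(None),
        ["--python", "class", name] => PreparedQuery::Class(Some(name.to_string())),
        _ => return Err(format!("unsupported arguments: {before:?}")),
    };

    Ok(Parsed { query, scope })
}
```

Under this reading, `srgn --python class 'Test.+'` always treats `'Test.+'` as the class name pattern, which is why the `--` is needed whenever only the positional regex is meant.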

Workarounds

All queries as individual flags

Example usage:

$ srgn --python-class
$ srgn --python-class -- bla
$ srgn --python-class 'Test.+' -- bla

with the same logic as above. In source code, it would be something like:

    #[derive(Parser, Debug, Clone)]
    #[group(required = false, multiple = false)]
    struct PythonScope {
        /// A Python class.
        #[arg(long, env, verbatim_doc_comment, default_missing_value = "", num_args = 0..=1)]
        python_class: Option<python::Class>,
        // ... one field per remaining `python_<whatever>` option
    }

with a custom impl FromStr for Class. A bit of a lackluster solution:

  • lots of boilerplate
  • manual mapping of the different python_<whatever> options
  • can no longer do pipelining: for srgn --python-class 'Test.+' --python-string, the current logic of srgn is to look for strings only inside the bodies of classes (whose name matches 'Test.+'). With a struct field like python_class, I don't think we'll be able to access the order of arguments; we only learn whether each one is present.
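
For reference, the custom `impl FromStr for Class` mentioned above could look roughly like this. It is a sketch using only the standard library; the `name_pattern` field is a hypothetical name, and the empty-string convention is an assumption derived from `default_missing_value = ""` in the snippet above:

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// Hypothetical stand-in for `python::Class`.
#[derive(Debug, Clone, PartialEq)]
struct Class {
    /// `None` means "any class"; `Some` holds the regex for the class name.
    name_pattern: Option<String>,
}

impl FromStr for Class {
    // Parsing cannot fail in this sketch; regex validation could be added here.
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: a bare `--python-class` arrives as the empty string.
        let name_pattern = if s.is_empty() {
            None
        } else {
            Some(s.to_owned())
        };
        Ok(Self { name_pattern })
    }
}
```

One such type and impl would be needed per nameable query, which is where the boilerplate listed above comes from.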