Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process Hangs Due to Recursive Code Transformations in Rule Graphs #709

Open
sasidv opened this issue Nov 6, 2024 · 14 comments
Open

Process Hangs Due to Recursive Code Transformations in Rule Graphs #709

sasidv opened this issue Nov 6, 2024 · 14 comments

Comments

@sasidv
Copy link

sasidv commented Nov 6, 2024

Thank you again for the great work. I have a question to ensure I fully understand the expectations of the rule graphs. What I'm trying to do is the following change:

before code =

from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()

after code =

from tensorflow.keras import layers
sp = layers.builder.config("config3", "1").config("config1", "5").config("config2", "5").getOrCreate()

Objective:

The goal is to add .config("config3", "1") to the config chain only if the import statement from tensorflow.keras import layers exists in the file. The change should not occur if the import is different, for examplefrom pytorch.ll import layers. I created a rule graph with two nodes:

  1. The first node checks if the import exists in the file.
  2. If the import is found, the second rule is executed to add .config("config3", "1") to the chain.

Question 1: Infinite Transformation Loop When Adding .config("config3", "1")

Observation:
The process hangs when I add .config("config3", "1") because it matches the search query again. If I change the replace parameter in the extend_method_chain rule to .config("1"), the code changes successfully.


Question 2: Transformation Applied Regardless of Import Presence

Observation:
The second rule, extend_method_chain, is executed and .config("config3", "1") is added to the method chain even when from tensorflow.keras import layers is not present in the file. What I want is the change to happen only if the import exists. Additionally, I also tried to change the scope of the rule to the File, expecting to fix this error, and got below error
pyo3_runtime.PanicException: Could not create scope query for "File".


Blow is the code that hangs

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges

def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""

    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""(
  (import_from_statement
    module_name: (dotted_name
      (identifier) @module_name_part
      (identifier) @submodule_name_part)
    name: (dotted_name
      (identifier) @imported_name)
  (#eq? @module_name_part "tensorflow")
  (#eq? @submodule_name_part "keras")
  (#eq? @imported_name "layers"))
)""",  # cs indicates we are using concrete syntax
        is_seed_rule=True
    )

    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""(
  (call
    function: (attribute
      object: (attribute
        object: (identifier) @obj
        attribute: (identifier) @builder_method
      )
      attribute: (identifier) @config_method
    )
    arguments: (argument_list
      (string) @config_arg1
      (string) @config_arg2
    )
  )
  (#eq? @obj "layers")
  (#eq? @builder_method "builder")
  (#eq? @config_method "config")
)""",  # cs indicates we are using concrete syntax
        replace_node="config_method",
        replace="config(\"config3\", \"1\").@config_method",
        is_seed_rule=True
    )

    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")

    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge], )
    )

    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)


if __name__ == "__main__":
    main()

Do you have any suggestions or workarounds to fix both errors? Or am I doing something unexpected?

@danieltrt
Copy link
Collaborator

danieltrt commented Nov 6, 2024

Alright! My first suggestion is to avoid using tree-sitter queries and instead use "concrete syntax," unless you truly need the full expressiveness of tree-sitter queries (such as for alternations or similar complex patterns).
Concrete syntax is straightforward: you write the exact code pattern you want to match and leave placeholders for parts that can vary.

For example:

:[x].foo(:[args+]) would match var1.foo(1,2,3,4).

:[var] matches one node, :[var+] would match multiple.

Now, regarding the infinite loop issue: Piranha operates by applying transformations iteratively until it reaches a fixpoint, meaning it will continue transforming code until there are no further changes. If the code still matches after a rewrite, the same rule will trigger again. This approach is effective for feature-flash cleanup, but not ideal for migrations. To prevent repeated matches, we use what we call filters. Take a look at your code with these changes:

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter

def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""

    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )

    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=True,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])} # here is the filter that says "only apply this rule once per assignment"
    )

    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")

    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge], )
    )

    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)


if __name__ == "__main__":
    main()

@sasidv
Copy link
Author

sasidv commented Nov 6, 2024

Thank you for the suggestions! I'll try using concrete syntaxes for now, but I might need to explore Tree-sitter queries in the future. The looping issue has been resolved with the Filter , thanks, but when I remove the import tensorflow.keras import layers from the code above, the changes are still applied, which shouldn't happen. Do you have an idea?

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter

def main():
    code = """sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""

    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )

    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=True,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])} # here is the filter that says "only apply this rule once per assignment"
    )

    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")

    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge], )
    )

    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)


if __name__ == "__main__":
    main()

Expected output

sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()

Actual output

sp = layers.builder.config("config1", "5").config("config3", "1").config("config2", "5").getOrCreate()

@danieltrt
Copy link
Collaborator

The second rule needs to be declared as seed = False

@sasidv
Copy link
Author

sasidv commented Nov 6, 2024

Thank you so much for your help in figuring this out. I tried setting is_seed_rule=False, but when I do that, the rule doesn't execute at all. However, when I set is_seed_rule=True, only the extend_method_chain rule is executed and the change is made. The issue, though, is that when I set it to True, the change is applied regardless of whether from tensorflow.keras import layers is present or not. Do you have any idea?

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter

def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""

    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )

    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=False,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])} # here is the filter that says "only apply this rule once per assignment"
    )

    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")

    # Create Piranha arguments
    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge], )
    )

    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)


if __name__ == "__main__":
    main()

@danieltrt
Copy link
Collaborator

danieltrt commented Nov 6, 2024

I suspect the reason is the scope. Rules in PolyglotPiranha trigger other rules within scopes. In your code, the rule find_from_tensorflow_keras_import_layers is set to trigger extend_method_chain within the Parent scope (i.e., parent in the AST).

PolyglotPiranha can support multiple scopes (they are user defined for each language). For example, for java, you can find scopes here

Unfortunately, we don't have scopes defined for Python (we would need to add it to the cleanup_rules folder). However, there are two language agnostic scopes you can use right now. Parent and Global. If you switch from Parent to Global in your edge, the rule should now trigger. This isn't exactly what you need, since Global will trigger the rule for the entire codebase, rather than just at the File level. Alternatively, you can set number_of_ancestors_in_parent_scope to a high number such that Parent triggers all the way up to the File Node:

    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge]),
        number_of_ancestors_in_parent_scope=30 # set a big number here
    )

Let me know if this helps!

@sasidv
Copy link
Author

sasidv commented Nov 7, 2024

Thank you again. I tried increasing the value of number_of_ancestors_in_parent_scope to its maximum, but the issue persists, and the rule extend_method_chain still doesn't execute. If there are no other workarounds available, and using the "File" scope is the cleanest solution, I’m happy to implement that feature in Piranha for Python. If you could provide a list of all the changes I need to make in Piranha to implement the "File" scope, I'd be glad to create a PR once I have the details.

from polyglot_piranha import execute_piranha, PiranhaArguments, Rule, RuleGraph, OutgoingEdges, Filter

def main():
    code = """from tensorflow.keras import layers
sp = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()"""

    find_from_tensorflow_keras_import_layers = Rule(
        name="find_from_tensorflow_keras_import_layers",
        query="""rgx from tensorflow.keras import layers""",  # rgx indicates regex match
        is_seed_rule=True
    )

    extend_method_chain = Rule(
        name="extend_method_chain",
        query="""cs :[builder].config(:[args+])""",  # cs indicates we are using concrete syntax
        replace_node="builder",
        replace=':[builder].config("config3", "1")',
        is_seed_rule=False,
        filters={Filter(enclosing_node="(assignment) @assign",
                        not_contains=['cs :[other].config("config3", "1")'])} # here is the filter that says "only apply this rule once per assignment"
    )

    edge = OutgoingEdges("find_from_tensorflow_keras_import_layers", to=["extend_method_chain"], scope="Parent")

    piranha_arguments = PiranhaArguments(
        code_snippet=code,
        language="python",
        rule_graph=RuleGraph(rules=[find_from_tensorflow_keras_import_layers, extend_method_chain], edges=[edge]),
        number_of_ancestors_in_parent_scope=255 # set a big number here
    )

    # Execute Piranha and print the transformed code
    piranha_summary = execute_piranha(piranha_arguments)
    print(piranha_summary[0].content)


if __name__ == "__main__":
    main()

@danieltrt
Copy link
Collaborator

You're right; I had forgotten. Unfortunately, parent scope only applies to nodes that are actual parents and not their context recursively, so the current workaround is to set it to global.

As for the required changes, I would be happy to review your PRs, but I currently don’t have write access to this repository, so I can't promise it will be merged. The change is pretty straightforward; you just have to create a scope config file and then add its relative path to the corresponding language, as shown here:

scopes: vec![],

In src/cleanup_rules/python/scope_config.toml

# Scope generator for python files
[[scopes]]
name = "File"
[[scopes.rules]]
enclosing_node = """
(module) @p_m
"""
scope = "(module) @python_module"

You may also add other scopes, like class, function, method etc

@sasidv
Copy link
Author

sasidv commented Nov 10, 2024

@danieltrt Thank you for your help. I created the file and successfully defined File scope, it allows me to execute the changes as expected. To complete this and create the PR, I also added scopes for Function and Class as shown below. File scope works perfectly fine, but I am getting the following error with the Class and Function scopes. Do you have any idea about this? Is there another change I should do, or anything wrong with scope_config.toml Thanks again

thread '<unnamed>' panicked at src/models/scopes.rs:80:5:
Could not create scope query for "Class"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/Users/xxx/Documents/yyy/src/transformation1.py", line 19, in <module>
    main()
  File "/Users/xxx/Documents/yyysrc/transformation1.py", line 13, in main
    piranha_summary = execute_piranha(piranha_arguments)
pyo3_runtime.PanicException: Could not create scope query for "Class"

Below is the python/scope_config.toml

# Scope generator for python files
[[scopes]]
name = "Function"
[[scopes.rules]]
enclosing_node = """
(
  (function_definition
      name: (_) @n
      parameters: (parameters) @fp
  ) @xdn
)"""
scope = """
(
  [
    (function_definition
        name: (_) @z
        parameters: (parameters) @tp
    )
    (#eq? @z "@n")
    (#eq? @tp "@fp")
  ]
) @qdn
"""

[[scopes]]
name = "Class"
[[scopes.rules]]
enclosing_node = """
((class_definition name: (_) @n) @c)
"""
scope = """
(
  ((class_definition
      name: (_) @z) @qc)
  (#eq? @z "@n")
)
"""


[[scopes]]
name = "File"
[[scopes.rules]]
enclosing_node = """
(module) @p_m
"""
scope = "(module) @python_module"

@danieltrt
Copy link
Collaborator

That seems right to me for your example. PolyglotPiranha cannot find an enclosing class in from tensorflow.keras import layers, hence you get that error.

The function scope seems to have a little problem. This version works for me:

[[scopes]]
name = "Function"
[[scopes.rules]]
enclosing_node = """
(
  (function_definition
      name: (_) @n
      parameters: (parameters) @fp
  ) @xdn
)"""
scope = """
(
  [
    (function_definition
        name: (_) @z
        parameters: (parameters) @tp
    ) @qdn
    (#eq? @z "@n")
    (#eq? @tp "@fp")
  ]
)
"""

Try running it on this code instead:

def test_fn(x):
    from tensorflow.keras import layers
    other = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()
    
first = layers.builder.config("config1", "5").config("config2", "5").getOrCreate()

Hopefully you will observe the expected behavior 😃

@sasidv
Copy link
Author

sasidv commented Nov 11, 2024

It works now—thanks a lot for your help. However, if PolyglotPiranha can’t find an enclosing class or function, should it print a warning or error message instead of raising an exception? Is this the expected behavior?

@danieltrt
Copy link
Collaborator

I agree, I think a warning could have been a better solution here ? I don't remember the rationale for this design decision

@maldil
Copy link

maldil commented Nov 12, 2024

The PR has been opened. Thank you once again for your help.

@ketkarameya
Copy link
Collaborator

I think we can add a flag that optionally prints a warning rather than an throwing an exception.
Throwing an exception gives a quicker feedback on what graph paths are working on the given codebase (so that the rules can be updated), and it also makes behavior more deterministic. Keeping track of warnings without a proper observability infrastructure is difficult.

@maldil
Copy link

maldil commented Nov 18, 2024

Sure, if the expectation is to apply each rule at least once, that makes sense to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants