princeton-nlp · nanjiangwill · Nov 27, 2024
diff --git a/commit0_prepare/default_from_url_commit0.yaml b/commit0_prepare/default_from_url_commit0.yaml
@@ -0,0 +1,114 @@
+system_template: |-
+  SETTING: You are an autonomous programmer, and you're working directly in the command line with a special interface.
+
+  The special interface consists of a file editor that shows you {WINDOW} lines of a file at a time.
+  In addition to typical bash commands, you can also use the following commands to help you navigate and edit files.
+
+  COMMANDS:
+  {command_docs}
+
+  Please note that THE EDIT COMMAND REQUIRES PROPER INDENTATION.
+  If you'd like to add the line '        print(x)' you must fully write that out, with all those spaces before the code! Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
+
+  RESPONSE FORMAT:
+  Your shell prompt is formatted as follows:
+  (Open file: <path>) <cwd> $
+
+  You need to format your output using two fields; discussion and command.
+  Your output should always include _one_ discussion and _one_ command field EXACTLY as in the following example:
+  DISCUSSION
+  First I'll start by using ls to see what files are in the current directory. Then maybe we can look at some relevant files to see what they look like.
+  ```
+  ls -a
+  ```
+
+  You should only include a *SINGLE* command in the command section and then wait for a response from the shell before continuing with more discussion and commands. Everything you include in the DISCUSSION section will be saved for future reference.
+  If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first command, and then after receiving a response you'll be able to issue the second command.
+  You're free to use any other bash commands you want (e.g. find, grep, cat, ls, cd) in addition to the special commands listed above.
+  However, the environment does NOT support interactive session commands (e.g. python, vim), so please do not invoke them.
+instance_template: |-
+  Here is your task:
+
+  You need to complete the implementations for all functions (i.e., those with pass
+  statements) and pass the unit tests.
+
+  Do not change the names of existing functions or classes, as they may be referenced
+  from other code like unit tests, etc.
+
+  When you generate code, you must maintain the original formatting of the function
+  stubs (such as whitespaces), otherwise we will not be able to search/replace blocks
+  for code modifications, and therefore you will receive a score of 0 for your generated
+  code.
+
+  Do not install any packages; you are in a Docker container that already has all the dependencies installed.
+
+  The source dir you need to edit is in: /testbed/{src_dir}
+
+  Here is the test you can use: {test_cmd} /testbed/{test_dir}
+
+  Before you do anything else, go to /testbed and run: {git_reset}
+
+  IMPORTANT TIPS:
+  1. If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it!
+
+  2. If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, don't just use the scroll_down command multiple times. Instead, use the goto 583 command. It's much quicker.
+
+  3. Always make sure to look at the currently open file and the current working directory (which appears right after the currently open file). The currently open file might be in a different directory than the working directory! Note that some commands, such as 'create', open files, so they might change the currently open file.
+
+  4. When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
+
+  5. You should keep editing files until you pass the tests. Do not discard the changes you made to the code immediately or easily. It normally takes multiple iterations to get the code right.
+
+  6. Even if you eventually do not pass the tests, partial progress is still better than no progress, so do not discard it easily.
+
+
+  (Open file: {open_file})
+  (Current directory: {working_dir})
+  bash-$
+next_step_template: |-
+  {observation}
+  (Open file: {open_file})
+  (Current directory: {working_dir})
+  bash-$
+next_step_no_output_template: |-
+  Your command ran successfully and did not produce any output.
+  (Open file: {open_file})
+  (Current directory: {working_dir})
+  bash-$
+demonstration_template: |
+  Here is a demonstration of how to correctly accomplish this task.
+  It is included to show you how to correctly use the interface.
+  You do not need to follow exactly what is done in the demonstration.
+  --- DEMONSTRATION ---
+  {demonstration}
+  --- END OF DEMONSTRATION ---
+state_command:
+  name: state
+  code: |
+    state() {
+      local working_dir="/testbed";
+      if [ -z "$CURRENT_FILE" ]; then
+          echo '{"open_file": "n/a", "working_dir": "'$working_dir'"}';
+      else
+          echo '{"open_file": "'$(realpath "$CURRENT_FILE")'", "working_dir": "'$working_dir'"}';
+      fi
+    };
+parse_function: ThoughtActionParser
+env_variables:
+  WINDOW: 100
+  OVERLAP: 2
+  CURRENT_LINE: 0
+  CURRENT_FILE: ''
+  SEARCH_RESULTS: ()
+  SEARCH_FILES: ()
+  SEARCH_INDEX: 0
+command_files:
+- config/commands/defaults.sh
+- config/commands/search.sh
+- config/commands/edit_linting.sh
+- config/commands/_split_string.py
+- config/commands/commit0/{submit.sh}
+parse_command: ParseCommandDetailed
+history_processor: Last5Observations
+demonstrations:
+- trajectories/demonstrations/replay__marshmallow-code__marshmallow-1867__default__t-0.20__p-0.95__c-2.00__install-1___install_from_source/marshmallow-code__marshmallow-1867.traj
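The RESPONSE FORMAT in `system_template` asks for one free-text discussion followed by exactly one fenced command. The following is a minimal sketch of splitting such a reply, assuming the last fenced block is the command. `split_discussion_and_command` is a hypothetical helper written for illustration; it is not the `ThoughtActionParser` named in `parse_function`, whose actual behavior may differ.

```python
from __future__ import annotations

import re

# Built from single backticks so this example can itself live inside a fence.
FENCE_MARK = "`" * 3

# Opening fence (optional info string), body, closing fence.
FENCE_RE = re.compile(
    re.escape(FENCE_MARK) + r"[^\n]*\n(.*?)\n?" + re.escape(FENCE_MARK),
    re.DOTALL,
)


def split_discussion_and_command(reply: str) -> tuple[str, str]:
    """Hypothetical helper: return (discussion, command) from a model reply."""
    matches = list(FENCE_RE.finditer(reply))
    if not matches:
        raise ValueError("reply has no fenced command block")
    last = matches[-1]
    # Everything before the last fenced block is treated as discussion.
    discussion = reply[: last.start()].strip()
    if discussion.startswith("DISCUSSION"):
        discussion = discussion[len("DISCUSSION"):].strip()
    return discussion, last.group(1).strip()


if __name__ == "__main__":
    reply = (
        "DISCUSSION\nFirst I'll list files.\n"
        + FENCE_MARK + "\nls -a\n" + FENCE_MARK + "\n"
    )
    print(split_discussion_and_command(reply))
```

Taking the last fenced block is a design choice of this sketch: it tolerates code quoted earlier in the discussion, at the cost of silently ignoring a second command.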
diff --git a/commit0_prepare/empty_repo b/commit0_prepare/empty_repo
diff --git a/commit0_prepare/generate_prompt_for_each_repo.py b/commit0_prepare/generate_prompt_for_each_repo.py
@@ -0,0 +1,155 @@
+from __future__ import annotations
+
+import os
+import shutil
+
+from datasets import load_dataset
+
+
+def copy_to_subdirs(source_file: str, target_dir: str) -> None:
+    """
+    Copy a file to all subdirectories in a given directory.
+
+    Args:
+        source_file: Path to the file to copy
+        target_dir: Directory containing subdirectories to copy to
+    """
+    # Verify source file exists
+    if not os.path.isfile(source_file):
+        raise FileNotFoundError(f"Source file {source_file} does not exist")
+
+    os.makedirs(target_dir, exist_ok=True)
+    # Verify target directory exists
+    if not os.path.isdir(target_dir):
+        raise NotADirectoryError(f"Target directory {target_dir} does not exist")
+
+    # Get source filename
+    filename = os.path.basename(source_file)
+
+    # Hard-coded list of commit0 repo names; one subdirectory is created per repo
+    subdirs = [
+        "simpy",
+        "wcwidth",
+        "parsel",
+        "chardet",
+        "minitorch",
+        "tinydb",
+        "deprecated",
+        "voluptuous",
+        "cachetools",
+        "imapclient",
+        "marshmallow",
+        "jinja",
+        "cookiecutter",
+        "portalocker",
+        "pyjwt",
+        "babel",
+        "statsmodels",
+        "python-progressbar",
+        "xarray",
+        "imbalanced-learn",
+        "web3.py",
+        "scrapy",
+        "seaborn",
+        "pypdf",
+        "pexpect",
+        "pytest",
+        "pylint",
+        "joblib",
+        "dulwich",
+        "virtualenv",
+        "networkx",
+        "requests",
+        "sphinx",
+        "jedi",
+        "moviepy",
+        "loguru",
+        "paramiko",
+        "geopandas",
+        "bitstring",
+        "fastapi",
+        "tornado",
+        "python-prompt-toolkit",
+        "attrs",
+        "PyBoy",
+        "pydantic",
+        "filesystem_spec",
+        "tlslite-ng",
+        "graphene",
+        "mimesis",
+        "dnspython",
+        "python-rsa",
+        "more-itertools",
+        "click",
+        "fabric",
+        "flask",
+        "sqlparse",
+    ]
+
+    # Copy file to each subdirectory
+    for subdir in subdirs:
+        target_path = os.path.join(target_dir, subdir, filename.replace(".md", f"_{subdir}.md"))
+        os.makedirs(os.path.dirname(target_path), exist_ok=True)
+        try:
+            shutil.copy2(source_file, target_path)
+            print(f"Copied {filename} to {subdir}")
+        except Exception as e:
+            print(f"Failed to copy to {subdir}: {e}")
+
+
+commit0_dataset = load_dataset("wentingzhao/commit0_combined", split="test")
+
+file = "commit0_prepare/my_issue.md"
+target_dir = "commit0_prepare/repos/"
+
+copy_to_subdirs(file, target_dir)
+
+# Create directory if it doesn't exist
+os.makedirs("config/commands/commit0/", exist_ok=True)
+
+for i in commit0_dataset:
+    repo_name = i["repo"].split("/")[1]
+    test_cmd = i["test"]["test_cmd"]
+    test_dir = i["test"]["test_dir"]
+    src_dir = i["src_dir"]
+    base_commit = i["base_commit"]
+    reset_cmd = f"git reset --hard {base_commit}"
+    # submit_cmd = f"`git diff {base_commit} -- . ':(exclude)spec.pdf.bz2' > /patch.diff`"
+
+    submit_sh = f"{repo_name}_submit.sh"
+    # Read the default yaml file
+    with open("commit0_prepare/default_from_url_commit0.yaml") as f:
+        yaml_content = f.read()
+
+    # Fill in the per-repo placeholders; other braces (e.g. {WINDOW}) are left untouched
+    yaml_content = yaml_content.replace("{src_dir}", src_dir)
+    yaml_content = yaml_content.replace("{test_cmd}", test_cmd)
+    yaml_content = yaml_content.replace("{test_dir}", test_dir)
+    yaml_content = yaml_content.replace("{git_reset}", reset_cmd)
+    # yaml_content = yaml_content.replace('{submit_cmd}', f'{submit_cmd}')
+    yaml_content = yaml_content.replace("{submit.sh}", submit_sh)
+
+    submit_path = f"config/commands/commit0/{repo_name}_submit.sh"
+
+    # Create submit.sh content for this repo
+    submit_content = f"""# @yaml
+# signature: submit
+# docstring: submits your current code and terminates the session
+submit() {{
+    cd /testbed
+
+    git add -A
+    git diff {base_commit} -- . ':(exclude)spec.pdf.bz2' > model.patch
+    echo "<<SUBMISSION||"
+    cat model.patch
+    echo "||SUBMISSION>>"
+}}
+"""
+    # Write submit.sh file
+    with open(submit_path, "w") as f:
+        f.write(submit_content)
+    # Create new yaml file for this repo
+    output_path = f"config/commit0/prompt/{repo_name}.yaml"
+    os.makedirs(os.path.dirname(output_path), exist_ok=True)
+    with open(output_path, "w") as f:
+        f.write(yaml_content)
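The script fills the template with literal `str.replace` calls rather than `str.format`. That matters because the YAML also contains braces that must survive untouched, such as `{WINDOW}`, `{command_docs}`, and the shell function body in `state_command`. The same idea can be sketched as a small function; `fill_placeholders` and `PER_REPO_KEYS` are hypothetical names introduced here, and the values in the usage example are made up.

```python
from __future__ import annotations

# Placeholder names as they appear in default_from_url_commit0.yaml.
PER_REPO_KEYS = ("src_dir", "test_cmd", "test_dir", "git_reset", "submit.sh")


def fill_placeholders(template: str, values: dict[str, str]) -> str:
    """Hypothetical helper: replace each per-repo {key} literally.

    Any other brace expression in the template is left exactly as it was.
    """
    for key in PER_REPO_KEYS:
        if key not in values:
            raise KeyError(f"missing value for {{{key}}}")
        template = template.replace("{" + key + "}", values[key])
    return template


if __name__ == "__main__":
    # Illustrative values only; real ones come from the commit0 dataset rows.
    values = {
        "src_dir": "src/foo",
        "test_cmd": "pytest",
        "test_dir": "tests/",
        "git_reset": "git reset --hard abc123",
        "submit.sh": "foo_submit.sh",
    }
    template = "edit /testbed/{src_dir}; run {test_cmd} /testbed/{test_dir}; window {WINDOW}"
    print(fill_placeholders(template, values))
```

Raising on a missing key is an addition of this sketch; the script in the diff reads every value straight from the dataset row and has no such check.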
diff --git a/commit0_prepare/my_issue.md b/commit0_prepare/my_issue.md
@@ -0,0 +1,12 @@
+Here is your task:
+
+  You need to complete the implementations for all functions (i.e., those with pass
+  statements) and pass the unit tests.
+
+  Do not change the names of existing functions or classes, as they may be referenced
+  from other code like unit tests, etc.
+
+  When you generate code, you must maintain the original formatting of the function
+  stubs (such as whitespaces), otherwise we will not be able to search/replace blocks
+  for code modifications, and therefore you will receive a score of 0 for your generated
+  code.
diff --git a/commit0_prepare/run_commit0.sh b/commit0_prepare/run_commit0.sh
@@ -0,0 +1,78 @@
+#!/bin/bash
+
+export ANTHROPIC_API_KEY=XXX
+repos=(
+    "simpy"
+    "wcwidth"
+    "parsel"
+    "chardet"
+    "minitorch"
+    "tinydb"
+    "deprecated"
+    "voluptuous"
+    "cachetools"
+    "imapclient"
+    "marshmallow"
+    "jinja"
+    "cookiecutter"
+    "portalocker"
+    "pyjwt"
+    "babel"
+    "statsmodels"
+    "python-progressbar"
+    "xarray"
+    "imbalanced-learn"
+    "web3.py"
+    "scrapy"
+    "seaborn"
+    "pypdf"
+    "pexpect"
+    "pytest"
+    "pylint"
+    "joblib"
+    "dulwich"
+    "virtualenv"
+    "networkx"
+    "requests"
+    "sphinx"
+    "jedi"
+    "moviepy"
+    "loguru"
+    "paramiko"
+    "geopandas"
+    "bitstring"
+    "fastapi"
+    "tornado"
+    "python-prompt-toolkit"
+    "attrs"
+    "PyBoy"
+    "pydantic"
+    "filesystem_spec"
+    "tlslite-ng"
+    "graphene"
+    "mimesis"
+    "dnspython"
+    "python-rsa"
+    "more-itertools"
+    "click"
+    "fabric"
+    "flask"
+    "sqlparse"
+)
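+
+# Create the log directory up front; the redirection below fails if it is missing
+mkdir -p log/commit0
+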
+for repo in "${repos[@]}"; do
+    echo "Processing $repo..."
+
+    python run.py \
+        --model_name claude-3-5-sonnet-20240620 \
+        --data_path "commit0_prepare/repos/$repo/my_issue_$repo.md" \
+        --repo_path "commit0_prepare/empty_repo" \
+        --config_file "config/commit0/prompt/$repo.yaml" \
+        --image_name "wentingzhao/$repo:v0" \
+        --per_instance_cost_limit 1.00 \
+        --apply_patch_locally > "log/commit0/$repo.log" 2>&1
+
+    echo "Completed $repo"
+done
+
+echo "All repos processed"
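The loop assumes that every entry in `repos` already has both `commit0_prepare/repos/$repo/my_issue_$repo.md` and `config/commit0/prompt/$repo.yaml` on disk. Since the repo names are hard-coded separately in the Python script and in this shell script, a pre-flight check along the following lines could catch a mismatch before any API calls are made. `missing_inputs` is a hypothetical helper and not part of this PR.

```python
from __future__ import annotations

import os


def missing_inputs(repos: list[str], repos_dir: str, config_dir: str) -> list[str]:
    """Hypothetical helper: return repos lacking an issue file or a prompt YAML."""
    missing = []
    for repo in repos:
        issue = os.path.join(repos_dir, repo, f"my_issue_{repo}.md")
        config = os.path.join(config_dir, f"{repo}.yaml")
        if not (os.path.isfile(issue) and os.path.isfile(config)):
            missing.append(repo)
    return missing


if __name__ == "__main__":
    # Paths mirror the ones used by run_commit0.sh; the repo list is abbreviated.
    absent = missing_inputs(
        ["simpy", "wcwidth"],
        "commit0_prepare/repos",
        "config/commit0/prompt",
    )
    if absent:
        print("Missing inputs for:", ", ".join(absent))
```

Run from the repository root after `generate_prompt_for_each_repo.py`, this lists any repo for which the runner would be started with a nonexistent `--data_path` or `--config_file`.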