Automate the evaluation of model-generated solutions by comparing them against the official solutions. This would streamline validation and accuracy assessment. A final human review would still be necessary to catch mistakes, but it would become a much faster phase for reviewers.
A simple and efficient process would be:
- store the official answers in a subfolder of the `exam_path` to be solved
- during the solving process, at the end of each question, prompt gpt-4o (cheaper than o1) to compare the model's answer with the official answer and produce a concise verdict, much like the LangChain correctness prompts, then save this comparison (see the sketch after this list)
- when compiling the PDF, place the comparison right after the model's answer for easy assessment
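
A minimal sketch of the comparison step, assuming a hypothetical layout where official answers live in `exam_path/answers/`, model output in `exam_path/solutions/`, and verdicts are written to `exam_path/comparisons/`; the folder names and the `compare_answers` helper are illustrative, not part of the repo:

```python
# Hypothetical sketch: the folder layout and function name are assumptions.
from pathlib import Path

from openai import OpenAI  # requires the `openai` package and OPENAI_API_KEY

client = OpenAI()

COMPARISON_PROMPT = """You are grading an exam answer.
Official solution:
{official}

Model-generated answer:
{model}

State concisely whether the model's answer is CORRECT, PARTIALLY CORRECT, or INCORRECT,
and summarize the key differences in at most three sentences."""


def compare_answers(exam_path: Path, question_id: str) -> str:
    """Compare one model answer against the official solution and save the verdict."""
    official = (exam_path / "answers" / f"{question_id}.txt").read_text()
    model_answer = (exam_path / "solutions" / f"{question_id}.txt").read_text()

    response = client.chat.completions.create(
        model="gpt-4o",  # cheaper than o1 for this grading step
        messages=[{
            "role": "user",
            "content": COMPARISON_PROMPT.format(official=official, model=model_answer),
        }],
    )
    verdict = response.choices[0].message.content

    # Save the comparison next to the model answer so the PDF build can pick it up.
    comparison_dir = exam_path / "comparisons"
    comparison_dir.mkdir(exist_ok=True)
    (comparison_dir / f"{question_id}.txt").write_text(verdict)
    return verdict
```

The PDF compilation step would then only need to check for `comparisons/<question_id>.txt` and append its contents after the corresponding answer.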