Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harness: Detector only #833

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
d9e7f28
detectoronly harness
vidushiMaheshwari Aug 15, 2024
c8b7e77
Update attempt.py
vidushiMaheshwari Aug 15, 2024
13c89fe
change
vidushiMaheshwari Aug 15, 2024
b87068f
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 15, 2024
df107c2
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
d9d43a6
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
662fbf0
100_pass_mod
vidushiMaheshwari Aug 15, 2024
15c1097
change
vidushiMaheshwari Aug 15, 2024
3f8e263
Update attempt.py
vidushiMaheshwari Aug 15, 2024
4714757
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
9f63ab1
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
1b5aa46
100_pass_mod
vidushiMaheshwari Aug 15, 2024
239cfc8
docs
vidushiMaheshwari Aug 19, 2024
65c0c36
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 19, 2024
e9eb742
harness config options and files
vidushiMaheshwari Aug 19, 2024
a35cb5a
Probewise harness is a dictionary instead of attributed class
vidushiMaheshwari Aug 20, 2024
eab5c67
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
c3d33d4
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
153ff2e
Update garak/harnesses/base.py
vidushiMaheshwari Aug 22, 2024
5086e80
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
6526194
Merge branch 'main' of https://github.com/vidushiMaheshwari/garak int…
vidushiMaheshwari Aug 23, 2024
a7b3e1f
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 23, 2024
dbe916a
decouple harness only run from execution
vidushiMaheshwari Sep 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions docs/source/garak.harnesses.detectoronly.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
garak.harnesses.detectoronly
====================

.. automodule:: garak.harnesses.detectoronly
:members:
:undoc-members:
:show-inheritance:

1 change: 1 addition & 0 deletions docs/source/harnesses.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,6 @@ garak.harnesses

garak.harnesses
garak.harnesses.base
garak.harnesses.detectoronly
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
garak.harnesses.probewise
garak.harnesses.pxd
18 changes: 18 additions & 0 deletions garak/attempt.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,24 @@ def as_dict(self) -> dict:
"messages": self.messages,
}

@classmethod
def from_dict(cls, dicti):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- out of scope for here, but we should implement serialization/deserialization for Attempts

"""Initializes an attempt object from dictionary"""
attempt_obj = cls()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this skip the attempt constructor? Can we add an explicit type signature to signal what cls is expected to be?

Copy link
Collaborator

@jmartin-tech jmartin-tech Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls is the callable for the class which will be an Attempt. This will call the __init__() method with all defaults.

Due to the current overrides in the class attempt_obj.outputs below may not produce the same in memory object for a multi-turn conversation attempt since the existing as_dict() method serialized outputs into the log and not the full messages history.

For the purposes of this PR I suspect this is acceptable, however it is worth noting.

attempt_obj.uuid = dicti['uuid']
attempt_obj.seq = dicti['seq']
attempt_obj.status = dicti['status']
attempt_obj.probe_classname = dicti['probe_classname']
attempt_obj.probe_params = dicti['probe_params']
attempt_obj.targets = dicti['targets']
attempt_obj.prompt = dicti['prompt']
attempt_obj.outputs = dicti['outputs']
attempt_obj.detector_results = dicti['detector_results']
attempt_obj.notes = dicti['notes']
attempt_obj.goal = dicti['goal']
attempt_obj.messages = dicti['messages']
return attempt_obj

def __getattribute__(self, name: str) -> Any:
"""override prompt and outputs access to take from history"""
if name == "prompt":
Expand Down
47 changes: 42 additions & 5 deletions garak/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,21 @@ def main(arguments=None) -> None:
action="store_true",
help="If detectors aren't specified on the command line, should we run all detectors? (default is just the primary detector, if given, else everything)",
)

# harness
parser.add_argument(
"--harness_options",
type=str,
help="Type of harness to use. Default is probewise."
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
)

parser.add_argument(
"--harness_option_file",
"-H",
type=str,
help="path to JSON file containing information about harnesses"
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
)

# buffs
parser.add_argument(
"--buffs",
Expand Down Expand Up @@ -337,7 +352,7 @@ def main(arguments=None) -> None:
import garak.evaluators

try:
plugin_types = ["probe", "generator"]
plugin_types = ["probe", "generator", "harness"]
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
# do a special thing for CLI probe options, generator options
for plugin_type in plugin_types:
opts_arg = f"{plugin_type}_options"
Expand Down Expand Up @@ -368,7 +383,10 @@ def main(arguments=None) -> None:
)
raise e

config_plugin_type = getattr(_config.plugins, f"{plugin_type}s")
if plugin_type.endswith('s'):
config_plugin_type = getattr(_config.plugins, f"{plugin_type}es")
else:
config_plugin_type = getattr(_config.plugins, f"{plugin_type}s")
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved

config_plugin_type = _config._combine_into(
opts_cli_config, config_plugin_type
Expand Down Expand Up @@ -498,20 +516,39 @@ def main(arguments=None) -> None:
command.start_run() # start the run now that all config validation is complete
print(f"📜 reporting to {_config.transient.report_filename}")

if parsed_specs["detector"] == []:
if not _config.plugins.harnesses:
# Set the _config.plugins.harnesses
if parsed_specs["detector"] == []:
_config.plugins.harnesses["Probewise"] = {}
else:
_config.plugins.harnesses["Pxd"] = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just run me through the reasoning here? would this clobber harness config loaded from global / site / cli-specific config YAML?

Copy link
Collaborator

@jmartin-tech jmartin-tech Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this could be avoided. We should not clobber config without an explicit override from a command line flag.

If no specific harness was provided via config and no detectors were provided per parsed_spec that is the determining factor on which default harness to load when _config.plugins.harnesses does not contain any configuration data for a specific harness. This does expose that there may be a missing top level parameter to select a specific harness if default config were to provide for various harness types. Currently, finding config for DetectorOnly to be a selection criteria seems a bit brittle.

If there is a desire to consolidate harness selection I maybe something like:

    harness_command = command.pxd_run
    if not _config.plugins.harnesses:
        if parsed_specs["detector"] == []:
            harness_command = command.probewise_run
    elif "detectoronly" in _config.plugins.harnesses:
        harness_command = command.detector_only_run
    match harness_command:
        case command.detector_only_run:
            harness_command()
        case command.probewise_run:
            harness_command(
                generator, parsed_specs["probe"], evaluator, parsed_specs["buff"]
            )
        case command.pxd_run:
            harness_command(
                    generator,
                    parsed_specs["probe"],
                    parsed_specs["detector"],
                    evaluator,
                    parsed_specs["buff"],
                    parsed_specs["buff"],
            )
        case _: # base case for invalid callable, currently not a reachable case
            logging.warn("no valid harness selected")

    command.end_run()

There is probably more abstraction possible here but this offers an idea.


if "Probewise" in _config.plugins.harnesses:
command.probewise_run(
generator, parsed_specs["probe"], evaluator, parsed_specs["buff"]
)
else:

if "Pxd" in _config.plugins.harnesses:
command.pxd_run(
generator,
parsed_specs["probe"],
parsed_specs["detector"],
evaluator,
parsed_specs["buff"],
parsed_specs["buff"]
)

command.end_run()

elif "DetectorOnly" in _config.plugins.harnesses:
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved

if "report_path" not in _config.plugins.harnesses["DetectorOnly"]:
logging.error("report path not specified")
raise ValueError("Specify jsonl report path using report_path")

command.start_run()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the refactor for harness selection offered is not used, this needs to be removed as start_run() was called before entering this conditional.

Suggested change
command.start_run()

command.detector_only_run()
command.end_run()

else:
print("nothing to do 🤷 try --help")
if _config.plugins.model_name and not _config.plugins.model_type:
Expand Down
47 changes: 46 additions & 1 deletion garak/command.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
import logging
import json


vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
def start_logging():
from garak import _config

Expand Down Expand Up @@ -92,6 +91,7 @@ def start_run():
list,
set,
type(None),
float, # Without float eval_threshold was not being stored
):
setup_dict[f"{subset}.{k}"] = v

Expand Down Expand Up @@ -255,3 +255,48 @@ def write_report_digest(report_filename, digest_filename):
digest = report_digest.compile_digest(report_filename)
with open(digest_filename, "w", encoding="utf-8") as f:
f.write(digest)

def detector_only_run():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmartin-tech What do you think this is telling us about where responsibility for orchestrating runs lies? Is the existing Harness interface just too inflexible to make invoking novel things like DetectorOnly from garak.cli?

import garak.harnesses.detectoronly
import garak.attempt
from garak import _config

config = _config.plugins.harnesses["DetectorOnly"]

with open(config["report_path"]) as f:
data = [json.loads(line) for line in f]

## Get detectors and evaluator from report if not specified by the user
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite what I was thinking in terms of obtaining the detectors from the original log. The actual extractions looks good as the start_run will contain the expanded detector list however it ignores existing top level arguments and the spec parsing support for options set on the harness.

The harness could accept the list of detectors provided via the parsed_spec for detectors from the command line and an evaluator as other harnesses do, if no detectors were provided then the list of detectors can be obtained based on config from the original report.

If I am reading this correctly, this is expecting detectors and eval_threshold to be set in the harness config and falling back if not found, this would not account for the top level command line options that as a user I would expect to be applied.

It looks like the current expectation would be a config like:

h_options.json:

{
  "detectoronly": {
    "Detectoronly": {
      "report_path": "<file_path>",
      "detectors" : [ "d1", "d2" ],
      "eval_threshold": 0.9
    }
  }
}

With a command line like:

python -m garak --harness_option_file h_options.json --m nim -n meta/llama3-70b-instruct

However based on the existing options a user may have expectations for -d all to apply all detectors when passed as an option.

h_options.json:

{
  "detectoronly": {
    "Detectoronly": {
      "report_path": "<file_path>"
    }
  }
}

With a command line like:

python -m garak --harness_option_file h_options.json --m nim -n meta/llama3-70b-instruct -d all --eval_threshold 0.5

My thought here is that the garak.command module should not need access to data from the cli but should be be provided information that takes advantage of cli having parsed all the setup options and it's support for things like expanding a detector classes based on a module name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I will add the config and then support only --detectors / -d as a top-level argument, and if that is not present, fall back to the ones present within the report. It makes more sense from a user perspective 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vidushiMaheshwari, circling back to check on progress.

I am happy to monitor this PR or offer parts of what I suggested as a PR to your branch in the coming weeks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for being inactive, just pushed the changes which I believe should suffice the comments. Would appreciate a PR with suggested changes!

if "detectors" not in config or "eval_threshold" not in config:
try:
for d in data:
if 'entry_type' in d and d['entry_type'] == 'start_run setup':
entry_line = d
break
except:
raise ValueError("Unexpected start_run setup line in report.jsonl")

if "detectors" not in config:
detectors = entry_line['plugins.detector_spec'].split(',')
# setattr(_config.plugins.harnesses.DetectorOnly, "detectors", detectors)
_config.plugins.harnesses["DetectorOnly"]["detectors"] = detectors

if "eval_threshold" not in config:
eval_threshold = _config.run.eval_threshold
if "run.eval_threshold" in entry_line:
eval_threshold = entry_line["run.eval_threshold"]
_config.plugins.harnesses["DetectorOnly"]["eval_threshold"] = eval_threshold

data = [d for d in data if d["entry_type"] == "attempt" and d["status"] == 1]
attempts = [garak.attempt.Attempt.from_dict(d) for d in data]

if len(attempts) == 0:
raise ValueError("No attempts found in report.jsonl")

detector_only_h = garak.harnesses.detectoronly.DetectorOnly()
config = _config.plugins.harnesses["DetectorOnly"]
evaluator = garak.evaluators.ThresholdEvaluator(config["eval_threshold"])

print(config["detectors"])

detector_only_h.run(attempts, config["detectors"], evaluator)
47 changes: 24 additions & 23 deletions garak/harnesses/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,28 +109,29 @@ def run(self, model, probes, detectors, evaluator, announce_probe=True) -> None:
assert isinstance(
attempt_results, (list, types.GeneratorType)
), "probing should always return an ordered iterable"

for d in detectors:
logging.debug("harness: run detector %s", d.detectorname)
attempt_iterator = tqdm.tqdm(attempt_results, leave=False)
detector_probe_name = d.detectorname.replace("garak.detectors.", "")
attempt_iterator.set_description("detectors." + detector_probe_name)
for attempt in attempt_iterator:
attempt.detector_results[detector_probe_name] = list(
d.detect(attempt)
)

for attempt in attempt_results:
attempt.status = garak.attempt.ATTEMPT_COMPLETE
_config.transient.reportfile.write(json.dumps(attempt.as_dict()) + "\n")

if len(attempt_results) == 0:
logging.warning(
"zero attempt results: probe %s, detector %s",
probe.probename,
detector_probe_name,
)
else:
evaluator.evaluate(attempt_results)
self.run_detectors(detectors, attempt_results, evaluator, probe)

logging.debug("harness: probe list iteration completed")

def run_detectors(self, detectors, attempt_results, evaluator, probe=None):
for d in detectors:
logging.debug("harness: run detector %s", d.detectorname)
attempt_iterator = tqdm.tqdm(attempt_results, leave=False)
detector_probe_name = d.detectorname.replace("garak.detectors.", "")
attempt_iterator.set_description("detectors." + detector_probe_name)
for attempt in attempt_iterator:
attempt.detector_results[detector_probe_name] = list(
d.detect(attempt)
)

for attempt in attempt_results:
attempt.status = garak.attempt.ATTEMPT_COMPLETE
_config.transient.reportfile.write(json.dumps(attempt.as_dict()) + "\n")

if len(attempt_results) == 0:
logging.warning(
"zero attempt results: probe %s",
probe.probename
)
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
else:
evaluator.evaluate(attempt_results)
44 changes: 44 additions & 0 deletions garak/harnesses/detectoronly.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# SPDX-FileCopyrightText: Portions Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Detector only harness

Runs specified detectors on already existing prompt-response pairs from parsing a report.jsonl file.
"""

import logging

from garak import _config, _plugins
from garak.harnesses import Harness
from garak.detectors import Detector

class DetectorOnly(Harness):
def __init__(self, config_root=_config):
super().__init__(config_root)

def _load_detector(self, detector_name: str) -> Detector:
detector = _plugins.load_plugin(
"detectors." + detector_name, break_on_fail=False
)
if detector:
return detector
else:
print(f" detector load failed: {detector_name}, skipping >>")
logging.error(f" detector load failed: {detector_name}, skipping >>")
return False

def run(self, attempts, detector_names, evaluator):
detectors = []
for detector in sorted(detector_names):
d = self._load_detector(detector)
if d:
detectors.append(d)

if len(detectors) == 0:
msg = "No detectors, nothing to do"
logging.warning(msg)
if hasattr(_config.system, "verbose") and _config.system.verbose >= 2:
print(msg)
raise ValueError(msg)

super().run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this work?

Suggested change
super().run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.
self.run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.

1 change: 1 addition & 0 deletions garak/resources/garak.core.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ run:
eval_threshold: 0.5
generations: 10
probe_tags:
probed_report_path:

plugins:
model_type:
Expand Down
Loading