Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harness: Detector only #833

Open
wants to merge 23 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
d9e7f28
detectoronly harness
vidushiMaheshwari Aug 15, 2024
c8b7e77
Update attempt.py
vidushiMaheshwari Aug 15, 2024
13c89fe
change
vidushiMaheshwari Aug 15, 2024
b87068f
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 15, 2024
df107c2
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
d9d43a6
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
662fbf0
100_pass_mod
vidushiMaheshwari Aug 15, 2024
15c1097
change
vidushiMaheshwari Aug 15, 2024
3f8e263
Update attempt.py
vidushiMaheshwari Aug 15, 2024
4714757
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
9f63ab1
Update garak.core.yaml
vidushiMaheshwari Aug 15, 2024
1b5aa46
100_pass_mod
vidushiMaheshwari Aug 15, 2024
239cfc8
docs
vidushiMaheshwari Aug 19, 2024
65c0c36
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 19, 2024
e9eb742
harness config options and files
vidushiMaheshwari Aug 19, 2024
a35cb5a
Probewise harness is a dictionary instead of attributed class
vidushiMaheshwari Aug 20, 2024
eab5c67
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
c3d33d4
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
153ff2e
Update garak/harnesses/base.py
vidushiMaheshwari Aug 22, 2024
5086e80
Update garak/cli.py
vidushiMaheshwari Aug 22, 2024
6526194
Merge branch 'main' of https://github.com/vidushiMaheshwari/garak int…
vidushiMaheshwari Aug 23, 2024
a7b3e1f
Merge branch 'detector-only-run' of https://github.com/vidushiMaheshw…
vidushiMaheshwari Aug 23, 2024
dbe916a
decouple harness only run from execution
vidushiMaheshwari Sep 24, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions docs/source/garak.harnesses.detectoronly.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
garak.harnesses.detectoronly
====================

.. automodule:: garak.harnesses.detectoronly
:members:
:undoc-members:
:show-inheritance:

1 change: 1 addition & 0 deletions docs/source/harnesses.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,6 @@ garak.harnesses

garak.harnesses
garak.harnesses.base
garak.harnesses.detectoronly
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
garak.harnesses.probewise
garak.harnesses.pxd
2 changes: 1 addition & 1 deletion garak/_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
system_params = (
"verbose narrow_output parallel_requests parallel_attempts skip_unknown".split()
)
run_params = "seed deprefix eval_threshold generations probe_tags interactive".split()
run_params = "seed deprefix eval_threshold generations probe_tags interactive probed_report_path".split()
plugins_params = "model_type model_name extended_detectors".split()
reporting_params = "taxonomy report_prefix".split()
project_dir_name = "garak"
Expand Down
18 changes: 18 additions & 0 deletions garak/attempt.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,24 @@ def as_dict(self) -> dict:
"messages": self.messages,
}

@classmethod
def from_dict(cls, dicti):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-- out of scope for here, but we should implement serialization/deserialization for Attempts

"""Initializes an attempt object from dictionary"""
attempt_obj = cls()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this skip the attempt constructor? Can we add an explicit type signature to signal what cls is expected to be?

Copy link
Collaborator

@jmartin-tech jmartin-tech Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls is the callable for the class which will be an Attempt. This will call the __init__() method with all defaults.

Due to the current overrides in the class attempt_obj.outputs below may not produce the same in memory object for a multi-turn conversation attempt since the existing as_dict() method serialized outputs into the log and not the full messages history.

For the purposes of this PR I suspect this is acceptable, however it is worth noting.

attempt_obj.uuid = dicti['uuid']
attempt_obj.seq = dicti['seq']
attempt_obj.status = dicti['status']
attempt_obj.probe_classname = dicti['probe_classname']
attempt_obj.probe_params = dicti['probe_params']
attempt_obj.targets = dicti['targets']
attempt_obj.prompt = dicti['prompt']
attempt_obj.outputs = dicti['outputs']
attempt_obj.detector_results = dicti['detector_results']
attempt_obj.notes = dicti['notes']
attempt_obj.goal = dicti['goal']
attempt_obj.messages = dicti['messages']
return attempt_obj

def __getattribute__(self, name: str) -> Any:
"""override prompt and outputs access to take from history"""
if name == "prompt":
Expand Down
37 changes: 36 additions & 1 deletion garak/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

"""Flow for invoking garak from the command line"""

command_options = "list_detectors list_probes list_generators list_buffs list_config plugin_info interactive report version".split()
command_options = "list_detectors list_probes list_generators list_buffs list_config plugin_info interactive report version detector_only".split()


def main(arguments=None) -> None:
Expand Down Expand Up @@ -107,6 +107,12 @@ def main(arguments=None) -> None:
parser.add_argument(
"--config", type=str, default=None, help="YAML config file for this run"
)
parser.add_argument(
"--probed_report_path",
type=str,
default=None,
help="Path to jsonl report that stores the generators responses"
)

## PLUGINS
# generator
Expand Down Expand Up @@ -247,6 +253,11 @@ def main(arguments=None) -> None:
action="store_true",
help="Launch garak in interactive.py mode",
)
parser.add_argument(
"--detector_only",
action="store_true",
help="run detector on jsonl report"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this might shift to a harness type options to mimic generator_options and probe_options?

--harness_options for inline json
--harness_options_file that could take a json config file

Some validation may be need on the object received to ensure options provided are for a valid harness type and meet the requirements for launching the harness.

This would then remove the need to also add --probed_report_path as that is currently only used when this option is set and json or file config aligns with other plugins.

{ 
  "DetectorOnly":
  {
    "report_path": "file.report.jsonl"
  }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure if continue or rescore has been implemented yet (or maybe in some other branch?). But I agree with creating harness_options instead of exposing a lot of unnecessary higher-level options. I have incorporated the idea of harness_options in the new changes.


logging.debug("args - raw argument string received: %s", arguments)

Expand Down Expand Up @@ -512,6 +523,30 @@ def main(arguments=None) -> None:
)

command.end_run()

elif args.detector_only:
# Run detector only detection
if not _config.plugins.detector_spec:
logging.error("Detector(s) not specified. Use --detectors")
raise ValueError("use --detectors to specify some detectors")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default the detectors to use should probably be extracted from the start_run setup entry in the provided report file with the command line option being an override to allow reprocessing results against a different detector.


if not _config.run.probed_report_path:
logging.error("report path not specified")
raise ValueError("Specify jsonl report path using --probed_report_path")

evaluator = garak.evaluators.ThresholdEvaluator(_config.run.eval_threshold)
print(_config.plugins.detector_spec.split(","))

detector_names, detector_rejected = _config.parse_plugin_spec(
getattr(_config.plugins, "detector_spec", ""),
"detectors",
getattr(_config.run, "detector_tags", "")
)

command.start_run()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the refactor for harness selection offered is not used, this needs to be removed as start_run() was called before entering this conditional.

Suggested change
command.start_run()

command.detector_only_run(_config.run.probed_report_path, detector_names, evaluator)
command.end_run()

else:
print("nothing to do 🤷 try --help")
if _config.plugins.model_name and not _config.plugins.model_type:
Expand Down
14 changes: 13 additions & 1 deletion garak/command.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
import logging
import json


vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
def start_logging():
from garak import _config

Expand Down Expand Up @@ -255,3 +254,16 @@ def write_report_digest(report_filename, digest_filename):
digest = report_digest.compile_digest(report_filename)
with open(digest_filename, "w", encoding="utf-8") as f:
f.write(digest)

def detector_only_run(report_filename, detectors, evaluator):
import garak.harnesses.detectoronly
import garak.attempt

with open(report_filename) as f:
data = [json.loads(line) for line in f]

data = [d for d in data if d["entry_type"] == "attempt" and d["status"] == 1]
attempts = [garak.attempt.Attempt.from_dict(d) for d in data]

detector_only_h = garak.harnesses.detectoronly.DetectorOnly()
detector_only_h.run(attempts, detectors, evaluator)
47 changes: 24 additions & 23 deletions garak/harnesses/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -109,28 +109,29 @@ def run(self, model, probes, detectors, evaluator, announce_probe=True) -> None:
assert isinstance(
attempt_results, (list, types.GeneratorType)
), "probing should always return an ordered iterable"

for d in detectors:
logging.debug("harness: run detector %s", d.detectorname)
attempt_iterator = tqdm.tqdm(attempt_results, leave=False)
detector_probe_name = d.detectorname.replace("garak.detectors.", "")
attempt_iterator.set_description("detectors." + detector_probe_name)
for attempt in attempt_iterator:
attempt.detector_results[detector_probe_name] = list(
d.detect(attempt)
)

for attempt in attempt_results:
attempt.status = garak.attempt.ATTEMPT_COMPLETE
_config.transient.reportfile.write(json.dumps(attempt.as_dict()) + "\n")

if len(attempt_results) == 0:
logging.warning(
"zero attempt results: probe %s, detector %s",
probe.probename,
detector_probe_name,
)
else:
evaluator.evaluate(attempt_results)
self.run_detectors(detectors, attempt_results, evaluator, probe)

logging.debug("harness: probe list iteration completed")

def run_detectors(self, detectors, attempt_results, evaluator, probe=None):
for d in detectors:
logging.debug("harness: run detector %s", d.detectorname)
attempt_iterator = tqdm.tqdm(attempt_results, leave=False)
detector_probe_name = d.detectorname.replace("garak.detectors.", "")
attempt_iterator.set_description("detectors." + detector_probe_name)
for attempt in attempt_iterator:
attempt.detector_results[detector_probe_name] = list(
d.detect(attempt)
)

for attempt in attempt_results:
attempt.status = garak.attempt.ATTEMPT_COMPLETE
_config.transient.reportfile.write(json.dumps(attempt.as_dict()) + "\n")

if len(attempt_results) == 0:
logging.warning(
"zero attempt results: probe %s",
probe.probename
)
vidushiMaheshwari marked this conversation as resolved.
Show resolved Hide resolved
else:
evaluator.evaluate(attempt_results)
44 changes: 44 additions & 0 deletions garak/harnesses/detectoronly.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# SPDX-FileCopyrightText: Portions Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

"""Detector only harness

Runs specified detectors on already existing prompt-response pairs from parsing a report.jsonl file.
"""

import logging

from garak import _config, _plugins
from garak.harnesses import Harness
from garak.detectors import Detector

class DetectorOnly(Harness):
def __init__(self, config_root=_config):
super().__init__(config_root)

def _load_detector(self, detector_name: str) -> Detector:
detector = _plugins.load_plugin(
detector_name, break_on_fail=False
)
if detector:
return detector
else:
print(f" detector load failed: {detector_name}, skipping >>")
logging.error(f" detector load failed: {detector_name}, skipping >>")
return False

def run(self, attempts, detector_names, evaluator):
detectors = []
for detector in sorted(detector_names):
d = self._load_detector(detector)
if d:
detectors.append(d)

if len(detectors) == 0:
msg = "No detectors, nothing to do"
logging.warning(msg)
if hasattr(_config.system, "verbose") and _config.system.verbose >= 2:
print(msg)
raise ValueError(msg)

super().run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this work?

Suggested change
super().run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.
self.run_detectors(detectors, attempts, evaluator) # The probe is None, but hopefully no errors occur with probe.

1 change: 1 addition & 0 deletions garak/resources/garak.core.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ run:
eval_threshold: 0.5
generations: 10
probe_tags:
probed_report_path:

plugins:
model_type:
Expand Down
Loading