Feature/multilingual #943

SnowMasaya · 2024-10-09T00:40:33Z

Add multilingual support

Verification

[pass] Run the tests and ensure they pass python -m pytest tests/
[pass] Verify the thing does what it should: I add docs/source/translator.rst

Feature explain

You need an API key for the preferred service.

DeepL
NIM

DeepL

export DEEPL_API_KEY=xxxx

Config file

You can pass the translation service, source language, and target language as arguments.

translation_service: "nim" or "deepl", "local"
lang_spec: "ja", "ja,fr" etc. (you can set multiple language codes)

Note: The Helsinki-NLP/opus-mt-en-{lang} case uses different language formats. The language codes used to name models are inconsistent. Two-digit codes can usually be found here, while three-digit codes require a search like "language code {code}".

run:
  translation_service: {your chosen translation service "nim" or "deepl", "local"}
  lang_spec: {your chosen language code}

Examples for multilingual support

DeepL

To use the translation option for garak, run the following command:

export DEEPL_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --translation_service deepl --lang_spec ja

If you save the config file as "garak/configs/simple_translate_config_deepl.yaml", use this command:

export DEEPL_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config garak/configs/simple_translate_config_deepl.yaml

Example config file:

run:
  translation_service: "deepl"
  lang_spec: "ja"

NIM

For NIM, run the following command:

export NIM_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --translation_service nim --lang_spec ja

If you save the config file as "garak/configs/simple_translate_config_nim.yaml", use this command:

export NIM_API_KEY=xxxx
python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config garak/configs/simple_translate_config_nim.yaml

Example config file:

run:
  translation_service: "nim"
  lang_spec: "ja"

Local

For local translation, use the following command:

python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --translation_service local --lang_spec ja

If you save the config file as "garak/configs/simple_translate_config_local.yaml", use this command:

python3 -m garak --model_type nim --model_name meta/llama-3.1-8b-instruct --probes encoding --config garak/configs/simple_translate_config_local.yaml

Example config file:

run:
  translation_service: local
  local_model_name: "facebook/m2m100_418M"
  local_tokenizer_name: "facebook/m2m100_418M"
  lang_spec: "ja"

Specific Hardware Examples:

Local Translation needs GPU

GPU related
- Specific support required cuda
- Minium GPU Memory:
  - Helsinki-NLP/opus-mt-en-{}: around 1500MB
  - facebook/m2m100_418M: around 8000MB
  - facebook/m2m100_1.2B: around 15000MB

github-actions · 2024-10-09T00:40:47Z

DCO Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by just posting a Pull Request Comment same as the below format.

I have read the DCO Document and I hereby sign the DCO

1 out of 2 committers have signed the DCO.
✅ (SnowMasaya)[https://github.com/SnowMasaya]
❌ @masayaOgushi
_{You can retrigger this bot by commenting recheck in this Pull Request}

- Integrated support for multiple translation services including local and external APIs. - local: Huggingface model uses for translation - deepl:DeepL uses for translation - nim: NIM uses for translation - Implemented utility functions for language detection and text processing. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

- Addd translation function for base probe class - prompts and triggers translate by base class method - attempt_descr translation Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

- Translation handling for detector keywords and substrings, triggers. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

- Added support for specifying translation services directly from the CLI. - Implemented options to set target languages for translation. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

- Added new dependencies required for enhanced translation features. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

- Added detailed explanations of the translationn method - Included examples of how translation services are configured and utilized within the codebase. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

SnowMasaya · 2024-10-09T04:55:52Z

I have read the DCO Document and I hereby sign the DCO

jmartin-tech

Partial review, testing is still in progress.

The test failures in macOS look to be an incomplete dependency requirement that may need to be reworked or removed. A default installation should not require install of an external library. Hence the dependency on pyenchant here may be problematic.

Traceback:
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/detectors/test_detectors_riskywords.py:8: in <module>
    import garak.detectors.base
garak/detectors/__init__.py:1: in <module>
    from .base import *
garak/detectors/base.py:17: in <module>
    from garak.translator import SimpleTranslator, LocalTranslator, is_english
garak/translator.py:15: in <module>
    import enchant
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/enchant/__init__.py:81: in <module>
    from enchant import _enchant as _e
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/enchant/_enchant.py:[157](https://github.com/leondz/garak/actions/runs/11246708414/job/31296143918?pr=943#step:5:158): in <module>
    raise ImportError(msg)
E   ImportError: The 'enchant' C library was not found and maybe needs to be installed.
E   See  https://pyenchant.github.io/pyenchant/install.html
E   for details

jmartin-tech · 2024-10-09T13:25:34Z

requirements.txt

These entires are new direct requirements and need to be above the # tests section in this file.

OK. I move it.

jmartin-tech · 2024-10-09T13:45:12Z

garak/probes/base.py

+        if hasattr(config_root, 'run'):
+            if hasattr(config_root.run, 'translation_service'):
+                translation_service = config_root.run.translation_service
+        if translation_service == "local":
+            if probename == "encoding":
+                self.translator = LocalEncodingTranslator(config_root)
+            elif probename == "goodside":
+                self.translator = LocalGoodsideTranslator(config_root)
+            elif probename == "dan":
+                self.translator = LocalDanTranslator(config_root)
+            else:
+                self.translator = LocalTranslator(config_root)
+        elif translation_service == "deepl" or translation_service == "nim":
+            if probename == "encoding":
+                self.translator = EncodingTranslator(config_root)
+            elif probename == "goodside":
+                self.translator = GoodsideTranslator(config_root)
+            elif probename == "dan":
+                self.translator = DanTranslator(config_root)
+            else:
+                self.translator = SimpleTranslator(config_root)
+
+        if hasattr(config_root, 'run'):
+            if hasattr(config_root.run, 'lang_spec'):
+                self.target_lang = config_root.run.lang_spec
+
+        if hasattr(self, 'triggers') and len(self.triggers) > 0:
+            if self.is_nested_list(self.triggers):
+                trigger_list = []
+                for trigger in self.triggers:
+                    trigger_words = self._translate(trigger)
+                    for word in trigger_words:
+                        trigger_list.append([word])
+                self.triggers = trigger_list 
+            else:
+                self.triggers = self._translate(self.triggers)


This makes assumptions that config_root will be the global _config object, this will not always be valid. The _translate() method below assumes self.translator will always exist.

Probes should not be responsible for creation of the translator object, since the service will be utilized by both probes and detectors. This may be important for impacts to local translation as It looks like the current implementation would instantiate and load the model object for each probe.

One option may be to have the translator package retain a singleton translator built from _config.run on first access request by passing the probe.bcp47 value and the target language to get a translator object.

OK. I separate translation responsibility form probes and detectors.
I add the check function self.translator exist or not each instance.
I will develop for singleton translator built from _config.run.

jmartin-tech · 2024-10-09T13:45:56Z

garak/probes/base.py

+            for prompt in prompts:
+                mean_word_judge = is_english(prompt)
+                if mean_word_judge:
+                    translate_prompt = self.translator._get_response(prompt, self.bcp47, lang)


There is currently no guarantee self.translator will exist. Probes are not required to call super().__init__(), while they should in most cases this needs a guard to ensure the attribute is set.

OK. I will set up the check function for self.translator

jmartin-tech · 2024-10-09T13:54:58Z

tests/detectors/test_detectors.py

+def detector():
+    _config.run.translation_service = 'local'
+    detector = Detector(_config)
+    detector.bcp47 = "en"
+    detector.target_lang = "jap"
+    return detector 


This will need to reset the _config via importlib.reload after the request completes.

OK. I use importlib.reload for reset the _config

jmartin-tech · 2024-10-09T14:02:25Z

tests/detectors/test_leakreplay.py

+
+
+def test_startswith_detect():
+    _config.run.translation_service = 'local'


Impacts to _config inside a test need to be reset after the test completes, this may be a case where a shared fixture could be created in a conftest.py here in the detectors path or added to the existing global conftest.py.

OK. I move set up _config part to conftest.py

jmartin-tech · 2024-10-09T14:09:08Z

tests/test_translator.py

+@pytest.fixture
+def config():
+    config = _config
+    return config


More _config impacts that need to be reset after test completion. Likely will benefit from a shared fixture.

I move the set up _config part to conftest.py

jmartin-tech · 2024-10-09T14:14:01Z

tests/test_translator.py

+@pytest.mark.parametrize("translator_class, lang_specs, model_name, tokenizer_name", [
+    (LocalTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalDanTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalEncodingTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalGoodsideTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+    (LocalDanTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+    (LocalEncodingTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+    (LocalGoodsideTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+])
+def test_local_translate_multiple_languages(translator_class, lang_specs, model_name, tokenizer_name):
+    config = _config
+    input_text = "Hello, how are you?"
+    config.run.local_model_name = model_name 
+    config.run.local_tokenizer_name = tokenizer_name 
+
+    for lang_spec in lang_specs:
+        config.run.lang_spec = lang_spec
+        translator = translator_class(config_root=config)
+        translated_text = translator._get_response(input_text, 
+            source_lang="en", 
+            target_lang=lang_spec)
+        assert isinstance(translated_text, str)
+        assert translated_text != input_text


This test requires download of three models, facebook/m2m100_1.2B ~ 10GB, opus-mt-en-jap ~ 800MB and opus-mt-en-fr ~ 1GB

I will make the storage check function. If the environment does not have enough storage, those test skip.

jmartin-tech · 2024-10-09T14:34:54Z

garak/cli.py

+    parser.add_argument(
+       "--local_model_name",
+       type=str,
+       help="Model name",
+    )
+    parser.add_argument(
+       "--local_tokenizer_name",
+       type=str,
+       help="Tokenizer name",
+    )


Do these need to be exposed here? Should we accept a json option_file here to allow for a generic generator to be used here instead of restricting to HF models?

I agree with it. I think it will expand it for other eco-system such as ollama, ggml and so on.

jmartin-tech · 2024-10-10T18:43:28Z

tests/test_translator.py

+@pytest.mark.parametrize("translator_class, lang_specs, model_name, tokenizer_name", [
+    (LocalTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalDanTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalEncodingTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalGoodsideTranslator, ["ja", "fr"], "facebook/m2m100_1.2B", "facebook/m2m100_1.2B"),
+    (LocalTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+    (LocalDanTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+    (LocalEncodingTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+    (LocalGoodsideTranslator, ["jap", "fr"], "Helsinki-NLP/opus-mt-en-{}", "Helsinki-NLP/opus-mt-en-{}"),
+])


Testing in github actions does not have access to GPU enabled systems, also this project cannot expect a GPU to be available in default environments.

I try to use CPU default value for self.device in the translator.py.

jmartin-tech · 2024-10-10T18:49:31Z

garak/translator.py

+        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        lang_specs = getattr(config_root.run, 'lang_spec', "jap")  


cuda capability is not a default expectation for this project, while this does fall back to "cpu" there is likely work needed to enable this object on all supported platforms.

lang_spec by default based on this PR should just be "en" and should not load a model when no translation will ever occur. The translator services should be agnostic of any actual translation and source_lang to target_lang should just pass thru when both values are equal returning the original values.

cuda capability is not a default expectation for this project, while this does fall back to "cpu" there is likely work needed to enable this object on all supported platforms.

I try to change the default value "cpu". I think it can work if machine does not have GPU.

lang_spec by default based on this PR should just be "en" and should not load a model when no translation will ever occur.

OK. I change the default value of lang_spec. And if the lang_spec is "en", it will pass to load a model

source_lang to target_lang should just pass thru when both values are equal returning the original values.

OK. I add the check function.

update remove punctuation update english judge add translate function add logging translate result add Reverse translate for hf detector and snowball probes Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

check translator instance remove translate function reset config Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

check translator instance add reverse translator add test reverse translator Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

remove argument using generator_option_file Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

add load translator instance Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

check storage size set up each instance for each test Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

remove pyenchant Using nltk instead of pyenchant Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

update how to use translation function Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

…garak into feature/multilingual

Signed-off-by: SnowGushiGit <snow.akogi.pgel@gmail.com>

Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

…garak into feature/multilingual

leondz

Thank you, this is a significant amount of work.

This is for translation-based multilingual, rather than general multilingual support, so we should be careful to separate these two functions. The ability to select which languages garak uses is distinct from how that language is achieved (in prompts, detectors, and so on).

There are some refactorings needed to get the PR to a sustainable state. We may need a couple more hooks, either injected into base classes or added there. It might be simplest to add these hooks to the harness. Happy to set up a call or instant message chat to discuss this. Given the breadth of the changes, it is likely to be beneficial to discuss plans and get good alignment while doing the rest of the changes.

leondz · 2024-10-30T08:57:02Z

garak/detectors/base.py

+        if hasattr(config_root, 'plugins'):
+            if hasattr(config_root.plugins, 'generators'):
+                if "translation_service" in config_root.plugins.generators.keys():
+                    translation_service = config_root.plugins.generators["translation_service"]
+                    self.translator = _config.load_translator(translation_service=translation_service, 
+                                                        classname="")


Why does the base detector need to look for generators configurations?

I defined the translation instance on the _config.py.
It defined by the generators configurations.
I think your advice to add these hooks to the harness is the better.

leondz · 2024-10-30T08:57:23Z

garak/detectors/base.py

+        if hasattr(config_root, 'plugins'):
+            if hasattr(config_root.plugins, 'generators'):
+                if "translation_service" in config_root.plugins.generators.keys():
+                    translation_service = config_root.plugins.generators["translation_service"]
+                    self.reverse_translator = _config.load_translator(translation_service=translation_service, 
+                                                    classname="reverse")


this should be factored out

I will move this part to harness.

leondz · 2024-10-30T08:58:51Z

garak/detectors/base.py

+        if hasattr(self, 'reverse_translator'):
+            if self.reverse_translator is not None:
+                non_none_outputs = self.reverse_translator.translate_prompts(non_none_outputs)


Can we handle this in the harness rather than the detector? It's an orchestration issue

OK. I try it.

leondz · 2024-10-30T08:59:27Z

garak/detectors/base.py

+        if hasattr(self, 'translator'):
+            if self.translator is not None:
+                self.substrings = self.translator.translate_prompts(self.substrings)


can we handle this in the harness?

OK. I try it.

leondz · 2024-10-30T09:00:21Z

garak/detectors/leakreplay.py

+            if hasattr(self, 'translator'):
+                if self.translator is not None:
+                    triggers = self.translator.translate_prompts(triggers)


this piece of code appears a lot. we should really avoid this, and use/add/inject a hook in the base class

Sorry I don't understand how to implement it.
Could you share with me your example code?

leondz · 2024-10-30T09:28:40Z

garak/detectors/snowball.py

consider factoring up to harness

leondz · 2024-10-30T09:28:50Z

garak/detectors/misleading.py

consider factoring up to harness

leondz · 2024-10-30T09:29:38Z

garak/detectors/base.py

@@ -14,6 +14,7 @@
 from garak.configurable import Configurable
 from garak.generators.huggingface import HFCompatible
 import garak.attempt
+from garak.translator import SimpleTranslator, LocalTranslator, is_english


factoring this up to the harness keeps base class module-level imports light, which they must be

SimpleTranslator, LocalTranslator does not need to import into the garak/detectors/base.py.

I remove it.

leondz · 2024-10-30T09:32:11Z

docs/source/translator.rst

+DeepL
+~~~~~
+
+.. code-block:: bash
+
+    export DEEPL_API_KEY=xxxx
+
+NIM
+~~~
+
+.. code-block:: bash
+
+    export NIM_API_KEY=xxxx
+


this is non-standard - we should follow the same pattern as in garak.generators.base.Generator

Sorry. Please share the example code how to set up the API_KEY value.

leondz · 2024-10-30T09:33:57Z

docs/source/translator.rst

+- translation_service: "nim" or "deepl", "local"
+- lang_spec: "ja", "ja,fr" etc. (you can set multiple language codes)


where in the config does this go?

recommend something like:

run: language: translation: translation_service: nim api_key: xxx lang_from: en lang_to: ja,fr

It sets up value by the generator_option_file.
It needs set up the generator json file

{ "lang_spec": "ja", "translation_service": "local", "local_model_name": "facebook/m2m100_418M", "local_tokenizer_name": "facebook/m2m100_418M" }

I follow this advice

Feature/multilingual #943 (comment)

jmartin-tech

@SnowMasaya thank you, this is a significant benefit for the project.

Adding a number of my pending comments here, please be aware these are likely similar to many @leondz made, although possibly from different reasoning. Happy to continue iterating.

jmartin-tech · 2024-10-23T13:17:31Z

garak/detectors/base.py

+        if hasattr(config_root, 'plugins'):
+            if hasattr(config_root.plugins, 'generators'):
+                if "translation_service" in config_root.plugins.generators.keys():
+                    translation_service = config_root.plugins.generators["translation_service"]
+                    self.translator = _config.load_translator(translation_service=translation_service, 
+                                                        classname="")
+        if hasattr(self, 'substrings'):
+            if hasattr(self, 'translator'):
+                if self.translator is not None:
+                    self.substrings = self.translator.translate_prompts(self.substrings)
+
+        if hasattr(config_root, 'plugins'):
+            if hasattr(config_root.plugins, 'generators'):
+                if "translation_service" in config_root.plugins.generators.keys():
+                    translation_service = config_root.plugins.generators["translation_service"]
+                    self.reverse_translator = _config.load_translator(translation_service=translation_service, 
+                                                    classname="reverse")
+


As noted before base.Detector should not access _config directly.

Additionally the location where language configuration is being placed in the config file structure is inconsistent with intended use, config_root.plugins.generators is where plugins of type Generator reference to obtain injected properties used for the target application under test and should not be mixed with language selection for the run.

Do I follow the following code example?

Feature/multilingual #943 (comment)

jmartin-tech · 2024-10-23T13:22:44Z

docs/source/translator.rst

+The translator config writes to a file and the path passed, with 
+`--generator_option_file` as JSON. An example 


--generator_option_file is for the target generator under test and should not be used to pass run specific data.

I follow this advice.

#943 (comment)

But I think misunderstanding it...
Could you tell me the best way to set up config value?

jmartin-tech · 2024-10-23T13:42:52Z

garak/detectors/base.py

+        if hasattr(config_root, 'plugins'):
+            if hasattr(config_root.plugins, 'generators'):
+                if "translation_service" in config_root.plugins.generators.keys():
+                    translation_service = config_root.plugins.generators["translation_service"]
+                    self.reverse_translator = _config.load_translator(translation_service=translation_service, 
+                                                    classname="reverse")
+


Detectors should not access _config directly. The detectors also should not load the translator at instantiation. The detector should be able to request a compatible translator from the a service that owns the responsibility for handling translation.

Consider:

def detector_translator(self): from garak import translator translator.getInstance(self)

and in translator:

@staticmethod def getInstance(klass): service = _translator_instance.gel(klass, None) if service is None: service = load_translator(klass) _translator_instance[klass] = service return service

Thanks. I try it.

jmartin-tech · 2024-10-23T13:51:32Z

garak/probes/atkgen.py

+                    if hasattr(self, 'translator'):
+                        if self.translator is not None:
+                            challenges = self.translator.translate_prompts([challenge])
+                    else:
+                        challenges = [challenge]


Why does this need to create a list and then iterate inside the turn?

I suspect this is misplaced, translation in this probe should only need to happen for the initial challenge in the probe and all generators used to perform red_team actions by manipulating on that prompt using a model in the target language.

I got same review comment.

Feature/multilingual #943 (comment)

I will recheck code and considering which part is the better place to implement translation function.

If you have any recommend way to implement it, please let me know.

jmartin-tech · 2024-10-23T13:52:42Z

garak/probes/atkgen.py

+        if hasattr(config_root, 'run'):
+            if hasattr(config_root.run, 'translation_service'):
+                translation_service = config_root.run.translation_service
+                class_name = self.probename.split(".")[-2]
+                self.translator = _config.load_translator(translation_service=translation_service, 
+                                                        classname=class_name)


Same comment as detectors, the probe plugins should not own translator configuration logic.

jmartin-tech · 2024-10-23T17:38:52Z

tests/probes/test_probes_encoding.py

+from garak.probes.encoding import BaseEncodingProbe, InjectBase16, InjectBase32, InjectAscii85, \
+                                        InjectHex, InjectQP, InjectUU, InjectMime, \
+                                        InjectROT13, InjectBase2048, InjectBraille, \
+                                        InjectMorse, InjectNato, InjectEcoji, InjectZalgo
+from garak import _config, _plugins
+from garak.translator import is_english
+import pytest
+import garak
+import importlib


Limit imports, the encoding classes here are not need as they are enumerated via the PROBES constant.

Sure. I will rewrite it.

jmartin-tech · 2024-10-23T17:40:10Z

tests/probes/test_probes_encoding.py

@@ -12,6 +23,7 @@ def test_InjectBase64_len_cap():
    assert len(p.prompts) < num_payloads * num_templates * num_encoders


+@pytest.fixture(scope="function")


What is this doing? Tests should not be annotated as fixtures.

Sorry. This test does not need it. I remove it this part.

jmartin-tech · 2024-10-23T17:40:44Z

tests/probes/test_probes_encoding.py

@@ -20,6 +32,7 @@ def test_InjectBase64_prompt_trigger_match():
    assert len(p.prompts) == len(p.triggers)


+@pytest.fixture(scope="function")


What is this doing? Tests should not be annotated as fixtures.

Sorry. This part does not need it. I remove it this part.

jmartin-tech · 2024-10-23T17:53:13Z

tests/test_translator.py

+    _config.plugins.generators["translation_service"] = 'local'
+    _config.plugins.generators["local_model_name"] = model_name 
+    _config.plugins.generators["local_tokenizer_name"] = tokenizer_name 


As noted lang_spec configuration should not be in _config.plugins.generators. These values will come from config however they should not be a key under generators as the structure of plugins.<plugin_type> is meant to encapsulate shared configuration of each plugin of type in this case Generator.

Possible suggestion:

_config.run.lang_spec = "ja" _config.run.translation_service = { "type": "local", "model_name", "model_config" }

Also is there a reason for local_model_name & local_tokenizer_name? They look to be equal in all supplied tests, this suggest to me there should is not a need for specification of each independently. If we need to expand on the tokenizer configuration it should likely go inside a model_config object that can supply parameters to a Configurable object.

Prefer to place this in a dedicated sub-key of _config.run, perhaps one related to multilingual config using translate (suggestion above #943 (comment))

I follow this advice. I set up value by the generator_option_file.
I think I misunderstand about it.

#943 (comment)

I try to use _config.run set up each propriety.

Also is there a reason for local_model_name & local_tokenizer_name?

Most hugging face model uses same name tokenizer and model names.
I think model_config would be better.

jmartin-tech · 2024-10-24T13:30:01Z

garak/_config.py

Do not mix configuration and runtime service creation in the same class, the translator package can consume the _config however the _config should not instantiate any object.

leondz · 2024-10-30T13:35:57Z

Adding a number of my pending comments here, please be aware these are likely similar to many @leondz made, although possibly from different reasoning. Happy to continue iterating.

Indeed, sorry for the dupe review, there was quite a lot of code so I reviewed directly instead of processing @jmartin-tech 's comments as well

add mean judge for reverse translation change translation model size Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

translate trigger words Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

add reverse translation remove trigger translation fix test code Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

add translation instance Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

add library for reverse translation Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

SnowMasaya · 2024-10-31T00:38:34Z

Thank you, this is a significant amount of work.

This is for translation-based multilingual, rather than general multilingual support, so we should be careful to separate these two functions. The ability to select which languages garak uses is distinct from how that language is achieved (in prompts, detectors, and so on).

There are some refactorings needed to get the PR to a sustainable state. We may need a couple more hooks, either injected into base classes or added there. It might be simplest to add these hooks to the harness. Happy to set up a call or instant message chat to discuss this. Given the breadth of the changes, it is likely to be beneficial to discuss plans and get good alignment while doing the rest of the changes.

Thank you for review.
I would like to set up a call for understanding the best way to implement this function.
Please stop the review the following commit because it has not apply your review.
The following commit for my coworker request.

masayaOgushi added 6 commits October 9, 2024 10:23

Add Feature: probes add translation function

f94bb2e

- Addd translation function for base probe class - prompts and triggers translate by base class method - attempt_descr translation Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Add Feature: detector add translation capabilities

2238d18

- Translation handling for detector keywords and substrings, triggers. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Add Feature: Enhance command-line interface with new translation options

7202e19

- Added support for specifying translation services directly from the CLI. - Implemented options to set target languages for translation. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

chore: Update dependencies in requirements.txt, pyproject.toml

1105bb1

- Added new dependencies required for enhanced translation features. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

docs: Add translation documentation

6bb7da3

- Added detailed explanations of the translationn method - Included examples of how translation services are configured and utilized within the codebase. Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

SnowMasaya force-pushed the feature/multilingual branch from b781a6f to 6bb7da3 Compare October 9, 2024 01:41

Merge branch 'leondz:main' into feature/multilingual

717f0ff

jmartin-tech requested changes Oct 10, 2024

View reviewed changes

masayaOgushi added 9 commits October 23, 2024 13:50

Update Feature: Translator

b35cc1e

update remove punctuation update english judge add translate function add logging translate result add Reverse translate for hf detector and snowball probes Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Feature: Probes

bbb6c76

check translator instance remove translate function reset config Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Feature: Detectors

51baeb2

check translator instance add reverse translator add test reverse translator Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Feature: cli

dc3a4ab

remove argument using generator_option_file Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Feature: config

ee82261

add load translator instance Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Feature: conftest

7cb8acc

check storage size set up each instance for each test Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Remove: library

ec9b40a

remove pyenchant Using nltk instead of pyenchant Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Doc

d50d19e

update how to use translation function Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Merge branch 'feature/multilingual' of https://github.com/SnowMasaya/…

808f34a

…garak into feature/multilingual

github-actions bot added a commit that referenced this pull request Oct 23, 2024

@SnowMasaya has signed the CLA in #943

03356ed

SnowMasaya and others added 3 commits October 23, 2024 14:13

Merge branch 'main' into feature/multilingual

8a41c95

Signed-off-by: SnowGushiGit <snow.akogi.pgel@gmail.com>

Fix test

2fc2dd5

Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Merge branch 'feature/multilingual' of https://github.com/SnowMasaya/…

8283b65

…garak into feature/multilingual

leondz requested changes Oct 30, 2024

View reviewed changes

jmartin-tech reviewed Oct 30, 2024

View reviewed changes

masayaOgushi added 3 commits October 31, 2024 09:21

Update Feature Translation

395840d

add mean judge for reverse translation change translation model size Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Add Feature Probes

73363f9

translate trigger words Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Update Feature Detectors

57d14e5

add reverse translation remove trigger translation fix test code Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

masayaOgushi added 2 commits October 31, 2024 09:23

Update test

3b3b60a

add translation instance Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>

Add library

bae54d7

add library for reverse translation Signed-off-by: Masaya Ogushi <mogushi@nvidia.com>



		def test_startswith_detect():
		_config.run.translation_service = 'local'

		self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
		lang_specs = getattr(config_root.run, 'lang_spec', "jap")

		- translation_service: "nim" or "deepl", "local"
		- lang_spec: "ja", "ja,fr" etc. (you can set multiple language codes)

		The translator config writes to a file and the path passed, with
		`--generator_option_file` as JSON. An example

		@@ -12,6 +23,7 @@ def test_InjectBase64_len_cap():
		assert len(p.prompts) < num_payloads * num_templates * num_encoders


		@pytest.fixture(scope="function")

		@@ -20,6 +32,7 @@ def test_InjectBase64_prompt_trigger_match():
		assert len(p.prompts) == len(p.triggers)


		@pytest.fixture(scope="function")

Feature/multilingual #943

Are you sure you want to change the base?

Feature/multilingual #943

Conversation

SnowMasaya commented Oct 9, 2024

Verification

Feature explain

DeepL

Config file

Examples for multilingual support

DeepL

NIM

Local

github-actions bot commented Oct 9, 2024 • edited Loading

SnowMasaya commented Oct 9, 2024 • edited Loading

jmartin-tech left a comment

Choose a reason for hiding this comment

jmartin-tech Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jmartin-tech Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

SnowMasaya Oct 17, 2024 • edited Loading

Choose a reason for hiding this comment

jmartin-tech Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leondz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SnowMasaya Oct 31, 2024 • edited Loading

Choose a reason for hiding this comment

jmartin-tech left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Oct 9, 2024 •

edited

Loading

SnowMasaya commented Oct 9, 2024 •

edited

Loading

jmartin-tech Oct 9, 2024 •

edited

Loading

jmartin-tech Oct 9, 2024 •

edited

Loading

SnowMasaya Oct 17, 2024 •

edited

Loading

jmartin-tech Oct 9, 2024 •

edited

Loading

SnowMasaya Oct 31, 2024 •

edited

Loading

SnowMasaya Oct 31, 2024 •

edited

Loading

SnowMasaya commented Oct 31, 2024 •

edited

Loading