Commit

Merge remote-tracking branch 'upstream/main'
dchourasia committed May 8, 2024
2 parents cabad97 + 74caf85 commit a130d1c
Showing 15 changed files with 147 additions and 96 deletions.
24 changes: 16 additions & 8 deletions architecture_records/001-trainer-controller-framework.md
@@ -60,13 +60,15 @@ We have implemented a trainer callback (see [here](https://huggingface.co/docs/t
The trainer controller configuration is structured as shown below. There is a list of metric definitions under `controller-metrics`, a list of operations and their actions under `operations`, and a list of controllers, each of which defines the rules, triggers and control operations.
```
controller-metrics:
<controller-name>:
<controller-handler-class>:
- name: <controller-name>
class: <controller-handler-class>
arguments:
<arg1>: <value>
...
operations:
<operation-name>:
<operation-handler-class>:
- name: <operation-name>
class: <operation-handler-class>
arguments:
<arg1>: <value>
...
controllers:
@@ -79,11 +81,11 @@ controllers:
- <operation-action-1>
...
```
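
The hunk above moves `controller-metrics` and `operations` from nested mappings to list entries with `name`, `class` and an optional `arguments` key. A minimal sketch of that migration follows. It assumes the old form nested as name, then handler class, then arguments (the diff view has lost its indentation, so this nesting is an inference), and the helper `to_list_form` and the `threshold` argument are hypothetical, not part of the repository.

```python
from typing import Any


def to_list_form(section: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert {name: {HandlerClass: arguments}} into [{name, class, arguments}].

    Hypothetical helper: the old nesting is assumed, not confirmed by the source.
    """
    entries = []
    for name, handler in section.items():
        if len(handler) != 1:
            raise ValueError(f"expected exactly one handler class for {name!r}")
        # Unpack the single (class name, arguments) pair of the old mapping.
        ((class_name, arguments),) = handler.items()
        entry: dict[str, Any] = {"name": name, "class": class_name}
        if arguments:  # `arguments` is optional, so omit it when empty or None
            entry["arguments"] = dict(arguments)
        entries.append(entry)
    return entries


# Old-style sections as a YAML parser would return them (values are illustrative).
old_metrics = {
    "loss": {"Loss": None},
    "testflag": {"CustomMetric": {"threshold": 0.5}},
}
new_metrics = to_list_form(old_metrics)
```

Keeping the entries in a list preserves their order and lets each entry carry its own `arguments` block without relying on the handler class name as a key.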
The `controller-metrics` and `operations` are optional. We provide a set of built-in `controller-metrics` and `operations` which could be referred to without actually defining them as. For example, the below configuration defines a `controller-metric` called `loss` which refers to a built-in `Loss` controller-metric class with custom arguments (in this case, no arguments), but does not define any `operations`. It only refers to a built-in operation.
The `controller-metrics` and `operations` sections are optional. We provide a set of built-in `controller-metrics` and `operations` which can be referred to without actually defining them. For example, the configuration below defines a `controller-metric` called `loss` which refers to the built-in `Loss` controller-metric class with custom arguments (in this case, no arguments; if arguments are required, they can be listed under an `arguments` section as shown above), but does not define any `operations`. It only refers to a built-in operation.
```
controller-metrics:
loss:
Loss:
name: loss
class: Loss
controllers:
- name: loss-controller
triggers:
@@ -92,6 +94,12 @@ controllers:
operations:
- hfcontrols.should_training_stop
```

We follow the naming convention below for the trainer controller configuration above:
1. `-` can be used in key names and in the names of metrics, operations and controllers, usually to separate the words of a multi-word name.
1. Class names follow the Python convention for [class names](https://visualgit.readthedocs.io/en/latest/pages/naming_convention.html#classes).
1. `_` is used for events and control actions.
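
The three conventions above can be expressed as simple patterns. The sketch below is a hypothetical lint helper, not something the framework ships, and it is stricter than the text: it treats the conventions as mandatory lowercase kebab-case, CapWords and snake_case respectively.

```python
import re

# Hypothetical patterns, one per convention listed above.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")   # metric, operation, controller names
PASCAL = re.compile(r"^[A-Z][A-Za-z0-9]*$")       # handler class names
SNAKE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")   # events and control actions

PATTERNS = {"name": KEBAB, "class": PASCAL, "event": SNAKE, "action": SNAKE}


def check_name(kind: str, value: str) -> bool:
    """Return True if `value` matches the convention for `kind`."""
    return bool(PATTERNS[kind].match(value))


# Identifiers taken from the configurations in this commit.
ok_controller = check_name("name", "loss-controller")
ok_class = check_name("class", "CustomMetric")
ok_action = check_name("action", "should_training_stop")
```

An operation reference such as `hfcontrols.should_training_stop` would be split on the dot first, with the action part checked against the snake_case pattern.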

For defining custom handler classes, we have an interface defined as an abstract class as shown below, with two abstract methods: `validate()` to define the validation conditions, and `compute()` to compute the metric. `compute()` returns an `Any` type. While it could be any value, developers should keep in mind that it should contain only the key-value pairs that are used in the rule(s) defined in the configuration.

Further, the `init` method of the class should accept variable arguments in the form of key-value pairs. `An important point to note is that the keys used in the arguments of the above config should not conflict with any keys used by the Hugging Face trainer callback. Please try to use unique keys as argument names`.
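
As an illustration of the interface described above, here is a minimal sketch of a custom metric handler. The base class `SketchMetricHandler`, the subclass `WindowedLoss` and the argument `loss_window_size` are all hypothetical names chosen for this example; the framework's actual abstract class and method signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Any


class SketchMetricHandler(ABC):
    """Hypothetical stand-in for the handler interface, not the real class."""

    def __init__(self, name: str, **kwargs: Any) -> None:
        self.name = name
        self.kwargs = kwargs  # key-value pairs from the config's `arguments` section

    @abstractmethod
    def validate(self) -> bool:
        """Check that the handler was configured correctly."""

    @abstractmethod
    def compute(self, **event_state: Any) -> Any:
        """Compute the metric from the state supplied by an event."""


class WindowedLoss(SketchMetricHandler):
    """Illustrative metric: mean of the most recent loss values."""

    def __init__(self, name: str, loss_window_size: int = 3, **kwargs: Any) -> None:
        super().__init__(name, **kwargs)
        # A distinctive key name avoids clashing with trainer callback keys.
        self.loss_window_size = loss_window_size
        self.history: list[float] = []

    def validate(self) -> bool:
        return self.loss_window_size > 0

    def compute(self, **event_state: Any) -> Any:
        self.history.append(event_state["loss"])
        recent = self.history[-self.loss_window_size:]
        # Return only the key-value pairs that a rule would reference.
        return {"mean_loss": sum(recent) / len(recent)}
```

A rule in the configuration could then refer to `mean_loss`, which is why `compute()` returns a small dictionary rather than a bare number.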
Expand Down Expand Up @@ -126,4 +134,4 @@ Following is a high-level design diagram. Following are the touch-points to the
- **Configuration**: The trainer controller configuration supplies the definitions of triggers, rules, operations and metrics to orchestrate the enactment of a particular control policy. These details are split up and passed off to the respective modules by the trainer controller as shown in the figure.

- **Events**: Events supply the state and arguments required for the metric handlers to perform metric computation at the events they are registered for. The framework callback lists out all event handlers with the prefix `"on_"` and loads them as event handlers. Every metric declares one or more events from this list of valid handlers. The computed metric variables are stored in a global state of the trainer controller and independently picked up by the operations, which could potentially be triggered on an entirely different set of events. This decouples the control loop for metrics and operations, i.e. a metric could be computed on event A, while an operation could be triggered on event B. The controller rules which use the metric variables from the trainer controller state are evaluated and, based on the outcomes, the specified actions are performed.
![High-Level Design Diagram: Trainer Controller Framework](imgs/001-arch.png)
![High-Level Design Diagram: Trainer Controller Framework](imgs/001-arch.png)
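
The decoupling described under **Events** can be sketched in a few lines: metrics write to a shared state on one event, and rules read that state on another. The class `SketchController` and its methods are hypothetical simplifications for illustration, and rules are plain Python callables here rather than the framework's configured rule expressions.

```python
from collections import defaultdict
from typing import Any, Callable


class SketchController:
    """Hypothetical, simplified stand-in for the trainer controller."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}    # global metric state
        self.metrics = defaultdict(list)   # event -> [(metric name, compute fn)]
        self.rules = defaultdict(list)     # event -> [(rule fn, action name)]
        self.actions_taken: list[str] = []

    def register_metric(self, event: str, name: str, compute: Callable[..., Any]) -> None:
        self.metrics[event].append((name, compute))

    def register_rule(self, event: str, rule: Callable[[dict], bool], action: str) -> None:
        self.rules[event].append((rule, action))

    def fire(self, event: str, **event_args: Any) -> None:
        # Metrics registered for this event update the shared state first.
        for name, compute in self.metrics[event]:
            self.state[name] = compute(**event_args)
        # Rules registered for this event read whatever state exists so far.
        for rule, action in self.rules[event]:
            if rule(self.state):
                self.actions_taken.append(action)


controller = SketchController()
# The metric is computed on one event ...
controller.register_metric("on_log", "loss", lambda **args: args["loss"])
# ... while the rule and its action are evaluated on a different event.
controller.register_rule(
    "on_step_end",
    lambda state: state.get("loss", float("inf")) < 1.0,
    "should_training_stop",
)
controller.fire("on_log", loss=0.5)
controller.fire("on_step_end")
```

After the two events fire, the action is recorded only on `on_step_end`, even though the loss value arrived earlier on `on_log`.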
4 changes: 2 additions & 2 deletions examples/trainercontroller_configs/loss.yaml
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller
triggers:
4 changes: 2 additions & 2 deletions tests/data/trainercontroller/loss_custom_metric.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
controller-metrics:
testflag:
CustomMetric:
- name: testflag
class: CustomMetric
controllers:
- name: loss-controller-custom-metric
triggers:
8 changes: 4 additions & 4 deletions tests/data/trainercontroller/loss_custom_operation.yaml
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
operations:
customoperation:
CustomOperation:
- name: customoperation
class: CustomOperation
controllers:
- name: loss-controller-custom-operation
triggers:
@@ -1,9 +1,9 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
operations:
customoperation:
CustomOperationInvalidAction:
- name: customoperation
class: CustomOperationInvalidAction
controllers:
- name: loss-controller-custom-operation-invalid-action
triggers:
4 changes: 2 additions & 2 deletions tests/data/trainercontroller/loss_invalid_metric.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
controller-metrics:
loss:
MissingMetricClass:
- name: loss
class: MissingMetricClass
controllers:
- name: loss-controller-invalid-metric
triggers:
4 changes: 2 additions & 2 deletions tests/data/trainercontroller/loss_invalid_operation.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller-invalid-operation
triggers:
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller-invalid-operation-action
triggers:
4 changes: 2 additions & 2 deletions tests/data/trainercontroller/loss_invalid_trigger.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller-invalid-trigger
triggers:
4 changes: 2 additions & 2 deletions tests/data/trainercontroller/loss_on_threshold.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller
triggers:
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller-wrong-input-rule
triggers:
4 changes: 2 additions & 2 deletions tests/data/trainercontroller/loss_with_malicious_os_rule.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
controller-metrics:
loss:
Loss:
- name: loss
class: Loss
controllers:
- name: loss-controller-wrong-os-rule
triggers: