-
-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Documentation & Typos #1629
base: master
Are you sure you want to change the base?
Conversation
Thank you! ping me once you are ready for a review. |
@AntonioCarta Thanks for the quick reply! I have a question not directly related to documentation: In a codebase using a past avalanche version there is this import
What's the reason that only as_classification_dataset(
dataset,
transform_groups=init_transform_groups(
target_transform=lambda x: x, # or whatever
...
),
) This could also be achieved via a class method (the constructor seems a bit complex already), e.g. as_classification_dataset(
dataset,
transform_groups=TransformGroups.create(
target_transform=lambda x: x, # or whatever
...
),
) Kind regards 🙂 PS: Providing a |
Thank you for highlighting these issues. The methods changed over time because at first
Therefore, the goal of
I think it makes sense to add the arguments to
I like this less. Classification transform groups are different from other transformations and TransformGroups tries to be agnostic to these differences (as much as possible). Also, It would not solve the verbosity problem.
This is probably a mistake from the earlier API version. |
Pull Request Test Coverage Report for Build 8382229381Details
💛 - Coveralls |
btw, if you have other comments about usability/pain point in the API/documentation do let me know. Fixing these issues takes a lot of time so we can't do everything quickly, but we try to keep track of it and improve over time. |
…et, as_taskaware_classification_dataset
Hi @jneuendorf I see that the PR is still a draft but if there are no blocking issue I think we could merge it. Doc improvements can be easily split into multiple PRs. |
Hi, your Github-comment email got lost between the other ones from Github like So feel free to review/cherry-pick the changes and decide which ones are ok for you. 😉 |
EDITFollowing the hint at How to Create an AvalancheDataset, I got it working like this: from torch import Tensor
from avalanche.benchmarks.utils import _make_taskaware_tensor_classification_dataset
def generate_class(
label: int | float,
n_samples: int = 10,
) -> tuple[Tensor, Tensor]:
...
class0_x, class0_y = generate_class(label=0)
class1_x, class1_y = generate_class(label=1)
dataset = _make_taskaware_tensor_classification_dataset(
torch.concat([class0_x, class1_x]),
torch.concat([class0_y, class1_y]),
)
print('targets', dataset.targets)
# FlatData (len=20,subset=True,cat=False,cf=True)
# list (len=20) The statement
guided me into the right direction. However, it was not obvious
I think it would be helpful to extend the documentation with examples for creating datasets from typical sources:
I guess, some useful information can be found at Benchmarks, so, maybe it can be linked from "How to Create an AvalancheDataset". Depending on what use cases are reasonable and Avalanche is willing to implement, there could be different convenience utilities - this seems to be the current approach. Maybe all of the dataset/benchmark creation utilities can be boiled down to (1) "from tensors" (I admin, I haven't fully understood all the nomenclature of the different dataset and scenario types, and experience attributes, e.g. supervised vs. detection vs. supervised-detection or task labels vs. class labels). I currently have a problem with training a benchmark from a custom dataset: My custom dataset is a PyTorch class ClassificationMixin(torch.utils.data.Dataset, ISupportedClassificationDataset):
@cached_property
def targets(self):
return [] # whatever...
class TensorDataset(torch.utils.data.TensorDataset, ClassificationMixin):
... This way, it (and its modified splits) is compatible with benchmark = class_incremental_benchmark(
{
'train': as_classification_dataset(train_dataset),
'test': as_classification_dataset(test_dataset),
},
num_experiences=1,
) I get the following error:
For me, this is a hint that there is some protocol mismatch:
? |
# Conflicts: # avalanche/models/dynamic_optimizers.py # avalanche/training/templates/observation_type/batch_observation.py
yes, we need to update that notebook. Most of the time doing AvalancheDataset(your_data) should work. You don't need the old benchark builders (from tensors, from files,...) once you converted your data into an AvalancheDataset with appropriate attributes (needed to easily split it).
This is a missing feature. Basically, I implemented the datasets without task labels but many strategies still require them even when not strictly necessary. We have to fix this problem. Early version of datasets always had task labels, which was confusing for some people. |
Okay. Based on your answer and How to Create an AvalancheDataset I tried using import torch
from torch.utils.data.dataset import TensorDataset
from avalanche.benchmarks.utils import AvalancheDataset
# Create a dataset of 100 data points described by 22 features + 1 class label
x_data = torch.rand(100, 22)
y_data = torch.randint(0, 5, (100,))
# Create the Dataset
torch_data = TensorDataset(x_data, y_data)
avl_data = AvalancheDataset(
torch_data,
data_attributes=[
DataAttribute(y_data, "targets"),
DataAttribute(ConstantSequence(0, len(y_data)), "targets_task_labels", use_in_getitem=True),
],
) instead of # ...
avl_data = _make_taskaware_tensor_classification_dataset(x_data, y_data) This is "with appropriate attributes" but quite cumbersome and typing does not work because Suggestion 1: Dataset HierarchySo it would be really useful to add a dataset hierarchy to the documentation, i.e. the inheritance chain, along with the required data attributes (and more?). Suggestion 2: Dataset constructionWe should think about whether the structure of Suggestion 1 is ideal. If so, it would be consequent to force the user to use the dataset class for the according scenario, e.g. TaskAwareClassificationDataset(
# ...
task_labels=[1, 2, 3],
) As with the "old benchmark builders", I would try to hide internal complexity from the user, such as More advanced benchmark builders could still exist with correct typing (e.g. via |
Thanks for the feedback. The API for the benchmarks is quite cumbersome for a lot of reasons (changed over time, was supposed to be only an internal tool). We need a better API for attributes and transformations. If you have some proposals I would be happy to discuss it with you.
IMO the issue is that CL benchmarks are quite difficult and have an overloaded nomenclature. Many Avalanche users are new to CL so they don't have a complete understanding of the CL literature, and we assume they do. Avalanche models different benchmarks by adding attributes to datasets/experiences and splitting experiences appropriately. CL Benchmarks have 4 large properties: task-awareness, online, boundary-awareness, supervision. These are explained in the FZTH. There is no strict hierarchy in Avalanche. Other CL frameworks tried to provide a strict hierarchy (e.g. sequoia) but we don't do it. I would add other relevant examples to the FZTH notebook if you feel anything is missing or there needs to be more explanation.
The issue with that approach is that you are mixing the dataset creation with the benchmark creation and you need to make a lot of assumptions about the datasets. It is hard to generalize it to all possible settings (e.g. sequence classification, videos, graphs, ...). We also wanted to give the possibility of creating class-incremental benchmarks without task labels, which was confusing for many users. I agree with you that the new API is not as powerful yet, but I believe it's easier to provide high level methods to manage transforms/attributes than it is to provide truly general benchmark generators. |
FYI #1652 fixes this error. Now you should be able to use class-incremental benchmarks without task labels with strategies (as long as they don't need task labels. This is an example of the benchmark creation:
|
@AntonioCarta Thanks for your thorough reply. 😊 I will see, if the latest version works for me. Generally, I agree on your thoughts. 👍 And I must admit that this issue's discussion has gone out of scope. Since I am neither an CL nor Avalanche expert, most of my comments were questions. So I thought Slack would be a better place for discussion and tried the Slack Invite Link but it appears to be dead. In case you still use Slack for discussion, could you provide a new invite link? 🙏 One note (we could further discuss on Slack 😉):
I understand the difficulty/infeasibility of creating an abstraction for different benchmark scenarios. I meant the hierarchy of datasets only. Maybe adding (data)attributes like |
where did you take this link? The one on the website seems to be working. Anyway, I think it's better to discuss on github because slack only shows the last 90 days of messages. It's easier for us to track discussions here. I check them periodically and eventually close them if they don't lead to actionable items.
For scenarios we have Protocols (example). Again, keep in mind that a scenario may need multiple protocols (e.g. task-aware + boundary-aware). For datasets we would need some work to clean the hierarchy. We have some helpers (example) but I admit it's not clear for the user which one they should use. |
I tried both, the one directly from the sidebar and the one at the bottom of The People - also across different machines. But none have worked.
I tried to get a better picture of the structure and relations between scenarios, streams experiences, and their interfaces/protocols, ABCs and implementations. Here is what I came up with This is neither complete nor mirroring the actual code base ( Some things, I figured out
Remarks / Suggestions
Let me know what you think 😉 |
The CLExperience provides a bit of additional flexibility. We have some minimal support for reinforcement learning, which is a scenario where we don't have a dataset. We also have avalanche-rl, which was supported by an older avalanche version but it is currently unmantained. For example, methods like progressive neural networks should work with task-aware RL scenarios without any modifications.
Yes, but I would keep this as an internal API. Users should be using high-level methods, because I think most people would be very confused by mixins.
I tend to agree with this, but keep in mind that some protocols don't need to be propagated. For example, scenarios/experiences can be BoundaryAware, but the dataset is not BoundaryAware since this is a property of the stream, not the data (unlike task and class labels).
I'm not an expert on python typing but last time I checked I remember that there were some serious limitations (e.g. protocols runtime checks not checking attributes or full signature of functions). We also have the additional challenge of having to support older python versions (for example right now the CI fails due to typing import errors). |
…tocol), export BaseStrategyProtocol
Still PR is still in progress and not "clean for review" - just some things, I stumbled across. I opened it already, so you know there is some progress on certain aspects that others don't have to worry about 😉
Relates to #886