Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify whether fields can be added to groups/dsets with only data_type_inc #13

Closed
rly opened this issue Apr 21, 2021 · 5 comments
Closed
Assignees
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) topic: docs Issues related to documentation
Milestone

Comments

@rly
Copy link
Contributor

rly commented Apr 21, 2021

Raised in hdmf-dev/hdmf#542 in the context of how extra fields are treated.

It is not clear in the documentation whether the schema language allows new fields for groups/datasets defined with only a data_type_inc. We should clarify this in the documentation.

It is my understanding that it is allowed (though perhaps not recommended) for fields to be added to groups/datasets defined with only a data_type_inc.

For example: a new group may contain a dataset with:

- data_type_def: MyTable
  datasets:
  - name: new_column
    data_type_inc: VectorData
    attributes:
    - name: new_attr
      dtype: text
      doc: a new attribute

A concrete example is in the Units table in NWB: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.misc.yaml#L187-L199

The resolution attribute is added to the base VectorData definition.

Or a new group may be defined as:

- data_type_def: MyContainer
  groups:
  - name: new_group
    data_type_inc: Container
    groups:
    - name: new_subgroup
      data_type_inc: Container
      doc: a new subgroup
    # also add new datasets, new links, and new attributes

A related but not fully matching example is the 'electrodes' table in the NWBFile: https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.file.yaml#L269

The 'electrodes' table lacks a data_type_def but specifies many named VectorData types. However, this differs from the above because DynamicTable explicitly allows quantity: * of data_type_inc: VectorData without a name specified.

Note that a group/dataset defined with only a data_type_inc may have a different doc, quantity, dims, shape, and sometimes even dtype than in the data_type_def definition, as long as these are compatible with the included spec (e.g., a square is a special case of a rectangle). This should also be clarified in the documentation and fully supported in HDMF.

If adding fields to types with only data_type_inc is not allowed, then the NWB schema should be amended.
If the above is allowed, then the HDMF validator should be amended to validate against the spec with any additions to the included type.

@oruebel
Copy link

oruebel commented Apr 21, 2021

Effectively, this is the question of "monkey-patching", i.e., we have an existing type and add properties without changing the class type. If we ignore backward compatibility, then my vote would be to require that we create a new data_type_def in those cases, if only to make this case easier to handle. However, I assume this would mess with a bunch of existing schema, as such I think it may be best to: 1) Explicitly discourage this in the schema docs, 2) Add a test on the ndx-catalog that checks for this and raises a warning to encourage better schema, 2) continue support as is in the API.

@rly
Copy link
Contributor Author

rly commented Apr 21, 2021

Yes, removing those cases would mess with existing schema, so I think that plan of action makes sense. And even if we remove those cases from existing schema, I think we would have to continue to support them in order to be able to read and validate files with older schema.

To be clear, based on this, in the example where the Units table has a VectorData column for spike_times and we want to add an attribute to that column, we should create a new data type SpikeTimesColumn that inherits from VectorData and adds the attribute, and use that in the Units table instead, right?

What about the cases when fields are not added but properties of a data type are modified (restricted), e.g., VectorData allows any dtype and shape, but a new data type includes a VectorData and restricts dtype to int and shape to (None, )? To be clear, these cases should not be discouraged, right?

@oruebel
Copy link

oruebel commented Apr 21, 2021

Since we need to support this case anyways, I think this is ultimately then more like a best practice. I think the best practice should be:

  • If you add new elements (attributes, datasets etc.) to a type then we should encourage to create a new data_type_def
  • If we restrict an existing type (e.g., restrict dtype or shape) it should be Ok to do so without creating a new type, but it is still Ok to create a new type, e.g., you may want to attach more functionality to your new type then what the base type has.

@rly
Copy link
Contributor Author

rly commented Jan 5, 2023

TODO: Add warnings in HDMF extensions API when adding new elements to a group/dataset with only data_type_inc.

@rly rly self-assigned this Jan 5, 2023
@rly rly added the priority: low alternative solution already working and/or relevant to only specific user(s) label Jan 5, 2023
@rly rly added this to the 2.1.0 milestone Jan 5, 2023
@rly rly added the category: enhancement improvements of code or code behavior label Jan 5, 2023
@rly
Copy link
Contributor Author

rly commented Mar 12, 2024

Discussion should be continued in #14

@rly rly closed this as completed Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) topic: docs Issues related to documentation
Projects
None yet
Development

No branches or pull requests

2 participants