Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add best practices section #14

Open
rly opened this issue Apr 21, 2021 · 2 comments · May be fixed by #32
Open

Add best practices section #14

rly opened this issue Apr 21, 2021 · 2 comments · May be fixed by #32
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) topic: docs Issues related to documentation
Milestone

Comments

@rly
Copy link
Contributor

rly commented Apr 21, 2021

The schema language supports some flexibility in how data types are defined, and some methods are encouraged over others for clarity and consistency. These best practices should be added to the schema language documentation:

  1. Define new data types at the root of the schema rather than nested within another data type definition. Nested type definitions may in some cases lead to errors in HDMF. See Loading namespaces from file with nested extensions fails hdmf#511, Cannot extend group with subgroup/dataset that contains type definition hdmf#316, and nested type definitions hdmf#73
  2. Use the quantity key not in the data type definition but in the group/dataset spec where the type is included. When the data type is defined at the root of the schema (as opposed to nested), then in order to use the data type, a new group (subgroup) spec is defined where the quantity key is set to a value or if omitted, the default value of 1 would be used. This makes the quantity defined in the data type definition meaningless and confusing. See also Remove quantity from OptogeneticStimulusSite and ImagingPlane definitions NeurodataWithoutBorders/nwb-schema#472
  3. Use the name key not in the data type definition but iin the group/dataset spec where the type is included. Mismatch between the name defined on the data type definition and where it is included can lead to confusion in the expected behavior and may lead to errors in HDMF. See Mismatched fixed name and included name causes errors on read/validate hdmf#582
  4. Create a new data type when adding attributes/datasets/groups/links to an existing data type. See Clarify whether fields can be added to groups/dsets with only data_type_inc #13. -- Make this a rule (stop allowing new specs with this)
  5. Modifying the dtype, shape, or quantity of a data type when using data_type_inc should only restrict the values from their original definitions. For example, if type A has dtype: text and type B extends type A (data_type_def: B, data_type_inc: A), then type B should not redefine dtype to be int which is incompatible with the dtype of type A. Same thing if type A is included and a new type is not defined (just data_type_inc: A). In other words, all children types should be valid against the parent type. This is not yet checked in HDMF but see progress in Pass parent properties to extended dataset/attribute hdmf#321 .
  6. Non-scalar values for the value and default_value keys are not yet supported by the official APIs, so these are discouraged until support is added.
  7. Don'e allow spaces in any names. Don't allow creation of NWBGroups or NWBDatasets with spaces in the name NeurodataWithoutBorders/pynwb#1421

@bendichter @oruebel @ajtritt Can you think of other best practices to add? Do you agree with the above?

@rly
Copy link
Contributor Author

rly commented May 3, 2021

Question: How to validate against this case:

data_type_def: SpecialVectorData
data_type_inc: VectorData
dtype: text

data_type_def: DynamicTable
datasets:
- data_type_inc: VectorData  (validate against this as anonymous spec DynamicTable__VectorData)
  name: spike_times
  dtype: int
  shape: [None]

# question: should this be allowed?
# and what if SpecialVectorData does not specify a dtype?
MyTableBuilder
data type: DynamicTable
datasets:
- data type: SpecialVectorData
  name: spike_times

@oruebel oruebel added category: enhancement improvements of code or code behavior topic: docs Issues related to documentation labels Sep 16, 2021
@oruebel
Copy link

oruebel commented Sep 16, 2021

See also #13 (comment) for additional best practices

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: enhancement improvements of code or code behavior priority: low alternative solution already working and/or relevant to only specific user(s) topic: docs Issues related to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants