Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN/DEV: refactor codebase #122

Open
NickleDave opened this issue May 2, 2021 · 3 comments
Open

CLN/DEV: refactor codebase #122

NickleDave opened this issue May 2, 2021 · 3 comments
Assignees
Labels
CLN: clean/refactor code PEP8 fixes, refactors DEV: development release a version, change dependencies, etc.

Comments

@NickleDave
Copy link
Collaborator

starting this issue where I will make notes on how to possible refactor hvc codebase so it is less spaghetti code. May break some comments out into separate issues

@NickleDave NickleDave added DEV: development release a version, change dependencies, etc. CLN: clean/refactor code PEP8 fixes, refactors labels May 2, 2021
@NickleDave NickleDave self-assigned this May 2, 2021
@NickleDave
Copy link
Collaborator Author

NickleDave commented May 2, 2021

  • factor extract out of features
    • would make sense to have features as a namespace, with a matching entry point so that user-defined features could be installed as plug-ins
    • and move code that is currently in the features sub-package into a separate extract sub-package
      • with a functional extract module
      • and the FeatureExtractor as a separate class that just uses functions from the functional module for its methods

@NickleDave
Copy link
Collaborator Author

NickleDave commented May 2, 2021

There should be a config sub-package with the current parseconfig module inside of it.
Currently there's a parse sub-package -- naming not quite explicit enough.

edit: would prefer a sub-package named config with a module named parse having a function parse_yaml_config

A lot of the logic around configuration parsing could be moved into this module so that other functions just do straight iteration over some sort of attrs class / dataclass instance, e.g. a class called Config with an attribute todolist that is literally a list of instances called TodoItem or something, each item having spect_params, seg_params etc.
If an item in the .yml file declares its own spect_params and those need to take precedence over top-level spect params, that can be determined during config parsing to produce the TodoList class as expected, instead of what currently happens where each function internally determines whether there are spect_params for the current item in the to-do list that need to take precedence over global spect_params

Some of the "private" helper functions for parsing configs might be useful to a general user.
Of course trade-off is those would then definitely need to be unit-tested and generally better maintained.
Is there key functionality that's worth factoring out?
E.g., I am using hvc.parse.extract._validate_feature_group_and_convert_to_list for the tweetynet paper to just get the list of svm features so I can iterate over them.
Seems like having a general model_name_to_features_list function could be useful, especially under a framework where a "model" is specified in a way that includes its set of features.

@NickleDave
Copy link
Collaborator Author

NickleDave commented May 9, 2021

core code in FeatureExtractor is basically unreadable, in particular the method _from_file.
All the weird stuff I did with if x in locals and np.concatenate really obscures what's going on.
It also prevents any kind of optimization that uses parallelization, e.g. dask.
I think the easiest thing to do would be re-write this in a possibly slow but readable way, make sure I didn't break it, and then optimize from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLN: clean/refactor code PEP8 fixes, refactors DEV: development release a version, change dependencies, etc.
Projects
None yet
Development

No branches or pull requests

1 participant