-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topology featurizers debug #460
base: main
Are you sure you want to change the base?
Conversation
Reviewer's Guide by SourceryThis pull request addresses a bug in the topology featurizers of the mofdscribe library. The main changes involve modifying how atom types are handled, introducing a new function for filtering elements in persistence homology calculations, and adjusting data types to improve robustness. The changes span across multiple files in the project, primarily focusing on the topology-related modules. File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @XiaoqZhang - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
@@ -35,6 +35,39 @@ def filter_element( | |||
return Structure.from_sites(keep_sites) | |||
else: # input is molecule or IMolecule | |||
return Molecule.from_sites(keep_sites) | |||
|
|||
|
|||
def filter_element_for_ph( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (complexity): Consider refactoring to consolidate logic into a single function.
The new code introduces unnecessary complexity by duplicating logic in the filter_element_for_ph
function, which increases maintenance overhead. The use of elements_group
as a tuple doesn't add functional value and makes the code harder to read. Additionally, the handling of elements
is inconsistent, which could lead to bugs. Consider refactoring to consolidate the logic into a single function, using list comprehensions for clarity, and ensuring consistent handling of elements
. This will improve readability and maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make a suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly! To address the complexity issue, you could refactor the filter_element
and filter_element_for_ph
functions to share common logic. Here's a suggestion:
-
Create a Helper Function: Extract the common logic into a helper function that both
filter_element
andfilter_element_for_ph
can call. This will reduce duplication and make the code easier to maintain. -
Use List Comprehensions: If possible, use list comprehensions to make the filtering logic more concise and readable.
-
Consistent Handling of
elements
: Ensure that the handling ofelements
is consistent across both functions. This might involve standardizing the input format or processing steps.
Here's a rough example of how you might structure the refactoring:
def filter_elements_common(sites, elements, elements_group):
return [site for site in sites if site.specie.symbol in elements or site.specie.symbol in elements_group]
def filter_element(structure, elements, elements_group=()):
keep_sites = filter_elements_common(structure.sites, elements, elements_group)
return Structure.from_sites(keep_sites) if isinstance(structure, Structure) else Molecule.from_sites(keep_sites)
def filter_element_for_ph(structure, elements, elements_group=()):
# If there's any specific logic for persistence homology, handle it here
keep_sites = filter_elements_common(structure.sites, elements, elements_group)
return Structure.from_sites(keep_sites) if isinstance(structure, Structure) else Molecule.from_sites(keep_sites)
This approach keeps the core filtering logic in one place, making it easier to update and maintain. If filter_element_for_ph
requires additional specific logic, you can still handle that separately within its function. Let me know if you need further clarification or assistance!
interesting, thanks for looking into it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that deepsource is not heavily used at the moment, but why are those changes needed for this PR?
xyz_periodic_copies.append(coords) | ||
element_copies.append(np.array(elements).reshape(-1,1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, seems like a good catch, because the x==y==z==0
condition is skipped atm
@@ -34,14 +34,14 @@ class PHHist(MOFBaseFeaturizer): | |||
|
|||
def __init__( | |||
self, | |||
atom_types: Tuple[str] = ( | |||
atom_types: List[str] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple was originally used because it is immutable and hence a better default for arguments.
But since we do not change the atom_types
this is probably a safe choice
elements_ = [] | ||
elements_group = (elements,) | ||
for atom_type in elements_group: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure I understand this change. You are certainly right that the old code behaves in an unexpected way for elements with more than two symbols in the letter. But wouldn't then the clearer fix be something else? For example, always split by -
first and then handle other things
Hi, I found a bug in the topology featurizers. The original way of splitting
atom_types
will split'Cu'
to['C', 'u']
, for example. Besides, thetuple
type is error-prone to the last,
as highlighted by the red circle in the figure attached.Thus, I made several changes accordingly:
atom_types
fromtuple
tolist
;filter_element_for_ph
, which is a modified version offilter_element
for persistence homology. This function is added instead of modifying the original function becausefilter_element
is also called bymofdscribe.featurizers.chemistry.AMD
mofdscribe/src/mofdscribe/featurizers/topology/_tda_helpers.py
Line 107 in 44c44f7
mofdscribe/src/mofdscribe/featurizers/topology/_tda_helpers.py
Line 108 in 44c44f7
Summary by Sourcery
Fix the atom type splitting issue in topology featurizers by changing the data type from tuple to list and introduce a new function
filter_element_for_ph
for persistence homology. Uncomment previously commented lines in_tda_helpers.py
to restore functionality.Bug Fixes:
Enhancements:
filter_element_for_ph
to handle element filtering specifically for persistence homology, ensuring compatibility with existing functionality.