topology featurizers debug #460

XiaoqZhang · 2024-08-14T08:46:47Z

Hi, I found a bug in the topology featurizers. The original way of splitting atom_types splits 'Cu' into ['C', 'u'], for example. In addition, the tuple type is error-prone because of the trailing comma, as highlighted by the red circle in the attached figure.
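
Both pitfalls can be reproduced in plain Python. This is a minimal sketch with illustrative variable names, not the mofdscribe code itself:

```python
# Iterating over a string yields single characters, so a two-letter
# symbol such as "Cu" falls apart.
assert list("Cu") == ["C", "u"]

# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ("Cu")
one_tuple = ("Cu",)
assert isinstance(not_a_tuple, str)
assert isinstance(one_tuple, tuple)

# Looping over the two values therefore gives different "atom types".
from_string = [t for t in not_a_tuple]
from_tuple = [t for t in one_tuple]
```

A list literal such as `["Cu"]` does not depend on a trailing comma, which is the motivation for the tuple-to-list change.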

Thus, I made several changes accordingly:

Change the data type of atom_types from tuple to list.
Add a function filter_element_for_ph, a modified version of filter_element for persistent homology. It is added instead of modifying the original function because filter_element is also called by mofdscribe.featurizers.chemistry.AMD.
Uncomment the two lines:

mofdscribe/src/mofdscribe/featurizers/topology/_tda_helpers.py

Line 107 in 44c44f7

# xyz_periodic_copies.append(coords)

and

mofdscribe/src/mofdscribe/featurizers/topology/_tda_helpers.py

Line 108 in 44c44f7

# element_copies.append(np.array(elements).reshape(-1,1))

Summary by Sourcery

Fix the atom type splitting issue in topology featurizers by changing the data type from tuple to list and introduce a new function filter_element_for_ph for persistent homology. Uncomment previously commented lines in _tda_helpers.py to restore functionality.

Bug Fixes:

Fix the incorrect splitting of atom types in topology featurizers by changing the data type from tuple to list.

Enhancements:

Introduce a new function filter_element_for_ph to handle element filtering specifically for persistent homology, ensuring compatibility with existing functionality.

sourcery-ai · 2024-08-14T08:46:55Z

Reviewer's Guide by Sourcery

This pull request addresses a bug in the topology featurizers of the mofdscribe library. The main changes modify how atom types are handled, introduce a new function for filtering elements in persistent homology calculations, and adjust data types to improve robustness. The changes span multiple files in the project, primarily the topology-related modules.

File-Level Changes

| Files | Changes |
| --- | --- |
| `src/mofdscribe/featurizers/utils/substructures.py` | Introduced a new function 'filter_element_for_ph' specifically for persistent homology calculations |
| `src/mofdscribe/featurizers/topology/ph_image.py`, `src/mofdscribe/featurizers/topology/ph_hist.py`, `src/mofdscribe/featurizers/topology/ph_stats.py`, `src/mofdscribe/featurizers/topology/ph_vect.py` | Changed the data type of 'atom_types' from tuple to list in multiple classes |
| `src/mofdscribe/featurizers/topology/_tda_helpers.py` | Modified the 'get_persistent_images_for_structure' function to use the new 'filter_element_for_ph' function |
| `src/mofdscribe/featurizers/topology/_tda_helpers.py` | Uncommented two lines in the 'make_supercell' function to include original coordinates and elements |
| `src/mofdscribe/featurizers/topology/ph_image.py`, `src/mofdscribe/featurizers/topology/ph_hist.py`, `src/mofdscribe/featurizers/topology/ph_stats.py`, `src/mofdscribe/featurizers/topology/ph_vect.py` | Updated the handling of atom types in various classes to accommodate the new list data type |

sourcery-ai

Hey @XiaoqZhang - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good

sourcery-ai · 2024-08-14T08:48:12Z

src/mofdscribe/featurizers/utils/substructures.py

@@ -35,6 +35,39 @@ def filter_element(
        return Structure.from_sites(keep_sites)
    else:  # input is molecule or IMolecule
        return Molecule.from_sites(keep_sites)
+
+
+def filter_element_for_ph(


issue (complexity): Consider refactoring to consolidate logic into a single function.

The new code introduces unnecessary complexity by duplicating logic in the filter_element_for_ph function, which increases maintenance overhead. The use of elements_group as a tuple doesn't add functional value and makes the code harder to read. Additionally, the handling of elements is inconsistent, which could lead to bugs. Consider refactoring to consolidate the logic into a single function, using list comprehensions for clarity, and ensuring consistent handling of elements. This will improve readability and maintainability.

can you make a suggestion?

Certainly! To address the complexity issue, you could refactor the filter_element and filter_element_for_ph functions to share common logic. Here's a suggestion:

Create a Helper Function: Extract the common logic into a helper function that both filter_element and filter_element_for_ph can call. This will reduce duplication and make the code easier to maintain.

Use List Comprehensions: If possible, use list comprehensions to make the filtering logic more concise and readable.

Consistent Handling of elements: Ensure that the handling of elements is consistent across both functions. This might involve standardizing the input format or processing steps.

Here's a rough example of how you might structure the refactoring:

from pymatgen.core import Molecule, Structure


def filter_elements_common(sites, elements, elements_group):
    # Keep sites whose element symbol appears in either collection.
    return [
        site
        for site in sites
        if site.specie.symbol in elements or site.specie.symbol in elements_group
    ]


def filter_element(structure, elements, elements_group=()):
    keep_sites = filter_elements_common(structure.sites, elements, elements_group)
    if isinstance(structure, Structure):
        return Structure.from_sites(keep_sites)
    return Molecule.from_sites(keep_sites)


def filter_element_for_ph(structure, elements, elements_group=()):
    # If there's any specific logic for persistent homology, handle it here
    keep_sites = filter_elements_common(structure.sites, elements, elements_group)
    if isinstance(structure, Structure):
        return Structure.from_sites(keep_sites)
    return Molecule.from_sites(keep_sites)

This approach keeps the core filtering logic in one place, making it easier to update and maintain. If filter_element_for_ph requires additional specific logic, you can still handle that separately within its function. Let me know if you need further clarification or assistance!

kjappelbaum · 2024-08-18T21:03:50Z

interesting, thanks for looking into it!

kjappelbaum · 2024-08-18T21:05:41Z

.deepsource.toml

I agree that deepsource is not heavily used at the moment, but why are those changes needed for this PR?

kjappelbaum · 2024-08-18T21:07:18Z

src/mofdscribe/featurizers/topology/_tda_helpers.py

+    xyz_periodic_copies.append(coords)
+    element_copies.append(np.array(elements).reshape(-1,1))


ok, seems like a good catch, because the x==y==z==0 condition is skipped atm
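
To illustrate the effect of the skipped zero offset, here is a minimal pure-Python sketch. It assumes the supercell is built by looping over offsets in {-1, 0, 1}^3 on fractional coordinates and skipping the (0, 0, 0) case; `periodic_copies` and its `include_original` flag are hypothetical names, and the sketch is not the actual `make_supercell` implementation.

```python
from itertools import product


def periodic_copies(frac_coords, elements, include_original=True):
    """Hypothetical helper: translate atoms into the neighbouring cells."""
    coords_out, elements_out = [], []
    for offset in product((-1, 0, 1), repeat=3):
        if offset == (0, 0, 0):
            continue  # the untranslated cell is handled separately below
        for (x, y, z), element in zip(frac_coords, elements):
            coords_out.append((x + offset[0], y + offset[1], z + offset[2]))
            elements_out.append(element)
    if include_original:
        # Counterpart of the two uncommented lines: add the original cell back.
        coords_out.extend(tuple(c) for c in frac_coords)
        elements_out.extend(elements)
    return coords_out, elements_out


with_original, _ = periodic_copies([(0.0, 0.0, 0.0)], ["Cu"])
without_original, _ = periodic_copies([(0.0, 0.0, 0.0)], ["Cu"], include_original=False)
```

With a single atom, the first call yields 27 positions and the second only 26, because the atom's own cell is missing.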

kjappelbaum · 2024-08-18T21:08:36Z

src/mofdscribe/featurizers/topology/ph_hist.py

@@ -34,14 +34,14 @@ class PHHist(MOFBaseFeaturizer):

    def __init__(
        self,
-        atom_types: Tuple[str] = (
+        atom_types: List[str] = [


Tuple was originally used because it is immutable and hence a better default for arguments.

But since we do not change the atom_types this is probably a safe choice
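
For context, the usual concern with a list default is that it is created once and shared across calls. A minimal sketch with hypothetical classes (not the PHHist implementation) shows what would happen if the default were ever mutated, and the common `None`-sentinel alternative; the value "C-H-N-O" is only illustrative:

```python
from typing import List, Optional


class SharedDefault:
    # Hypothetical class: the list default is created once, at definition time.
    def __init__(self, atom_types: List[str] = ["C-H-N-O"]):
        self.atom_types = atom_types


class SafeDefault:
    # Hypothetical class: a None sentinel gives every instance its own list.
    def __init__(self, atom_types: Optional[List[str]] = None):
        self.atom_types = ["C-H-N-O"] if atom_types is None else list(atom_types)


a, b = SharedDefault(), SharedDefault()
a.atom_types.append("all")  # also changes b.atom_types

c, d = SafeDefault(), SafeDefault()
c.atom_types.append("all")  # d.atom_types is unaffected
```

As long as atom_types is only read, both variants behave the same, which matches the assessment above.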

kjappelbaum · 2024-08-18T21:23:25Z

src/mofdscribe/featurizers/utils/substructures.py

+    elements_ = []
+    elements_group = (elements,)
+    for atom_type in elements_group:


I'm not sure I understand this change. You are certainly right that the old code behaves unexpectedly for elements whose symbol has more than one letter. But wouldn't a clearer fix be something else? For example, always split by - first and then handle the other cases.
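
A minimal sketch of the "split by - first" idea, assuming atom-type groups are written as hyphen-separated strings such as "C-H-N-O". `split_atom_type` is a hypothetical helper, not an existing mofdscribe function:

```python
from typing import List


def split_atom_type(atom_type: str) -> List[str]:
    """Split a hyphen-separated group such as "C-H-N-O" into element symbols."""
    # Splitting on "-" never breaks a symbol apart, so "Cu" stays "Cu".
    return [symbol.strip() for symbol in atom_type.split("-") if symbol.strip()]


single = split_atom_type("Cu")
group = split_atom_type("C-H-N-O")
```

Because a string without a hyphen comes back as a one-element list, single elements and groups can go through the same code path.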

topology featurizers debug

eabe0dc

ci: update .deepsource.toml

376c598
