summarize_networks InvalidIndexError #191

ntalluri · 2024-10-25T19:13:49Z

PR for #188

So far, this adds a test case that reproduces the error and shows how it occurs.

test/ml/test_ml.py

ntalluri · 2024-11-18T22:31:04Z

spras/analysis/ml.py

@@ -78,6 +78,7 @@ def summarize_networks(file_paths: Iterable[Union[str, PathLike]]) -> pd.DataFra
                str(tup[0]): 1,
            }, index=tup[1]
        )
+        dataframe = dataframe[~dataframe.index.duplicated(keep='first')]


https://pandas.pydata.org/docs/reference/api/pandas.Index.duplicated.html
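For readers following the linked docs, here is a minimal sketch of what the added line does. The edge labels and the column name are made up for illustration; only the `index.duplicated(keep='first')` mask comes from the diff.

```python
import pandas as pd

# Hypothetical edge labels; the second "A-B" repeats the first one
edges = ["A-B", "B-C", "A-B"]

# Same construction pattern as in the hunk: one column of 1s, indexed by edge
dataframe = pd.DataFrame({"example-pathway": 1}, index=edges)

# duplicated(keep="first") flags every repeat after the first occurrence
is_repeat = dataframe.index.duplicated(keep="first")

# Inverting the mask keeps first occurrences and all non-duplicated labels
deduplicated = dataframe[~is_repeat]
```

After this step the index is unique, which is the property the rest of the function appears to depend on.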


ntalluri · 2024-11-18T22:34:38Z

spras/analysis/ml.py

@@ -78,6 +78,7 @@ def summarize_networks(file_paths: Iterable[Union[str, PathLike]]) -> pd.DataFra
                str(tup[0]): 1,
            }, index=tup[1]
        )
+        dataframe = dataframe[~dataframe.index.duplicated(keep='first')]


Add a warning that edges are being dropped from a file.
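One possible shape for that warning, as a sketch only. `drop_repeated_edges` and its `source` argument are hypothetical names that do not exist in the PR; the sketch assumes the file path is available at the point where the dataframe is built.

```python
import warnings

import pandas as pd


def drop_repeated_edges(dataframe: pd.DataFrame, source: str) -> pd.DataFrame:
    """Hypothetical helper: keep the first row per index label, warn about the rest."""
    is_repeat = dataframe.index.duplicated(keep="first")
    if is_repeat.any():
        # Collect the labels that are about to be dropped so the warning is actionable
        dropped = dataframe.index[is_repeat].tolist()
        warnings.warn(
            f"{source}: dropping {len(dropped)} repeated edge(s): {dropped}"
        )
    return dataframe[~is_repeat]
```

Using `warnings.warn` rather than raising keeps the current behavior (the run continues) while making the dropped edges visible.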


agitter

What you have looks good.

If you haven't already opened an issue about MinCostFlow producing the duplicate edges that triggered this problem, please do that and link it to #188 (comment). We may want to follow up on that potential bug later.

agitter · 2024-11-22T22:56:48Z

spras/analysis/ml.py

@@ -78,6 +78,7 @@ def summarize_networks(file_paths: Iterable[Union[str, PathLike]]) -> pd.DataFra
                str(tup[0]): 1,
            }, index=tup[1]
        )
+        dataframe = dataframe[~dataframe.index.duplicated(keep='first')]


Suggested change

-        dataframe = dataframe[~dataframe.index.duplicated(keep='first')]
+        # Mark the first occurrence of a duplicate as True and all others as False
+        # Non-duplicates are also marked as True
+        dataframe = dataframe[~dataframe.index.duplicated(keep='first')]

agitter · 2024-11-22T22:58:19Z

spras/analysis/ml.py

@@ -78,6 +78,7 @@ def summarize_networks(file_paths: Iterable[Union[str, PathLike]]) -> pd.DataFra
                str(tup[0]): 1,
            }, index=tup[1]
        )
+        dataframe = dataframe[~dataframe.index.duplicated(keep='first')]


If this is hard to do here, we can think about whether it should be handled somewhere else. For instance, should the parse_output functions check for duplicate edges and raise a warning or error?
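If the check moves to parse time instead, it could look roughly like the sketch below. This is not the existing parse_output code: `check_edges`, the `strict` flag, and the representation of edges as tuples are all assumptions made for illustration.

```python
import warnings
from collections import Counter


def check_edges(edges, strict=False):
    """Hypothetical parse-time check: report repeated edges, return unique ones.

    With strict=False a warning is issued; with strict=True a ValueError is raised.
    """
    counts = Counter(edges)
    duplicates = [edge for edge, n in counts.items() if n > 1]
    if duplicates:
        message = f"duplicate edges found: {duplicates}"
        if strict:
            raise ValueError(message)
        warnings.warn(message)
    # dict.fromkeys keeps the first occurrence of each edge in original order
    return list(dict.fromkeys(edges))
```

The `strict` switch is just one way to leave the warning-versus-error decision open until it has been discussed.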

add repeat edge in pathway.txt test

849ebc4

ntalluri mentioned this pull request Oct 25, 2024

InvalidIndexError when running ml code #188

Open

fix InvalidIndexError

f55855a

ntalluri commented Oct 28, 2024

test/ml/test_ml.py (outdated, resolved)

ntalluri requested a review from agitter November 4, 2024 22:57

ntalluri commented Nov 18, 2024

agitter reviewed Nov 22, 2024
