The function in /mutate/protocol.py that imports the mutation data table does not pass an explicit separator, so pandas defaults to a comma as the field separator. However, the function in /mutate/calculations.py that processes this table also expects commas to separate the individual mutations within a mutant. If you use commas both as field separators and as mutation separators, the pipeline stops with an error. If you instead format the mutation data table as the documentation suggests, with semicolons as field separators, pandas raises a key error.
I fixed this issue on my installation of evcouplings by changing line 126 of mutate/protocol.py from
data = pd.read_csv(dataset_file, comment="#")
to
data = pd.read_csv(dataset_file, comment="#", sep=";")
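To make the two failure modes concrete, here is a minimal sketch using only Python's standard csv module rather than the pipeline itself. The column names (mutant, effect) and the values are hypothetical and are not taken from the real example file; the sketch only shows how the choice of delimiter changes what a parser sees.

```python
import csv
import io

# Hypothetical column names and values, for illustration only.
comma_table = "mutant,effect\nA1B,C2D,0.5\n"
semicolon_table = "mutant;effect\nA1B,C2D;0.5\n"


def parse(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of fields)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Commas in both roles: the data row splits into three fields
# under a two-column header, so the double mutant is torn apart.
comma_rows = parse(comma_table, ",")

# Semicolon-separated file read with a comma delimiter: the header
# collapses into the single field "mutant;effect", so a lookup of a
# column named "mutant" cannot succeed.
semicolon_rows_as_comma = parse(semicolon_table, ",")

# Semicolon-separated file read with the matching delimiter:
# two fields per row, with the double mutant kept intact.
semicolon_rows = parse(semicolon_table, ";")

print(comma_rows)
print(semicolon_rows_as_comma)
print(semicolon_rows)
```

The third case corresponds to the effect of passing sep=";" as in the local fix above, assuming the table really uses semicolons between fields.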
This file: https://github.com/debbiemarkslab/EVcouplings/blob/develop/notebooks/example/PABP_YEAST_Fields2013-singles.csv was the only one I could find as an example of how to format a mutation effects data table, and it is the one referenced in your mutation effects documentation. Are different functions used for the same calculations when EVcouplings is run as a Python package versus through the command line interface, such that the formatting needs to differ between the two?
Sorry, this is an unfortunate mismatch between the mutation effect documentation and overall pipeline usage. I tagged this issue to fix the documentation example so it doesn't lead to misunderstandings in the future.
In the example notebook, pd.read_csv() is called with an explicit sep=";" argument, while the pipeline defaults to sep="," when reading the CSV file. The pipeline default is the intended behaviour, so that CSV files are handled consistently across the pipeline; the example file dates back to an older set of files. The actual prediction functions applied afterwards are the same.
As a solution, I propose using a file formatted like the test_mutants.csv file I posted above, in which any string containing commas is wrapped in quotation marks. I think this is the de facto standard for handling this case.
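As a rough illustration of that quoting convention, the sketch below writes and re-reads a small table with Python's standard csv module. The column names (mutant, effect) and the values are hypothetical and do not reproduce the contents of test_mutants.csv.

```python
import csv
import io

# Hypothetical rows; the second mutant contains a comma between its two mutations.
rows = [
    {"mutant": "A1B", "effect": "0.1"},
    {"mutant": "A1B,C2D", "effect": "0.5"},
]

# csv.DictWriter uses minimal quoting by default, so only fields that
# contain the delimiter (or quotes/newlines) are wrapped in quotation marks.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["mutant", "effect"])
writer.writeheader()
writer.writerows(rows)
text = buffer.getvalue()

# Reading the text back with the default comma delimiter keeps the
# quoted field together as a single value.
parsed = list(csv.DictReader(io.StringIO(text)))

print(text)
print(parsed[1]["mutant"])
```

pandas.read_csv with its default sep="," and default quote character follows the same convention, so a file quoted this way should be readable by the unmodified pd.read_csv(dataset_file, comment="#") call without a custom separator.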