You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the container
The rsem aggregation step failed to extract the proper sample ids when sample names contained character ., and may even crashed (error message shown below) when the partially extracted sample ids are not unique.
Traceback (most recent call last):
File "/opt/src/aggregate_rsem_results.py", line 90, in <module>
df_dict[c].to_hdf(tmp_store.name, '{0:s}{1:d}'.format(c,k))
File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 2490, in to_hdf
pytables.to_hdf(
File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 282, in to_hdf
f(store)
File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 265, in <lambda>
f = lambda store: store.put(
File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 1030, in put
self._write_to_group(
File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 1697, in _write_to_group
s.write(
File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 3093, in write
raise ValueError("Columns index has to be unique for fixed format")
ValueError: Columns index has to be unique for fixed format
This is because the sample ids are extracted through string split by ., sample_id = filename.split('.')[0]
It might be better to use a regular expression to extract the complete sample ids, like re.sub(".rsem.[a-z]+.results", "", filename)
Minghui
The text was updated successfully, but these errors were encountered:
Thanks @mw201608 ! @hsun3163 please correct me if I'm wrong but that files was from GTEx repository right? I wonder if we should just change it and maintain a local copy of it along the lines @mw201608 suggested or if you can check they have an update? I say we start changing it locally and then we send PR on their repo as we see fit?
In the container
The rsem aggregation step failed to extract the proper sample ids when sample names contained character
.
, and may even crashed (error message shown below) when the partially extracted sample ids are not unique.This is because the sample ids are extracted through string split by
.
,sample_id = filename.split('.')[0]
It might be better to use a regular expression to extract the complete sample ids, like
re.sub(".rsem.[a-z]+.results", "", filename)
Minghui
The text was updated successfully, but these errors were encountered: