Bug in extracting sample id from script aggregate_rsem_results.py #524

mw201608 · 2023-02-08T16:36:29Z

In the container
The rsem aggregation step failed to extract the proper sample ids when sample names contained character ., and may even crashed (error message shown below) when the partially extracted sample ids are not unique.

Traceback (most recent call last):
  File "/opt/src/aggregate_rsem_results.py", line 90, in <module>
    df_dict[c].to_hdf(tmp_store.name, '{0:s}{1:d}'.format(c,k))
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 2490, in to_hdf
    pytables.to_hdf(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 282, in to_hdf
    f(store)
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 265, in <lambda>
    f = lambda store: store.put(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 1030, in put
    self._write_to_group(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 1697, in _write_to_group
    s.write(
  File "/hpc/users/wangm05/.local/lib/python3.8/site-packages/pandas/io/pytables.py", line 3093, in write
    raise ValueError("Columns index has to be unique for fixed format")
ValueError: Columns index has to be unique for fixed format

This is because the sample ids are extracted through string split by .,
sample_id = filename.split('.')[0]

It might be better to use a regular expression to extract the complete sample ids, like
re.sub(".rsem.[a-z]+.results", "", filename)

Minghui

The text was updated successfully, but these errors were encountered:

gaow · 2023-02-08T16:39:25Z

Thanks @mw201608 ! @hsun3163 please correct me if I'm wrong but that files was from GTEx repository right? I wonder if we should just change it and maintain a local copy of it along the lines @mw201608 suggested or if you can check they have an update? I say we start changing it locally and then we send PR on their repo as we see fit?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug in extracting sample id from script aggregate_rsem_results.py #524

Bug in extracting sample id from script aggregate_rsem_results.py #524

mw201608 commented Feb 8, 2023 •

edited

Loading

gaow commented Feb 8, 2023

Bug in extracting sample id from script aggregate_rsem_results.py #524

Bug in extracting sample id from script aggregate_rsem_results.py #524

Comments

mw201608 commented Feb 8, 2023 • edited Loading

gaow commented Feb 8, 2023

mw201608 commented Feb 8, 2023 •

edited

Loading