You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am cross-posting a discussion from the mailing list with regard to multi-PDB files containing MODEL & ENDMDL tags, which are currently not handled by BioPandas.
However, it should definitely be handled in one way or the other. Currently, I don't have any best idea on how to handle that and would welcome and thoughts and feedback (let me cross-post that on the GitHub issue tracker -- maybe better to continue the discussion about potential ways to implement it there).
I think one of the problems with the DataFrame format is that having them all in one DataFrame would probably result in a lot of weird -- or unexpected -- results, thus it would probably best to separate the structures one way or the other ...
which would preserve the current functionality of the library without any e.g., backwards-incompatible changes. This would then also help with using the multiprocessing library more easily and efficiently for the analysis of multiple PandasPdb objects in parallel.
Right now, the PandasPdb objects have a dictionary containing multiple DataFrames
dict_keys(['ATOM', 'HETATM', 'ANISOU', 'OTHERS'])
For multi-PDB files, the dictionary could be expanded to
I strongly favor scenario 1) though; however, I would love to hear feedback on this and are open to other suggestions!
In any case, also an error (or at least a warning) should be raised if MODEL & ENDMDL tags are found in a PDB file if the current read_pdb method is used such that this doesn't lead to any unexpected behavior.
The text was updated successfully, but these errors were encountered:
I am cross-posting a discussion from the mailing list with regard to multi-PDB files containing MODEL & ENDMDL tags, which are currently not handled by BioPandas.
However, it should definitely be handled in one way or the other. Currently, I don't have any best idea on how to handle that and would welcome and thoughts and feedback (let me cross-post that on the GitHub issue tracker -- maybe better to continue the discussion about potential ways to implement it there).
I think one of the problems with the DataFrame format is that having them all in one DataFrame would probably result in a lot of weird -- or unexpected -- results, thus it would probably best to separate the structures one way or the other ...
One option would be to provide a utility function (analogous to the split_multimol2 function, http://rasbt.github.io/biopandas/tutorials/Working_with_MOL2_Structures_in_DataFrames/#parsing-multi-mol2-files) that generates multiple PandasPdb objects from such a file. I.e., it would simply be a list
pdbs = [pdb_1, pdb_2, .... pdb_n]
which would preserve the current functionality of the library without any e.g., backwards-incompatible changes. This would then also help with using the multiprocessing library more easily and efficiently for the analysis of multiple PandasPdb objects in parallel.
dict_keys(['ATOM', 'HETATM', 'ANISOU', 'OTHERS'])
For multi-PDB files, the dictionary could be expanded to
I strongly favor scenario 1) though; however, I would love to hear feedback on this and are open to other suggestions!
In any case, also an error (or at least a warning) should be raised if MODEL & ENDMDL tags are found in a PDB file if the current read_pdb method is used such that this doesn't lead to any unexpected behavior.
The text was updated successfully, but these errors were encountered: