How do I open a DVC-tracked directory from the Python API? #7379
Context
Hello,
Problem
Now I want to test loading the data from the Python API.
Code
import dvc.api

with dvc.api.open(
    path='datasets/mydataset',
    repo='git@gitlab.com:private_repo/my-data-registry',
    rev='v1.0'
) as fd:
    print("It worked!")  # This never gets printed; it crashes with an IsADirectoryError exception
Question
How would I go about loading a directory?
Thanks
Can we provide a new API, something like
There's also a proposal PR for a traversable, pathlib-like API in #6850.
For the record: we plan on publishing dvcfs, an fsspec implementation for DVC repos, with methods like ls/walk/open/download/etc. That would likely be much handier than trying to fit directory handling into the existing api.open.
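To illustrate the fsspec-style approach mentioned above, here is a minimal sketch. It assumes a DVC release that exposes `dvc.api.DVCFileSystem`, which was not yet published when this comment was written. The helper name `fetch_tracked_dir` and the `fs` parameter are hypothetical and exist only for this example.

```python
def fetch_tracked_dir(repo, rev, remote_dir, local_dir, fs=None):
    """List and download a DVC-tracked directory via an fsspec-style filesystem."""
    if fs is None:
        # Assumption: the installed DVC release ships dvc.api.DVCFileSystem.
        # Imported lazily so the helper can be exercised without DVC installed.
        from dvc.api import DVCFileSystem

        fs = DVCFileSystem(repo, rev=rev)

    # find() walks the directory recursively and returns the file paths in it.
    files = sorted(fs.find(remote_dir))
    # get() with recursive=True copies the whole directory tree to local_dir.
    fs.get(remote_dir, local_dir, recursive=True)
    return files
```

With the repository from the question, a call might look like `fetch_tracked_dir('git@gitlab.com:private_repo/my-data-registry', 'v1.0', 'datasets/mydataset', 'mydataset')`. Passing a custom `fs` object is only there so the logic can be tried without network access.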
Resolution
Closing this discussion since my issue can be resolved by using
That being said, I think good documentation on how to handle this use case would be much appreciated, since it should be fairly common in deep learning. Without it, DVC risks losing a good portion of the community affected by this use case, since their first impression will be that it is not supported. I don't think it's super obvious for new users that the
I think this will be solved by @skshetry's PR, as his suggestion with
Thanks everyone for your help, the DVC community is solid!
This is expected: api.open can only be used to open files. If you passed it something like datasets/mydataset/some_file_inside_dataset, then it would open the file.
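As a rough sketch of that workaround, the helper below opens each file inside the tracked directory individually instead of opening the directory itself. The function name `read_files_in_dir`, the `opener` parameter, and the file names you pass in are assumptions for illustration; only `dvc.api.open` with `path`, `repo`, and `rev` comes from the question.

```python
import posixpath


def read_files_in_dir(dir_path, filenames, repo=None, rev=None, opener=None):
    """Read individual files under a DVC-tracked directory, one open() per file."""
    if opener is None:
        # Imported lazily so the helper can be exercised without DVC installed.
        import dvc.api

        opener = dvc.api.open

    contents = {}
    for name in filenames:
        # Paths are relative to the repo root and use forward slashes.
        file_path = posixpath.join(dir_path, name)
        # Each call targets a file, so no IsADirectoryError is raised.
        with opener(file_path, repo=repo, rev=rev) as fd:
            contents[name] = fd.read()
    return contents
```

This requires knowing the file names in advance, so it suits datasets with a known layout rather than arbitrary directory traversal.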