Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Necessity of placeholder sanity check inside LABLoader #102

Open
alephpi opened this issue Sep 25, 2024 · 0 comments
Open

Necessity of placeholder sanity check inside LABLoader #102

alephpi opened this issue Sep 25, 2024 · 0 comments

Comments

@alephpi
Copy link

alephpi commented Sep 25, 2024

Related to #99

Hi, I don't think LABLoader is behaved as expected. If I call it directly, e.g.
loader = LABLoader(path='../only_words/labs/dev/{uri}.lab')
It works smoothly.

But when I use it with a protocol pipeline, with database.yml as following:

Protocols
  AMI:
    SpeakerDiarization:
      only_words:
        train:
            uri: ../lists/train.meetings.txt
            annotation: ../only_words/rttms/train/{uri}.rttm
            annotated: ../uems/train/{uri}.uem
            lab: ../only_words/labs/train/{uri}.lab
        development:
            uri: ../lists/dev.meetings.txt
            annotation: ../only_words/rttms/dev/{uri}.rttm
            annotated: ../uems/dev/{uri}.uem
            lab: ../only_words/labs/dev/{uri}.lab

and something like

for file in only_words.development():
    lab=file['lab']

It will throw the error ValueError: path must contain the {uri} placeholder.

By debugging, I notice that in LABLoader, you explicitly check if the path contains the {uri} placeholder.

_, placeholders, _, _ = zip(*string.Formatter().parse(self.path))
self.placeholders_ = set(placeholders) - set([None])
if "uri" not in self.placeholders_:
raise ValueError("`path` must contain the {uri} placeholder.")

While in load function in Template class, this loader is called after the resolve_path.

def load(current_file: ProtocolFile):
path = resolve_path(Path(template.format(**abs(current_file))), database_yml)
# check if file exists
if not path.is_file():
msg = f"No such file or directory: '{path}' (via '{template}' template)."
raise FileNotFoundError(msg)
loader = Loader(path)
return loader(current_file)

Hence, when the LABLoader is called during the protocol pipeline, the placeholder in the template has already been resolved to certain uri. And when you check over the resolved path, it will always raise the error of no placeholder.

So, I'm doubting the necessity of doing this sanity check in the LABLoader function, since in gather_loader function, you have already done such sanity check.

# check whether value (path) contains placeholders such as {uri} or {subset}
_, placeholders, _, _ = zip(*string.Formatter().parse(value))
is_template = len(set(placeholders) - set([None])) > 0

Unless you are assuming a use case that the LABLoader is called solely, outside the protocol pipeline. In that case I think it better to simply use load_lab.

I expect your opinion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant